freebsd-dev

Author	SHA1	Message	Date
Mark Johnston	29f4e216f2	Rename the kld_unload event handler to kld_unload_try, and add a new kld_unload event handler which gets invoked after a linker file has been successfully unloaded. The kld_unload and kld_load event handlers are now invoked with the shared linker lock held, while kld_unload_try is invoked with the lock exclusively held. Convert hwpmc(4) to use these event handlers instead of having kern_kldload() and kern_kldunload() invoke hwpmc(4) hooks whenever files are loaded or unloaded. This has no functional effect, but simplifies the linker code somewhat. Reviewed by: jhb	2013-08-24 21:13:38 +00:00
Xin LI	439024135c	MFV r254749: Don't hold dd_lock for long by breaking it when not doing dsl_dir accounting. It is not necessary to hold the lock while manipulating the parent's accounting, because there is no interface for userland to see a consistent picture of both parent and child at the same time anyway. Illumos ZFS issues: 4046 dsl_dataset_t ds_dir->dd_lock is highly contended	2013-08-24 00:42:37 +00:00
Xin LI	00e37ef129	MFV r254747: Fix a panic from dbuf_free_range() from dmu_free_object() while doing zfs receive. This is a regression from FreeBSD r253821. Illumos ZFS issues: 4047 panic from dbuf_free_range() from dmu_free_object() while doing zfs receive	2013-08-24 00:19:26 +00:00
Xin LI	3f0164abf3	MFV r254422: Illumos DTrace issues: 3089 want ::typedef 3094 libctf should support removing a dynamic type 3095 libctf does not validate arrays correctly 3096 libctf does not validate function types correctly	2013-08-23 23:21:24 +00:00
Andriy Gapon	2073a41a42	zfs: do not reject any operations on a pool just because it's a boot pool. Unlike upstream, FreeBSD supports booting from all kinds of pools. Requested by: many Tested by: sbruno MFC after: 12 days	2013-08-23 14:43:32 +00:00
Andriy Gapon	05869c0ea7	zfs: inline and remove zfs_vnode_lock It didn't serve any useful purpose, but obscured file and line information useful for debugging. MFC after: 5 days X-MFC with: r254445	2013-08-23 14:40:09 +00:00
Konstantin Belousov	5944de8ecd	Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-22 07:39:53 +00:00
Kenneth D. Merry	7da1a731c6	Expand the use of stat(2) flags to allow storing some Windows/DOS and CIFS file attributes as BSD stat(2) flags. This work is intended to be compatible with ZFS, the Solaris CIFS server's interaction with ZFS, somewhat compatible with MacOS X, and of course compatible with Windows. The Windows attributes that are implemented were chosen based on the attributes that ZFS already supports. The summary of the flags is as follows: UF_SYSTEM: Command line name: "system" or "usystem" ZFS name: XAT_SYSTEM, ZFS_SYSTEM Windows: FILE_ATTRIBUTE_SYSTEM This flag means that the file is used by the operating system. FreeBSD does not enforce any special handling when this flag is set. UF_SPARSE: Command line name: "sparse" or "usparse" ZFS name: XAT_SPARSE, ZFS_SPARSE Windows: FILE_ATTRIBUTE_SPARSE_FILE This flag means that the file is sparse. Although ZFS may modify this in some situations, there is not generally any special handling for this flag. UF_OFFLINE: Command line name: "offline" or "uoffline" ZFS name: XAT_OFFLINE, ZFS_OFFLINE Windows: FILE_ATTRIBUTE_OFFLINE This flag means that the file has been moved to offline storage. FreeBSD does not have any special handling for this flag. UF_REPARSE: Command line name: "reparse" or "ureparse" ZFS name: XAT_REPARSE, ZFS_REPARSE Windows: FILE_ATTRIBUTE_REPARSE_POINT This flag means that the file is a Windows reparse point. ZFS has special handling code for reparse points, but we don't currently have the other supporting infrastructure for them. UF_HIDDEN: Command line name: "hidden" or "uhidden" ZFS name: XAT_HIDDEN, ZFS_HIDDEN Windows: FILE_ATTRIBUTE_HIDDEN This flag means that the file may be excluded from a directory listing if the application honors it. FreeBSD has no special handling for this flag. The name and bit definition for UF_HIDDEN are identical to the definition in MacOS X. 
UF_READONLY: Command line name: "urdonly", "rdonly", "readonly" ZFS name: XAT_READONLY, ZFS_READONLY Windows: FILE_ATTRIBUTE_READONLY This flag means that the file may not be written or appended, but its attributes may be changed. ZFS currently enforces this flag, but Illumos developers have discussed disabling enforcement. The behavior of this flag is different than MacOS X. MacOS X uses UF_IMMUTABLE to represent the DOS readonly permission, but that flag has a stronger meaning than the semantics of DOS readonly permissions. UF_ARCHIVE: Command line name: "uarch", "uarchive" ZFS name: XAT_ARCHIVE, ZFS_ARCHIVE Windows: FILE_ATTRIBUTE_ARCHIVE The UF_ARCHIVE flag means that the file has changed and needs to be archived. The meaning is the same as the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attributes. msdosfs and ZFS have special handling for this flag, i.e. they will set it when the file changes. sys/param.h: Bump __FreeBSD_version to 1000047 for the addition of new stat(2) flags. chflags.1: Document the new command line flag names (e.g. "system", "hidden") available to the user. ls.1: Reference chflags(1) for a list of file flags and their meanings. strtofflags.c: Implement the mapping between the new command line flag names and new stat(2) flags. chflags.2: Document all of the new stat(2) flags, and explain the intended behavior in a little more detail. Explain how they map to Windows file attributes. Different filesystems behave differently with respect to flags, so warn the application developer to take care when using them. zfs_vnops.c: Add support for getting and setting the UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN, UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags. All of these flags are implemented using attributes that ZFS already supports, so the on-disk format has not changed. ZFS currently doesn't allow setting the UF_REPARSE flag, and we don't really have the other infrastructure to support reparse points. 
msdosfs_denode.c, msdosfs_vnops.c: Add support for getting and setting UF_HIDDEN, UF_SYSTEM and UF_READONLY in MSDOSFS. It supported SF_ARCHIVED, but this has been changed to be UF_ARCHIVE, which has the same semantics as the DOS archive attribute instead of inverse semantics like SF_ARCHIVED. After discussion with Bruce Evans, change several things in the msdosfs behavior: Use UF_READONLY to indicate whether a file is writeable instead of file permissions, but don't actually enforce it. Refuse to change attributes on the root directory, because it is special in FAT filesystems, but allow most other attribute changes on directories. Don't set the archive attribute on a directory when its modification time is updated. Windows and DOS don't set the archive attribute in that scenario, so we are now bug-for-bug compatible. smbfs_node.c, smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM, UF_READONLY and UF_ARCHIVE in SMBFS. This is similar to changes that Apple has made in their version of SMBFS (as of smb-583.8, posted on opensource.apple.com), but not quite the same. We map SMB_FA_READONLY to UF_READONLY, because UF_READONLY is intended to match the semantics of the DOS readonly flag. The MacOS X code maps both UF_IMMUTABLE and SF_IMMUTABLE to SMB_FA_READONLY, but the immutable flags have stronger meaning than the DOS readonly bit. stat.h: Add definitions for UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY and UF_HIDDEN. The definition of UF_HIDDEN is the same as the MacOS X definition. Add commented-out definitions of UF_COMPRESSED and UF_TRACKED. They are defined in MacOS X (as of 10.8.2), but we do not implement them (yet). ufs_vnops.c: Add support for getting and setting UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY, UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS. Alphabetize the flags that are supported. These new flags are only stored, UFS does not take any action if the flag is set. 
Sponsored by: Spectra Logic Reviewed by: bde (earlier version)	2013-08-21 23:04:48 +00:00
Justin T. Gibbs	5119608387	Add kstat entries for ZFS compression statistics. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c: Add module lifetime functions to allocate and teardown state data. Report: - Compression attempts. - Buffers found to be empty. - Compression calls that are skipped because the data length is already less than or equal to the minimum block length. - Compression attempts that fail to yield a 12.5% compression ratio. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c: Add calls to the zio_compress.c module's init and fini functions. Sponsored by: Spectra Logic Corporation MFC after: 2 weeks	2013-08-21 19:40:43 +00:00
Justin T. Gibbs	439d30d121	Enhance the ZFS vdev layer to maintain both a logical and a physical minimum allocation size for devices. Use this information to automatically increase ZFS's minimum allocation size for new top-level vdevs to a value that more closely matches the optimum device allocation size. Use GEOM's stripesize attribute, if set, as the physical sector size of the GEOM. Calculate the minimum blocksize of each metaslab class. Use the calculated value instead of SPA_MINBLOCKSIZE (512b) when determining the likelihood of compression yielding a reduction in physical space usage. Report devices with sub-optimal block size configuration in "zpool status". Also properly fail attempts to attach devices with a logical block size greater than 8kB, since this will cause corruption to ZFS's label area. Sponsored by: Spectra Logic Corporation MFC after: 2 weeks Background ========== Many modern devices use physical allocation units that are much larger than the minimum logical allocation size accessible by external commands. Two prevalent examples of this are 512e disk drives (512b logical sector, 4K physical sector) and flash devices (512b logical sector, 4K or larger allocation block size, and 128k or larger erase block size). Operations that modify less than the physical sector size result in a costly read-modify-write or garbage collection sequence on these devices. Simply exporting the true physical sector of the device to ZFS would yield optimal performance, but has two serious drawbacks: 1) Existing pools created with devices that have different logical and physical block sizes, but were configured to use the logical block size (e.g. because the OS version used for pool construction reported the logical block size instead of the physical block size) will suddenly find that the vdev allocation size has increased. 
This can be easily tolerated for active members of the array, but ZFS would prevent replacement of a vdev with another identical device because it now appears that the smaller allocation size required by the pool is not supported by the new device. 2) The device's physical block size may be too large to be supported by ZFS. The optimal allocation size for the vdev may be quite large. For example, a RAID controller may export a vdev that requires read-modify-write cycles unless accessed using 64k aligned/sized requests. ZFS currently has an 8k minimum block size limit. Reporting both the logical and physical allocation sizes for vdevs solves these problems. A device may be used so long as the logical block size is compatible with the configuration. By comparing the logical and physical block sizes, new configurations can be optimized and administrators can be notified of any existing pools that are sub-optimal. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h: Add the SPA_MAXASHIFT constant. ZFS currently has a hard upper limit of 13 (8k) for ashift and this constant is used to both document and enforce this limit. sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h: Add the VDEV_AUX_ASHIFT_TOO_BIG error code. Add fields for exporting the configured, logical, and physical ashift to the vdev_stat_t structure. Add VDEV_STAT_VALID() macro which can be used to verify the presence of required vdev_stat_t fields in nvlist data. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c: Provide a SYSCTL_PROC handler for "max_auto_ashift". Since the limit is only referenced long after boot when a create operation occurs, there's no compelling need for it to be a boot time configurable tunable. This also allows the validation code for the max_auto_ashift value to be contained within the sysctl handler. Populate the new fields in the vdev_stat_t structure. Fail vdev opens if the vdev reports an ashift larger than SPA_MAXASHIFT. 
Propagate vdev_logical_ashift and vdev_physical_ashift between child and parent vdevs as is done for vdev_ashift. In vdev_open(), restore code that fails opens for devices where vdev_ashift grows. This can only happen now if the device's logical ashift grows, which means it really isn't safe to use the device. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c: Update the vdev_open() API so that both logical (what was just ashift before) and physical ashift are reported. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h: Add two new fields, vdev_physical_ashift and vdev_logical_ashift, to vdev_t. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c: Add vdev_ashift_optimize(). Call it anytime a new top-level vdev is allocated. cddl/contrib/opensolaris/cmd/zpool/zpool_main.c: Add text for the VDEV_AUX_ASHIFT_TOO_BIG error. For each sub-optimally configured leaf vdev, report configured and native block sizes. cddl/contrib/opensolaris/cmd/zpool/zpool_main.c: cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h: cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c: Introduce a new zpool status: ZPOOL_STATUS_NON_NATIVE_ASHIFT. This status is reported on healthy pools containing vdevs configured to use a block size smaller than their reported physical block size. 
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c: Update find_vdev_problem() and supporting functions to provide the full vdev_stat_t structure to problem checking routines, and to allow descent into replacing vdevs. Add a vdev_non_native_ashift() validator which is used on the full vdev tree to check for ZPOOL_STATUS_NON_NATIVE_ASHIFT. cddl/contrib/opensolaris/lib/libzpool/common/kernel.c: cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h: Enhance sysctl userland stubs now that a SYSCTL_PROC handler is used in vdev.c. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h: When the group membership of a metaslab class changes (i.e. when a vdev is added or removed from a pool), walk the group list to determine the smallest block size currently available and record this in the metaslab class. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c: Add the metaslab_class_get_minblocksize() accessor. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c: In zio_compress_data(), take the minimum blocksize as an input parameter instead of assuming SPA_MINBLOCKSIZE. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c: In l2arc_compress_buf(), pass SPA_MINBLOCKSIZE as the minimum blocksize of the device. The l2arc code has its own code for deciding if compression is worthwhile, so this effectively prevents zio_compress_data() from second-guessing the original decision. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c: In zio_write_bp_init(), use the minimum blocksize of the normal metaslab class when compressing data.	2013-08-21 04:10:24 +00:00
Xin LI	2640fb93f5	MFV r254421: Illumos ZFS issues: 3996 want a libzfs_core API to rollback to latest snapshot	2013-08-21 00:04:31 +00:00
Xin LI	c21d9cfe3d	MFV r254220: Illumos ZFS issues: 4039 zfs_rename()/zfs_link() needs stronger test for XDEV	2013-08-20 22:31:13 +00:00
Pawel Jakub Dawidek	2c40899ecc	Remove redundant variable.	2013-08-17 14:09:46 +00:00
Mark Johnston	12ede07ab8	Use kld_{load,unload} instead of mod_{load,unload} for the linker file load and unload event handlers added in r254266. Reported by: jhb X-MFC with: r254266	2013-08-14 00:42:21 +00:00
Mark Johnston	8776669b53	FreeBSD's DTrace implementation has a few problems with respect to handling probes declared in a kernel module when that module is unloaded. In particular, * Unloading a module with active SDT probes will cause a panic. [1] * A module's (FBT/SDT) probes aren't destroyed when the module is unloaded; trying to use them after the fact will generally cause a panic. This change fixes both problems by porting the DTrace module load/unload handlers from illumos and registering them with the corresponding EVENTHANDLER(9) handlers. This allows the DTrace framework to destroy all probes defined in a module when that module is unloaded, and to prevent a module unload from proceeding if some of its probes are active. The latter problem has already been fixed for FBT probes by checking lf->nenabled in kern_kldunload(), but moving the check into the DTrace framework generalizes it to all kernel providers and also fixes a race in the current implementation (since a probe may be activated between the check and the call to linker_file_unload()). Additionally, the SDT implementation has been reworked to define SDT providers/probes/argtypes in linker sets rather than using SYSINIT/SYSUNINIT to create and destroy SDT probes when a module is loaded or unloaded. This simplifies things quite a bit since it means that pretty much all of the SDT code can live in sdt.ko, and since it becomes easier to integrate SDT with the DTrace framework. Furthermore, this allows FreeBSD to be quite flexible in that SDT providers spanning multiple modules can be created on the fly when a module is loaded; at the moment it looks like illumos' SDT implementation requires all SDT probes to be statically defined in a single kernel table. PR: 166927, 166926, 166928 Reported by: davide [1] Reviewed by: avg, trociny (earlier version) MFC after: 1 month	2013-08-13 03:10:39 +00:00
Rui Paulo	e009490afc	fasttrap_fork(): unlock the processes before removing the tracepoints. In the future, we'll need to come up with new proc_*() functions that accept locked processes. For now, this prevents postgresql + DTrace from crashing the system. MFC after: 1 month	2013-08-11 00:57:01 +00:00
Attilio Rao	c7aebda8a1	The soft and hard busy mechanisms rely on the vm object lock to work. Unify the 2 concepts into a real, minimal, sxlock where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in either read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly share busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavily changed, which is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
Xin LI	43667c1f68	MFV r254079: Illumos ZFS issues: 3957 ztest should update the cachefile before killing itself 3958 multiple scans can lead to partial resilvering 3959 ddt entries are not always resilvered 3960 dsl_scan can skip over dedup-ed blocks if physical birth != logical birth 3961 freed gang blocks are not resilvered and can cause pool to suspend 3962 ztest should print out zfs debug buffer before exiting	2013-08-08 23:38:31 +00:00
Xin LI	9d2f243aa6	MFV r254071: Fix a regression introduced by fix for Illumos bug #3834. Quote from Matthew Ahrens on the Illumos issue: ztest fails this assertion because ztest_dmu_read_write() does dmu_tx_hold_free(tx, bigobj, bigoff, bigsize); and then dmu_object_set_checksum(os, bigobj, (enum zio_checksum)ztest_random_dsl_prop(ZFS_PROP_CHECKSUM), tx); If the region to free is past the end of the file, the DMU assumes that there will be nothing to do for this object. However, ztest does set_checksum(), which must modify the dnode. The fix is for ztest to also call dmu_tx_hold_bonus(tx, bigobj); so we can account for the dirty data associated with setting the checksum Illumos ZFS issues: 3955 ztest failure: assertion refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite	2013-08-07 22:21:00 +00:00
Xin LI	4f7b34578b	MFV r254070: Merge vendor bugfix for ZFS test suite that triggers false positives. Illumos ZFS issues: 3949 ztest fault injection should avoid resilvering devices 3950 ztest: deadman fires when we're doing a scan 3951 ztest hang when running dedup test 3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together	2013-08-07 21:16:14 +00:00
Xin LI	c668ff330e	MFV r254011: This change has no effect on FreeBSD but is integrated for completeness. Illumos ZFS issues: 348 ZFS should handle DKIOCGMEDIAINFOEXT failure	2013-08-06 21:36:01 +00:00
Alexander Motin	d9aca4ed74	Block reporting of ZFS features for suspended pools. Before executing any subcommand, the zpool tool fetches the pools' configuration from the kernel. Before features support was added, the kernel regenerated that configuration based on data always present in memory. Unfortunately, the pool features list and activity counters are not such data. They are stored in ZAP, which normally resides in ARC, but under heavy memory pressure may be swapped out. If the pool is suspended at this point, there is no way to recover it, since any zpool command will get stuck. This change has one predictable flaw: `zpool upgrade` always wishes to upgrade suspended pools, but fortunately it can't do so due to the suspension.	2013-08-06 14:41:41 +00:00
Alexander Motin	f8dcf872c4	Disable r252840 when ZFS TRIM is enabled (vfs.zfs.trim.enabled=1) and really disable TRIM otherwise. r252840 (illumos bug 3836) is based on assumption that zio_free_sync() has no lock dependencies and should complete immediately. Unfortunately, with our TRIM implementation that is not true due to ZIO_STAGE_VDEV_IO_START added to the ZIO_FREE_PIPELINE, which, while not really accessing devices, still acquires SCL_ZIO lock for read to be sure devices won't disappear. When TRIM is disabled, this patch enables direct free execution from r252840 and removes ZIO_STAGE_VDEV_IO_START and ZIO_STAGE_VDEV_IO_ASSESS stages from the pipeline to avoid lock acquisition. Otherwise it queues free request as it was before r252840.	2013-08-06 14:30:28 +00:00
Alexander Motin	526bb4af8a	Make `zpool clear` also reopen reconnected cache and spare devices. Since `zpool status` reports such kinds of errors, it is strange that they are not cleared by `zpool clear`.	2013-08-06 14:23:33 +00:00
Alexander Motin	ad727e8d64	Make ZFS use a separate thread to handle SPA_ASYNC_REMOVE async events. The existing async thread runs only on successful spa_sync() completion, which is impossible when a pool loses its required (last) disk(s). That indefinite delay of SPA_ASYNC_REMOVE processing kept ZFS from closing the lost disks, preventing GEOM/CAM from destroying devices and reusing names on later disk reattach. In an earlier version of the patch I tried to just run the existing thread immediately, independent of spa_sync() completion, but that exposed a number of situations where it could get stuck due to locks held by a stuck spa_sync() that are required for other kinds of async events. Experiments with an OpenIndiana snapshot confirmed that it also has this issue with reattaching lost disks.	2013-08-06 14:20:41 +00:00
Attilio Rao	be99683637	Revert r253939: We cannot busy a page before doing pagefaults. In fact, it can deadlock against the vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib	2013-08-05 08:55:35 +00:00
Attilio Rao	3b6714cacb	The page hold mechanism is fast but it has a couple of drawbacks: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of few pages Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there as the quick path cannot immediately access the page object to busy the page and the slow path cannot busy more than one page at a time (to avoid deadlocks). Fixing this primitive can lead to complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho	2013-08-04 21:07:24 +00:00
Steven Hartland	e44e975c1b	zfs_ioc_rename should not leave the value of zc_name passed in via zc altered on return. MFC after: 1 week	2013-08-04 11:38:08 +00:00
Xin LI	bd3d1456a5	MFV r253783: Skip eviction step of processing free records when doing ZFS receive to avoid the expensive search operation of non-existent dbufs in dn_dbufs. Illumos ZFS issues: 3834 incremental replication of 'holey' file systems is slow MFC after: 2 weeks	2013-07-30 21:35:02 +00:00
Xin LI	1c4ead73c6	MFV r253782: To quote Illumos issue #3888: When 'zfs recv -F' is used with an incremental recv it rolls back any changes made since the last snapshot in case new changes were made to the file system while the recv is in progress (without -F the recv would fail when it does it's final check to commit the recv-ed data as the recv-ed data conflicts with the newly written data). However, if there is a snapshot taken after the recv began rolling back to the 'latest' snapshot will not help and the recv will still fail. 'zfs recv -F' should be extended to destroy any snapshots created since the source snapshot when finishing the recv (effectively rolling back through all snapshots, instead of just to the latest snapshot). Illumos ZFS issues: 3888 zfs recv -F should destroy any snapshots created since the incremental source MFC after: 2 weeks	2013-07-30 21:20:12 +00:00
Xin LI	d637247e1f	MFV r253781 + r253871: Illumos ZFS issues: 3894 zfs should not allow snapshot of inconsistent dataset MFC after: 2 weeks	2013-07-30 21:02:09 +00:00
Xin LI	44e362e207	MFV r253780: To quote Illumos #3875: The problem here is that if we ever end up in the error path, we drop the locks protecting access to the zfsvfs_t prior to forcibly unmounting the filesystem. Because z_os is NULL, any thread that had already picked up the zfsvfs_t and was sitting in ZFS_ENTER() when we dropped our locks in zfs_resume_fs() will now acquire the lock, attempt to use z_os, and panic. Illumos ZFS issues: 3875 panic in zfs_root() after failed rollback MFC after: 2 weeks	2013-07-30 20:37:32 +00:00
Alexander Motin	ec4d2e0d96	Allow three IOCTLs to be used on a suspended pool, restoring the state that existed before the IOCTL code refactoring merged change 4445fffb from illumos at r248571. This change allows `zpool clear` to be used again to recover a suspended pool. It seems to be the only way supported by the code to restore pool operation after reconnecting lost disks that were required for data completeness. There are still cases where the `zpool clear` command can just get stuck due to deadlocks inside the ZFS kernel part, but that is probably better than having no chance to recover at all.	2013-07-30 14:50:44 +00:00
Alexander Motin	698cd997d6	Partially close a race between calls of the orphan() method from GEOM and the close() method from ZFS core, which reliably causes a use-after-free panic if an SSD vdev is detached during initial erase.	2013-07-28 20:07:34 +00:00
Alexander Motin	ffacde9be5	Following r222950, revert unintentional change cls -> class in argument name in r245264. Aside from non-uniformity, that again confused C++ compilers.	2013-07-25 08:41:22 +00:00
Andriy Gapon	f66c1f6482	zfs module: perform cleanup during shutdown in addition to module unload - move init and fini code into separate functions (like it is done upstream) - invoke fini code via shutdown_post_sync event hook This should make zfs close its underlying devices during shutdown, which may be important for their drivers. MFC after: 20 days	2013-07-24 09:59:16 +00:00
Andriy Gapon	886dbd270f	zfs: move vnode creation from zfs_znode_cache_constructor to zfs_znode_alloc All other places where a znode is allocated do not need z_vnode at all. These are: - zfs_create_share_dir - zfs_create_fs This change ensures two things: - VN_LOCK_ASHARE is not erroneously called for VFIFO vnodes - vn_lock is called on a fully constructed vnode with correct v_ops The change also allows making zfs_znode_cache_constructor a normal kmem_cache constructor again (as it is in upstream). This avoids a problem where zfs_znode_cache_destructor may be called on un-constructed znodes. MFC after: 17 days	2013-07-24 09:15:59 +00:00
Xin LI	c92bc5e996	Manually merge part of vendor import r238583 from Illumos. Illumos changeset: 13680:2bd022a765e2 Illumos ZFS issue: 2671 zpool import should not fail if vdev ashift has increased MFC after: 3 days	2013-07-18 00:22:42 +00:00
Andriy Gapon	37b8b2d4d8	dtrace/fasttrap: install hook functions only after all data is initialized Sponsored by: HybridCluster MFC after: 7 days	2013-07-09 09:05:00 +00:00
Andriy Gapon	9c1f50af0a	zfs: try to properly handle i/o errors in mappedread_sf Unconditionally freeing a page is not good, especially if it is the page that was wired by the caller. The checks are picked up from kern_sendfile. MFC after: 3 weeks	2013-07-09 08:47:11 +00:00
Andriy Gapon	78ed7a7855	zfs: load zpool.cache after a root fs is mounted MFC after: 3 weeks	2013-07-09 08:37:42 +00:00
Mark Johnston	46d27dbb38	Hide references to mod_lock. In FreeBSD it is always acquired with the provider lock held, so its use has no effect.	2013-07-05 22:42:10 +00:00
Martin Matuska	12df7d65b0	MFV r252839: Quoting illumos issue #3836: Currently zio_free() always puts the zio on a list for subsequent processing by zio_free_sync(). This is only necessary for frees that might need to issue reads (gang and dedup blocks). By processing the majority of the frees as we encounter them, we reduce the amount of time that the spa_sync() thread spends burning CPU and not doing any i/o, thus increasing the overall write throughput of the system. Illumos ZFS issues: 3836 zio_free() can be processed immediately in the common case MFC after: 1 week	2013-07-05 21:29:59 +00:00
Mark Johnston	0022f867b4	Be sure to destroy the fasttrap cleanup mutex when unloading the fasttrap module. This should be MFCed with r250953.	2013-07-01 23:12:59 +00:00
Robert Millan	2592710c47	Enable kernel-specific code for FreeBSD also on other systems that use the kernel of FreeBSD. Reviewed by: pjd	2013-06-30 23:14:55 +00:00
Steven Hartland	baa0b41221	Remove invalid ASSERT which causes a panic on zfs renames when run with ASSERTS. Removal was missed in merge of illumos 3464 (r248571) MFC after: 2 days	2013-06-29 23:15:45 +00:00
Martin Matuska	f82ca5238a	Unbreak "zfs jail" and "zfs unjail" (broken since r248571) I failed to register zfs_ioc_jail and zfs_ioc_unjail as legacy ioctls with the new zfs_ioctl_register_legacy() function. These operations do not modify pools or datasets so there is no need to log them to pool history. Reported by: Alexander Leidinger <ale@FreeBSD.org> and others on current@ MFC after: 3 days	2013-06-29 16:45:37 +00:00
Gavin Atkinson	af582854d8	Don't try to re-insert an already present but invalid page. This could happen if a thread doing a page-in loses a ZFS range lock race to a thread writing to the same range. This fixes "panic: vm_page_alloc: pindex already allocated" in http://docs.FreeBSD.org/cgi/mid.cgi?1372165971.96049.42.camel Submitted by: avg MFC after: 1 week	2013-06-28 07:51:12 +00:00
Xin LI	e33806a54a	MFV r252215: Restore a previous behavior before r251646, where when destructing ZFS snapshot, the ioctl would return ENOENT when it hit any of them in the errlist (the new behavior was only return ENOENT when all returns error). Illumos ZFS issues: 3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl MFC after: 1 week	2013-06-25 22:14:32 +00:00
Steven Hartland	9446debe6b	Fix intermittent ZFS lock panic when kernel is compiled with debugging caused by access of uninitialized smlock in mmutex_init. MFC after: 1 week	2013-06-21 15:47:10 +00:00
Steven Hartland	5f921c5911	Fixed import of destroyed ZFS pools failing due to vdev_geom incorrectly preventing config loads from devices associated with destroyed pools. Reviewed by: avg MFC after: 1 week	2013-06-21 12:02:09 +00:00
Xin LI	9625321547	MFV r251644: Poor ZFS send / receive performance due to snapshot hold / release processing (by smh@) Illumos ZFS issues: 3740 Poor ZFS send / receive performance due to snapshot hold / release processing MFC after: 2 weeks	2013-06-12 07:07:06 +00:00
Xin LI	ed8fd1989f	MFV r251626: ZFS event processing should work on R/O root filesystems Illumos ZFS issues: 3749 zfs event processing should work on R/O root filesystems MFC after: 2 weeks	2013-06-11 19:35:44 +00:00
Xin LI	3b245f3ee1	MFV r251624: txg commit callbacks don't work Illumos ZFS issues: 3747 txg commit callbacks don't work MFC after: 2 weeks	2013-06-11 19:29:31 +00:00
Xin LI	3f3a9cac29	MFV r251622: ZFS shouldn't ignore errors unmounting snapshots Illumos ZFS issues: 3744 zfs shouldn't ignore errors unmounting snapshots MFC after: 2 weeks	2013-06-11 19:22:20 +00:00
Xin LI	57e06a1a63	MFV r251621: ZFS needs a refcount audit Illumos ZFS issues: 3741 zfs needs a refcount audit MFC after: 2 weeks	2013-06-11 19:16:14 +00:00
Xin LI	a91afe8a8d	MFV r251620: ZFS comments need cleaner, more consistent style Illumos ZFS issues: 3741 zfs comments need cleaner, more consistent style MFC after: 2 weeks	2013-06-11 19:12:06 +00:00
Xin LI	4acaabea05	MFV r251619: ZFS needs better comments. Illumos ZFS issues: 3741 zfs needs better comments MFC after: 2 weeks	2013-06-11 19:02:36 +00:00
Xin LI	9e43a32a5c	MFV r251519: * Illumos ZFS issue #3805 arc shouldn't cache freed blocks Quote from the Illumos issue: ZFS should proactively evict freed blocks from the cache. Even though these freed blocks will never be used again, and thus will eventually be evicted, this causes us to use memory inefficiently for 2 reasons: 1. A block that is freed has no chance of being accessed again, but will be kept in memory preferentially to a block that was accessed before it (and is thus older) but has not been freed and thus has at least some chance of being accessed again. 2. We partition the ARC into several buckets: user data that has been accessed only once (MRU) metadata that has been accessed only once (MRU) user data that has been accessed more than once (MFU) metadata that has been accessed more than once (MFU) The user data vs metadata split is somewhat arbitrary, and the primary control on how much memory is used to cache data vs metadata is to simply try to keep the proportion the same as it has been in the past (each bucket "evicts against" itself). The secondary control is to evict data before evicting metadata. Because of this bucketing, we may end up with one bucket mostly containing freed blocks that are very old, while another bucket has more recently accessed, still-allocated blocks. Data in the useful bucket (with still-allocated blocks) may be evicted in preference to data in the useless bucket (with old, freed blocks). On dcenter, we saw that the MFU metadata bucket was 230MB, while the MFU data bucket was 27GB and the MRU metadata bucket was 256GB. However, the vast majority of data in the MRU metadata bucket (256GB) was freed blocks, and thus useless. Meanwhile, the MFU metadata bucket (230MB) was constantly evicting useful blocks that will be soon needed. The problem of cache segmentation is a larger problem that needs more investigation. However, if we stop caching freed blocks, it should reduce the impact of this more fundamental issue. 
MFC after: 2 weeks	2013-06-08 09:11:20 +00:00
Xin LI	ca8a27d4b1	MFV r251474: * Illumos zfs issue #3137 L2ARC compression Whether or not to compress buffers entering the L2ARC is controlled by "compression" setting on the dataset, when compression is not "off", L2ARC compression is enabled. The compress method is always LZ4 for L2ARC when enabled because it works best for the scenario. MFC after: 2 weeks	2013-06-06 23:21:41 +00:00
Mark Johnston	427bc75e19	The fasttrap provider cleans up probes asynchronously when a process with USDT probes exits. This was previously done with a callout; however, it is possible to sleep while holding the DTrace mutexes, so a panic will occur on INVARIANTS kernels if the callout handler can't immediately acquire one of these mutexes. This panic will be frequently triggered on systems where a USDT-enabled program (perl, for instance) is often run. This revision changes the fasttrap cleanup mechanism so that a dedicated thread is used instead of a callout. The old behaviour is otherwise preserved. Reviewed by: rpaulo MFC after: 1 month	2013-05-24 03:29:32 +00:00
Mark Johnston	09e6105ff4	Bring back part of r249367 by adding DTrace's temporal option, which allows users to guarantee that the output of DTrace scripts will be time-ordered. This option is enabled by adding the line #pragma D option temporal to the beginning of a script, or by adding '-x temporal' to the arguments of dtrace(1). This change fixes a bug in the original port of the temporal option. This bug was causing some assertions to fail, so they had been disabled; in this revision the assertions are working properly and are enabled. The DTrace version number has been bumped from 1.9.0 to 1.9.1 to reflect the language change that's being introduced. This change corresponds to part of illumos-gate commit e5803b76927480: 3021 option for time-ordered output from dtrace(1M) Reviewed by: pfg Obtained from: illumos MFC after: 1 month	2013-05-12 16:26:33 +00:00
Davide Italiano	a28d25df31	In case ZFS doesn't use UMA for buffers there's no need to waste memory creating zones that will remain empty. Reviewed by: pjd	2013-05-01 17:34:44 +00:00
Steven Hartland	562a9d583b	Changed ZFS TRIM sysctl from vfs.zfs.trim_disable -> vfs.zfs.trim.enabled Enabled ZFS TRIM by default Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-04-26 11:24:20 +00:00
Martin Matuska	95b6497f5e	MFV r249857: Merge vendor bugfix for a possible deadlock related to async destroy and improve write performance by introducing a new lock protecting tx_open_txg. Illumos ZFS issues: 3642 dsl_scan_active() should not issue I/O to determine if async destroying is active 3643 txg_delay should not hold the tc_lock MFC after: 1 week	2013-04-24 21:21:03 +00:00
Martin Matuska	90eafc0bb8	The zfs synctask code restructuring introduced a new bug that makes it impossible to set quota and reservation on pools lower than version 22. Problem has been reported and a solution discussed with vendor. Illumos ZFS issues: 3739 cannot set zfs quota or reservation on pool version < 22 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reported by: Steve Wills <swills@FreeBSD.org> MFC after: 3 days	2013-04-23 06:28:35 +00:00
Pedro F. Giffuni	03836978be	DTrace: Revert r249367 The following change from illumos caused DTrace to pause in an interactive environment: 3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider This was not detected during testing because it doesn't affect scripts. We shouldn't be changing the environment, especially since the LD_NOLAZYLOAD option doesn't apply to our (GNU) ld. Unfortunately the change from upstream was made in such a way that it is very difficult to separate this change from the others so, at least for now, it's better to just revert everything. Reference: https://www.illumos.org/issues/3026 Reported by: Navdeep Parhar and Mark Johnston	2013-04-17 02:20:17 +00:00
Pedro F. Giffuni	ddd5b8e9b4	DTrace: option for time-ordered output Merge changes from illumos: 3021 option for time-ordered output from dtrace(1M) 3022 DTrace: keys should not affect the sort order when sorting by value 3023 it should be possible to dereference dynamic variables 3024 D integer narrowing needs some work 3025 register leak in D code generation 3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider This brings yet another feature implemented in upstream DTrace. A complete description is available here: http://dtrace.org/blogs/ahl/2012/07/28/my-new-dtrace-favorite/ This change bumps the DT_VERS_* number to 1.9.1 in accordance to what is done in illumos. This change was somewhat complicated because upstream mixed many changes in a single commit and some of the tests don't really apply to us. There also appear to be differences in timestamping with Solaris so we had to work around some assertions making sure no regression happened. Special thanks to Fabian Keil for changes and testing. Illumos Revisions: 13758:23432da34147 Reference: https://www.illumos.org/issues/3021 https://www.illumos.org/issues/3022 https://www.illumos.org/issues/3023 https://www.illumos.org/issues/3024 https://www.illumos.org/issues/3025 https://www.illumos.org/issues/1694 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-04-11 16:24:36 +00:00
Martin Matuska	2cb0c5e424	MFV r249354: Merge bugfixes accepted and integrated by vendor. Underlying problems have been reported by us and fixed in r240942 and r249196. Illumos ZFS issues: 3645 dmu_send_impl: possibilty of pool hold leak 3692 Panic on zfs receive of a recursive deduplicated stream MFC after: 8 days	2013-04-11 07:40:30 +00:00
Martin Matuska	86161c3eeb	Cast to (void *)(uintptr_t) on copyout and copyin of zfs_iocparm_t.zfs_cmd MFC after: 9 days	2013-04-10 07:01:17 +00:00
Martin Matuska	83b4af1142	ZFS expects a copyout of zfs_cmd_t on an ioctl error. Our sys_ioctl() doesn't copyout in this case. To solve this issue a new struct zfs_iocparm_t is introduced consisting of: - zfs_ioctl_version (future backwards compatibility purposes) - user space pointer to zfs_cmd_t (copyin and copyout) - size of zfs_cmd_t (verification purposes) The copyin and copyout of zfs_cmd_t is now done the illumos (vendor) way, which makes porting of new changes easier and ensures correct behavior when returning an error. MFC after: 10 days	2013-04-09 22:27:44 +00:00
Martin Matuska	a548fef5dc	MFV r249186: Do not list read-only pools in zpool.cache Reduce diff against vendor in unused vdev_disk.c Illumos ZFS issues: 3639 zpool.cache should skip over readonly pools 3640 want automatic devid updates MFC after: 1 week	2013-04-06 17:24:00 +00:00
Martin Matuska	95e7edacfe	MFV r248660: Merge vendor change - modify time processing in deadman thread. Illumos ZFS issues: 3618 ::zio dcmd does not show timestamp data MFC after: 3 weeks	2013-04-06 17:15:47 +00:00
Martin Matuska	367437755d	Provide a fix for kernel panic if receiving recursive deduplicated streams. Problem reported to vendor. Illumos ZFS issues: 3692 Panic on zfs receive of a recursive deduplicated stream MFC after: 2 weeks	2013-04-06 11:54:41 +00:00
Martin Matuska	f1b5c26470	MFV r248217: Merge change from vendor to reduce diff only. ZFS dtrace probes are not supported on FreeBSD yet. Illumos ZFS issues: 3598 want to dtrace when errors are generated in zfs MFC after: 3 weeks	2013-04-06 10:39:38 +00:00
Martin Matuska	bae7bccf39	MFV r242816: Import vendor change to reduce diff, no effect on FreeBSD. Illumos ZFS issues: 3517 importing pool with autoreplace=on and "hole" vdevs crashes syseventd	2013-04-06 08:21:37 +00:00
Andriy Gapon	9ff9b984c9	spa_open_common: fix argument to zvol_create_minors Prior to r248571 spa_open was always called with a bare pool name, but now it is called with a dataset name instead (spa_lookup handles that). So, when a ZFS root is mounted spa_open is called with a name of a root dataset, which can very well be different from the pool name. But zvol_create_minors should be called with the pool name, because it performs a recursive traversal of all datasets under the name to find all those that are volumes. MFC after: 7 days	2013-04-03 11:06:26 +00:00
Martin Matuska	03863a70e1	Fix possible pool hold leak in dmu_send_impl() Problem reported to vendor: https://www.illumos.org/issues/3645 Reported by: Andriy Gapon <avg@FreeBSD.org> MFC after: 15 days	2013-04-03 09:52:30 +00:00
Martin Matuska	41451f4a0e	Do not check against uninitialized rc and comment out vendor code MFC after: 16 days	2013-04-02 08:15:39 +00:00
Pedro F. Giffuni	9f4c7ba460	Dtrace: enablings on defunct providers prevent providers from unregistering Merge change from illumos: 1368 enablings on defunct providers prevent providers from unregistering We try to address some underlying differences between the Solaris and FreeBSD implementations: dtrace_attach() / dtrace_detach() are currently unimplemented in FreeBSD but the new code from illumos makes use of taskq so some adaptations were made to dtrace_open() and dtrace_close() to handle them appropriately. Illumos Revision: r13430:8e6add739e38 Reference: https://www.illumos.org/issues/1368 Reviewed by: gnn Tested by: Fabian Keil Obtained from: Illumos MFC after: 3 weeks	2013-04-01 19:13:46 +00:00
Martin Matuska	20547d41f8	Call dmu_snapshot_list_next() in zvol.c with dsl_pool_config lock held Submitted by: Andriy Gapon <avg@FreeBSD.org> MFC after: 17 days	2013-04-01 16:14:57 +00:00
Pedro F. Giffuni	f5678b698a	Dtrace: dtrace.c erroneously checks for memory alignment on amd64. Merge change from illumos: 3511 dtrace.c erroneously checks for memory alignment on amd64 Illumos Revision: c93cc65 Reference: https://www.illumos.org/issues/3511 Obtained from: Illumos MFC after: 3 weeks	2013-03-26 20:17:08 +00:00
Pedro F. Giffuni	5472787377	Dtrace: Add SUN MDB-like type-aware print() action. Merge change from illumos: 1694 Add type-aware print() action This is a very nice feature implemented in upstream Dtrace. A complete description is available here: http://dtrace.org/blogs/eschrock/2011/10/26/your-mdb-fell-into-my-dtrace/ This change bumps the DT_VERS_* number to 1.9.0 in accordance to what is done in illumos. While here also include some minor cleanups to ease further merging and appease clang with a fix by Fabian Keil. Illumos Revisions: 13501:c3a7090dbc16 13483:f413e6c5d297 Reference: https://www.illumos.org/issues/1560 https://www.illumos.org/issues/1694 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-03-25 20:38:09 +00:00
Pedro F. Giffuni	730cecb05a	Dtrace: add toupper()/tolower() and enhancements to lltostr(). Merge changes from illumos: 1451 DTrace needs toupper()/tolower() subroutines 1457 lltostr() D subroutine should take an optional base This change bumps the DT_VERS_* number to 1.8.1 in accordance to what is done in illumos. The test suite we currently include is outdated and doesn't support some updates in tst.subr.d which had to be left out for now. Illumos Revisions: r13458 5e394d8db762 r13459 c3454574dd1a Reference: https://www.illumos.org/issues/1451 https://www.illumos.org/issues/1457 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-03-25 15:40:57 +00:00
Pedro F. Giffuni	f2e66d30b8	Dtrace: add optional size argument to tracemem(). Merge change from illumos: 1455 DTrace tracemem() should take an optional size argument Our local enhancements to dt_print_bytes were equivalent to those in illumos but we made it match the illumos version to ease further code merges. For now leave out tst.smallsize.d and tst.smallsize.d.out since those don't seem to work cleanly on FreeBSD. This change bumps the DT_VERS_* number to 1.7.1 in accordance to what is done in illumos. Illumos Revision: 13457:571b0355c2e3 Reference: https://www.illumos.org/issues/1455 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-03-24 19:12:08 +00:00
Will Andrews	58567a1b4e	ZFS: Fix a panic while unmounting a busy filesystem. This particular scenario was easily reproduced using a NFS export. When the first 'zfs unmount' occurred, it returned EBUSY via this path, while vflush() had flushed references on the filesystem's root vnode, which in turn caused its v_interlock to be destroyed. The next time 'zfs unmount' was called, vflush() tried to obtain this lock, which caused this panic. Since vflush() on FreeBSD is a definitive call, there is no need to check vfsp->vfs_count after it completes. Simply #ifdef sun this check. Submitted by: avg Reviewed by: avg Approved by: ken (mentor) MFC after: 1 month	2013-03-23 16:34:56 +00:00
Steven Hartland	def84b9736	Fix for building libzpool under i386. Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 23:06:11 +00:00
Steven Hartland	2b114ad2a4	Add missing descriptions for ZFS sysctls Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 11:25:21 +00:00
Steven Hartland	adea827b21	Optimisation of TRIM processing. Previously TRIM processing was very bursty. This was made worse by the fact that TRIM requests on SSDs are typically much slower than reads or writes. This often resulted in stalls while large numbers of TRIMs were processed. In addition due to the way the TRIM thread was only woken by writes, deletes could stall in the queue for extensive periods of time. This patch adds a number of controls to how often the TRIM thread for each SPA processes its outstanding delete requests. vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32) vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing (seconds) Given the most common TRIM implementation is ATA TRIM the current defaults are targeted at that. Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 11:02:08 +00:00
Steven Hartland	6ad46cec23	Names the ZFS TRIM thread Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 10:41:30 +00:00
Steven Hartland	89e5b43079	TRIM cache devices based on time instead of TXGs. Currently, the trim module uses the same algorithm for data and cache devices when deciding to issue TRIM requests, based on how far in the past the TXG is. Unfortunately, this is not ideal for cache devices, because the L2ARC doesn't use the concept of TXGs at all. In fact, when using a pool for reading only, the L2ARC is written but the TXG counter doesn't increase, and so no new TRIM requests are issued to the cache device. This patch fixes the issue by using time instead of the TXG number as the criterion for trimming on cache devices. The basic delay principle stays the same, but parameters are expressed in seconds instead of TXGs. The new parameters are named trim_l2arc_limit and trim_l2arc_batch, and both default to 30 seconds. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `17122c31ac` MFC after: 2 weeks	2013-03-21 10:29:05 +00:00
Steven Hartland	78ad0c1c80	Improve TXG handling in the TRIM module. This patch adds some improvements to the way the trim module considers TXGs: - Free ZIOs are registered with the TXG from the ZIO itself, not the current SPA syncing TXG (which may be out of date); - L2ARC are registered with a zero TXG number, as L2ARC has no concept of TXGs; - The TXG limit for issuing TRIMs is now computed from the last synced TXG, not the currently syncing TXG. Indeed, under extremely unlikely race conditions, there is a risk we could trim blocks which have been freed in a TXG that has not finished syncing, resulting in potential data corruption in case of a crash. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `5b46ad40d9` MFC after: 2 weeks	2013-03-21 10:16:10 +00:00
Steven Hartland	e07e3a3792	Don't register repair writes in the trim map. The trim map inflight writes tree assumes non-conflicting writes, i.e. that there will never be two simultaneous write I/Os to the same range on the same vdev. This seemed like a sane assumption; however, in actual testing, it appears that repair I/Os can very well conflict with "normal" writes. I'm not quite sure if these conflicting writes are supposed to happen or not, but in the mean time, let's ignore repair writes for now. This should be safe considering that, by definition, we never repair blocks that are freed. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: Source: `6a3cebaf7c`	2013-03-21 10:02:32 +00:00
Steven Hartland	e05aad2d33	Add TRIM support for L2ARC. This adds TRIM support to cache vdevs. When ARC buffers are removed from the L2ARC in arc_hdr_destroy(), arc_release() or l2arc_evict(), the size previously occupied by the buffer gets scheduled for TRIMming. As always, actual TRIMs are only issued to the L2ARC after txg_trim_limit. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `31aae37399` MFC after: 2 weeks	2013-03-21 09:34:41 +00:00
Martin Matuska	192d547574	Release hold on pool before calling zvol_create_minor()	2013-03-20 09:56:20 +00:00
Martin Matuska	a0abc0d302	Run zvol_create_minors() only if in non-error case	2013-03-19 22:27:15 +00:00
Martin Matuska	e56718d734	Run zvol_create_minors() on snapshot creation	2013-03-19 22:14:50 +00:00
Martin Matuska	07091d8f14	MFV r247580: Merge synctask code restructuring from vendor. Modify forward and backward compatibility to support new change. Illumos ZFS issues: 3464 zfs synctask code needs restructuring Sponsored by: Hybrid Logic Ltd.	2013-03-19 12:51:18 +00:00
Martin Matuska	520268fb97	MFC @248493	2013-03-19 11:09:15 +00:00
Martin Matuska	87a5cb4650	Plug memory leak in dsl_check_snap_cb() This was unnoticed because the function is very rarely used. MFC after: 3 days	2013-03-19 07:47:51 +00:00
Martin Matuska	a602517b63	Add missing zvol_create_minors() on zfs_ioc_create()	2013-03-18 20:22:40 +00:00
Martin Matuska	876a84e867	MFC @248461	2013-03-18 09:39:51 +00:00
Martin Matuska	6f4accc2de	Move common zfs ioctl compatibility functions (userland) into libzfs_compat.c Introduce additional constants for zfs ioctl versions	2013-03-18 09:32:29 +00:00
Justin Hibbits	80a5635c8b	Add FBT for PowerPC DTrace. Also, clean up the DTrace assembly code, much of which is not necessary for PowerPC. The FBT module can likely be factored into 3 separate files: common, intel, and powerpc, rather than duplicating most of the code between the x86 and PowerPC flavors. All DTrace modules for PowerPC will be MFC'd together once Fasttrap is completed.	2013-03-18 05:30:18 +00:00
Martin Matuska	af2e40ccd1	Merge libzfs_core part of r239388 Illumos ZFS issues: 3085 zfs diff panics, then panics in a loop on booting References: https://www.illumos.org/issues/3085	2013-03-17 18:49:11 +00:00
Martin Matuska	70b0720877	Fix accidentally changed ioc variable for old v15 compatibility	2013-03-17 17:28:06 +00:00
Martin Matuska	6cf922c88b	Fix typo in sysctl description Reported by: Jeremy Chadwick MFC after: 3 days	2013-03-17 15:53:27 +00:00
Martin Matuska	e2b4467975	libzfs_core: - provide complete backwards compatibility (old utility, new kernel) - add zfs_cmd_t compatibility mapping in both directions - determine ioctl address in zfs_ioctl_compat.c	2013-03-17 10:57:04 +00:00
Martin Matuska	4f33cfb284	Initialize "error" variable where illumos does.	2013-03-16 20:28:38 +00:00
Martin Matuska	a03fbc7ecf	MFC @248093	2013-03-09 11:57:51 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but a few notes are reported: * The KPI changes as follows: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. For this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Martin Matuska	91edf37414	Comment out unfeasible illumos copyin code and restore previous behavior.	2013-03-07 23:45:16 +00:00
Martin Matuska	b3b6851c78	Add missing init functions Reduce diff to illumos	2013-03-06 11:33:25 +00:00
Xin LI	2f79ac7f21	Diff reduction with Illumos	2013-03-06 01:21:56 +00:00
Xin LI	227d24fc59	Use adx2 instead of adx in the second vsprintf, this fixes a panic.	2013-03-05 22:58:53 +00:00
Martin Matuska	400c4069a5	MFV r247845: Import ZFS bpobj bugfix from vendor. Illumos ZFS issues: 3603 panic from bpobj_enqueue_subobj() 3604 zdb should print bpobjs more verbosely References: https://www.illumos.org/issues/3603 https://www.illumos.org/issues/3604 MFC after: 1 week	2013-03-05 18:54:41 +00:00
Martin Matuska	dce1a726f2	WiP merge of libzfs_core (MFV r238590, r238592) not yet working, ioctl handling needs to be changed	2013-03-05 08:09:53 +00:00
Justin T. Gibbs	7e2a739f03	Fix assertion failure when using userland DTrace probes from the pid provider on a kernel compiled with INVARIANTS. sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c: In fasttrap_probe_pid(), attempts to write to the address space of the thread that fired the probe must be performed with the process of the thread held. Use _PHOLD() to ensure this is the case. In fasttrap_probe_pid(), use proc_write_regs() instead of calling set_regs() directly. proc_write_regs() performs invariant checks to verify the calling environment of set_regs(). PROC_LOCK()/UNLOCK() around the call to proc_write_regs() so that its invariants are satisfied. Sponsored by: Spectra Logic Corporation Reviewed by: gnn, rpaulo MFC after: 1 week	2013-03-04 22:07:36 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavily modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object.
- Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNOD to CAP_MKNODAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required).
Added convenient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
Xin LI	5c737b11df	MFV r247575: Import a fix to tighten an assertion on SPA versions from vendor (Illumos). Illumos ZFS issue: 3543 Feature flags causes assertion in spa.c to miss certain cases MFC after: 2 weeks	2013-03-01 22:20:13 +00:00
Martin Matuska	4abd59512a	MFV r247316: Merge new read-only zfs properties from vendor (illumos) Illumos ZFS issues: 3588 provide zfs properties for logical (uncompressed) space used and referenced References: https://www.illumos.org/issues/3588 MFC after: 2 weeks	2013-03-01 21:58:51 +00:00
Martin Matuska	bb508e7732	Fix the zfs_ioctl compat layer to support zfs_cmd size change introduced in r247265 (ZFS deadman thread). Both new utilities now support the old kernel and new kernel properly detects old utilities. For future backwards compatibility, the vfs.zfs.version.ioctl read-only sysctl has been introduced. With this sysctl zfs utilities will be able to detect the ioctl interface version of the currently loaded zfs module. As a side effect, the zfs utilities between r247265 and this revision don't support the old kernel module. If you are using HEAD newer or equal than r247265, install the new kernel module (or whole kernel) first. MFC after: 10 days	2013-03-01 09:42:58 +00:00
Martin Matuska	24245e76ea	MFV 247176, 247178, 247315: Import metaslab_sync() speedup from vendor (illumos). Illumos ZFS issues: 3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread 3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not condensing) 3578 transferring the freed map to the defer map should be constant time 3579 ztest trips assertion in metaslab_weight() References: https://www.illumos.org/issues/3552 https://www.illumos.org/issues/3564 https://www.illumos.org/issues/3578 https://www.illumos.org/issues/3579 MFC after: 2 weeks	2013-02-27 14:45:23 +00:00
Martin Matuska	e4428d63a8	Be more verbose on ZFS deadman I/O panic Patch suggested upstream. Suggested by: Olivier Cinquin MFC after: 12 days	2013-02-26 20:41:27 +00:00
Martin Matuska	e70664bafc	MFV v242732: Merge the ZFS I/O deadman thread from vendor (illumos). This feature panics the system when ZFS I/O hangs, which helps debugging and restores failed service. The panic behavior can be controlled with the loader-only tunables: vfs.zfs.deadman_enabled (enable or disable panic on stalled ZFS I/O) vfs.zfs.deadman_synctime (expiration time for stalled ZFS I/O) By default, the ZFS I/O deadman is enabled on amd64 and i386, excluding virtual guest machines. Illumos ZFS issues: 3246 ZFS I/O deadman thread References: https://www.illumos.org/issues/3246 MFC after: 2 weeks	2013-02-25 12:33:31 +00:00
Martin Matuska	781c0f87d3	MFV r246653: Import vendor change to avoid "uninitialized variable" warnings. Illumos ZFS issues: 3522 zfs module should not allow uninitialized variables References: https://www.illumos.org/issues/3522	2013-02-23 11:21:05 +00:00
Justin T. Gibbs	6a8f90edf5	Avoid panic when tearing down the DTrace pid provider for a process that has crashed. sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c: In fasttrap_pid_disable(), we cannot PHOLD the proc structure for a process that no longer exists, but we still have other, fasttrap specific, state that must be cleaned up for probes that existed in the dead process. Instead of returning early if the process related to our probes isn't found, conditionalize the locking and carry on with a NULL proc pointer. The rest of the fasttrap code already understands that a NULL proc is possible and does the right things in this case. Sponsored by: Spectra Logic Corporation Reviewed by: rpaulo, gnn MFC after: 1 week	2013-02-20 17:55:17 +00:00
Xin LI	e469d5a70f	Eliminate real_LZ4_uncompress. It's unused and does not perform sufficient check against input stream (i.e. it could read beyond specified input buffer).	2013-02-14 21:02:18 +00:00
Martin Matuska	7c695febc9	Change vfs.zfs.write_to_degraded from CTLFLAG_RW to CTLFLAG_RWTUN Suggested by: pjd	2013-02-13 23:11:25 +00:00
Xin LI	314caba11c	Restore De Bruijn algorithm for sparc64 where the compiler relies on a library function for __builtin_c?z. Tested by: Michael Moll <kvedulv kvedulv de>	2013-02-13 17:30:54 +00:00
Martin Matuska	6a33bbc041	Merge zfs_ioctl.c code that should have been merged together with ZFS v28. Fixes several problems if working with read-only pools. Changed code originally introduced in onnv-gate 13061:bda0decf867b Contains changes up to illumos-gate 13700:4bc0783f6064 PR: kern/175897 Suggested by: avg MFC after: 2 weeks	2013-02-11 21:10:55 +00:00
Martin Matuska	9689178c3f	MFV r246633: Import vendor bugfixes regarding SA rounding, header size and layout. This was already partially fixed by avg. Illumos ZFS issues: 3512 rounding discrepancy in sa_find_sizes() 3513 mismatch between SA header size and layout References: https://www.illumos.org/issues/3512 https://www.illumos.org/issues/3513 MFC after: 2 weeks	2013-02-11 14:29:38 +00:00
Martin Matuska	bb03847418	MFV r246394: Add tunable to allow block allocation on degraded vdevs. Illumos ZFS issues: 3507 Tunable to allow block allocation even on degraded vdevs References: https://www.illumos.org/issues/3507 MFC after: 2 weeks	2013-02-11 13:59:57 +00:00
Martin Matuska	ff20578569	MFV r246392: Import vendor ZFS bugfix fixing a possible deadlock in arc_read(). Illumos ZFS issues: 3498 panic in arc_read(): !refcount_is_zero(&pbuf->b_hdr->b_refcnt) References: https://www.illumos.org/issues/3498 MFC after: 2 weeks	2013-02-11 12:42:11 +00:00
Martin Matuska	8a2dc7faae	MFV r246390: Import minor type change in refcount.h header from vendor (illumos). MFC after: 2 weeks	2013-02-11 07:48:57 +00:00
Martin Matuska	fd9778c236	MFV r246388: Import vendor bugfixes Illumos ZFS issues: 3422 zpool create/syseventd race yield non-importable pool 3425 first write to a new zvol can fail with EFBIG References: https://www.illumos.org/issues/3422 https://www.illumos.org/issues/3425 MFC after: 2 weeks	2013-02-10 19:32:55 +00:00
Xin LI	ef17620fc8	MFV r245512: * Illumos zfs issue #3035 [1] LZ4 compression support in ZFS. LZ4 is a new high-speed BSD-licensed compression algorithm created by Yann Collet that delivers very high compression and decompression performance compared to lzjb (>50% faster on compression, >80% faster on decompression and around 3x faster on compression of incompressible data), while giving better compression ratio [1]. This version of LZ4 corresponds to upstream's [2] revision 85. Please note that for obvious reasons this is not backward read compatible. This means once a pool has LZ4 compressed data, this data can no longer be read by older ZFS implementations. Local changes: - On-stack hash table disabled and using kernel slab allocator instead, at this time. This requires larger kernel thread stack for zio workers. This may change in the future should we adjust the zio workers' thread stack size. - likely and unlikely will be undefined if they are already defined, this is required for i386 XEN build. - Removed the De Bruijn sequence based fallback for the __builtin_ctz family of builtins in favor of the builtins themselves. Both GCC and clang support these builtins. - Changed the way the LZ4 code detects endianness. - Manual pages modifications to mention the feature based on Illumos counterpart. - Boot loader changes to make it support LZ4 decompression. [1] https://www.illumos.org/issues/3035 [2] http://code.google.com/p/lz4/source/list Obtained from: Illumos (13921:9d721847e469) Tested on: FreeBSD/amd64 MFC after: 1 month	2013-02-09 06:39:28 +00:00
Andriy Gapon	0dcab786b8	zfs_vget, zfs_fhtovp: properly handle the z_shares_dir object A special gfs vnode corresponds to that object. A regular zfs vnode must not be returned. This should be upstreamed. Reported by: pluknet Submitted by: rmacklem Tested by: pluknet MFC after: 10 days	2013-02-08 07:49:54 +00:00
Andriy Gapon	e2bb19dce5	zfs: update comments about zfid_long_t to match the FreeBSD definitions MFC after: 1 week	2013-02-08 07:44:15 +00:00
Andriy Gapon	c7d346f269	zfs: fix, improve and re-organize page_lookup and page_unlock Now they are split into two pairs: page_hold/page_unhold for mappedread and page_busy/page_unbusy for update_pages. For mappedread we simply hold a page that is to be used as a source if it is resident and valid (and not busy). This is sufficient since we are only doing page -> user buffer copying. There is no page <-> backing storage I/O involved. update_pages is now better split to properly handle the putpages case (page -> arc) and the regular write case (arc -> page). For the latter we use complete protocol of marking an object with paging-in-progress and marking a page with io_start (busy count). Also, in this case we remove the write bit from all page mappings and clear dirty bits of the pages, the former is needed to ensure that the latter does the right thing. Additionally we update a page if it is cached instead of just freeing it as was done before. This needs to be verified. A minor detail: ZFS-backed pages should always be either fully valid or fully invalid. Assert this and use simpler API that does not deal with sub-page blocks. Reviewed by: kib MFC after: 26 days	2013-02-03 18:42:20 +00:00
Andriy Gapon	13235aaa89	zfs: add MODULE_VERSION for zfsctrl This should allow the kernel linker to easily detect a situation when the module is present both in a kernel and in a preloaded file (zfs.ko). Reviewed by: jhb MFC after: 5 days	2013-02-02 11:35:18 +00:00
Andriy Gapon	ea84c62f93	spa_generate_rootconf: add support for old vdev labels It seems that old ZFS versions (v15) completely omit "vdev_children" property when there is a single child. Reported by: jase Tested by: jase MFC after: 1 week	2013-01-26 10:34:17 +00:00
Xin LI	5c74885e99	MFV r245510: improve the comment in txg.c Obtained from: Illumos (13910:f3454e0a097c) MFC after: 2 weeks	2013-01-16 22:59:50 +00:00
Konstantin Belousov	614b9f9130	For zfs vnodes, use the standard inode number based hash algorithm. Reviewed and tested by: peter Sponsored by: The FreeBSD Foundation MFC after: 5 days	2013-01-14 05:45:33 +00:00
Xin LI	290a1ba9a4	The current ZFS code expects ddt_zap_count to always succeed by asserting the underlying zap_count() to return no errors. However, it is possible that the pool reaches to such a state where zap_count would return error, leading to panics when a pool is imported. This commit changes the ddt_zap_count to return error returned from zap_count and handle the error appropriately. With this change, it's now possible to let zpool rollback damaged transaction groups and import the pool. Obtained from: ZFS on Linux github (`e8fd45a0f9`) MFC after: 1 month	2013-01-10 19:26:56 +00:00
Andriy Gapon	f71fbb1d12	zfs: solaris doesn't have KM_ZERO, kmem_zalloc should be used instead To do: remove KM_ZERO declaration Pointyhat to: avg (for mindlessly using the pseudo-flag) MFC after: instantly (to fix stable/8 build)	2012-12-23 19:58:41 +00:00
Steven Hartland	5780c4a723	Added vfs.zfs.vdev.trim_on_init sysctl which allows full vdev trim on initialisation to be enabled (1) / disabled (0) defaults to enabled. This is useful for devices which have a slow trim speed and are either new or have otherwise already been wiped e.g. secure erase. PR: kern/173116 Submitted by: Steven Hartland Approved by: pjd (mentor)	2012-12-13 17:39:07 +00:00
Steven Hartland	c440a359ca	Upgrades trim free request sizes before inserting them into the free map, making range consolidation much more effective particularly for small deletes. This reduces memory used by the free map as well as reducing the number of bio requests down to geom required to process all deletes. In tests this achieved a factor of 10 reduction of trim ranges / geom call downs. While I'm here correct the description of zio_vdev_io_start. PR: kern/173254 Submitted by: Steven Hartland Approved by: pjd (mentor)	2012-12-13 17:06:38 +00:00
Steven Hartland	7150222c0a	Renamed zfs trim stats removing duplicate zio_trim identifier from the name Added description option to kstats. Added descriptions for zio_trim kstats PR: kern/173113 Submitted by: Steven Hartland Reviewed by: pjd Approved by: pjd MFC after: 2 weeks	2012-12-12 16:14:14 +00:00
Xin LI	2740382ebd	Use SA_ZPL_CRTIME instead of SA_ZPL_CTIME for creation time. Submitted by: phil.stone at gmx.com MFC after: 2 weeks	2012-12-03 04:25:37 +00:00
Andriy Gapon	289b3b96ac	zfs_getpages: make use of vm_page_readahead_finish Suggested by: kib MFC after: 5 days	2012-12-01 18:13:53 +00:00
Andriy Gapon	992ffc58ae	gfs_file_inactive: replace bad code with ugly code Also, make it explicit that V_XATTRDIR is not properly supported in gfs code yet. The bad code was plain incorrect: (a) it spoiled handling of v_usecount reaching zero and (b) it leaked v_holdcnt. The ugly code employs potentially unsafe locking tricks. Ideally we should separate vnode lifecycle and gfs node lifecycle. A gfs node should have its own reference count where its child nodes should be accounted. PR: kern/151111 Reviewed by: kib MFC after: 13 days	2012-12-01 18:12:55 +00:00
Martin Matuska	7faa32552f	MFV r243395: Introduce a new dataset aclmode setting "restricted" to protect ACL's being destroyed or corrupted by a drive-by chmod. illumos-gate 13889:a67716f16746 3254 add support in zfs for aclmode=restricted References: https://www.illumos.org/issues/3254 MFC after: 2 weeks	2012-11-26 12:24:39 +00:00
Martin Matuska	53e5858c68	Add loader(8) tunable to enable/disable nopwrite functionality: vfs.zfs.nopwrite_enabled MFC after: 2 weeks	2012-11-25 16:54:43 +00:00
Martin Matuska	dd801aa546	MFV r243013 and r243267: Import the zio nop-write improvement from Illumos. To reduce I/O, nop-write omits overwriting data if the checksum (cryptographically secure) of new data matches the checksum of existing data. It also saves space if snapshots are in use. It currently works only on datasets with enabled compression, disabled deduplication and sha256 checksums. IllumOS 13887:196932ec9e6a and 13888:7204b3392a58 3236 zio nop-write References: https://www.illumos.org/issues/3236 MFC after: 2 weeks	2012-11-25 16:32:07 +00:00
Andriy Gapon	3a0e1b57bb	zfs_freebsd_reclaim: remove a stray variable ... which leaked from a subsequent local change. Unfortunately I noticed that only after commit. MFC after: 5 weeks X-MFC with: r243520	2012-11-25 15:46:29 +00:00
Andriy Gapon	4ff1c77d22	zfs: overhaul zfs-vfs glue for vnode life-cycle management * There is no need for the delayed destruction of znodes via taskqueue, now that we do not need to fear recursion from getnewvnode into zfs_inactive and zfs_freebsd_reclaim, thus making znode/vnode state machine a bit simpler. * More complete porting of zfs_inactive from Solaris VFS model to FreeBSD vop_inactive and vop_reclaim model. All destructive actions are done in zfs_freebsd_reclaim. This allows to simplify zfs_zget logic. * Allow zfs_zget to return a doomed vnode if the current thread already has an exclusive lock on the vnode. * Clean up Solaris-isms like bailing out of reclaim/inactive on certain values of v_usecount (aka v_count) or directly messing with this counter. * Do not clear z_vnode while znode is still accessible. z_vnode should be cleared only after zfs_znode_dmu_fini. Otherwise zfs_zget may get an effectively half-deconstructed znode. This allows to simplify zfs_zget logic further. The above changes fix at least two known/reported problems: o An indefinite wait in the following code path: vgone -> VOP_RECLAIM -> zfs_freebsd_reclaim -> vnode_destroy_vobject -> put_pages -> zfs_write -> zil_commit -> zfs_zget This happened because vgone marks a vnode as VI_DOOMED before calling VOP_RECLAIM, but zfs_zget would not return a doomed vnode under any circumstances. The fix in this change is not complete as it won't fix a deadlock between two threads doing VOP_RECLAIM where one thread is in zil_commit trying to zfs_zget a znode/vnode being reclaimed by the other thread, which would be blocked trying to enter zil_commit. This type of deadlock has not been reported as of now. o An indefinite wait in the unmount path caused by a znode "falling through the cracks" in inactive+reclaim. This would happen if the znode is unlinked while its vnode is still active. 
To Do: pass locking flags parameter to zfs_zget, so that the zfs-vfs glue code doesn't have to re-lock a vnode but could ask for proper locking from the very start. This would also allow for the higher level code to obtain a doomed vnode when it is expected/requested. Or to avoid blocking when it is not allowed (see zil_commit example above). ffs_vgetf seems like a good source of inspiration. Tested by: Willem Jan Withagen <wjw@digiware.nl> MFC after: 6 weeks	2012-11-25 15:33:26 +00:00
Andriy Gapon	7ca5310ea3	zfs_fhtovp: there is no reason to amend lock flags with LK_RETRY here MFC after: 12 days	2012-11-25 15:07:27 +00:00
Andriy Gapon	7192f62bcc	add zfs_bmap to aid vnode_pager_haspage ... otherwise zfs_getpages would mostly be called with one page at a time. It is expected that ZFS VOP_BMAP is only called from vnode_pager_haspage. Since ZFS files can have variable block sizes and also because we don't really know if any given blocks are consecutive, we can not really report any additional blocks behind or ahead of a given block. Since physical block numbers do not make sense for ZFS, we do not do any real translation and thus pass back blk = lblk. The net effect is that vnode_pager_haspage knows that the block exists and that the pages backed by the block can be accessed. vnode_pager_haspage may be wrong about the exact count of the pages backed by the block, because of a variable block size, which vnode_pager_haspage doesn't really know - it only knows max block size in a filesystem. So pages from multiple blocks can be passed to zfs_getpages, but that is expected and correctly handled. vnode_pager should not call zfs_bmap for any other reason, because ZFS implements VOP_PUTPAGES and thus vnode_pager_generic_getpages is not used. vfs_cluster code and vfs_bio code should not be called for ZFS, because ZFS does not use buffer cache layer. Also, ZFS does not use vn_bmap_seekhole, it has its private mechanism for working with holes. The above list should cover all the current calls to VOP_BMAP. Reviewed by: kib MFC after: 6 weeks	2012-11-25 15:01:12 +00:00
Andriy Gapon	b609e5f891	zfs_getpages: optimize for large block sizes MFC after: 6 weeks	2012-11-25 14:53:26 +00:00
Martin Matuska	2f06dfc9a3	MFV r243012: Illumos 13886:e3261d03efbf 3349 zpool upgrade -V bumps the on disk version number, but leaves the in core version References: https://www.illumos.org/issues/3349 MFC after: 1 week	2012-11-25 10:53:42 +00:00
Martin Matuska	2b8d4033cc	MFV r242735: Illumos 13879:4eac7a87eff2: 3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb() 3330 space_seg_t should have its own kmem_cache 3331 deferred frees should happen after sync_pass 1 3335 make SYNC_PASS_* constants tunable New loader-only tunables: vfs.zfs.sync_pass_deferred_free vfs.zfs.sync_pass_dont_compress vfs.zfs.sync_pass_rewrite References: https://www.illumos.org/issues/3329 https://www.illumos.org/issues/3330 https://www.illumos.org/issues/3331 https://www.illumos.org/issues/3335 MFC after: 2 weeks	2012-11-25 09:06:32 +00:00
Andriy Gapon	328998eac1	zfs rootpool: add support for multi-vdev configurations Tested by: madpilot MFC after: 10 days	2012-11-24 13:23:15 +00:00
Andriy Gapon	e1fccde2c9	spa_import_rootpool: initialize ub_version before calling spa_config_parse ... because the latter makes some decision based on the version. This is especially important for raidz vdevs. This is similar to what spa_load does. This is not an issue for upstream because they do not seem to support using raidz as a root pool. Reported by: Andrei Lavreniyuk <andy.lavr@gmail.com> Tested by: Andrei Lavreniyuk <andy.lavr@gmail.com> MFC after: 6 days	2012-11-24 13:16:49 +00:00
Andriy Gapon	cfca00a2fb	spa_import_rootpool: do not call spa_history_log_version The call is a NOP, because pool version in spa_ubsync.ub_version is not initialized and thus appears to be zero. If the version is properly set then the call leads to a NULL pointer dereference because the spa object is still under-constructed. The same change was independently made in the upstream as a part of a larger change (4445fffbbb1ea25fd0e9ea68b9380dd7a6709025). MFC after: 6 days	2012-11-24 13:14:53 +00:00
Andriy Gapon	c4f59a3c09	zfs: create devices/geoms from zvols after receiving them PR: kern/167066 Tested by: Andreas Nilsson <andrnils@gmail.com> MFC after: 13 days	2012-11-24 13:07:31 +00:00
Andriy Gapon	dbe922173c	zfs_remove: assert that delete_now case is never true on FreeBSD That case is specific to Solaris VFS and it would violate pretty fundamental contracts of FreeBSD VFS. Discussed with: pjd MFC after: 12 days	2012-11-19 11:30:08 +00:00
Andriy Gapon	7b069f7fee	zfs_remove: set VV_NOSYNC flag if a node is unlinked Suggested by: kib MFC after: 12 days	2012-11-19 11:25:20 +00:00
Andriy Gapon	62875a6b10	spa_import_rootpool: fall back to use configuration from zpool.cache... if we fail to generate a proper root pool configuration based on disk probing. Currently we can not properly generate the configuration for multi-vdev pools. Make that explicit. Reported by: madpilot, Bartosz Stec <bartosz.stec@it4pro.pl> Tested by: madpilot, Bartosz Stec <bartosz.stec@it4pro.pl> MFC after: 4 days	2012-11-18 11:47:25 +00:00
Konstantin Belousov	f13b5a0f01	Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month	2012-11-13 12:52:31 +00:00
Andriy Gapon	7631b580ff	zfs_ioc_destroy_snaps_nvl: remove disk device entries for zvol snapshots ... before trying to destroy the zvol snapshots themselves. PR: kern/173442 Reported by: Petri Helenius <petri@helenius.fi>, mm Obtained from: Brian Behlendorf <behlendorf1@llnl.gov>, Illumos Bug #3170 Tested by: Petri Helenius <petri@helenius.fi> MFC after: 10 days	2012-11-10 12:22:26 +00:00
Xin LI	a9b09a3f3c	MFV r242729 (mm): Illumos r13840:97fd5cdf328a: 3145 single-copy arc 3212 ztest: race condition between vdev_online() and spa_vdev_remove() Illumos r13849:3468a95b27cd: 3258 ztest's use of file descriptors is unstable	2012-11-10 01:52:52 +00:00
Attilio Rao	bc2258da88	Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag. Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.	2012-11-09 18:02:25 +00:00
Justin Hibbits	c757049235	Implement DTrace for PowerPC. This includes both 32-bit and 64-bit. There is one known issue: Some probes will display an error message along the lines of: "Invalid address (0)" I tested this with both a simple dtrace probe and dtruss on a few different binaries on 32-bit. I only compiled 64-bit, did not run it, but I don't expect problems without the modules loaded. Volunteers are welcome. MFC after: 1 month	2012-11-07 23:45:09 +00:00
Andriy Gapon	2c6024ec1b	zfs_dirlook: bailout early if directory is unlinked Otherwise we could fail with an incorrect error if e.g. parent object id is removed too or we can even return a wrong vnode if parent object has been already re-used. Discussed with: pjd Also see: http://article.gmane.org/gmane.os.freebsd.devel.file-systems/13863 MFC after: 26 days	2012-11-04 14:50:08 +00:00
Andriy Gapon	5c997cc429	zfsctl_snapdir_lookup: obtain a snapname in the remount case ... which is triggered if somebody did regular umount on a snapshot mount. Reviewed by: Matthew Ahrens <mahrens@delphix.com> MFC after: 20 days	2012-11-04 14:43:15 +00:00
Andriy Gapon	88c8884a71	zfs: set MNTK_EXTENDED_SHARED flag Discussed with: kib MFC after: 20 days	2012-11-04 14:36:11 +00:00
Andriy Gapon	71900cfaf7	zfs_vnode_forget: dispose of larvae vnode using public vfs api (mostly) Reviewed by: kib MFC after: 19 days	2012-11-04 14:24:00 +00:00
Andriy Gapon	a16e534dbe	zfs_umount: no need to set MNTK_UNMOUNTF here, dounmount handles that Reviewed by: kib MFC after: 19 days	2012-11-04 14:22:25 +00:00
Andriy Gapon	62eeeb8ff8	zfs_vnode_lock: no need to double-guess caller's intentions here vn_lock should do the right thing with respect to given vnode lock flags. If a caller doesn't mind a doomed vnode, then zfs should deliver. Reviewed by: kib MFC after: 19 days	2012-11-04 14:15:13 +00:00
Andriy Gapon	d548e8b66f	zfs_mount: drop vfs.zfs.rootpool.prefer_cached_config tunable It turned out to be not that useful, because its default value may lead to a problem when a root pool is present in zpool.cache, but its on-disk status is 'exported'. This may happen if the pool was imported in a different environment with -f flag and then exported. MFC after: 12 days	2012-11-04 13:50:08 +00:00
Andriy Gapon	7ac8ca0d58	zfs_freebsd_close: call zfs_close with count=1 instead of count=0 Otherwise we may be leaking z_sync_cnt, which may lead to unnecessary ZIL sync-ing. MFC after: 12 days	2012-11-04 13:48:48 +00:00
Xin LI	7f24254add	s/dettach/detach/g Approved by: pjd MFC after: 1 month	2012-10-30 01:29:45 +00:00
Andriy Gapon	86812da016	zfs: fix label validation code in vdev_geom_read_config POOL_STATE_SPARE and POOL_STATE_L2CACHE were not handled correctly and thus the cache and spare disks would not be correctly probed. Reported by: Michael Schmiedgen <schmiedgen@gmx.net>, Matthew D. Fuller <fullermd@over-yonder.net> Tested by: Michael Schmiedgen <schmiedgen@gmx.net>, flo MFC after: 5 days	2012-10-26 14:50:16 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Andriy Gapon	42abbab471	zfs: wait in arc_lowmem only if curproc == pageproc ... otherwise the current thread might be holding ARC locks and thus run into a deadlock. This happens, for example, when a thread does memory allocation in the ARC code and runs into KVA shortage. Also, it really makes the most sense to wait in pageproc, so that the results of ARC reclamation are seen before the page cache is acted upon. In other cases where vm_lowmem is invoked, e.g. on KVA space shortage, the callers perform multiple attempts (up to 8) and wait for rather long intervals between them (up to 4 seconds), so ARC reclaim results should become visible even without explicit waiting on the ARC thread. Note that this is not a critical issue for typical ZFS usages where KVA space should already be large enough. On amd64 systems setting KVA size to twice the physical memory size is known to mitigate KVA fragmentation issues in practice. Side note: perhaps vm_lowmem 'how' parameter should be used to differentiate between causes of the event. Reported by: Nikolay Denev <ndenev@gmail.com> MFC after: 19 days	2012-10-20 10:02:18 +00:00
Andriy Gapon	edb085b8ab	zfs: make use of getnewvnode_reserve in zfs_mknode and zfs_zget getnewvnode_reserve helps to avoid "recursing" back into zfs code via getnewvnode when the latter needs to reclaim some vnodes. zfs code may hold a number of locks around getnewvnode and doesn't expect any recursion to happen on those locks, because that never happens in solaris. I believe that this change also eliminates a need for the delayed znode destruction via the taskqueue. Many thanks to kib for devising getnewvnode_reserve. Reported by: flo Tested by: bapt, kwm, swills MFC after: 2 weeks X-MFC after: r241556	2012-10-17 10:59:56 +00:00
Kevin Lo	9823d52705	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
Kevin Lo	a10cee30c9	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
Andriy Gapon	8bf749ef3a	zvol: set mediasize in geom provider right upon its creation ... instead of deferring the action until first open. Unlike upstream this has no benefit on FreeBSD. We know that as soon as the provider is created it is going to be tasted and thus opened. Initial mediasize of zero causes tasting failure and subsequent retasting because of the size change. MFC after: 14 days	2012-10-06 19:57:27 +00:00
Andriy Gapon	61e100ee3b	zfs_mount: taste geom providers for root pool config This should allow mounting a dataset as a root filesystem even if it belongs to a pool that is not described in zpool.cache. This adds some overhead to the boot process though. If the root filesystem's pool is found in zpool.cache, then by default its cached configuration will be used for import. vfs.zfs.rootpool.prefer_cached_config could be set to zero to force the config to be retasted. Discussed with: gibbs, pjd, des MFC after: 25 days	2012-10-06 19:33:47 +00:00
Martin Matuska	8469b12c2e	Merge recent vendor changes in ZFS. Illumos issues covered: 2811 missing implementation: zfs send -r 3139 zdb dies when it tries to determine path of unlinked file 3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg 3208 moving zpool cross-endian results in incorrect user/group accounting References: https://www.illumos.org/issues/ + [issue_id] Obtained from: illumos (vendor/illumos, vendor/illumos-sys) MFC after: 2 weeks	2012-09-26 09:37:58 +00:00
Pawel Jakub Dawidek	c622f88dd2	It is possible to recursively destroy snapshots even if the snapshot doesn't exist on a dataset we are starting from. For example if we have the following configuration: tank tank/foo tank/foo@snap tank/bar tank/bar@snap We can execute: # zfs destroy -r tank@snap even though tank@snap doesn't exist. Unfortunately it is not possible to do the same with recursive rename: # zfs rename -r tank@snap tank@pans cannot open 'tank@snap': dataset does not exist ...until now. This change allows recursively renaming snapshots even if the snapshot doesn't exist on the starting dataset. Sponsored by: rsync.net MFC after: 2 weeks	2012-09-23 20:12:10 +00:00
Pawel Jakub Dawidek	bcb77be2b7	Add TRIM support. The code builds a map of regions that were freed. On every write the code consults the map and eventually removes ranges that were freed before, but are now overwritten. Freed blocks are not TRIMed immediately. There is a tunable that defines how many txg we should wait with TRIMming freed blocks (64 by default). There is a low priority thread that TRIMs ranges when the time comes. During TRIM we keep in-flight ranges on a list to detect colliding writes - we have to delay writes that collide with in-flight TRIMs in case something is reordered and the write reaches the disk before the TRIM. We don't have to do the same for in-flight writes, as colliding writes just remove ranges to TRIM. Sponsored by: multiplay.co.uk This work includes some important fixes and some improvements obtained from the zfsonlinux project, including TRIMming entire vdevs on pool create/add/attach and on pool import for spare and cache vdevs. Obtained from: zfsonlinux Submitted by: Etienne Dechamps <etienne.dechamps@ovh.net>	2012-09-23 19:40:58 +00:00
Andriy Gapon	81c4584e30	zfs: allow a zvol to be used as a pool vdev, again Do this by checking if spa_namespace_lock is already held and not taking it again in that case. Add a comment explaining why that is done and why it is safe. Reviewed by: pjd MFC after: 24 days	2012-09-22 17:42:53 +00:00
Pawel Jakub Dawidek	3c5a057574	As in r226967, r226987 and r232401 changes to UFS and TMPFS remove cache entries associated with the source and the target of rename(). MFC after: 1 week	2012-09-22 17:32:40 +00:00
Andriy Gapon	6ed9e9f32f	zfs: correctly calculate dn_bonuslen for saving SAs to disk Since all attribute values start at 8-byte aligned boundary, we would previously incorrectly calculate dn_bonuslen if any attribute but the last had a variable-length value with length not multiple of 8. Reported by: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org> Tested by: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> (for upstream) MFC after: 2 weeks	2012-09-18 08:02:54 +00:00
Andriy Gapon	ea559fb573	zfs: allow both DEBUG and ZFS_DEBUG to be defined on command line Discussed with: pjd MFC after: 10 days	2012-09-18 08:00:56 +00:00
Martin Matuska	4c5238d576	Merge recent zfs vendor changes, sync code and adjust userland DEBUG. Illumos issues covered: 1884 Empty "used" field for zfs *space commands 3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument is zero 3028 zfs {group,user}space -n prints (null) instead of numeric GID/UID 3048 zfs {user,group}space [-s\|-S] is broken 3049 zfs {user,group}space -t doesn't really filter the results 3060 zfs {user,group}space -H output isn't tab-delimited 3061 zfs {user,group}space -o doesn't use specified fields order 3064 usr/src/cmd/zpool/zpool_main.c misspells "successful" 3093 zfs {user,group}space's -i is noop 3098 zfs userspace/groupspace fail without saying why when run as non-root References: https://www.illumos.org/issues/ + [issue_id] Obtained from: illumos (vendor/illumos, vendor/illumos-sys) MFC after: 2 weeks	2012-09-12 18:05:43 +00:00
Andriy Gapon	9c5c3cafbd	zfs: fix sa_modify_attrs handling of variable-sized attributes - skip length_idx index for a replaced variable-sized attribute - skip length_idx index for a removed variable-sized attribute - also re-arranged code to make sure that length_idx is always incremented for variable-sized attributes - additionally add an assertion that the number of actually produced attributes is the same as the expected number of resulting attributes In cooperation with: Matthew Ahrens <mahrens@delphix.com> Tested by: Trent Nelson <trent@snakebite.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> (for upstream) To do: get this upstreamed MFC after: 2 weeks	2012-09-11 07:07:52 +00:00
Martin Matuska	6643637f67	Add assfail() and assfail3() to the opensolaris module. Remove obsoleted intermediate cddl/compat/opensolaris/sys/debug.h. MFC after: 2 weeks	2012-09-10 10:24:57 +00:00
Martin Matuska	4a24a25b2f	Merge recent vendor changes and sync code: 1862 incremental zfs receive fails for sparse file > 8PB 3112 ztest does not honor ZFS_DEBUG 3122 zfs destroy filesystem should prefetch blocks 3129 'zpool reopen' restarts resilvers 3130 ztest failure: Assertion failed: 0 == dmu_objset_destroy(name, B_FALSE) (0x0 == 0x10) References: https://www.illumos.org/issues/1862 https://www.illumos.org/issues/3112 https://www.illumos.org/issues/3122 https://www.illumos.org/issues/3129 https://www.illumos.org/issues/3130 Obtained from: illumos (vendor/illumos, vendor/illumos-sys) MFC after: 2 weeks	2012-09-05 12:02:09 +00:00
Ed Schouten	3f0fb35417	Use a proper destructor function. When calling a revoke(2) on a dtrace device, dtrace_close() could be called, even if threads are still stuck in the device. Defer the actual deallocation of datastructures to the cdevpriv destructor. While there, remove the unneeded D_TRACKCLOSE and D_NEEDMINOR flags. For the helper device, we never need it. For the regular dtrace devices, we only need these flags on FreeBSD pre-8. MFC after: 1 month	2012-08-28 18:33:12 +00:00
Martin Matuska	6e767def16	Merge recent vendor changes: 3100 zvol rename fails with EBUSY when dirty 3104 eliminate empty bpobjs 3120 zinject hangs in zfsdev_ioctl() due to uninitialized zc References: https://www.illumos.org/issues/3100 https://www.illumos.org/issues/3104 https://www.illumos.org/issues/3120 Obtained from: illumos (vendor/illumos, vendor/illumos-sys) MFC after: 2 weeks	2012-08-28 12:25:37 +00:00
Martin Matuska	671303c6d5	Merge recent vendor changes: 3086 unnecessarily setting DS_FLAG_INCONSISTENT on async destroyed datasets 3090 vdev_reopen() during reguid causes vdev to be treated as corrupt 3102 vdev_uberblock_load() and vdev_validate() may read the wrong label References: https://www.illumos.org/issues/3086 https://www.illumos.org/issues/3090 https://www.illumos.org/issues/3102 PR: kern/170912, kern/170914 Obtained from: illumos (changeset #13776, #13777) MFC after: 2 weeks	2012-08-23 19:32:57 +00:00
Martin Matuska	bb9b1f7a8b	Backport fix for vendor issue #3085 3085 zfs diff panics, then panics in a loop on booting References: https://www.illumos.org/issues/3085 PR: kern/170763 Obtained from: ssh://anonhg@hg.illumos.org/illumos-gate (r13772) MFC after: 1 week	2012-08-19 09:59:41 +00:00
Hans Petter Selasky	07da61a6cc	Streamline use of cdevpriv and correct some corner cases. 1) It is not useful to call "devfs_clear_cdevpriv()" from "d_close" callbacks, because for example read, write, ioctl and so on might be sleeping at the time of "d_close" being called and then the freed private data can still be accessed. Examples: dtrace, linux_compat, ksyms (all fixed by this patch) 2) In sys/dev/drm* there are some cases in which memory will be freed twice, if open fails, first by code in the open routine, secondly by the cdevpriv destructor. Move registration of the cdevpriv to the end of the drm open routines. 3) devfs_clear_cdevpriv() is not called if the "d_open" callback registered cdevpriv data and the "d_open" callback function returned an error. Fix this. Discussed with: phk MFC after: 2 weeks	2012-08-15 16:19:39 +00:00
Marius Strobl	787c338407	Include <vm/vm_param.h> for PA_LOCK_COUNT in order to fix kernel build with options ZFS after r239065.	2012-08-05 20:19:27 +00:00
Martin Matuska	e9832bb1da	Partial MFV (illumos-gate 13753:2aba784c276b) 2762 zpool command should have better support for feature flags References: https://www.illumos.org/issues/2762 MFC after: 2 weeks	2012-07-30 23:14:24 +00:00
Edward Tomasz Napierala	b75ca29147	Make ZVOL resizing ('zfs set volsize') properly resize the GEOM provider. Sponsored by: FreeBSD Foundation	2012-07-20 16:56:34 +00:00
Pawel Jakub Dawidek	92484a2615	vdev_io_done stage is not used for ioctls. MFC after: 1 week	2012-07-04 17:39:29 +00:00
Martin Matuska	a6a8d8377f	Expose scrub and resilver tunables. This allows the user to tune the priority trade-off between scrub/resilver and other ZFS I/O. MFC after: 2 weeks Discussed with: pjd	2012-07-02 07:27:14 +00:00
Pedro F. Giffuni	9a9df34345	Bump dtrace_helper_actions_max from 32 to 128 Dave Pacheco from Joyent (and Dtrace.org) bumped the cap to 1024 but, according to his blog, 128 is the recommended minimum. For now bump it safely to 128 although we may have to bump it further if there is demand in the future. Reference: http://www.illumos.org/issues/2558 http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/	2012-06-29 18:49:14 +00:00
Pedro F. Giffuni	675cf9154b	Bring llquantize support into Dtrace. Bryan Cantrill implemented the equivalent of semi-log graph paper for Dtrace so llquantize will use one logarithmic and one linear scale. Special thanks to Mark Peek for providing a fix to an assertion and to Fabian Keil for testing the port. Illumos Revision: 13355:15b74a2a9a9d Reference: https://www.illumos.org/issues/905 Obtained from: Illumos Tested by: Fabian Keil, mp MFC after: 4 days	2012-06-27 04:39:30 +00:00
Martin Matuska	de37372f73	Import Illumos revision 13736:9f1d48e1681f 2901 ZFS receive fails for exabyte sparse files References: https://www.illumos.org/issues/2901 Obtained from: illumos (issue #2901) MFC after: 1 week	2012-06-22 20:42:11 +00:00
Martin Matuska	2d9cf57e18	Introduce "feature flags" for ZFS pools (bump SPA version to 5000). Add first feature "com.delphix:async_destroy" (asynchronous destroy of ZFS datasets). Implement features support in ZFS boot code. Illumos revisions merged: 13700:2889e2596bd6 13701:1949b688d5fb 2619 asynchronous destruction of ZFS file systems 2747 SPA versioning with zfs feature flags References: https://www.illumos.org/issues/2619 https://www.illumos.org/issues/2747 Obtained from: illumos (issue #2619, #2747) MFC after: 1 month	2012-06-11 11:35:22 +00:00
Pawel Jakub Dawidek	25892bfc61	ds_guid of 0 is special, as it is used by snapshot receive code to differentiate between an incremental and full stream. Be sure not to generate guid equal to 0. Reported by: someone who saw 0 being generated as 64bit random guid MFC after: 3 days	2012-06-09 20:16:19 +00:00
Pawel Jakub Dawidek	97e9ad8ec4	Tighten up the assertion: because size can't be 0 and even if sm_space is equal to sm_size, any 'sm_space - size' will be less than sm_size. MFC after: 3 days	2012-05-29 18:11:45 +00:00
Pawel Jakub Dawidek	837a617728	Eliminate 'where' argument, we don't use it. MFC after: 3 days	2012-05-29 18:09:14 +00:00
Pawel Jakub Dawidek	8ac2669cc8	Remove unused variable. MFC after: 3 days	2012-05-29 18:05:24 +00:00
Pawel Jakub Dawidek	e21c77d804	Remove unused sysctl. MFC after: 3 days	2012-05-29 17:53:11 +00:00
Martin Matuska	2182d44714	Import illumos changeset 13570:3411fd5f1589 1948 zpool list should show more detailed pool information Display per-vdev information with "zpool list -v". The added expandsize property has currently no value on FreeBSD. This changeset allows adding expansion support to individual vdevs in the future. References: https://www.illumos.org/issues/1948 Obtained from: illumos (issue #1948) MFC after: 2 weeks	2012-05-27 16:00:00 +00:00
Martin Matuska	d9727dc29c	Import illumos changeset 13605:b5c2b5db80d6 (partial) 763 FMD msg URLs should refer to something visible Replace sun.com URL's with illumos.org References: https://www.illumos.org/issues/763 Obtained from: illumos (issue #763) MFC after: 1 week	2012-05-27 12:31:57 +00:00
Edward Tomasz Napierala	9280affe16	Fix enforcement of file size limit with O_APPEND on ZFS. vn_rlimit_fsize takes uio->uio_offset and uio->uio_resid into account when determining whether given write would exceed RLIMIT_FSIZE. When APPEND flag is specified, ZFS updates uio->uio_offset to point to the end of file. But this happens after a call to vn_rlimit_fsize, so vn_rlimit_fsize check can be rendered ineffective by thread that opens some file with O_APPEND and lseeks below RLIMIT_FSIZE before calling write. Submitted by: Mateusz Guzik <mjguzik at gmail dot com> MFC after: 2 weeks	2012-05-22 10:54:42 +00:00
Martin Matuska	a837775a9e	Import illumos changeset 13686:4bc0783f6064 2703 add mechanism to report ZFS send progress If the zfs send command is used with the -v flag, the amount of bytes transmitted is reported in per second updates. References: https://www.illumos.org/issues/2703 Obtained from: illumos (issue #2703) MFC after: 2 weeks	2012-05-10 10:39:45 +00:00
Marius Strobl	35225ae651	Partially revert r232938; ZFS only requires nfs4 but not posix1e. Submitted by: jhb	2012-04-29 16:21:47 +00:00
Ryan Stone	c6024848dd	Implement the D "cpu" variable, which returns curcpu. I have chosen not to follow the example of OpenSolaris and its descendants, which implemented cpu as an inline that took a value out of curthread. At certain points in the FreeBSD scheduler curthread->td_oncpu will no longer be valid (in particular, just before the thread gets descheduled) so instead I have implemented this as its own built-in variable. Sponsored by: Sandvine Inc. MFC after: 1 week	2012-04-26 01:07:03 +00:00
Edward Tomasz Napierala	af6e6b87ad	Remove unused thread argument to vrecycle(). Reviewed by: kib	2012-04-23 14:10:34 +00:00
Attilio Rao	a0f2c37b6f	- Introduce a cache-miss optimization for consistency with other accesses of the cache member of vm_object objects. - Use novel vm_page_is_cached() for checks outside of the vm subsystem. Reviewed by: alc MFC after: 2 weeks X-MFC: r234039	2012-04-09 17:05:18 +00:00
Andriy Gapon	70542ee01f	zfs_ioctl: no need for ddi_copyin/out here because sys_ioctl handles that On FreeBSD the direct ioctl argument is automatically copied in/out as necessary by the kernel ioctl entry point. PR: kern/164445 Submitted by: Luis Garces-Erice <lge@ieee.org> Tested by: Attila Nagy <bra@fsn.hu> MFC after: 5 days	2012-04-05 07:59:59 +00:00
Oleksandr Tymoshenko	5083ce5c09	Add MIPS support to cddl/contrib part: - header and stub .c file for fasttrap module. It's not supported on MIPS yet, but there is no way to disable support completely - Do as amd64 trying to limit allocated memory	2012-03-24 04:52:18 +00:00
Adrian Chadd	368a79ddd0	Add dependencies onto acl_posix1e and acl_nfs4.	2012-03-13 20:29:04 +00:00
Martin Matuska	e7af90ab00	Analogous to r232059, add a parameter for the ZFS file system: allow.mount.zfs: allow mounting the zfs filesystem inside a jail This way the permissions for mounting all current VFCF_JAIL filesystems inside a jail are controlled via allow.mount.* jail parameters. Update sysctl descriptions. Update jail(8) and zfs(8) manpages. TODO: document the connection of allow.mount.* and VFCF_JAIL for kernel developers MFC after: 10 days	2012-02-26 16:30:39 +00:00
Martin Matuska	4ee8a13704	Revert r230913 and r230914. The initialization was correct, the problem needs deeper analysis.	2012-02-03 13:40:51 +00:00
Martin Matuska	8b152ded1c	Add copyright information on last commits to comply with CDDL. Discussed with: pluknet@ MFC after: 3 days	2012-02-02 16:33:58 +00:00
Martin Matuska	3ce26884ba	Fix out of bounds write causing random panics, uncovered by the change in r230256 Reviewed by: pluknet@ MFC after: 3 days	2012-02-02 16:18:40 +00:00
Kip Macy	ad69d4e266	always exclude data bufs regardless of debug settings	2012-01-29 00:19:19 +00:00
Kip Macy	cc0021eb34	add tunable for developers working on areas outside of ZFS to further reduce core size by excluding ARC metadata buffers from core dumps	2012-01-28 17:41:42 +00:00
Kip Macy	263811f724	exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64 excluding other allocations including UMA now entails the addition of a single flag to kmem_alloc or uma zone create Reviewed by: alc, avg MFC after: 2 weeks	2012-01-27 20:18:31 +00:00
Martin Matuska	538251bbf6	Merge illumos revisions 13572, 13573, 13574: Rev. 13572: disk sync write perf regression when slog is used post oi_148 [1] Rev. 13573: crash during reguid causes stale config [2] allow and unallow missing from zpool history since removal of pyzfs [5] Rev. 13574: leaking a vdev when removing an l2cache device [3] memory leak when adding a file-based l2arc device [4] leak in ZFS from metaslab_group_create and zfs_ereport_checksum [6] References: https://www.illumos.org/issues/1909 [1] https://www.illumos.org/issues/1949 [2] https://www.illumos.org/issues/1951 [3] https://www.illumos.org/issues/1952 [4] https://www.illumos.org/issues/1953 [5] https://www.illumos.org/issues/1954 [6] Obtained from: illumos (issues #1909, #1949, #1951, #1952, #1953, #1954) MFC after: 2 weeks	2012-01-24 23:09:54 +00:00
Pawel Jakub Dawidek	1698a6aec9	Dramatically optimize listing snapshots when user requests only snapshot names and wants to sort them by name, ie. when executes: # zfs list -t snapshot -o name -s name Because only name is needed we don't have to read all snapshot properties. Below you can find how long does it take to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache: before: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 525s warm cache: 218s after: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 1.7s warm cache: 1.1s MFC after: 1 week	2012-01-21 21:12:53 +00:00
Pawel Jakub Dawidek	b636ebaa6a	By default turn off prefetch when listing snapshots. In my tests it makes listing snapshots 19% faster with cold cache and 47% faster with warm cache. MFC after: 1 week	2012-01-20 22:04:59 +00:00
Sergey Kandaurov	37c2842272	Fix the "lock &zrl->zr_mtx already initialized" assertion by initializing the allocated memory before calling mtx_init(9) on mtx pointing to it. Otherwise, random contents of uninitialized memory might occasionally trigger the assertion. Reported by: Pavel Polyakov <bsd kobyla org> Reviewed by: pjd MFC after: 1 week	2012-01-17 06:23:25 +00:00
Pawel Jakub Dawidek	62859c9061	- Allow to change vfs.zfs.arc_meta_limit at runtime. - Change vfs.zfs.arc_meta_used from CTLFLAG_RDTUN to CTLFLAG_RD, as it is not a tunable. MFC after: 3 days	2012-01-05 22:16:41 +00:00
Dimitry Andric	a5988eb997	In sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, check the number of links against LINK_MAX (which is INT16_MAX), not against UINT32_MAX. Otherwise, the constant would implicitly be converted to -1. Reviewed by: pjd MFC after: 1 week	2012-01-03 20:53:07 +00:00
Pawel Jakub Dawidek	bb265163b2	From time to time people report space map corruption resulting in panic (ss == NULL) on pool import. I had such a panic recently. With current version of ZFS it is still possible to import the pool in readonly mode and backup all the data, but in case it is impossible for some reason add tunable vfs.zfs.space_map_last_hope, which when set to '1' will tell ZFS to remove colliding range and retry. This seems to have worked for me, but I consider it highly risky to use. MFC after: 1 week	2011-12-18 12:27:45 +00:00
Pawel Jakub Dawidek	efe17e5a28	Implement replaying of ACL updates. ACL changes should go to ZIL only if the 'sync' property is set to 'always', so replaying them is not common. MFC after: 1 month	2011-12-18 12:19:03 +00:00
Attilio Rao	77befd1d23	Revert the approach for skipping lockstat_probe_func call when doing lock_success/lock_failure, introduced in r228424, by directly skipping in dtrace_probe. This mainly helps in avoiding namespace pollution and thus lockstat.h dependency by systm.h. As an added bonus, this also helps in MFC case. Reviewed by: avg MFC after: 3 months (or never) X-MFC: r228424	2011-12-12 23:29:32 +00:00
Pawel Jakub Dawidek	75c0e29ff3	Move ru_inblock increment into arc_read_nolock() so we don't account for cached reads. Discussed with: gibbs No objections from: avg Tested by: Marcus Reid <marcus@blazingdot.com> MFC after: 1 week	2011-12-10 13:02:52 +00:00
Pawel Jakub Dawidek	381962ee59	The vfs.zfs.txg.timeout sysctl can be safely modified at run time. MFC after: 1 week	2011-12-09 18:22:57 +00:00
Martin Matuska	62e6ce9a4b	Fix typo in copyright notice. MFC after: 1 month	2011-11-28 21:42:31 +00:00
Martin Matuska	2f7f0f4112	Merge new ZFS features from illumos: 1644 add ZFS "clones" property https://www.illumos.org/issues/1644 1645 add ZFS "written" and "written@..." properties https://www.illumos.org/issues/1645 1646 "zfs send" should estimate size of stream https://www.illumos.org/issues/1646 1647 "zfs destroy" should determine space reclaimed by destroying multiple snapshots https://www.illumos.org/issues/1647 1693 persistent 'comment' field for a zpool https://www.illumos.org/issues/1693 1708 adjust size of zpool history data https://www.illumos.org/issues/1708 1748 desire support for reguid in zfs https://www.illumos.org/issues/1748 Obtained from: illumos (changesets 13514, 13524, 13525) MFC after: 1 month	2011-11-28 21:40:00 +00:00
Konstantin Belousov	f82360acf2	Existing VOP_VPTOCNP() interface has a fatal flaw that is critical for nullfs. The problem is that resulting vnode is only required to be held on return from the successful call to vop, instead of being referenced. Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination with the VOP_VPTOCNP() interface means that the directory vnode returned from VOP_VPTOCNP() is reclaimed in advance, causing vn_fullpath() to error with EBADF or like. Change the interface for VOP_VPTOCNP(), now the dvp must be referenced. Convert all in-tree implementations of VOP_VPTOCNP(), which is trivial, because vhold(9) and vref(9) are similar in the locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(), if any, should have no trouble with the fix. Tested by: pho Reviewed by: mckusick MFC after: 3 weeks (subject of re approval)	2011-11-19 07:50:49 +00:00
Ryan Stone	add89852d6	Replace fasttrap_copyout() with uwrite(). FreeBSD copyout() is not able to write to the .text section of a process. Obtained from: rpaulo MFC after: 3 days	2011-11-07 01:55:58 +00:00
Pawel Jakub Dawidek	df663c3dd3	Correct typo in comment. Reported by: Fabian Keil <fk@fabiankeil.de> MFC after: 3 days	2011-11-05 16:44:25 +00:00
Pawel Jakub Dawidek	98dd1c40c4	In zvol_open() if the spa_namespace_lock is already held, it means that ZFS is trying to open and taste ZVOL as its VDEV. This is not supported, so return an error instead of panicking on spa_namespace_lock recursion. Reported by: Robert Millan <rmh@debian.org> PR: kern/162008 MFC after: 3 days	2011-11-05 16:29:03 +00:00
Martin Matuska	e1d4b72a2e	Fix typo in copyright notice introduced in r226724 (missing character in e-mail address) Reported by: pjd MFC after: 3 days	2011-10-25 13:52:38 +00:00
Martin Matuska	571e19b341	Update copyright information in several ZFS files, as the clause 3.3 of the CDDL licence explicitly requires every Contributor to add a copyright notice. This also reflects the copyright notices for the changes recently added by Illumos. MFC after: 3 days	2011-10-25 08:35:30 +00:00
Pawel Jakub Dawidek	9782a86c85	- Use better naming now that we allow to rename any mounted file system (not only legacy). - Update copyright to include myself. MFC after: 2 weeks	2011-10-24 21:31:53 +00:00
Pawel Jakub Dawidek	649bbd1cd0	Don't forget to rename mounted snapshots of the file system being renamed. MFC after: 2 weeks	2011-10-24 20:41:31 +00:00
Pawel Jakub Dawidek	27fbc05657	Include <sys/zfs_vfsops.h> only when compiling kernel module. MFC after: 2 weeks	2011-10-24 05:26:40 +00:00
Pawel Jakub Dawidek	497b7ef946	Allow to rename file systems without remounting if it is possible. It is possible for file systems with 'mountpoint' property set to 'legacy' or 'none' - we don't have to change mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allows updating FreeBSD in an even cleaner way - in ZFS-only configuration root file system is ZFS file system with 'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. MFC after: 2 weeks	2011-10-24 00:38:09 +00:00
Pawel Jakub Dawidek	72b880fa83	Update per-thread I/O statistics collection in ZFS. This allows to see processes I/O activity in 'top -m io' output. PR kern/156218 Reported by: Marcus Reid <marcus@blazingdot.com> Patch by: avg MFC after: 3 days	2011-10-21 21:49:34 +00:00
Pawel Jakub Dawidek	b39ba076ec	zfs vdev_file_io_start: validate vdev before using vdev_tsd vdev_tsd can be NULL for certain vdev states. At least in userland testing with ztest. Submitted by: avg MFC after: 3 days	2011-10-21 14:00:48 +00:00
Martin Matuska	ceac02f8e6	Import fix for Illumos bug #1475 to reduce diff against upstream. Panic caused by this bug was already partially fixed by pjd@ in p4 CH 185940 and 185942. Reference: 1475 zfs spill block hold can access invalid spill blkptr https://www.illumos.org/issues/1475 Reviewed by: delphij Obtained from: Illumos (issue 1475, changeset 13469:b8e89e5c4167) MFC after: 1 week	2011-10-18 13:58:22 +00:00
Xin LI	4aadb12e0b	Fix a bug in sa_find_sizes() which could lead to panic: When calculating space needed for SA_BONUS buffers, hdrsize is always rounded up to next 8-aligned boundary. However, in two places the round up was done against sum of 'total' plus hdrsize. On the other hand, hdrsize increments by 4 each time, which means in certain conditions, we would end up returning with will_spill == 0 and (total + hdrsize) larger than full_space, leading to a failed assertion because it's invalid for dmu_set_bonus. Sponsored by: iXsystems, Inc. Reviewed by: mm MFC after: 3 days	2011-10-17 22:23:27 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal to kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
Martin Matuska	82378711f9	Generalize ffs_pages_remove() into vn_pages_remove(). Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week	2011-08-25 08:17:39 +00:00
Pawel Jakub Dawidek	4969b96e57	We need to unlock and destroy vnode attached to znode which we are freeing. Reviewed by: kib Approved by: re (bz) MFC after: 1 week	2011-08-24 22:07:38 +00:00
Martin Matuska	6e1f1d4690	zfs_ioctl.c: improve code readability in zfs_ioc_dataset_list_next() zvol.c: fix calling of dmu_objset_prefetch() in zvol_create_minors() by passing full instead of relative dataset name and prefetching all visible datasets to be processed later instead of just the pool name Reviewed by: pjd Approved by: re (kib) MFC after: 1 week	2011-08-13 21:35:22 +00:00
Martin Matuska	cc82ff1c96	Fix race between dmu_objset_prefetch() invoked from zfs_ioc_dataset_list_next() and dsl_dir_destroy_check() indirectly invoked from dmu_recv_existing_end() via dsl_dataset_destroy() by not prefetching temporary clones, as these count as always inconsistent. In addition, do not prefetch hidden datasets at all as we are not going to process these later. Filed as Illumos Bug #1346 PR: kern/157728 Tested by: Borja Marcos <borjam@sarenet.es>, mm Reviewed by: pjd Approved by: re (kib) MFC after: 1 week	2011-08-13 10:58:53 +00:00
Pawel Jakub Dawidek	7b1085ba55	Eliminate the zfsdev_state_lock entirely and replace it with the spa_namespace_lock. This fixes LOR between the spa_namespace_lock and spa_config lock. LOR can cause deadlock on vdevs removal/insertion. Reported by: gibbs, delphij Tested by: delphij Approved by: re (kib) MFC after: 1 week	2011-08-12 07:04:16 +00:00
Martin Matuska	d32cac295c	Fix panic in zfs_read() if IO_SYNC flag supplied by checking for zfsvfs->z_log before calling zil_commit(). [1] Do not call zfs_read() from zfs_getextattr() with the IO_SYNC flag. Submitted by: Alexander Zagrebin <alex@zagrebin.ru> [1] Reviewed by: pjd@ Approved by: re (kib) MFC after: 3 days	2011-08-02 11:28:33 +00:00
Martin Matuska	ad4887a72a	Fix integer overflow in txg_delay() by initializing the variable "timeout" as clock_t. Filed as Illumos Bug #1313 Reviewed by: avg Approved by: re (kib) MFC after: 3 days	2011-08-01 14:50:31 +00:00
Martin Matuska	4e1407c428	Fix serious bug in ZIL that can lead to pool corruption in the case of a held dataset during remount. Detailed description is available at: https://www.illumos.org/issues/883 illumos-gate revision: 13380:161b964a0e10 Reviewed by: pjd Approved by: re (kib) Obtained from: Illumos (Bug #883) MFC after: 3 days	2011-07-30 19:00:31 +00:00
Xin LI	101b7b5daa	Bring the code more in-line with OpenSolaris source to ease future port. Reviewed by: pjd, mm Approved by: re (kib)	2011-07-21 20:02:22 +00:00
Xin LI	b447d101fa	A different implementation of r224231 proposed by pjd@, which does not require change in the znode structure. Specifically, it queries rdev from the znode in the same sa_bulk_lookup already done in zfs_getattr(). Submitted by: pjd (with some revisions) Reviewed by: pjd, mm Approved by: re (kib)	2011-07-21 20:01:51 +00:00
Xin LI	b1ad061e42	Add a new field to in-core znode, z_rdev, to represent device nodes. PR: kern/159010 Reviewed by: mm@ Approved by: re (kib) MFC after: 2 weeks	2011-07-20 16:53:32 +00:00
Martin Matuska	1bc399c4b1	ZFS tries to allocate blocks evenly across all devices. This means when devices are imbalanced zfs will use lots of CPU searching for space on devices which tend to be pretty full. It should instead fail quickly on the full devices and move onto devices which have more availability. New loader tunable: vfs.zfs.mg_alloc_failures (min = 8) Illumos-gate changeset: 13379:4df42cc92254 Obtained from: Illumos (Bug #1051) MFC after: 2 weeks	2011-07-18 08:29:49 +00:00
Martin Matuska	3ded43e7b7	Resurrect the ZFS "aclmode" property Change default of "aclmode" to "discard". Illumos-gate changeset: 13370:8c04143bd318 Obtained from: Illumos (Feature #742) MFC after: 2 weeks	2011-07-18 07:16:44 +00:00
Martin Matuska	fbfed0cda6	Add a new "REFCOMPRESSRATIO" property. For snapshots, this is the same as COMPRESSRATIO, but for filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie, includes blocks in children, but not blocks shared with the origin). This is needed to figure out how much space a filesystem would use if it were not compressed (ignoring snapshots). Illumos-gate revision: 13387 Obtained from: Illumos (Feature #1092) MFC after: 2 weeks	2011-06-28 07:52:01 +00:00
Martin Matuska	85a418012f	Disable vdev cache (readahead) by default. The vdev cache is very underutilized (hit ratio 30%-70%) and may consume excessive memory on systems with many vdevs. Illumos-gate revision: 13346 Obtained from: Illumos (Bug #175) MFC after: 1 week	2011-06-28 06:32:35 +00:00
Ben Laurie	5f301949ef	Fix clang warnings. Approved by: philip (mentor)	2011-06-18 13:56:33 +00:00
Justin T. Gibbs	1c3bf59584	Remove C constructs that are incompatible with C++ from various OpenSolaris and ZFS header files. These changes are sufficient to allow a C++ program to use the libzfs library. Note: The majority of these files already included 'extern "C"' declarations, so the intention of providing C++ compatibility already existed even if it wasn't provided. cddl/compat/opensolaris/include/assert.h: Wrap our compatibility assert implementation in 'extern "C"'. Since this is a compatibility header I matched the Solaris style of doing this explicitly rather than rely on FreeBSD's __BEGIN/END_DECLS macro. sys/cddl/compat/opensolaris/sys/kstat.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h: Rename parameters in function declarations that conflict with C++ keywords. This was the solution preferred by members of the Illumos community. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h: In C, nested structures are visible in the global namespace, but in C++, they take on the namespace of the structure in which they are contained. Flatten nested structure definitions within struct zfs_cmd so these structures are visible in the global namespace when compiled in both languages. Sponsored by: Spectra Logic Corporation	2011-06-10 20:10:30 +00:00
Martin Matuska	baa256da8c	Silence notice on pool creation, import and access. Suggested by: Jeremy Chadwick (freebsd-stable@) Discussed with: pjd MFC after: 1 week	2011-06-07 20:46:31 +00:00
Pawel Jakub Dawidek	b5a060dd8b	Don't pass pointer to name buffer which is on the stack to another thread, because the stack might be paged out once the other thread tries to use the data. Instead, just allocate memory. MFC after: 2 weeks	2011-05-24 20:10:12 +00:00
Pawel Jakub Dawidek	541c60d988	Don't access task structure once we call task function. The task structure might no longer be available. This also eliminates the need for two tasks in the zio structure. Submitted by: anonymous MFC after: 2 weeks	2011-05-24 20:07:15 +00:00
Rick Macklem	965e561750	Fix the zfs file system so that it uses the lock flags argument added to VFS_FHTOVP() by r222167. Reviewed by: pjd	2011-05-22 21:04:32 +00:00
Rick Macklem	694a586a43	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib	2011-05-22 01:07:54 +00:00
Martin Matuska	a5c44f92bf	Restore old (v15) behaviour for a recursive snapshot destroy. (zfs destroy -r pool/dataset@snapshot) To destroy all descendent snapshots with the same name the top level snapshot was not required to exist. So if the top level snapshot does not exist, check permissions of the parent dataset instead. Filed as Illumos Bug #1043 Reviewed by: delphij Approved by: pjd MFC after: together with v28	2011-05-18 07:37:02 +00:00
Marius Strobl	edd870e447	Convert the last use of xcopyout() to ddi_copyout() and remove the now unused xcopyin() as well as xcopyout(). MFC together with r219089. Approved by: mm	2011-05-03 20:13:27 +00:00
Martin Matuska	29bf94b8d8	Fix deduplicated zfs receive (dmu_recv_stream builds incomplete guid_to_ds_map) Illumos-gate changeset: 13329:c48b8bf84ab7 MFC together with v28 Approved by: pjd Obtained from: Illumos (Bug #755)	2011-04-30 14:52:49 +00:00
Marcel Moolenaar	8d098dc0c4	Fix copy-paste bug.	2011-04-27 04:03:04 +00:00
Martin Matuska	8b2aa22d8f	Partially fix ZFS compat code for sparc64. Some endianness bugs still need to be resolved. Submitted by: marius (parts of the fix) MFC after: 1 month	2011-04-08 11:08:26 +00:00
Pawel Jakub Dawidek	65612637e8	Checking file access on size change is bogus. The checks are done earlier by VFS where we know if this is truncate(2) or ftruncate(2). If this is the latter we should depend on the mode the file was opened and not on the current permission. PR: standards/154873 Reported by: Mark Martinec <Mark.Martinec@ijs.si> Discussed with: Eric Schrock <eric.schrock@delphix.com> Discussed with: Mark Maybee <Mark.Maybee@Oracle.COM> MFC after: 1 month	2011-03-24 20:28:09 +00:00
Pawel Jakub Dawidek	d7d23301ae	Fix potential panic in dbuf_sync_list() related to spill blocks handling. Obtained from: IllumOS MFC after: 1 month	2011-03-14 11:07:12 +00:00
Pawel Jakub Dawidek	cae905e5d0	Correct readdir over ZFS handling. Reported by: Pierre Beyssac <pb@fasterix.frmug.org> MFC after: 1 month	2011-03-08 18:39:41 +00:00
Pawel Jakub Dawidek	a96e8e86f0	Fix libzpool build. MFC after: 1 month	2011-03-06 01:22:14 +00:00
Pawel Jakub Dawidek	2348f1110e	Make renaming of a ZVOL, ZVOL's parent directory and ZVOL snapshot work. Reported by: avg MFC after: 1 month	2011-03-05 22:31:03 +00:00
Pawel Jakub Dawidek	5bf0660559	Simplify zvol_remove_minors() a bit. MFC after: 1 month	2011-03-05 22:24:31 +00:00
Pawel Jakub Dawidek	10b9d77bf1	Finally... Import the latest open-source ZFS version - (SPA) 28. Few new things available from now on: - Data deduplication. - Triple parity RAIDZ (RAIDZ3). - zfs diff. - zpool split. - Snapshot holds. - zpool import -F. Allows to rewind corrupted pool to earlier transaction group. - Possibility to import pool in read-only mode. MFC after: 1 month	2011-02-27 19:41:40 +00:00
Konstantin Belousov	ca67168159	For UIO_NOCOPY case of reading request on zfs vnode, which has vm object attached, activate the page after the successful read, and free the page if read was unsuccessful. Freshly allocated page is not on any queue yet, and not activating (or deactivating) the page leaves it on no queue, excluding the page from pagedaemon scans and making the memory disappear until the vnode is reclaimed. Reviewed by: avg MFC after: 1 week	2011-02-11 10:46:15 +00:00
Edward Tomasz Napierala	dc7a965673	Make it impossible to clear the MNT_NFS4ACLS flag on ZFS filesystem by using "mount -uw". Reviewed by: pjd MFC after: 2 weeks	2011-02-06 23:34:09 +00:00
Andrey V. Elsukov	459d0e830d	vdev's sectorsize should not be greater than 8 Kbytes and also it should be power of 2. This prevents non-aligned access while probing vdev's labels. PR: kern/147852 Reviewed by: pjd MFC after: 1 week	2011-02-04 15:22:56 +00:00
Edward Tomasz Napierala	7a93bf9a69	Add MNT_NFS4ACLS to ZFS mount flags. It's not conditional, since there is no way to disable NFSv4 ACLs in ZFS. This should make it easier for the NFS server to figure out whether the exported filesystem supports ACLs or not. Reviewed by: pjd MFC after: 2 weeks	2011-01-19 17:11:52 +00:00
Matthew D Fleming	e704482d43	Re-commit the zfs sysctl(9) type-safety changes. Thanks to dim and pjd for the pointer to zfs_context.h for building userland.	2011-01-13 18:20:19 +00:00
Matthew D Fleming	374a993a88	Revert cddl changes for sysctl(9) until I understand why this isn't building on universe.	2011-01-12 23:06:38 +00:00
Matthew D Fleming	4a2ce5903f	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the zfs piece.	2011-01-12 19:53:30 +00:00
Martin Matuska	df06a59a77	MFp4 r186485, r186859: Fix a race by defining two tasks in the zio structure as we can still be returning from issue task when interrupt task is used. Tested by: pjd Approved by: pjd, delphij (mentor) MFC after: 3 days	2011-01-03 12:57:07 +00:00
Pawel Jakub Dawidek	8735863465	Remove redundant semicolon and empty line.	2010-12-11 13:35:25 +00:00
Ivan Voras	d7ccd95be8	Undo r216230: the interaction between saved ashift in metadata and detected ashift does not support this. With this change, pools created while stripesize=512 could not be imported when stripesize becomes larger (on the same drive). Noticed by: pjd	2010-12-07 15:24:08 +00:00
Ivan Voras	8b08562112	Use GEOM stripesize field when calculating ashift. This will enable correct alignment on drives with large sector sizes (e.g. 4 KiB) but the implementation might need to be revisited if devices with large stripesizes appear (e.g. if RAID controllers or flash drives start using the field), probably by introducing a physsectorsize field in GEOM providers. Discussed with: mav, mostly silence on freebsd-geom@ and freebsd-fs@	2010-12-06 12:18:02 +00:00
Andriy Gapon	c59690f249	zfs+sendfile: populate all requested pages, not just those already cached kern_sendfile() uses vm_rdwr() to read-ahead blocks of data to populate page cache. When sendfile stumbles upon a page that is not populated yet, it sends out all the mbufs that it collected so far. This resulted in very poor performance with ZFS when file data is not in the page cache, because ZFS vop_read for UIO_NOCOPY case populated only those pages that are already in cache, but not valid. Which means that most of the time it populated only the first requested page in the described above scenario. Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: Alexander Zagrebin <alexz@visp.ru>, Artemiev Igor <ai@kliksys.ru> MFC after: 12 days	2010-11-16 15:53:44 +00:00
Andriy Gapon	f9e2e99d5d	fix misspelling in a comment Reported by: Daniel Braniss <danny@cs.huji.ac.il> MFC after: 3 days	2010-11-16 12:30:47 +00:00
Martin Matuska	8db47aa15e	Disable VFS_HOLD placed on mnt_vnodecovered during the mount of a snapshot and VFS_RELE on a non-existing hold on snapshot parent's z_vfs. This disables the changes from OpenSolaris onnv-revision 9234:bffdc4fc05c4 (bug IDs: 6792139, 6794830) - not applicable to FreeBSD. This fixes the process hang if umounting a manually mounted snapshot. Reported by: Alexander Zagrebin <alexz@visp.ru> Approved by: delphij (mentor) MFC after: 1 week	2010-11-13 21:09:18 +00:00
Xin LI	b97a9057c2	Validate whether the zfs_cmd_t submitted from userland is not smaller than what we have. Without the check the kernel could access memory that does not belong to the request struct. Note that we do not test if the struct equals in size at this time, which may facilitate forward compatibility with newer binaries. Reviewed by: pjd at MeetBSD CA '2010 MFC after: 1 week	2010-11-05 22:18:09 +00:00
Martin Matuska	e25376bdd0	Bugfix merge from OpenSolaris: OpenSolaris onnv-revision: 10209:91f47f0e7728 6830541 zfs_get_data_trips on a verify 6696242 multiple zfs_fillpage() zfs: accessing past end of object panics 6785914 zfs fails to drop dn_struct_rwlock in recovery code path Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6830541, 6696242, 6785914) MFC after: 2 weeks	2010-10-26 15:48:03 +00:00
Andriy Gapon	23a1bcf8c6	zfs: add vop_getpages method implementation This should make vnode_pager_getpages path a bit shorter and clearer. Also this should eliminate problems with partially valid pages. Having this method opens room for future optimizations. To do: try to satisfy other pages besides the required one taking into account tradeoffs between number of page faults, read throughput and read latency. Also, eventually vop_putpages should be added too. Reviewed by: kib, mm, pjd MFC after: 3 weeks	2010-10-16 20:43:05 +00:00
Rui Paulo	6e634bb80f	In zfs_post_common(), use %d instead of %hhu. Found with: clang	2010-10-13 17:12:23 +00:00
Andriy Gapon	f6bb41924c	zfs + sendfile: do not produce partially valid pages for vnode's tail Since r212650 and before this change sendfile(2) could produce a partially valid page for a trailing portion of a ZFS vnode. vm_fault() always wants to see a fully valid page even if it's the last page that partially extends beyond vnode's end. Otherwise it calls vop_getpages() to bring in the page. In the case of ZFS this means that the data is read from the page into the same page and this breaks checks in ZFS mappedread() - a thread that set VPO_BUSY on the page in vm_fault() will get blocked forever waiting for it to be cleared. Many thanks to Kai and Jeremy for reproducing the issue and providing important debugging information and help. Reported by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Tested by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Reviewed by: kib MFC after: 3 days To-Do: apply the same treatment to tmpfs + sendfile	2010-10-12 17:04:21 +00:00
Pawel Jakub Dawidek	19ebc67beb	Provide internal ioflags() function that converts ioflag provided by FreeBSD's VFS to OpenSolaris-specific ioflag expected by ZFS. Use it for read and write operations. Reviewed by: mm MFC after: 1 week	2010-10-10 20:49:33 +00:00
Martin Matuska	a362d75576	Change FAPPEND to IO_APPEND as this is a ioflag and not a fflag. This corrects writing to append-only files on ZFS. PR: kern/149495 [1], kern/151082 [2] Submitted by: Daniel Zhelev <daniel@zhelev.biz> [1], Michael Naef <cal@linu.gs> [2] Approved by: delphij (mentor) MFC after: 1 week	2010-10-08 23:01:38 +00:00
Martin Matuska	aa007a9f0e	Properly handle IO with B_FAILFAST Retry IO once with ZIO_FLAG_TRYHARD before declaring a pool faulted OpenSolaris revision and Bug IDs: 9725:0bf7402e8022 6843014 ZFS B_FAILFAST handling is broken Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6843014) MFC after: 3 weeks	2010-09-27 09:42:31 +00:00
Martin Matuska	96a1a6a568	Enable offlining of log devices. OpenSolaris revision and Bug IDs: 9701:cc5b64682e64 6803605 should be able to offline log devices 6726045 vdev_deflate_ratio is not set when offlining a log device 6599442 zpool import has faults in the display Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6803605, 6726045, 6599442) MFC after: 3 weeks	2010-09-27 09:05:51 +00:00
Andriy Gapon	68653c3bd6	zfs_map_page/zfs_unmap_page: do not use sched_pin() and SFB_CPUPRIVATE zfs_map_page/zfs_unmap_page are mostly called around potential I/O paths and it seems to be a not very good idea to do cpu pinning there. Suggested by: kib MFC after: 2 weeks	2010-09-21 05:58:45 +00:00
Andriy Gapon	ff5e15a487	zfs_vnops: use zfs_map_page/zfs_unmap_page helper functions in another place MFC after: 2 weeks	2010-09-21 05:54:36 +00:00
Andriy Gapon	9d5eb9aa5d	zfs arc_reclaim_needed: fix typo in mismerge in r212780 PR: kern/146410, kern/138790 MFC after: 3 weeks X-MFC with: r212780	2010-09-17 07:34:50 +00:00
Andriy Gapon	921d3fd122	zfs+sendfile: advance uio_offset upon reading as well Picked from analogous code in tmpfs. MFC after: 1 week	2010-09-17 07:20:20 +00:00
Andriy Gapon	44532bc5cd	zfs arc_reclaim_needed: remove redundant checks for arc_c_max and arc_c_max Those checks are not present in upstream code and they are enforced in actual calculations of delta by which ARC size can be grown or should be reduced. MFC after: 3 weeks	2010-09-17 07:17:38 +00:00
Andriy Gapon	7c1353491f	zfs arc_reclaim_needed: more reasonable threshold for available pages vm_paging_target() is not a trigger of any kind for pagedaemon, but rather a "soft" target for it when it's already triggered. Thus, trying to keep 2048 pages above that level at the expense of ARC was simply driving ARC size into the ground even with normal memory loads. Instead, use a threshold at which a pagedaemon scan is triggered, so that ARC reclaiming helps with pagedaemon's task, but the latter still recycles active and inactive pages. PR: kern/146410, kern/138790 MFC after: 3 weeks	2010-09-17 07:14:07 +00:00
Martin Matuska	d1ee63f836	Fix kernel panic when moving a file to .zfs/shares Fix possible loss of correct error return code in ZFS mount OpenSolaris revisions and Bug IDs: 11824:53128e5db7cf 6863610 ZFS mount can lose correct error return 12079:13822b941977 6939941 problem with moving files in zfs (142901-12) Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6863610, 6939941) MFC after: 3 days	2010-09-15 19:55:26 +00:00
Andriy Gapon	8a3883cfb7	zfs vn_has_cached_data: take into account v_object->cache != NULL This mirrors code in tmpfs. This change shouldn't affect much read path, it may cause unnecessary vm_page_lookup calls in the case where v_object has no active or inactive pages but has some cache pages. I believe this situation to be non-essential. In write path this change should allow us to properly detect the above case and free a cache page when we write to a range that corresponds to it. If this situation is undetected then we could have a discrepancy between data in page cache and in ARC or on disk. This change allows us to re-enable vn_has_cached_data() check in zfs_write. NOTE: strictly speaking resident_page_count and cache fields of v_object should be examined under VM_OBJECT_LOCK, but for this particular usage we may get away with it. Discussed with: alc, kib Approved by: pjd Tested with: tools/regression/fsx MFC after: 3 weeks	2010-09-15 11:05:41 +00:00
Andriy Gapon	0b1ca38a69	zfs mappedread, update_pages: use int for offset and length within a page uint64_t, int64_t were redundant there Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks	2010-09-15 10:48:16 +00:00
Andriy Gapon	c002c3e8c2	zfs mappedread: use uiomove_fromphys where possible Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks	2010-09-15 10:44:20 +00:00
Andriy Gapon	fbbdb19dcd	zfs: catch up with vm_page_sleep_if_busy changes Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks	2010-09-15 10:39:21 +00:00
Andriy Gapon	21bd3e2576	tmpfs, zfs + sendfile: mark page bits as valid after populating it with data Otherwise, adding insult to injury, in addition to double-caching of data we would always copy the data into a vnode's vm object page from backend. This is specific to sendfile case only (VOP_READ with UIO_NOCOPY). PR: kern/141305 Reported by: Wiktor Niesiobedzki <bsd@vink.pl> Reviewed by: alc Tested by: tools/regression/sockets/sendfile MFC after: 2 weeks	2010-09-15 10:31:27 +00:00
Martin Matuska	9a13d2e1b3	Remove duplicated VFS_HOLD due to a mismerge. PR: kern/150544 Approved by: delphij (mentor) MFC after: 1 day	2010-09-14 12:12:18 +00:00
Martin Matuska	4eeef2e44a	Add missing vop_vector zfsctl_ops_shares Add missing locks around VOP_READDIR and VOP_GETATTR with z_shares_dir PR: kern/150544 Approved by: delphij (mentor) Obtained from: perforce (pjd) MFC after: 1 day	2010-09-14 10:27:32 +00:00
Pawel Jakub Dawidek	3c907063e9	Remove the page queues lock around vm_page_undirty() - it is no longer needed. Reviewed by: alc	2010-09-13 19:47:09 +00:00
Rui Paulo	47047e3418	Revamp locking a bit. This fixes three problems: * processes now can't go away while we are inserting probes (fixes a panic) * if a trap happens, we won't be holding the process lock (fixes a hang) * fix a LOR between the process lock and the fasttrap bucket list lock Thanks to kib for pointing some problems. Sponsored by: The FreeBSD Foundation	2010-09-12 14:12:16 +00:00
Rui Paulo	eae81e9501	Avoid a LOR (sleepable after non-sleepable) in fasttrap_tracepoint_enable(). Sponsored by: The FreeBSD Foundation	2010-09-11 12:58:31 +00:00
Matthew D Fleming	4d369413e1	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno than sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
Pawel Jakub Dawidek	86b19d1861	On FreeBSD we can log from pool that have multiple top-level vdevs or log vdevs, so don't deny adding new vdevs if bootfs property is set. MFC after: 2 weeks	2010-09-09 21:20:18 +00:00
Rui Paulo	d3555b6fc2	Fix two bugs in DTrace: * when the process exits, remove the associated USDT probes * when the process forks, duplicate the USDT probes. Sponsored by: The FreeBSD Foundation	2010-09-09 09:58:05 +00:00
Justin T. Gibbs	f03f7a0ca3	Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic. Add the BIO_ORDERED flag for struct bio and update bio clients to use it. The barrier semantics of bioq_insert_tail() were broken in two ways: o In bioq_disksort(), an added bio could be inserted at the head of the queue, even when a barrier was present, if the sort key for the new entry was less than that of the last queued barrier bio. o The last_offset used to generate the sort key for newly queued bios did not stay at the position of the barrier until either the barrier was de-queued, or a new barrier (which updates last_offset) was queued. When a barrier is in effect, we know that the disk will pass through the barrier position just before the "blocked bios" are released, so using the barrier's offset for last_offset is the optimal choice. sys/geom/sched/subr_disk.c: sys/kern/subr_disk.c: o Update last_offset in bioq_insert_tail(). o Only update last_offset in bioq_remove() if the removed bio is at the head of the queue (typically due to a call via bioq_takefirst()) and no barrier is active. o In bioq_disksort(), if we have a barrier (insert_point is non-NULL), set prev to the barrier and cur to it's next element. Now that last_offset is kept at the barrier position, this change isn't strictly necessary, but since we have to take a decision branch anyway, it does avoid one, no-op, loop iteration in the while loop that immediately follows. o In bioq_disksort(), bypass the normal sort for bios with the BIO_ORDERED attribute and instead insert them into the queue with bioq_insert_tail(). bioq_insert_tail() not only gives the desired command order during insertion, but also provides barrier semantics so that commands disksorted in the future cannot pass the just enqueued transaction. sys/sys/bio.h: Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio. 
sys/cam/ata/ata_da.c: sys/cam/scsi/scsi_da.c Use an ordered command for SCSI/ATA-NCQ commands issued in response to bios with the BIO_ORDERED flag set. sys/cam/scsi/scsi_da.c Use an ordered tag when issuing a synchronize cache command. Wrap some lines to 80 columns. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c sys/geom/geom_io.c Mark bios with the BIO_FLUSH command as BIO_ORDERED. Sponsored by: Spectra Logic Corporation MFC after: 1 month	2010-09-02 19:40:28 +00:00
Jaakko Heinonen	de478dd4b4	execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for an user with PRIV_VFS_EXEC privilege. Add the check also to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) to better agree with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems which are not using vaccess*() functions and need to be handled separately. PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)	2010-08-30 16:30:18 +00:00
Pawel Jakub Dawidek	b8a4becc2d	Return NULL pointer instead of B_FALSE as it is done in the vendor code. Obtained from: //depot/user/pjd/zfs/...	2010-08-28 19:29:06 +00:00
Pawel Jakub Dawidek	3e9e888541	Move ZUT_OBJS in the same place that is used in vendor code. Obtained from: //depot/user/pjd/zfs/...	2010-08-28 19:28:12 +00:00
Martin Matuska	8d87b396f8	Import changes from OpenSolaris that provide - better ACL caching and speedup of ACL permission checks - faster handling of stat() - lowered mutex contention in the read/writer lock (rrwlock) - several related bugfixes Detailed information (OpenSolaris onnv changesets and Bug IDs): 9749:105f407a2680 6802734 Support for Access Based Enumeration (not used on FreeBSD) 6844861 inconsistent xattr readdir behavior with too-small buffer 9866:ddc5f1d8eb4e 6848431 zfs with rstchown=0 or file_chown_self privilege allows user to "take" ownership 9981:b4907297e740 6775100 stat() performance on files on zfs should be improved 6827779 rrwlock is overly protective of its counters 10143:d2d432dfe597 6857433 memory leaks found at: zfs_acl_alloc/zfs_acl_node_alloc 6860318 truncate() on zfsroot succeeds when file has a component of its path set without access permission 10232:f37b85f7e03e 6865875 zfs sometimes incorrectly giving search access to a dir 10250:b179ceb34b62 6867395 zpool_upgrade_007_pos testcase panic'd with BAD TRAP: type=e (#pf Page fault) 10269:2788675568fd 6868276 zfs_rezget() can be hazardous when znode has a cached ACL 10295:f7a18a1e9610 6870564 panic in zfs_getsecattr Approved by: delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 2 weeks	2010-08-28 09:24:11 +00:00

1002 Commits