freebsd-nq

Author	SHA1	Message	Date
Andriy Gapon	0908b20b7e	Revert r269093 which introduced physical zio alignment transform Size of physical ZIOs must never be implicitly adjusted, it's a responsibility of a caller to make sure that such a ZIO has proper offset and size. Discussed with: delphij, gibbs MFC after: 2 weeks	2014-11-17 14:16:02 +00:00
Steven Hartland	a559adfbce	Disable TRIM on file backed ZFS vdevs and fix TRIM on init After r265152 TRIM requests are ZIO_TYPE_FREE instead of ZIO_TYPE_IOCTL; this meant file backed vdevs attempted to process the ZIO as a write, causing a panic. We now disable TRIM on file backed vdevs and ASSERT the ZIO types supported by each vdev type to ensure we explicitly support the ZIO type being processed. Also ensure that TRIM on init is not processed for devices which declare they didn't support TRIM via vdev_notrim. PR: 195061, 194976, 191573 Sponsored by: Multiplay	2014-11-17 11:32:10 +00:00
Konstantin Belousov	6e646651d3	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
Xin LI	8bcd603968	MFV r274273: ZFS large block support. Please note that booting from datasets that have recordsize greater than 128KB is not supported (but it's Okay to enable the feature on the pool). This may remain unchanged because of memory constraint. Limited safety belt is provided for mounted root filesystem but caution is advised. Illumos issue: 5027 zfs large block support MFC after: 1 month	2014-11-10 08:20:21 +00:00
Xin LI	42350b6bde	MFV r274272 and diff reduction with upstream. Illumos issue: 5244 zio pipeline callers should explicitly invoke next stage Tested with: ztest plus ZFS over GELI configuration MFC after: 1 month	2014-11-09 07:37:00 +00:00
Xin LI	81f1255e58	MFV r274271: Improve zdb -b performance: - Reduce gethrtime() call to 1/100th of blkptr's; - Skip manipulating the size-ordered tree; - Issue more (10, previously 3) async reads; - Use lighter weight testing in traverse_visitbp(); Illumos issue: 5243 zdb -b could be much faster MFC after: 2 weeks	2014-11-08 07:30:40 +00:00
Andriy Gapon	2fd3cc0cb2	fix l2arc compression buffers leak We have observed that arc_release() can be called concurrently with a l2arc in-flight write. Also, we have observed that arc_hdr_destroy() can be called from arc_write_done() for a zio with ZIO_FLAG_IO_REWRITE flag in similar circumstances. Previously the l2arc headers would be freed while leaking their associated compression buffers. Now the buffers are placed on l2arc_free_on_write list for delayed freeing. This is similar to what was already done to arc buffers that were supposed to be freed concurrently with in-flight writes of those buffers. In addition to fixing the discovered leaks this change also adds some protective code to assert that a compression buffer associated with a l2arc header is never leaked. A new kstat l2_cdata_free_on_write is added. It keeps a count of delayed compression buffer frees which previously would have been leaks. Tested by: Vitalij Satanivskij <satan@ukr.net> et al Requested by: many MFC after: 2 weeks Sponsored by: HybridCluster / ClusterHQ	2014-11-06 11:08:02 +00:00
Alexander Motin	c3e7ba3e6d	Add to CTL support for logical block provisioning threshold notifications. For ZVOL-backed LUNs this allows to inform initiators if storage's used or available spaces get above/below the configured thresholds. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-11-06 00:48:36 +00:00
Josh Paetzel	14127f5b21	This change addresses 4 bugs in ZFS exposed by Richard Kojedzinszky's crash.sh script attached to FreeNAS bug 4109: https://bugs.freenas.org/issues/4109 Three are in the snapshot layer: a) AVG explains in his notes: https://wiki.freebsd.org/AvgVfsSolarisVsFreeBSD "VOP_INACTIVE must not do any destructive actions to a vnode and its filesystem node, nor invalidate them in any way." gfs_vop_inactive and zfsctl_snapshot_inactive did just that. In OpenSolaris VOP_INACTIVE is much closer to FreeBSD's VOP_RECLAIM. Rename & move them to gfs_vop_reclaim and zfsctl_snapshot_reclaim and merge in the requisite vnode_destroy from zfsctl_common_reclaim. b) gfs_lookup_dot and various zfsctl functions do not honor the FreeBSD VFS convention of only locking from the root downward. When looking up ".." the convention is to drop the current leaf vnode lock before acquiring the directory vnode and then subsequently re-acquiring the lock on the leaf vnode. This fixes that in all the places that are exercised by crash.sh. c) The snapshot may already be unmounted when the directory vnode is reclaimed. Check for this case and return. One in the common layer: d) Callers of traverse expect the reference to the vnode passed in to be maintained. Don't release it. This last one may be an unclear contract. There may in fact be some callers that do expect the reference to be dropped on success in addition to callers that expect it to be released. In this case a further audit of the callers is needed and a consensus on the correct behavior. PR: 184677 Submitted by: kmacy Reviewed by: delphij, will, avg MFC after: 2 weeks Sponsored by: iXsystems	2014-10-25 17:42:44 +00:00
Justin Hibbits	3ff2096995	Whitespace X-MFC-with: r273570 MFC after: 1 week	2014-10-24 03:34:21 +00:00
Justin Hibbits	24d5dfb116	Three updates to PowerPC FBT: * Use a constant to define the number of stack frames in a probe exception. * Only allow function symbols in powerpc64 ('.' prefixed) * Set the fbtp_roffset for return probes, so the correct dtrace_probe call is made. MFC after: 1 week	2014-10-24 03:33:01 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Xin LI	78701de4b7	Add tunable vfs.zfs.space_map_blksz for space map's maximum block size. MFC after: 2 weeks	2014-10-18 22:11:10 +00:00
Davide Italiano	2be111bf7d	Follow up to r225617. In order to maximize the re-usability of kernel code in userland, rename in-kernel getenv()/setenv() to kern_getenv()/kern_setenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
Steven Hartland	ca6505b818	Prevent ZFS leaking pool free space When processing async destroys ZFS would leak space every txg timeout (5 seconds by default), if no writes occurred, until the pool is totally full. At this point it would be unfixable without a pool recreation. In addition, if the machine was rebooted with the pool in this situation, the pool would fail to import on boot, hanging indefinitely, as the import process requires the ability to write data to the pool. Any attempts to query the pool status during the hung import would not return as the import holds the pool lock. The only way to import such a pool would be to specify -o readonly=on to the zpool import. zdb -bb <pool> can be used to check for "deferred free" size which is where this lost space will be counted. MFC after: 3 days Sponsored by: Multiplay	2014-10-16 02:23:27 +00:00
Xin LI	ba6e85e0cf	Use write_psize instead of write_asize when doing vdev_space_update. Without this change the accounting of L2ARC usage would be wrong and give 16EB free space because the number became negative and overflows. Obtained from: FreeNAS (issue #6239) MFC after: 2 weeks	2014-10-13 20:39:51 +00:00
Xin LI	a4f5b8db9f	Add a tunable for arc_shrink_shift (vfs.zfs.arc_shrink_shift) that controls what fraction, 1/2^arc_shrink_shift, should be reclaimed when there is memory pressure. Submitted by: Richard Kojedzinszky <krichy at tvnetwork.hu> MFC after: 2 weeks	2014-10-13 05:34:10 +00:00
Xin LI	eba15cf463	MFV r272804: Refactor the code and stop restore_object from creating two transactions. Illumos issue: 3693 restore_object uses at least two transactions to restore an object MFC after: 2 weeks	2014-10-09 07:52:51 +00:00
Xin LI	ce44f14b41	MFV r272803: Illumos issue: 5175 implement dmu_read_uio_dbuf() to improve cached read performance MFC after: 2 weeks	2014-10-09 07:18:40 +00:00
Andriy Gapon	c3d1d2e104	l2arc_write_buffers: reduce headroom value FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS data lists (currently both are 16) while illumos has just a single list of each kind. headroom determines how much data is scanned on a single list during each run of the l2arc feed thread. Because FreeBSD has more lists we proportionally decrease the limit. Reviewed by: Brendan Gregg (earlier version) MFC after: 2 weeks Sponsored by: HybridCluster	2014-10-07 16:08:21 +00:00
Andriy Gapon	9f96723ec5	revert r272702: wrong (earlier) change was committed	2014-10-07 16:06:10 +00:00
Andriy Gapon	4c3b02bfce	reduce L2ARC_WRITE_SIZE on FreeBSD FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS data lists (currently both are 16) while illumos has just a single list of each kind. L2ARC_WRITE_SIZE determines the default value of l2arc_write_max which defines limits on how much data is scanned and written to a cache device during each run of the l2arc feed thread. The limits are applied on the per buffer list basis. Because FreeBSD has more lists we proportionally reduce the limits. Reviewed by: Brendan Gregg (earlier version) MFC after: 2 weeks Sponsored by: HybridCluster	2014-10-07 14:30:24 +00:00
Andriy Gapon	ab26525af2	make userland __assfail from opensolaris compat honor 'aok' variable This should allow zdb -A option to actually make difference. MFC after: 2 weeks	2014-10-07 14:15:50 +00:00
Xin LI	1b5bcb8425	MFV r272591: Use loaned ARC buffer for zfs receive to avoid copy. Illumos issue: 5162 zfs recv should use loaned arc buffer to avoid copy MFC after: 2 weeks	2014-10-06 07:29:17 +00:00
Xin LI	8fb26f5aef	MFV r272585: Split the godfather zio into CPU number's to reduce lock contention. Illumos issue: 5176 lock contention on godfather zio MFC after: 2 weeks	2014-10-06 07:03:17 +00:00
Xin LI	dcb20006f0	MFV r272501: Illumos issue: 5177 remove dead code from dsl_scan.c MFC after: 2 weeks	2014-10-06 05:46:51 +00:00
Xin LI	00769ce74d	MFV r272500: Don't inherit flags other than DS_FLAG_CI_DATASET and DS_FLAG_INCONSISTENT when cloning. This prevents DS_FLAG_DEFER_DESTROY being inherited from a clone that is marked for deferred destroy, which causes snapshots of the clone being destroyed when getting a hold or clone. Illumos issue: 5150 zfs clone of a defer_destroy snapshot causes strangeness MFC after: 1 week	2014-10-06 05:42:20 +00:00
Xin LI	4bb264ae15	Don't make nested definition for range_seg_cache. Reported by: ian MFC after: 1 week X-MFC-With: r272506	2014-10-04 15:42:52 +00:00
Xin LI	4750c382a9	MFV r272499: Illumos issue: 5174 add sdt probe for blocked read in dbuf_read() MFC after: 2 weeks	2014-10-04 08:55:08 +00:00
Xin LI	eb0b70068c	Add a new sysctl, vfs.zfs.vol.unmap_enabled, which allows the system administrator to toggle whether ZFS should ignore UNMAP requests. Illumos issue: 5149 zvols need a way to ignore DKIOCFREE MFC after: 2 weeks	2014-10-04 08:51:57 +00:00
Xin LI	2d36d67c72	Diff reduction with upstream. The code change is not really applicable to FreeBSD. Illumos issue: 5148 zvol's DKIOCFREE holds zfsdev_state_lock too long MFC after: 1 month	2014-10-04 08:41:23 +00:00
Xin LI	523b4c7fdf	MFV r272496: Add tunable for number of metaslabs per vdev (vfs.zfs.vdev.metaslabs_per_vdev). The default remains at 200. Illumos issue: 5161 add tunable for number of metaslabs per vdev MFC after: 2 weeks	2014-10-04 08:29:48 +00:00
Xin LI	a8d7512709	MFV r272495: In arc_kmem_reap_now(), reap range_seg_cache too to reclaim memory in response of memory pressure. Illumos issue: 5163 arc should reap range_seg_cache MFC after: 1 week	2014-10-04 08:14:10 +00:00
Xin LI	8c20e2ff11	MFV r272494: Make space_map_truncate() always do space_map_reallocate(). Without this, setting space_map_max_blksz would cause panic for existing pool, as dmu_objset_set_blocksize would fail if the object have multiple blocks. Illumos issues: 5164 space_map_max_blksz causes panic, does not work 5165 zdb fails assertion when run on pool with recently-enabled spacemap_histogram feature MFC after: 2 weeks	2014-10-04 08:05:39 +00:00
Steven Hartland	14a0d74ea8	Refactor ZFS ARC reclaim checks and limits Remove previously added kmem methods in favour of defines which allow diff minimisation between upstream code base. Rebalance ARC free target to be vm_pageout_wakeup_thresh by default which eliminates issue where ARC gets minimised instead of balancing with VM pageout. This restores the target point prior to r270759. Bring in missing upstream only changes which move unused code to further eliminate code differences. Add additional DTRACE probe to aid monitoring of ARC behaviour. Enable upstream i386 code paths on platforms which don't define UMA_MD_SMALL_ALLOC. Fix mixture of byte and page values in arc_memory_throttle i386 code path value assignment of available_memory. PR: 187594 Review: D702 Reviewed by: avg MFC after: 1 week X-MFC-With: r270759 & r270861 Sponsored by: Multiplay	2014-10-03 20:34:55 +00:00
Steven Hartland	99140218aa	Fix various issues with zvols When performing snapshot renames we could deadlock due to the locking in zvol_rename_minors. In order to avoid this use the same workaround as zvol_open in zvol_rename_minors. Add missing zvol_rename_minors to dsl_dataset_promote_sync. Protect against invalid index into zv_name in zvol_remove_minors. Replace zvol_remove_minor calls with zvol_remove_minors to ensure any potential children are also renamed. Don't fail zvol_create_minors if zvol_create_minor returns EEXIST. Restore the valid pool check in zfs_ioc_destroy_snaps to ensure we don't call zvol_remove_minors when zfs_unmount_snap fails. PR: 193803 MFC after: 1 week Sponsored by: Multiplay	2014-10-03 14:49:48 +00:00
Marcelo Araujo	d8a5961f88	Fix failures and warnings reported by newpynfs20090424 test tool. This fix addresses only issues with the pynfs reports, none of these issues are known to create problems for extant real clients. Submitted by: Bart Hsiao <bart.hsiao@gmail.com> Reworked by: myself Reviewed by: rmacklem Approved by: rmacklem Sponsored by: QNAP Systems Inc.	2014-10-03 02:24:41 +00:00
Xin LI	43ac3722ac	Diff reduction with kernel code: instruct the compiler that the data of these types may be unaligned to their "normal" alignment and exercise caution when accessing them. PR: 194071 MFC after: 3 days	2014-10-02 00:13:08 +00:00
Will Andrews	fbce0221eb	zfsvfs_create(): Refuse to mount datasets whose names are too long. This is checked for in the zfs_snapshot_004_neg STF/ATF test (currently still in projects/zfsd rather than head). sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c: - zfsvfs_create(): Check whether the objset name fits into statfs.f_mntfromname, and return ENAMETOOLONG if not. Although the filesystem can be unmounted via the umount(8) command, any interface that relies on iterating on statfs (e.g. libzfs) will fail to find the filesystem by its objset name, and thus assume it's not mounted. This causes "zfs unmount", "zfs destroy", etc. to fail on these filesystems, whether or not -f is passed. MFC after: 1 month Sponsored by: Spectra Logic MFSpectraBSD: 974872 on 2013/08/09	2014-10-01 14:12:02 +00:00
Xin LI	0b66c7c514	Fix a mismerge in r260183 which prevents snapshot zvol devices being removed and re-instate the fix in r242862. Reported by: Leon Dang <ldang nahannisys com>, smh MFC after: 3 days	2014-09-30 18:50:45 +00:00
Steven Hartland	8caa3daf35	Remove sys/types.h include as per style(9) SDT requires sys/param.h due to use of NULL Reported by: Garrett Sponsored by: Multiplay	2014-09-18 20:38:18 +00:00
Steven Hartland	71f3caaf31	Add dtrace probe support for zfs SET_ERROR(..) MFC after: 1 week Sponsored by: Multiplay	2014-09-18 20:00:36 +00:00
Will Andrews	91dda985cc	Remove debug.zfs_flags in favor of the new vfs.zfs.debug_flags. Replace TUNABLE_INT with CTLFLAG_RWTUN. Submitted by: avg (debug.zfs_flags removal), smh (TUNABLE_INT replacement)	2014-09-18 18:46:38 +00:00
Will Andrews	f8c2f66a6c	Enable ZFS debug flags to be modified via vfs.zfs.debug_flags. This is primarily only of interest to ZFS developers, but it makes it easier to get additional debugging. Submitted by: gibbs MFC after: 1 month Sponsored by: Spectra Logic MFSpectraBSD: 517074 on 2011/12/15 (by will), 662343 on 2013/03/20 (by gibbs)	2014-09-18 16:55:41 +00:00
Will Andrews	cf0a1157d7	Reorder sysctls for spa.c global tunables; add sysctl for ccw_retry_interval. MFC after: 1 month Sponsored by: Spectra Logic	2014-09-18 16:38:03 +00:00
Will Andrews	cf7a096e72	bpobj_iterate_impl(): Close a refcount leak iterating on a sublist. If bpobj_space() returned non-zero here, the sublist would have been left open, along with the bonus buffer hold it requires. This call does not invoke any calls to bpobj_close() itself. This bug doesn't have any known vector, but was found on inspection. MFC after: 1 week Sponsored by: Spectra Logic Affects: All ZFS versions starting 21 May 2010 (illumos cde58dbc) MFSpectraBSD: r1050998 on 2014/03/26	2014-09-18 15:37:53 +00:00
Steven Hartland	d1d469e22b	Remove unused ZFS ARC functions * arc_data_buf_alloc * arc_data_buf_free MFC after: 1 week Sponsored by: Multiplay	2014-09-18 10:46:51 +00:00
Justin Hibbits	e40a5cd3ec	Fix the stack tracing for dtrace/powerpc. Summary: Fix the stack tracing for dtrace/powerpc by using the trapexit/asttrapexit return address sentinels instead of checking within the kernel address space. As part of this, I had to add new inline functions. FBT traces the kernel, so we have to have special case handling for this, since a trap will create a full new trap frame, and there's no way to pass around the 'real' stack. I handle this by special-casing 'aframes == 0' with the trap frame. If aframes counts out to the trap frame, then assume we're looking for the full kernel trap frame, so switch to the real stack pointer. Test Plan: Tested on powerpc64 Reviewers: rpaulo, markj, nwhitehorn Reviewed By: markj, nwhitehorn Differential Revision: https://reviews.freebsd.org/D788 MFC after: 3 weeks Relnotes: Yes	2014-09-17 02:43:47 +00:00
Steven Hartland	a889b18c52	Added missing ZFS sysctls * vfs.zfs.vdev.async_write_active_min_dirty_percent * vfs.zfs.vdev.async_write_active_max_dirty_percent Added validation of min / max for ZFS sysctl * vfs.zfs.dirty_data_max_percent MFC after: 3 days	2014-09-14 12:23:00 +00:00
Xin LI	f9290bc2c9	MFV r271518: Correctly report hole at end of file. When asked to find a hole, the DMU sees that there are no holes in the object, and returns ESRCH. The ZPL interprets this as "no holes before the end of the file", and therefore inserts the "virtual hole" at the end of the file. Because DMU and ZPL have different ideas of where the end of an object/file is, we will end up returning the end of file, which is generally larger, instead of returning the end of object. The fix is to handle the "virtual hole" in the DMU. If no hole is found, the DMU will return a hole at the end of the file, rather than an error. Illumos issue: 5139 SEEK_HOLE failed to report a hole at end of file MFC after: 1 week	2014-09-13 17:48:44 +00:00
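
The SEEK_HOLE behaviour described in the last entry above (f9290bc2c9) can be observed from userland. The following is a minimal sketch, not taken from the commit: it assumes a platform where Python exposes os.SEEK_HOLE (FreeBSD and Linux do), and the helper names first_hole and make_dense_file are hypothetical. For a file with no holes, seeking for a hole from offset 0 should land on the "virtual hole" at the end of the file, so the returned offset is expected to equal the file size.

```python
import os
import tempfile


def make_dense_file(size):
    """Create a temporary file of `size` bytes with no holes (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path


def first_hole(path):
    """Return the offset of the first hole at or after offset 0 (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # For a file without holes, lseek(SEEK_HOLE) reports the
        # virtual hole at the end of the file instead of failing.
        return os.lseek(fd, 0, os.SEEK_HOLE)
    finally:
        os.close(fd)


path = make_dense_file(4096)
try:
    hole = first_hole(path)
    size = os.path.getsize(path)
    print("first hole at", hole, "file size", size)
finally:
    os.unlink(path)
```

Running the same check against a file on a ZFS dataset before and after this change would be one way to compare the reported offset with the file size, though the sketch itself makes no claim about any particular ZFS version.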


1144 Commits