freebsd-skq

Author	SHA1	Message	Date
smh	4c023d7eaf	MFC r282205: Fix misuse of input argument in traverse_visitbp Obtained from: zfsonlinux (a585f2f844ed3d4270221fed88f5e494eb55d932 Sponsored by: Multiplay	2015-05-12 09:25:16 +00:00
avg	fa15cb3cc7	MFC r282131: replace a comment about zfs recv -F corner case with a longer one	2015-05-11 09:43:03 +00:00
avg	af4d550915	MFC r282130: zfs_onexit_fd_hold: return EBADF even if devfs_get_cdevpriv gave ENOENT	2015-05-11 08:46:03 +00:00
avg	45e0efb4f5	MFC r282127: dsl_dir_rename_check: return EXDEV on cross-pool rename attempt	2015-05-11 08:43:20 +00:00
avg	75d7ef7863	MFC r282126: FV r282123: 5610 zfs clone from different source and target pools	2015-05-11 08:40:55 +00:00
avg	0b01455958	MFC r282125: MFV r282124: 5393 spurious failures from dsl_dataset_hold_obj()	2015-05-11 08:36:58 +00:00
avg	87689acc05	MFC r282122: nvpair_type_is_array: DATA_TYPE_INT8_ARRAY was not recognized	2015-05-11 08:33:49 +00:00
avg	a8e9a4b88b	MFC r275576: remove opensolaris cyclic code, replace with high-precision callouts	2015-05-11 07:54:39 +00:00
mav	bd39d936df	MFC r281026, r281108, r281109: Make ZFS ARC track both KVA usage and fragmentation. Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage reaches certain threshold (3/4 on i386 or 16/17 otherwise). FreeBSD has even less KVA, but had no such limit on archs with direct map as amd64. As result, on machines with a lot of RAM, during load with very small user- space memory pressure, such as `zfs send`, it was possible to reach state, when there is enough both physical RAM and KVA (I've seen up to 25-30%), but no continuous KVA range to allocate even single 128KB I/O request. Address this situation from two sides: - restore KVA usage limitations in a way the most close to Illumos; - introduce new requirement for KVA fragmentation, specifying that we should have at least one sequential KVA range of zfs_max_recordsize bytes. Experiments show that first limitation done alone is not sufficient. On machine with 64GB of RAM it is sometimes needed to drop up to half of ARC size to get at leats one 1MB KVA chunk. Statically limiting ARC to half of KVA/RAM is too strict, so second limitation makes it to work in cycles: accumulate trash up to certain critical mass, do massive spring-cleaning, and then start littering again.	2015-05-03 07:13:14 +00:00
delphij	44560bae19	MFC r281667: Remove vfs.zfs.snapshot_list_prefetch, the corresponding code was gone in r248571 already.	2015-04-25 00:36:43 +00:00
markj	b4c145b73c	MFC r280834: Bound the number of frames traversed when executing the ustackdepth action.	2015-04-13 01:42:24 +00:00
mav	2aa6497cbb	MFC r280822: Some cosmetic polishing. No functional change.	2015-04-05 06:53:29 +00:00
mav	bfc574567f	MFC r279927: Make DIOCGATTR in device mode handle "GEOM::candelete".	2015-03-27 09:28:30 +00:00
rwatson	6102a34d38	Merge r263233 from HEAD to stable/10: Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. Sponsored by: Google, Inc.	2015-03-19 13:37:36 +00:00
mav	683ce518a8	MFC r277419: Allow skipping dmu_buf_will_dirty() call in dsl_dir_transfer_space(). dsl_dir_transfer_space() is mostly called after dsl_dir_diduse_space(), which already calls dmu_buf_will_dirty() for the same dbuf and tx, so its duplicate call in those cases will change nothing, only spend time. Skipping this call by four times reduces time spent in dbuf_write_done() and descendants, updating dataset statistics with several congested lock acquisitions. When rewriting 8K zvol blocks at 1GB/s rate, this reduces CPU time spent inside dbuf_write_done(), according to profiling, from 45% of 683K samples to 18% of 422K.	2015-02-03 08:06:13 +00:00
smh	ee75a2f04f	MFC r276123: Always sync the global ZFS config cache to reflect the new mosconfig MFC r277351: Clean ZFS spa config before syncing Sponsored by: Multiplay	2015-02-01 12:39:40 +00:00
mav	4a44a3d569	MFC r277185: Fix overflow bug from r248577, turning 30s TRIM timeout into ~4s.	2015-01-28 02:56:18 +00:00
mav	33230ae6d4	MFC r277169: Reimplement TRIM throttling added in r248577. Previous throttling implementation approached problem from the wrong side. It significantly limited useful delaying of TRIM requests and aggregation potential, while not so much controlled TRIM burstiness under heavy load. With this change random 4K write benchmarks (probably the worst case for TRIM) show me IOPS increase by 20%, average latency reduction by 30%, peak TRIM bursts reduction by 3 times and same peak TRIM map size (memory usage). Also the new logic does not force map size down so heavily, really allowing to keep deleted data for 32 TXG or 30 seconds under moderate load. It was practically impossible with old throttling logic, which pushed map down to only 64 segments.	2015-01-28 02:55:20 +00:00
mav	58f009f0d3	MFC r277096: Skip extra bcopy() when scrubbing vdev without redundancy. According to profiler, this bcopy() can use about 10% of CPU time.	2015-01-26 16:29:07 +00:00
mav	e6f27f6344	MFC r276983: When aggregating TRIM segments, move the new one to the end. New segment at the list head may block all TRIM requests until txg of that segment can be processed. On my random I/O tests this change reduce peak TRIM list length from 650 to 450 segments. Hopefully it should reduce TRIM burstiness when list processing is unblocked.	2015-01-25 14:31:44 +00:00
mav	35b0606440	MFC r276952: Add LBA as secondary sort key for synchronous I/O requests. On FreeBSD gethrtime() implemented via getnanouptime(), that has 1ms (1/hz) precision. It makes primary sort key (timestamp) collision very possible. In such situations sorting by secondary key of LBA is much more reasonable then by totally meaningless zio pointer value. With this change on multi-threaded synchronous ZVOL read I've measured 10% throughput increase and average latency reduction.	2015-01-25 14:29:40 +00:00
mav	a833c07b8b	MFC r276913: Use new optimized dmu_read_uio_dbuf() for ZVOLs in device mode. This slightly reduces overhead by avoiding dnode_hold()/dnode_rele() calls.	2015-01-25 14:25:44 +00:00
delphij	ae11365e35	MFC r275923: Add missing continue: we can't proceed further if the kernel does not panic with zfs_panic_recover. Illumos issue: 5438 zfs_blkptr_verify should continue after zfs_panic_recover Reported by: Coverity CID: 1232014	2015-01-23 22:46:07 +00:00
delphij	902f541eb5	MFC r275922: MFV r275914: As of r270383, the dbuf_compare comparator compares the dbuf attributes in the following order: db_level (indirect level) db_blkid (block number) db_state (current state) the address of the element Because db_state is being considered before the element's state, changing of db_state would affect balancedness of the AVL tree, even when the address of element compares differently. For instance, in dbuf_create, db_state may be altered after the node is inserted into the AVL tree and may break AVL tree balancedness. Instead of using db_state as a comparision critera (introduced in r270383), consider it only when we are doing a lookup, that is one of the two dbuf pointers contains DB_SEARCH. Illumos issue: 5422 preserve AVL invariants in dn_dbufs	2015-01-23 18:39:26 +00:00
delphij	b84208b123	MFC r275811: MFV r275783: Convert ARC flags to use enum. Previously, public flags are defined in arc.h and private flags are defined in arc.c which can lead to confusion and programming errors. Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf' or 'ab' because arc_buf_t are often named 'buf' as well. Illumos issue: 5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme	2015-01-23 18:33:50 +00:00
delphij	9ef0e2ecd5	MFC r275782: MFV r275551: Remove "dbuf phys" db->db_data pointer aliases. Use function accessors that cast db->db_data to the appropriate "phys" type, removing the need for clients of the dmu buf user API to keep properly typed pointer aliases to db->db_data in order to conveniently access their data. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: In zap_leaf() and zap_leaf_byteswap, now that the pointer alias field l_phys has been removed, use the db_data field in an on stack dmu_buf_t to point to the leaf's phys data. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: Remove the db_user_data_ptr_ptr field from dbuf and all logic to maintain it. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: Modify the DMU buf user API to remove the ability to specify a db_data aliasing pointer (db_user_data_ptr_ptr). cddl/contrib/opensolaris/cmd/zdb/zdb.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Create and use the new "phys data" accessor functions dsl_dir_phys(), dsl_dataset_phys(), zap_m_phys(), zap_f_phys(), and zap_leaf_phys(). sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Remove now unused "phys pointer" aliases to db->db_data from clients of the DMU buf user API. Illumos issue: 5314 Remove "dbuf phys" db->db_data pointer aliases in ZFS	2015-01-23 18:30:32 +00:00
delphij	cab26f42cc	MFC r275781: MFV r275550: In addition to r273158, make the code in spa_sync() that checks if the current TXG is a no-op TXG less fragile. Illumos issue: 5347 idle pool may run itself out of space	2015-01-23 18:28:37 +00:00
delphij	49b7f05a49	MFC r275748: MFV r247174: Expose arc_meta_limit, et al via kstats. Note that as a result, vfs.zfs.arc_meta_used is removed. The existing vfs.zfs.arc_meta_limit sysctl/tunable is retained with a SYSCTL_PROC wrapper. Illumos ZFS issues: 3561 arc_meta_limit should be exposed via kstats Relnotes: yes	2015-01-23 18:23:19 +00:00
delphij	9a781ab65c	MFC r275740: MFV r275548: Verify that the block pointer is structurally valid, before attempting to read it in. It can only be invalid in the case of a ZFS bug, but this change will help identify such bugs in a more transparent way, by panic'ing with a relevant message, rather than indexing off the end of an array or something. Illumos issue: 5349 verify that block pointer is plausible before reading	2015-01-23 18:16:36 +00:00
delphij	9a50f3f3a8	MFC r275738: MFV r275546: Reduce scrub activities when system there is enough dirty data, namely when dirty data is more than zfs_vdev_async_write_active_min_dirty_percent (once we start to increase the number of concurrent async writes). While there also correct rounding error which would make scrub end up pausing for (zfs_txg_timeout + 1) seconds instead of the desired zfs_txg_timeout seconds. Illumos issue: 5351 scrub goes for an extra second each txg 5352 scrub should pause when there is some dirty data	2015-01-23 17:41:34 +00:00
delphij	85cd2fc0bf	MFC r275737: MFV r275545: If zio_checksum_error() returns other than ECKSUM (e.g. EINVAL), it does not fill in the "zio_bad_cksum_t *info" parameter. Caller should not attempt to use it in this case. Illumos issue: 5348 zio_checksum_error() only fills in info if ECKSUM	2015-01-23 17:31:41 +00:00
delphij	4cf14a4e5e	MFC r275736: MFV r275544: Clean up some duplicated code in dnode_sync() around freeing spill blocks. Illumos issue: 5350 clean up code in dnode_sync()	2015-01-23 17:24:56 +00:00
delphij	73a900fefa	MFC r275735: MFV r275543: Remove always true tests for ds->ds_phys' presence. Clean up assertions in dsl_dataset_disown. Remove unreachable code in dsl_dataset_disown(). Illumos issue: 5310 Remove always true tests for non-NULL ds->ds_phys	2015-01-23 17:21:11 +00:00
delphij	29c471fa7a	MFC r275734: MFV r275542: If a dnode has a spill block and there is an error while accessing a data block then traverse_dnode() loses information about that error and returns a status of visiting the spill block. This issue is discovered by Spectra Logic. Illumos issue: 5311 traverse_dnode may report success when it should not Original author: gibbs	2015-01-23 17:16:26 +00:00
delphij	f0b1164b8b	MFC r275594: MFV r275540: When importing a pool, don't assume that the passed pool configuration at vdev_load is always vaild. It's possible that a stale configuration that comes with extra vdevs, where metaslab_init() would fail because of lower layer returns error. Change the code to make metaslab_init() handle and return errors from lower layer and pass it back to upper layer and handle it there. Illumos issue: 5213 panic in metaslab_init due to space_map_open returning ENXIO	2015-01-23 00:44:14 +00:00
delphij	7780f3d4e3	MFC r275562: MFV r275535: Unexpand ISP2() and MSEC2NSEC(). Illumos issue: 5255 uts shouldn't open-code ISP2	2015-01-23 00:27:08 +00:00
delphij	99239fcce4	MFC r275561: MFV r275534: Sync with Illumos. This have no effect to FreeBSD. Illumos issue: 5285 pass in cpu_pause_func via pause_cpus	2015-01-23 00:23:48 +00:00
delphij	2fbf983b2a	MFC r275533: Sync with Illumos. This have no effect to FreeBSD. Illumos issue: 5100 sparc build failed after 5004	2015-01-23 00:19:58 +00:00
smh	13bbed0129	MFC r276063: Standardise on illumos for #ifdef's in zvol.c MFC r276066: Refactor zvol locking to minimise diff with upstream MFC r276069: Fix panic when resizing ZFS zvol's Sponsored by: Multiplay	2015-01-21 09:45:48 +00:00
smh	48282fa7cc	MFC r272509 (by delphi): Diff reduction with upstream Sponsored by: Multiplay	2015-01-21 09:39:20 +00:00
delphij	fc4ca1ce31	MFC r265218 (smh): Removed pointless / duplicated call to trim_map_first.	2015-01-10 01:05:12 +00:00
delphij	e6623583d1	MFC r264392 (davide): Fix a panic in zfs_rename(). this is due to a wrong dereference of a vnode when it's not locked and can be (potentially) recycled. 'sdvp' cannot be locked on zfs_rename() entry point because the VFS can't be sure that this scenario is LOR-free (it might violate the parent->child lock acquisition rule). Dereference 'tdvp' instead, which is already locked on entry, and access 'sdvp' fields only when it's safe, i.e. under ZFS_ENTER scope. While at it, remove the usage of VOP_REALVP, as long as this is a NOP on FreeBSD.	2015-01-10 01:01:12 +00:00
kib	511130787f	MFC r276007: Handle MAKEENTRY cnp flag in the VOP_CREATE().	2015-01-04 00:46:06 +00:00
kib	a355201d21	MFC r275897: Set NOCACHE flag for CREATE namei() calls, do not specially handle MAKEENTRY in VOP_LOOKUP().	2015-01-01 10:44:20 +00:00
delphij	2194e37061	MFC r275530: Use %d instead of %u for error number. This way we see ERESTART as -1 not 4294967295 when doing DTrace.	2014-12-22 21:06:26 +00:00
delphij	1ad38ed4f0	MFC r274337,r274673,274681,r275515: ZFS large block support. The default recordsize remains at 128KB. A new tunable/sysctl variable, vfs.zfs.max_recordsize is added to allow adjusting the permitted maximum record size, or zfs_max_recordsize, with a default of 1MB. ZFS will not allow setting recordsize greater than zfs_max_recordsize as a safety belt, because larger recordsize means greater read and write latency and more memory usage. Please note that booting from datasets that have recordsize greater than 128KB is not supported (but it's Okay to enable the feature on the pool). Limited safety belt is provided for mounted root filesystem but use caution when using a larger value. Illumos issue: 5027 zfs large block support	2014-12-22 20:58:51 +00:00
avg	a691672ca9	MFC r275401: zfs_putpages: actually update mtime and ctime	2014-12-18 13:46:11 +00:00
mav	4ff47ae9ab	MFC r275474: Add GET LBA STATUS command support to CTL. It is implemented for LUNs backed by ZVOLs in "dev" mode and files. GEOM has no such API, so for LUNs backed by raw devices all LBAs will be reported as mapped/unknown. Sponsored by: iXsystems, Inc.	2014-12-18 08:38:07 +00:00
avg	c07a7147df	MFC r274628: l2arc: restore correct rounding up of asize of compressed data	2014-12-08 13:06:44 +00:00
delphij	ebd3bcb845	MFC r274172 (avg) fix l2arc compression buffers leak We have observed that arc_release() can be called concurrently with a l2arc in-flight write. Also, we have observed that arc_hdr_destroy() can be called from arc_write_done() for a zio with ZIO_FLAG_IO_REWRITE flag in similar circumstances. Previously the l2arc headers would be freed while leaking their associated compression buffers. Now the buffers are placed on l2arc_free_on_write list for delayed freeing. This is similar to what was already done to arc buffers that were supposed to be freed concurrently with in-flight writes of those buffers. In addition to fixing the discovered leaks this change also adds some protective code to assert that a compression buffer associated with a l2arc header is never leaked. A new kstat l2_cdata_free_on_write is added. It keeps a count of delayed compression buffer frees which previously would have been leaks. Tested by: Vitalij Satanivskij <satan@ukr.net> et al Requested by: many Sponsored by: HybridCluster / ClusterHQ This is a 10.1-RELEASE errata candidate.	2014-12-05 00:32:33 +00:00

1 2 3 4 5 ...

1125 Commits