freebsd-skq

Author	SHA1	Message	Date
glebius	77755bd98c	Make ZFS depend on xdr.ko only. It doesn't need kernel RPC. Differential Revision: https://reviews.freebsd.org/D24408	2020-04-17 06:05:08 +00:00
freqlabs	5ea1792848	MFOpenZFS: ZVOLs should not be allowed to have children zfs create, receive and rename can bypass this hierarchy rule. Update both userland and kernel module to prevent this issue and use pyzfs unit tests to exercise the ioctls directly. Note: this commit slightly changes zfs_ioc_create() ABI. This allow to differentiate a generic error (EINVAL) from the specific case where we tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT). Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Approved by: mav (mentor) MFC after: 2 weeks Sponsored by: iXsystems, Inc. openzfs/zfs@d8d418ff0c	2020-03-25 15:56:18 +00:00
mav	2a20d22da4	MFOpenZFS: make zil max block size tunable We've observed that on some highly fragmented pools, most metaslab allocations are small (~2-8KB), but there are some large, 128K allocations. The large allocations are for ZIL blocks. If there is a lot of fragmentation, the large allocations can be hard to satisfy. The most common impact of this is that we need to check (and thus load) lots of metaslabs from the ZIL allocation code path, causing sync writes to wait for metaslabs to load, which can take a second or more. In the worst case, we may not be able to satisfy the allocation, in which case the ZIL will resort to txg_wait_synced() to ensure the change is on disk. To provide a workaround for this, this change adds a tunable that can reduce the size of ZIL blocks. External-issue: DLPX-61719 Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #8865 openzfs/zfs@b8738257c2 MFC after: 2 weeks	2020-03-19 01:05:54 +00:00
mav	4590151da9	Fix infinite scan on a pool with only special allocations Attempt to run scrub or resilver on a new pool containing only special allocations (special vdev added on creation) caused infinite loop because of dsl_scan_should_clear() limiting memory usage to 5% of pool size, which it calculated accounting only normal allocation class. Addition of special and just in case dedup classes fixes the issue. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #10106 Closes #8694 openzfs/zfs@fa130e010c	2020-03-16 19:03:10 +00:00
freqlabs	9214b7c393	TODO DONE: Use sx_xholder in SPL rwlock.h Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-03-14 00:16:15 +00:00
kib	1750b51956	zfs dmu_read: loosen the assertion. Since switch to the lockless grab, shared busy for ahead/behind pages allows other threads to validate and map the pages readonly. Reviewed by: avg, jeff, markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23986	2020-03-06 21:15:25 +00:00
mav	2787f71e66	Remove vfs.zfs.top_maxinflight tunable/sysctl. It is dead since sorted scrub import at r334844. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-03-05 19:43:43 +00:00
mav	543cc1109b	Increase number of write completion threads, matching ZoL. Our iSCSI benchmarks on a large 80-core system show that previous limit of 8 threads can be a bottleneck. At some points this change increases write IOPS by as much as 50%. I am still not sure that so many threads is really required, but we tested lower amounts and got no significant benefits, while latencies were a bit worse, so decided to not diverge. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-03-03 15:05:13 +00:00
jeff	86dc6f3bc4	Eliminate object locking in zfs where possible with the new lockless grab APIs. Reviewed by: kib, markj, mmacy Differential Revision: https://reviews.freebsd.org/D23848	2020-02-28 20:29:53 +00:00
markj	706ad3a3fd	Clear systrace_args_func when systrace probes are disabled. This function pointer is invalidated when systrace.ko is unloaded. Reported by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-02-28 17:04:36 +00:00
avg	9f5d23fd29	remove stray space symbol in r358380 MFC after: 1 week X-MFC with: r358380	2020-02-27 14:27:42 +00:00
avg	8e6710bf1d	use ZFS_MAX_DATASET_NAME_LEN instead of MAXPATHLEN for dataset names The change affects only FreeBSD specific code as the common code already mostly uses the more idiomatic and correct ZFS_MAX_DATASET_NAME_LEN. MFC after: 1 week	2020-02-27 14:21:01 +00:00
avg	6e9c86fd98	dsl_dataset_promote_sync: populate 'oldname' before using it It's very unlikely that zfsvfs_update_fromname() and zvol_rename_minors() ever did anything during the promote operation as the old name was not initialized. MFC after: 1 week	2020-02-27 14:12:43 +00:00
mav	28f03c2cb3	MFZoL: Relax restriction on zfs_ioc_next_obj() iteration Per the documentation for dnode_next_offset in dnode.c, the "txg" parameter specifies a lower bound on which transaction the dnode can be found in. We are interested in all dnodes that are removed between the first and last transaction in the snapshot. It doesn't need to be created in that snapshot to correspond to a removed file. In fact, the behavior of zfs diff in the test case exactly matches this: the transaction that created the data that was deleted in snapshot "2" was produced before, in snapshot "1", definitely predating the first transaction in snapshot "2". Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <Tim Chase <tim@onlight.com> Closes #2081 zfsonlinux/zfs@7290cd3c4e MFC after: 1 week	2020-02-26 20:38:48 +00:00
tsoome	97a70b6ddd	loader: replace zfs_alloc/zfs_free with malloc/free Use common memory management.	2020-02-26 18:12:12 +00:00
mav	ae114f3d6c	MFZoL: Fix resilver writes in vdev_indirect_io_start This patch addresses an issue found in ztest where resilver write zios that were passed to an indirect vdev would end up being handled as though they were resilver read zios. This caused issues where the zio->io_abd would be both read to and written from at the same time, causing asserts to fail. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8193 zfsonlinux/zfs@5aa95ba0d3 MFC after: 1 week	2020-02-26 16:51:45 +00:00
mav	97978d4f47	Fix patch mismerge in r358336. MFC after: 1 week	2020-02-26 16:04:24 +00:00
mav	512cdfcc21	MFZoL: Fix issue with scanning dedup blocks as scan ends This patch fixes an issue discovered by ztest where dsl_scan_ddt_entry() could add I/Os to the dsl scan queues between when the scan had finished all required work and when the scan was marked as complete. This caused the scan to spin indefinitely without ending. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010 zfsonlinux/zfs@5e0bd0ae05 MFC after: 1 week	2020-02-26 15:59:46 +00:00
mav	9108e06fd4	MFZoL: Fix 2 small bugs with cached dsl_scan_phys_t This patch corrects 2 small bugs where scn->scn_phys_cached was not properly updated to match the primary copy when it needed to be. The first resulted in the pause state not being properly updated and the second resulted in the cached version being completely zeroed even if the primary was not. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010 zfsonlinux/zfs@8cb119e3dc MFC after: 1 week	2020-02-26 15:47:40 +00:00
mav	0b52ef2136	MFZoL: Fix txg_sync_thread hang in scan_exec_io() When scn->scn_maxinflight_bytes has not been initialized it's possible to hang on the condition variable in scan_exec_io(). This issue was uncovered by ztest and is only possible when deduplication is enabled through the following call path. txg_sync_thread() spa_sync() ddt_sync_table() ddt_sync_entry() dsl_scan_ddt_entry() dsl_scan_scrub_cb() dsl_scan_enqueuei() scan_exec_io() cv_wait() Resolve the issue by always initializing scn_maxinflight_bytes to a reasonable minimum value. This value will be recalculated in dsl_scan_sync() to pick up changes to zfs_scan_vdev_limit and the addition/removal of vdevs. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7098 zfsonlinux/zfs@f90a30ad1b MFC after: 1 week	2020-02-26 15:45:04 +00:00
kaktus	ad355b0a9d	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
mav	c9033ce2bd	Remove duplicate dbufs accounting. Since AVL already has embedded element counter, use dn_dbufs_count only for dbufs not counted there (bonus buffers) and just add them. This removes two atomics per dbuf life cycle. According to profiler it reduces time spent by dbuf_destroy() inside bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core. This counter is used only on illumos, so for FreeBSD it was just a waste of time. MFC after: 2 weeks	2020-02-07 15:50:47 +00:00
mav	3a02a6880c	Reduce number of atomic_add() calls in aggsum. Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to re-borrow after the flush. But since asc_borrowed and asc_delta are accessed only while holding asc_lock, it makes no any sense to modify as_lower_bound and as_upper_bound in multiple steps. Instead of that the new code uses only 2 atomics in all the cases, one per as_*_bound variable. I think even that is overkill, simple atomic store and load could be used here, since all modifications are done under the as_lock, but there are no such primitives in ZFS code now. While there, make borrow code consider previous borrow value, so that on mixed request patterns reduce chance of needing to borrow again if much larger request follows tiny one that needed borrow. Also reduce as_numbuckets from uint64_t to u_int. It makes no sense to use so large division operation on every aggsum_add(). Reviewed by: Brian Behlendorf, Paul Dagnelie MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-02-06 20:32:53 +00:00
kib	f56916862b	Add sys/systm.h to several places that use vm headers. It is needed (but not enough) to use e.g. KASSERT() in inline functions. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-02-04 18:56:26 +00:00
mav	4f580d3706	Few microoptimizations to dbuf layer. Move db_link into the same cache line as db_blkid and db_level. It allows significantly reduce avl_add() time in dbuf_create() on systems with large RAM and huge number of dbufs per dnode. Avoid few accesses to dbuf_caches[].size, which is highly congested under high IOPS and never stays in cache for a long time. Use local value we are receiving from zfs_refcount_add_many() any way. Remove cache_size_bytes_max bump from dbuf_evict_one(). I don't see a point to do it on dbuf eviction after we done it on insertion in dbuf_rele_and_unlock(). Reviewed by: mahrens, Brian Behlendorf MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-02-04 15:53:51 +00:00
tsoome	e37b7c646d	loader: rewrite zfs reader zap code to use malloc First step on removing zfs_alloc. Reviewed by: delphij Differential Revision: https://reviews.freebsd.org/D23433	2020-02-04 07:37:55 +00:00
imp	48b94864c5	Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Removee indireeect sparc64 ifdefs	2020-02-03 17:35:11 +00:00
mav	8ea92b3fdd	Unblock kstat.zfs.misc.dbufstats sysctls. It is not so much broken to hide it after we wasted time to collect it. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-02-03 17:10:40 +00:00
kevans	11e74d9fa2	Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247	2020-02-02 16:34:57 +00:00
kevans	1f9321e091	zfs: light refactor to indicate cachedlookup in zfs_lookup If we come from VOP_CACHEDLOOKUP, we must skip the VEXEC check as it will have been done in the caller (vfs_cache_lookup). This is a part of D23247, which may skip the earlier VEXEC check as well if the root fd was opened with O_SEARCH. This one required slightly more work as zfs_lookup may also be called indirectly as VOP_LOOKUP or a couple of other places where we must do the check.	2020-02-02 16:10:33 +00:00
mjg	d8144f8e88	zfs: ZFS_WLOCK_TEARDOWN_INACTIVE_WLOCKED -> ZFS_TEARDOWN_INACTIVE_WLOCKED Fix up the argument used in one case as well.	2020-02-01 06:39:10 +00:00
mjg	076981e2f3	zfs: convert z_teardown_inactive_lock to sleepable read-mostly lock This eliminates a global serialisation point. It only gets write locked on unmount. Sample result doing an incremental -j 40 build: before: 173.30s user 458.97s system 2595% cpu 24.358 total after: 168.58s user 254.92s system 2211% cpu 19.147 total	2020-01-31 08:38:38 +00:00
mjg	9e428f1289	zfs: provide macros to handle z_teardown_inactive_lock	2020-01-31 08:37:35 +00:00
mjg	de2215f7af	zfs: fix spurious lock contention during path lookup ZFS tracks if anything denies VEXEC to allow for a quick check for the common case of path traversal. Use it. Differential Revision: https://reviews.freebsd.org/D22224	2020-01-30 02:16:17 +00:00
mjg	7a9bf03632	zfs: use VOP_NEED_INACTIVE Big thanks to Greg V for testing previous verions of the patch. Differential Revision: https://reviews.freebsd.org/D22130	2020-01-30 02:14:10 +00:00
mav	d516c76636	Map ECKSUM and EFRAGS from ZFS onto real errnos. Make it less confusing when, for example, stat sets errno to 122 because a checksum failed in ZFS: Before: getfacl: /foo/bar: stat() failed: Unknown error: 122 After: getfacl: /foo/bar: stat() failed: Integrity check failed Submitted by: Ryan Moeller <ryan@ixsystems.com> Reviewed by: mckusick, mav MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D22973	2020-01-13 22:06:16 +00:00
mjg	165ba25434	Add KERNEL_PANICKED macro for use in place of direct panicstr tests	2020-01-12 06:07:54 +00:00
mjg	3bd5f540e5	dtrace: add missing CLTFLAG_MPSAFE annotations	2020-01-12 04:53:22 +00:00
mjg	7b779e9c66	zfs: add missing CLTFLAG_MPSAFE annotations	2020-01-12 04:53:01 +00:00
mjg	d9c6cac9c3	vfs: prealloc vnodes in getnewvnode_reserve Having a reserved vnode count does not guarantee that getnewvnodes wont block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instaed. The only consumer was always passing "1" as count and never nesting reservations.	2020-01-11 22:58:14 +00:00
ian	1090e6574b	Remove scary-looking printf output that happens when you kldload dtrace on arm. Replace it with a comment block explaining why the function is empty on 32-bit arm.	2020-01-09 22:51:37 +00:00
mjg	5c232f8fed	zfs: plug a vnode reserve leak in zfs_make_xattrdir	2020-01-07 04:34:29 +00:00
mjg	f121d45000	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
bdragon	db39dab3a8	[PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations This is a lock-based emulation of 64-bit atomics for kernel use, split off from an earlier patch by jhibbits. This is needed to unblock future improvements that reduce the need for locking on 64-bit platforms by using atomic updates. The implementation allows for future integration with userland atomic64, but as that implies going through sysarch for every use, the current status quo of userland doing its own locking may be for the best. Submitted by: jhibbits (original patch), kevans (mips bits) Reviewed by: jhibbits, jeff, kevans Differential Revision: https://reviews.freebsd.org/D22976	2020-01-02 23:20:37 +00:00
markj	09e22f0663	Remove page locking for queue operations. With the previous reviews, the page lock is no longer required in order to perform queue operations on a page. It is also no longer needed in the page queue scans. This change effectively eliminates remaining uses of the page lock and also the false sharing caused by multiple pages sharing a page lock. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22885	2019-12-28 19:04:00 +00:00
mjg	048a894ebc	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tesetd by: pho Differential Revision: https://reviews.freebsd.org/D22738	2019-12-16 00:06:22 +00:00
tsoome	c29918a860	loader: rewrite zfs vdev initialization In some cases the pool discovery will get stuck in infinite loop while setting up the vdev children. To fix, we split the vdev setup into two parts, first we create vdevs based on configuration we do get from pool label, then, we process pool config from MOS and update the pool config if needed. Testing done: confirm previously hung loader is not hung any more. MFC after: 1 week	2019-12-15 21:52:40 +00:00
jeff	bf925a1e49	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626	2019-12-15 21:11:15 +00:00
jhb	fe10c96bb9	Use a callout instead of timeout(9) for delayed zio's. Reviewed by: avg Differential Revision: https://reviews.freebsd.org/D22597	2019-12-13 19:27:51 +00:00
mjg	656673caeb	vfs: locking primitives which elide ->v_vnlock and shared locking disablement Both of these features are not needed by many consumers and result in avoidable reads which in turn puts them on profiles due to cache-line ping ponging. On top of that the current lockgmr entry point is slower than necessary single-threaded. As an attempted clean up preparing for other changes, provide new routines which don't support any of the aforementioned features. With these patches in place vop_stdlock and vop_stdunlock disappear from flamegraphs during -j 104 buildkernel. Reviewed by: jeff (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D22665	2019-12-11 23:11:21 +00:00

1 2 3 4 5 ...

2261 Commits