freebsd-dev

Author	SHA1	Message	Date
Ruslan Bukin	d75038a0af	Fix entering KDB with dtrace-enabled kernel. Reviewed by: markj, jhb Differential Revision: https://reviews.freebsd.org/D24018	2020-05-26 16:44:05 +00:00
Mark Johnston	66b415fb8f	Don't block on the range lock in zfs_getpages(). After r358443 the vnode object lock no longer synchronizes concurrent zfs_getpages() and zfs_write() (which must update vnode pages to maintain coherence). This created a potential deadlock between ZFS range locks and VM page busy locks: a fault on a mapped file will cause the fault page to be busied, after which zfs_getpages() locks a range around the file offset in order to map adjacent, resident pages; zfs_write() locks the range first, and then must busy vnode pages when synchronizing. Solve this by adding a non-blocking mode for ZFS range locks, and using it in zfs_getpages(). If zfs_getpages() fails to acquire the range lock, only the fault page will be populated. Reported by: bdrewery Reviewed by: avg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24839	2020-05-20 18:29:23 +00:00
Toomas Soome	22ed31c23f	lz4 hash table does not start zeroed illumos issue: https://www.illumos.org/issues/12757 Submitted by: andyf	2020-05-19 19:53:12 +00:00
Kyle Evans	47c7d8327c	zfs: reject read(2) of a dirfd with EISDIR This is independent of the recently-discussed global change, which is still in review/discussion stage. This is effectively a measure for consistency in the ZFS world, where FreeBSD was the only platform (as far as I could find) that allowed this. What ZFS exposes is decidedly not useful for any real purposes, to paraphrase (hopefully faithfully) jhb's findings when exploring this: The size of a directory in ZFS is the number of directory entries within. When reading a directory, you would instead get the leading part of its raw contents; the amount you get being dictated by the "size," i.e. number of directory entries. There's decidedly (luckily) no stack disclosure happening here, though the behavior is bizarre and almost certainly a historical accident. This change has already been upstreamed to OpenZFS. MFC after: 1 week	2020-05-19 02:41:05 +00:00
John Baldwin	2c213c2e75	Correct the order of arguments to copyin() for Q_SETQUOTA. MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24656	2020-05-18 16:47:44 +00:00
Pawel Jakub Dawidek	cb761bb2fb	Avoid the GEOM topology lock recursion when we automatically expand a pool. The steps to reproduce the problem: mdconfig -a -t swap -s 3g -u 0 gpart create -s GPT md0 gpart add -t freebsd-zfs -s 1g md0 zpool create -o autoexpand=on foo md0p1 gpart resize -i 1 -s 2g md0	2020-04-25 21:45:31 +00:00
John Baldwin	5c4309b474	Handle non-dtrace-triggered kernel breakpoint traps in mips. If DTRACE is enabled at compile time, all kernel breakpoint traps are first given to dtrace to see if they are triggered by a FBT probe. Previously if dtrace didn't recognize the trap, it was silently ignored breaking the handling of other kernel breakpoint traps such as the debug.kdb.enter sysctl. This only returns early from the trap handler if dtrace recognizes the trap and handles it. Submitted by: Nicolò Mazzucato <nicomazz97@gmail.com> Reviewed by: markj Obtained from: CheriBSD Differential Revision: https://reviews.freebsd.org/D24478	2020-04-21 17:38:07 +00:00
Gleb Smirnoff	9edef911e8	Make ZFS depend on xdr.ko only. It doesn't need kernel RPC. Differential Revision: https://reviews.freebsd.org/D24408	2020-04-17 06:05:08 +00:00
Ryan Moeller	69534635ff	MFOpenZFS: ZVOLs should not be allowed to have children zfs create, receive and rename can bypass this hierarchy rule. Update both userland and kernel module to prevent this issue and use pyzfs unit tests to exercise the ioctls directly. Note: this commit slightly changes zfs_ioc_create() ABI. This allow to differentiate a generic error (EINVAL) from the specific case where we tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT). Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Approved by: mav (mentor) MFC after: 2 weeks Sponsored by: iXsystems, Inc. openzfs/zfs@d8d418ff0c	2020-03-25 15:56:18 +00:00
Alexander Motin	d3c6ba3214	MFOpenZFS: make zil max block size tunable We've observed that on some highly fragmented pools, most metaslab allocations are small (~2-8KB), but there are some large, 128K allocations. The large allocations are for ZIL blocks. If there is a lot of fragmentation, the large allocations can be hard to satisfy. The most common impact of this is that we need to check (and thus load) lots of metaslabs from the ZIL allocation code path, causing sync writes to wait for metaslabs to load, which can take a second or more. In the worst case, we may not be able to satisfy the allocation, in which case the ZIL will resort to txg_wait_synced() to ensure the change is on disk. To provide a workaround for this, this change adds a tunable that can reduce the size of ZIL blocks. External-issue: DLPX-61719 Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #8865 openzfs/zfs@b8738257c2 MFC after: 2 weeks	2020-03-19 01:05:54 +00:00
Alexander Motin	cf2f2eb568	Fix infinite scan on a pool with only special allocations Attempt to run scrub or resilver on a new pool containing only special allocations (special vdev added on creation) caused infinite loop because of dsl_scan_should_clear() limiting memory usage to 5% of pool size, which it calculated accounting only normal allocation class. Addition of special and just in case dedup classes fixes the issue. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #10106 Closes #8694 openzfs/zfs@fa130e010c	2020-03-16 19:03:10 +00:00
Ryan Moeller	9f24784038	TODO DONE: Use sx_xholder in SPL rwlock.h Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-03-14 00:16:15 +00:00
Konstantin Belousov	d5b7401f64	zfs dmu_read: loosen the assertion. Since switch to the lockless grab, shared busy for ahead/behind pages allows other threads to validate and map the pages readonly. Reviewed by: avg, jeff, markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23986	2020-03-06 21:15:25 +00:00
Alexander Motin	5c940cf1ff	Remove vfs.zfs.top_maxinflight tunable/sysctl. It is dead since sorted scrub import at r334844. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-03-05 19:43:43 +00:00
Alexander Motin	e37d5c12e9	Increase number of write completion threads, matching ZoL. Our iSCSI benchmarks on a large 80-core system show that previous limit of 8 threads can be a bottleneck. At some points this change increases write IOPS by as much as 50%. I am still not sure that so many threads is really required, but we tested lower amounts and got no significant benefits, while latencies were a bit worse, so decided to not diverge. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-03-03 15:05:13 +00:00
Jeff Roberson	9defe1c076	Eliminate object locking in zfs where possible with the new lockless grab APIs. Reviewed by: kib, markj, mmacy Differential Revision: https://reviews.freebsd.org/D23848	2020-02-28 20:29:53 +00:00
Mark Johnston	a7261520ba	Clear systrace_args_func when systrace probes are disabled. This function pointer is invalidated when systrace.ko is unloaded. Reported by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-02-28 17:04:36 +00:00
Andriy Gapon	40b1e0dc0e	remove stray space symbol in r358380 MFC after: 1 week X-MFC with: r358380	2020-02-27 14:27:42 +00:00
Andriy Gapon	6d11243ae2	use ZFS_MAX_DATASET_NAME_LEN instead of MAXPATHLEN for dataset names The change affects only FreeBSD specific code as the common code already mostly uses the more idiomatic and correct ZFS_MAX_DATASET_NAME_LEN. MFC after: 1 week	2020-02-27 14:21:01 +00:00
Andriy Gapon	6b47663df5	dsl_dataset_promote_sync: populate 'oldname' before using it It's very unlikely that zfsvfs_update_fromname() and zvol_rename_minors() ever did anything during the promote operation as the old name was not initialized. MFC after: 1 week	2020-02-27 14:12:43 +00:00
Alexander Motin	a33a65ce22	MFZoL: Relax restriction on zfs_ioc_next_obj() iteration Per the documentation for dnode_next_offset in dnode.c, the "txg" parameter specifies a lower bound on which transaction the dnode can be found in. We are interested in all dnodes that are removed between the first and last transaction in the snapshot. It doesn't need to be created in that snapshot to correspond to a removed file. In fact, the behavior of zfs diff in the test case exactly matches this: the transaction that created the data that was deleted in snapshot "2" was produced before, in snapshot "1", definitely predating the first transaction in snapshot "2". Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@onlight.com> Closes #2081 zfsonlinux/zfs@7290cd3c4e MFC after: 1 week	2020-02-26 20:38:48 +00:00
Toomas Soome	c1c4c81fd7	loader: replace zfs_alloc/zfs_free with malloc/free Use common memory management.	2020-02-26 18:12:12 +00:00
Alexander Motin	0f58760b82	MFZoL: Fix resilver writes in vdev_indirect_io_start This patch addresses an issue found in ztest where resilver write zios that were passed to an indirect vdev would end up being handled as though they were resilver read zios. This caused issues where the zio->io_abd would be both read to and written from at the same time, causing asserts to fail. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8193 zfsonlinux/zfs@5aa95ba0d3 MFC after: 1 week	2020-02-26 16:51:45 +00:00
Alexander Motin	51c04e6cc2	Fix patch mismerge in r358336. MFC after: 1 week	2020-02-26 16:04:24 +00:00
Alexander Motin	f8a7a04b79	MFZoL: Fix issue with scanning dedup blocks as scan ends This patch fixes an issue discovered by ztest where dsl_scan_ddt_entry() could add I/Os to the dsl scan queues between when the scan had finished all required work and when the scan was marked as complete. This caused the scan to spin indefinitely without ending. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010 zfsonlinux/zfs@5e0bd0ae05 MFC after: 1 week	2020-02-26 15:59:46 +00:00
Alexander Motin	308acfcc62	MFZoL: Fix 2 small bugs with cached dsl_scan_phys_t This patch corrects 2 small bugs where scn->scn_phys_cached was not properly updated to match the primary copy when it needed to be. The first resulted in the pause state not being properly updated and the second resulted in the cached version being completely zeroed even if the primary was not. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010 zfsonlinux/zfs@8cb119e3dc MFC after: 1 week	2020-02-26 15:47:40 +00:00
Alexander Motin	4b7f090f8d	MFZoL: Fix txg_sync_thread hang in scan_exec_io() When scn->scn_maxinflight_bytes has not been initialized it's possible to hang on the condition variable in scan_exec_io(). This issue was uncovered by ztest and is only possible when deduplication is enabled through the following call path. txg_sync_thread() spa_sync() ddt_sync_table() ddt_sync_entry() dsl_scan_ddt_entry() dsl_scan_scrub_cb() dsl_scan_enqueuei() scan_exec_io() cv_wait() Resolve the issue by always initializing scn_maxinflight_bytes to a reasonable minimum value. This value will be recalculated in dsl_scan_sync() to pick up changes to zfs_scan_vdev_limit and the addition/removal of vdevs. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7098 zfsonlinux/zfs@f90a30ad1b MFC after: 1 week	2020-02-26 15:45:04 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Alexander Motin	8d8e484d9c	Remove duplicate dbufs accounting. Since AVL already has embedded element counter, use dn_dbufs_count only for dbufs not counted there (bonus buffers) and just add them. This removes two atomics per dbuf life cycle. According to profiler it reduces time spent by dbuf_destroy() inside bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core. This counter is used only on illumos, so for FreeBSD it was just a waste of time. MFC after: 2 weeks	2020-02-07 15:50:47 +00:00
Alexander Motin	c10aea724f	Reduce number of atomic_add() calls in aggsum. Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to re-borrow after the flush. But since asc_borrowed and asc_delta are accessed only while holding asc_lock, it makes no sense to modify as_lower_bound and as_upper_bound in multiple steps. Instead, the new code uses only 2 atomics in all cases, one per as_*_bound variable. I think even that is overkill, simple atomic store and load could be used here, since all modifications are done under the as_lock, but there are no such primitives in ZFS code now. While there, make borrow code consider previous borrow value, so that on mixed request patterns reduce chance of needing to borrow again if much larger request follows tiny one that needed borrow. Also reduce as_numbuckets from uint64_t to u_int. It makes no sense to use so large division operation on every aggsum_add(). Reviewed by: Brian Behlendorf, Paul Dagnelie MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-02-06 20:32:53 +00:00
Konstantin Belousov	a421e8786b	Add sys/systm.h to several places that use vm headers. It is needed (but not enough) to use e.g. KASSERT() in inline functions. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-02-04 18:56:26 +00:00
Alexander Motin	ea642c5c38	Few microoptimizations to dbuf layer. Move db_link into the same cache line as db_blkid and db_level. It allows significantly reduce avl_add() time in dbuf_create() on systems with large RAM and huge number of dbufs per dnode. Avoid few accesses to dbuf_caches[].size, which is highly congested under high IOPS and never stays in cache for a long time. Use local value we are receiving from zfs_refcount_add_many() any way. Remove cache_size_bytes_max bump from dbuf_evict_one(). I don't see a point to do it on dbuf eviction after we done it on insertion in dbuf_rele_and_unlock(). Reviewed by: mahrens, Brian Behlendorf MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-02-04 15:53:51 +00:00
Toomas Soome	4d297e7035	loader: rewrite zfs reader zap code to use malloc First step on removing zfs_alloc. Reviewed by: delphij Differential Revision: https://reviews.freebsd.org/D23433	2020-02-04 07:37:55 +00:00
Warner Losh	58aa35d429	Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Remove indirect sparc64 ifdefs	2020-02-03 17:35:11 +00:00
Alexander Motin	c68c82324f	Unblock kstat.zfs.misc.dbufstats sysctls. It is not so much broken to hide it after we wasted time to collect it. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-02-03 17:10:40 +00:00
Kyle Evans	6a5abb1ee5	Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247	2020-02-02 16:34:57 +00:00
Kyle Evans	c887ac8324	zfs: light refactor to indicate cachedlookup in zfs_lookup If we come from VOP_CACHEDLOOKUP, we must skip the VEXEC check as it will have been done in the caller (vfs_cache_lookup). This is a part of D23247, which may skip the earlier VEXEC check as well if the root fd was opened with O_SEARCH. This one required slightly more work as zfs_lookup may also be called indirectly as VOP_LOOKUP or a couple of other places where we must do the check.	2020-02-02 16:10:33 +00:00
Mateusz Guzik	f0c402e425	zfs: ZFS_WLOCK_TEARDOWN_INACTIVE_WLOCKED -> ZFS_TEARDOWN_INACTIVE_WLOCKED Fix up the argument used in one case as well.	2020-02-01 06:39:10 +00:00
Mateusz Guzik	8c3658c450	zfs: convert z_teardown_inactive_lock to sleepable read-mostly lock This eliminates a global serialisation point. It only gets write locked on unmount. Sample result doing an incremental -j 40 build: before: 173.30s user 458.97s system 2595% cpu 24.358 total after: 168.58s user 254.92s system 2211% cpu 19.147 total	2020-01-31 08:38:38 +00:00
Mateusz Guzik	b076e113af	zfs: provide macros to handle z_teardown_inactive_lock	2020-01-31 08:37:35 +00:00
Mateusz Guzik	42a9f8f21d	zfs: fix spurious lock contention during path lookup ZFS tracks if anything denies VEXEC to allow for a quick check for the common case of path traversal. Use it. Differential Revision: https://reviews.freebsd.org/D22224	2020-01-30 02:16:17 +00:00
Mateusz Guzik	e196f7825e	zfs: use VOP_NEED_INACTIVE Big thanks to Greg V for testing previous versions of the patch. Differential Revision: https://reviews.freebsd.org/D22130	2020-01-30 02:14:10 +00:00
Alexander Motin	da19f62dfa	Map ECKSUM and EFRAGS from ZFS onto real errnos. Make it less confusing when, for example, stat sets errno to 122 because a checksum failed in ZFS: Before: getfacl: /foo/bar: stat() failed: Unknown error: 122 After: getfacl: /foo/bar: stat() failed: Integrity check failed Submitted by: Ryan Moeller <ryan@ixsystems.com> Reviewed by: mckusick, mav MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D22973	2020-01-13 22:06:16 +00:00
Mateusz Guzik	879e0604ee	Add KERNEL_PANICKED macro for use in place of direct panicstr tests	2020-01-12 06:07:54 +00:00
Mateusz Guzik	638af813d9	dtrace: add missing CTLFLAG_MPSAFE annotations	2020-01-12 04:53:22 +00:00
Mateusz Guzik	20fa645666	zfs: add missing CTLFLAG_MPSAFE annotations	2020-01-12 04:53:01 +00:00
Mateusz Guzik	b52d50cf69	vfs: prealloc vnodes in getnewvnode_reserve Having a reserved vnode count does not guarantee that getnewvnodes won't block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instead. The only consumer was always passing "1" as count and never nesting reservations.	2020-01-11 22:58:14 +00:00
Ian Lepore	8bfc473c0e	Remove scary-looking printf output that happens when you kldload dtrace on arm. Replace it with a comment block explaining why the function is empty on 32-bit arm.	2020-01-09 22:51:37 +00:00
Mateusz Guzik	75ad73a8b9	zfs: plug a vnode reserve leak in zfs_make_xattrdir	2020-01-07 04:34:29 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Brandon Bergren	9aafc7c052	[PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations This is a lock-based emulation of 64-bit atomics for kernel use, split off from an earlier patch by jhibbits. This is needed to unblock future improvements that reduce the need for locking on 64-bit platforms by using atomic updates. The implementation allows for future integration with userland atomic64, but as that implies going through sysarch for every use, the current status quo of userland doing its own locking may be for the best. Submitted by: jhibbits (original patch), kevans (mips bits) Reviewed by: jhibbits, jeff, kevans Differential Revision: https://reviews.freebsd.org/D22976	2020-01-02 23:20:37 +00:00
Mark Johnston	9f5632e6c8	Remove page locking for queue operations. With the previous reviews, the page lock is no longer required in order to perform queue operations on a page. It is also no longer needed in the page queue scans. This change effectively eliminates remaining uses of the page lock and also the false sharing caused by multiple pages sharing a page lock. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22885	2019-12-28 19:04:00 +00:00
Mateusz Guzik	6fa079fc3f	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22738	2019-12-16 00:06:22 +00:00
Toomas Soome	3c2db0ef43	loader: rewrite zfs vdev initialization In some cases the pool discovery will get stuck in infinite loop while setting up the vdev children. To fix, we split the vdev setup into two parts, first we create vdevs based on configuration we do get from pool label, then, we process pool config from MOS and update the pool config if needed. Testing done: confirm previously hung loader is not hung any more. MFC after: 1 week	2019-12-15 21:52:40 +00:00
Jeff Roberson	61a74c5ccd	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626	2019-12-15 21:11:15 +00:00
John Baldwin	889ad0b890	Use a callout instead of timeout(9) for delayed zio's. Reviewed by: avg Differential Revision: https://reviews.freebsd.org/D22597	2019-12-13 19:27:51 +00:00
Mateusz Guzik	c8b29d1212	vfs: locking primitives which elide ->v_vnlock and shared locking disablement Both of these features are not needed by many consumers and result in avoidable reads which in turn puts them on profiles due to cache-line ping ponging. On top of that the current lockgmr entry point is slower than necessary single-threaded. As an attempted clean up preparing for other changes, provide new routines which don't support any of the aforementioned features. With these patches in place vop_stdlock and vop_stdunlock disappear from flamegraphs during -j 104 buildkernel. Reviewed by: jeff (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D22665	2019-12-11 23:11:21 +00:00
Mateusz Guzik	abd80ddb94	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
Mark Johnston	bf10551606	Fix an inverted condition introduced in r353539. This would have most likely resulted in read errors causing page leaks. Submitted by: jeff	2019-12-06 23:49:37 +00:00
Konstantin Belousov	fdc6b10d44	Add a VN_OPEN_INVFS flag. vn_open_cred() assumes that it is called from the top-level of a VFS syscall. Writers must call bwillwrite() before locking any VFS resource to wait for cleanup of dirty buffers. ZFS getextattr() and setextattr() VOPs do call vn_open_cred(), which results in wait for unrelated buffers while owning ZFS vnode lock (and ZFS does not use buffer cache). VN_OPEN_INVFS allows caller to skip bwillwrite. Note that ZFS is still incorrect there, because it starts write on an mp and locks a vnode while holding another vnode lock. Reported by: Willem Jan Withagen <wjw@digiware.nl> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-11-29 14:02:32 +00:00
Alexander Motin	5008399c14	Fix use-after-free in case of L2ARC prefetch failure. In case L2ARC read failed, l2arc_read_done() creates _different_ ZIO to read data from the original storage device. Unfortunately pointer to the failed ZIO remains in hdr->b_l1hdr.b_acb->acb_zio_head, and if some other read try to bump the ZIO priority, it will crash. The problem is reproducible by corrupting L2ARC content and reading some data with prefetch if l2arc_noprefetch tunable is changed to 0. With the default setting the issue is probably not reproducible now. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-11-28 18:28:35 +00:00
Andriy Gapon	8491540808	MFV r354383: 10592 misc. metaslab and vdev related ZoL bug fixes illumos/illumos-gate@555d674d5d `555d674d5d` https://www.illumos.org/issues/10592 This is a collection of recent fixes from ZoL: `8eef997679` Error path in metaslab_load_impl() forgets to drop ms_sync_lock `928e8ad47d` Introduce auxiliary metaslab histograms `425d3237ee` Get rid of space_map_update() for ms_synced_length `6c926f426a` Simplify log vdev removal code `21e7cf5da8` zdb -L should skip leak detection altogether `df72b8bebe` Rename range_tree_verify to range_tree_verify_not_present `75058f3303` Remove unused vdev_t fields Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com> MFC after: 4 weeks	2019-11-21 13:35:43 +00:00
Andriy Gapon	489912da7b	MFV r354382,r354385: 10601 10757 Pool allocation classes illumos/illumos-gate@663207adb1 `663207adb1` 10601 Pool allocation classes https://www.illumos.org/issues/10601 illumos port of ZoL Pool allocation classes. Includes at least these two commits: `441709695` Pool allocation classes misplacing small file blocks `cc99f275a` Pool allocation classes 10757 Add -gLp to zpool subcommands for alt vdev names https://www.illumos.org/issues/10757 Port from ZoL of `d2f3e292d` Add -gLp to zpool subcommands for alt vdev names Note that a subsequent ZoL commit changed -p to -P `a77f29f93` Change full path subcommand flag from -p to -P Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Portions contributed by: Håkan Johansson <f96hajo@chalmers.se> Portions contributed by: Richard Yao <ryao@gentoo.org> Portions contributed by: Chunwei Chen <david.chen@nutanix.com> Portions contributed by: loli10K <ezomori.nozomu@gmail.com> Author: Don Brady <don.brady@delphix.com> 11541 allocation_classes feature must be enabled to add log device illumos/illumos-gate@c1064fd7ce `c1064fd7ce` https://www.illumos.org/issues/11541 After the allocation_classes feature was integrated, one can no longer add a log device to a pool unless that feature is enabled. There is an explicit check for this, but it is unnecessary in the case of log devices, so we should handle this better instead of forcing the feature to be enabled. Author: Jerry Jelinek <jerry.jelinek@joyent.com> FreeBSD notes. I faithfully added the new -g, -L, -P flags, but only -g does something: vdev GUIDs are displayed instead of device names. -L, resolve symlinks, and -P, display full disk paths, do nothing at the moment. The use of special vdevs is backward compatible for read-only access, so root pools should be bootable, but exercise caution. MFC after: 4 weeks	2019-11-21 08:20:05 +00:00
Andriy Gapon	a8c08e008a	MFV r354378,r354379,r354386: 10499 Multi-modifier protection (MMP) 10499 Multi-modifier protection (MMP) illumos/illumos-gate@e0f1c0afa4 `e0f1c0afa4` https://www.illumos.org/issues/10499 Port the following ZFS commits from ZoL to illumos. `379ca9cf2` Multi-modifier protection (MMP) `bbffb59ef` Fix multihost stale cache file import `0d398b256` Do not initiate MMP writes while pool is suspended 10701 Correct lock ASSERTs in vdev_label_read/write illumos/illumos-gate@58447f688d `58447f688d` https://www.illumos.org/issues/10701 Port of ZoL commit: `0091d66f4e` Correct lock ASSERTs in vdev_label_read/write At a minimum, this fixes a blown assert during an MMP test run when running on a DEBUG build. 11770 additional mmp fixes illumos/illumos-gate@4348eb9012 `4348eb9012` https://www.illumos.org/issues/11770 Port a few additional MMP fixes from ZoL that came in after our initial MMP port. `4ca457b065` ZTS: Fix mmp_interval failure `ca95f70dff` zpool import progress kstat (only minimal changes from above can be pulled in right now) `060f0226e6` MMP interval and fail_intervals in uberblock Note from the committer (me). I do not have any use for this feature and I have not tested it. I only did smoke testing with multihost=off. Please be aware. I merged the code only to make future merges easier. Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Portions contributed by: Tim Chase <tim@chase2k.com> Portions contributed by: sanjeevbagewadi <sanjeev.bagewadi@gmail.com> Portions contributed by: John L. Hammond <john.hammond@intel.com> Portions contributed by: Giuseppe Di Natale <dinatale2@llnl.gov> Portions contributed by: Prakash Surya <surya1@llnl.gov> Portions contributed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Olaf Faaland <faaland1@llnl.gov> MFC after: 4 weeks	2019-11-18 09:38:35 +00:00
Konstantin Belousov	a7af4a3e7d	amd64: move GDT into PCPU area. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22302	2019-11-12 15:51:47 +00:00
Andriy Gapon	930db3e338	MFV r354377: 10554 Implemented zpool sync command illumos/illumos-gate@9c2acf00e2 `9c2acf00e2` https://www.illumos.org/issues/10554 During the port of MMP (illumos bug 10499) from ZoL, I found this earlier ZoL project is a prerequisite. Here is the original description. This addition will enable us to sync an open TXG to the main pool on demand. The functionality is similar to 'sync(2)' but 'zpool sync' will return when data has hit the main storage instead of potentially just the ZIL as is the case with the 'sync(2)' cmd. Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Author: Alek Pinchuk <apinchuk@datto.com> MFC after: 3 weeks Relnotes: possibly	2019-11-07 11:18:28 +00:00
Alexander Motin	4cd20c3b08	Add vfs.zfs.zio.taskq_batch_pct tunable. MFC after: 1 week	2019-11-05 15:19:05 +00:00
Andriy Gapon	ec03988887	fix up r354333, make zfsproc visible to dtrace, rename to system_proc I overlooked the fact that zfsproc is required by dtrace modules that use illumos compatible taskq KPI. So, move the symbol definition to the opensolaris module that provides compatibility support for both ZFS and DTrace. Also, rename zfsproc to system_proc to reflect that it is not specific to ZFS. Reported by: ae MFC after: 5 weeks X-MFC with: ae	2019-11-05 14:34:59 +00:00
Andriy Gapon	eb819923ec	zfs: enable SPA_PROCESS on the kernel side The purpose of this change is to group kernel threads specific to a particular ZFS pool under a kernel process. There can be many dozens of threads per pool. This change improves observability of those threads. This change consists of several subchanges: 1. illumos taskq_create_proc can now pass its process parameter to taskqueue. Also, use zfsproc instead of NULL for taskq_create. Caveat: zfsproc might not be initialized yet. But in that case it is still NULL, so not worse than before. 2. illumos sys/proc.h: kthread id is stored in t_did field, not t_tid. 3. zfs: enable SPA_PROCESS on the kernel side. The change is a bit hairy as newproc() is implemented privately to spa.c. I couldn't think of a better way to populate process name than to poke inside the argument for the process routine. 4. illumos thread_create: allow assigning thread to process other than zfsproc. 5. zfs: expose spa_proc to other users, assign sync and quiesce threads to it. Pool-specific threads created using (relatively new) zthr mechanism are still assigned to the zfskern process rather than to a respective zpool-xxx process. I am going to address this a bit later. Reviewed by: no one MFC after: 5 weeks Relnotes: perhaps Differential Revision: https://reviews.freebsd.org/D9720	2019-11-04 13:30:37 +00:00
Toomas Soome	79a4bf8975	loader: factor out label and uberblock load from vdev_probe, add MMP checks Clean up the label read.	2019-11-03 21:19:52 +00:00
Toomas Soome	0c0a882c7a	loader: we do not support booting from pool with log device If pool has log device, stop there and tell about it.	2019-11-03 13:25:47 +00:00
Toomas Soome	abca0bd501	loader: calculate physical vdev psize from asize Since physical device asize is calculated from psize and the asize is stored in pool label, we can use asize to set the value of psize, which is used to calculate the location of the pool labels. MFC after: 1 week	2019-11-03 11:09:06 +00:00
Toomas Soome	24e1a7ac77	r354253 did miss the fact that libzpool is built as fake kernel We build libzpool as kernel like, use _FAKE_KERNEL check to include kernel api in libzpool.	2019-11-02 21:02:54 +00:00
Toomas Soome	25cf531ecd	r354253 did miss lz4.c from sys/cddl/boot/zfs.	2019-11-02 15:08:19 +00:00
Toomas Soome	e499793e76	Remove duplicate lz4 implementations Port illumos change: https://www.illumos.org/issues/11667 Move lz4.c out of zfs tree to opensolaris/common/lz4, adjust it to be usable from kernel/stand/userland builds, so we can use just one single source. Add lz4.h to declare lz4_compress() and lz4_decompress(). MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D22037	2019-11-02 12:28:04 +00:00
Alexander Motin	a4d5fcadd8	FreeBSD'fy ZFS zlib zalloc/zfree callbacks. The previous code came from OpenSolaris, which in my understanding requires the allocation size to be known to free memory. To store that size the previous code allocated an additional 8-byte header. But I have noticed that zlib with present settings allocates 64KB context buffers for each call, which could be efficiently cached by UMA, but the addition of those 8 bytes makes them fall back to physical RAM allocations, which cause huge overhead and lock congestion on small blocks. Since FreeBSD's free() does not have the size argument, switching to it solves the problem, increasing write speed to ZVOLs with 4KB block size and GZIP compression on my 40-threads test system from ~60MB/s to ~600MB/s. MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-10-29 21:25:19 +00:00
Toomas Soome	903fe2b762	loader: zio_checksum_verify should check byteswap We do have both native and byteswap checksum callbacks in place but the selection is not wired. MFC after: 1 week	2019-10-27 08:35:29 +00:00
Alan Somers	1af3a11218	MFZoL: Avoid retrieving unused snapshot props This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it to take input parameters that alter the way looping through the list of snapshots is performed. The idea here is to restrict functions that throw away some of the snapshots returned by the ioctl to a range of snapshots that these functions actually use. This improves efficiency and execution speed for some rollback and send operations. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #8077 zfsonlinux/zfs@4c0883fb4a MFC after: 2 weeks	2019-10-26 17:11:02 +00:00
Konstantin Belousov	5b87ecc643	Assert that vnode_pager_setsize() is called with the vnode exclusively locked except for filesystems that set the MNTK_VMSETSIZE_BUG. Set the flag for ZFS. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 16:21:24 +00:00
Andriy Gapon	b6528d546f	MFV r353637: 10844 Serialize ZTHR operations to eliminate races illumos/illumos-gate@6a316e1f6d `6a316e1f6d` https://www.illumos.org/issues/10844 ZoL `61c3391acc` Serialize ZTHR operations to eliminate races Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com> Obtained from: illumos, ZoL MFC after: 3 weeks	2019-10-16 09:29:01 +00:00
Andriy Gapon	428c45f156	MFV r353630: 10809 Performance optimization of AVL tree comparator functions illumos/illumos-gate@c4ab0d3f46 `c4ab0d3f46` https://www.illumos.org/issues/10809 Port ZoL `ee36c709c3` Performance optimization of AVL tree comparator functions This is a followup to r337567 that imported the ZoL commit directly into FreeBSD. It seems that at the time we did not have some of the earlier changes, so some pieces of the ZoL change were not applicable. Also, the illumos version got a few style cleanups. Some changes were missed or incorrectly merged (e.g., vdev_cache_lastused_compare and metaslab_rangesize_compare). Obtained from: ZoL, illumos MFC after: 25 days X-MFC after: r353634	2019-10-16 09:20:08 +00:00
Andriy Gapon	786c532a8f	MFV r348596: 9689 zfs range lock code should not be zpl-specific illumos/illumos-gate@7931524763 FreeBSD note: some tweaking was needed to avoid a conflict with sys/rangelock.h. Author: Matthew Ahrens <mahrens@delphix.com> Obtained from: illumos MFC after: 3 weeks	2019-10-16 09:04:53 +00:00
Andriy Gapon	f6a4b91c75	MFV r353628: 10842 Mutex leak in dsl_dataset_hold_obj() illumos/illumos-gate@ad027c0ff9 `ad027c0ff9` https://www.illumos.org/issues/10842 ZoL `d10b2f1d35` Mutex leak in dsl_dataset_hold_obj() Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Author: Jorgen Lundman <lundman@lundman.net> Obtained from: illumos, ZoL MFC after: 15 days	2019-10-16 07:57:58 +00:00
Andriy Gapon	0c4f60b734	MFV r353619: 9691 fat zap should prefetch when iterating illumos/illumos-gate@52abb70e07 `52abb70e07` https://www.illumos.org/issues/9691 When iterating over a ZAP object, we're almost always certain to iterate over the entire object. If there are multiple leaf blocks, we can realize a performance win by issuing reads for all the leaf blocks in parallel when the iteration begins. For example, if we have 10,000 snapshots, "zfs destroy -nv pool/fs@1%9999" can take 30 minutes when the cache is cold. This change provides a >3x performance improvement, by issuing the reads for all ~64 blocks of each ZAP object in parallel. Author: Matthew Ahrens <mahrens@delphix.com> Obtained from: illumos MFC after: 2 weeks	2019-10-16 07:09:00 +00:00
Andriy Gapon	9efb961d9a	MFV r353617: 9425 allow channel programs to be stopped via signals illumos/illumos-gate@d0cb1fb926 `d0cb1fb926` https://www.illumos.org/issues/9425 Problem Statement ZFS Channel program scripts currently require a timeout, so that hung or long-running scripts return a timeout error instead of causing ZFS to get wedged. This limit can currently be set up to 100 million Lua instructions. Even with a limit in place, it would be desirable to have a sys admin (support engineer) be able to cancel a script that is taking a long time. Proposed Solution Make it possible to abort a channel program by sending an interrupt signal. In the underlying txg_wait_sync function, switch the cv_wait to a cv_wait_sig to catch the signal. Once a signal is encountered, the dsl_sync_task function can install a Lua hook that will get called before the Lua interpreter executes a new line of code. The dsl_sync_task can resume with a standard txg_wait_sync call and wait for the txg to complete. Meanwhile, the hook will abort the script and indicate that the channel program was canceled. The kernel returns an EINTR to indicate that the channel program run was canceled. FreeBSD note: the return value of cv_wait_sig() has inverted meaning between us and illumos. Author: Don Brady <don.brady@delphix.com> Obtained from: illumos MFC after: 4 weeks	2019-10-16 07:00:18 +00:00
Andriy Gapon	179e6dab09	MFV r353615: 9485 Optimize possible split block search space illumos/illumos-gate@a21fe34979 `a21fe34979` https://www.illumos.org/issues/9485 Port this commit from ZoL: `4589f3ae4c` Author: Brian Behlendorf <behlendorf1@llnl.gov> Obtained from: illumos, ZoL MFC after: 3 weeks	2019-10-16 06:43:22 +00:00
Andriy Gapon	7149963e95	MFV r353613: 10731 zfs: NULL pointer errors FreeBSD already had these changes locally. This commit removes a small formatting difference. MFC after: 1 week	2019-10-16 06:38:05 +00:00
Andriy Gapon	6cb9ab2bad	MFC r353611: 10330 merge recent ZoL vdev and metaslab changes illumos/illumos-gate@a0b03b161c `a0b03b161c` https://www.illumos.org/issues/10330 3 recent ZoL changes in the vdev and metaslab code which we can pull over: PR 8324 `c853f382db` 8324 Change target size of metaslabs from 256GB to 16GB PR 8290 `b194fab0fb` 8290 Factor metaslab_load_wait() in metaslab_load() PR 8286 `419ba59145` 8286 Update vdev_is_spacemap_addressable() for new spacemap encoding Author: Serapheim Dimitropoulos <serapheimd@gmail.com> Obtained from: illumos, ZoL MFC after: 2 weeks	2019-10-16 06:26:51 +00:00
Andriy Gapon	b399ca755a	MFV r353608: 10165 libzpool: passing argument 1 to restrict-qualified parameter illumos/illumos-gate@f91fcf59ac `f91fcf59ac` https://www.illumos.org/issues/10165 Author: Toomas Soome <tsoome@me.com> MFC after: 10 days	2019-10-16 06:09:00 +00:00
Andriy Gapon	67f8ab8ebb	fix up r353565, somehow a few files did not get committed MFC after: 3 weeks X-MFC with: r353565	2019-10-15 15:52:01 +00:00
Andriy Gapon	6f2721b907	MFV r353561: 10343 ZoL: Prefix all refcount functions with zfs_ illumos/illumos-gate@e914ace2e9 `e914ace2e9` https://www.illumos.org/issues/10343 On the openzfs feature/porting matrix, this is listed as: prefix to refcount funcs/types Having these changes will make it easier to share other work across the different ZFS operating systems. PR 7963 `424fd7c3e` Prefix all refcount functions with zfs_ PR 7885 & 7932 `c13060e47` Linux 4.19-rc3+ compat: Remove refcount_t compat PR 5823 & 5842 `4859fe796` Linux 4.11 compat: avoid refcount_t name conflict Author: Tim Schumacher <timschumi@gmx.de> Obtained from: illumos, ZoL MFC after: 3 weeks	2019-10-15 15:09:36 +00:00
Andriy Gapon	563db1a947	MFV r353558: 10572 10579 Fix race in dnode_check_slots_free() illumos/illumos-gate@aa02ea0194 `aa02ea0194` 10572 Fix race in dnode_check_slots_free() https://www.illumos.org/issues/10572 The Fix from ZoL: Currently, dnode_check_slots_free() works by checking dn->dn_type in the dnode to determine if the dnode is reclaimable. However, there is a small window of time between dnode_free_sync() in the first call to dsl_dataset_sync() and when the useraccounting code is run when the type is set DMU_OT_NONE, but the dnode is not yet evictable, leading to crashes. This patch adds the ability for dnodes to track which txg they were last dirtied in and adds a check for this before performing the reclaim. This patch also corrects several instances when dn_dirty_link was treated as a list_node_t when it is technically a multilist_node_t. 10579 Don't allow dnode allocation if dn_holds != 0 https://www.illumos.org/issues/10579 The fix from ZoL: This patch simply fixes a small bug where dnode_hold_impl() could attempt to allocate a dnode that was in the process of being freed, but which still had active references. This patch simply adds the required check. Author: Tom Caputi <tcaputi@datto.com> Reported by: delphij MFC after: 2 weeks X-MFC with: r353176	2019-10-15 14:29:18 +00:00
Andriy Gapon	4368589338	MFV r353551: 10452 ZoL: merge in large dnode feature fixes illumos/illumos-gate@946342a260 `946342a260` https://www.illumos.org/issues/10452 illumos is missing a few small follow up ZoL bug fixes for the large dnode feature. We should pull those in. Those commits are in the ZoL tree as (newest to oldest): PR 8435 - `75d6b7ddca` - Add missing copyright notice to large_dnode tests PR 7433 - `e14a32b1c8` - Fix object reclaim when using large dnodes PR 6616 - `48fbb9ddbf` - Free objects when receiving full stream as clone PR 6695 - `39f56627ae` - receive_freeobjects() skips freeing some object Portions contributed by: Ned Bass <bass6@llnl.gov> Portions contributed by: Tom Caputi <tcaputi@datto.com> Author: Fabian Grünbichler <f.gruenbichler@proxmox.com> Obtained from: illumos, ZoL MFC after: 2 weeks X-MFC with: r353176	2019-10-15 14:20:11 +00:00
Jeff Roberson	0012f373e4	(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594	2019-10-15 03:45:41 +00:00
Mateusz Guzik	8fd727827c	zfs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:41:30 +00:00
Andriy Gapon	d0a9542494	fix up r353340, don't assume that fcmpset has strong semantics fcmpset can have two kinds of semantics, weak and strong. For practical purposes, strong semantics means that if fcmpset fails then the reported current value is always different from the expected value. Weak semantics means that the reported current value may be the same as the expected value even though fcmpset failed. That's a so-called "sporadic" failure. I originally implemented atomic_cas expecting strong semantics, but many platforms actually have the weak one. Reported by: pkubaj (not confirmed if same issue) Discussed with: kib, mjg MFC after: 19 days X-MFC with: r353340	2019-10-11 17:01:02 +00:00
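The weak versus strong distinction in the entry above can be illustrated in portable C11. This is a minimal userland sketch, not the FreeBSD change itself: it assumes C11 `atomic_compare_exchange_weak` as a stand-in for a weak fcmpset, and `cas32_sketch` is a hypothetical name. The point is that a compare-and-swap that returns the old value must retry when the primitive fails but still reports the expected value, otherwise the caller would read that return value as success.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical illumos-style compare-and-swap built on a weak primitive.
 * Returns the value observed in *target: equal to cmp means the swap
 * happened, anything else means it did not.
 */
static inline uint32_t
cas32_sketch(_Atomic uint32_t *target, uint32_t cmp, uint32_t newval)
{
	uint32_t expected = cmp;

	/*
	 * A weak compare-exchange may fail spuriously and still leave
	 * expected == cmp.  Retry in that case only; a failure that
	 * reports a different value is genuine and ends the loop.
	 */
	while (!atomic_compare_exchange_weak(target, &expected, newval)) {
		if (expected != cmp)
			break;
	}
	return (expected);
}
```

A caller compares the return value with `cmp`: equality means the store took place, and with the retry loop that conclusion holds even on platforms where the underlying primitive is weak.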
Alan Somers	34e9a37f4d	MFZoL: Fix performance of "zfs recv" with many deletions This patch fixes 2 issues with the DMU free throttle implemented in dmu_free_long_range(). The first issue is that get_next_chunk() was calculating the number of L1 blocks the free would dirty incorrectly. In some cases involving extremely large files, this code would greatly overestimate the number of affected L1 blocks, causing excessive calls to txg_wait_open(). This patch corrects the calculation. The second issue is that the free throttle uses the total number of free'd blocks in all (open, quiescing, and syncing) txgs to determine whether to throttle. This causes large frees (such as those created by the first issue) to cause 4 txg syncs before any further frees were allowed to proceed. This patch ensures that the accounting is done entirely in a per-txg fashion, so that frees from a given txg don't affect those that immediately follow it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> zfsonlinux/zfs@f4c594da94 Freeing throttle should account for holes Deletion throttle currently does not account for holes in a file. This means that it can activate when it shouldn't. To fix it we switch the throttle to be based on the number of L1 blocks we will have to dirty when freeing. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> zfsonlinux/zfs@65282ee9e0 Submitted by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: allanjude MFC after: 2 weeks Sponsored by: Axcient Differential Revision: https://reviews.freebsd.org/D21895	2019-10-11 14:59:28 +00:00
Andriy Gapon	d0c0856f63	emulate illumos membar_producer with atomic_thread_fence_rel membar_producer is supposed to be a store-store barrier. Also, in the code that FreeBSD has ported from illumos membar_producer is used only with regular stores to regular memory (with respect to caching). We do not have an MI primitive for the store-store barrier, so atomic_thread_fence_rel is the closest we have as it provides (load \| store) -> store barrier. Previously, membar_producer was an empty function call on all 32-bit arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was inadequate. On other platforms, such as amd64, arm64, i386, powerpc64, sparc64, membar_producer was implemented using stronger primitives than required for a store-store barrier with respect to regular memory access. For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO). On powerpc64 we now use recommended lwsync instead of eieio. On sparc64 FreeBSD uses TSO mode. On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this is an improvement, actually. After this change we can drop opensolaris_atomic.S for aarch64, amd64, powerpc64 and sparc64 as all required atomic operations have either direct or light-weight mapping to FreeBSD native atomic operations. Discussed with: kib MFC after: 4 weeks	2019-10-10 07:39:41 +00:00
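The producer-side barrier described in the entry above can be sketched in portable C11. The sketch assumes `atomic_thread_fence(memory_order_release)` as a userland analogue of atomic_thread_fence_rel; the `mailbox` type and both function names are hypothetical and do not come from the FreeBSD or illumos sources. The release fence keeps the payload store from being reordered after the flag store, which is the ordering a store-store barrier is meant to provide.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-slot mailbox: a plain payload plus a ready flag. */
struct mailbox {
	int		payload;
	atomic_bool	ready;
};

/* Producer: write the data, fence, then publish the flag. */
static inline void
mailbox_publish(struct mailbox *m, int value)
{
	m->payload = value;
	/* Earlier loads and stores may not move below later stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&m->ready, true, memory_order_relaxed);
}

/* Consumer: check the flag, fence, then read the data. */
static inline bool
mailbox_consume(struct mailbox *m, int *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_relaxed))
		return (false);
	/* Pairs with the release fence in mailbox_publish(). */
	atomic_thread_fence(memory_order_acquire);
	*out = m->payload;
	return (true);
}
```

The release fence is stronger than a pure store-store barrier because it also orders earlier loads, which matches the entry's remark that it is the closest machine-independent primitive available.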
Andriy Gapon	f5c4c7209b	cleanup of illumos compatibility atomics atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms. Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have it. The only exception is sparc64 that provides MD atomic_cas_32 and atomic_cas_64. This is slightly inefficient as fcmpset reports whether the operation updated the target and that information is not needed for cas. Nevertheless, there is less code to maintain and to add for new platforms. Also, the operations are done inline now as opposed to function calls before. atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms that provide it. casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as they have no users. atomic_mtx that is used to emulate 64-bit atomics on platforms that lack them is defined only on those platforms. As a result, platform specific opensolaris_atomic.S files have lost most of their code. The only exception is i386 where the compat+contrib code provides 64-bit atomics for userland use. That code assumes availability of cmpxchg8b instruction. FreeBSD does not have that assumption for i386 userland and does not provide 64-bit atomics. Hopefully, this can and will be fixed. MFC after: 3 weeks	2019-10-09 11:26:36 +00:00
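The mapping of atomic_add_64_nv onto a fetch-and-add primitive, mentioned in the entry above, can be sketched as follows. This is an illustration under assumptions rather than the committed code: it uses C11 `atomic_fetch_add` in place of atomic_fetchadd_64, and `add_64_nv_sketch` is a hypothetical name. Fetch-and-add returns the value before the addition, so the "new value" variant adds the delta once more to the returned value.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical "add and return new value" built on fetch-and-add.
 * Unsigned wraparound makes negative deltas work as subtraction.
 */
static inline uint64_t
add_64_nv_sketch(_Atomic uint64_t *target, int64_t delta)
{
	/* atomic_fetch_add() yields the value held before the addition. */
	uint64_t old = atomic_fetch_add(target, (uint64_t)delta);

	return (old + (uint64_t)delta);
}
```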
Andriy Gapon	ac99b25298	zfs: use atomic_load_64 to read atomic variable in dmu_object_alloc_impl As long as we support ZFS on 32-bit platforms we should do this for all 64-bit variables that are modified in a lockless fashion using atomic operations. Otherwise, there is a risk of reading a torn value. Here is a rationale for why I am doing this in dmu_object_alloc_impl: - it's very recent code - the code deals with object IDs and a number of objects in a file system can overflow 32 bits - incorrect allocation of an object ID may result in hard to debug problems - fixing all plain reads of 64-bit atomic variables is not a trivial undertaking to do in one shot, so I chose to do it incrementally MFC after: 3 weeks X-MFC after: r353301, r353176	2019-10-08 11:27:48 +00:00
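The torn-read concern in the entry above comes from 32-bit platforms, where a plain 64-bit load may be performed as two 32-bit loads and can observe half of a concurrent update. A minimal C11 sketch of the remedy follows, assuming `atomic_load_explicit` as a userland analogue of atomic_load_64; the `id_allocator` type and function names are hypothetical and are not taken from dmu_object_alloc_impl.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical allocator state: a 64-bit counter updated locklessly. */
struct id_allocator {
	_Atomic uint64_t next_id;
};

/* Reserve count consecutive IDs and return the first one. */
static inline uint64_t
id_reserve(struct id_allocator *a, uint64_t count)
{
	return (atomic_fetch_add_explicit(&a->next_id, count,
	    memory_order_relaxed));
}

/*
 * Read the counter with an atomic load, so the value is never a mix of
 * the old and new halves even where 64-bit loads are not single
 * instructions.
 */
static inline uint64_t
id_peek(struct id_allocator *a)
{
	return (atomic_load_explicit(&a->next_id, memory_order_relaxed));
}
```

On some 32-bit targets a C11 compiler implements such 64-bit atomics through a support library, so the cost of the atomic load is platform dependent.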

2306 Commits