freebsd-dev

Author	SHA1	Message	Date
Kristof Provost	effd82ca70	dtrace: fix fbt return probes on RISC-V Return values are passed in a0, so read it from there. We also pass a1 through to userspace, as the ABI allows small structs to be returned in registers a0/a1. While here read the register values directly from the trapframe rather than rtval, and remove the now unneeded argument from dtrace_invop(). Set fbtp_roffset so that we get the correct return location in arg0. Reviewed by: markj Sponsored by: Axiado Differential Revision: https://reviews.freebsd.org/D26389	2020-09-11 09:15:49 +00:00
Mark Johnston	fcc0db1734	Tighten frame pointer checking in DTrace's amd64 stack unwinder. Avoid assuming that the kernel was compiled with -fno-omit-frame-pointer. MFC after: 1 week Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc.	2020-09-01 15:15:44 +00:00
Matt Macy	a86e97e50d	ZFS: band-aid for -DNO_CLEAN Submitted by: Neal Chauhan Approved by: imp@ Differential Revision: https://reviews.freebsd.org/D26183	2020-08-25 23:35:55 +00:00
Matt Macy	9e5787d228	Merge OpenZFS support in to HEAD. The primary benefit is maintaining a completely shared code base with the community allowing FreeBSD to receive new features sooner and with less effort. I would advise against doing 'zpool upgrade' or creating indispensable pools using new features until this change has had a month+ to soak. Work on merging FreeBSD support in to what was at the time "ZFS on Linux" began in August 2018. I first publicly proposed transitioning FreeBSD to (new) OpenZFS on December 18th, 2018. FreeBSD support in OpenZFS was finally completed in December 2019. A CFT for downstreaming OpenZFS support in to FreeBSD was first issued on July 8th. All issues that were reported have been addressed or, for a couple of less critical matters there are pull requests in progress with OpenZFS. iXsystems has tested and dogfooded extensively internally. The TrueNAS 12 release is based on OpenZFS with some additional features that have not yet made it upstream. Improvements include: project quotas, encrypted datasets, allocation classes, vectorized raidz, vectorized checksums, various command line improvements, zstd compression. Thanks to those who have helped along the way: Ryan Moeller, Allan Jude, Zack Welch, and many others. Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25872	2020-08-25 02:21:27 +00:00
Konstantin Belousov	9ce875d9b5	amd64 pmap: LA57 AKA 5-level paging Since LA57 was moved to the main SDM document with revision 072, it seems that we should have support for it, and silicon is coming. This patch makes pmap support both LA48 and LA57 hardware. The selection of page table level is done at startup, kernel always receives control from loader with 4-level paging. It is not clear how UEFI spec would adapt LA57, for instance it could hand out control in LA57 mode sometimes. To switch from LA48 to LA57 requires turning off long mode, requesting LA57 in CR4, then re-entering long mode. This is somewhat delicate and done in pmap_bootstrap_la57(). AP startup in LA57 mode is much easier, we only need to toggle a bit in CR4 and load right value in CR3. I decided to not change kernel map for now. Single PML5 entry is created that points to the existing kernel_pml4 (KML4Phys) page, and a pml5 entry to create our recursive mapping for vtopte()/vtopde(). This decision is motivated by the fact that we cannot overcommit for KVA, so large space there is unusable until machines start providing wider physical memory addressing. Another reason is that I do not want to break our fragile autotuning, so the KVA expansion is not included in this first step. Nice side effect is that minidumps are compatible. On the other hand, (very) large address space is definitely immediately useful for some userspace applications. For userspace, numbering of pte entries (or page table pages) is always done for 5-level structures even if we operate in 4-level mode. The pmap_is_la57() function is added to report the mode of the specified pmap, this is done not to allow simultaneous 4-/5-levels (which is not allowed by hw), but to accommodate EPT, which has separate level control and in principle might not allow 5-level EPT even though x86 paging supports it. Anyway, it does not seem critical to have 5-level EPT support now. Tested by: pho (LA48 hardware) Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273	2020-08-23 20:19:04 +00:00
Warner Losh	773e541e8d	Use devctl.h instead of bus.h to reduce newbus pollution. There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*. Suggested by: kib@	2020-08-21 00:03:24 +00:00
Mariusz Zaborski	277f38abff	zfs: add an option to the bootloader to rewind the ZFS checkpoint The checkpoints are another way of keeping the state of ZFS. During the rewind, the pool has to be exported. This makes checkpoints unusable when using ZFS as root. Add the option to rewind the ZFS checkpoint at the boot time. If checkpoint exists, a new option for rewinding a checkpoint will appear in the bootloader menu. We fully support boot environments. If the rewind option is selected, the boot loader will show a list of boot environments that existed before the checkpoint. Reviewed by: tsoome, allanjude, kevans (ok with high-level overview) Differential Revision: https://reviews.freebsd.org/D24920	2020-08-18 19:48:04 +00:00
Alex Richardson	11412d5bc9	Fix linker error in libuutil with recent LLVM Not marking the function as static can result in a linker error: undefined reference to __assfail [--no-allow-shlib-undefined] I noticed this error after updating our CHERI LLVM to the latest upstream LLVM HEAD revision. This change effectively reverts r329984 and marks dmu_buf_init_user as static (which keeps the GCC build happy). Reviewed By: #zfs, asomers, freqlabs, mav Differential Revision: https://reviews.freebsd.org/D25663	2020-08-07 16:04:21 +00:00
Alex Richardson	ec4deee4e4	Fix cddl tools bootstrapping on macOS and Linux Reviewed By: brooks Differential Revision: https://reviews.freebsd.org/D25979	2020-08-07 16:03:55 +00:00
Toomas Soome	722c2b4aca	MFOpenZFS: Add support for boot environment data to be stored in the label We are building a new bootonce mechanism (previously zfs bootnext) and it is based on this OpenZFS change. Since this patch is nicely self contained, I am committing it as is, and we can stack our changes. Original patch description follows: Modern bootloaders leverage data stored in the root filesystem to enable some of their powerful features. GRUB specifically has a grubenv file which can store large amounts of configuration data that can be read and written at boot time and during normal operation. This allows sysadmins to configure useful features like automated failover after failed boot attempts. Unfortunately, due to the Copy-on-Write nature of ZFS, the standard behavior of these tools cannot handle writing to ZFS files safely at boot time. We need an alternative way to store data that allows the bootloader to make changes to the data. This work is very similar to work that was done on Illumos to enable similar functionality in the FreeBSD bootloader. This patch is different in that the data being stored is a raw grubenv file; this file can store arbitrary variables and values, and the scripting provided by grub is powerful enough that special structures are not required to implement advanced behavior. We repurpose the second padding area in each label to store the grubenv file, protected by an embedded checksum. We add two ioctls to get and set this data, and libzfs_core and libzfs functions to access them more easily. There are no direct command line interfaces to these functions; these will be added directly to the bootloader utilities. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10009 Obtained from: OpenZFS Sponsored by: Netflix, Klara Inc.	2020-08-05 14:32:20 +00:00
Toomas Soome	491ceb65ec	zfs_keys_nextboot array is missing ZPOOL_CONFIG_POOL_GUID and ZPOOL_CONFIG_GUID As we do check the incoming nvlist, we either need to list all possible keys or use a wildcard. PR: 248462 Reported by: larafercue@gmail.com Sponsored by: Netflix, Klara Inc.	2020-08-05 14:08:44 +00:00
Mateusz Guzik	d292b1940c	vfs: remove the obsolete privused argument from vaccess This brings argument count down to 6, which is passable without the stack on amd64.	2020-08-05 09:27:03 +00:00
Mateusz Guzik	e4cdb74faf	zfs: add support for lockless lookup Tested by: pho (in a patchset, previous version) Differential Revision: https://reviews.freebsd.org/D25581	2020-07-25 10:39:41 +00:00
Mateusz Guzik	0379ff6ae3	vfs: introduce vnode sequence counters Modified on each permission change and link/unlink. Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25573	2020-07-25 10:31:52 +00:00
Andriy Gapon	2032c532aa	dtrace/fbt: fix return probe arguments on arm arg0 should be an offset of the return point within the function, arg1 should be the return value. Previously the return probe had arguments as if for the entry probe. Tested on armv7. andrew noted that the same problem seems to be present on arm64, mips, and riscv. I am not sure if I will get around to fixing those. So, platform users or anyone looking to make a contribution please be aware of this opportunity. Reviewed by: markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25685	2020-07-21 07:41:36 +00:00
Mark Johnston	17eee3b501	Fix a memory leak in dsl_scan_visitbp(). This should be triggered only if arc_read() fails, i.e., quite rarely. The same logic is already present in OpenZFS. PR: 247445 Submitted by: jdolecek@NetBSD.org MFC after: 1 week	2020-07-20 17:05:44 +00:00
Andrew Turner	256c5d705a	Don't overflow the trap frame when accessing lr or xzr. When emulating a load pair or store pair in dtrace on arm64 we need to copy the data between the stack and trap frame. When the registers are either the link register or the zero register we will access memory past the end of the trap frame as these are encoded as registers 30 and 31 respectively while the array they access only has 30 entries. Fix this by creating 2 helper functions to perform the operation with special cases for these registers. Sponsored by: Innovate UK	2020-07-17 14:39:07 +00:00
Alan Somers	f60b4812d8	Fix page fault in zfsctl_snapdir_getattr Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I can't reproduce this panic on demand, but this looks like the correct solution. PR: 247668 Reviewed by: avg MFC after: 2 weeks Sponsored by: Axcient Differential Revision: https://reviews.freebsd.org/D25543	2020-07-02 13:17:31 +00:00
Matt Macy	4dc16f4391	Fix "current" variable name conflict with openzfs The variable "current" is an alias for curthread in openzfs. Rename all variable uses of current in dtrace.c to curstate.	2020-06-27 00:57:48 +00:00
Toomas Soome	a14844e0d6	MFOpenZFS: Add basic zfs ioc input nvpair validation We want newer versions of libzfs_core to run against an existing zfs kernel module (i.e. a deferred reboot or module reload after an update). Programmatically document, via a zfs_ioc_key_t, the valid arguments for the ioc commands that rely on nvpair input arguments (i.e. non legacy commands from libzfs_core). Automatically verify the expected pairs before dispatching a command. This initial phase focuses on the non-legacy ioctls. A follow-on change can address the legacy ioctl input from the zfs_cmd_t. The zfs_ioc_key_t for zfs_keys_channel_program looks like: static const zfs_ioc_key_t zfs_keys_channel_program[] = { {"program", DATA_TYPE_STRING, 0}, {"arg", DATA_TYPE_UNKNOWN, 0}, {"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL}, {"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, {"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, }; Introduce four input errors to identify specific input failures (in addition to generic argument value errors like EINVAL, ERANGE, EBADF, and E2BIG). ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type Reviewed by: allanjude Obtained from: OpenZFS Sponsored by: Netflix, Klara Inc. Differential Revision: https://reviews.freebsd.org/D25393	2020-06-23 06:42:39 +00:00
Allan Jude	c5305bb50a	MFOpenZFS: Add zio_ddt_free()+ddt_phys_decref() error handling The assumption in zio_ddt_free() is that ddt_phys_select() must always find a match. However, if that fails due to a damaged DDT or some other reason the code will NULL dereference in ddt_phys_decref(). While this should never happen it has been observed on various platforms. The result is that unless you're willing to patch the ZFS code the pool is inaccessible. Therefore, we're choosing to more gracefully handle this case rather than leave it fatal. http://mail.opensolaris.org/pipermail/zfs-discuss/2012-February/050972.html `5dc6af0eec` Reported by: Pierre Beyssac Obtained from: OpenZFS MFC after: 2 weeks Sponsored by: Klara Inc.	2020-06-22 19:03:02 +00:00
Toomas Soome	3830659e99	loader: create single zfs nextboot implementation We should have nextboot feature implemented in libsa zfs code. To get there, I have created zfs_nextboot() implementation based on two sources, our current simple textual string based approach with added structured boot label PAD structure from OpenZFS. Secondly, all nvlist details are moved to separate source file and restructured a bit. This is done to provide base support to add nvlist add/update feature in followup updates. And finally, the zfsboot/gptzfsboot disk access functions are swapped to use libi386 and libsa. Sponsored by: Netflix, Klara Inc. Differential Revision: https://reviews.freebsd.org/D25324	2020-06-20 06:23:31 +00:00
Allan Jude	9598fc63e6	ZFS: Allow setting checksum=skein on boot pools PR: 245889 Reported by: delphij Sponsored by: Klara Inc.	2020-06-19 17:59:55 +00:00
Rick Macklem	1f7104d720	Fix export_args ex_flags field so that it is 64bits, the same as mnt_flags. Since mnt_flags was upgraded to 64bits there has been a quirk in "struct export_args", since it holds a copy of mnt_flags in ex_flags, which is an "int" (32bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups. This patch revises "struct export_args" to make ex_flags 64bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size). This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.) This patch also deletes the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls. Reviewed by: kib, freqlabs Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25088	2020-06-14 00:10:18 +00:00
Andriy Gapon	04dc03e0fe	fix up r362047: a call to zvol_*_minors() was not hidden from userland Reported by: CI/FreeBSD-head-powerpc64-build MFC after: 5 weeks X-MFC with: r362047	2020-06-11 11:35:30 +00:00
Andriy Gapon	f51f07e1ec	rework how ZVOLs are updated in response to DSL operations With this change all ZVOL updates are initiated from the SPA sync context instead of a mix of the sync and open contexts. The updates are queued to be applied by a dedicated thread in the original order. This should ensure that ZVOLs always accurately reflect the corresponding datasets. ZFS ioctl operations wait on the mentioned thread to complete its work. Thus, the illusion of the synchronous ZVOL update is preserved. At the same time, the SPA sync thread never blocks on ZVOL related operations avoiding problems like reported in bug 203864. This change is based on earlier work in the same direction: D7179 and D14669 by Anthoine Bourgeois. D7179 tried to perform ZVOL operations in the open context and that opened races between them. D14669 uses a design very similar to this change but with different implementation details. This change also heavily borrows from similar code in ZoL, but there are many differences too. See: - `a0bd735adb` - https://github.com/zfsonlinux/zfs/issues/3681 - https://github.com/zfsonlinux/zfs/issues/2217 PR: 203864 MFC after: 5 weeks Sponsored by: CyberSecure Differential Revision: https://reviews.freebsd.org/D23478	2020-06-11 10:41:31 +00:00
Ruslan Bukin	d75038a0af	Fix entering KDB with dtrace-enabled kernel. Reviewed by: markj, jhb Differential Revision: https://reviews.freebsd.org/D24018	2020-05-26 16:44:05 +00:00
Mark Johnston	66b415fb8f	Don't block on the range lock in zfs_getpages(). After r358443 the vnode object lock no longer synchronizes concurrent zfs_getpages() and zfs_write() (which must update vnode pages to maintain coherence). This created a potential deadlock between ZFS range locks and VM page busy locks: a fault on a mapped file will cause the fault page to be busied, after which zfs_getpages() locks a range around the file offset in order to map adjacent, resident pages; zfs_write() locks the range first, and then must busy vnode pages when synchronizing. Solve this by adding a non-blocking mode for ZFS range locks, and using it in zfs_getpages(). If zfs_getpages() fails to acquire the range lock, only the fault page will be populated. Reported by: bdrewery Reviewed by: avg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24839	2020-05-20 18:29:23 +00:00
Toomas Soome	22ed31c23f	lz4 hash table does not start zeroed illumos issue: https://www.illumos.org/issues/12757 Submitted by: andyf	2020-05-19 19:53:12 +00:00
Kyle Evans	47c7d8327c	zfs: reject read(2) of a dirfd with EISDIR This is independent of the recently-discussed global change, which is still in review/discussion stage. This is effectively a measure for consistency in the ZFS world, where FreeBSD was the only platform (as far as I could find) that allowed this. What ZFS exposes is decidedly not useful for any real purposes, to paraphrase (hopefully faithfully) jhb's findings when exploring this: The size of a directory in ZFS is the number of directory entries within. When reading a directory, you would instead get the leading part of its raw contents; the amount you get being dictated by the "size," i.e. number of directory entries. There's decidedly (luckily) no stack disclosure happening here, though the behavior is bizarre and almost certainly a historical accident. This change has already been upstreamed to OpenZFS. MFC after: 1 week	2020-05-19 02:41:05 +00:00
John Baldwin	2c213c2e75	Correct the order of arguments to copyin() for Q_SETQUOTA. MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24656	2020-05-18 16:47:44 +00:00
Pawel Jakub Dawidek	cb761bb2fb	Avoid the GEOM topology lock recursion when we automatically expand a pool. The steps to reproduce the problem: mdconfig -a -t swap -s 3g -u 0 gpart create -s GPT md0 gpart add -t freebsd-zfs -s 1g md0 zpool create -o autoexpand=on foo md0p1 gpart resize -i 1 -s 2g md0	2020-04-25 21:45:31 +00:00
John Baldwin	5c4309b474	Handle non-dtrace-triggered kernel breakpoint traps in mips. If DTRACE is enabled at compile time, all kernel breakpoint traps are first given to dtrace to see if they are triggered by a FBT probe. Previously if dtrace didn't recognize the trap, it was silently ignored breaking the handling of other kernel breakpoint traps such as the debug.kdb.enter sysctl. This only returns early from the trap handler if dtrace recognizes the trap and handles it. Submitted by: Nicolò Mazzucato <nicomazz97@gmail.com> Reviewed by: markj Obtained from: CheriBSD Differential Revision: https://reviews.freebsd.org/D24478	2020-04-21 17:38:07 +00:00
Gleb Smirnoff	9edef911e8	Make ZFS depend on xdr.ko only. It doesn't need kernel RPC. Differential Revision: https://reviews.freebsd.org/D24408	2020-04-17 06:05:08 +00:00
Ryan Moeller	69534635ff	MFOpenZFS: ZVOLs should not be allowed to have children zfs create, receive and rename can bypass this hierarchy rule. Update both userland and kernel module to prevent this issue and use pyzfs unit tests to exercise the ioctls directly. Note: this commit slightly changes zfs_ioc_create() ABI. This allows differentiating a generic error (EINVAL) from the specific case where we tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT). Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Approved by: mav (mentor) MFC after: 2 weeks Sponsored by: iXsystems, Inc. openzfs/zfs@d8d418ff0c	2020-03-25 15:56:18 +00:00
Alexander Motin	d3c6ba3214	MFOpenZFS: make zil max block size tunable We've observed that on some highly fragmented pools, most metaslab allocations are small (~2-8KB), but there are some large, 128K allocations. The large allocations are for ZIL blocks. If there is a lot of fragmentation, the large allocations can be hard to satisfy. The most common impact of this is that we need to check (and thus load) lots of metaslabs from the ZIL allocation code path, causing sync writes to wait for metaslabs to load, which can take a second or more. In the worst case, we may not be able to satisfy the allocation, in which case the ZIL will resort to txg_wait_synced() to ensure the change is on disk. To provide a workaround for this, this change adds a tunable that can reduce the size of ZIL blocks. External-issue: DLPX-61719 Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #8865 openzfs/zfs@b8738257c2 MFC after: 2 weeks	2020-03-19 01:05:54 +00:00
Alexander Motin	cf2f2eb568	Fix infinite scan on a pool with only special allocations Attempt to run scrub or resilver on a new pool containing only special allocations (special vdev added on creation) caused infinite loop because of dsl_scan_should_clear() limiting memory usage to 5% of pool size, which it calculated accounting only normal allocation class. Addition of special and just in case dedup classes fixes the issue. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #10106 Closes #8694 openzfs/zfs@fa130e010c	2020-03-16 19:03:10 +00:00
Ryan Moeller	9f24784038	TODO DONE: Use sx_xholder in SPL rwlock.h Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-03-14 00:16:15 +00:00
Konstantin Belousov	d5b7401f64	zfs dmu_read: loosen the assertion. Since switch to the lockless grab, shared busy for ahead/behind pages allows other threads to validate and map the pages readonly. Reviewed by: avg, jeff, markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23986	2020-03-06 21:15:25 +00:00
Alexander Motin	5c940cf1ff	Remove vfs.zfs.top_maxinflight tunable/sysctl. It is dead since sorted scrub import at r334844. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-03-05 19:43:43 +00:00
Alexander Motin	e37d5c12e9	Increase number of write completion threads, matching ZoL. Our iSCSI benchmarks on a large 80-core system show that previous limit of 8 threads can be a bottleneck. At some points this change increases write IOPS by as much as 50%. I am still not sure that so many threads is really required, but we tested lower amounts and got no significant benefits, while latencies were a bit worse, so decided to not diverge. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-03-03 15:05:13 +00:00
Jeff Roberson	9defe1c076	Eliminate object locking in zfs where possible with the new lockless grab APIs. Reviewed by: kib, markj, mmacy Differential Revision: https://reviews.freebsd.org/D23848	2020-02-28 20:29:53 +00:00
Mark Johnston	a7261520ba	Clear systrace_args_func when systrace probes are disabled. This function pointer is invalidated when systrace.ko is unloaded. Reported by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-02-28 17:04:36 +00:00
Andriy Gapon	40b1e0dc0e	remove stray space symbol in r358380 MFC after: 1 week X-MFC with: r358380	2020-02-27 14:27:42 +00:00
Andriy Gapon	6d11243ae2	use ZFS_MAX_DATASET_NAME_LEN instead of MAXPATHLEN for dataset names The change affects only FreeBSD specific code as the common code already mostly uses the more idiomatic and correct ZFS_MAX_DATASET_NAME_LEN. MFC after: 1 week	2020-02-27 14:21:01 +00:00
Andriy Gapon	6b47663df5	dsl_dataset_promote_sync: populate 'oldname' before using it It's very unlikely that zfsvfs_update_fromname() and zvol_rename_minors() ever did anything during the promote operation as the old name was not initialized. MFC after: 1 week	2020-02-27 14:12:43 +00:00
Alexander Motin	a33a65ce22	MFZoL: Relax restriction on zfs_ioc_next_obj() iteration Per the documentation for dnode_next_offset in dnode.c, the "txg" parameter specifies a lower bound on which transaction the dnode can be found in. We are interested in all dnodes that are removed between the first and last transaction in the snapshot. It doesn't need to be created in that snapshot to correspond to a removed file. In fact, the behavior of zfs diff in the test case exactly matches this: the transaction that created the data that was deleted in snapshot "2" was produced before, in snapshot "1", definitely predating the first transaction in snapshot "2". Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@onlight.com> Closes #2081 zfsonlinux/zfs@7290cd3c4e MFC after: 1 week	2020-02-26 20:38:48 +00:00
Toomas Soome	c1c4c81fd7	loader: replace zfs_alloc/zfs_free with malloc/free Use common memory management.	2020-02-26 18:12:12 +00:00
Alexander Motin	0f58760b82	MFZoL: Fix resilver writes in vdev_indirect_io_start This patch addresses an issue found in ztest where resilver write zios that were passed to an indirect vdev would end up being handled as though they were resilver read zios. This caused issues where the zio->io_abd would be both read to and written from at the same time, causing asserts to fail. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8193 zfsonlinux/zfs@5aa95ba0d3 MFC after: 1 week	2020-02-26 16:51:45 +00:00
Alexander Motin	51c04e6cc2	Fix patch mismerge in r358336. MFC after: 1 week	2020-02-26 16:04:24 +00:00
Alexander Motin	f8a7a04b79	MFZoL: Fix issue with scanning dedup blocks as scan ends This patch fixes an issue discovered by ztest where dsl_scan_ddt_entry() could add I/Os to the dsl scan queues between when the scan had finished all required work and when the scan was marked as complete. This caused the scan to spin indefinitely without ending. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010 zfsonlinux/zfs@5e0bd0ae05 MFC after: 1 week	2020-02-26 15:59:46 +00:00
Alexander Motin	308acfcc62	MFZoL: Fix 2 small bugs with cached dsl_scan_phys_t This patch corrects 2 small bugs where scn->scn_phys_cached was not properly updated to match the primary copy when it needed to be. The first resulted in the pause state not being properly updated and the second resulted in the cached version being completely zeroed even if the primary was not. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010 zfsonlinux/zfs@8cb119e3dc MFC after: 1 week	2020-02-26 15:47:40 +00:00
Alexander Motin	4b7f090f8d	MFZoL: Fix txg_sync_thread hang in scan_exec_io() When scn->scn_maxinflight_bytes has not been initialized it's possible to hang on the condition variable in scan_exec_io(). This issue was uncovered by ztest and is only possible when deduplication is enabled through the following call path. txg_sync_thread() spa_sync() ddt_sync_table() ddt_sync_entry() dsl_scan_ddt_entry() dsl_scan_scrub_cb() dsl_scan_enqueuei() scan_exec_io() cv_wait() Resolve the issue by always initializing scn_maxinflight_bytes to a reasonable minimum value. This value will be recalculated in dsl_scan_sync() to pick up changes to zfs_scan_vdev_limit and the addition/removal of vdevs. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7098 zfsonlinux/zfs@f90a30ad1b MFC after: 1 week	2020-02-26 15:45:04 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Alexander Motin	8d8e484d9c	Remove duplicate dbufs accounting. Since AVL already has embedded element counter, use dn_dbufs_count only for dbufs not counted there (bonus buffers) and just add them. This removes two atomics per dbuf life cycle. According to profiler it reduces time spent by dbuf_destroy() inside bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core. This counter is used only on illumos, so for FreeBSD it was just a waste of time. MFC after: 2 weeks	2020-02-07 15:50:47 +00:00
Alexander Motin	c10aea724f	Reduce number of atomic_add() calls in aggsum. Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to re-borrow after the flush. But since asc_borrowed and asc_delta are accessed only while holding asc_lock, it makes no sense to modify as_lower_bound and as_upper_bound in multiple steps. Instead, the new code uses only 2 atomics in all the cases, one per as_*_bound variable. I think even that is overkill, simple atomic store and load could be used here, since all modifications are done under the as_lock, but there are no such primitives in ZFS code now. While there, make borrow code consider previous borrow value, which on mixed request patterns reduces the chance of needing to borrow again if a much larger request follows a tiny one that needed a borrow. Also reduce as_numbuckets from uint64_t to u_int. It makes no sense to use such a large division operation on every aggsum_add(). Reviewed by: Brian Behlendorf, Paul Dagnelie MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-02-06 20:32:53 +00:00
Konstantin Belousov	a421e8786b	Add sys/systm.h to several places that use vm headers. It is needed (but not enough) to use e.g. KASSERT() in inline functions. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-02-04 18:56:26 +00:00
Alexander Motin	ea642c5c38	Few microoptimizations to dbuf layer. Move db_link into the same cache line as db_blkid and db_level. This significantly reduces avl_add() time in dbuf_create() on systems with large RAM and a huge number of dbufs per dnode. Avoid a few accesses to dbuf_caches[].size, which is highly congested under high IOPS and never stays in cache for a long time. Use the local value we are receiving from zfs_refcount_add_many() anyway. Remove cache_size_bytes_max bump from dbuf_evict_one(). I don't see a point in doing it on dbuf eviction after we have done it on insertion in dbuf_rele_and_unlock(). Reviewed by: mahrens, Brian Behlendorf MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-02-04 15:53:51 +00:00
Toomas Soome	4d297e7035	loader: rewrite zfs reader zap code to use malloc First step on removing zfs_alloc. Reviewed by: delphij Differential Revision: https://reviews.freebsd.org/D23433	2020-02-04 07:37:55 +00:00
Warner Losh	58aa35d429	Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Remove indirect sparc64 ifdefs	2020-02-03 17:35:11 +00:00
Alexander Motin	c68c82324f	Unblock kstat.zfs.misc.dbufstats sysctls. It is not so much broken to hide it after we wasted time to collect it. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-02-03 17:10:40 +00:00
Kyle Evans	6a5abb1ee5	Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247	2020-02-02 16:34:57 +00:00
Kyle Evans	c887ac8324	zfs: light refactor to indicate cachedlookup in zfs_lookup If we come from VOP_CACHEDLOOKUP, we must skip the VEXEC check as it will have been done in the caller (vfs_cache_lookup). This is a part of D23247, which may skip the earlier VEXEC check as well if the root fd was opened with O_SEARCH. This one required slightly more work as zfs_lookup may also be called indirectly as VOP_LOOKUP or a couple of other places where we must do the check.	2020-02-02 16:10:33 +00:00
Mateusz Guzik	f0c402e425	zfs: ZFS_WLOCK_TEARDOWN_INACTIVE_WLOCKED -> ZFS_TEARDOWN_INACTIVE_WLOCKED Fix up the argument used in one case as well.	2020-02-01 06:39:10 +00:00
Mateusz Guzik	8c3658c450	zfs: convert z_teardown_inactive_lock to sleepable read-mostly lock This eliminates a global serialisation point. It only gets write locked on unmount. Sample result doing an incremental -j 40 build: before: 173.30s user 458.97s system 2595% cpu 24.358 total after: 168.58s user 254.92s system 2211% cpu 19.147 total	2020-01-31 08:38:38 +00:00
Mateusz Guzik	b076e113af	zfs: provide macros to handle z_teardown_inactive_lock	2020-01-31 08:37:35 +00:00
Mateusz Guzik	42a9f8f21d	zfs: fix spurious lock contention during path lookup ZFS tracks if anything denies VEXEC to allow for a quick check for the common case of path traversal. Use it. Differential Revision: https://reviews.freebsd.org/D22224	2020-01-30 02:16:17 +00:00
Mateusz Guzik	e196f7825e	zfs: use VOP_NEED_INACTIVE Big thanks to Greg V for testing previous versions of the patch. Differential Revision: https://reviews.freebsd.org/D22130	2020-01-30 02:14:10 +00:00
Alexander Motin	da19f62dfa	Map ECKSUM and EFRAGS from ZFS onto real errnos. Make it less confusing when, for example, stat sets errno to 122 because a checksum failed in ZFS: Before: getfacl: /foo/bar: stat() failed: Unknown error: 122 After: getfacl: /foo/bar: stat() failed: Integrity check failed Submitted by: Ryan Moeller <ryan@ixsystems.com> Reviewed by: mckusick, mav MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D22973	2020-01-13 22:06:16 +00:00
Mateusz Guzik	879e0604ee	Add KERNEL_PANICKED macro for use in place of direct panicstr tests	2020-01-12 06:07:54 +00:00
Mateusz Guzik	638af813d9	dtrace: add missing CTLFLAG_MPSAFE annotations	2020-01-12 04:53:22 +00:00
Mateusz Guzik	20fa645666	zfs: add missing CTLFLAG_MPSAFE annotations	2020-01-12 04:53:01 +00:00
Mateusz Guzik	b52d50cf69	vfs: prealloc vnodes in getnewvnode_reserve Having a reserved vnode count does not guarantee that getnewvnodes won't block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instead. The only consumer was always passing "1" as count and never nesting reservations.	2020-01-11 22:58:14 +00:00
Ian Lepore	8bfc473c0e	Remove scary-looking printf output that happens when you kldload dtrace on arm. Replace it with a comment block explaining why the function is empty on 32-bit arm.	2020-01-09 22:51:37 +00:00
Mateusz Guzik	75ad73a8b9	zfs: plug a vnode reserve leak in zfs_make_xattrdir	2020-01-07 04:34:29 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Brandon Bergren	9aafc7c052	[PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations This is a lock-based emulation of 64-bit atomics for kernel use, split off from an earlier patch by jhibbits. This is needed to unblock future improvements that reduce the need for locking on 64-bit platforms by using atomic updates. The implementation allows for future integration with userland atomic64, but as that implies going through sysarch for every use, the current status quo of userland doing its own locking may be for the best. Submitted by: jhibbits (original patch), kevans (mips bits) Reviewed by: jhibbits, jeff, kevans Differential Revision: https://reviews.freebsd.org/D22976	2020-01-02 23:20:37 +00:00
Mark Johnston	9f5632e6c8	Remove page locking for queue operations. With the previous reviews, the page lock is no longer required in order to perform queue operations on a page. It is also no longer needed in the page queue scans. This change effectively eliminates remaining uses of the page lock and also the false sharing caused by multiple pages sharing a page lock. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22885	2019-12-28 19:04:00 +00:00
Mateusz Guzik	6fa079fc3f	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22738	2019-12-16 00:06:22 +00:00
Toomas Soome	3c2db0ef43	loader: rewrite zfs vdev initialization In some cases the pool discovery will get stuck in infinite loop while setting up the vdev children. To fix, we split the vdev setup into two parts, first we create vdevs based on configuration we do get from pool label, then, we process pool config from MOS and update the pool config if needed. Testing done: confirm previously hung loader is not hung any more. MFC after: 1 week	2019-12-15 21:52:40 +00:00
Jeff Roberson	61a74c5ccd	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626	2019-12-15 21:11:15 +00:00
John Baldwin	889ad0b890	Use a callout instead of timeout(9) for delayed zio's. Reviewed by: avg Differential Revision: https://reviews.freebsd.org/D22597	2019-12-13 19:27:51 +00:00
Mateusz Guzik	c8b29d1212	vfs: locking primitives which elide ->v_vnlock and shared locking disablement Both of these features are not needed by many consumers and result in avoidable reads which in turn puts them on profiles due to cache-line ping ponging. On top of that the current lockmgr entry point is slower than necessary single-threaded. As an attempted clean up preparing for other changes, provide new routines which don't support any of the aforementioned features. With these patches in place vop_stdlock and vop_stdunlock disappear from flamegraphs during -j 104 buildkernel. Reviewed by: jeff (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D22665	2019-12-11 23:11:21 +00:00
Mateusz Guzik	abd80ddb94	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
Mark Johnston	bf10551606	Fix an inverted condition introduced in r353539. This would have most likely resulted in read errors causing page leaks. Submitted by: jeff	2019-12-06 23:49:37 +00:00
Konstantin Belousov	fdc6b10d44	Add a VN_OPEN_INVFS flag. vn_open_cred() assumes that it is called from the top-level of a VFS syscall. Writers must call bwillwrite() before locking any VFS resource to wait for cleanup of dirty buffers. ZFS getextattr() and setextattr() VOPs do call vn_open_cred(), which results in wait for unrelated buffers while owning ZFS vnode lock (and ZFS does not use buffer cache). VN_OPEN_INVFS allows caller to skip bwillwrite. Note that ZFS is still incorrect there, because it starts write on an mp and locks a vnode while holding another vnode lock. Reported by: Willem Jan Withagen <wjw@digiware.nl> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-11-29 14:02:32 +00:00
Alexander Motin	5008399c14	Fix use-after-free in case of L2ARC prefetch failure. In case L2ARC read failed, l2arc_read_done() creates _different_ ZIO to read data from the original storage device. Unfortunately pointer to the failed ZIO remains in hdr->b_l1hdr.b_acb->acb_zio_head, and if some other read try to bump the ZIO priority, it will crash. The problem is reproducible by corrupting L2ARC content and reading some data with prefetch if l2arc_noprefetch tunable is changed to 0. With the default setting the issue is probably not reproducible now. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-11-28 18:28:35 +00:00
Andriy Gapon	8491540808	MFV r354383: 10592 misc. metaslab and vdev related ZoL bug fixes illumos/illumos-gate@555d674d5d `555d674d5d` https://www.illumos.org/issues/10592 This is a collection of recent fixes from ZoL: `8eef997679` Error path in metaslab_load_impl() forgets to drop ms_sync_lock `928e8ad47d` Introduce auxiliary metaslab histograms `425d3237ee` Get rid of space_map_update() for ms_synced_length `6c926f426a` Simplify log vdev removal code `21e7cf5da8` zdb -L should skip leak detection altogether `df72b8bebe` Rename range_tree_verify to range_tree_verify_not_present `75058f3303` Remove unused vdev_t fields Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com> MFC after: 4 weeks	2019-11-21 13:35:43 +00:00
Andriy Gapon	489912da7b	MFV r354382,r354385: 10601 10757 Pool allocation classes illumos/illumos-gate@663207adb1 `663207adb1` 10601 Pool allocation classes https://www.illumos.org/issues/10601 illumos port of ZoL Pool allocation classes. Includes at least these two commits: `441709695` Pool allocation classes misplacing small file blocks `cc99f275a` Pool allocation classes 10757 Add -gLp to zpool subcommands for alt vdev names https://www.illumos.org/issues/10757 Port from ZoL of `d2f3e292d` Add -gLp to zpool subcommands for alt vdev names Note that a subsequent ZoL commit changed -p to -P `a77f29f93` Change full path subcommand flag from -p to -P Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Portions contributed by: Håkan Johansson <f96hajo@chalmers.se> Portions contributed by: Richard Yao <ryao@gentoo.org> Portions contributed by: Chunwei Chen <david.chen@nutanix.com> Portions contributed by: loli10K <ezomori.nozomu@gmail.com> Author: Don Brady <don.brady@delphix.com> 11541 allocation_classes feature must be enabled to add log device illumos/illumos-gate@c1064fd7ce `c1064fd7ce` https://www.illumos.org/issues/11541 After the allocation_classes feature was integrated, one can no longer add a log device to a pool unless that feature is enabled. There is an explicit check for this, but it is unnecessary in the case of log devices, so we should handle this better instead of forcing the feature to be enabled. Author: Jerry Jelinek <jerry.jelinek@joyent.com> FreeBSD notes. I faithfully added the new -g, -L, -P flags, but only -g does something: vdev GUIDs are displayed instead of device names. -L, resolve symlinks, and -P, display full disk paths, do nothing at the moment. The use of special vdevs is backward compatible for read-only access, so root pools should be bootable, but exercise caution. MFC after: 4 weeks	2019-11-21 08:20:05 +00:00
Andriy Gapon	a8c08e008a	MFV r354378,r354379,r354386: 10499 Multi-modifier protection (MMP) 10499 Multi-modifier protection (MMP) illumos/illumos-gate@e0f1c0afa4 `e0f1c0afa4` https://www.illumos.org/issues/10499 Port the following ZFS commits from ZoL to illumos. `379ca9cf2` Multi-modifier protection (MMP) `bbffb59ef` Fix multihost stale cache file import `0d398b256` Do not initiate MMP writes while pool is suspended 10701 Correct lock ASSERTs in vdev_label_read/write illumos/illumos-gate@58447f688d `58447f688d` https://www.illumos.org/issues/10701 Port of ZoL commit: `0091d66f4e` Correct lock ASSERTs in vdev_label_read/write At a minimum, this fixes a blown assert during an MMP test run when running on a DEBUG build. 11770 additional mmp fixes illumos/illumos-gate@4348eb9012 `4348eb9012` https://www.illumos.org/issues/11770 Port a few additional MMP fixes from ZoL that came in after our initial MMP port. `4ca457b065` ZTS: Fix mmp_interval failure `ca95f70dff` zpool import progress kstat (only minimal changes from above can be pulled in right now) `060f0226e6` MMP interval and fail_intervals in uberblock Note from the committer (me). I do not have any use for this feature and I have not tested it. I only did smoke testing with multihost=off. Please be aware. I merged the code only to make future merges easier. Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Portions contributed by: Tim Chase <tim@chase2k.com> Portions contributed by: sanjeevbagewadi <sanjeev.bagewadi@gmail.com> Portions contributed by: John L. Hammond <john.hammond@intel.com> Portions contributed by: Giuseppe Di Natale <dinatale2@llnl.gov> Portions contributed by: Prakash Surya <surya1@llnl.gov> Portions contributed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Olaf Faaland <faaland1@llnl.gov> MFC after: 4 weeks	2019-11-18 09:38:35 +00:00
Konstantin Belousov	a7af4a3e7d	amd64: move GDT into PCPU area. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22302	2019-11-12 15:51:47 +00:00
Andriy Gapon	930db3e338	MFV r354377: 10554 Implemented zpool sync command illumos/illumos-gate@9c2acf00e2 `9c2acf00e2` https://www.illumos.org/issues/10554 During the port of MMP (illumos bug 10499) from ZoL, I found this earlier ZoL project is a prerequisite. Here is the original description. This addition will enable us to sync an open TXG to the main pool on demand. The functionality is similar to 'sync(2)' but 'zpool sync' will return when data has hit the main storage instead of potentially just the ZIL as is the case with the 'sync(2)' cmd. Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Author: Alek Pinchuk <apinchuk@datto.com> MFC after: 3 weeks Relnotes: possibly	2019-11-07 11:18:28 +00:00
Alexander Motin	4cd20c3b08	Add vfs.zfs.zio.taskq_batch_pct tunable. MFC after: 1 week	2019-11-05 15:19:05 +00:00
Andriy Gapon	ec03988887	fix up r354333, make zfsproc visible to dtrace, rename to system_proc I overlooked the fact that zfsproc is required by dtrace modules that use illumos compatible taskq KPI. So, move the symbol definition to the opensolaris module that provides compatibility support for both ZFS and DTrace. Also, rename zfsproc to system_proc to reflect that it is not specific to ZFS. Reported by: ae MFC after: 5 weeks X-MFC with: ae	2019-11-05 14:34:59 +00:00
Andriy Gapon	eb819923ec	zfs: enable SPA_PROCESS on the kernel side The purpose of this change is to group kernelthreads specific to a particular ZFS pool under a kernel process. There can be many dozens of threads per pool. This change improves observability of those threads. This change consists of several subchanges: 1. illumos taskq_create_proc can now pass its process parameter to taskqueue. Also, use zfsproc instead of NULL for taskq_create. Caveat: zfsproc might not be initialized yet. But in that case it is still NULL, so not worse than before. 2. illumos sys/proc.h: kthread id is stored in t_did field, not t_tid. 3. zfs: enable SPA_PROCESS on the kernel side. The change is a bit hairy as newproc() is implemented privately to spa.c. I couldn't think of a better way to populate process name than to poke inside the argument for the process routine. 4. illumos thread_create: allow assigning thread to process other than zfsproc. 5. zfs: expose spa_proc to other users, assign sync and quiesce threads to it. Pool-specific threads created using (relatively new) zthr mechanism are still assigned to the zfskern process rather than to a respective zpool-xxx process. I am going to address this a bit later. Reviewed by: no one MFC after: 5 weeks Relnotes: perhaps Differential Revision: https://reviews.freebsd.org/D9720	2019-11-04 13:30:37 +00:00
Toomas Soome	79a4bf8975	loader: factor out label and uberblock load from vdev_probe, add MMP checks Clean up the label read.	2019-11-03 21:19:52 +00:00
Toomas Soome	0c0a882c7a	loader: we do not support booting from pool with log device If pool has log device, stop there and tell about it.	2019-11-03 13:25:47 +00:00
Toomas Soome	abca0bd501	loader: calculate physical vdev psize from asize Since physical device asize is calculated from psize and the asize is stored in pool label, we can use asize to set the value of psize, which is used to calculate the location of the pool labels. MFC after: 1 week	2019-11-03 11:09:06 +00:00
Toomas Soome	24e1a7ac77	r354253 did miss the fact that libzpool is built as fake kernel We build libzpool as kernel like, use _FAKE_KERNEL check to include kernel api in libzpool.	2019-11-02 21:02:54 +00:00
Toomas Soome	25cf531ecd	r354253 did miss lz4.c from sys/cddl/boot/zfs.	2019-11-02 15:08:19 +00:00

2332 Commits