2276 Commits

Author SHA1 Message Date
Mariusz Zaborski
277f38abff zfs: add an option to the bootloader to rewind the ZFS checkpoint
The checkpoints are another way of keeping the state of ZFS.
During the rewind, the pool has to be exported.
This makes checkpoints unusable when using ZFS as root.
Add the option to rewind the ZFS checkpoint at boot time.
If a checkpoint exists, a new option for rewinding the checkpoint will appear in
the bootloader menu.
We fully support boot environments.
If the rewind option is selected, the boot loader will show a list of
boot environments that existed before the checkpoint.

Reviewed by:	tsoome, allanjude, kevans (ok with high-level overview)
Differential Revision:	https://reviews.freebsd.org/D24920
2020-08-18 19:48:04 +00:00
Alex Richardson
11412d5bc9 Fix linker error in libuutil with recent LLVM
Not marking the function as static can result in a linker error:
undefined reference to __assfail [--no-allow-shlib-undefined]
I noticed this error after updating our CHERI LLVM to the latest upstream
LLVM HEAD revision.

This change effectively reverts r329984 and marks dmu_buf_init_user as
static (which keeps the GCC build happy).

Reviewed By:	#zfs, asomers, freqlabs, mav
Differential Revision: https://reviews.freebsd.org/D25663
2020-08-07 16:04:21 +00:00
Alex Richardson
ec4deee4e4 Fix cddl tools bootstrapping on macOS and Linux
Reviewed By:	brooks
Differential Revision: https://reviews.freebsd.org/D25979
2020-08-07 16:03:55 +00:00
Toomas Soome
722c2b4aca MFOpenZFS: Add support for boot environment data to be stored in the label
We are building a new bootonce mechanism (previously zfs bootnext) and it is
based on this OpenZFS change. Since this patch is nicely self-contained,
I am committing it as is, and we can stack our changes.

Original patch description follows:

Modern bootloaders leverage data stored in the root filesystem to
enable some of their powerful features. GRUB specifically has a grubenv
file which can store large amounts of configuration data that can be
read and written at boot time and during normal operation. This allows
sysadmins to configure useful features like automated failover after
failed boot attempts. Unfortunately, due to the Copy-on-Write nature
of ZFS, the standard behavior of these tools cannot handle writing to
ZFS files safely at boot time. We need an alternative way to store
data that allows the bootloader to make changes to the data.

This work is very similar to work that was done on Illumos to enable
similar functionality in the FreeBSD bootloader. This patch is different
in that the data being stored is a raw grubenv file; this file can store
arbitrary variables and values, and the scripting provided by grub is
powerful enough that special structures are not required to implement
advanced behavior.

We repurpose the second padding area in each label to store the grubenv
file, protected by an embedded checksum. We add two ioctls to get and
set this data, and libzfs_core and libzfs functions to access them more
easily. There are no direct command line interfaces to these functions;
these will be added directly to the bootloader utilities.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10009

Obtained from:	OpenZFS
Sponsored by:	Netflix, Klara Inc.
2020-08-05 14:32:20 +00:00
Toomas Soome
491ceb65ec zfs_keys_nextboot array is missing ZPOOL_CONFIG_POOL_GUID and ZPOOL_CONFIG_GUID
Since we check the incoming nvlist, we need to either list all possible
keys or use a wildcard.

PR:		248462
Reported by:	larafercue@gmail.com
Sponsored by:	Netflix, Klara Inc.
2020-08-05 14:08:44 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
e4cdb74faf zfs: add support for lockless lookup
Tested by:	pho (in a patchset, previous version)
Differential Revision:	https://reviews.freebsd.org/D25581
2020-07-25 10:39:41 +00:00
Mateusz Guzik
0379ff6ae3 vfs: introduce vnode sequence counters
Modified on each permission change and link/unlink.

Reviewed by:	kib
Tested by:	pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25573
2020-07-25 10:31:52 +00:00
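The sequence-counter idea behind commit 0379ff6ae3 can be sketched in userland C. This is a minimal illustration and not the kernel implementation: `struct demo_vnode` and the `demo_*` functions are hypothetical names, and the example is meant to be exercised from a single thread (a real lockless reader would also need properly ordered, race-free accesses to the guarded fields).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode: a sequence counter plus guarded state. */
struct demo_vnode {
	atomic_uint seqc;	/* odd while a modification is in progress */
	int mode;		/* permission bits guarded by seqc */
};

/* Writer side: bump the counter to odd before changing the guarded state. */
void
demo_write_begin(struct demo_vnode *vp)
{
	unsigned int old = atomic_fetch_add(&vp->seqc, 1);

	assert((old & 1) == 0);	/* no nested writers in this sketch */
}

/* Writer side: bump the counter back to even after the change. */
void
demo_write_end(struct demo_vnode *vp)
{
	atomic_fetch_add(&vp->seqc, 1);
}

/*
 * Reader side: snapshot the mode without taking a lock.  Returns false when
 * a writer was active or completed in between, in which case the caller
 * must retry or fall back to a locked lookup.
 */
bool
demo_read_mode(struct demo_vnode *vp, int *modep)
{
	unsigned int before = atomic_load(&vp->seqc);

	if (before & 1)
		return (false);
	*modep = vp->mode;
	return (atomic_load(&vp->seqc) == before);
}
```

A caller would wrap each permission change or link/unlink in `demo_write_begin()` and `demo_write_end()`, so that a concurrent lockless reader can detect that its snapshot may be stale.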
Andriy Gapon
2032c532aa dtrace/fbt: fix return probe arguments on arm
arg0 should be an offset of the return point within the function, arg1
should be the return value.  Previously the return probe had arguments as
if for the entry probe.

Tested on armv7.

andrew noted that the same problem seems to be present on arm64, mips,
and riscv.
I am not sure if I will get around to fixing those.  So, platform users
or anyone looking to make a contribution please be aware of this
opportunity.

Reviewed by:	markj
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D25685
2020-07-21 07:41:36 +00:00
Mark Johnston
17eee3b501 Fix a memory leak in dsl_scan_visitbp().
This should be triggered only if arc_read() fails, i.e., quite rarely.
The same logic is already present in OpenZFS.

PR:		247445
Submitted by:	jdolecek@NetBSD.org
MFC after:	1 week
2020-07-20 17:05:44 +00:00
Andrew Turner
256c5d705a Don't overflow the trap frame when accessing lr or xzr.
When emulating a load pair or store pair in dtrace on arm64 we need to
copy the data between the stack and trap frame. When the registers are
either the link register or the zero register we will access memory
past the end of the trap frame as these are encoded as registers 30 and
31 respectively while the array they access only has 30 entries.

Fix this by creating 2 helper functions to perform the operation with
special cases for these registers.

Sponsored by:	Innovate UK
2020-07-17 14:39:07 +00:00
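The two helper functions described in commit 256c5d705a might look roughly like the sketch below. The structure layout and the `demo_*` names are assumptions made for illustration, not the real arm64 trap frame; only the register encodings (30 for lr, 31 for xzr) and the 30-entry array come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define	DEMO_NREGS	30	/* x0..x29 live in the array */
#define	DEMO_REG_LR	30	/* link register encoding */
#define	DEMO_REG_XZR	31	/* zero register encoding */

/* Hypothetical, simplified trap frame; not the real arm64 layout. */
struct demo_trapframe {
	uint64_t tf_lr;
	uint64_t tf_x[DEMO_NREGS];
};

/* Read a register by its instruction encoding without indexing past tf_x. */
uint64_t
demo_reg_load(const struct demo_trapframe *tf, unsigned int reg)
{
	assert(reg <= DEMO_REG_XZR);
	if (reg == DEMO_REG_XZR)
		return (0);		/* xzr always reads as zero */
	if (reg == DEMO_REG_LR)
		return (tf->tf_lr);
	return (tf->tf_x[reg]);
}

/* Write a register by its encoding; writes to xzr are discarded. */
void
demo_reg_store(struct demo_trapframe *tf, unsigned int reg, uint64_t val)
{
	assert(reg <= DEMO_REG_XZR);
	if (reg == DEMO_REG_XZR)
		return;
	if (reg == DEMO_REG_LR)
		tf->tf_lr = val;
	else
		tf->tf_x[reg] = val;
}
```

Routing every load-pair and store-pair emulation through helpers like these keeps the special cases in one place instead of repeating the bounds logic at each access.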
Alan Somers
f60b4812d8 Fix page fault in zfsctl_snapdir_getattr
Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I
can't reproduce this panic on demand, but this looks like the correct
solution.

PR:		247668
Reviewed by:	avg
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25543
2020-07-02 13:17:31 +00:00
Matt Macy
4dc16f4391 Fix "current" variable name conflict with openzfs
The variable "current" is an alias for curthread
in openzfs. Rename all variable uses of current
in dtrace.c to curstate.
2020-06-27 00:57:48 +00:00
Toomas Soome
a14844e0d6 MFOpenZFS: Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).

Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.

This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.

The zfs_ioc_key_t for zfs_keys_channel_program looks like:

static const zfs_ioc_key_t zfs_keys_channel_program[] = {
       {"program",     DATA_TYPE_STRING,               0},
       {"arg",         DATA_TYPE_UNKNOWN,              0},
       {"sync",        DATA_TYPE_BOOLEAN_VALUE,        ZK_OPTIONAL},
       {"instrlimit",  DATA_TYPE_UINT64,               ZK_OPTIONAL},
       {"memlimit",    DATA_TYPE_UINT64,               ZK_OPTIONAL},
};

Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).

ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type

Reviewed by:	allanjude
Obtained from:	OpenZFS
Sponsored by:	Netflix, Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D25393
2020-06-23 06:42:39 +00:00
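The key-table validation described in commit a14844e0d6 can be illustrated with a small standalone sketch. All `demo_*` and `DEMO_*` names are hypothetical stand-ins for the real ZFS types and error codes, the input is modelled as a flat array instead of an nvlist, and treating the "unknown" type as "accept any type" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins; the real types and values live in the ZFS headers. */
enum demo_type { DEMO_TYPE_UNKNOWN, DEMO_TYPE_STRING, DEMO_TYPE_UINT64 };
enum demo_err {
	DEMO_OK = 0,
	DEMO_ERR_ARG_UNAVAIL,	/* input argument is not in the key table */
	DEMO_ERR_ARG_REQUIRED,	/* a required argument is missing */
	DEMO_ERR_ARG_BADTYPE	/* an argument has the wrong type */
};
#define	DEMO_ZK_OPTIONAL	0x1

struct demo_key { const char *name; enum demo_type type; int flags; };
struct demo_pair { const char *name; enum demo_type type; };

enum demo_err
demo_check_input(const struct demo_key *keys, size_t nkeys,
    const struct demo_pair *in, size_t nin)
{
	size_t i, k;

	assert(keys != NULL && (in != NULL || nin == 0));
	/* Every supplied pair must match a known key and its type. */
	for (i = 0; i < nin; i++) {
		for (k = 0; k < nkeys; k++)
			if (strcmp(in[i].name, keys[k].name) == 0)
				break;
		if (k == nkeys)
			return (DEMO_ERR_ARG_UNAVAIL);
		/* Assumption: DEMO_TYPE_UNKNOWN in the table accepts any type. */
		if (keys[k].type != DEMO_TYPE_UNKNOWN &&
		    keys[k].type != in[i].type)
			return (DEMO_ERR_ARG_BADTYPE);
	}
	/* Every non-optional key must be present in the input. */
	for (k = 0; k < nkeys; k++) {
		if (keys[k].flags & DEMO_ZK_OPTIONAL)
			continue;
		for (i = 0; i < nin; i++)
			if (strcmp(in[i].name, keys[k].name) == 0)
				break;
		if (i == nin)
			return (DEMO_ERR_ARG_REQUIRED);
	}
	return (DEMO_OK);
}
```

Running a check like this before dispatching the command lets the caller receive a specific error instead of a generic EINVAL.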
Allan Jude
c5305bb50a MFOpenZFS: Add zio_ddt_free()+ddt_phys_decref() error handling
The assumption in zio_ddt_free() is that ddt_phys_select() must
always find a match.  However, if that fails due to a damaged
DDT or some other reason the code will NULL dereference in
ddt_phys_decref().

While this should never happen it has been observed on various
platforms.  The result is that unless you're willing to patch the
ZFS code the pool is inaccessible.  Therefore, we're choosing
to more gracefully handle this case rather than leave it fatal.

http://mail.opensolaris.org/pipermail/zfs-discuss/2012-February/050972.html

5dc6af0eec

Reported by:	Pierre Beyssac
Obtained from:	OpenZFS
MFC after:	2 weeks
Sponsored by:	Klara Inc.
2020-06-22 19:03:02 +00:00
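The error handling added by commit c5305bb50a follows a familiar pattern: check the result of the lookup before dereferencing it. The sketch below uses hypothetical `demo_*` types and functions with a much simpler table than the real DDT, purely to show the shape of the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified dedup table entry. */
struct demo_ddt_phys {
	uint64_t birth;
	uint64_t refcnt;
};

/* Return the entry matching the birth txg, or NULL when none matches. */
struct demo_ddt_phys *
demo_phys_select(struct demo_ddt_phys *phys, size_t n, uint64_t birth)
{
	for (size_t i = 0; i < n; i++)
		if (phys[i].birth == birth)
			return (&phys[i]);
	return (NULL);
}

/*
 * Drop one reference.  A failed lookup is reported to the caller instead
 * of being dereferenced, so a damaged table no longer causes a crash.
 */
bool
demo_ddt_free(struct demo_ddt_phys *phys, size_t n, uint64_t birth)
{
	struct demo_ddt_phys *ddp = demo_phys_select(phys, n, birth);

	if (ddp == NULL)
		return (false);
	assert(ddp->refcnt > 0);
	ddp->refcnt--;
	return (true);
}
```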
Toomas Soome
3830659e99 loader: create single zfs nextboot implementation
We should have nextboot feature implemented in libsa zfs code.
To get there, I have created zfs_nextboot() implementation based on
two sources, our current simple textual string based approach with added
structured boot label PAD structure from OpenZFS.

Secondly, all nvlist details are moved to separate source file and
restructured a bit. This is done to provide base support to add nvlist
add/update feature in followup updates.

And finally, the zfsboot/gptzfsboot disk access functions are swapped to use
libi386 and libsa.

Sponsored by:	Netflix, Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D25324
2020-06-20 06:23:31 +00:00
Allan Jude
9598fc63e6 ZFS: Allow setting checksum=skein on boot pools
PR:		245889
Reported by:	delphij
Sponsored by:	Klara Inc.
2020-06-19 17:59:55 +00:00
Rick Macklem
1f7104d720 Fix export_args ex_flags field so that is 64bits, the same as mnt_flags.
Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it hold a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size).
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)

This patch also deleted the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.

Reviewed by:	kib, freqlabs
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D25088
2020-06-14 00:10:18 +00:00
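The quirk fixed by commit 1f7104d720 is easy to reproduce in isolation: a flag defined above bit 31 does not survive a copy through a 32-bit field. The flag values and function names below are hypothetical, and the sketch uses `uint32_t` instead of `int` for the narrow field to keep the conversion well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag in the upper half of a 64-bit flags word. */
#define	DEMO_FLAG_HIGH	(UINT64_C(1) << 32)
/* Hypothetical flag in the low-order 32 bits. */
#define	DEMO_FLAG_LOW	(UINT64_C(1) << 7)

static_assert(DEMO_FLAG_HIGH > UINT32_MAX, "flag must not fit in 32 bits");

/* Copy 64-bit flags through a 32-bit field, as the old layout did. */
uint64_t
demo_roundtrip32(uint64_t mnt_flags)
{
	uint32_t ex_flags = (uint32_t)mnt_flags;	/* upper 32 bits dropped */

	return (ex_flags);
}

/* Copy 64-bit flags through a 64-bit field, as the revised layout does. */
uint64_t
demo_roundtrip64(uint64_t mnt_flags)
{
	uint64_t ex_flags = mnt_flags;

	return (ex_flags);
}
```

Passing `DEMO_FLAG_HIGH | DEMO_FLAG_LOW` through `demo_roundtrip32()` returns only `DEMO_FLAG_LOW`, which is why new export flags could not be defined before the field was widened.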
Andriy Gapon
04dc03e0fe fix up r362047: a call to zvol_*_minors() was not hidden from userland
Reported by:	CI/FreeBSD-head-powerpc64-build
MFC after:	5 weeks
X-MFC with:	r362047
2020-06-11 11:35:30 +00:00
Andriy Gapon
f51f07e1ec rework how ZVOLs are updated in response to DSL operations
With this change all ZVOL updates are initiated from the SPA sync
context instead of a mix of the sync and open contexts.  The updates are
queued to be applied by a dedicated thread in the original order.  This
should ensure that ZVOLs always accurately reflect the corresponding
datasets.  ZFS ioctl operations wait on the mentioned thread to complete
its work.  Thus, the illusion of the synchronous ZVOL update is
preserved.  At the same time, the SPA sync thread never blocks on ZVOL
related operations avoiding problems like reported in bug 203864.

This change is based on earlier work in the same direction: D7179 and
D14669 by Anthoine Bourgeois.  D7179 tried to perform ZVOL operations
in the open context and that opened races between them.  D14669 uses a
design very similar to this change but with different implementation
details.

This change also heavily borrows from similar code in ZoL, but there are
many differences too.  See:
- a0bd735adb
- https://github.com/zfsonlinux/zfs/issues/3681
- https://github.com/zfsonlinux/zfs/issues/2217

PR:		203864
MFC after:	5 weeks
Sponsored by:	CyberSecure
Differential Revision: https://reviews.freebsd.org/D23478
2020-06-11 10:41:31 +00:00
Ruslan Bukin
d75038a0af Fix entering KDB with dtrace-enabled kernel.
Reviewed by:	markj, jhb
Differential Revision:	https://reviews.freebsd.org/D24018
2020-05-26 16:44:05 +00:00
Mark Johnston
66b415fb8f Don't block on the range lock in zfs_getpages().
After r358443 the vnode object lock no longer synchronizes concurrent
zfs_getpages() and zfs_write() (which must update vnode pages to
maintain coherence).  This created a potential deadlock between ZFS
range locks and VM page busy locks: a fault on a mapped file will cause
the fault page to be busied, after which zfs_getpages() locks a range
around the file offset in order to map adjacent, resident pages;
zfs_write() locks the range first, and then must busy vnode pages when
synchronizing.

Solve this by adding a non-blocking mode for ZFS range locks, and using
it in zfs_getpages().  If zfs_getpages() fails to acquire the range
lock, only the fault page will be populated.

Reported by:	bdrewery
Reviewed by:	avg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24839
2020-05-20 18:29:23 +00:00
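The non-blocking mode introduced by commit 66b415fb8f can be pictured with a try-lock sketch. The one-slot `struct demo_rangelock` and the `demo_*` functions are hypothetical simplifications (a real range lock tracks offsets and lengths); the point is only that the fault path never sleeps on the lock and degrades to populating the fault page alone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot stand-in for a ZFS range lock. */
struct demo_rangelock {
	atomic_flag held;
};

/* Non-blocking acquire: returns false instead of sleeping when contended. */
bool
demo_rangelock_tryenter(struct demo_rangelock *rl)
{
	return (!atomic_flag_test_and_set(&rl->held));
}

void
demo_rangelock_exit(struct demo_rangelock *rl)
{
	atomic_flag_clear(&rl->held);
}

/*
 * Fault handler sketch: the fault page is always populated; the adjacent
 * resident pages are mapped only when the range lock was obtained without
 * blocking.  Returns the number of pages populated.
 */
int
demo_getpages(struct demo_rangelock *rl, int nadjacent)
{
	int npages = 1;			/* the busied fault page */

	assert(nadjacent >= 0);
	if (demo_rangelock_tryenter(rl)) {
		npages += nadjacent;
		demo_rangelock_exit(rl);
	}
	return (npages);
}
```

Because the fault path never waits for the range lock while holding a busied page, the lock-order cycle with the write path described in the commit message cannot form.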
Toomas Soome
22ed31c23f lz4 hash table does not start zeroed
illumos issue: https://www.illumos.org/issues/12757

Submitted by:	andyf
2020-05-19 19:53:12 +00:00
Kyle Evans
47c7d8327c zfs: reject read(2) of a dirfd with EISDIR
This is independent of the recently-discussed global change, which is still
in review/discussion stage.

This is effectively a measure for consistency in the ZFS world, where
FreeBSD was the only platform (as far as I could find) that allowed this.
What ZFS exposes is decidedly not useful for any real purposes, to
paraphrase (hopefully faithfully) jhb's findings when exploring this:

The size of a directory in ZFS is the number of directory entries within.
When reading a directory, you would instead get the leading part of its raw
contents; the amount you get being dictated by the "size," i.e. number of
directory entries. There's decidedly (luckily) no stack disclosure happening
here, though the behavior is bizarre and almost certainly a historical
accident.

This change has already been upstreamed to OpenZFS.

MFC after:	1 week
2020-05-19 02:41:05 +00:00
John Baldwin
2c213c2e75 Correct the order of arguments to copyin() for Q_SETQUOTA.
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24656
2020-05-18 16:47:44 +00:00
Pawel Jakub Dawidek
cb761bb2fb Avoid the GEOM topology lock recursion when we automatically expand a pool.
The steps to reproduce the problem:

	mdconfig -a -t swap -s 3g -u 0
	gpart create -s GPT md0
	gpart add -t freebsd-zfs -s 1g md0
	zpool create -o autoexpand=on foo md0p1
	gpart resize -i 1 -s 2g md0
2020-04-25 21:45:31 +00:00
John Baldwin
5c4309b474 Handle non-dtrace-triggered kernel breakpoint traps in mips.
If DTRACE is enabled at compile time, all kernel breakpoint traps are
first given to dtrace to see if they are triggered by a FBT probe.
Previously if dtrace didn't recognize the trap, it was silently
ignored breaking the handling of other kernel breakpoint traps such as
the debug.kdb.enter sysctl.  This only returns early from the trap
handler if dtrace recognizes the trap and handles it.

Submitted by:	Nicolò Mazzucato <nicomazz97@gmail.com>
Reviewed by:	markj
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D24478
2020-04-21 17:38:07 +00:00
Gleb Smirnoff
9edef911e8 Make ZFS depend on xdr.ko only. It doesn't need kernel RPC.
Differential Revision:	https://reviews.freebsd.org/D24408
2020-04-17 06:05:08 +00:00
Ryan Moeller
69534635ff MFOpenZFS: ZVOLs should not be allowed to have children
zfs create, receive and rename can bypass this hierarchy rule. Update
both userland and kernel module to prevent this issue and use pyzfs
unit tests to exercise the ioctls directly.

Note: this commit slightly changes the zfs_ioc_create() ABI. This allows us to
differentiate a generic error (EINVAL) from the specific case where we
tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>

Approved by:	mav (mentor)
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
openzfs/zfs@d8d418ff0c
2020-03-25 15:56:18 +00:00
Alexander Motin
d3c6ba3214 MFOpenZFS: make zil max block size tunable
We've observed that on some highly fragmented pools, most metaslab
allocations are small (~2-8KB), but there are some large, 128K
allocations.  The large allocations are for ZIL blocks.  If there is a
lot of fragmentation, the large allocations can be hard to satisfy.

The most common impact of this is that we need to check (and thus load)
lots of metaslabs from the ZIL allocation code path, causing sync writes
to wait for metaslabs to load, which can take a second or more.  In the
worst case, we may not be able to satisfy the allocation, in which case
the ZIL will resort to txg_wait_synced() to ensure the change is on
disk.

To provide a workaround for this, this change adds a tunable that can
reduce the size of ZIL blocks.

External-issue: DLPX-61719
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8865
openzfs/zfs@b8738257c2

MFC after:	2 weeks
2020-03-19 01:05:54 +00:00
Alexander Motin
cf2f2eb568 Fix infinite scan on a pool with only special allocations
An attempt to run a scrub or resilver on a new pool containing only special
allocations (special vdev added on creation) caused an infinite loop
because dsl_scan_should_clear() limits memory usage to 5% of pool
size, which it calculated by accounting for only the normal allocation class.

Adding the special and, just in case, dedup classes fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #10106
Closes #8694
openzfs/zfs@fa130e010c
2020-03-16 19:03:10 +00:00
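The arithmetic behind commit cf2f2eb568 can be shown with a small worked example. The structure and function names are hypothetical, and the sketch models only the 5% cap mentioned in the commit message; the real function applies further limits that are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-class allocated byte counts for one pool. */
struct demo_pool_alloc {
	uint64_t normal;
	uint64_t special;
	uint64_t dedup;
};

/* Limit as described for the old code: 5% of the normal class only. */
uint64_t
demo_scan_limit_old(const struct demo_pool_alloc *pa)
{
	return (pa->normal / 20);
}

/* Limit after the fix: 5% of normal + special + dedup allocations. */
uint64_t
demo_scan_limit_new(const struct demo_pool_alloc *pa)
{
	uint64_t alloc = pa->normal + pa->special + pa->dedup;

	assert(alloc >= pa->normal);	/* no overflow in this sketch */
	return (alloc / 20);
}
```

For a pool with 2000 bytes allocated in the special class and nothing elsewhere, the old calculation yields a limit of 0 while the new one yields 100, which matches the described failure: with a zero limit the scan always believes it must clear memory and never makes progress.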
Ryan Moeller
9f24784038 TODO DONE: Use sx_xholder in SPL rwlock.h
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-03-14 00:16:15 +00:00
Konstantin Belousov
d5b7401f64 zfs dmu_read: loosen the assertion.
Since the switch to the lockless grab, shared busy for ahead/behind pages
allows other threads to validate and map the pages readonly.

Reviewed by:	avg, jeff, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D23986
2020-03-06 21:15:25 +00:00
Alexander Motin
5c940cf1ff Remove vfs.zfs.top_maxinflight tunable/sysctl.
It is dead since sorted scrub import at r334844.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-03-05 19:43:43 +00:00
Alexander Motin
e37d5c12e9 Increase number of write completion threads, matching ZoL.
Our iSCSI benchmarks on a large 80-core system show that the previous limit
of 8 threads can be a bottleneck.  At some points this change increases
write IOPS by as much as 50%.  I am still not sure that so many threads
are really required, but we tested lower amounts and got no significant
benefits, while latencies were a bit worse, so we decided not to diverge.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-03-03 15:05:13 +00:00
Jeff Roberson
9defe1c076 Eliminate object locking in zfs where possible with the new lockless grab
APIs.

Reviewed by:	kib, markj, mmacy
Differential Revision:	https://reviews.freebsd.org/D23848
2020-02-28 20:29:53 +00:00
Mark Johnston
a7261520ba Clear systrace_args_func when systrace probes are disabled.
This function pointer is invalidated when systrace.ko is unloaded.

Reported by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-02-28 17:04:36 +00:00
Andriy Gapon
40b1e0dc0e remove stray space symbol in r358380
MFC after:	1 week
X-MFC with:	r358380
2020-02-27 14:27:42 +00:00
Andriy Gapon
6d11243ae2 use ZFS_MAX_DATASET_NAME_LEN instead of MAXPATHLEN for dataset names
The change affects only FreeBSD specific code as the common code already
mostly uses the more idiomatic and correct ZFS_MAX_DATASET_NAME_LEN.

MFC after:	1 week
2020-02-27 14:21:01 +00:00
Andriy Gapon
6b47663df5 dsl_dataset_promote_sync: populate 'oldname' before using it
It's very unlikely that zfsvfs_update_fromname() and
zvol_rename_minors() ever did anything during the promote operation as
the old name was not initialized.

MFC after:	1 week
2020-02-27 14:12:43 +00:00
Alexander Motin
a33a65ce22 MFZoL: Relax restriction on zfs_ioc_next_obj() iteration
Per the documentation for dnode_next_offset in dnode.c, the "txg"
parameter specifies a lower bound on which transaction the dnode can
be found in. We are interested in all dnodes that are removed between
the first and last transaction in the snapshot. It doesn't need to be
created in that snapshot to correspond to a removed file.

In fact, the behavior of zfs diff in the test case exactly matches
this: the transaction that created the data that was deleted in snapshot
"2" was produced before, in snapshot "1", definitely predating the first
transaction in snapshot "2".

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@onlight.com>
Closes #2081
zfsonlinux/zfs@7290cd3c4e

MFC after:	1 week
2020-02-26 20:38:48 +00:00
Toomas Soome
c1c4c81fd7 loader: replace zfs_alloc/zfs_free with malloc/free
Use common memory management.
2020-02-26 18:12:12 +00:00
Alexander Motin
0f58760b82 MFZoL: Fix resilver writes in vdev_indirect_io_start
This patch addresses an issue found in ztest where resilver
write zios that were passed to an indirect vdev would end up
being handled as though they were resilver read zios. This
caused issues where the zio->io_abd would be both read to
and written from at the same time, causing asserts to fail.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8193
zfsonlinux/zfs@5aa95ba0d3

MFC after:	1 week
2020-02-26 16:51:45 +00:00
Alexander Motin
51c04e6cc2 Fix patch mismerge in r358336.
MFC after:	1 week
2020-02-26 16:04:24 +00:00
Alexander Motin
f8a7a04b79 MFZoL: Fix issue with scanning dedup blocks as scan ends
This patch fixes an issue discovered by ztest where
dsl_scan_ddt_entry() could add I/Os to the dsl scan queues
between when the scan had finished all required work and
when the scan was marked as complete. This caused the scan
to spin indefinitely without ending.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010
zfsonlinux/zfs@5e0bd0ae05

MFC after:	1 week
2020-02-26 15:59:46 +00:00
Alexander Motin
308acfcc62 MFZoL: Fix 2 small bugs with cached dsl_scan_phys_t
This patch corrects 2 small bugs where scn->scn_phys_cached was
not properly updated to match the primary copy when it needed to
be. The first resulted in the pause state not being properly
updated and the second resulted in the cached version being
completely zeroed even if the primary was not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010
zfsonlinux/zfs@8cb119e3dc

MFC after:	1 week
2020-02-26 15:47:40 +00:00
Alexander Motin
4b7f090f8d MFZoL: Fix txg_sync_thread hang in scan_exec_io()
When scn->scn_maxinflight_bytes has not been initialized it's
possible to hang on the condition variable in scan_exec_io().
This issue was uncovered by ztest and is only possible when
deduplication is enabled through the following call path.

  txg_sync_thread()
    spa_sync()
      ddt_sync_table()
        ddt_sync_entry()
          dsl_scan_ddt_entry()
            dsl_scan_scrub_cb()
              dsl_scan_enqueuei()
                scan_exec_io()
                  cv_wait()

Resolve the issue by always initializing scn_maxinflight_bytes
to a reasonable minimum value.  This value will be recalculated
in dsl_scan_sync() to pick up changes to zfs_scan_vdev_limit
and the addition/removal of vdevs.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7098
zfsonlinux/zfs@f90a30ad1b

MFC after:	1 week
2020-02-26 15:45:04 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Alexander Motin
8d8e484d9c Remove duplicate dbufs accounting.
Since AVL already has embedded element counter, use dn_dbufs_count
only for dbufs not counted there (bonus buffers) and just add them.
This removes two atomics per dbuf life cycle.

According to profiler it reduces time spent by dbuf_destroy() inside
bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core.

This counter is used only on illumos, so for FreeBSD it was just a
waste of time.

MFC after:	2 weeks
2020-02-07 15:50:47 +00:00
Alexander Motin
c10aea724f Reduce number of atomic_add() calls in aggsum.
Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to
re-borrow after the flush.  But since asc_borrowed and asc_delta are
accessed only while holding asc_lock, it makes no sense to modify
as_lower_bound and as_upper_bound in multiple steps.  Instead of that
the new code uses only 2 atomics in all the cases, one per as_*_bound
variable.  I think even that is overkill, simple atomic store and
load could be used here, since all modifications are done under the
as_lock, but there are no such primitives in ZFS code now.

While there, make the borrow code consider the previous borrow value, so that
mixed request patterns have less chance of needing to borrow again when a
much larger request follows a tiny one that needed a borrow.

Also reduce as_numbuckets from uint64_t to u_int.  It makes no sense
to use so large division operation on every aggsum_add().

Reviewed by:	Brian Behlendorf, Paul Dagnelie
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-02-06 20:32:53 +00:00