Commit Graph

1908 Commits

Author SHA1 Message Date
Martin Matuska
c03c5b1c80 zfs: merge openzfs/zfs@a86e08941 (master) into main
Notable upstream pull request merges:
  #9078:  log xattr=sa create/remove/update to ZIL
  #11919: Cross-platform xattr user namespace compatibility
  #13014: Report dnodes with faulty bonuslen
  #13016: FreeBSD: Fix zvol_cdev_open locking
  #13019: spl: Don't check FreeBSD rwlocks for double initialization
  #13027: Fix clearing set-uid and set-gid bits on a file when
          replying a write
  #13031: Add enumerated vdev names to 'zpool iostat -v' and
          'zpool list -v'
  #13074: Enable encrypted raw sending to pools with greater ashift
  #13076: Receive checks should allow unencrypted child datasets
  #13098: Avoid dirtying the final TXGs when exporting a pool
  #13172: Fix ENOSPC when unlinking multiple files from full pool

Obtained from:	OpenZFS
OpenZFS commit:	a86e089415
2022-03-08 18:53:02 +01:00
Mark Johnston
2d5d2a986c ctf: Import ctf.h from OpenBSD
Use it instead of the existing ctf.h from OpenSolaris.  This makes it
easier to use CTF in the core kernel, and to extend the CTF format to
support wider type IDs.

The imported ctf.h is modified to depend only on _types.h, and also to
provide macros which use the "parent" bit of a type ID to refer to types
in a parent CTF container.

No functional change intended.

Reviewed by:	Domagoj Stolfa, emaste
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34358
2022-03-07 10:43:18 -05:00
Mark Johnston
3a56cfedbc fasttrap: Avoid creating WX mappings
fasttrap instruments certain instructions by overwriting them and
copying the original instruction to some per-thread scratch space which
is executed after the probe fires.  This trampoline jumps back to the
tracepoint after executing the original instruction.

The created mapping has both write and execute permissions, and so this
mechanism doesn't work when allow_wx is disabled.  Work around the
restriction by using proc_rwmem() to write to the trampoline.

Reviewed by:	vangyzen
Tested by:	Amit <akamit91@hotmail.com>
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34304
2022-03-01 12:40:35 -05:00
Mark Johnston
83958173eb fasttrap: Assert that fasttrap_fork() successfully unmaps scratch space
No functional change intended.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-03-01 12:40:35 -05:00
Mateusz Guzik
f17ef28674 fd: rename fget*_locked to fget*_noref
This gets rid of the error prone naming where fget_unlocked returns with
a ref held, while fget_locked requires a lock but provides nothing in
terms of making sure the file lives past unlock.

No functional changes.
2022-02-22 18:53:43 +00:00
Andrew Turner
b5876847ac Teach DTrace about BTI on arm64
The Branch Target Identification (BTI) Armv8-A extension adds new
instructions that can be placed where we may indirrectly branch to,
e.g. at the start of a function called via a function pointer. We can't
emulate these in DTrace as the kernel will have raised a different
exception before the DTrace handler has run.

Skip over the BTI instruction if it's used as the first instruction in
a function.

Sponsored by:	The FreeBSD Foundation
2022-01-19 12:07:35 +00:00
Andriy Gapon
7fdf0e8835 dtrace: add a knob to control maximum size of principal buffers
We had a hardcoded limit of 1/128-th of physical memory that was further
subdivided between all CPUs as principal buffers are allocated on the
per-CPU basis.  Actually, the buffers could use up 1/64-th of the
memmory because with the default switch policy there are two buffers per
CPU.

This commit allows to change that limit.

Note that the discussed limit is per dtrace command invocation.
The idea is to limit the size of a single malloc(9) call, not the total
memory size used by DTrace buffers.

Reviewed by:	markj
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D33648
2022-01-11 15:47:50 +02:00
Domagoj Stolfa
30ec3138ed dtrace: Disable getf() as it is broken on FreeBSD
getf() on FreeBSD calls _sx_slock(), _sx_sunlock() and fget_locked().
Furthermore, it does not set the per-core fault flag, meaning it
usually ends up in a double fault panic once getf() does get called,
especially from fbt.

Reviewing the DTrace Toolkit + a number of other scripts scattered
around FreeBSD, I have not been able to find one use of getf(). Given
how broken the implementation currently is, we disable it until it
can be implemented properly.

Also comment out a test in aggs/tst.subr.d for getf().

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33378
2021-12-17 13:10:22 -05:00
Andrew Turner
b792434150 Create sys/reg.h for the common code previously in machine/reg.h
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).

Reviewed by:	imp, markj
Sponsored by:	DARPA, AFRL (original work)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19830
2021-08-30 12:50:53 +01:00
Ed Maste
b1a217a369 sys/cddl: remove extraneous semicolons
Fixes:		5a1b490d50 ("FreeBSD changes to vendor source.")
Fixes:		91eaf3e183 ("Custom DTrace kernel module...")
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-08-16 10:29:44 -04:00
Konstantin Belousov
66b8eced97 dtrace: use %zu format specifier for data of size_t type
Sponsored by:	The FreeBSD Foundation
2021-08-09 00:27:04 +03:00
Robert Watson
fb581531c1 Teach DTrace that unaligned accesses are OK on aarch64, not just x86.
MFC after:	3 days
Reviewed:	andrew
Differential Revision:	https://reviews.freebsd.org/D29369
2021-03-22 23:57:19 +00:00
Andrew Turner
28d945204e Handle functions that use a nop in the arm64 fbt
To trace leaf asm functions we can insert a single nop instruction as
the first instruction in a function and trigger off this.

Reviewed by:	gnn
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D28132
2021-03-03 14:18:03 +00:00
Andrew Turner
c00ec4dab2 Handle using a sub instruction in the arm64 fbt
Some stack frames are too large for a store pair instruction we already
detect in the arm64 fbt code. Add support for handling subtracting the
stack pointer directly.

Sponsored by:	Innovate UK
2021-01-12 12:42:23 +00:00
Andrew Turner
d0df1a2d54 Only allow a store through sp in the arm64 fbt
When searching for an instruction to patch out in the arm64 function
boundary trace we search for a store pair with a write back. This
instruction is commonly used to store two registers to the stack
and update the stack pointer to hold space for more.

This works in many cases, however not all functions use this, e.g.
when the stack frame is too large. In these cases we may find another
instruction of the same type that doesn't store through the stack
pointer. Filter these instructions out and assume if we see one we
are past the function prologue.

Reported by:	rwatson
Sponsored by:	Innovate UK
2021-01-12 12:42:23 +00:00
Bryan Drewery
f222a6b886 dtrace: Fix /"string" == NULL/ comparisons using an uninitialized value.
A test of this is funcs/tst.strtok.d which has this filter:

    BEGIN
    /(this->field = strtok(this->str, ",")) == NULL/
    {
            exit(1);
    }
The test will randomly fail with exit status of 1 indicating that this->field
was NULL even though printing it out shows it is not.

This is compiled to the DTrace instruction set:
    // Pushed arguments not shown here
    // call strtok() and set result into %r1
    07: 2f001f01    call DIF_SUBR(31), %r1          ! strtok
    // set thread local scalar this->field from %r1
    08: 39050101    stls %r1, DT_VAR(1281)          ! DT_VAR(1281) = "field"
    // Prepare for the == comparison
    // Set right side of %r2 to NULL
    09: 25000102    setx DT_INTEGER[1], %r2         ! 0x0
    // string compare %r1 (strtok result) to %r2
    10: 27010200    scmp %r1, %r2

In this case only %r1 is loaded with a string limit set to lim1.  %r2 being
NULL does not get loaded and does not set lim2.  Then we call dtrace_strncmp()
with MIN(lim1, lim2) resulting in passing 0 and comparing neither side.
dtrace_strncmp() handles this case fine and it already has been while
being lucky with what lim2 was [un]initialized as.

Reviewed by:	markj, Don Morris <dgmorris AT earthlink.net>
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D27671
2021-01-08 14:37:17 -08:00
Alex Richardson
1f6612b444 Install dtrace.h and dependencies
This makes the minimum amount of changes to allow inclusion of dtrace.h
without all the solaris compatibility headers. Installing dtrace.h allows
compiling consumers of libdtrace (e.g. https://github.com/tmetsch/python-dtrace)
without requiring a copy of the source tree.
For python-dtrace I worked around this in 58019c9a12
but being able to build the library without installed sources would be
extremely useful.

Reviewed By:	gnn
Differential Revision: https://reviews.freebsd.org/D27884
2021-01-07 09:26:21 +00:00
Adrian Chadd
975e1c1ce6 [cddl] Fix lz4 function definitions to not tri pup compile.
This tripped up in llvm compilation on amd64 noting that lz4_init/lz4_fini
were lacking in being previously defined.

Reviewed by:	emaste, freqlabs, brooks
Differential Revision:	https://reviews.freebsd.org/D27240
2020-11-17 17:11:07 +00:00
Matt Macy
180f822596 Update OpenZFS to 2.0.0-rc3-gfc5966
- fix panic due to tqid overflow
- Improve libzfs_error_init messages
- Expose zfetch_max_idistance tunable
- Make dbufstat work on FreeBSD
- Fix EIO after resuming receive of new dataset over an existing one
2020-10-17 01:06:04 +00:00
Warner Losh
2fec3ae896 Add zstd support to the boot loader.
Add support to the _STANDALONE environment enough bits of the kernel
that we can compile it. We still have a small zstd_shim.c since there
were 3 items that were a bit hard to nail down and may be cleaned up
in the future. These go hand in hand with a number of commits to
sys/sys in the past weeks, should this need be MFCd.

Discussed with: mmacy (in review and on IRC/Slack)
Reviewed by: freqlabs (on openzfs repo)
Differential Revision: https://reviews.freebsd.org/D26218
2020-10-12 22:19:07 +00:00
Matt Macy
a86e97e50d ZFS: band-aid for -DNO_CLEAN
Submitted by:	Neal Chauhan
Approved by:	imp@
Differential Revision:	https://reviews.freebsd.org/D26183
2020-08-25 23:35:55 +00:00
Matt Macy
9e5787d228 Merge OpenZFS support in to HEAD.
The primary benefit is maintaining a completely shared
code base with the community allowing FreeBSD to receive
new features sooner and with less effort.

I would advise against doing 'zpool upgrade'
or creating indispensable pools using new
features until this change has had a month+
to soak.

Work on merging FreeBSD support in to what was
at the time "ZFS on Linux" began in August 2018.
I first publicly proposed transitioning FreeBSD
to (new) OpenZFS on December 18th, 2018. FreeBSD
support in OpenZFS was finally completed in December
2019. A CFT for downstreaming OpenZFS support in
to FreeBSD was first issued on July 8th. All issues
that were reported have been addressed or, for
a couple of less critical matters there are
pull requests in progress with OpenZFS. iXsystems
has tested and dogfooded extensively internally.
The TrueNAS 12 release is based on OpenZFS with
some additional features that have not yet made
it upstream.

Improvements include:
  project quotas, encrypted datasets,
  allocation classes, vectorized raidz,
  vectorized checksums, various command line
  improvements, zstd compression.

Thanks to those who have helped along the way:
Ryan Moeller, Allan Jude, Zack Welch, and many
others.

Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25872
2020-08-25 02:21:27 +00:00
Mariusz Zaborski
277f38abff zfs: add an option to the bootloader to rewind the ZFS checkpoint
The checkpoints are another way of keeping the state of ZFS.
During the rewind, the pool has to be exported.
This makes checkpoints unusable when using ZFS as root.
Add the option to rewind the ZFS checkpoint at the boot time.
If checkpoint exists, a new option for rewinding a checkpoint will appear in
the bootloader menu.
We fully support boot environments.
If the rewind option is selected, the boot loader will show a list of
boot environments that existed before the checkpoint.

Reviewed by:	tsoome, allanjude, kevans (ok with high-level overview)
Differential Revision:	https://reviews.freebsd.org/D24920
2020-08-18 19:48:04 +00:00
Alex Richardson
11412d5bc9 Fix linker error in libuutil with recent LLVM
Not marking the function as static can result in a linker error:
undefined reference to __assfail [--no-allow-shlib-undefined]
I noticed this error after updating our CHERI LLVM to the latest upstream
LLVM HEAD revision.

This change effectively reverts r329984 and marks dmu_buf_init_user as
static (which keeps the GCC build happy).

Reviewed By:	#zfs, asomers, freqlabs, mav
Differential Revision: https://reviews.freebsd.org/D25663
2020-08-07 16:04:21 +00:00
Alex Richardson
ec4deee4e4 Fix cddl tools bootstrapping on macOS and Linux
Reviewed By:	brooks
Differential Revision: https://reviews.freebsd.org/D25979
2020-08-07 16:03:55 +00:00
Toomas Soome
722c2b4aca MFOpenZFS: Add support for boot environment data to be stored in the label
We are building new bootonce mechanism (previously zfs bootnext) and it is
based on this OpenZFS change. Since this patch is nicely self contained,
I am commiting it as is, and we can stack our changes.

Original patch description follows:

Modern bootloaders leverage data stored in the root filesystem to
enable some of their powerful features. GRUB specifically has a grubenv
file which can store large amounts of configuration data that can be
read and written at boot time and during normal operation. This allows
sysadmins to configure useful features like automated failover after
failed boot attempts. Unfortunately, due to the Copy-on-Write nature
of ZFS, the standard behavior of these tools cannot handle writing to
ZFS files safely at boot time. We need an alternative way to store
data that allows the bootloader to make changes to the data.

This work is very similar to work that was done on Illumos to enable
similar functionality in the FreeBSD bootloader. This patch is different
in that the data being stored is a raw grubenv file; this file can store
arbitrary variables and values, and the scripting provided by grub is
powerful enough that special structures are not required to implement
advanced behavior.

We repurpose the second padding area in each label to store the grubenv
file, protected by an embedded checksum. We add two ioctls to get and
set this data, and libzfs_core and libzfs functions to access them more
easily. There are no direct command line interfaces to these functions;
these will be added directly to the bootloader utilities.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10009

Obtained from:	OpenZFS
Sponsored by:	Netflix, Klara Inc.
2020-08-05 14:32:20 +00:00
Toomas Soome
491ceb65ec zfs_keys_nextboot array is missing ZPOOL_CONFIG_POOL_GUID and ZPOOL_CONFIG_GUID
As we do check the incomint nvlist, we either need to list all possible
keys or use wildcard.

PR:		248462
Reported by:	larafercue@gmail.com
Sponsored by:	Netflix, Klara Inc.
2020-08-05 14:08:44 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
e4cdb74faf zfs: add support for lockless lookup
Tested by:	pho (in a patchset, previous version)
Differential Revision:	https://reviews.freebsd.org/D25581
2020-07-25 10:39:41 +00:00
Mark Johnston
17eee3b501 Fix a memory leak in dsl_scan_visitbp().
This should be triggered only if arc_read() fails, i.e., quite rarely.
The same logic is already present in OpenZFS.

PR:		247445
Submitted by:	jdolecek@NetBSD.org
MFC after:	1 week
2020-07-20 17:05:44 +00:00
Alan Somers
f60b4812d8 Fix page fault in zfsctl_snapdir_getattr
Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I
can't reproduce this panic on demand, but this looks like the correct
solution.

PR:		247668
Reviewed by:	avg
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25543
2020-07-02 13:17:31 +00:00
Matt Macy
4dc16f4391 Fix "current" variable name conflict with openzfs
The variable "current" is an alias for curthread
in openzfs. Rename all variable uses of current
in dtrace.c to curstate.
2020-06-27 00:57:48 +00:00
Toomas Soome
a14844e0d6 MFOpenZFS: Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).

Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.

This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.

The zfs_ioc_key_t for zfs_keys_channel_program looks like:

static const zfs_ioc_key_t zfs_keys_channel_program[] = {
       {"program",     DATA_TYPE_STRING,               0},
       {"arg",         DATA_TYPE_UNKNOWN,              0},
       {"sync",        DATA_TYPE_BOOLEAN_VALUE,        ZK_OPTIONAL},
       {"instrlimit",  DATA_TYPE_UINT64,               ZK_OPTIONAL},
       {"memlimit",    DATA_TYPE_UINT64,               ZK_OPTIONAL},
};

Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).

ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type

Reviewed by:	allanjude
Obtained from:	OpenZFS
Sponsored by:	Netflix, Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D25393
2020-06-23 06:42:39 +00:00
Allan Jude
c5305bb50a MFOpenZFS: Add zio_ddt_free()+ddt_phys_decref() error handling
The assumption in zio_ddt_free() is that ddt_phys_select() must
always find a match.  However, if that fails due to a damaged
DDT or some other reason the code will NULL dereference in
ddt_phys_decref().

While this should never happen it has been observed on various
platforms.  The result is that unless your willing to patch the
ZFS code the pool is inaccessible.  Therefore, we're choosing
to more gracefully handle this case rather than leave it fatal.

http://mail.opensolaris.org/pipermail/zfs-discuss/2012-February/050972.html

5dc6af0eec

Reported by:	Pierre Beyssac
Obtained from:	OpenZFS
MFC after:	2 weeks
Sponsored by:	Klara Inc.
2020-06-22 19:03:02 +00:00
Allan Jude
9598fc63e6 ZFS: Allow setting checksum=skein on boot pools
PR:		245889
Reported by:	delphij
Sponsored by:	Klara Inc.
2020-06-19 17:59:55 +00:00
Rick Macklem
1f7104d720 Fix export_args ex_flags field so that is 64bits, the same as mnt_flags.
Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it hold a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size.
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)

This patch also deleted the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.

Reviewed by:	kib, freqlabs
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D25088
2020-06-14 00:10:18 +00:00
Andriy Gapon
04dc03e0fe fix up r362047: a call to zvol_*_minors() was not hidden from userland
Reported by:	CI/FreeBSD-head-powerpc64-build
MFC after:	5 weeks
X-MFC with:	r362047
2020-06-11 11:35:30 +00:00
Andriy Gapon
f51f07e1ec rework how ZVOLs are updated in response to DSL operations
With this change all ZVOL updates are initiated from the SPA sync
context instead of a mix of the sync and open contexts.  The updates are
queued to be applied by a dedicated thread in the original order.  This
should ensure that ZVOLs always accurately reflect the corresponding
datasets.  ZFS ioctl operations wait on the mentioned thread to complete
its work.  Thus, the illusion of the synchronous ZVOL update is
preserved.  At the same time, the SPA sync thread never blocks on ZVOL
related operations avoiding problems like reported in bug 203864.

This change is based on earlier work in the same direction: D7179 and
D14669 by Anthoine Bourgeois.  D7179 tried to perform ZVOL operations
in the open context and that opened races between them.  D14669 uses a
design very similar to this change but with different implementation
details.

This change also heavily borrows from similar code in ZoL, but there are
many differences too.  See:
- a0bd735adb
- https://github.com/zfsonlinux/zfs/issues/3681
- https://github.com/zfsonlinux/zfs/issues/2217

PR:		203864
MFC after:	5 weeks
Sponsored by:	CyberSecure
Differential Revision: https://reviews.freebsd.org/D23478
2020-06-11 10:41:31 +00:00
Mark Johnston
66b415fb8f Don't block on the range lock in zfs_getpages().
After r358443 the vnode object lock no longer synchronizes concurrent
zfs_getpages() and zfs_write() (which must update vnode pages to
maintain coherence).  This created a potential deadlock between ZFS
range locks and VM page busy locks: a fault on a mapped file will cause
the fault page to be busied, after which zfs_getpages() locks a range
around the file offset in order to map adjacent, resident pages;
zfs_write() locks the range first, and then must busy vnode pages when
synchronizing.

Solve this by adding a non-blocking mode for ZFS range locks, and using
it in zfs_getpages().  If zfs_getpages() fails to acquire the range
lock, only the fault page will be populated.

Reported by:	bdrewery
Reviewed by:	avg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24839
2020-05-20 18:29:23 +00:00
Toomas Soome
22ed31c23f lz4 hash table does not start zeroed
illumos issue: https://www.illumos.org/issues/12757

Submitted by:	andyf
2020-05-19 19:53:12 +00:00
Kyle Evans
47c7d8327c zfs: reject read(2) of a dirfd with EISDIR
This is independent of the recently-discussed global change, which is still
in review/discussion stage.

This is effectively a measure for consistency in the ZFS world, where
FreeBSD was the only platform (as far as I could find) that allowed this.
What ZFS exposes is decidedly not useful for any real purposes, to
paraphrase (hopefully faithfully) jhb's findings when exploring this:

The size of a directory in ZFS is the number of directory entries within.
When reading a directory, you would instead get the leading part of its raw
contents; the amount you get being dictated by the "size," i.e. number of
directory entries. There's decidedly (luckily) no stack disclosure happening
here, though the behavior is bizarre and almost certainly a historical
accident.

This change has already been upstreamed to OpenZFS.

MFC after:	1 week
2020-05-19 02:41:05 +00:00
John Baldwin
2c213c2e75 Correct the order of arguments to copyin() for Q_SETQUOTA.
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24656
2020-05-18 16:47:44 +00:00
Pawel Jakub Dawidek
cb761bb2fb Avoid the GEOM topology lock recursion when we automatically expand a pool.
The steps to reproduce the problem:

	mdconfig -a -t swap -s 3g -u 0
	gpart create -s GPT md0
	gpart add -t freebsd-zfs -s 1g md0
	zpool create -o autoexpand=on foo md0p1
	gpart resize -i 1 -s 2g md0
2020-04-25 21:45:31 +00:00
Gleb Smirnoff
9edef911e8 Make ZFS depend on xdr.ko only. It doesn't need kernel RPC.
Differential Revision:	https://reviews.freebsd.org/D24408
2020-04-17 06:05:08 +00:00
Ryan Moeller
69534635ff MFOpenZFS: ZVOLs should not be allowed to have children
zfs create, receive and rename can bypass this hierarchy rule. Update
both userland and kernel module to prevent this issue and use pyzfs
unit tests to exercise the ioctls directly.

Note: this commit slightly changes zfs_ioc_create() ABI. This allow to
differentiate a generic error (EINVAL) from the specific case where we
tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>

Approved by:	mav (mentor)
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
openzfs/zfs@d8d418ff0c
2020-03-25 15:56:18 +00:00
Alexander Motin
d3c6ba3214 MFOpenZFS: make zil max block size tunable
We've observed that on some highly fragmented pools, most metaslab
allocations are small (~2-8KB), but there are some large, 128K
allocations.  The large allocations are for ZIL blocks.  If there is a
lot of fragmentation, the large allocations can be hard to satisfy.

The most common impact of this is that we need to check (and thus load)
lots of metaslabs from the ZIL allocation code path, causing sync writes
to wait for metaslabs to load, which can take a second or more.  In the
worst case, we may not be able to satisfy the allocation, in which case
the ZIL will resort to txg_wait_synced() to ensure the change is on
disk.

To provide a workaround for this, this change adds a tunable that can
reduce the size of ZIL blocks.

External-issue: DLPX-61719
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8865
openzfs/zfs@b8738257c2

MFC after:	2 weeks
2020-03-19 01:05:54 +00:00
Alexander Motin
cf2f2eb568 Fix infinite scan on a pool with only special allocations
Attempt to run scrub or resilver on a new pool containing only special
allocations (special vdev added on creation) caused infinite loop
because of dsl_scan_should_clear() limiting memory usage to 5% of pool
size, which it calculated accounting only normal allocation class.

Addition of special and just in case dedup classes fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #10106
Closes #8694
openzfs/zfs@fa130e010c
2020-03-16 19:03:10 +00:00
Konstantin Belousov
d5b7401f64 zfs dmu_read: loosen the assertion.
Since switch to the lockless grab, shared busy for ahead/behind pages
allows other threads to validate and map the pages readonly.

Reviewed by:	avg, jeff, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D23986
2020-03-06 21:15:25 +00:00
Alexander Motin
5c940cf1ff Remove vfs.zfs.top_maxinflight tunable/sysctl.
It is dead since sorted scrub import at r334844.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-03-05 19:43:43 +00:00
Alexander Motin
e37d5c12e9 Increase number of write completion threads, matching ZoL.
Our iSCSI benchmarks on a large 80-core system show that previous limit
of 8 threads can be a bottleneck.  At some points this change increases
write IOPS by as much as 50%.  I am still not sure that so many threads
is really required, but we tested lower amounts and got no significant
benefits, while latencies were a bit worse, so decided to not diverge.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-03-03 15:05:13 +00:00