1857 Commits

Author SHA1 Message Date
Andriy Gapon
40b1e0dc0e remove stray space symbol in r358380
MFC after:	1 week
X-MFC with:	r358380
2020-02-27 14:27:42 +00:00
Andriy Gapon
6d11243ae2 use ZFS_MAX_DATASET_NAME_LEN instead of MAXPATHLEN for dataset names
The change affects only FreeBSD specific code as the common code already
mostly uses the more idiomatic and correct ZFS_MAX_DATASET_NAME_LEN.
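
A minimal sketch of the idea, not the committed code: a buffer that holds a dataset name is sized by the dataset-name limit rather than the path limit, and an over-long name is rejected instead of truncated. The helper name and both numeric values are assumptions made for illustration; the real constants come from the system headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed values for this sketch only; the real ones live in system headers. */
#define	SKETCH_MAXPATHLEN		1024
#define	SKETCH_ZFS_MAX_DATASET_NAME_LEN	256

/*
 * Hypothetical helper: build "<pool>/<fs>" into a buffer sized for a
 * dataset name.  Returns 0 on success, -1 if the name would not fit.
 */
static int
sketch_dataset_name(char *buf, size_t buflen, const char *pool, const char *fs)
{
	int n = snprintf(buf, buflen, "%s/%s", pool, fs);

	return ((n < 0 || (size_t)n >= buflen) ? -1 : 0);
}
```

A caller would declare `char name[SKETCH_ZFS_MAX_DATASET_NAME_LEN]` and pass `sizeof(name)` as the length.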

MFC after:	1 week
2020-02-27 14:21:01 +00:00
Andriy Gapon
6b47663df5 dsl_dataset_promote_sync: populate 'oldname' before using it
It's very unlikely that zfsvfs_update_fromname() and
zvol_rename_minors() ever did anything during the promote operation as
the old name was not initialized.

MFC after:	1 week
2020-02-27 14:12:43 +00:00
Alexander Motin
a33a65ce22 MFZoL: Relax restriction on zfs_ioc_next_obj() iteration
Per the documentation for dnode_next_offset in dnode.c, the "txg"
parameter specifies a lower bound on which transaction the dnode can
be found in. We are interested in all dnodes that are removed between
the first and last transaction in the snapshot. It doesn't need to be
created in that snapshot to correspond to a removed file.

In fact, the behavior of zfs diff in the test case exactly matches
this: the transaction that created the data that was deleted in snapshot
"2" was produced before, in snapshot "1", definitely predating the first
transaction in snapshot "2".

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@onlight.com>
Closes #2081
zfsonlinux/zfs@7290cd3c4e

MFC after:	1 week
2020-02-26 20:38:48 +00:00
Alexander Motin
0f58760b82 MFZoL: Fix resilver writes in vdev_indirect_io_start
This patch addresses an issue found in ztest where resilver
write zios that were passed to an indirect vdev would end up
being handled as though they were resilver read zios. This
caused issues where the zio->io_abd would be both read to
and written from at the same time, causing asserts to fail.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8193
zfsonlinux/zfs@5aa95ba0d3

MFC after:	1 week
2020-02-26 16:51:45 +00:00
Alexander Motin
51c04e6cc2 Fix patch mismerge in r358336.
MFC after:	1 week
2020-02-26 16:04:24 +00:00
Alexander Motin
f8a7a04b79 MFZoL: Fix issue with scanning dedup blocks as scan ends
This patch fixes an issue discovered by ztest where
dsl_scan_ddt_entry() could add I/Os to the dsl scan queues
between when the scan had finished all required work and
when the scan was marked as complete. This caused the scan
to spin indefinitely without ending.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010
zfsonlinux/zfs@5e0bd0ae05

MFC after:	1 week
2020-02-26 15:59:46 +00:00
Alexander Motin
308acfcc62 MFZoL: Fix 2 small bugs with cached dsl_scan_phys_t
This patch corrects 2 small bugs where scn->scn_phys_cached was
not properly updated to match the primary copy when it needed to
be. The first resulted in the pause state not being properly
updated and the second resulted in the cached version being
completely zeroed even if the primary was not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010
zfsonlinux/zfs@8cb119e3dc

MFC after:	1 week
2020-02-26 15:47:40 +00:00
Alexander Motin
4b7f090f8d MFZoL: Fix txg_sync_thread hang in scan_exec_io()
When scn->scn_maxinflight_bytes has not been initialized it's
possible to hang on the condition variable in scan_exec_io().
This issue was uncovered by ztest and is only possible when
deduplication is enabled through the following call path.

  txg_sync_thread()
    spa_sync()
      ddt_sync_table()
        ddt_sync_entry()
          dsl_scan_ddt_entry()
            dsl_scan_scrub_cb()
              dsl_scan_enqueuei()
                scan_exec_io()
                  cv_wait()

Resolve the issue by always initializing scn_maxinflight_bytes
to a reasonable minimum value.  This value will be recalculated
in dsl_scan_sync() to pick up changes to zfs_scan_vdev_limit
and the addition/removal of vdevs.
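
The hang and the fix can be modelled in a few lines.  This is a sketch under stated assumptions: the struct, the form of the wait condition, and the minimum value are all hypothetical stand-ins, not the ZFS definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scan state; not the real dsl_scan_t. */
struct sketch_scan {
	uint64_t	inflight_bytes;		/* scan I/O currently issued */
	uint64_t	maxinflight_bytes;	/* limit; 0 models "uninitialized" */
};

/* Assumed floor, standing in for "a reasonable minimum value". */
#define	SKETCH_MIN_INFLIGHT	(1ULL << 20)

/*
 * Models the wait condition (assumed form): the issuer sleeps while this
 * is true.  With a zero limit and nothing in flight it is true forever,
 * because no completion will ever arrive to wake the sleeper.
 */
static bool
sketch_must_wait(const struct sketch_scan *scn)
{
	return (scn->inflight_bytes >= scn->maxinflight_bytes);
}

/* Models the fix: always start from a nonzero limit. */
static void
sketch_scan_init(struct sketch_scan *scn)
{
	scn->inflight_bytes = 0;
	scn->maxinflight_bytes = SKETCH_MIN_INFLIGHT;
}
```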

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7098
zfsonlinux/zfs@f90a30ad1b

MFC after:	1 week
2020-02-26 15:45:04 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Alexander Motin
8d8e484d9c Remove duplicate dbufs accounting.
Since AVL already has embedded element counter, use dn_dbufs_count
only for dbufs not counted there (bonus buffers) and just add them.
This removes two atomics per dbuf life cycle.

According to profiler it reduces time spent by dbuf_destroy() inside
bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core.

This counter is used only on illumos, so for FreeBSD it was just a
waste of time.

MFC after:	2 weeks
2020-02-07 15:50:47 +00:00
Alexander Motin
c10aea724f Reduce number of atomic_add() calls in aggsum.
Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to
re-borrow after the flush.  But since asc_borrowed and asc_delta are
accessed only while holding asc_lock, it makes no sense to modify
as_lower_bound and as_upper_bound in multiple steps.  Instead of that
the new code uses only 2 atomics in all the cases, one per as_*_bound
variable.  I think even that is overkill, simple atomic store and
load could be used here, since all modifications are done under the
as_lock, but there are no such primitives in ZFS code now.

While there, make the borrow code consider the previous borrow value, so that
on mixed request patterns we reduce the chance of needing to borrow again if a
much larger request follows a tiny one that needed a borrow.

Also reduce as_numbuckets from uint64_t to u_int.  It makes no sense
to use such a large division operation on every aggsum_add().

Reviewed by:	Brian Behlendorf, Paul Dagnelie
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-02-06 20:32:53 +00:00
Alexander Motin
ea642c5c38 Few microoptimizations to dbuf layer.
Move db_link into the same cache line as db_blkid and db_level.
This significantly reduces avl_add() time in dbuf_create() on
systems with large RAM and huge number of dbufs per dnode.
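
As an illustration of why field placement matters, the sketch below checks that the comparison keys and the tree linkage fall within one cache line.  The struct, its field names, and the 64-byte line size are assumptions for the example; it is not the real dbuf layout, and it presumes the object itself starts on a cache line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_CACHE_LINE	64	/* assumed cache line size in bytes */

/* Stand-in for an AVL linkage node (two children plus a parent word). */
struct sketch_avl_node {
	struct sketch_avl_node	*child[2];
	uintptr_t		 parent_and_balance;
};

/* Hypothetical dbuf-like record; only the field ordering matters here. */
struct sketch_dbuf {
	uint64_t		blkid;		/* comparison key, part 1 */
	uint8_t			level;		/* comparison key, part 2 */
	struct sketch_avl_node	link;		/* linkage, kept beside the keys */
	char			cold[256];	/* not needed for tree insertion */
};

/* True if bytes [off, off + len) stay within the first cache line. */
static int
sketch_in_first_line(size_t off, size_t len)
{
	return (off + len <= SKETCH_CACHE_LINE);
}
```

With this ordering a tree insertion that compares keys and then updates the linkage touches a single line per node visited.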

Avoid few accesses to dbuf_caches[].size, which is highly congested
under high IOPS and never stays in cache for a long time.  Use local
value we are receiving from zfs_refcount_add_many() anyway.

Remove cache_size_bytes_max bump from dbuf_evict_one().  I don't see
a point in doing it on dbuf eviction after we have done it on insertion in
dbuf_rele_and_unlock().

Reviewed by:	mahrens, Brian Behlendorf
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-02-04 15:53:51 +00:00
Warner Losh
58aa35d429 Remove sparc64 kernel support
Remove all sparc64 specific files
Remove all sparc64 ifdefs
Remove indirect sparc64 ifdefs
2020-02-03 17:35:11 +00:00
Alexander Motin
c68c82324f Unblock kstat.zfs.misc.dbufstats sysctls.
It is not so broken that it should be hidden after we wasted time collecting it.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-02-03 17:10:40 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Kyle Evans
c887ac8324 zfs: light refactor to indicate cachedlookup in zfs_lookup
If we come from VOP_CACHEDLOOKUP, we must skip the VEXEC check as it will
have been done in the caller (vfs_cache_lookup). This is a part of D23247,
which may skip the earlier VEXEC check as well if the root fd was opened
with O_SEARCH.

This one required slightly more work as zfs_lookup may also be called
indirectly as VOP_LOOKUP or a couple of other places where we must do the
check.
2020-02-02 16:10:33 +00:00
Mateusz Guzik
f0c402e425 zfs: ZFS_WLOCK_TEARDOWN_INACTIVE_WLOCKED -> ZFS_TEARDOWN_INACTIVE_WLOCKED
Fix up the argument used in one case as well.
2020-02-01 06:39:10 +00:00
Mateusz Guzik
8c3658c450 zfs: convert z_teardown_inactive_lock to sleepable read-mostly lock
This eliminates a global serialisation point. It only gets write locked
on unmount.

Sample result doing an incremental -j 40 build:
before: 173.30s user 458.97s system 2595% cpu 24.358 total
after:  168.58s user 254.92s system 2211% cpu 19.147 total
2020-01-31 08:38:38 +00:00
Mateusz Guzik
b076e113af zfs: provide macros to handle z_teardown_inactive_lock
2020-01-31 08:37:35 +00:00
Mateusz Guzik
42a9f8f21d zfs: fix spurious lock contention during path lookup
ZFS tracks if anything denies VEXEC to allow for a quick check for the
common case of path traversal. Use it.

Differential Revision:	https://reviews.freebsd.org/D22224
2020-01-30 02:16:17 +00:00
Mateusz Guzik
e196f7825e zfs: use VOP_NEED_INACTIVE
Big thanks to Greg V for testing previous versions of the patch.

Differential Revision:	https://reviews.freebsd.org/D22130
2020-01-30 02:14:10 +00:00
Alexander Motin
da19f62dfa Map ECKSUM and EFRAGS from ZFS onto real errnos.
Make it less confusing when, for example, stat sets errno to 122 because a
checksum failed in ZFS:

Before: getfacl: /foo/bar: stat() failed: Unknown error: 122
After: getfacl: /foo/bar: stat() failed: Integrity check failed
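
The effect can be sketched as a translation step in front of the message lookup.  Everything here is hypothetical apart from the value 122 and the message text quoted above: the function names and the numeric value chosen for the "real" errno are assumptions for the example.

```c
#include <assert.h>
#include <string.h>

#define	SKETCH_ZFS_ECKSUM	122	/* value shown in the "Before" line */
#define	SKETCH_EINTEGRITY	1000	/* hypothetical number for a real errno */

/* Translate a ZFS-private error code to one the C library knows about. */
static int
sketch_map_error(int err)
{
	switch (err) {
	case SKETCH_ZFS_ECKSUM:
		return (SKETCH_EINTEGRITY);
	default:
		return (err);
	}
}

/* Toy message table: only the mapped code has a meaningful string. */
static const char *
sketch_strerror(int err)
{
	return (err == SKETCH_EINTEGRITY ?
	    "Integrity check failed" : "Unknown error");
}
```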

Submitted by:	Ryan Moeller <ryan@ixsystems.com>
Reviewed by:	mckusick, mav
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D22973
2020-01-13 22:06:16 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests
2020-01-12 06:07:54 +00:00
Mateusz Guzik
20fa645666 zfs: add missing CTLFLAG_MPSAFE annotations
2020-01-12 04:53:01 +00:00
Mateusz Guzik
b52d50cf69 vfs: prealloc vnodes in getnewvnode_reserve
Having a reserved vnode count does not guarantee that getnewvnodes won't
block later. Said blocking partially defeats the purpose of reserving in
the first place.

Preallocate instead. The only consumer was always passing "1" as count
and never nesting reservations.
2020-01-11 22:58:14 +00:00
Mateusz Guzik
75ad73a8b9 zfs: plug a vnode reserve leak in zfs_make_xattrdir
2020-01-07 04:34:29 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mark Johnston
9f5632e6c8 Remove page locking for queue operations.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page.  It is also no longer needed in
the page queue scans.  This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22885
2019-12-28 19:04:00 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;
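
One way to picture flattening, as a sketch rather than the committed implementation: resolve every empty slot once, up front, by copying it from the nearest default vector, so a call can use the slot directly instead of walking the chain.  The struct, the two-slot layout, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_op_t)(void);

/* Hypothetical two-slot operations vector with a fallback chain. */
struct sketch_vops {
	struct sketch_vops	*vop_default;	/* fallback vector, or NULL */
	sketch_op_t		 vop_read;	/* NULL means "inherit" */
	sketch_op_t		 vop_write;	/* NULL means "inherit" */
};

/* Fill each NULL slot from the nearest ancestor that defines it. */
static void
sketch_flatten(struct sketch_vops *v)
{
	struct sketch_vops *d;

	for (d = v->vop_default; d != NULL; d = d->vop_default) {
		if (v->vop_read == NULL)
			v->vop_read = d->vop_read;
		if (v->vop_write == NULL)
			v->vop_write = d->vop_write;
	}
}

/* Sample operations used to exercise the sketch. */
static int sketch_default_read(void)  { return (1); }
static int sketch_default_write(void) { return (2); }
static int sketch_fs_write(void)      { return (3); }
```

After `sketch_flatten()` runs on a filesystem vector that only overrides `vop_write`, its `vop_read` slot points at the default routine and no per-call loop is needed.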

Reviewed by:	jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
John Baldwin
889ad0b890 Use a callout instead of timeout(9) for delayed zio's.
Reviewed by:	avg
Differential Revision:	https://reviews.freebsd.org/D22597
2019-12-13 19:27:51 +00:00
Mateusz Guzik
c8b29d1212 vfs: locking primitives which elide ->v_vnlock and shared locking disablement
Both of these features are not needed by many consumers and result in avoidable
reads which in turn puts them on profiles due to cache-line ping ponging.

On top of that the current lockmgr entry point is slower than necessary
single-threaded. As an attempted clean up preparing for other changes,
provide new routines which don't support any of the aforementioned features.

With these patches in place vop_stdlock and vop_stdunlock disappear from
flamegraphs during -j 104 buildkernel.

Reviewed by:	jeff (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22665
2019-12-11 23:11:21 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.
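
The byte accounting can be sketched as follows.  The struct and names are hypothetical and show only the two fields discussed, not the real vnode; the flag bit value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_vtype { SKETCH_VNON, SKETCH_VREG, SKETCH_VDIR };

#define	SKETCH_VIRF_DOOMED	0x0001	/* hypothetical flag bit */

/*
 * One byte for the type and two for the flags; alignment of the 16-bit
 * field leaves one spare byte, so the pair occupies four bytes in total.
 */
struct sketch_hot_fields {
	uint8_t		v_type;		/* holds an enum sketch_vtype value */
	uint16_t	v_irflag;	/* read-mostly flags */
};

/* Doomed check that reads only the read-mostly flag word. */
static bool
sketch_is_doomed(const struct sketch_hot_fields *f)
{
	return ((f->v_irflag & SKETCH_VIRF_DOOMED) != 0);
}
```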

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Mark Johnston
bf10551606 Fix an inverted condition introduced in r353539.
This would have most likely resulted in read errors causing page leaks.

Submitted by:	jeff
2019-12-06 23:49:37 +00:00
Konstantin Belousov
fdc6b10d44 Add a VN_OPEN_INVFS flag.
vn_open_cred() assumes that it is called from the top-level of a VFS
syscall.  Writers must call bwillwrite() before locking any VFS
resource to wait for cleanup of dirty buffers.

ZFS getextattr() and setextattr() VOPs do call vn_open_cred(), which
results in wait for unrelated buffers while owning ZFS vnode lock (and
ZFS does not use buffer cache).  VN_OPEN_INVFS allows caller to skip
bwillwrite.

Note that ZFS is still incorrect there, because it starts write on an
mp and locks a vnode while holding another vnode lock.

Reported by:	Willem Jan Withagen <wjw@digiware.nl>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-11-29 14:02:32 +00:00
Alexander Motin
5008399c14 Fix use-after-free in case of L2ARC prefetch failure.
In case L2ARC read failed, l2arc_read_done() creates _different_ ZIO
to read data from the original storage device.  Unfortunately pointer
to the failed ZIO remains in hdr->b_l1hdr.b_acb->acb_zio_head, and if
some other read tries to bump the ZIO priority, it will crash.

The problem is reproducible by corrupting L2ARC content and reading
some data with prefetch if l2arc_noprefetch tunable is changed to 0.
With the default setting the issue is probably not reproducible now.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-11-28 18:28:35 +00:00
Andriy Gapon
8491540808 MFV r354383: 10592 misc. metaslab and vdev related ZoL bug fixes
illumos/illumos-gate@555d674d5d
555d674d5d

https://www.illumos.org/issues/10592
  This is a collection of recent fixes from ZoL:
  8eef997679b Error path in metaslab_load_impl() forgets to drop ms_sync_lock
  928e8ad47d3 Introduce auxiliary metaslab histograms
  425d3237ee8 Get rid of space_map_update() for ms_synced_length
  6c926f426a2 Simplify log vdev removal code
  21e7cf5da89 zdb -L should skip leak detection altogether
  df72b8bebe0 Rename range_tree_verify to range_tree_verify_not_present
  75058f33034 Remove unused vdev_t fields

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
MFC after:	4 weeks
2019-11-21 13:35:43 +00:00
Andriy Gapon
489912da7b MFV r354382,r354385: 10601 10757 Pool allocation classes
illumos/illumos-gate@663207adb1
663207adb1

10601 Pool allocation classes
https://www.illumos.org/issues/10601
  illumos port of ZoL Pool allocation classes. Includes at least these two
  commits:
  441709695 Pool allocation classes misplacing small file blocks
  cc99f275a Pool allocation classes

10757 Add -gLp to zpool subcommands for alt vdev names
https://www.illumos.org/issues/10757
  Port from ZoL of
  d2f3e292d Add -gLp to zpool subcommands for alt vdev names
  Note that a subsequent ZoL commit changed -p to -P
  a77f29f93 Change full path subcommand flag from -p to -P

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Portions contributed by: Håkan Johansson <f96hajo@chalmers.se>
Portions contributed by: Richard Yao <ryao@gentoo.org>
Portions contributed by: Chunwei Chen <david.chen@nutanix.com>
Portions contributed by: loli10K <ezomori.nozomu@gmail.com>
Author: Don Brady <don.brady@delphix.com>

11541 allocation_classes feature must be enabled to add log device

illumos/illumos-gate@c1064fd7ce
c1064fd7ce

https://www.illumos.org/issues/11541
  After the allocation_classes feature was integrated, one can no longer add a
  log device to a pool unless that feature is enabled. There is an explicit check
  for this, but it is unnecessary in the case of log devices, so we should handle
  this better instead of forcing the feature to be enabled.

Author: Jerry Jelinek <jerry.jelinek@joyent.com>

FreeBSD notes.
I faithfully added the new -g, -L, -P flags, but only -g does something:
vdev GUIDs are displayed instead of device names.  -L, resolve symlinks,
and -P, display full disk paths, do nothing at the moment.
The use of special vdevs is backward compatible for read-only access, so
root pools should be bootable, but exercise caution.

MFC after:	4 weeks
2019-11-21 08:20:05 +00:00
Andriy Gapon
a8c08e008a MFV r354378,r354379,r354386: 10499 Multi-modifier protection (MMP)
10499 Multi-modifier protection (MMP)
illumos/illumos-gate@e0f1c0afa4
e0f1c0afa4
https://www.illumos.org/issues/10499
  Port the following ZFS commits from ZoL to illumos.
  379ca9cf2 Multi-modifier protection (MMP)
  bbffb59ef Fix multihost stale cache file import
  0d398b256 Do not initiate MMP writes while pool is suspended

10701 Correct lock ASSERTs in vdev_label_read/write
illumos/illumos-gate@58447f688d
58447f688d
https://www.illumos.org/issues/10701
  Port of ZoL commit:
  0091d66f4e Correct lock ASSERTs in vdev_label_read/write
  At a minimum, this fixes a blown assert during an MMP test run when running on
  a DEBUG build.

11770 additional mmp fixes
illumos/illumos-gate@4348eb9012
4348eb9012
https://www.illumos.org/issues/11770
  Port a few additional MMP fixes from ZoL that came in after our
  initial MMP port.
  4ca457b065 ZTS: Fix mmp_interval failure
  ca95f70dff zpool import progress kstat
  (only minimal changes from above can be pulled in right now)
  060f0226e6 MMP interval and fail_intervals in uberblock

Note from the committer (me).
I do not have any use for this feature and I have not tested it.  I only
did smoke testing with multihost=off.
Please be aware.
I merged the code only to make future merges easier.

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Portions contributed by: Tim Chase <tim@chase2k.com>
Portions contributed by: sanjeevbagewadi <sanjeev.bagewadi@gmail.com>
Portions contributed by: John L. Hammond <john.hammond@intel.com>
Portions contributed by: Giuseppe Di Natale <dinatale2@llnl.gov>
Portions contributed by: Prakash Surya <surya1@llnl.gov>
Portions contributed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Olaf Faaland <faaland1@llnl.gov>

MFC after:	4 weeks
2019-11-18 09:38:35 +00:00
Konstantin Belousov
a7af4a3e7d amd64: move GDT into PCPU area.
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22302
2019-11-12 15:51:47 +00:00
Andriy Gapon
930db3e338 MFV r354377: 10554 Implemented zpool sync command
illumos/illumos-gate@9c2acf00e2
9c2acf00e2

https://www.illumos.org/issues/10554
  During the port of MMP (illumos bug 10499) from ZoL, I found this
  earlier ZoL project is a prerequisite. Here is the original
  description.  This addition will enable us to sync an open TXG to the
  main pool on demand. The functionality is similar to 'sync(2)' but
  'zpool sync' will return when data has hit the main storage instead of
  potentially just the ZIL as is the case with the 'sync(2)' cmd.

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>
MFC after:	3 weeks
Relnotes:	possibly
2019-11-07 11:18:28 +00:00
Alexander Motin
4cd20c3b08 Add vfs.zfs.zio.taskq_batch_pct tunable.
MFC after:	1 week
2019-11-05 15:19:05 +00:00
Andriy Gapon
ec03988887 fix up r354333, make zfsproc visible to dtrace, rename to system_proc
I overlooked the fact that zfsproc is required by dtrace modules that
use illumos compatible taskq KPI.  So, move the symbol definition to
the opensolaris module that provides compatibility support for both ZFS
and DTrace.  Also, rename zfsproc to system_proc to reflect that it is
not specific to ZFS.

Reported by:	ae
MFC after:	5 weeks
X-MFC with:	ae
2019-11-05 14:34:59 +00:00
Andriy Gapon
eb819923ec zfs: enable SPA_PROCESS on the kernel side
The purpose of this change is to group kernel threads specific to a
particular ZFS pool under a kernel process.  There can be many dozens of
threads per pool.  This change improves observability of those threads.

This change consists of several subchanges:
1. illumos taskq_create_proc can now pass its process parameter to
taskqueue.  Also, use zfsproc instead of NULL for taskq_create.  Caveat:
zfsproc might not be initialized yet.  But in that case it is still NULL,
so not worse than before.

2. illumos sys/proc.h: kthread id is stored in t_did field, not t_tid.

3. zfs: enable SPA_PROCESS on the kernel side.  The change is a bit hairy
as newproc() is implemented privately to spa.c.  I couldn't think of a
better way to populate process name than to poke inside the argument for
the process routine.

4. illumos thread_create: allow assigning thread to process other than
zfsproc.

5. zfs: expose spa_proc to other users, assign sync and quiesce threads
to it.

Pool-specific threads created using (relatively new) zthr mechanism are
still assigned to the zfskern process rather than to a respective
zpool-xxx process.  I am going to address this a bit later.

Reviewed by:	no one
MFC after:	5 weeks
Relnotes:	perhaps
Differential Revision: https://reviews.freebsd.org/D9720
2019-11-04 13:30:37 +00:00
Toomas Soome
24e1a7ac77 r354253 did miss the fact that libzpool is built as fake kernel
libzpool is built in a kernel-like fashion, so use the _FAKE_KERNEL check to
include the kernel API in libzpool.
2019-11-02 21:02:54 +00:00
Toomas Soome
e499793e76 Remove duplicate lz4 implementations
Port illumos change: https://www.illumos.org/issues/11667

Move lz4.c out of zfs tree to opensolaris/common/lz4, adjust it to be
usable from kernel/stand/userland builds, so we can use just one single
source. Add lz4.h to declare lz4_compress() and lz4_decompress().

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D22037
2019-11-02 12:28:04 +00:00
Alexander Motin
a4d5fcadd8 FreeBSD'fy ZFS zlib zalloc/zfree callbacks.
The previous code came from OpenSolaris, which in my understanding requires the
allocation size to be known to free memory.  To store that size previous
code allocated additional 8 byte header.  But I have noticed that zlib
with present settings allocates 64KB context buffers for each call, that
could be efficiently cached by UMA, but addition of those 8 bytes makes
them fall back to physical RAM allocations, that cause huge overhead and
lock congestion on small blocks.  Since FreeBSD's free() does not have
the size argument, switching to it solves the problem, increasing write
speed to ZVOLs with 4KB block size and GZIP compression on my 40-threads
test system from ~60MB/s to ~600MB/s.
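
The size arithmetic behind this can be sketched as below.  The 64KB cache limit and the helper names are assumptions for illustration; the sketch only shows how an 8-byte header pushes a 64KB request past such a limit, not how the allocator itself works.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_CACHE_MAX	(64 * 1024)	/* assumed largest cached size */
#define	SKETCH_HDR_SIZE		8		/* size header used by the old code */

/* True if a request of n bytes can be served from the cached size classes. */
static int
sketch_is_cached(size_t n)
{
	return (n <= SKETCH_CACHE_MAX);
}

/* Old scheme: every allocation carries an extra size header. */
static size_t
sketch_old_request(size_t items, size_t size)
{
	return (items * size + SKETCH_HDR_SIZE);
}

/* New scheme: free() needs no size, so the header goes away. */
static size_t
sketch_new_request(size_t items, size_t size)
{
	return (items * size);
}
```

For a 64KB context buffer the old request is 65544 bytes and misses the cache, while the new request is exactly 65536 bytes and fits.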

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-10-29 21:25:19 +00:00
Alan Somers
1af3a11218 MFZoL: Avoid retrieving unused snapshot props
This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it
to take input parameters that alter the way looping through the list of
snapshots is performed. The idea here is to restrict functions that
throw away some of the snapshots returned by the ioctl to a range of
snapshots that these functions actually use. This improves efficiency
and execution speed for some rollback and send operations.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #8077
zfsonlinux/zfs@4c0883fb4a

MFC after:	2 weeks
2019-10-26 17:11:02 +00:00
Konstantin Belousov
5b87ecc643 Assert that vnode_pager_setsize() is called with the vnode exclusively locked
except for filesystems that set the MNTK_VMSETSIZE_BUG flag.  Set the flag for ZFS.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:21:24 +00:00
Andriy Gapon
b6528d546f MFV r353637: 10844 Serialize ZTHR operations to eliminate races
illumos/illumos-gate@6a316e1f6d
6a316e1f6d

https://www.illumos.org/issues/10844
  ZoL 61c3391acc9 Serialize ZTHR operations to eliminate races

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
Obtained from:	illumos, ZoL
MFC after:	3 weeks
2019-10-16 09:29:01 +00:00