Commit Graph

1944 Commits

Author SHA1 Message Date
Sean Eric Fagan
69724399c4 This originated from ZFS On Linux, as
d4a72f2386

During scans (scrubs or resilvers), it sorts the blocks in each transaction
group by block offset; the result can be a significant improvement. (On my
test system just now, which I put some effort to introduce fragmentation into
the pool since I set it up yesterday, a scrub went from 1h2m to 33.5m with the
changes.) I've seen similar rations on production systems.

Approved by:	Alexander Motin
Obtained from:	ZFS On Linux
Relnotes:	Yes (improved scrub performance, with tunables)
Differential Revision:	https://reviews.freebsd.org/D15562
2018-06-08 17:38:28 +00:00
Benno Rice
b3b11d6400 Break recursion involving getnewvnode and zfs_rmnode.
When we're at our vnode limit, getnewvnode will call into the vnode LRU
cache to free up vnodes. If the vnode we try to recycle is a ZFS vnode we
end up, eventually, in zfs_rmnode. If the ZFS vnode we're recycling
represents something with extended attributes, zfs_rmnode will call
zfs_zget which will attempt to allocate another vnode. If the next vnode we
try to recycle is also a ZFS vnode representing something with extended
attributes we can recurse further. This ends up being unbounded and can end
up overflowing the stack.

In order to avoid this, restructure zfs_rmnode to simply add the extended
attribute directory's object ID to the unlinked set, thus not requiring the
allocation of a vnode. We then schedule a task that calls zfs_unlinked_drain
which will do the work of properly marking the vnodes for unlinking.
zfs_unlinked_drain is also called on mount so these will be cleaned up
there.

Reviewed by:	avg, mav
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D15342
2018-06-07 18:59:32 +00:00
Justin Hibbits
a1a990d8a4 Revert r326083, it doesn't behave as expected.
Even though there do appear to be more artificial frames, with 12, stack
traces no longer list at all.  Revert until a better, more stable value can
be determined.
2018-06-03 03:53:11 +00:00
Justin Hibbits
5e91185bb1 Protect dtrace_getpcstack() from a NULL stack pointer in a trap frame
Found when trying to use lockstat on a POWER9, the stack pointer (r1) could
be NULL, and result in a NULL pointer dereference, crashing the kernel.
2018-05-30 03:48:27 +00:00
Hans Petter Selasky
4f2190efce Fix 32-bit buildworld for i386 after r334320.
The 64-bit atomics defined for i386 are currently only available in
the kernel space.

Found by:	cy@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-05-29 13:43:16 +00:00
Hans Petter Selasky
43bb1274d0 Implement atomic_add_64() and atomic_subtract_64() for the i386 target.
While at it add missing _acq_ and _rel_ variants for 64-bit atomic
operations under i386.

Reviewed by:	kib @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-05-29 11:59:02 +00:00
Andriy Gapon
620b779158 fix zfs_getpages crash when called from sendfile, followup to r329363
It turns out that sendfile_swapin() has an optimization where it may
insert pointers to bogus_page into the page array that it passes to
VOP_GETPAGES.  That happens to work with buffer cache, because it
extensively uses bogus_page internally, so it has the necessary checks.
However, ZFS did not expect bogus_page as VOP_GETPAGES(9) does not
document such a (ab)use of bogus_page.

So, this commit adds checks and handling of bogus_page.

I expect that use of bogus_page with VOP_GETPAGES will get documented
sooner rather than later.

Reported by:	Andrew Reilly <areilly@bigpond.net.au>, delphij
Tested by:	Andrew Reilly <areilly@bigpond.net.au>
Requested by:	many
MFC after:	1 week
2018-05-25 07:29:52 +00:00
Andriy Gapon
873c2703d8 Fix 'zpool create -t <tempname>'
Creating a pool with a temporary name fails when we also specify custom
dataset properties: this is because we mistakenly call
zfs_set_prop_nvlist() on the "real" pool name which, as expected,
cannot be found because the SPA is present in the namespace with the
temporary name.

Fix this by specifying the correct pool name when setting the dataset
properties.

Author: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

Obtained from:	ZFS on Linux, zfsonlinux/zfs@4ceb8dd6fd
MFC after:	1 week
2018-05-15 13:27:29 +00:00
Mark Johnston
5f05bda607 DTrace aarch64: Avoid calling unwind_frame() in the probe context.
unwind_frame() may be instrumented by FBT, leading to recursion into
dtrace_probe(). Manually inline unwind_frame() as we do with stack
unwinding code for other architectures.

Submitted by:	Domagoj Stolfa
Reviewed by:	manu
MFC after:	1 week
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D15359
2018-05-12 15:35:26 +00:00
Matt Macy
cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Jamie Gritton
0e5c6bd436 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
Andriy Gapon
ca7019d2ac opensolaris system_taskq does not need to run at maximum priority
In fact, this taskqueue should use "boring" threads, nothing special
about them.

MFC after:	2 weeks
2018-05-04 07:28:01 +00:00
Ed Maste
3804f572a3 zfs_ioctl: avoid out-of-bound read
admbugs:	796
Submitted by:	Domagoj Stolfa <ds815@cam.ac.uk>
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:	avg
MFC after:	1 day
2018-05-04 00:56:41 +00:00
Mateusz Guzik
9d68f7741f systrace: track it like sdt probes
While here predict false.

Note the code is wrong (regardless of this change). Dereference of the
pointer can race with module unload. A fix would set the probe to a
nop stub instead of NULL.
2018-04-27 15:16:34 +00:00
Mateusz Guzik
7cd794214a dtrace: depessimize dtmalloc when dtrace is active
Each malloc/free was testing dtrace_malloc_enabled and forcing
extra reads from the malloc type struct to see if perhaps a
dtmalloc probe was on.

Treat it like lockstat and sdt: have a global bolean.
2018-04-24 01:06:20 +00:00
Mateusz Guzik
4c5209cb21 lockstat: track lockstat just like sdt probes
In particular flip the frequently tested var to bool.
2018-04-24 01:04:10 +00:00
Alexander Motin
bbbac409fe 9433 Fix ARC hit rate
When the compressed ARC feature was added in commit d3c2ae1
the method of reference counting in the ARC was modified.  As
part of this accounting change the arc_buf_add_ref() function
was removed entirely.

This would have be fine but the arc_buf_add_ref() function
served a second undocumented purpose of updating the ARC access
information when taking a hold on a dbuf.  Without this logic
in place a cached dbuf would not migrate its associated
arc_buf_hdr_t to the MFU list.  This would negatively impact
the ARC hit rate, particularly on systems with a small ARC.

This change reinstates the missing call to arc_access() from
dbuf_hold() by implementing a new arc_buf_access() function.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2018-04-16 00:54:58 +00:00
Andriy Gapon
81f187e576 allow ZFS pool to have temporary name for duration of current import
The change adds -t <name> option to zpool create and -t option to zpool
import in its form with an old name and a new name.  This allows to
import (or create) a pool under a name that's different from its real,
permanent name without affecting that name.  This is useful when working
with VM images or images of other physical systems if they happen to
have a ZFS pool with the same name as the host system.

The changes come from ZoL with some small tweaks.
The porting has been done by julian.

The change is being submitted to OpenZFS:
https://github.com/openzfs/openzfs/pull/600

Submitted by:	julian
Reviewed by:	smh
MFC after:	2 weeks
Sponsored by:	Panzura (porting)
Differential Revision: https://reviews.freebsd.org/D14972
2018-04-12 10:37:26 +00:00
Mark Johnston
87c1cb45cd Correct a comment.
Submitted by:	Domagoj Stolfa
X-MFC with:	r332364
Sponsored by:	DARPA, AFRL
2018-04-10 14:07:02 +00:00
Mark Johnston
b32171ea5a Set zfs_arc_free_target to v_free_target.
Page daemon output is now regulated by a PID controller with a setpoint
of v_free_target. Moreover, the page daemon now wakes up regularly
rather than waiting for a wakeup from another thread. This means that
the free page count is unlikely to drop below the old
zfs_arc_free_target value, and as a result the ARC was not readily
freeing pages under memory pressure. Address the immediate problem by
updating zfs_arc_free_target to match the page daemon's new behaviour.

Reported and tested by:	truckman
Discussed with:	jeff
X-MFC with:	r329882
Differential Revision:	https://reviews.freebsd.org/D14994
2018-04-10 13:56:06 +00:00
Mark Johnston
8593136428 Assert that dtrace_probe() doesn't re-enter itself.
This helps catch cases where an instrumented function is called while
in probe context.

Submitted by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
MFC after:	2 weeks
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D14863
2018-04-10 13:47:09 +00:00
Alexander Motin
8b26d76a50 9434 Speculative prefetch is blocked by device removal code.
Device removal code does not set spa_indirect_vdevs_loaded for pools
that never experienced device removal.  At least one visual consequence
of it is completely blocked speculative prefetcher.  This patch sets
the variable in such situations.
2018-04-03 21:16:41 +00:00
Alexander Motin
849a7ce2d5 MFV r331712:
9280 Assertion failure while running removal_with_ganging test with 4K devices

illumos/illumos-gate@243952c7ee

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matt Ahrens <Matt.Ahrens@delphix.com>
2018-03-28 23:17:29 +00:00
Alexander Motin
a96086ed50 MFV 331710:
9188 increase size of dbuf cache to reduce indirect block decompression

illumos/illumos-gate@268bbb2a2f

With compressed ARC (6950) we use up to 25% of our CPU to decompress indirect
blocks, under a workload of random cached reads. To reduce this decompression
cost, we would like to increase the size of the dbuf cache so that more
indirect blocks can be stored uncompressed.

If we are caching entire large files of recordsize=8K, the indirect blocks
use 1/64th as much memory as the data blocks (assuming they have the same
compression ratio). We suggest making the dbuf cache be 1/32nd of all memory,
so that in this scenario we should be able to keep all the indirect blocks
decompressed in the dbuf cache. (We want it to be more than the 1/64th that
the indirect blocks would use because we need to cache other stuff in the
dbuf cache as well.)

In real world workloads, this won't help as dramatically as the example
above, but we think it's still worth it because the risk of decreasing
performance is low. The potential negative performance impact is that we
will be slightly reducing the size of the ARC (by ~3%).

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: George Wilson <george.wilson@delphix.com>
2018-03-28 23:05:48 +00:00
Alexander Motin
9311abdd7e MFV r331708:
9321 arc_loan_compressed_buf() can increment arc_loaned_bytes by the wrong value

illumos/illumos-gate@9be12bd737

arc_loan_compressed_buf() increments arc_loaned_bytes by psize unconditionally
In the case of zfs_compressed_arc_enabled=0, when the buf is returned via
arc_return_buf(), if ARC_BUF_COMPRESSED(buf) is false, then arc_loaned_bytes
is decremented by lsize, not psize.

Switch to using arc_buf_size(buf), instead of psize, which will return
psize or lsize, depending on the result of ARC_BUF_COMPRESSED(buf).

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Allan Jude <allanjude@freebsd.org>
2018-03-28 22:50:05 +00:00
Alexander Motin
5c4561f332 MFV r331706:
9235 rename zpool_rewind_policy_t to zpool_load_policy_t

illumos/illumos-gate@5dafeea3eb

We want to be able to pass various settings during import/open of a pool,
which are not only related to rewind. Instead of adding a new policy and
duplicate a bunch of code, we should just rename rewind_policy to a more
generic term like load_policy.

For instance, we'd like to set spa->spa_import_flags from the nvlist,
rather from a flags parameter passed to spa_import as in some cases we want
those flags not only for the import case, but also for the open case. One
such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would
allow zfs to open a pool when logs are missing.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-03-28 22:29:06 +00:00
Alexander Motin
6cc8a8260f MFV 331704:
9191 dump vdev tree to zfs_dbgmsg when spa load fails due to missing log devices

illumos/illumos-gate@ccef24b493

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-03-28 22:10:06 +00:00
Alexander Motin
2221f0d8af MFV 331702:
9187 racing condition between vdev label and spa_last_synced_txg in vdev_validate

illumos/illumos-gate@d1de72cfa2

ztest failed with uncorrectable IO error despite having the fix for #7163.
Both sides of the mirror have CANT_OPEN_BAD_LABEL, which also distinguishes
it from that issue.

Definitely seems like a racing condition between the vdev_validate and spa_sync:
1. Thread A (spa_sync): vdev label is updated to latest txg
2. Thread B (vdev_validate): vdev label's txg is compared to spa_last_synced_txg and is ahead.
3. Thread A (spa_sync): spa_last_synced_txg is updated to latest txg.

Solution: do not check txg in vdev_validate unless config lock is held.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-03-28 22:07:31 +00:00
Alexander Motin
0b0c76bc58 MFV r331695, 331700: 9166 zfs storage pool checkpoint
illumos/illumos-gate@8671400134

The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with
exactly that.  It can be thought of as a “pool-wide snapshot” (or a
variation of extreme rewind that doesn’t corrupt your data).  It remembers
the entire state of the pool at the point that it was taken and the user
can revert back to it later or discard it.  Its generic use case is an
administrator that is about to perform a set of destructive actions to ZFS
as part of a critical procedure.  She takes a checkpoint of the pool before
performing the actions, then rewinds back to it if one of them fails or puts
the pool into an unexpected state.  Otherwise, she discards it.  With the
assumption that no one else is making modifications to ZFS, she basically
wraps all these actions into a “high-level transaction”.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
2018-03-28 22:01:27 +00:00
Andriy Gapon
f4043145f2 ZFS vn_rele_async: catch up with the use of refcount(9) for the vnode use count
It's not sufficient nor required to use the vnode interlock when
checking if we are going to drop the last use count as the code in
vputx() uses refcount (atomic) operations for both checking and
decrementing the use code.  Apply the same method to vn_rele_async().
While here, remove vn_rele_inactive(), a wrapper around vrele() that
didn't add any value.

Also, the change required making vfs_refcount_release_if_not_last()
public.  I've made vfs_refcount_acquire_if_not_zero() public as well.
They are in sys/refcount.h now.  While making the move I've dropped the
vfs_ prefix.

Reviewed by:	mjg
MFC after:	2 weeks
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D14869
2018-03-28 08:55:31 +00:00
John Baldwin
d41e41f9f0 Remove very old and unused signal information codes.
These have been supplanted by the MI signal information codes in
<sys/signal.h> since 7.0.  The FPE_*_TRAP ones were deprecated even
earlier in 1999.

PR:		226579 (exp-run)
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D14637
2018-03-27 20:57:51 +00:00
Andriy Gapon
f3fe7e5eff zfs: fix mismatch between format specifier and type
vdev_dbgmsg_print_tree printed vdev_id of uint64_t type with %u format
specifier.  That caused subsequent parameters to be incorrectly read
from the stack and lead to a crash when a wrong value was interpreted as
a string pointer.

This should be upstreamed.

Reported by:	pho
MFC after:	3 days
2018-03-23 09:42:47 +00:00
Alexander Motin
4148c56f78 Reduce struct aggsum_bucket padding to fit into one cache line.
Reported by:	mjg
2018-03-23 02:50:38 +00:00
Alexander Motin
e76e77a972 MFV r331407: 9213 zfs: sytem typo
illumos/illumos-gate@edc8ef7d92

Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Toomas Soome <tsoome@me.com>
2018-03-23 02:30:29 +00:00
Alexander Motin
f222611ab0 MFV r331405: 9084 spa_*_ashift must ignore spare devices
illumos/illumos-gate@b037f3dbd6

Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-03-23 02:24:52 +00:00
Alexander Motin
b8436536c9 MFV r331400: 8484 Implement aggregate sum and use for arc counters
In pursuit of improving performance on multi-core systems, we should
implements fanned out counters and use them to improve the performance of
some of the arc statistics. These stats are updated extremely frequently,
and can consume a significant amount of CPU time.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
2018-03-23 02:15:05 +00:00
Mark Johnston
1de56ac728 Revert part of r331264: disable interrupts before disabling WP.
We might otherwise be preempted, leaving WP disabled while another
thread runs on the CPU.

Reported by:	kib
X-MFC with:	r331264
2018-03-20 21:36:35 +00:00
Mark Johnston
7a79ce2e38 Make use of the KPI added in r331252.
MFC after:	2 weeks
2018-03-20 21:16:26 +00:00
Ed Maste
fc2a8776a2 Rename assym.s to assym.inc
assym is only to be included by other .s files, and should never
actually be assembled by itself.

Reviewed by:	imp, bdrewery (earlier)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D14180
2018-03-20 17:58:51 +00:00
Mark Johnston
95099bbad1 Fix an access of an uninitialized variable in dtrace_probe().
Reported by:	Coverity, via cem
MFC after:	3 days
2018-03-18 17:01:50 +00:00
Andriy Gapon
289c14e811 MFV r330973: 9164 assert: newds == os->os_dsl_dataset
illumos/illumos-gate@5f5913bb83
5f5913bb83

https://www.illumos.org/issues/9164
  This issue has been reported by Alan Somers as
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225877

  dmu_objset_refresh_ownership() first disowns a dataset (and releases
  it) and then owns it again. There is an assert that the new dataset
  object is the same as the old dataset object.  When running ZFS Test
  Suite on FreeBSD we see this panic from zpool_upgrade_007_pos test:

  panic: solaris assert: newds == os->os_dsl_dataset (0xfffff80045f4c000
  == 0xfffff80021ab4800)

  I see that the old dataset has dsl_dataset_evict_async() pending in
  ds_dbu.dbu_tqent and its ds_dbuf is NULL.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <don.brady@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <avg@FreeBSD.org>

PR:		225877
Reported by:	asomers
MFC after:	1 week
2018-03-15 08:49:21 +00:00
Steven Hartland
a59e720e65 Prevent ZFS TRIM breaking VTOC8 partitions
Update the ZFS TRIM code to ensure it respects VTOC8 partition headers as
documented by the ZFS On-Disk Specification section 1.3

Before this a zpool create on a VTOC8 partitioned device would overwrite the
partition metadata.

Reported by:	marius
Reviewed by:	marius agv
MFC after:	1 week
Sponsored by:	Multiplay
2018-03-14 21:21:03 +00:00
Andriy Gapon
4a24755d68 MFV r330591: 8984 fix for 6764 breaks ACL inheritance
illumos/illumos-gate@e9bacc6d1a
e9bacc6d1a

https://www.illumos.org/issues/8984
  Consider a directory configured as:
  drwx-ws---+ 2 henson cpp 3 Jan 23 12:35 dropbox/
  user:henson:rwxpdDaARWcC--:f-i----:allow
  owner@:--------------:f-i----:allow
  group@:--------------:f-i----:allow
  everyone@:--------------:f-i----:allow
  owner@:rwxpdDaARWcC--:-di----:allow
  group:cpp:-wx-----------:-------:allow
  owner@:rwxpdDaARWcC--:-------:allow
  A new file created in this directory ends up looking like:
  rw-r--r-+ 1 astudent cpp 0 Jan 23 12:39 testfile
  user:henson:rw-pdDaARWcC--:------I:allow
  owner@:--------------:------I:allow
  group@:--------------:------I:allow
  everyone@:--------------:------I:allow
  owner@:rw-p--aARWcCos:-------:allow
  group@:r-----a-R-c--s:-------:allow
  everyone@:r-----a-R-c--s:-------:allow
  with extraneous group@ and everyone@ entries allowing read access that
  shouldn't exist.
  Per Albert Lee on the zfs mailing list:
  "aclinherit=passthrough/passthrough-x should still
  ignore the requested mode when an inheritable ACE for owner@ group@,
  or everyone@ is present in the parent directory.
  It appears there was an oversight in my fix for
  https://www.illumos.org/issues/6764 which made calling zfs_acl_chmod
  from zfs_acl_inherit unconditional. I think the parent ACL check for
  aclinherit=passthrough needs to be reintroduced in zfs_acl_inherit."
  We have a large number of faculty who use dropbox directories like the example
  to have students submit projects. All of these directories are now allowing

Reviewed by: Sam Zaydel <szaydel@racktopsystems.com>
Reviewed by: Paul B. Henson <henson@acm.org>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Dominik Hassler <hadfl@omniosce.org>

PR:		216886
MFC after:	2 weeks
2018-03-07 13:49:26 +00:00
Mark Johnston
1aa8a926b8 Unbreak amd64 FBT after r330539.
X-MFC with:	r330539
2018-03-06 15:51:59 +00:00
Andriy Gapon
f3b7b054dd add ZFS_ENTER protection to .zfs/snapshot vnode operations that need it
Those operations, zfsctl_snapdir_readdir and zfsctl_snapdir_getattr,
access the filesystem's objset and it can be unstable during operations
like receive and rollback.

MFC after:	2 weeks
2018-02-27 14:08:54 +00:00
Alexander Motin
060bac1db8 Add sysctls/tunables for dbuf cache size.
MFC after:	2 weeks
2018-02-27 01:36:43 +00:00
Alan Somers
7d3761dc72 Don't declare __assfail as static
It gets called by dmu_buf_init_user, which is inline but not static.  So it
needs global linkage itself.

Reported by:	GCC-6
MFC after:	17 days
X-MFC-With:	329722
2018-02-25 14:29:43 +00:00
Alan Somers
92bd443160 Implement CTASSERT using _Static_assert
Prevents warnings about "unused typedef" with GCC-6

Reported by:	GCC-6
MFC after:	18 days
X-MFC-With:	329722
2018-02-24 16:01:21 +00:00
Andriy Gapon
de2cb430ad another rework of getzfsvfs / getzfsvfs_impl code
This change is designed to account for yet another difference between
illumos and FreeBSD VFS.  In FreeBSD a filesystem driver is supposed to
clean up mnt_data in its VFS_UNMOUNT method because it's the last call
into the driver before a struct mount object is destroyed.  The VFS
drains all references to the object before destroying it, but for the
driver it's already as good as gone.
In contrast, illumos VFS provides another method, VFS_FREEVFS, that is
called when all references are drained.  So, the driver can keep its
data after VFS_UNMOUNT and clean it up in VFS_FREEVFS after all
references are gone. This is what ZFS does on illumos.
So there a reference to a filesystem is sufficient to guarantee that the
ZFS specific data, aka zfsvfs_t, stays around (even if the filesystem
gets unmounted).  In FreeBSD we need to vfs_busy the filesystem to get
the same guarantee.  vfs_ref guarantees only that the struct mount is
kept.

The following rules should be observed in getzfsvfs / getzfsvfs_impl on
FreeBSD:
- if we need access to zfsvfs_t then we must use vfs_busy
- if only we need to access struct mount (aka vfs_t), then vfs_ref is
  enough
- when illumos code actually needs only the vfs_t, they still can pass
  the zfsvfs_t and get the vfs_t from it;  that can work in FreeBSD if
  the filesystem is busied, but when it's just referenced then we have
  to pass the vfs_t explicitly
- we cannot call vfs_busy while holding a dataset because that creates a
  LOR with dp_config_rwlock

As a result:
- getzfsvfs_impl now only references the filesystem, same as in illumos,
  but unlike illumos it has to return the vfs_t
- the consumers are updated to account for the change
- getzfsvfs busies the filesystem (and drops the reference from
  getzfsvfs_impl)

Also, zfs_unmount_snap() now gets a busied a filesystem, references it
and then unbusies it essentially reverting actions done in getzfsvfs.
This is needed because the code may perform some checks that require the
zfsvfs_t.  So, those are done before the unbusying.

MFC after:	2 weeks
2018-02-22 13:06:27 +00:00
Andriy Gapon
8d69fe5cc8 followup to r329556, completely remove the covered vnode assert
vrele() acquires the vnode lock only if the hold count drops to zero.
In other scenarios it needs only the interlock.  So,
zfsctl_snapdir_lookup() can race with vfs_mount_destroy() -> vrele()
such that the lookup adds a new reference and then vrele() drops the
mountpoint's reference and only then we check the reference count.
It would be just one in this case.

In fact, the assert should have been removed in r323483 when the code
learned how to deal with the uncovered vnode.

PR:		225795
MFC after:	4 days
X-MFC with:	r329556
2018-02-22 11:41:00 +00:00