Commit Graph

1611 Commits

Author SHA1 Message Date
Alexander Motin
9007a8679a Fix kernel panic when inheriting properties without default.
There are two writable hidden properties "iscsioptions" and "stmf_sbd_lu",
that have no default string value.  Attempt to unset them or replicate
caused kernel panic.  This simple bandaid seems fixes the problem nicely.

MFC after:	2 weeks
2016-08-31 11:55:31 +00:00
Toomas Soome
f1624ed8c4 Bug 212114 - loader: zio_checksum_verify() must test spa for NULL pointer
The issue was introduced with adding support for salted checksums, and
was revealed by bhyve userboot.so.

During pool discovery the loader is reading pool label from disks, and
at that time the spa structure is not yet set up, so the NULL pointer
is passed for spa. This condition must be checked to avoid the corruption
of the memory and NULL pointer dereference.

PR:		212114
Reported by:	tsoome@freebsd.com
Reviewed by:	allanjude
Approved by:	allanjude (mentor)
Differential Revision:	https://reviews.freebsd.org/D7634
2016-08-24 16:30:15 +00:00
Toomas Soome
2c55d0903d Add SHA512, skein, large blocks support for loader zfs.
Updated sha512 from illumos.
Using skein from freebsd crypto tree.
Since loader itself is using 64MB memory for heap, updated zfsboot to
use same, and this also allows to support zfs large blocks.

Note, adding additional features does increate zfsboot code, therefore
this update does increase zfsboot code to 128k, also I have ported gptldr.S
update to zfsldr.S to support 64k+ code.

With this update, boot1.efi has almost reached the current limit of the size
set for it, so one of the future patches for boot1.efi will need to
increase the limit.

Currently known missing zfs features in boot loader are edonr and gzip support.

Reviewed by:	delphij, imp
Approved by:	imp (mentor)
Obtained from:	sha256.c update and skein_zfs.c stub from illumos.
Differential Revision:	https://reviews.freebsd.org/D7418
2016-08-18 00:37:07 +00:00
Mark Johnston
59ceeddecf MFV r301526:
7035 string-related subroutines should validate input earlier

Reviewed by: Alex Wilson <alex.wilson@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Patrick Mooney <pmooney@pfmooney.com>

illumos/illumos-gate@771e39c3b1

MFC after:	2 weeks
2016-08-16 02:25:19 +00:00
Mark Johnston
f66200ee22 MFV r301525:
7033 ustack helper should fault on bad return values

Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alex Wilson <alex.wilson@joyent.com>

illumos/illumos-gate@a2f72b65eb

MFC after:	2 weeks
2016-08-16 02:20:02 +00:00
Mark Johnston
4aea8f31b1 MFV r301524:
7034 negative record sizes should be rejected

Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alex Wilson <alex.wilson@joyent.com>

illumos/illumos-gate@0b8049bfb0

MFC after:	2 weeks
2016-08-16 02:18:34 +00:00
Mark Johnston
b7125fa9cd MFV r296989:
6734 dtrace_canstore_statvar() fails for some valid static variables

Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bryan Cantrill <bryan@joyent.com>

illumos/illumos-gate@d65f2bb4e5

MFC after:	2 weeks
2016-08-16 02:16:54 +00:00
Andriy Gapon
96762fe314 fix a zfs cross-device rename crash introduced in r303763
The problem was that 'zfsvfs' variable was not initialized if the error
was detected, but in the exit path the variable was dereferenced before
the error code was checked.

Reported by:	np
MFC after:	3 days
X-MFC with:	r303763
2016-08-09 06:11:24 +00:00
Justin Hibbits
161c415133 Two fixups for dtrace
* Use the right incantation to get the next stack pointer.  Since powerpc uses
  special frames for traps, dereferencing the stack pointer straight up won't
  get us the next stack pointer in every case.
* Clear EE using the correct instruction sequence.  The PowerISA states that
  'andi.' ANDs the register with 0||<imm>, instead of sign extending or filling
  out the unavailable bits with 1.  Even if it did sign extend, PSL_EE is
  0x8000, so ~PSL_EE is 0x7fff, and the upper bits would be cleared.  Use rlwinm
  in the 32-bit case, and a two-rotate sequence in the 64-bit case, the latter
  chosen to follow the output generated by gcc.

MFC after:	1 week
2016-08-06 15:06:19 +00:00
Andriy Gapon
4fb51b52ef fix .zfs-related cases in zfs_lookup that were broken by r303763
The problem is that the special .zfs nodes are not represented by
znodes but by special gfs-based nodes.
r303763 changed interface of zfs_dirlook such that started operating on
znodes rather than on vnodes and, thus, the function became unsuitable
for handling .zfs entities.
The solution is to move the handling of the special cases to zfs_lookup,
the only consumer of zfs_dirlook.
I already had this solution applied in D7421, but for different reasons.

Reported by:	asomers
MFC after:	3 days
X-MFC with:	r303763
2016-08-06 11:02:07 +00:00
Andriy Gapon
f79bc17233 zfs: honour and make use of vfs vnode locking protocol
ZFS POSIX Layer is originally written for Solaris VFS which is very
different from FreeBSD VFS.  Most importantly many things that FreeBSD VFS
manages on behalf of all filesystems are implemented in ZPL in a different
way.
Thus, ZPL contains code that is redundant on FreeBSD or duplicates VFS
functionality or, in the worst cases, badly interacts / interferes
with VFS.

The most prominent problem is a deadlock caused by the lock order reversal
of vnode locks that may happen with concurrent zfs_rename() and lookup().
The deadlock is a result of zfs_rename() not observing the vnode locking
contract expected by VFS.

This commit removes all ZPL internal locking that protects parent-child
relationships of filesystem nodes.  These relationships are protected
by vnode locks and the code is changed to take advantage of that fact
and to properly interact with VFS.

Removal of the internal locking allowed all ZPL dmu_tx_assign calls to
use TXG_WAIT mode.

Another victim, disputable perhaps, is ZFS support for filesystems with
mixed case sensitivity.  That support is not provided by the OS anyway,
so in ZFS it was a buch of dead code.

To do:
- replace ZFS_ENTER mechanism with VFS managed / visible mechanism
- replace zfs_zget with zfs_vget[f] as much as possible
- get rid of not really useful now zfs_freebsd_* adapters
- more cleanups of unneeded / unused code
- fix / replace .zfs support

PR:		209158
Reported by:	many
Tested by:	many (thank you all!)
MFC after:	5 days
Sponsored by:	HybridCluster / ClusterHQ
Differential Revision: https://reviews.freebsd.org/D6533
2016-08-05 06:23:06 +00:00
Ruslan Bukin
98f50c44e3 Update RISC-V port to Privileged Architecture Version 1.9.
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
2016-08-02 14:50:14 +00:00
Allan Jude
4deb8929ea Make boot code and loader check for unsupported ZFS feature flags
OpenZFS uses feature flags instead of a zpool version number to track
features since the split from Oracle. In addition to avoiding confusion
on ZFS vs OpenZFS version numbers, this also allows features to be added
to different operating systems that use OpenZFS in different order.

The previous zfs boot code (gptzfsboot) and loader (zfsloader) blindly
tries to read the pool, and if failed provided only a vague error message.

With this change, both the boot code and loader check the MOS features
list in the ZFS label and compare it against the list of features that
the loader supports. If any unsupported feature is active, the pool is
not considered as a candidate for booting, and a helpful diagnostic
message is printed to the screen. Features that are merely enabled via
zpool upgrade, but not in use, do not block booting from the pool.

Submitted by:	Toomas Soome <tsoome@me.com>
Reviewed by:	delphij, mav
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D6857
2016-08-01 19:37:43 +00:00
Enji Cooper
4fcae4df7e Conditionalize code which defines sysctls per _KERNEL #ifdef guard
This resolves several issues when compiling libzpool (userspace library), i.e.
-Wimplicit-function-declaration and -Wmissing-declarations issues.

MFC after:	2 weeks
Reported by:	clang
Tested with:	clang 3.8.1, gcc 4.2.1, gcc 5.3.0
Sponsored by:	EMC / Isilon Storage Division
2016-07-31 06:34:49 +00:00
Mark Johnston
57185c52de Restore an ifdef that should not have been removed in r303535.
X-MFC-With:	r303535
2016-07-30 07:05:32 +00:00
Mark Johnston
6d1ffb50fc Include fasttrap handling for DATAMODEL_ILP32 when compiling for amd64.
MFC after:	1 month
2016-07-30 03:11:53 +00:00
Ruslan Bukin
573d53050e Remove unused variables. 2016-07-29 12:29:17 +00:00
Mark Johnston
e86e17af79 Merge {amd64,i386}/instr_size.c into x86_instr_size.c.
Also reduce the diff between us and upstream: the input data model will
always be DATAMODEL_NATIVE because of a bug (p_model is never set but is
always initialized to 0), so we don't need to override the caller anyway.
This change is also necessary to support the pid provider for 32-bit
processes on amd64.

MFC after:	2 weeks
2016-07-20 00:02:10 +00:00
Andriy Gapon
70e3da3892 MFV r302645: 6878 Add scrub completion info to "zpool history"
illumos/illumos-gate@1825bc56e5
1825bc56e5

https://www.illumos.org/issues/6878
  Summary of changes:
      * Replace generic "scan done" message with "scan aborted, restarting",
        "scan cancelled", or "scan done"
      * Log number of errors using spa_get_errlog_size
      * Refactor scan restarting check into static function

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Nav Ravindranath <nav@delphix.com>
MFC after:	2 weeks
2016-07-14 11:53:39 +00:00
Andriy Gapon
39a6b17491 MFV r302650: 6940 Cannot unlink directories when over quota
illumos/illumos-gate@99189164df
99189164df

https://www.illumos.org/issues/6940
  Similar to #6334, but this time with empty directories:
  $ zfs create tank/quota
  $ zfs set quota=10M tank/quota
  $ zfs snapshot tank/quota@snap1
  $ zfs set mountpoint=/mnt/tank/quota tank/quota
  $ mkdir /mnt/tank/quota/dir # create an empty directory
  $ mkfile 11M /mnt/tank/quota/11M
  /mnt/tank/quota/11M: initialized 9830400 of 11534336 bytes: Disc quota exceeded
  $ rmdir /mnt/tank/quota/dir # now unlink the empty directory
  rmdir: directory "/mnt/tank/quota/dir": Disc quota exceeded
  From user perspective, I would expect that ZFS is always able to remove files
  and directories even when the quota is exceeded.

Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Simon Klinkert <simon.klinkert@gmail.com>
MFC after:	2 weeks
2016-07-14 11:51:01 +00:00
Andriy Gapon
fe0cc75230 MFV r302644: 6513 partially filled holes lose birth time
illumos/illumos-gate@8df0bcf0df
8df0bcf0df

https://www.illumos.org/issues/6513
  If a ZFS object contains a hole at level one, and then a data block is created
  at level 0 underneath that l1 block, l0 holes will be created. However, these
  l0 holes do not have the birth time property set; as a result, incremental
  sends will not send those holes.
  Fix is to modify the dbuf_read code to fill in birth time data.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Paul Dagnelie <pcd@delphix.com>
MFC after:	3 weeks
2016-07-14 11:48:42 +00:00
Andriy Gapon
e7ed92bbbc MFV r302641: 6844 dnode_next_offset can detect fictional holes
illumos/illumos-gate@11ceac77ea
11ceac77ea

https://www.illumos.org/issues/6844
  dnode_next_offset is used in a variety of places to iterate over the holes or
  allocated blocks in a dnode. It operates under the premise that it can iterate
  over the blockpointers of a dnode in open context while holding only the
  dn_struct_rwlock as reader. Unfortunately, this premise does not hold.
  When we create the zio for a dbuf, we pass in the actual block pointer in the
  indirect block above that dbuf. When we later zero the bp in
  zio_write_compress, we are directly modifying the bp. The state of the bp is
  now inconsistent from the perspective of dnode_next_offset: the bp will appear
  to be a hole until zio_dva_allocate finally finishes filling it in. In the
  meantime, dnode_next_offset can detect a hole in the dnode when none exists.
  I was able to experimentally demonstrate this behavior with the following
  setup:
  1. Create a file with 1 million dbufs.
  2. Create a thread that randomly dirties L2 blocks by writing to the first L0
  block under them.
  3. Observe dnode_next_offset, waiting for it to skip over a hole in the middle
  of a file.
  4. Do dnode_next_offset in a loop until we skip over such a non-existent hole.
  The fix is to ensure that it is valid to iterate over the indirect blocks in a
  dnode while holding the dn_struct_rwlock by passing the zio a copy of the BP
  and updating the actual BP in dbuf_write_ready while holding the lock.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Alex Reece <alex@delphix.com>
MFC after:	3 weeks
2016-07-14 11:42:53 +00:00
Andriy Gapon
875e6e5b04 MFV r302640: 6874 rollback and receive need to reset ZPL state to what's on disk
illumos/illumos-gate@1fdcbd00c9
1fdcbd00c9

https://www.illumos.org/issues/6874
  When we do a clone swap (caused by "zfs rollback" or "zfs receive"), the ZPL
  doesn't completely reload the state from the DMU; some values remain cached in
  the zfsvfs_t.
  steps to reproduce:
  ```
  #!/bin/bash -x
  zfs destroy -R test/fs
  zfs destroy -R test/recvd
  zfs create test/fs
  zfs snapshot test/fs@a
  zfs set userquota@$USER=1m test/fs
  zfs snapshot test/fs@b
  zfs send test/fs@a | zfs recv test/recvd
  zfs send -i @a test/fs@b | zfs recv test/recvd
  zfs userspace test/recvd
     1. should show 1m quota
        dd if=/dev/urandom of=/test/recvd/file bs=1k count=1024
        sync
        dd if=/dev/urandom of=/test/recvd/file2 bs=1k count=1024
     2. should fail with ENOSPC
        sync
        zfs unmount test/recvd
        zfs mount test/recvd
        zfs userspace test/recvd
     3. if bug above, now shows 1m quota
        dd if=/dev/urandom of=/test/recvd/file3 bs=1k count=1024
     4. if bug above, now fails with ENOSPC
  ```

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	3 weeks
2016-07-14 11:39:36 +00:00
Andriy Gapon
ac3623e090 re-apply r299908: zfsctl_snapdir_lookup: clear VV_ROOT of snapshot's root
The change has been undone in r301275 on the assumption that it was no
longer required.  But that was incorrect, because in this case (and only
in this case) the snapshot root vnode is looked up before z_parent is
fixed up.

MFC after:	5 days
2016-07-13 15:16:51 +00:00
Mark Johnston
ca1ef36cf4 Avoid truncating the return value of DTrace predicates.
Predicates are DIF objects whose return value is compared with zero to
determine whether the corresponding probe body is to be executed. The return
value itself is the contents of a 64-bit DIF register, but it was being
truncated to an int before the comparison. This meant that a predicate such
as /0x100000000/ would evaluate to false.

Reported by:	rwatson
MFC after:	3 days
2016-07-09 22:41:21 +00:00
Steven Hartland
ae8420ed72 Fix ZFS ARC min / max tunable
Due to ARC initial configuration not being done and kmem information
not being available we need to blindly set zfs_arc_max and zfs_arc_min
when configured via the tunable.

This fixes vfs.zfs.arc_(min|max) configuration via loader.conf broken by
r302265.

Approved by:	re(gjb)
MFC after:	1 week
2016-07-06 23:49:19 +00:00
Nathan Whitehorn
96c85efb4b Replace a number of conflations of mp_ncpus and mp_maxid with either
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.

There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.

PR:		kern/210106
Reviewed by:	jhb
Approved by:	re (glebius)
2016-07-06 14:09:49 +00:00
Alexander Motin
e36599916f Revert r299454 and r299448.
Those changes were found confusing FreeBSD libc ACL code, that doesn't
differentiate ACL for directories and files, and report ACLs for all
directories created after those patches as non-trivial.  On the other
side these changes were considered wrong from POSIX and NFSv4 points of
view.  Until further investigation done upstream, revert those changes
locally in preparation for FreeBSD 11.0 release.

Approved by:	re (hrs)
2016-06-30 14:55:49 +00:00
Steven Hartland
f535b2d7d8 Allow ZFS ARC min / max to be tuned at runtime
Prior to this change ZFS ARC min / max could only be changed using
boot time tunables, this allows the values to be tuned at runtime
using the sysctls:
* vfs.zfs.arc_max
* vfs.zfs.arc_min

When adjusting ZFS ARC minimum the memory used  will only reduce
to the new minimum given memory pressure.

Reviewed by:	allanjude
Approved by:	re (gjb)
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D5907
2016-06-29 07:55:45 +00:00
Andriy Gapon
0f7dcde977 fix deadlock-prone code in getzfsvfs()
getzfsvfs() called vfs_busy() in the waiting mode while having a hold on
a pool (via a call to dmu_objset_hold).  In other words,
dp_config_rwlock was held in the shared mode while a thread could be
sleeping in vfs_busy().
The pool's txg sync thread needs to take dp_config_rwlock in the
exclusive mode for some actions, e.g., for executing sync tasks.  If the
sync thread gets blocked, then any thread waiting for its sync task to
get executed is also blocked.  Which, in turn, could mean that
vfs_busy() will keep waiting indefinitely.

The solution is to use vfs_ref() in the locked section and to call
vfs_busy() only after dropping other locks.
Note that a reference on a struct mount object does not prevent an
associated zfsvfs_t object from being destroyed.  So, we have to be
careful to operate only on the struct mount object until we successfully
vfs_busy it.

Approved by:	re (gjb)
MFC after:	2 weeks
2016-06-23 07:01:54 +00:00
Alan Somers
54edbcfb69 Fix uninitialized variable from r300881
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
	Initialize needs_update in vdev_geom_set_physpath

PR:		210409
Reported by:	kp
Reviewed by:	kp
Approved by:	re (hrs)
MFC after:	4 weeks
X-MFC-With:	300881
Sponsored by:	Spectra Logic Corp
2016-06-21 15:27:16 +00:00
Konstantin Belousov
cacbedfc46 Fix gcc build.
Reported andt tested by:	swills
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2016-06-18 20:20:00 +00:00
Konstantin Belousov
4a0d95f810 Use vnlru_free(9) to implement dnlc_reduce_cache().
This apparently puts ARC back under the limits after the vnode pressure
rework in r291244, in particular due to the kmem exhaustion.

Based on patch by:	mckusick
Reviewed by:	avg, mckusick
Tested by:	allanjude, madpilot
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2016-06-17 17:34:28 +00:00
Andriy Gapon
55be2f79e5 l2arc: reset b_tmp_cdata to NULL in the case of unset b_daddr
The change is in arc_buf_l2_cdata_free().
Without this we can trip the assertion in arc_hdr_realloc()
if INVARIANTS option is enabled.

Approved by:	re (kib)
MFC after:	1 week
2016-06-13 18:39:13 +00:00
Andriy Gapon
aa14503e8e zfs_vptocnp: check for an invalid znode
... which can arise after the receive or rollback
and failed zfs_rezget().

Approved by:	re (kib)
MFC after:	1 week
2016-06-13 10:53:34 +00:00
Andriy Gapon
a68789426a zfs: set VROOT / VV_ROOT consistently and in a single place
This is a followup to r300131.

A filesystem's root vnode can be reached not only through VSF_ROOT, but
by other means as well.  For example, via a dot-dot lookup.
Also, a root vnode can get reclaimed and then re-created.  For these
reasons it was insufficient to clear VV_ROOT flag from a root vnode of a
snapshot mounted under .zfs in zfsctl_snapdir_lookup().

So, now we set the flag in zfs_znode_sa_init() only if a vnode
represent a root of a filesystem or a standalone snapshot.
That is, the flag is not set for snapshots mounted under .zfs.

MFC after:	2 weeks
2016-06-03 14:37:18 +00:00
Andriy Gapon
d1cf30f4f1 zfs_root: fix a potential root vnode reference leak
It could happen in an unlikely case that we fail to lock the root vnode
with requested flags (which appear to never include LK_NOWAIT).

MFC after:	1 week
2016-06-03 14:22:12 +00:00
Alan Somers
b1a6b8dcd2 Improve the English in a comment
sys/cddl/contrib/opensolaris/uts/common/sys/acl.h:
	Improve the english in a comment.  No functional changes

Submitted by:	gibbs
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
2016-06-01 22:21:42 +00:00
Andrew Turner
1cb290d2f2 Set oldfp so the check for fp == oldfp works as expected.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-05-31 11:32:09 +00:00
Allan Jude
0144ad3e78 Connect the SHA-512t256 and Skein hashing algorithms to ZFS
Support for the new hashing algorithms in ZFS was introduced in r289422
However it was disconnected because FreeBSD lacked implementations of
SHA-512 (truncated to 256 bits), and Skein.

These implementations were introduced in r300921 and r300966 respectively

This commit connects them to ZFS and enabled these new checksum algorithms

This new algorithms are not supported by the boot blocks, so do not use them
on your root dataset if you boot from ZFS.

Relnotes:	yes
Sponsored by:	ScaleEngine Inc.
2016-05-31 04:12:14 +00:00
Bryan Drewery
fdd9048a63 Avoid more literal-suffix errors with C++11 2016-05-29 00:40:29 +00:00
Alan Somers
7a0c41d5d7 zfsd(8), the ZFS fault management daemon
Add zfsd, which deals with hard drive faults in ZFS pools. It manages
hotspares and replements in drive slots that publish physical paths.

cddl/usr.sbin/zfsd
	Add zfsd(8) and its unit tests

cddl/usr.sbin/Makefile
	Add zfsd to the build

lib/libdevdctl
	A C++ library that helps devd clients process events

lib/Makefile
share/mk/bsd.libnames.mk
share/mk/src.libnames.mk
	Add libdevdctl to the build. It's a private library, unusable by
	out-of-tree software.

etc/defaults/rc.conf
	By default, set zfsd_enable to NO

etc/mtree/BSD.include.dist
	Add a directory for libdevdctl's include files

etc/mtree/BSD.tests.dist
	Add a directory for zfsd's unit tests

etc/mtree/BSD.var.dist
	Add /var/db/zfsd/cases, where zfsd stores case files while it's shut
	down.

etc/rc.d/Makefile
etc/rc.d/zfsd
	Add zfsd's rc script

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
	Fix the resource.fs.zfs.statechange message. It had a number of
	problems:

	It was only being emitted on a transition to the HEALTHY state.
	That made it impossible for zfsd to take actions based on drives
	getting sicker.

	It compared the new state to vdev_prevstate, which is the state that
	the vdev had the last time it was opened.  That doesn't make sense,
	because a vdev can change state multiple times without being
	reopened.

	vdev_set_state contains logic that will change the device's new
	state based on various conditions.  However, the statechange event
	was being posted _before_ that logic took effect.  Now it's being
	posted after.

Submitted by:	gibbs, asomers, mav, allanjude
Reviewed by:	mav, delphij
Relnotes:	yes
Sponsored by:	Spectra Logic Corp, iX Systems
Differential Revision:	https://reviews.freebsd.org/D6564
2016-05-28 17:43:40 +00:00
Enji Cooper
dfdbdb0c82 Fix up r300870
The sys/types.h fix I proposed was only tested with zfs(4), not with
libzpool, which is where the build failure actually existed

Remove vm/vm_pageout.h from arc.c and zfs_vnops.c because they're both
unneeded

MFC after: 1 week
X-MFC with: r300865, r300870
In collaboration with: kib
Submitted by: alc
Sponsored by: EMC / Isilon Storage Division
2016-05-27 22:56:00 +00:00
Alan Somers
151746b244 Avoid issuing spa config updates for physical path when not necessary
ZFS's configuration needs to be updated whenever the physical path for a
device changes, but not when a new device is introduced. This is because new
devices necessarily cause config updates, but only if they are actually
accepted into the pool.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
	Split vdev_geom_set_physpath out of vdev_geom_attrchanged.  When
	setting the vdev's physical path, only request a config update if
	the physical path has changed.  Don't request it when opening a
	device for the first time, because the config sync will happen
	anyway upstack.

sys/geom/geom_dev.c
	Split g_dev_set_physpath and g_dev_set_media out of
	g_dev_attrchanged

Submitted by:	will, asomers
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D6428
2016-05-27 22:32:44 +00:00
Enji Cooper
765daefd68 Unbreak the zfs(4) build
vm/vm_pageout.h grew a dependency on the bool typedef in r300865

arc.c didn't include sys/types.h, which included the definition for the typedef

Other items (ofed, drm2) might need to be chased for this commit.

X-MFC with: r300865
MFC after: 1 week
Pointyhat to: alc
Sponsored by: EMC / Isilon Storage Division
2016-05-27 20:33:38 +00:00
Ruslan Bukin
bdca9b1d52 Correct the implementation of dtrace_interrupt_disable/enable.
Pointed out by:	andrew
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
2016-05-27 17:58:10 +00:00
Andrew Turner
12aac6b55b Fix dtrace_interrupt_disable and dtrace_interrupt_enable by having the
former return the current status for the latter to use. Without this we
could enable interrupts when they shouldn't be.

It's still not quite right as it should only update the bits we care about,
bit should be good enough until the correct fix can be tested.

PR:		204270
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-05-27 12:02:12 +00:00
Bryan Drewery
74c5734766 Use netinet/in.h to avoid include/arpa dependency for DIRDEPS_BUILD.
Sponsored by:	EMC / Isilon Storage Division
2016-05-26 23:20:17 +00:00
Bjoern A. Zeeb
9c759b587f Try to unbreak the build after r300611 by including the header
defining VM_MIN_KERNEL_ADDRESS.

Sponsored by:	DARPA/AFRL
2016-05-24 17:38:27 +00:00
Ruslan Bukin
fed1ca4b71 Add initial DTrace support for RISC-V.
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
2016-05-24 16:41:37 +00:00
Andrew Turner
0d0da76911 Mark all memory before the kernel as toxic to DTrace.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-05-24 13:57:23 +00:00
Andriy Gapon
fabe7e4ecc add vop_print methods to vnode operatios of various zfsctl node types
This should help with diagnostics of zfsctl problems.

MFC after:	2 weeks
2016-05-18 13:21:29 +00:00
Andriy Gapon
e34c8d727b move zfsctl_freebsd_root_lookup right next to zfsctl_root_lookup
That makes it easier to reason about the code.

MFC after:	5 weeks
2016-05-18 08:29:39 +00:00
Andriy Gapon
a4bbed22d2 zfsctl_common_fid: remove redundant assignment
"Reinterpret cast" to zfid_short_t and assignment of zf_len
do the job already.

MFC after:	1 week
2016-05-18 08:26:09 +00:00
Andriy Gapon
e6d4eefe2a zfsctl: tighten an assertion and remove an unused definition
There are only two entries under .zfs and 'shares' has an ID of a
special persistent object in its filesystem.

MFC after:	1 week
2016-05-18 08:23:39 +00:00
Andriy Gapon
439e9b6804 zfs_root: no need to set the root flag here
That was both redundant as zfs_znode_sa_init() already does the job and
insufficient as the root vnode can be reached via other means.

MFC after:	1 weeks
2016-05-18 08:19:41 +00:00
Andriy Gapon
74a3df2b1f zfsctl_freebsd_root_lookup: gfs_vop_lookup may return a doomed vnode
gfs code is (almsot) completely agnostic of FreeBSD VFS locking, so it
does not handle doomed but not yet dead vnodes and may return them.
Check for those vnodes here and retry a lookup.
Note that ZFS and gfs have additional protections that ensure that a
parent vnode of the current vnode is never doomed.

The fixed problem is an occasional failure to lookup a 'snapshot' or
'shares' directories under .zfs.

Note that for the above reason all uses of zfsctl_root_lookup() are
better be replaced with VOP_LOOKUP.

MFC after:	5 weeks
2016-05-18 08:02:49 +00:00
Alan Somers
5f7b3969e9 Speed up vdev_geom_open_by_guids
Speedup is hard to measure because the only time vdev_geom_open_by_guids
gets called on many drives at the same time is during boot. But with
vdev_geom_open hacked to always call vdev_geom_open_by_guids, operations
like "zpool create" speed up by 65%.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c

	* Read all of a vdev's labels in parallel instead of sequentially.
	* In vdev_geom_read_config, don't read the entire label, including
	  the uberblock.  That's a waste of RAM.  Just read the vdev config
	  nvlist.  Reduces the IO and RAM involved with tasting from 1MB to
	  448KB.

Reviewed by:	avg
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D6153
2016-05-17 15:17:23 +00:00
Andriy Gapon
857a214d03 zfs_ioc_rename: fix a reversed condition
FreeBSD zfs_ioc_rename() has an option, not present upstream, that
allows to rename snapshots without unmounting them first.  I am not sure
what is a rationale for that option, but its actual behavior was the
opposite of the intended behavior.  That is, by default the snapshots
were not unmounted.
The option was introduced as part of a large update from upstream in
r248498.

One of the consequences was a havoc under .zfs/snapshot after the rename.
The snapshots got new names but were mounted on top of directories with
old names, so readdir would list the new names, but lookup would still
find the old mounts.

PR:		209093
Reported by:	Frédéric VANNIÈRE <f.vanniere@planet-work.com>
MFC after:	5 days
2016-05-17 07:56:05 +00:00
Andriy Gapon
afe674f089 do not destroy 'snapdir' when it becomes inactive
That was just wrong.  In fact, we can safely keep this static entry when
it's inactive.
Now the destructive action is moved to the reclaim method and the
function is renamed from zfsctl_snapdir_inactive(0 to
zfsctl_snapdir_reclaim().

Also, we can use gfs_vop_reclaim() instead of gfs_dir_inactive() +
kmem_free().

Lastly, we can just assert that the node does not any children when it
is reclaimed, even on the force unmount.  That's because zfs_umount()
does an extra vflush() pass which should destroy all snapshot-mountpoint
vnodes that are the snapdir's children.

MFC after:	5 weeks
2016-05-16 15:48:56 +00:00
Andriy Gapon
9c3e205296 try to recycle "snap" vnodes as soon as possible
Those vnodes should not linger.  "Stale" nodes may get out of
synchronization with actual snapshots.  For example if we destroy a
snapshot and create a new one with the same name.  Or when we rename a
snapshot.

While there fix the argument type for zfsctl_snapshot_reclaim().
Also, its original argument can be passed to gfs_vop_reclaim() directly.

Bug 209093 could be related although I have not specifically verified
that.  Referencing just in case.

PR:		209093
MFC after:	5 weeks
2016-05-16 15:37:41 +00:00
Andriy Gapon
0ab1aa90fa fix locking in zfsctl_root_lookup
Dropping the root vnode's lock after VFS_ROOT() didn't really help the
fact that we acquired the lock while holding its child's, .zfs, lock
while performing the operaiton.
So, directly use zfs_zget() to get the root vnode.

While there simplify the code in zfsctl_freebsd_root_lookup.
We know that .zfs is always exclusively locked.
We know that there is already a reference on *vpp, so no need for an
extra one.
Account for the fact that .. lookup may ask for a different lock type,
not necessarily LK_EXCLUSIVE.  And handle a possible failure to acquire
the lock given the lock flags.

MFC after:	5 weeks
2016-05-16 15:28:39 +00:00
Andriy Gapon
705e6b8170 gfs_lookup_dot() does not have to acquire any locks
In fact, that was dangerous.  For example, zfsctl_snapshot_reclaim()
calls gfs_dir_lookup() on ".." path and that ends up calling
gfs_lookup_dot() which violated locking order by acquiring the parent's
directory vnode lock after the child's vnode lock.

Also, the previous behavior was inconsistent as gfs_dir_lookup()
returned a locked vnode for . and .. lookups, but not for any other.

Now gfs_lookup_dot() just references a resulting vnode and the locking
is done in its consumers, where necessary.
Note that we do not enable shared locking support for any gfs / zfsctl
vnodes.

This commit partially reverts r273641.

MFC after:	5 weeks
2016-05-16 15:13:16 +00:00
Andriy Gapon
7223645bd1 avoid deadlock between zfsctl_snapdir_lookup and zfsctl_snapshot_reclaim
The former acquired a snap vnode lock while holding sd_lock while the
latter does the opposite.

The solution is drop sd_lock before acquiring the vnode lock.  That
should be okay as we are still holding a lock on the 'snapshot'
directory in the exclusive mode.  That lock ensures that there are no
concurrent lookups in the directory and thus no concurrent mount attempts.

But now we have to account for the possibility that the snap vnode
might get reclaim after we drop sd_lock and before we can get
the node lock.  So, check for that case and retry.

MFC after:	5 weeks
2016-05-16 15:03:52 +00:00
Andriy Gapon
c6cd01d924 fix a vnode reference leak caused by illumos compat traverse()
This commit partially reverts r273641 which introduced the leak.
It did so to accomodate for some consumers of traverse() that expected
the starting vnode to stay as-is.  But that introduced the leak in the
case when a mounted filesystem was found and its root vnode was
returned.

r299914 removed the troublesome consumers and now there is no reason to
keep the starting vnode.  So, now the new rules are:
- if there is no mounted filesystem, then nothing is changed
- otherwise the starting vnode is always released
- the root vnode of the mounted filesystem is returned locked and
  referenced in the case of success

MFC after:	5 weeks
X-MFC after:	r299914
2016-05-16 12:15:19 +00:00
Andriy Gapon
20ec8b0f9b fix up r299902: mount_snapshot requires that the covered vnode is locked
Previously that was not strictly enforced.

MFC after:	4 weeks
X-MFC with:	r299902
2016-05-16 11:48:43 +00:00
Andriy Gapon
cf7aa80bbd zfsctl_ops_snapshot: remove methods should never be called
We pretend that snapshots mounted under .zfs are part of the original
filesystem and we try very hard to hide vnodes on top of which the snapshots
are mounted.  Given that I believe that the removed operations should
never be called.  They might have been called previously because
of issues fixed in r299906, r299908 and r299913.

MFC after:	5 weeks
2016-05-16 07:24:30 +00:00
Andriy Gapon
cb68fd3513 zfsctl_snapdir_lookup: always clear VV_ROOT flag of snapshot's root VV_ROOT
Previosuly we did that only if the snapshot was mounted earlier, its
root vnode got recycled and then we accessed it again.
We never cleared the flag for a freshly mounted snapshot.

That was very inconsistent and probably a source of some bugs.
Or maybe that painted over some bugs which might get revealed now.

We should consistently clear the flag because we try very hard to
pretend that snapshots auto-mounted under .zfs are part of their
original filesystem.  In other words, we try to hide the fact that they
are different filesystems / mountpoints.

MFC after:	5 weeks
2016-05-16 06:49:09 +00:00
Andriy Gapon
4df590b5b6 add zfs_vptocnp with special handling for snapshots under .zfs
The logic is similar to that already present in zfs_dirlook() to handle
a dot-dot lookup on a root vnode of a snapshot mounted under
.zfs/snapshot/.
illumos does not have an equivalent of vop_vptocnp, so there only the
lookup had to be patched up.

MFC after:	4 weeks
2016-05-16 06:40:51 +00:00
Andriy Gapon
a03fb1cf6a mount_snapshot: consolidate all error handling
This makes sure that the original vnode is always unlocked and released
if any error happens.

MFC after:	4 weeks
2016-05-16 06:30:25 +00:00
Andriy Gapon
3055925d42 zfsctl: fix several problems with reference counts
* Remove excessive references on a snapshot mountpoint vnode.
  zfsctl_snapdir_lookup() called VN_HOLD() on a vnode returned from
  zfsctl_snapshot_mknode() and the latter also had a call to VN_HOLD()
  on the same vnode.
  On top of that gfs_dir_create() already returns the vnode with the
  use count of 1 (set in getnewvnode).
  So there was 3 references on the vnode.

* mount_snapshot() should keep a reference to a covered vnode.
  That reference is owned by the mountpoint (mounted snapshot filesystem).

* Remove cryptic manipulations of a covered vnode in zfs_umount().
  FreeBSD dounmount() already does the right thing and releases the covered
  vnode.

PR:		207464
Reported by:	dustinwenz@ebureau.com
Tested by:	Howard Powell <hpowell@lighthouseinstruments.com>
MFC after:	3 weeks
2016-05-16 06:24:04 +00:00
John Baldwin
fdce57a042 Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
Enji Cooper
622282b3b8 Include arpa/inet.h to get the htonl(3) definition
MFC after: 2 weeks
Reported by: clang
Sponsored by: EMC / Isilon Storage Division
2016-05-13 11:15:33 +00:00
Conrad Meyer
3b56262303 compat/opensolaris: Don't redefined off64_t if already defined
A follow-up to r299456.

Reported by:	gjb
Sponsored by:	EMC / Isilon Storage Division
2016-05-11 16:05:32 +00:00
Alexander Motin
c59a902fa3 MFV r299453: 6765 zfs_zaccess_delete() comments do not accurately reflect
delete permissions for ACLs

Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Kevin Crowe <kevin.crowe@nexenta.com>

openzfs/openzfs@a40149b935
2016-05-11 13:53:29 +00:00
Alexander Motin
0eb65a5367 MFV r299451: 6764 zfs issues with inheritance flags during chmod(2) with
aclmode=passthrough

Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Albert Lee <trisk@nexenta.com>

openzfs/openzfs@1bcf0d240b
2016-05-11 13:50:34 +00:00
Alexander Motin
85a69dbf66 MFV r299449: 6763 aclinherit=restricted masks inherited permissions by group
perms (groupmask)

Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Albert Lee <trisk@nexenta.com>

openzfs/openzfs@eebb483d0c
2016-05-11 13:48:15 +00:00
Alexander Motin
2a219f349e MFV r299442: 6762 POSIX write should imply DELETE_CHILD on directories - and
some additional considerations

Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Kevin Crowe <kevin.crowe@nexenta.com>

openzfs/openzfs@d316fffc9c
2016-05-11 13:43:20 +00:00
Alexander Motin
42a54f9745 MFV r299440: 6736 ZFS per-vdev ZAPs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Joe Stein <joe.stein@delphix.com>

openzfs/openzfs@215198a6ad
2016-05-11 12:54:00 +00:00
Alexander Motin
7d54dbae83 MFV r299438: 6842 Fix empty xattr dir causing lockup
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chunwei Chen <tuxoko@gmail.com>

openzfs/openzfs@02525cd08f
2016-05-11 12:46:07 +00:00
Alexander Motin
d7ff478705 MFV r299436: 6843 Make xattr dir truncate and remove in one tx
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chunwei Chen <tuxoko@gmail.com>

openzfs/openzfs@399cc7d5d9
2016-05-11 12:43:54 +00:00
Alexander Motin
0b99ac761e MFV r299434: 6841 Undirty freed spill blocks
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Tim Chase <tim@chase2k.com>

openzfs/openzfs@445e67805d
2016-05-11 12:38:07 +00:00
Ruslan Bukin
d7dc6bae03 Implement FBT provider (MD part) for DTrace on MIPS.
Tested on MIPS64.

Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
2016-05-05 13:54:50 +00:00
Alan Somers
c9a807447d Fix a use-after-free when "zpool import" fails
clear vd->vdev_tsd in vdev_geom_close_locked instead of vdev_geom_detach.
In the latter function, it would fail to happen in certain circumstances
where cp->private was unset.  Ideally, the latter should never happen, but
it can happen when vdev open fails, or where spares are involved.

MFC after:	4 weeks
X-MFC-With:	298786
Sponsored by:	Spectra Logic Corp
2016-04-29 21:29:37 +00:00
Andriy Gapon
27b6c49726 add invpcid instruction to i386 dtrace disassembler tables
MFC after:	2 weeks
2016-04-29 15:45:22 +00:00
Alan Somers
663f649ff6 Refactor vdev_geom_attach and friends to reduce code duplication
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
	Move checks for provider's sectorsize and mediasize into a single
	location in vdev_geom_attach. Remove the zfs::vdev::taste class;
	it's ok to use the regular vdev class for tasting. Consolidate guid
	checks into a single location in vdev_attach_ok. Consolidate some
	error handling code from vdev_geom_attach into vdev_geom_detach,
	closing a resource leak of geom consumers in the process.

Reviewed by:	avg
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D5974
2016-04-29 15:23:51 +00:00
Mark Johnston
676a03fa6a Increase DTRACE_FUNCNAMELEN from 128 to 192.
This allows for the long function components encountered in www/firefox.
This constant is part of DTrace's userland ABI, so this change may not be
MFC'ed.

PR:	207735
2016-04-25 18:44:11 +00:00
Mark Johnston
328d8adb9b Allow DOF sections with excessively long probe function components.
Without this change, DTrace will refuse to load a DOF section if the
function component of any of its probes exceeds DTRACE_FUNCNAMELEN (128).
Probes in C++ programs can have very long function components. Rather than
rejecting all probes if a single probe exceeds the limit, simply skip the
invalid probe and emit a warning. This ensures that valid probes are
instantiated.

PR:		207735
MFC after:	2 weeks
2016-04-25 18:40:57 +00:00
Mark Johnston
cd8bbc382d Add a kern.dtrace.err_verbose sysctl to control dtrace_err_verbose.
When this flag is turned on, DOF and DIF validation errors are printed to
the kernel message buffer. This is useful for debugging.

Also remove the debug.dtrace.debug sysctl, which has no effect.
2016-04-25 18:09:36 +00:00
Andriy Gapon
2d69831b85 lahf/sahf are supported on some amd64 processors
While the instructions were not included into the original instruction
set, their support can be indicated by a special feature bit.
For example:
  CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU)
  ...
    AMD Features2=0x37ff<LAHF, ...>

Clang 3.8 uses lahf/sahf as a faster alternative to pushf/popf where
possible.

MFC after:	2 weeks
2016-04-22 13:44:12 +00:00
Andriy Gapon
dbbcddb426 MFV r298471: 6052 decouple lzc_create() from the implementation details
illumos/illumos-gate@26455f9efc
26455f9efc

https://www.illumos.org/issues/6052
  At the moment type parameter of lzc_create() is of dmu_objset_type_t type.
  That exposes an implementation detail and requires sys/fs/zfs.h to be included
  in libzfs_core.h creating unnecessary coupling between libzfs_core interface
  and ZFS internals.
  I think that dmu_objset_type_t should be replaced with a libzfs_core
  enumeration of supported dataset types.
  For ABI reasons the new enumeration could be bit-compatible with
  dmu_objset_type_t.
  For example:
      typedef enum {
          LZC_DST_ZFS = 2,
          LZC_DST_ZVOL
      } lzc_dataset_type_t;

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <andriy.gapon@clusterhq.com>

MFC after:	2 weeks
Sponsored by:	ClusterHQ
2016-04-22 13:00:27 +00:00
Mark Johnston
6c2806594b Make the second argument of dtrace_invop() a trapframe pointer.
Currently this argument is a pointer into the stack which is used by FBT
to fetch the first five probe arguments. On all non-x86 architectures it's
simply the trapframe address, so this change has no functional impact. On
amd64 it's a pointer into the trapframe such that stack[1 .. 5] gives the
first five argument registers, which are deliberately grouped together in
the amd64 trapframe definition.

A trapframe argument simplifies the invop handlers on !x86 and makes the
x86 FBT invop handler easier to understand. Moreover, it allows for invop
handlers that may want to modify the register set of the interrupted thread.
2016-04-17 23:08:47 +00:00
Andriy Gapon
e01dd79f9a zfs_rezget: z_vnode can not be NULL if zp is valid
MFC after:	3 weeks
2016-04-16 07:41:56 +00:00
Andriy Gapon
c2d36fc5cd zfs: enable vn_io_fault support
Note that now we have to account for possible partial writes
in dmu_write_uio_dbuf().  It seems that on illumos either all or none
of the data are expected to be written.  But the partial writes are
quite expected when vn_io_fault support is enabled.

Reviewed by:	kib
MFC after:	7 weeks
Differential Revision: https://reviews.freebsd.org/D2790
2016-04-16 07:35:53 +00:00
Alan Somers
739f4ae3b1 Don't corrupt ZFS label's physpath attribute when booting while a disk is missing
Prior to this change, vdev_geom_open_by_path would call vdev_geom_attach
prior to verifying the device's GUIDs.  vdev_geom_attach calls
vdev_geom_attrchange to set the physpath in the vdev object.  The result is
that if the disk could not be found, then the labels for other disks in the
same TLD would overwrite the missing disk's physpath with the physpath of
whichever disk currently has the same devname as the missing one used to
have.

MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
2016-04-15 16:36:17 +00:00
Alan Somers
c29088b5c7 Add more debugging statements in vdev_geom.c
Log a debugging message whenever geom functions fail in vdev_geom_attach.
Printing these messages is controlled by vfs.zfs.debug

MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
2016-04-14 23:14:41 +00:00
Alan Somers
f0ac053088 Update a debugging message in vdev_geom_open_by_guids for consistency with
similar messages elsewhere in the file.

MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
2016-04-14 19:20:31 +00:00
Alan Somers
4e3ab010a2 Fix rare double free in vdev_geom_attrchanged
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
	Don't drop the g_topology_lock before freeing old_physpath. That
	opens up a race where one thread can call vdev_geom_attrchanged,
	set old_physpath, drop the g_topology_lock, then block trying to
	acquire the SCL_STATE lock. Then another thread can come into
	vdev_geom_attrchanged, set old_physpath to the same value, and
	proceed to free it. When the first thread resumes, it will free
	the same location.

	It turns out that the SCL_STATE lock isn't needed. It was
	originally added by gibbs to protect vd->vdev_physpath while
	updating the same. However, the update process subsequently was
	switched to an atomic operation (a pointer swap). Now, there is
	no need for the SCL_STATE lock, and hence no need to drop the
	g_topology_lock.

Reviewed by:	delphij
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D5413
2016-04-12 19:11:14 +00:00
Andriy Gapon
c3249989ef l2arc: make sure that all writes honor ashift of a cache device
Previously uncompressed buffers did not obey that rule.

Type of b_asize is changed to uint64_t for consistency,
given that this is a zeta-byte filesystem.

l2arc_compress_buf is renamed to l2arc_transform_buf to better reflect
its new utility.  Now not only we ensure that a compressed buffer has
a size aligned to ashift, but we also allocate a properly sized
temporary buffer if the original buffer is not compressed and it has
an odd size.  This ensures that all I/O to the cache device is always
ashift-aligned, in terms of both a request offset and a request size.

If the aligned data is larger than the original data, then we have to use
a temporary buffer when reading it as well.

Also, enhance physical zio alignment checks using vdev_logical_ashift.
On FreeBSD we have this information, so we can make stricter assertions.

Reviewed by: smh, mav
MFC after:	1 month
Sponsored by:	ClusterHQ
Differential Revision: https://reviews.freebsd.org/D2789
2016-04-12 06:56:35 +00:00
Andriy Gapon
6a50036052 Revert r297396 Modify "4958 zdb trips assert on pools with ashift >= 0xe"
A better fix is following.
2016-04-12 06:54:18 +00:00