Commit Graph

1463 Commits

Author SHA1 Message Date
trasz
81dd6a7bca Fix ru_oublocks accounting for ZFS. There are two code paths that can be
called from zfs_write() - one of them, through dmu_write(), was handled
correctly; the other wasn't.

Reviewed by:	avg@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D4923
2016-01-23 12:13:09 +00:00
asomers
a7292678aa Quell harmless CID about unchecked return value in nvlist_get_guids.
The return value doesn't need to be checked, because nvlist_get_guid's
callers check the returned values of the guids.

Coverity CID:	1341869
MFC after:	1 week
X-MFC-With:	292066
Sponsored by:	Spectra Logic Corp
2016-01-19 23:16:24 +00:00
asomers
5127f80e63 Disallow zvol-backed ZFS pools
Using zvols as backing devices for ZFS pools is fraught with panics and
deadlocks. For example, attempting to online a missing device in the
presence of a zvol can cause a panic when vdev_geom tastes the zvol.  Better
to completely disable vdev_geom from ever opening a zvol. The solution
relies on setting a thread-local variable during vdev_geom_open, and
returning EOPNOTSUPP during zvol_open if that thread-local variable is set.

Remove the check for MUTEX_HELD(&zfsdev_state_lock) in zvol_open. Its intent
was to prevent a recursive mutex acquisition panic. However, the new check
for the thread-local variable also fixes that problem.

Also, fix a panic in vdev_geom_taste_orphan. For an unknown reason, this
function was set to panic. But it can occur that a device disappears during
tasting, and it causes no problems to ignore this departure.

Reviewed by:	delphij
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D4986
2016-01-19 17:00:25 +00:00
dim
ec70d0382b MFV r294101: 6527 Possible access beyond end of string in zpool comment
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Gordon Ross <gwr@nexenta.com>

illumos/illumos-gate@2bd7a8d078

This fixes erroneous double increments of the 'check' variable in a loop
in spa_prop_validate().  I ran into this in the clang380-import branch,
where clang 3.8.0 warns about it.  (It is already fixed there.)

MFC after:	3 days
2016-01-15 21:45:53 +00:00
asomers
30c1145c49 Fix race condition involving ZFS remove events
When a ZFS drive disappears, ZFS sends a resource.fs.zfs.removed event to
userland. A userland program like zfsd(8) can use that event, for example to
activate a hotspare. The current code contains a race condition: vdev_geom
will sent the sysevent _before_ spa.c would update the vdev's status,
causing userland processes to see pool state that does not reflect the
device removal. This change moves the sysevent to spa.c, closing the race.

Reviewed by:	delphij, Sean Eric Fagan
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D4902
2016-01-14 18:19:05 +00:00
asomers
adf93787ff Fix importing l2arc device by guid
After r292066, vdev_geom verifies both the vdev and pool guids of device
labels during open. However, spare and l2arc devices don't have pool guids,
so opening them by guid will fail (opening by path, when the pathname is
known, still succeeds). This change allows a vdev to be opened by guid if
the label contains no pool_guid, which is the case for inactive spares and
l2arc devices.

PR:		292066
Reported by:	delphij
Reviewed by:	delphij, smh
MFC after:	2 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D4861
2016-01-11 22:15:46 +00:00
asomers
67600cfd18 Record physical path information in ZFS Vdevs
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
	If available, record the physical path of a vdev in ZFS meta-data.
	Do this both when opening the vdev, and when receiving an attribute
	change notification from GEOM.

	Make vdev_geom_close() synchronous instead of deferring its work to
	a GEOM event handler. There is no benefit to deferring the work and
	this prevents a future open call from referencing a consumer that is
	scheduled for destruction. The close followed by an immediate open
	will occur during a vdev reprobe triggered by any type of I/O error.

	Consolidate vdev_geom_close() and vdev_geom_detach() into
	vdev_geom_close() and vdev_geom_close_locked(). This also moves the
	cross linking operations between vdev and GEOM consumer into a
	single place (linking in vdev_geom_attach() and unlinking in
	vdev_geom_close_locked()).

Submitted by:	gibbs, asomers
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D4524
2016-01-11 17:57:26 +00:00
smh
449ab1ce8f Fix const conversion warning in lz4_decompress
Fix const conversion warning in lz4_decompress which shows when warnings
are enabled (to be done later).

MFC after:	2 weeks
X-MFC-With:	r293268
Sponsored by:	Multiplay
2016-01-06 20:28:09 +00:00
allanjude
c7c2f2dfab Replace sys/crypto/sha2/sha2.c with lib/libmd/sha512c.c
cperciva's libmd implementation is 5-30% faster

The same was done for SHA256 previously in r263218

cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation

Extend sbin/md5 to create sha384(1)

Chase dependancies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h}

Reviewed by:	cperciva, des, delphij
Approved by:	secteam, bapt (mentor)
MFC after:	2 weeks
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3929
2015-12-27 17:33:59 +00:00
andrew
f57b487e00 Be stricter on which functions we can probe with FBT. We now only check the
first instruction to see if it's either a pushm with lr, or a sub with sp.
The former is the common case, with the latter used with va_args.

This removes 12 probes. These are all hand-written assembly, with a few C
functions with no stack usage.

Submitted by:	Howard Su <howard0su@gmail.com>
Differential Revision:	https://reviews.freebsd.org/D4419
2015-12-23 17:54:19 +00:00
markj
338746b90e Support an arbitrary number of arguments to DTrace syscall probes.
Rather than pushing all eight possible arguments into dtrace_probe()'s
stack frame, make the syscall_args struct for the current syscall available
via the current thread. Using a custom getargval method for the systrace
provider, this allows any syscall argument to be fetched, even in kernels
that have modified the maximum number of system call arguments.

Sponsored by:	EMC / Isilon Storage Division
2015-12-17 00:00:27 +00:00
glebius
910a73cc44 Fix breakage caused by r292373 in ZFS/FUSE/NFS/SMBFS.
With the new VOP_GETPAGES() KPI the "count" argument counts pages already,
and doesn't need to be translated from bytes to pages.

While here make it consistent that *rbehind and *rahead are updated only
if we doesn't return error.

Pointy hat to:	glebius
2015-12-16 23:48:50 +00:00
markj
66d1f9182b Remove the unused systrace device file and fix style bugs.
MFC after:	1 week
2015-12-16 23:46:27 +00:00
glebius
63cd1c131a A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().
o With new KPI consumers can request contiguous ranges of pages, and
  unlike before, all pages will be kept busied on return, like it was
  done before with the 'reqpage' only. Now the reqpage goes away. With
  new interface it is easier to implement code protected from race
  conditions.

  Such arrayed requests for now should be preceeded by a call to
  vm_pager_haspage() to make sure that request is possible. This
  could be improved later, making vm_pager_haspage() obsolete.

  Strenghtening the promises on the business of the array of pages
  allows us to remove such hacks as swp_pager_free_nrpage() and
  vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
  values for read ahead and read behind, that a pager may do, if it
  can. These pages are completely owned by pager, and not controlled
  by the caller.

  This shifts the UFS-specific readahead logic from vm_fault.c, which
  should be file system agnostic, into vnode_pager.c. It also removes
  one VOP_BMAP() request per hard fault.

Discussed with:	kib, alc, jeff, scottl
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-12-16 21:30:45 +00:00
asomers
14a36812c1 Change an important error message from ZFS_LOG to printf
Submitted by:	gibbs
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
2015-12-11 00:04:13 +00:00
asomers
0e501d366b During vdev_geom_open, require that the vdev guids match the device's label
except during split, add, or create operations. This fixes a bug where the
wrong disk could be returned, and higher layers of ZFS would immediately
eject it again.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
	o When opening by GUID, require both the pool and vdev GUIDs to
	  match.  While it is highly unlikely for two vdevs to have the same
	  vdev GUIDs, the ZFS storage pool allocator only guarantees they
	  are unique within a pool.

	o Modify the open behavior to:
	  - If we are opening a vdev that hasn't previously been opened,
	    open by path without checking GUIDs.
	  - Otherwise, open by path and verify GUIDs.
	  - If that fails, search all geom providers for a device with
	    matching GUIDs.
	  - If that fails, return ENOENT.

Submitted by:	gibbs, asomers
Reviewed by:	smh
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D4486
2015-12-10 21:46:21 +00:00
markj
8fdcecf370 MFV r289003:
6271 dtrace caused excessive fork time

Author: Bryan Cantrill <bryan@joyent.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Gordon Ross <gwr@nexenta.com>

illumos/illumos-gate@7bd3c1d12d
2015-12-07 21:49:32 +00:00
markj
6744527ba7 Modify DTRACEHIOC_ADDDOF to copy the DOF section from the target process.
r281257 added support for lazyload mode by allowing dtrace(1) to register
a DOF section on behalf of a traced process. This was implemented by
having libdtrace copy the DOF section into a heap-allocated buffer and
passing its address to the ioctl handler. However, DTrace uses the DOF
section address as a lookup key in certain cases, so the ioctl handler
should be given the target process' DOF section address instead. This
change modifies the ADDDOF handler to copy the DOF section in from the
target process, rather than from dtrace(1).
2015-12-07 21:44:05 +00:00
markj
f734f97f4e Add helper functions proc_readmem() and proc_writemem().
These helper functions can be used to read in or write a buffer from or to
an arbitrary process' address space. Without them, this can only be done
using proc_rwmem(), which requires the caller to fill out a uio. This is
onerous and results in code duplication; the new functions provide a simpler
interface which is sufficient for most existing callers of proc_rwmem().

This change also adds a manual page for proc_rwmem() and the new functions.

Reviewed by:	jhb, kib
Differential Revision:	https://reviews.freebsd.org/D4245
2015-12-07 21:33:15 +00:00
andrew
ecf3965dfa Allow the artificial profile frames to be adjusted as needed by the user.
While here update for armv6 to a tested value.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	stat
Differential Revision:	https://reviews.freebsd.org/D4315
2015-12-05 10:00:01 +00:00
andrew
680de71cbf Move the check to see if we are tracing a function with the DTrace Function
Boundary Trace to assembly to reduce the overhead of these checks.

Submitted by:	Howard Su <howard0su@gmail.com>
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D4266
2015-12-05 09:32:36 +00:00
stas
1028b2bacc Make the number of fasttrap probes and the size of the trace points hash table
tunable via sysctl or kernel tunables.

Illumos allows this parameters to be changed via the fasttrap.conf configuration
file, but FreeBSD code hardcoded the parameters.  Expose them under
the kern.dtrace.fasttrap sysctl tree.

MFC after:	2 weeks
2015-12-01 00:24:54 +00:00
markj
965992ab09 Fix a bug in the amd64 dtrace_getarg() implementation: when unwinding the
stack, take into account the copy of rsi pushed between the breakpoint
trapframe and the dtrace_invop frame. Prior to r287644, this was covered
by the fact that sizeof(struct amd64_frame) was 24 rather than 16.

Reported by:	smh
2015-11-19 05:33:15 +00:00
smh
6ee330f608 Switch zfs_panic_recover to panic for bad DVA
As reported by Coverity a null pointer de-reference panic would be triggered
when zfs_recover was set so switch to straight panic as it can never be
recovered.

Reported by: Coverity Scan
MFC after:	1
X-MFC-With:	r290401
Sponsored by:	Multiplay
2015-11-06 20:45:19 +00:00
smh
357c5a9540 Provide information about bad DVA
Provide information about which vdev has an issue with a bad DVA.

MFC after:	1 week
Sponsored by:	Multiplay
2015-11-05 17:12:41 +00:00
smh
eff971aefe Allow zfs_recover to be changed at runtime
MFC after:	1 week
Sponsored by:	Multiplay
2015-11-05 17:00:42 +00:00
andrew
f754112b22 Fix the open solaris atomic functions on arm64. Without this we may use the
wrong value in the comparison, leading to incorrectly setting the new
value.

This has been observed in the ZFS code. Without this we can lose track of
the reference count in a zrlock object.

We should move to use the generic atomic functions, however as this has
been observed I would prefer to have this working, then move to the generic
functions.

PR:		204037
Sponsored by:	ABT Systems Ltd
2015-11-05 16:55:27 +00:00
avg
de9b219b0a zfs: allow the lookup of extended attributes of an unlinked file
That's required for extattr_get_fd(2) and the like to work properly.

PR:		203201
MFC after:	17 days
2015-11-02 10:07:21 +00:00
avg
2b14b99e0a l2arc: do not call trim_map_free() for blocks with zero b_asize
b_asize can be zero if the block is compressed into an empty block
(ZIO_COMPRESS_EMPTY) and the trim code asserts that meaningless
zero-sized trimming is not attempted.
The logic for calling trim_map_free() is extracted into a new function
l2arc_trim() to minimize code duplication.

PR:		203473
Reported by:	Willem Jan Withagen <wjw@digiware.nl>
Tested by:	Willem Jan Withagen <wjw@digiware.nl>
MFC after:	11 days
2015-10-30 12:00:34 +00:00
jhb
9740ac3060 Rename remaining linux32 symbols such as linux_sysent[] and
linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
linux64.ko.  While here, add support for linux64 binaries to systrace.
- Update NOPROTO entries in amd64/linux/syscalls.master to match the
  main table to fix systrace build.
- Add a special case for union l_semun arguments to the systrace
  generation.
- The systrace_linux32 module now only builds the systrace_linux32.ko.
  module on amd64.
- Add a new systrace_linux module that builds on both i386 and amd64.
  For i386 it builds the existing systrace_linux.ko.  For amd64 it
  builds a systrace_linux.ko for 64-bit binaries.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D3954
2015-10-22 21:28:20 +00:00
mav
f204be76de MFV r289561: 6328 Fix cstyle errors in zfs codebase
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>

illumos/illumos-gate@9a686fbc18
2015-10-19 08:25:37 +00:00
mav
1c4271d11b MFV r289526:
5561 support root pools on EFI/GPT partitioned disks
5125 update zpool/libzfs to manage bootable whole disk pools (EFI/GPT labeled disks)

Reviewed by: Jean McCormack <jean.mccormack@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Hans Rosenfeld <hans.rosenfeld@nexenta.com>

illumos/illumos-gate@1a902ef862

This is NOP changes for FreeBSD.
2015-10-18 18:08:33 +00:00
mav
33618c2e43 Fix ZFS ABI compat shims for zfs receive after r289362.
Difference appeared much less drammatic then seemed originally.
2015-10-17 07:32:46 +00:00
mav
d58e8ed932 MFV r289310:
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@45818ee124

This is only a partial merge of respective ZFS infrastructure changes.
At this moment FreeBSD kernel has no those crypto algorithms, so the
parts of the code to enable them are commented out.  When they are
implemented, it will be trivial to plug them in.
2015-10-16 14:45:21 +00:00
mav
c1c5997359 MFV r289312: 2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@9c3fd1216f

For more info, see:
 - slides http://www.slideshare.net/MatthewAhrens/openzfs-send-and-receive
 - video https://www.youtube.com/watch?v=iY44jPMvxog
 - manpage changes (for zfs resume -s and zfs send -t)
 - upcoming talk at the OpenZFS Developer Summit

The TL;DR is:
Use "zfs receive -s" to save the partially received state on failure.
On failure, get the receive token with "zfs get receive_resume_token <fs>"
Resume the send with "zfs send -t <token_value>"

Relnotes:	yes
2015-10-15 08:47:32 +00:00
mav
297adc9551 MFV r289308: 6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Xin LI <delphij@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Justin T. Gibbs <gibbs@FreeBSD.org>

illumos/illumos-gate@d2058105c6
2015-10-14 10:38:05 +00:00
mav
984f7bcd4c MFV r289306: 6295 metaslab_condense's dbgmsg should include vdev id
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@freebsd.org>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Joe Stein <joe.stein@delphix.com>

illumos/illumos-gate@daec38ecb4
2015-10-14 10:31:50 +00:00
mav
89a4cb886e MFV r289304: 6293 ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign()
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@8fe00bfb87
2015-10-14 10:28:29 +00:00
mav
f3d984a1de MFV r289298: 6286 ZFS internal error when set large block on bootfs
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@6de9bb5603
2015-10-14 07:50:08 +00:00
mav
4e77c724b3 MFV r289296: 6288 dmu_buf_will_dirty could be faster
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@0f2e7d03b8
2015-10-14 07:45:44 +00:00
mav
afbdb89044 MFV r289294: 5219 l2arc_write_buffers() may write beyond target_sz
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Saso Kiselkov <skiselkov@gmail.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Justin Gibbs <gibbs@FreeBSD.org>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Andriy Gapon <avg@freebsd.org>

illumos/illumos-gate@d7d9a6d919
2015-10-14 07:37:02 +00:00
mav
71f83ef21c FreeBSD-specific addition to r289191. 2015-10-12 18:15:25 +00:00
mav
997b0e8273 MFV r289188: 6281 prefetching should apply to 1MB reads
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Alexander Motin <mav@freebsd.org>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Author: George Wilson <george.wilson@delphix.com>

illumos/illumos-gate@632802744e
2015-10-12 15:48:45 +00:00
mav
297ae72354 MFV r289187: 6251 add tunable to disable free_bpobj processing
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Albert Lee <trisk@omniti.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: George Wilson <george.wilson@delphix.com>

illumos/illumos-gate@139510fb6e
2015-10-12 15:44:44 +00:00
mav
8a2ea2981d MFV r289185: 6250 zvol_dump_init() can hold txg open
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Albert Lee <trisk@omniti.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: George Wilson <george.wilson@delphix.com>

illumos/illumos-gate@b10bba7246
2015-10-12 15:39:03 +00:00
mav
bdf9b31316 Restore original array_rd_sz semantics.
Before r278702 prefetch was blocked for I/Os > 1MB, after -- >= 1MB.
1MB I/Os are used for bulk operations in CTL (XCOPY, VERIFY), and disabling
prefetch for them reduced the performance.

This is temporary local patch, that should be replaced when upstreamed.

Discussed with:	mahrens
MFC after:	3 days
2015-10-03 11:05:58 +00:00
markj
2571010394 MFV r288408:
6266 harden dtrace_difo_chunksize() with respect to malicious DIF

illumos/illumos-gate@395c7a3dcf

Reviewed by: Alex Wilson <alex.wilson@joyent.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Bryan Cantrill <bryan@joyent.com>

MFC after:	1 week
2015-09-30 05:24:22 +00:00
avg
7524c2cfda sdt: static-ize couple of variables
MFC after:	11 days
2015-09-29 12:14:22 +00:00
avg
89da8be484 sdt module does not seem to actually use any symbol from opensolaris module
MFC after:	11 days
2015-09-29 12:13:31 +00:00
avg
7ead013d52 std: it is important that func name is never an empty string
otherwise DTRACE_ANCHORED() returns false and that makes stack()
insert a bogus frame at the top.
For example:
dtrace -n 'test:dtrace_test::sdttest { stack(); }

This change is not really a solution, but just a work-around.
The real solution is to record the probe's call site and to use
that for resolving a function name.

PR:		195222
MFC after:	22 days
2015-09-29 12:02:23 +00:00
avg
8594b69e81 sdt: start checking version field when parsing probe definitions
This is an extra safety measure.

MFC after:	21 days
2015-09-29 11:58:21 +00:00
avg
4acd1d8740 dtrace_getarg: remove stray return statement on amd64, powerpc
MFC after:	10 days
2015-09-29 11:55:26 +00:00
avg
9e11cda474 define aok in libnvpair which is linked to all zfs libraries that need aok
This removes the circular dependency of libnvpair on libzfs / libzpool.

PR:		199811
Obtained from:	bapt
MFC after:	23 days
2015-09-28 15:25:36 +00:00
delphij
d9ec87a076 MFV r288063: make dataset property de-registration operation O(1)
A change to a property on a dataset must be propagated to its descendants
in case that property is inherited. For datasets whose information is
not currently loaded into memory (e.g. a snapshot that isn't currently
mounted), there is nothing to do; the property change will take effect
the next time that dataset is loaded. To handle updates to datasets that
are in-core, ZFS registers a callback entry for each property of each
loaded dataset with the dsl directory that holds that dataset. There
is a dsl directory associated with each live dataset that references
both the live dataset and any snapshots of the live dataset. A property
change is effected by doing a traversal of the tree of dsl directories
for a pool, starting at the directory sourcing the change, and invoking
these callbacks.

The current implementation both registers and de-registers properties
individually for each loaded dataset. While registration for a property is
O(1) (insert into a list), de-registration is O(n) (search list and then
remove). The 'n' for de-registration, however, is not limited to the size
(number of snapshots + 1) of the dsl directory. The eviction portion
of the life cycle for the in core state of datasets is asynchronous,
which allows multiple copies of the dataset information to be in-core
at once. Only one of these copies is active at any time with the rest
going through tear down processing, but all copies contribute to the
cost of performing a dsl_prop_unregister().

One way to create multiple, in-flight copies of dataset information
is by performing "zfs list" operations from multiple threads
concurrently. In-core dataset information is loaded on demand and then
evicted when reference counts drops to zero. For datasets that are not
mounted, there is no persistent reference count to keep them resident.
So, a list operation will load them, compute the information required to
do the list operation, and then evict them. When performing this operation
from multiple threads it is possible that some of the in-core dataset
information will be reused, but also possible to lose the race and load
the dataset again, even while the same information is being torn down.

Compounding the performance issue further is a change made for illumos
issue 5056 which made dataset eviction single threaded. In environments
using automation to manage ZFS datasets, it is now possible to create
enough of a backlog of dataset evictions to consume excessive amounts
of kernel memory and to bog down the system.

The fix employed here is to make property de-registration O(1). With this
change in place, it is hoped that a single thread is more than sufficient
to handle eviction processing. If it isn't, the problem can be solved
by increasing the number of threads devoted to the eviction taskq.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_prop.h:
    Associate dsl property callback records with both the
    dsl directory and the dsl dataset that is registering the
    callback. Both connections are protected by the dsl directory's
    "dd_lock".

    When linking callbacks into a dsl directory, group them by
    the property type. This helps reduce the space penalty for the
    double association (the property name pointer is stored once
    per dsl_dir instead of in each record) and reduces the number of
    strcmp() calls required to do callback processing when updating
    a single property. Property types are stored in a linked list
    since currently ZFS registers a maximum of 10 property types
    for each dataset.

    Note that the property buckets/records associated with a dsl
    directory are created on demand, but only freed when the dsl
    directory is freed. Given the static nature of property types
    and their small number, there is no benefit to freeing the few
    bytes of memory used to represent the property record earlier.
    When a property record becomes empty, the dsl directory is either
    going to become unreferenced a little later in this thread of
    execution, or there is a high chance that another dataset is
    going to be loaded that would recreate the bucket anyway.

    Replace dsl_prop_unregister() with dsl_prop_unregister_all().
    All callers of dsl_prop_unregister() are trying to remove
    all property registrations for a given dsl dataset anyway. By
    changing the API, we can avoid doing any lookups of callbacks
    by property type and just traverse the list of all callbacks
    for the dataset and free each one.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:
    Replace use of dsl_prop_unregister() with the new
    dsl_prop_unregister_all() API.

illumos/illumos-gate@03bad06fbb
    Author: Justin Gibbs <gibbs@scsiguy.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Approved by: Dan McDonald <danmcd@omniti.com>

Illumos issue:
    6171 dsl_prop_unregister() slows down dataset eviction
    https://www.illumos.org/issues/6171

MFC after:	2 weeks
2015-09-25 01:05:44 +00:00
avg
4d617ecf9d MFV r287817: 6220 memleak in l2arc on debug build
c546f36aa8
https://www.illumos.org/issues/6220
  5408 introduced a memleak in l2arc, namely the member b_thawed gets leaked when
  an arc_hdr is realloced from full to l2only.

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: George Wilson <george@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Arne Jansen <sensille@gmx.net>
2015-09-21 12:23:01 +00:00
delphij
afe48155b2 MFV r287623: 5997 FRU field not set during pool creation and never
updated

ZFS already supports storing the vdev FRU in a vdev property.  There
is code in libzfs to work with this property, and there is code in
the zfs-retire FMA module that looks for that information.  But there
is no code actually setting or updating the FRU.

To address this, ZFS is changed to send a handful of new events
whenever a vdev is added, attached, cleared, or onlined, as well
as when a pool is created or imported.

Note that syseventd is not currently available on FreeBSD and thus
some work is needed to actually support the new ZFS events (e.g. in
zfsd) to actually use this capability, this changeset is mostly a
diff reduction from upstream.

illumos/illumos-gate@1437283407

Illumos issues:

    5997 FRU field not set during pool creation and never updated
    https://www.illumos.org/issues/5997
2015-09-13 07:15:14 +00:00
delphij
3187328530 Note r286552 as merged and reduce diff against upstream. 2015-09-13 06:49:42 +00:00
delphij
0ab19d08a4 MFV r287699: 6214 zpools going south
In r286570 (MFV of r277426) an unprotected write to b_flags to
set the compression mode was introduced.  This would open a race
window where data is partially decompressed, modified, checksummed
and written to the pool, resulting in pool corruption due to the
partial decompression.

Prevent this by reintroducing b_compress

illumos/illumos-gate@d4cd038c92

Illumos issues:

    6214 zpools going south
    https://www.illumos.org/issues/6214
2015-09-12 09:56:23 +00:00
delphij
0150a7214e MFV r287684: 6091 avl_add doesn't assert on non-debug builds
Use assfail() from libuutil instead of ASSERT() in userland
AVL avl_add.

illumos/illumos-gate@faa2b6be2f

Illumos issues:

    6091 avl_add doesn't assert on non-debug builds
    https://www.illumos.org/issues/6091
2015-09-12 08:50:43 +00:00
delphij
d9e5e9837e MFV r287624: 5987 zfs prefetch code needs work
Rewrite the ZFS prefetch code to detect only forward, sequential
streams.

The following kstats have been added:

    kstat.zfs.misc.arcstats.sync_wait_for_async

	How many sync reads have waited for async read
	to complete. (less is better)

    kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch

	How many demand read didn't have to wait for I/O
	because of predictive prefetch.  (more is better)

zfetch kstats have been similified to hits, misses, and max_streams,
with max_streams representing times when we were not able to create
new stream because we already have the maximum number of sequences
for a file.

The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap have been
replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes
to prefetch per stream.

illumos/illumos-gate@cf6106c8a0

Illumos ZFS issues:

    5987 zfs prefetch code needs work
    https://www.illumos.org/issues/5987
2015-09-12 08:35:51 +00:00
markj
b30bc0e211 Remove the arg0 field from struct amd64_frame. Its existence was a bug,
since on amd64 the first argument to a function is generally not on the
stack.

Revert an old DTrace bug fix to some code that assumed that
sizeof(struct amd64_frame) == 16.

Reviewed by:	jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3255
2015-09-11 03:31:22 +00:00
markj
3ff7032ecd MFV r283513:
5930 fasttrap_pid_enable() panics when prfind() fails in forking process
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bryan Cantrill <bryan@joyent.com>

illumos/illumos-gate@9df7e4e12e
2015-09-11 03:06:34 +00:00
markj
399d2d3d73 MFV r283512:
3599 dtrace_dynvar tail calls can blow stack
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bryan Cantrill <bryan@joyent.com>

illumos/illumos-gate@d47448f09a
2015-09-11 03:04:24 +00:00
delphij
db0a2c953d Expose an interface to determine if an ACE is inherited.
Submitted by:	sef
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3540
2015-09-04 00:14:20 +00:00
allanjude
69ba2d5576 Apply the noline attribute to vdev_queue_max_async_writes
This makes it possible to analyze the performance of the new ZFS
write throttle with dtrace

PR:		200316
Submitted by:	Lacey Powers <lacey.leanne@gmail.com>
Reviewed by:	avg, smh, delphij (no objection)
Approved by:	bapt (mentor)
MFC after:	1 month
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3472
2015-08-31 23:10:42 +00:00
delphij
ea2d741487 Fix a buffer overrun which may lead to data corruption, introduced in
r286951 by reinstating changes in r274628.

In l2arc_compress_buf(), we allocate a buffer to stash away the compressed
data in 'cdata', allocated of l2hdr->b_asize bytes.

We then ask zio_compress_data() to compress the buffer, b_l1hdr.b_tmp_cdata,
which is of l2hdr->b_asize bytes, and have the compressed size (or original
size, if compress didn't gain enough) stored in csize.

To pad the buffer to fit the optimal write size, we round up the compressed
size to L2 device's vdev_ashift.

Illumos code rounds up the size by at most SPA_MINBLOCKSIZE.  Because we
know csize <= b_asize, and b_asize is integer multiple of SPA_MINBLOCKSIZE,
we are guaranteed that the rounded up csize would be <= b_asize. However,
this is not necessarily true when we round up to 1 << vdev_ashift, because
it could be larger than SPA_MINBLOCKSIZE.

So, in the worst case scenario, we are overwriting at most

	(1 << vdev_ashift - SPA_MINBLOCKSIZE)

bytes of memory next to the compressed data buffer.

Andriy's original change in r274628 reorganized the code a little bit,
by moving the padding to after we determined that the compression was
beneficial.  At which point, we would check rounded size against the
allocated buffer size, and the buffer overrun would not be possible.
2015-08-29 09:22:32 +00:00
delphij
56f6e9f024 In r286705 (Illumos 5960/a2cdcdd), a separate thread is created with curproc
as parent.  In the case of a send or receive, the curproc would be the
userland application that issues the ioctl.  This would trigger an assertion
failure introduced in Solaris compatibility shims in r196458 when kernel is
compiled with INVARIANTS.

Fix this by using p0 (proc0 or kernel) as the parent thread when creating
the kernel threads.
2015-08-29 08:16:57 +00:00
avg
5a44a9dd1d MFV (partial) r286889: 5692 expose the number of hole blocks in a file
FreeBSD porting notes:
- only kernel-side changes are merged
- the new ioctl is not actually implemented yet
- thus, the goal is to synchronize DMU code

illumos/illumos-gate@2bcf0248e9

https://www.illumos.org/issues/5692
we would like to expose the number of hole (sparse) blocks in a file.
this can be useful to for example if you want to fill in the holes with
some data; knowing the number of holes in advances allows you to report
progress on hole filling. We could use SEEK_HOLE to do that but it would
be O(n) where n is the number of holes present in the file.

Author: Max Grossman <max.grossman@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2015-08-24 09:48:50 +00:00
avg
69063f1a99 spa_import_rootpool: prevent lock and resource leak
The lock leak could lead to a deadlock later.

PR:		198563
Submitted by:	Fabian Keil <fk@fabiankeil.de>
MFC after:	1 week
2015-08-24 08:44:44 +00:00
avg
98538c87be account for ashift when gathering buffers to be written to l2arc device
The change that introduced the L2ARC compression support also introduced
a bug where the on-disk size of the selected buffers could end up larger
than the target size if the ashift is greater than 9.  This was because
the buffer selection could did not take into account the fact that
on-disk size could be larger than the in-memory buffer size due to
the alignment requirements.

At the moment b_asize is a misnomer as it does not always represent the
allocated size: if a buffer is compressed, then the compressed size is
properly rounded (on FreeBSD), but if the compression fails or it is not
applied, then the original size is kept and it could be smaller than what
ashift requires.

For the same reasons arcstat_l2_asize and the reported used space
on the cache device could be smaller than the actual allocated size
if ashift > 9.  That problem is not fixed by this change.

This change only ensures that l2ad_hand is not advanced by more
than target_sz.  Otherwise we would overwrite active (unevicted)
L2ARC buffers.  That problem is manifested as growing l2_cksum_bad
and l2_io_error counters.

This change also changes 'p' prefix to 'a' prefix in a few places
where variables represent allocated rather than physical size.

The resolved problem could also result in the reported allocated size
being greater than the cache device's capacity, because of the
overwritten buffers (more than one buffer claiming the same disk
space).

This change is already in ZFS-on-Linux:
zfsonlinux/zfs@ef56b0780c

PR:		198242
PR:		195746 (possibly related)
Reviewed by:	mahrens (https://reviews.csiden.org/r/229/)
Tested by:	gkontos@aicom.gr (most recently)
MFC after:	15 days
X-MFC note:	patch does not apply as is at the moment
Relnotes:	yes
Sponsored by:	ClusterHQ
Differential Revision:	https://reviews.freebsd.org/D2764
Reviewed by:	noone (@FreeBSD.org)
2015-08-24 08:10:52 +00:00
avg
ae89183402 try to fix lor between z_teardown_lock and spa_namespace_lock
The lock order reversal and a resulting deadlock were introduced
in r285021 / D2865.  The problem is that zfs_register_callbacks() calls
dsl_prop_get_integer() that has to acquire spa_namespace_lock.
At the same time, spa_config_sync() is called with spa_namespace_lock
held and then it performs ZFS vnode operations that acquire
z_teardown_lock in the reader mode.

So, fix the problem by using dsl_prop_get_int_ds() instead of
dsl_prop_get_integer().  The former does not need to look up
the pool and the dataset by name.

Reported by:	many
Reviewed by:	delphij
Tested by:	delphij, Jens Schweikhardt <schweikh@schweikhardt.net>
MFC after:	5 days
X-MFC with:	r285021
2015-08-21 08:17:44 +00:00
avg
0bba4c28ea fix a mismerge in r286539 (MFV 286538: 5562 ZFS sa_handle's violate...)
PR:		202358
X-MFC with:	r286539
X-MFC attn:	mav
2015-08-21 08:04:56 +00:00
mav
cd9f090632 Restore part of r274628, reverted at r286776.
Submitted by:	avg
2015-08-20 07:41:33 +00:00
oshogbo
fe707ca6b5 Add support for the arrays in nvlist library.
- Add
  nvlist_{add,get,take,move,exists,free}_{number,bool,string,nvlist,
  descriptor} functions.
- Add support for (un)packing arrays.
- Add the nvl_array_next field to the nvlist structure.
  If an array is added by the nvlist_{move,add}_nvlist_array function
  this field will contains next element in the array.
- Add the nitems field to the nvpair and nvpair_header structure.
  This field contains number of elements in the array.
- Add special flag (NV_FLAG_IN_ARRAY) which is set if nvlist is a part of
  an array.
- Add special type (NV_TYPE_NVLIST_ARRAY_NEXT).This type is used only
  on packing/unpacking.
- Add new API for traversing arrays (nvlist_get_array_next).
- Add the nvlist_get_pararr function which combines the
  nvlist_get_array_next and nvlist_get_parent functions. If nvlist is in
  the array it will return next element from array. If nvlist is last
  element in array or it isn't in array it will return his
  container (parent). This function should simplify traveling over nvlist.
- Add tests for new features.
- Add documentation for new functions.
- Add my copyright.
- Regenerate the sys/cddl/compat/opensolaris/sys/nvpair.h file.

PR:		191083
Reviewed by:	allanjude (doc)
Approved by:	pjd (mentor)
2015-08-15 06:34:49 +00:00
mav
3372239bab Remove some random accumulated diff from Illumos.
Submitted by:	avg (partially)
2015-08-14 13:43:12 +00:00
mav
2dd39755dd 2618 arc.c mistypes in the comments
Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bart Coddens <bart.coddens@gmail.com>

illumos/illumos-gate@fc98fea58e
2015-08-14 13:10:30 +00:00
mav
42db8f8b53 Fix r286766 build with debug. 2015-08-14 11:47:53 +00:00
mav
5126ac33ef Fix minor mismerge sometimes earlier. 2015-08-14 09:48:23 +00:00
mav
3cc7e59c98 MFV r286765: 5817 change type of arcs_size from uint64_t to refcount_t
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@2fd872a734

As a way to make it more difficult to introduce bugs into the ARC, and to
make it easier to diagnose issues when bugs do creep in, it would be
beneficial to change the type of the arc_state_t's arcs_size field to be
a refcount_t instead of a uint64_t. This would allow us to make stricter
checks when incrementing and decrementing the value with debugging enabled,
but still fallback to simple, fast atomic operations when debugging is
disabled.
2015-08-14 09:39:23 +00:00
mav
14f3f34528 MFV r285025: 6033 arc_adjust() should search MFU lists for oldest buffer
when adjusting MFU size.

illumos/illumos-gate@31c46cf23c

https://www.illumos.org/issues/6033
  When we're looking for the list containing oldest buffer we never
  actually look at the MFU lists even when we try to evict from MFU.
  looks like a copy paste error, the fix is here:

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Xin Li <delphij@delphij.net>
Reviewed by: Prakash Surya <me@prakashsurya.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alek Pinchuk <alek@nexenta.com>
Obtained from:  illumos
2015-08-14 09:33:46 +00:00
mav
71fb6300f4 MFV r277431: 5497 lock contention on arcs_mtx
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@244781f10d

This patch attempts to reduce lock contention on the current arc_state_t
mutexes. These mutexes are used liberally to protect the number of LRU
lists within the ARC (e.g. ARC_mru, ARC_mfu, etc). The granularity at
which these locks are acquired has been shown to greatly affect the
performance of highly concurrent, cached workloads.
2015-08-14 09:31:07 +00:00
mav
6c172b2474 Revert part of r205231, introducing multiple ARC state locks.
This local implementation will be replaced by one from Illumos to reduce
code divergence and make further merges easier.
2015-08-14 09:25:54 +00:00
mav
f5e5570f14 MFV 286711: 6096 ZFS_SMB_ACL_RENAME needs to cleanup better
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>

illumos/illumos-gate@8f5190a540
2015-08-13 00:13:55 +00:00
mav
666c614c03 MFV 286709:
6093 zfsctl_shares_lookup should only VN_RELE() on zfs_zget() success

Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Dan McDonald <danmcd@omniti.com>

illumos/illumos-gate@0f92170f1e
2015-08-13 00:10:36 +00:00
mav
95e93478d4 MFV 286707: 5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@ca0cc3918a

A ZFS feature flags (large blocks) tracks its refcounts as the number of
datasets that have ever used the feature. Several features of this type
are planned to be added (new checksum functions). This code should be made
common infrastructure rather than duplicating the code for each feature.
2015-08-12 23:59:17 +00:00
mav
99cadf9eed MFV r286704: 5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>

While running 'zfs recv' we noticed that every 128th 8K block required a
read. We were seeing that restore_write() was calling dmu_tx_hold_write()
and the indirect block was not cached. We should prefetch upcoming indirect
blocks to avoid having to go to disk and blocking the restore_write().

Allow an incremental send stream to be received as a clone, even if the
stream does not mark it as a clone.
2015-08-12 22:41:06 +00:00
mav
a874d2c180 MFV r284763: 5981 Deadlock in dmu_objset_find_dp
illumos/illumos-gate@1d3f896f54

https://www.illumos.org/issues/5981
  When dmu_objset_find_dp gets called with a read lock held, it fans out
  the work to the task queue. Each task in turn acquires its own read
  lock before calling the callback. If during this process anyone tries
  to a acquire a write lock, it will stall all read lock requests.Thus
  the tasks will never finish, the read lock of the caller will never
  get freed and the write lock never acquired.  deadlock.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Arne Jansen <jansen@webgods.de>
2015-08-12 19:10:29 +00:00
mav
584d81138a MFV r284762: 5269 zpool import slow
illumos/illumos-gate@12380e1e70

https://www.illumos.org/issues/5269
  When importing a pool (at boot or with zpool import) with many
  filesystem, the process can take minutes. It doesn't matter whether
  the pool has been exported cleanly or uncleanly.  The problem is that
  each dataset has its own log chain. On import, all datasets have to be
  checked if there are logs to replay.  The idea is to speed up this
  process by paralellizing it.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Arne Jansen <jansen@webgods.de>
2015-08-12 18:47:30 +00:00
mav
53d48e6cc1 MFV r286682: 5765 add support for estimating send stream size with
lzc_send_space when source is a bookmark

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@nexenta.com>
Author: Max Grossman <max.grossman@delphix.com>

illumos/illumos-gate@643da460c8
2015-08-12 18:23:08 +00:00
mav
e8e83ea04f MFV r286224: 5695 dmu_sync'ed holes do not retain birth time
illumos/illumos-gate@70163ac57e

https://www.illumos.org/issues/5695
  In dmu_sync_ready(), a hole block pointer will have it's logical size
  explicitly set as it's necessary for replay purposes. To "undo" this,
  dmu_sync_done() will zero out any hole that it finds. This becomes a
  problem when using the "hole_birth" feature, as this will also wipe out
  any birth time that might have happened to be set on the hole.
  ...
  As a fix, the logic to zero out a hole is only applied to old style
  holes with a birth time of zero. Holes created with the "hole_birth"
  feature enabled will have a non-zero birth time, and will be skipped
  (thus preserving the ltime, type, and level information as well).
  In addition, zdb was updated to also print the ltime, type, and level
  information for these new style holes. Previously, only the logical
  birth time would be printed.

Author: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
2015-08-12 17:21:41 +00:00
mav
716a595284 Fix set of sign extension bugs in r286625. 2015-08-12 08:36:58 +00:00
mav
aa6e077870 Fix assertion panic caused by combination of r286598 and TRIM. 2015-08-11 19:15:55 +00:00
mav
dda8fad434 Fix r286625 build on i386. 2015-08-11 12:38:01 +00:00
mav
df9bb0915e Fix minor mismerge in r286574. 2015-08-11 12:22:16 +00:00
mav
78648874e5 MFV r277425:
5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@2ec99e3e98
2015-08-11 10:39:19 +00:00
mav
84344d1daa Remove extra lock, that IMO only creates potential problems now. 2015-08-11 09:18:51 +00:00
mav
5feb31de1d MFV 286604: 5812 assertion failed in zrl_tryenter(): zr_owner==NULL
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@8df173054c
2015-08-10 21:36:51 +00:00
mav
7473557e25 MFV 286602: 5810 zdb should print details of bpobj
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@732885fca0
2015-08-10 21:32:40 +00:00
mav
f2c3d6c63d MFV 286599: 5808 spa_check_logs is not necessary on readonly pools
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@23367a2f2c
2015-08-10 21:19:42 +00:00
mav
85b64b29c8 MFV 286597: 5701 zpool list reports incorrect "alloc" value for cache devices
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@a52fc310ba
2015-08-10 21:13:59 +00:00