Commit Graph

548 Commits

Author SHA1 Message Date
Wolfram Schneider
2b299dcf0e add forgotten opening bracket "("
PR:		237514
Reviewed by:	allanjude
MFC after:	soon for 11.3 and 12 series
Differential Revision:	https://reviews.freebsd.org/D21009
2019-07-31 21:21:34 +00:00
Allan Jude
92e0d7f840 zpool.8: the comment property is not read-only
The comment property was listed in the man page twice, once under the list
of read-only properties, and again (correctly), under the list of user
editable properties.

PR:		238355
Reported by:	Michael Zuo <muh.muhten@gmail.com>
Sponsored by:	Klara Systems
2019-06-06 01:32:00 +00:00
Mariusz Zaborski
75ed05ef7d DTrace: create an amd64 test suit
Create two tests checking if we can read urgs registers and if the
rax register returns a correct number.

Reviewed by:	markj
Discussed with:	lwhsu
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20364
2019-06-05 22:32:26 +00:00
Alexander Motin
9b048dd219 MFV r348583: 9847 leaking dd_clones (DMU_OT_DSL_CLONES) objects
illumos/illumos-gate@17fb938fd6

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author:     Matthew Ahrens <mahrens@delphix.com>
2019-06-03 20:49:20 +00:00
Alexander Motin
07a5c938c9 MFV r348578: 9962 zil_commit should omit cache thrash
illumos/illumos-gate@cab3a55e15

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author:     Prakash Surya <prakash.surya@delphix.com>
2019-06-03 20:24:40 +00:00
Alexander Motin
ce88141b27 MFV r348568: 9466 add JSON output support to channel programs
illumos/illumos-gate@5267591016

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     Alek Pinchuk <apinchuk@datto.com>
2019-06-03 19:15:06 +00:00
Alexander Motin
677ef2563d MFV r348553: 9681 ztest failure in spa_history_log_internal due to spa_rename()
illumos/illumos-gate@6aee0ad769

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Matthew Ahrens <mahrens@delphix.com>
2019-06-03 18:32:56 +00:00
Alexander Motin
74f7070445 MFV r348552: 9682 page fault in dsl_async_clone_destroy() while opening pool
illumos/illumos-gate@ade2c82828

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Serapheim Dimitropoulos <serapheim@delphix.com>
2019-06-03 17:56:44 +00:00
Allan Jude
19a9d4fa28 MFV/ZoL: zfs userspace ignored all unresolved UIDs after the first
zfsonlinux/zfs@88cfff1824

zfs_main: fix `zfs userspace` squashing unresolved entries

The `zfs userspace` squashes all entries with unresolved numeric
values into a single output entry due to the comparsion always
made by the string name which is empty in case of unresolved IDs.

Fix this by falling to a numerical comparison when either one
of string values is not found. This then compares any numerical
values after all with a name resolved.

Signed-off-by: Pavel Boldin <boldin.pavel@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Reported by:	clusteradm
Obtained from:	ZFS-on-Linux
MFC after:	3 days
2019-05-18 12:27:22 +00:00
Alexander Motin
d044b69950 Fix dataset name comparison in zfs_compare().
The code never returned match comparing two datasets (not snapshots).
As result, uu_avl_find(), called from zfs_callback(), never succeeded,
allowing to add same dataset into the list multiple times, for example:

	# zfs get name pers pers pers@z pers@z
	NAME    PROPERTY  VALUE   SOURCE
	pers    name      pers    -
	pers    name      pers    -
	pers@z  name      pers@z  -

With the patch:

	# zfs get name pers pers pers@z pers@z
	NAME    PROPERTY  VALUE   SOURCE
	pers    name      pers    -
	pers@z  name      pers@z  -

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-05-08 01:35:43 +00:00
Li-Wen Hsu
132af14fde Add a trailing empty line to match the test code output
This is added for letting these long failing test case pass, and for
consistency.  The test code should be fixed later to not output this extra
empty line.

Sponsored by:	The FreeBSD Foundation
2019-04-29 03:50:21 +00:00
Michael Tuexen
6eb0062dd0 Some test scripts use ncat --sctp --listen port to run an SCTP discard
server in the background. However, when running in the background,
stdin is closed and ncat initiates a graceful shutdown of the SCTP
association. This is not expected by the client. Therefore, the
ncat-based discard server is replaced by a perl-based one.

In addition, to remove the dependency from ncat, which needs to be
installed via the nmap port, also the code testing for a free SCTP port
is changed to use the perl-based client.

Finally, remove some debug output from the report generated.

Reviewed by:		lwhsu@
Differential Revision:	https://reviews.freebsd.org/D20086
2019-04-28 19:07:31 +00:00
Mark Johnston
5ee81b26a8 Ensure that we use a 64-bit value for the last mmap() argument.
When using __syscall(2), the offset argument is passed on the stack on
amd64.  Previously only 32 bits were written, so the upper 32 bits were
garbage and could cause the test to fail.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-03-20 23:35:15 +00:00
Baptiste Daroussin
e7be6f4ed4 Fix a regression introduced in r344569
Import a fix from illumos (thanks Toomas Soomas for pointing at it)

See https://www.illumos.org/issues/10205 for more details
Illumos commit: 247b7da039

Submitted by:	jack@gandi.net
Reported by:	cy
Reviewed by:	tsoome, cy, bapt
Obtained from:	Illumos
2019-02-27 13:49:41 +00:00
Sean Eric Fagan
50792eb553 Set process title during zfs send.
This adds a '-V' option to 'zfs send', which sets the process title once a
second to the progress information.

This code has been in FreeNAS for a long time now; this is just upstreaming
it here.  It was originially written by delphij.

Reviewed by:	mav
Obtained from:	iXsystems, Inc
Sponsored by:	iXsystems, Inc
Differential Revision:	https://reviews.freebsd.org/D19184
2019-02-26 19:23:22 +00:00
Baptiste Daroussin
0b858c82d8 Implement parallel mounting for ZFS filesystem
It was first implemented on Illumos and then ported to ZoL.
This patch is a port to FreeBSD of the ZoL version.
This patch also includes a fix for a race condition that was amended

With such patch Delphix has seen a huge decrease in latency of the mount phase
(https://github.com/openzfs/openzfs/commit/a3f0e2b569 for details).
With that current change Gandi has measured improvments that are on par with
those reported by Delphix.

Zol commits incorporated:
a10d50f999
e63ac16d25

Reviewed by:	avg, sef
Approved by:	avg, sef
Obtained from:	ZoL
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Gandi.net
Differential Revision:	https://reviews.freebsd.org/D19098
2019-02-26 08:18:34 +00:00
Andriy Gapon
885b0f9e91 zpool.8: sort zpool status flags in the same order as in illumos manual
Just in case, while I was here.

MFC after:	1 week
2019-02-20 13:37:27 +00:00
Andriy Gapon
d97ff345cf zpool.8: document -D flag for zpool status
The description is taken from the illumos manual.

Reported by:	stilezy@gmail.com
MFC after:	1 week
2019-02-20 13:34:16 +00:00
Andriy Gapon
4c325393f3 MFV r342532: 5882 Temporary pool names
Note that this commit brings only formatting changes that were done
during the final review of the illumos change, because FreeBSD got the
main changes before illumos.

illumos/illumos-gate@04e5635652
04e5635652

https://www.illumos.org/issues/5882
  This is an import of the temporary pool names functionality from ZoL:
  e2282ef57e
  26b42f3f9d
  2f3ec90061
  00d2a8c92f
  83e9986f6e
  023bbe6f01
  It is intended to assist the creation and management of virtual machines
  that have their rootfs on ZFS on hosts that also have their rootfs on
  ZFS. These situations cause SPA namespace collisions when the standard
  name rpool is used in both cases. The solution is either to give each
  guest pool a name unique to the host, which is not always desireable, or
  boot a VM environment containing an ISO image to install it, which is
  cumbersome.

MFC after:	1 week
Sponsored by:	Panzura
2018-12-26 11:03:14 +00:00
Yuri Pankov
65c9ed85e4 dtrace(1): remove reference to dtruss that was removed from base
system in r300226.

PR:		211618
Reviewed by:	gnn, markj, 0mp
Approved by:	kib (mentor, implicit)
Differential Revision:	https://reviews.freebsd.org/D17762
2018-10-31 15:29:26 +00:00
Michael Tuexen
1e88cc8b59 Add support for send, receive and state-change DTrace providers for
SCTP. They are based on what is specified in the Solaris DTrace manual
for Solaris 11.4.

Reviewed by:		0mp, dteske, markj
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D16839
2018-08-22 21:23:32 +00:00
Mark Johnston
f0af0b312f Add partial documentation for dtrace(1)'s -x configuration options.
Some options are still missing descriptions, but they can be filled in
over time.

Submitted by:	raichoo <raichoo@googlemail.com>
Reviewed by:	0mp (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16671
2018-08-16 19:28:44 +00:00
Matt Macy
cc0fbbb92e MFV/ZoL: Implement large_dnode pool feature
commit 50c957f702
Author: Ned Bass <bass6@llnl.gov>
Date:   Wed Mar 16 18:25:34 2016 -0700

    Implement large_dnode pool feature

    Justification
    -------------

    This feature adds support for variable length dnodes. Our motivation is
    to eliminate the overhead associated with using spill blocks.  Spill
    blocks are used to store system attribute data (i.e. file metadata) that
    does not fit in the dnode's bonus buffer. By allowing a larger bonus
    buffer area the use of a spill block can be avoided.  Spill blocks
    potentially incur an additional read I/O for every dnode in a dnode
    block. As a worst case example, reading 32 dnodes from a 16k dnode block
    and all of the spill blocks could issue 33 separate reads. Now suppose
    those dnodes have size 1024 and therefore don't need spill blocks.  Then
    the worst case number of blocks read is reduced to from 33 to two--one
    per dnode block. In practice spill blocks may tend to be co-located on
    disk with the dnode blocks so the reduction in I/O would not be this
    drastic. In a badly fragmented pool, however, the improvement could be
    significant.

    ZFS-on-Linux systems that make heavy use of extended attributes would
    benefit from this feature. In particular, ZFS-on-Linux supports the
    xattr=sa dataset property which allows file extended attribute data
    to be stored in the dnode bonus buffer as an alternative to the
    traditional directory-based format. Workloads such as SELinux and the
    Lustre distributed filesystem often store enough xattr data to force
    spill bocks when xattr=sa is in effect. Large dnodes may therefore
    provide a performance benefit to such systems.

    Other use cases that may benefit from this feature include files with
    large ACLs and symbolic links with long target names. Furthermore,
    this feature may be desirable on other platforms in case future
    applications or features are developed that could make use of a
    larger bonus buffer area.

    Implementation
    --------------

    The size of a dnode may be a multiple of 512 bytes up to the size of
    a dnode block (currently 16384 bytes). A dn_extra_slots field was
    added to the current on-disk dnode_phys_t structure to describe the
    size of the physical dnode on disk. The 8 bits for this field were
    taken from the zero filled dn_pad2 field. The field represents how
    many "extra" dnode_phys_t slots a dnode consumes in its dnode block.
    This convention results in a value of 0 for 512 byte dnodes which
    preserves on-disk format compatibility with older software.

    Similarly, the in-memory dnode_t structure has a new dn_num_slots field
    to represent the total number of dnode_phys_t slots consumed on disk.
    Thus dn->dn_num_slots is 1 greater than the corresponding
    dnp->dn_extra_slots. This difference in convention was adopted
    because, unlike on-disk structures, backward compatibility is not a
    concern for in-memory objects, so we used a more natural way to
    represent size for a dnode_t.

    The default size for newly created dnodes is determined by the value of
    a new "dnodesize" dataset property. By default the property is set to
    "legacy" which is compatible with older software. Setting the property
    to "auto" will allow the filesystem to choose the most suitable dnode
    size. Currently this just sets the default dnode size to 1k, but future
    code improvements could dynamically choose a size based on observed
    workload patterns. Dnodes of varying sizes can coexist within the same
    dataset and even within the same dnode block. For example, to enable
    automatically-sized dnodes, run

     # zfs set dnodesize=auto tank/fish

    The user can also specify literal values for the dnodesize property.
    These are currently limited to powers of two from 1k to 16k. The
    power-of-2 limitation is only for simplicity of the user interface.
    Internally the implementation can handle any multiple of 512 up to 16k,
    and consumers of the DMU API can specify any legal dnode value.

    The size of a new dnode is determined at object allocation time and
    stored as a new field in the znode in-memory structure. New DMU
    interfaces are added to allow the consumer to specify the dnode size
    that a newly allocated object should use. Existing interfaces are
    unchanged to avoid having to update every call site and to preserve
    compatibility with external consumers such as Lustre. The new
    interfaces names are given below. The versions of these functions that
    don't take a dnodesize parameter now just call the _dnsize() versions
    with a dnodesize of 0, which means use the legacy dnode size.

    New DMU interfaces:
      dmu_object_alloc_dnsize()
      dmu_object_claim_dnsize()
      dmu_object_reclaim_dnsize()

    New ZAP interfaces:
      zap_create_dnsize()
      zap_create_norm_dnsize()
      zap_create_flags_dnsize()
      zap_create_claim_norm_dnsize()
      zap_create_link_dnsize()

    The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The
    spa_maxdnodesize() function should be used to determine the maximum
    bonus length for a pool.

    These are a few noteworthy changes to key functions:

    * The prototype for dnode_hold_impl() now takes a "slots" parameter.
      When the DNODE_MUST_BE_FREE flag is set, this parameter is used to
      ensure the hole at the specified object offset is large enough to
      hold the dnode being created. The slots parameter is also used
      to ensure a dnode does not span multiple dnode blocks. In both of
      these cases, if a failure occurs, ENOSPC is returned. Keep in mind,
      these failure cases are only possible when using DNODE_MUST_BE_FREE.

      If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0.
      dnode_hold_impl() will check if the requested dnode is already
      consumed as an extra dnode slot by an large dnode, in which case
      it returns ENOENT.

    * The function dmu_object_alloc() advances to the next dnode block
      if dnode_hold_impl() returns an error for a requested object.
      This is because the beginning of the next dnode block is the only
      location it can safely assume to either be a hole or a valid
      starting point for a dnode.

    * dnode_next_offset_level() and other functions that iterate
      through dnode blocks may no longer use a simple array indexing
      scheme. These now use the current dnode's dn_num_slots field to
      advance to the next dnode in the block. This is to ensure we
      properly skip the current dnode's bonus area and don't interpret it
      as a valid dnode.

    zdb
    ---
    The zdb command was updated to display a dnode's size under the
    "dnsize" column when the object is dumped.

    For ZIL create log records, zdb will now display the slot count for
    the object.

    ztest
    -----
    Ztest chooses a random dnodesize for every newly created object. The
    random distribution is more heavily weighted toward small dnodes to
    better simulate real-world datasets.

    Unused bonus buffer space is filled with non-zero values computed from
    the object number, dataset id, offset, and generation number.  This
    helps ensure that the dnode traversal code properly skips the interior
    regions of large dnodes, and that these interior regions are not
    overwritten by data belonging to other dnodes. A new test visits each
    object in a dataset. It verifies that the actual dnode size matches what
    was stored in the ztest block tag when it was created. It also verifies
    that the unused bonus buffer space is filled with the expected data
    patterns.

    ZFS Test Suite
    --------------
    Added six new large dnode-specific tests, and integrated the dnodesize
    property into existing tests for zfs allow and send/recv.

    Send/Receive
    ------------
    ZFS send streams for datasets containing large dnodes cannot be received
    on pools that don't support the large_dnode feature. A send stream with
    large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be
    unrecognized by an incompatible receiving pool so that the zfs receive
    will fail gracefully.

    While not implemented here, it may be possible to generate a
    backward-compatible send stream from a dataset containing large
    dnodes. The implementation may be tricky, however, because the send
    object record for a large dnode would need to be resized to a 512
    byte dnode, possibly kicking in a spill block in the process. This
    means we would need to construct a new SA layout and possibly
    register it in the SA layout object. The SA layout is normally just
    sent as an ordinary object record. But if we are constructing new
    layouts while generating the send stream we'd have to build the SA
    layout object dynamically and send it at the end of the stream.

    For sending and receiving between pools that do support large dnodes,
    the drr_object send record type is extended with a new field to store
    the dnode slot count. This field was repurposed from unused padding
    in the structure.

    ZIL Replay
    ----------
    The dnode slot count is stored in the uppermost 8 bits of the lr_foid
    field. The bits were unused as the object id is currently capped at
    48 bits.

    Resizing Dnodes
    ---------------
    It should be possible to resize a dnode when it is dirtied if the
    current dnodesize dataset property differs from the dnode's size, but
    this functionality is not currently implemented. Clearly a dnode can
    only grow if there are sufficient contiguous unused slots in the
    dnode block, but it should always be possible to shrink a dnode.
    Growing dnodes may be useful to reduce fragmentation in a pool with
    many spill blocks in use. Shrinking dnodes may be useful to allow
    sending a dataset to a pool that doesn't support the large_dnode
    feature.

    Feature Reference Counting
    --------------------------
    The reference count for the large_dnode pool feature tracks the
    number of datasets that have ever contained a dnode of size larger
    than 512 bytes. The first time a large dnode is created in a dataset
    the dataset is converted to an extensible dataset. This is a one-way
    operation and the only way to decrement the feature count is to
    destroy the dataset, even if the dataset no longer contains any large
    dnodes. The complexity of reference counting on a per-dnode basis was
    too high, so we chose to track it on a per-dataset basis similarly to
    the large_block feature.

    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #3542
2018-08-12 00:45:53 +00:00
Alexander Leidinger
a079a34fd5 Extend the info about the limitations of datasets in jails.
Reviewed by:	allanjude
Sponsored by:	Essen Hackathon
2018-08-11 20:49:19 +00:00
Alexander Motin
0285589b38 MFV 337214:
9621 Make createtxg and guid properties public

illumos/illumos-gate@e8d4a73c86

Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Josh Paetzel <josh@tcbug.org>
2018-08-03 00:24:27 +00:00
Alexander Motin
2bce9a5316 MFV r337182: 9330 stack overflow when creating a deeply nested dataset
Datasets that are deeply nested (~100 levels) are impractical. We just put
a limit of 50 levels to newly created datasets. Existing datasets should
work without a problem.

illumos/illumos-gate@5ac95da7d6

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author:     Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
2018-08-02 21:19:35 +00:00
Alexander Motin
ac879e61ad 9523 Large alloc in zdb can cause trouble
16MB alloc in zdb_embedded_block() can cause cores in certain situations
(clang, gcc55).

OsX commit: ced236a5da
FreeBSD commit: https://svnweb.freebsd.org/base?view=revision&revision=326150
illumos/illumos-gate@03a4c2f4bf

Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     Jorgen Lundman <lundman@lundman.net>

This is an update for r326150 (by avg), where this change comes from.
2018-08-02 20:44:07 +00:00
Alexander Motin
c423a5e6b8 MFV r337161: 9512 zfs remap poolname@snapname coredumps
Only filesystems and volumes are valid "zfs remap" parameters: when passed
a snapshot name zfs_remap_indirects() does not handle the EINVAL returned
from libzfs_core, which results in failing an assertion and consequently
crashing.

illumos/illumos-gate@0b2e825398

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author:     loli10K <ezomori.nozomu@gmail.com>
2018-08-02 19:13:45 +00:00
Alexander Motin
b59e9cd1c0 MFV r316926:
7955 libshare needs to initialize only those datasets being modified by the consumer

illumos/illumos-gate@8a981c3356
8a981c3356

https://www.illumos.org/issues/7955
  Libshare currently initializes all available filesystems when doing any
  libshare operation. This requires iterating through all the filesystem
  multiple times, which is a huge performance problem for sharing and
  unsharing operations.

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Daniel Hoffman <dj.hoffman@delphix.com>

For FreeBSD this is practically a NOP, just a diff reduction.
2018-08-01 21:51:49 +00:00
Michael Tuexen
7bda966394 Add a dtrace provider for UDP-Lite.
The dtrace provider for UDP-Lite is modeled after the UDP provider.
This fixes the bug that UDP-Lite packets were triggering the UDP
provider.
Thanks to dteske@ for providing the dwatch module.

Reviewed by:		dteske@, markj@, rrs@
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D16377
2018-07-31 22:56:03 +00:00
Alexander Motin
200c27a75d MFV r337014:
9421 zdb should detect and print out the number of "leaked" objects
9422 zfs diff and zdb should explicitly mark objects that are on the deleted queue

illumos/illumos-gate@20b5dafb42

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author:     Paul Dagnelie <pcd@delphix.com>
2018-07-31 22:50:50 +00:00
Alexander Motin
0021e1c10c MFV r336991, r337001:
9102 zfs should be able to initialize storage devices

The first access to a disk block can incur a performance penalty on some
platforms (e.g. AWS's EBS, VMware VMDKs). Therefore it is recommended that
volumes be "thick provisioned", where supported by the platform (VMware).
Thick provisioning is time consuming and often is ignored. If the thick
provision step is omitted, customers will see suboptimal performance until
we have written to all parts of the LUN. ZFS should be able to initialize
any unused storage to remove any first-write penalty that exists.

illumos/illumos-gate@094e47e980

Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author:     George Wilson <george.wilson@delphix.com>
2018-07-31 21:06:04 +00:00
Alexander Motin
d1cf4052d0 MFV r336955: 9236 nuke spa_dbgmsg
We should use zfs_dbgmsg instead of spa_dbgmsg.  Or at least,
metaslab_condense() should call zfs_dbgmsg because it's important and rare
enough to always log. It's possible that the message in zio_dva_allocate()
would be too high-frequency for zfs_dbgmsg.

illumos/illumos-gate@21f7c81cc1

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:47:27 +00:00
Alexander Motin
194000fa21 MFV r336950: 9290 device removal reduces redundancy of mirrors
Mirrors are supposed to provide redundancy in the face of whole-disk failure
and silent damage (e.g. some data on disk is not right, but ZFS hasn't
detected the whole device as being broken). However, the current device
removal implementation bypasses some of the mirror's redundancy.

illumos/illumos-gate@3a4b1be953

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:25:39 +00:00
Alexander Motin
6413a6d31f MFV r336946: 9238 ZFS Spacemap Encoding V2
The current space map encoding has the following disadvantages:
[1] Assuming 512 sector size each entry can represent at most 16MB for a segment.
This makes the encoding very inefficient for large regions of space.
[2] As vdev-wide space maps have started to be used by new features (i.e.
device removal, zpool checkpoint) we've started imposing limits in the
vdevs that can be used with them based on the maximum addressable offset
(currently 64PB for a top-level vdev).

The new remains backwards compatible with the old one. The introduced
two-word entry format, besides extending the limits imposed by the single-entry
layout, also includes a vdev field and some extra padding after its prefix.

The extra padding after the prefix should is reserved for future usage (e.g.
new prefixes for future encodings or new fields for flags). The new vdev field
not only makes the space maps more self-descriptive, but also opens the doors
for pool-wide space maps.

One final important note is that the number of bits used for vdevs is reduced
to 24 bits for blkptrs. That was decided as we don't know of any setups that
use more than 16M vdevs for the time being and
we wanted to fit the vdev field in the space map. In addition that gives us
some extra bits in dva_t.

illumos/illumos-gate@17f11284b4

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-07-30 23:47:38 +00:00
Alexander Motin
1960706625 MFV r336944: 9286 want refreservation=auto
When a ZFS volume is created with zfs create -V (but without -s), the
refreservation property is set to a value that is volsize plus the maximum
size of metadata. If refreservation is ever set to another value, it is
impossible to set it back to the automatically determined value. There are
other cases where refreservation may be wrong. These include receiving a
volume that was sent without properties and zfs clone.

We need:

zfs set refreservation=auto <volume>
zfs clone -o refreservation=auto <volume>

Each one would use the same function used by zfs create -V to determine the
proper value for refreservation.

illumos/illumos-gate@1c10ae76c0

Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Mike Gerdts <mike.gerdts@joyent.com>
2018-07-30 22:39:30 +00:00
Michael Tuexen
53e0911116 Improve TCP related tests for dtrace.
Ensure that the TCP connections are terminated gracefully as expected
by the test. Use appropriate numbers for sent/received packets.
In addition, enable tst.localtcpstate.ksh, which should pass, but
doesn't until https://reviews.freebsd.org/D16369 is committed.

Reviewed by:		markj@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16288
2018-07-22 10:50:59 +00:00
Michael Tuexen
be029a4979 Test that the dtrace UDP receive probe fires.
This test ensures that the fix committed in
https://svnweb.freebsd.org/changeset/base/336551
actually works.

Reviewed by:		dteske@, markj@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16046
2018-07-20 15:37:29 +00:00
Michael Tuexen
10b803a40d Adjust comment to reality since r286171.
Sponsored by:		Netflix, Inc.
2018-07-15 20:42:47 +00:00
Michael Tuexen
e0f9b8233f Don't require a local sshd for the local TCP state dtrace test
This change is similar to the one done in r286171 for
tst.ipv4localtcp.ksh. This not only reduces the requirements on the
system used for testing but results also in a graceful teardown of
the TCP connection.

Reviewed by:		gnn@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16276
2018-07-15 20:41:16 +00:00
Michael Tuexen
dc9f20b3f3 Fix the UDP tests for dtrace.
The code imported from opensolaris was depending on ping supporting
UDP for sending probes. Since this is not supported by ping on FreeBSD
use a perl script instead.
The remote test requires the usage of ksh93, so state that in the
sheband.
Enable the local test, but keep the remote test disabled, since it
requires a remote machine on the LAN.

Reviewed by:		markj@, gnn@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16268
2018-07-15 20:34:22 +00:00
Michael Tuexen
60e4fb3a3f Return the intended return code.
This bug was spotted by markj@ in D16268 because I copied this code part
and used it there. So fix it.

Sponsored by:		Netflix, Inc.
2018-07-14 19:53:41 +00:00
Michael Tuexen
49d7124b18 Fix shebangs and execute bit of test scripts.
Since we don't have /usr/bin/ksh, use a generic way of specifying
ksh. Some of the tests only run with ksh93, so use this shell
for these tests. Two of the tests don't have the execute bit set,
so fix this, too.

Reviewed by:		markj@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16270
2018-07-14 19:49:14 +00:00
Eric van Gyzen
79d4ee83de Fix markup in zfs(8); no content change
Sponsored by:	Dell EMC
2018-06-15 15:28:31 +00:00
Sean Eric Fagan
69724399c4 This originated from ZFS On Linux, as
d4a72f2386

During scans (scrubs or resilvers), it sorts the blocks in each transaction
group by block offset; the result can be a significant improvement. (On my
test system just now, which I put some effort to introduce fragmentation into
the pool since I set it up yesterday, a scrub went from 1h2m to 33.5m with the
changes.) I've seen similar rations on production systems.

Approved by:	Alexander Motin
Obtained from:	ZFS On Linux
Relnotes:	Yes (improved scrub performance, with tunables)
Differential Revision:	https://reviews.freebsd.org/D15562
2018-06-08 17:38:28 +00:00
Eitan Adler
4822188974 zpool(8): correct list of default properties in 'list'.
The default provides output in the following form:
```
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP
HEALTH  ALTROOT
```

this corrects the man page.

Also submitted upstream as
https://github.com/openzfs/openzfs/pull/632/files (with slightly
different changes needed)
2018-04-28 01:14:16 +00:00
Alexander Motin
e4d6c7fc17 MFV man pages update from r329502: 7614 zfs device evacuation/removal.
MFC after:	3 days
2018-04-17 02:33:54 +00:00
Andriy Gapon
81f187e576 allow ZFS pool to have temporary name for duration of current import
The change adds -t <name> option to zpool create and -t option to zpool
import in its form with an old name and a new name.  This allows to
import (or create) a pool under a name that's different from its real,
permanent name without affecting that name.  This is useful when working
with VM images or images of other physical systems if they happen to
have a ZFS pool with the same name as the host system.

The changes come from ZoL with some small tweaks.
The porting has been done by julian.

The change is being submitted to OpenZFS:
https://github.com/openzfs/openzfs/pull/600

Submitted by:	julian
Reviewed by:	smh
MFC after:	2 weeks
Sponsored by:	Panzura (porting)
Differential Revision: https://reviews.freebsd.org/D14972
2018-04-12 10:37:26 +00:00
Alexander Motin
849a7ce2d5 MFV r331712:
9280 Assertion failure while running removal_with_ganging test with 4K devices

illumos/illumos-gate@243952c7ee

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matt Ahrens <Matt.Ahrens@delphix.com>
2018-03-28 23:17:29 +00:00
Alexander Motin
5c4561f332 MFV r331706:
9235 rename zpool_rewind_policy_t to zpool_load_policy_t

illumos/illumos-gate@5dafeea3eb

We want to be able to pass various settings during import/open of a pool,
which are not only related to rewind. Instead of adding a new policy and
duplicate a bunch of code, we should just rename rewind_policy to a more
generic term like load_policy.

For instance, we'd like to set spa->spa_import_flags from the nvlist,
rather from a flags parameter passed to spa_import as in some cases we want
those flags not only for the import case, but also for the open case. One
such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would
allow zfs to open a pool when logs are missing.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-03-28 22:29:06 +00:00
Alexander Motin
0b0c76bc58 MFV r331695, 331700: 9166 zfs storage pool checkpoint
illumos/illumos-gate@8671400134

The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with
exactly that.  It can be thought of as a “pool-wide snapshot” (or a
variation of extreme rewind that doesn’t corrupt your data).  It remembers
the entire state of the pool at the point that it was taken and the user
can revert back to it later or discard it.  Its generic use case is an
administrator that is about to perform a set of destructive actions to ZFS
as part of a critical procedure.  She takes a checkpoint of the pool before
performing the actions, then rewinds back to it if one of them fails or puts
the pool into an unexpected state.  Otherwise, she discards it.  With the
assumption that no one else is making modifications to ZFS, she basically
wraps all these actions into a “high-level transaction”.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
2018-03-28 22:01:27 +00:00
Alexander Motin
4bada7a0a2 Partial MFV r329753:
8809 libzpool should leverage work done in libfakekernel

illumos/illumos-gate@f06dce2c1f

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andrew Stormont <astormont@racktopsystems.com>

We do not have libfakekernel, but need to reduce code divergence.
2018-03-28 20:41:15 +00:00
Alexander Motin
e76e77a972 MFV r331407: 9213 zfs: sytem typo
illumos/illumos-gate@edc8ef7d92

Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Toomas Soome <tsoome@me.com>
2018-03-23 02:30:29 +00:00
Mark Johnston
4ed6321f41 Use __syscall(2) rather than syscall(2) in syscall/tst.args.c.
Some of mmap(2)'s arguments are 64 bits wide.

MFC after:	3 days
2018-03-18 17:03:26 +00:00
Alexander Motin
1ea10a60f9 MFV r329793, r329795:
9075 Improve ZFS pool import/load process and corrupted pool recovery

illumos/illumos-gate@6f7938128a

Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:

https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's

To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.

This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 03:15:35 +00:00
Alexander Motin
502d18a8f1 MFV r329766: 8962 zdb should work on non-idle pools
illumos/illumos-gate@e144c4e6c9

Currently `zdb` consistently fails to examine non-idle pools as it fails
during the `spa_load()` process. The main problem seems to be that
`spa_load_verify()` fails as can be seen below:

$ sudo zdb -d -G dcenter
    zdb: can't open 'dcenter': I/O error

ZFS_DBGMSG(zdb):
    spa_open_common: opening dcenter
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c4t11d0s0': best uberblock found for spa dcenter. txg 40824950
    spa_load(dcenter): using uberblock with txg=40824950
    spa_load(dcenter): UNLOADING
    spa_load(dcenter): RELOADING
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c3t10d0s0': best uberblock found for spa dcenter. txg 40824952
    spa_load(dcenter): using uberblock with txg=40824952
    spa_load(dcenter): FAILED: spa_load_verify failed [error=5]
    spa_load(dcenter): UNLOADING

This change makes `spa_load_verify()` a dryrun when ran from `zdb`. This is
done by creating a global flag in zfs and then setting it in `zdb`.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 00:42:12 +00:00
Alexander Motin
24433f00ea MFV r329502: 7614 zfs device evacuation/removal
illumos/illumos-gate@5cabbc6b49

https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.

The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.

Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
2018-02-21 16:51:02 +00:00
Alexander Motin
e5a4a83784 MFV r318941: 7446 zpool create should support efi system partition
illumos/illumos-gate@7855d95b30
7855d95b30

https://www.illumos.org/issues/7446
  Since we support whole-disk configuration for boot pool, we also will need
  whole disk support with UEFI boot and for this, zpool create should create efi-
  system partition.
  I have borrowed the idea from oracle solaris, and introducing zpool create -
  B switch to provide an way to specify that boot partition should be created.
  However, there is still an question, how big should the system partition be.
  For time being, I have set default size 256MB (thats minimum size for FAT32
  with 4k blocks). To support custom size, the set on creation "bootsize"
  property is created and so the custom size can be set as: zpool create B -
  o bootsize=34MB rpool c0t0d0
  After pool is created, the "bootsize" property is read only. When -B switch is
  not used, the bootsize defaults to 0 and is shown in zpool get output with
  value ''. Older zfs/zpool implementations are ignoring this property.
  https://www.illumos.org/rb/r/219/

Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>

This commit makes no sense for FreeBSD, that is why I blocked the option,
but it should be good to stay closer to upstream.
2018-02-21 00:18:57 +00:00
Alexander Motin
91e16b7e23 MFV r316872: 7502 ztest should run zdb with -G (debug mode)
illumos/illumos-gate@c3c65d17f7
c3c65d17f7

https://www.illumos.org/issues/7502
  Right now ztest executes zdb without -G, so when it has errors, the messages
  are often not very helpful:
  Executing zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest
  zdb: can't open 'ztest': Operation not supported
  ztest: '/usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest' exit
  code 1
  With -G, we'd have:
  /usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache -G ztest
  zdb: can't open 'ztest': Operation not supported

  ZFS_DBGMSG(zdb):
  spa_open_common: opening ztest
  spa_load(ztest): LOADING
  spa_load(ztest): FAILED: unable to parse config [error=48]
  spa_load(ztest): UNLOADING
  Which indicates where the error came from

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-20 20:14:11 +00:00
Alan Somers
d4c225b01c Fix memory leaks in zdb introduced by r329508
Reported by:	Coverity
CID:		1386185
MFC after:	3 weeks
X-MFC-With:	329508
Sponsored by:	Spectra Logic Corp
2018-02-20 19:54:06 +00:00
Alexander Motin
c156ddfcb6 MFV r324198: 8081 Compiler warnings in zdb
illumos/illumos-gate@3f7978d02b
3f7978d02b

https://www.illumos.org/issues/8081
  zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
  which uses -WError, the only way to build it is to disable all compiler
  warnings. This makes it much harder to detect newly introduced bugs. We should
  cleanup all the warnings.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
2018-02-18 04:00:29 +00:00
Alan Somers
ad4bbe575b Fix "zpool add" crash when a replacing vdev has a spare child
Fix an assertion in zpool that causes a crash when running any "zpool add"
command on a spare that contains a replacing vdev with a spare child.

This likely affects Illumos, too.

PR:		225546
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D14138
2018-02-09 16:08:57 +00:00
Alexander Motin
0d6993dfd3 MFV r328255: 8972 zfs holds: In scripted mode, do not pad columns with spaces
illumos/illumos-gate@e9b7d6e7f7

https://www.illumos.org/issues/8972:
'zfs holds -H' does not properly output content in scripted mode. It uses a
tab instead of two spaces, but it still pads column widths with spaces when
it should not.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Allan Jude <allanjude@freebsd.org>
2018-01-22 06:00:45 +00:00
Alexander Motin
4eb2803697 MFV r328251: 8652 Tautological comparisons with ZPROP_INVAL
illumos/illumos-gate@4ae5f5f06c

https://www.illumos.org/issues/8652:
Clang and GCC prefer to use unsigned ints to store enums. With Clang, that
causes tautological comparison warnings when comparing a zfs_prop_t or
zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error
handling code is being silently removed as a result.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 05:52:39 +00:00
Alexander Motin
d26c97e2fb MFV r328233:
8898 creating fs with checksum=skein on the boot pools fails ungracefully

illumos/illumos-gate@9fa2266d9a

https://www.illumos.org/issues/8898:
# zfs create -o checksum=skein rpool/test
internal error: Result too large
Abort (core dumped)

Not a big deal per se, but should be handled correctly.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

PR:		222199
2018-01-22 00:01:36 +00:00
Alexander Motin
442814b7e7 MFV r328220: 8677 Open-Context Channel Programs
illumos/illumos-gate@a3b2868063

https://www.illumos.org/issues/8677
  We want to be able to run channel programs outside of synching context.
  This would greatly improve performance of channel program that just gather
  information, as we won't have to wait for synching context anymore.

  This feature should introduce the following:
  - A new command line flag in "zfs program" to specify our intention to
  run in open context.
  - A new flag/option within the channel program ioctl which selects the
  context.
  - Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
  - Documentation for the new feature in the manual pages.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-01-21 23:02:05 +00:00
Mark Johnston
224e0c2f61 Add "jid" and "jailname" variables to DTrace.
These return the jail ID and jail name for the traced process,
respectively, and are analogous to "zonename" on Solaris/illumos.
"zonename" is now aliased to "jailname".

Also add some stress tests for the new variables.

Submitted by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
Reviewed by:	dteske (previous version)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D13877
2018-01-12 19:59:46 +00:00
Mark Johnston
60eddb209b Add a regression test for r327794.
MFC after:	2 weeks
2018-01-10 21:40:36 +00:00
Mark Johnston
04006780d9 Complete support for dtrace's -x setenv option.
This allows one to override the environment for processes created with
dtrace -c. By default, the environment is inherited.

This support was originally merged from illumos in r249367 but was lost
when the commit was later reverted and then brought back piecemeal.

Reported by:	Samuel Lepetit <slepetit@apple.com>
MFC after:	2 weeks
2017-12-03 16:57:28 +00:00
Mark Johnston
f4b90f5a5c Revert r326181 for now.
We can't link an executable using -m32 until the lib32 phase of a
buildworld, though the build works fine when executing make from
cddl/usr.sbin/dtrace/tests. Some other solution will need to be found.
2017-11-27 17:54:17 +00:00
Andriy Gapon
718cb91ccc zdb: follow-up to r326150, check if malloc succeeded
Reported by:	rpokala
MFC after:	1 week
X-MFC with:	r326150
2017-11-25 09:47:31 +00:00
Mark Johnston
e96f62322b Compile one of the uctf test programs with -m32.
The err.user64mode.ksh test expects it to run as a 32-bit process.

MFC after:	1 week
2017-11-24 19:57:13 +00:00
Andriy Gapon
fd74a38251 zdb: use a heap allocation instead of a huge array on stack
SPA_MAXBLOCKSIZE is 16 MB and having such a large object on the stack is
not nice in general and it could cause some confusing failures in the
single-user mode where the default stack size of 8 MB is used.

I expect that the upstream would make the same change.

MFC after:	1 week
2017-11-24 10:45:33 +00:00
Mark Johnston
fc81ca0045 Don't assume that we can resolve "main" in the ksh executable.
MFC after:	1 week
2017-11-21 15:03:38 +00:00
Andriy Gapon
1c05a6ea6b MFV r325013,r325034: 640 number_to_scaled_string is duplicated in several commands
illumos/illumos-gate@0a0551200e
0a0551200e

https://www.illumos.org/issues/640
  du(1), df(1m), ls(1), and swap(1m) all include a copy (it appears literally
  copied) of the 'number_to_scaled_string' function in their source. This should
  be moved to a shared library and all 4 commands should use this instead.

FreeBSD note: of all libcmdutils functionality ZFS (and other illumos
contrib code) currently uses only nicenum() function (which is similar
to humanize_number but has some formatting differences).  For this
reason I decided to not port the whole library.  As a result, nicenum.c
from libcmdutils is compiled into libzfs and libzpool.  This is a bit
ugly, but works.  If one day we are forced to create libillumos, then
the file should be moved to that library.

Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Jason King <jason.brian.king@gmail.com>

MFC after:	2 weeks
2017-10-27 12:37:22 +00:00
Andriy Gapon
ac63ac6859 zdb.8: replace with the slighly modified upstream version
The upstream has converted their manual page to the same format as we
have, so we can use the upstream version with minimal modifications.

The current modifications are:
- different zpool.cache path
- different manual sections for zdb, zfs, zpool commands

igor reports a few minor issues, it would be nice to fix them both in
FreeBSD and in the upstream.

MFC after:	3 weeks
2017-10-06 08:28:35 +00:00
Andriy Gapon
e117882ba2 MFV r322235: 8067 zdb should be able to dump literal embedded block pointer
illumos/illumos-gate@4923c69fdd
4923c69fdd

FreeBSD note: the manual page is to be updated separately.

https://www.illumos.org/issues/8067
  Add an option to zdb to print a literal embedded block pointer supplied on the
  command line:
  zdb -E [-A] word0:word1:...:word15

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	3 weeks
2017-10-06 08:21:06 +00:00
Andriy Gapon
f51efc68f6 MFV r316864: 6392 zdb: introduce -V for verbatim import
illumos/illumos-gate@dfd5965f7e
dfd5965f7e

FreeBSD note: the manual page is to be updated separately.

https://www.illumos.org/issues/6392
  When given a pool name via -e, zdb would attempt an import. If it
  failed, then it would attempt a verbatim import. This behavior is
  not always desirable so a -V switch is added to zdb to control the
  behavior. When specified, a verbatim import is done. Otherwise,
  the behavior is as it was previously, except no verbatim import
  is done on failure.
  a5778ea242
  Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>

MFC after:	3 weeks
2017-10-06 08:09:20 +00:00
Andriy Gapon
65512687a9 MFV r316862: 6410 teach zdb to perform object lookups by path
illumos/illumos-gate@ed61ec1da9
ed61ec1da9

FreeBSD note: this commit does not update the manual page.
The original change includes conversion of the manual page from *roff format
to mandoc format.  So, it is hard to extract the content change from
that.  I am going to replace our zdb manual page, which is an earlier
independent conversion, with a slighly modified version of the upstream page.

https://www.illumos.org/issues/6410
  This is primarily intended to ease debugging & testing ZFS when one is only
  interested in things like the on-disk location of a specific object's blocks,
  but doesn't know their object id. This allows doing things like the following
  (FreeBSD-based example):
          # zpool create -f foo da0
          # dd if=/dev/zero of=/foo/1 bs=1M count=4 >/dev/null 2>&1
          # zpool export foo
          # zdb -vvvvv -o "ZFS plain file" foo /1
          Object  lvl   iblk   dblk  dsize  lsize   %full  type
               8    2    16K   128K  3.99M     4M  100.00  ZFS plain file (K=inherit) (Z=inherit)
                                              168   bonus  System attributes
          dnode flags: USED_BYTES USERUSED_ACCOUNTED
          dnode maxblkid: 31
          path    /1
          uid     0
          gid     0
          atime   Thu Apr 23 22:45:32 2015
          mtime   Thu Apr 23 22:45:32 2015
          ctime   Thu Apr 23 22:45:32 2015
          crtime  Thu Apr 23 22:45:32 2015
          gen     7
          mode    100644
          size    4194304
          parent  4
          links   1
          pflags  40800000004
          Indirect blocks:
                0 L1  DVA[0]=<0:c19200:600> DVA[1]=<0:10800019200:600> [L1 ZFS
  plain file] fletcher4 lz4 LE contiguous unique double size=4000L/200P birth=7L/

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	3 weeks
2017-10-06 07:52:25 +00:00
Alan Somers
d22f655459 MFV r319743: 8108 zdb -l fails to read labels 2 and 3
illumos/illumos-gate@22c8b9583d
22c8b9583d

https://www.illumos.org/issues/8108

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	3 weeks
2017-10-02 22:39:12 +00:00
Alan Somers
58dde1c585 MFV r316863: 3871 fix issues introduced by 3604
illumos/illumos-gate@de05b58863
de05b58863

https://www.illumos.org/issues/3871
  GCC 4.5.3 on Gentoo Linux did not like a few of the changes made in the issue
  3604 patch. It printed an error and a couple of warnings:
  ../../cmd/zdb/zdb.c: In function 'dump_bpobj':
  ../../cmd/zdb/zdb.c:1257:3: error: 'for' loop initial declarations are only
  allowed in C99 mode
  ../../cmd/zdb/zdb.c:1257:3: note: use option -std=c99 or -std=gnu99 to compile
  your code
  ../../cmd/zdb/zdb.c: In function 'dump_deadlist':
  ../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format
  ../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>

MFC after:	3 weeks
2017-10-02 22:35:35 +00:00
Alan Somers
2091d1c3b9 MFV r316861: 6866 zdb -l should return non-zero if it fails to find any label
illumos/illumos-gate@64723e3611
64723e3611

https://www.illumos.org/issues/6866
  Need this for #6865.
  To be generally more scripting-friendly, overload this issue with adding '-q'
  option which should skip printing any label information.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	3 weeks
2017-10-02 22:13:20 +00:00
Alan Somers
66eafdf052 MFC r316858 7280 Allow changing global libzpool variables in zdb
7280 Allow changing global libzpool variables in zdb and ztest through command line

illumos/illumos-gate@0e60744c98
0e60744c98

https://www.illumos.org/issues/7280
  zdb is very handy for diagnosing problems with a pool in a safe and
  quick way. When a pool is in a bad shape, we often want to disable some
  fail-safes, or adjust some tunables in order to open them. In the
  kernel, this is done by changing public variables in mdb. The goal of
  this feature is to add the same capability to zdb and ztest, so that
  they can change libzpool tuneables from the command line.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>

MFC after:	3 weeks
2017-10-02 22:02:04 +00:00
Andriy Gapon
620c2c801b MFV r323913: 8600 ZFS channel programs - snapshot
illumos/illumos-gate@2840dce1a0
2840dce1a0

https://www.illumos.org/issues/8600
  ZFS channel programs should be able to create snapshots.
  In addition to the base snapshot functionality, this will likely entail adding
  extra logic to handle edge cases which were formerly not possible, such as
  creating then destroying a snapshot in the same transaction sync.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>

MFC after:	5 weeks
X-MFC after:	r324163
2017-10-02 11:32:08 +00:00
Andriy Gapon
a1f65a15ce MFV r323912: 8592 ZFS channel programs - rollback
illumos/illumos-gate@000cce6b6f
000cce6b6f

https://www.illumos.org/issues/8592
  ZFS channel programs should be able to perform a rollback. This logic will
  probably look pretty similar to zfs.sync.destroy().

Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brad Lewis <brad.lewis@delphix.com>

MFC after:	5 weeks
X-MFC after:	r324163
2017-10-02 11:23:31 +00:00
Andriy Gapon
3e52a05570 MFV r323794: 8605 zfs channel programs: zfs.exists undocumented and non-working
illumos/illumos-gate@5f39f884e2
5f39f884e2

https://www.illumos.org/issues/8605
  zfs.exists() in channel programs doesn't return any result, and should have a
  man page entry.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>

MFC after:	5 weeks
X-MFC after:	r324163
2017-10-01 16:51:05 +00:00
Andriy Gapon
bda88d07d9 MFV r323530,r323533,r323534: 7431 ZFS Channel Programs, and followups
7431 ZFS Channel Programs

illumos/illumos-gate@dfc115332c
dfc115332c

https://www.illumos.org/issues/7431
  ZFS channel programs (ZCP) adds support for performing compound ZFS
  administrative actions via Lua scripts in a sandboxed environment (with time
  and memory limits).
  This initial commit includes both base support for running ZCP scripts, and a
  small initial library of API calls which support getting properties and
  listing, destroying, and promoting datasets.
  Testing: in addition to the included unit tests, channel programs have been in
  use at Delphix for several months for batch destroying filesystems. The
  dsl_destroy_snaps_nvl() call has also been replaced with

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <chris.williamson@delphix.com>

8552 ZFS LUA code uses floating point math

illumos/illumos-gate@916c8d8811
916c8d8811

https://www.illumos.org/issues/8552
  In the LUA interpreter used by "zfs program", the lua format() function
  accidentally includes support for '%f' and friends, which can cause compilation
  problems when building on platforms that don't support floating-point math in
  the kernel (e.g. sparc). Support for '%f' friends (%f %e %E %g %G) should be
  removed, since there's no way to supply a floating-point value anyway (all
  numbers in ZFS LUA are int64_t's).

Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

8590 memory leak in dsl_destroy_snapshots_nvl()

illumos/illumos-gate@e6ab4525d1
e6ab4525d1

https://www.illumos.org/issues/8590
  In dsl_destroy_snapshots_nvl(), "snaps_normalized" is not freed after it is
  added to "arg".

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

FreeBSD notes:
- zfs-program.8 manual page is taken almost as is from the vendor repository,
  no FreeBSD-ification done
- fixed multiple instances of NULL being used where an integer is expected
- replaced ETIME and ECHRNG with ETIMEDOUT and EDOM respectively

This commit adds a modified version of Lua 5.2.4 under
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua, mirroring the
upstream.  See README.zfs in that directory for the description of Lua
customizations.
See zfs-program.8 on how to use the new feature.

MFC after:	5 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D12528
2017-10-01 16:11:07 +00:00
Andriy Gapon
c13f1d82c8 MFV r323535: 8585 improve batching done in zil_commit()
FreeBSD notes:
- this MFV reverts FreeBSD commit r314549 to make the merge easier
- at present our emulation of cv_timedwait_hires is rather poor,
  so I elected to use cv_timedwait_sbt directly
Please see the differential revision for details.
Unfortunately, I did not get any positive reviews, so there could be
bugs in the FreeBSD-specific piece of the merge.
Hence, the long MFC timeout.

illumos/illumos-gate@1271e4b10d
1271e4b10d

https://www.illumos.org/issues/8585
  The current implementation of zil_commit() can introduce significant
  latency, beyond what is inherent due to the latency of the underlying
  storage. The additional latency comes from two main problems:
  1. When there's outstanding ZIL blocks being written (i.e. there's
      already a "writer thread" in progress), then any new calls to
      zil_commit() will block waiting for the currently oustanding ZIL
      blocks to complete. The blocks written for each "writer thread" is
      coined a "batch", and there can only ever be a single "batch" being
      written at a time. When a batch is being written, any new ZIL
      transactions will have to wait for the next batch to be written,
      which won't occur until the current batch finishes.
  As a result, the underlying storage may not be used as efficiently
      as possible. While "new" threads enter zil_commit() and are blocked
      waiting for the next batch, it's possible that the underlying
      storage isn't fully utilized by the current batch of ZIL blocks. In
      that case, it'd be better to allow these new threads to generate
      (and issue) a new ZIL block, such that it could be serviced by the
      underlying storage concurrently with the other ZIL blocks that are
      being serviced.
  2. Any call to zil_commit() must wait for all ZIL blocks in its "batch"
      to complete, prior to zil_commit() returning. The size of any given
      batch is proportional to the number of ZIL transaction in the queue
      at the time that the batch starts processing the queue; which
      doesn't occur until the previous batch completes. Thus, if there's a
      lot of transactions in the queue, the batch could be composed of
      many ZIL blocks, and each call to zil_commit() will have to wait for
      all of these writes to complete (even if the thread calling
      zil_commit() only cared about one of the transactions in the batch).

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12355
2017-09-26 11:04:08 +00:00
Andriy Gapon
65b5f7c743 MFV r323790: 8567 Inconsistent return value in zpool_read_label
illumos/illumos-gate@c861bfbd77
c861bfbd77

https://www.illumos.org/issues/8567
  If fstat64 fails, pread64 fails, or the label is unintelligible,
  zpool_read_label will return 0. But if malloc fails, it will return -1. For
  consistency, it should always return -1 on failure or 0 on success.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>

MFC after:	2 weeks
2017-09-20 07:23:50 +00:00
Bryan Drewery
b3b6d7b406 Fix the raise tests.
- The exit probe was not appropriately filtered to only the known pid so it
  was firing on any random process that would exit rather the only the one
  we cared about.
- The dtest script executes the tst.raise*.exe in the background from
  POSIX sh without jobs control.  POSIX mandates that SIGINT be set to
  SIG_IGN in this case.  The test executable never actually tested that
  SIGINT could be caught despite trying to block and delay the signal.
  So the SIGINT sent from raise() is never actually received since it
  is ignored.  This could be fixed by calling 'trap - INT' from dtest
  before running the executable but I've opted to just use SIGUSR1
  instead in these specific tests rather than adding more logic to
  test that SIGINT is not ignored at startup.

These 2 issues meant that the tests would randomly work but only if a process
coincidentally exited during the test.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-09-15 19:48:48 +00:00
Li-Wen Hsu
da3086df18 Fix DTrace test tst_inet_ntop_d: remove definitions are already in libdtrace
We have D definitions for the named values in socket.h after r323253.  Remove
them in test script to prevent compiling failure.

Reviewed by:	markj, gnn
Differential Revision:	https://reviews.freebsd.org/D12334
2017-09-12 16:00:51 +00:00
Andriy Gapon
90354c3200 MFV r323107: 8414 Implemented zpool scrub pause/resume
illumos/illumos-gate@1702cce751
1702cce751

FreeBSD note:  rather than merging the zpool.8 update I copied the zpool
scrub section from the illumos zpool.1m to FreeBSD zpool.8 almost
verbatim.  Now that the illumos page uses the mdoc format, it was an
easier option.  Perhaps the change is not in perfect compliance with the
FreeBSD style, but I think that it is acceptible.

https://www.illumos.org/issues/8414
  This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167
  Currently, there is no way to pause a scrub. Pausing may be useful when
  the pool is busy with other I/O to preserve bandwidth.

  Description

  This patch adds the ability to pause and resume scrubbing.  This is achieved
  by maintaining a persistent on-disk scrub state.  While the state is 'paused'
  we do not scrub any more blocks.  We do however perform regular scan
  housekeeping such as freeing async destroyed and deadlist blocks while paused.

  Motivation and Context

  Scrub pausing can be an I/O intensive operation and people have been asking
  for the ability to pause a scrub for a while. This allows one to preserve scrub
  progress while freeing up bandwidth for other I/O.

Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>

MFC after:	2 weeks
2017-09-09 11:00:07 +00:00
Mark Johnston
9b0ddce6bc Use an updated copy of the CDDL header boilerplate from illumos.
Reported by:	Yuri Pankov <yuripv@gmx.com>
X-MFC with:	r322774
2017-08-21 22:26:49 +00:00
Mark Johnston
a3a7b74e18 Add a regression test for r322773.
MFC after:	1 week
2017-08-21 21:58:42 +00:00
Andriy Gapon
b4e4140d13 MFV r322223: 8378 crash due to bp in-memory modification of nopwrite block
illumos/illumos-gate@b7edcb9408
b7edcb9408

https://www.illumos.org/issues/8378
  The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which
  we then nopwrite against.
  zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr
  to zgd_bp, dbuf_write_ready()
  could change db_blkptr, and dbuf_write_done() could remove the dirty record.
  dmu_sync() then sees the stale
  BP and that the dbuf it not dirty, so it is eligible for nop-writing.
  The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the
  db_mtx. We could still see a stale
  db_blkptr, but if it is stale then the dirty record will still exist and thus
  we won't attempt to nopwrite.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-08-08 10:46:51 +00:00
Mark Johnston
22e406c80b Rework and simplify the ksyms(4) implementation.
- Store the symbol table contents in an anonymous swap-backed object. Have
  mmap(/dev/ksyms) map that object, and stop mapping the symbol table into
  the calling process in ksyms_open(). Previously we would cache a pointer
  to the pmap of the opening process, and mmap(/dev/ksyms) would create a
  mapping using the physical address found by a pmap lookup at the initial
  mapping address. However, this assumes that the cached pmap is valid,
  which may not be the case. [1]
- Remove the ksyms ioctl interface. It appears to have been added to work
  around a limitation in libelf that no longer exists; see r321842.
  Moreover, the interface is difficult to support and isn't present in
  illumos. Since ksyms was added specifically to support lockstat(1), it
  is expected that this removal won't have any real impact.
- Simplify ksyms_read() to avoid unnecessary copying.
- Don't call the device handle destructor if we fail to capture a snapshot
  of the kernel's symbol table. devfs will do that for us.

Reported by:	Ilja van Sprundel <ivansprundel@ioactive.com> [1]
Reviewed by:	kib (previous revision)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11789
2017-08-03 00:38:13 +00:00
Mark Johnston
3c9308e82b Remove local variables missed in r321842.
X-MFC with:	r321842
2017-08-01 04:52:03 +00:00
Mark Johnston
90ec6fd4ba Let lockstat use ksyms(4)'s mmap interface.
The workaround described in the deleted comment is no longer needed.

MFC after:	1 week
2017-08-01 04:49:54 +00:00
Li-Wen Hsu
ad89d783a6 Add an auxiliary subroutine to generate some events for testing
This test is also timeout on a quiet system because there is nobody triggering
read probefunc while test execution.

Reviewed by:	gnn, markj, ngie
Differential Revision:	https://reviews.freebsd.org/D11731
2017-07-26 12:07:46 +00:00
Li-Wen Hsu
637ba06516 Modify glob patterns and expected output to match FreeBSD's implementation.
Reviewed by:	gnn, markj, ngie
Differential Revision:	https://reviews.freebsd.org/D11713
2017-07-25 13:14:02 +00:00