freebsd-dev

Author	SHA1	Message	Date
Baptiste Daroussin	e7be6f4ed4	Fix a regression introduced in r344569. Import a fix from illumos (thanks to Toomas Soome for pointing at it). See https://www.illumos.org/issues/10205 for more details. Illumos commit: `247b7da039` Submitted by: jack@gandi.net Reported by: cy Reviewed by: tsoome, cy, bapt Obtained from: Illumos	2019-02-27 13:49:41 +00:00
Baptiste Daroussin	c7851c5b7c	Fix regression introduced in r344569 Reported by: cy Tested by: cy Submitted by: Fatih Acar <fatih@gandi.net>	2019-02-27 07:55:53 +00:00
Sean Eric Fagan	50792eb553	Set process title during zfs send. This adds a '-V' option to 'zfs send', which sets the process title once a second to the progress information. This code has been in FreeNAS for a long time now; this is just upstreaming it here. It was originally written by delphij. Reviewed by: mav Obtained from: iXsystems, Inc Sponsored by: iXsystems, Inc Differential Revision: https://reviews.freebsd.org/D19184	2019-02-26 19:23:22 +00:00
Baptiste Daroussin	0b858c82d8	Implement parallel mounting for ZFS filesystems. It was first implemented on Illumos and then ported to ZoL. This patch is a port to FreeBSD of the ZoL version. This patch also includes a fix for a race condition that was amended. With such a patch Delphix has seen a huge decrease in latency of the mount phase (https://github.com/openzfs/openzfs/commit/a3f0e2b569 for details). With the current change Gandi has measured improvements that are on par with those reported by Delphix. ZoL commits incorporated: `a10d50f999` `e63ac16d25` Reviewed by: avg, sef Approved by: avg, sef Obtained from: ZoL MFC after: 1 month Relnotes: yes Sponsored by: Gandi.net Differential Revision: https://reviews.freebsd.org/D19098	2019-02-26 08:18:34 +00:00
Leandro Lupori	559af1ec16	Increase ctfconvert buffer size Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D19353	2019-02-25 18:52:47 +00:00
Mark Johnston	9747bd8e02	MFV r344364: 9058 postmortem DTrace frequently broken under vmware illumos/illumos-gate@793bd7e361 MFC after: 1 week	2019-02-20 17:10:30 +00:00
Andriy Gapon	885b0f9e91	zpool.8: sort zpool status flags in the same order as in illumos manual Just in case, while I was here. MFC after: 1 week	2019-02-20 13:37:27 +00:00
Andriy Gapon	d97ff345cf	zpool.8: document -D flag for zpool status The description is taken from the illumos manual. Reported by: stilezy@gmail.com MFC after: 1 week	2019-02-20 13:34:16 +00:00
Andriy Gapon	30d6475b3f	fix userland illumos taskq code to pass relative timeout to cv_timedwait Unlike illumos, FreeBSD cv_timedwait requires a relative timeout. That applies both to the kernel illumos compatibility code and to the userland "fake kernel" code. MFC after: 2 weeks Sponsored by: Panzura	2019-02-20 13:19:08 +00:00
Andriy Gapon	4c325393f3	MFV r342532: 5882 Temporary pool names Note that this commit brings only formatting changes that were done during the final review of the illumos change, because FreeBSD got the main changes before illumos. illumos/illumos-gate@04e5635652 `04e5635652` https://www.illumos.org/issues/5882 This is an import of the temporary pool names functionality from ZoL: `e2282ef57e` `26b42f3f9d` `2f3ec90061` `00d2a8c92f` `83e9986f6e` `023bbe6f01` It is intended to assist the creation and management of virtual machines that have their rootfs on ZFS on hosts that also have their rootfs on ZFS. These situations cause SPA namespace collisions when the standard name rpool is used in both cases. The solution is either to give each guest pool a name unique to the host, which is not always desirable, or boot a VM environment containing an ISO image to install it, which is cumbersome. MFC after: 1 week Sponsored by: Panzura	2018-12-26 11:03:14 +00:00
Andriy Gapon	f050611e7f	MFV r342469: 9630 add lzc_rename and lzc_destroy to libzfs_core illumos/illumos-gate@049ba636fa `049ba636fa` https://www.illumos.org/issues/9630 Rename and destroy are very useful operations that deserve to be in libzfs_core. And they are not hard to implement too. MFC after: 2 weeks Relnotes: maybe	2018-12-26 10:37:41 +00:00
Yuri Pankov	65c9ed85e4	dtrace(1): remove reference to dtruss that was removed from base system in r300226. PR: 211618 Reviewed by: gnn, markj, 0mp Approved by: kib (mentor, implicit) Differential Revision: https://reviews.freebsd.org/D17762	2018-10-31 15:29:26 +00:00
Michael Tuexen	1e88cc8b59	Add support for send, receive and state-change DTrace providers for SCTP. They are based on what is specified in the Solaris DTrace manual for Solaris 11.4. Reviewed by: 0mp, dteske, markj Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16839	2018-08-22 21:23:32 +00:00
Mark Johnston	f0af0b312f	Add partial documentation for dtrace(1)'s -x configuration options. Some options are still missing descriptions, but they can be filled in over time. Submitted by: raichoo <raichoo@googlemail.com> Reviewed by: 0mp (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D16671	2018-08-16 19:28:44 +00:00
Matt Macy	cc0fbbb92e	MFV/ZoL: Implement large_dnode pool feature commit `50c957f702` Author: Ned Bass <bass6@llnl.gov> Date: Wed Mar 16 18:25:34 2016 -0700 Implement large_dnode pool feature Justification ------------- This feature adds support for variable length dnodes. Our motivation is to eliminate the overhead associated with using spill blocks. Spill blocks are used to store system attribute data (i.e. file metadata) that does not fit in the dnode's bonus buffer. By allowing a larger bonus buffer area the use of a spill block can be avoided. Spill blocks potentially incur an additional read I/O for every dnode in a dnode block. As a worst case example, reading 32 dnodes from a 16k dnode block and all of the spill blocks could issue 33 separate reads. Now suppose those dnodes have size 1024 and therefore don't need spill blocks. Then the worst case number of blocks read is reduced from 33 to two--one per dnode block. In practice spill blocks may tend to be co-located on disk with the dnode blocks so the reduction in I/O would not be this drastic. In a badly fragmented pool, however, the improvement could be significant. ZFS-on-Linux systems that make heavy use of extended attributes would benefit from this feature. In particular, ZFS-on-Linux supports the xattr=sa dataset property which allows file extended attribute data to be stored in the dnode bonus buffer as an alternative to the traditional directory-based format. Workloads such as SELinux and the Lustre distributed filesystem often store enough xattr data to force spill blocks when xattr=sa is in effect. Large dnodes may therefore provide a performance benefit to such systems. Other use cases that may benefit from this feature include files with large ACLs and symbolic links with long target names. Furthermore, this feature may be desirable on other platforms in case future applications or features are developed that could make use of a larger bonus buffer area. 
Implementation -------------- The size of a dnode may be a multiple of 512 bytes up to the size of a dnode block (currently 16384 bytes). A dn_extra_slots field was added to the current on-disk dnode_phys_t structure to describe the size of the physical dnode on disk. The 8 bits for this field were taken from the zero filled dn_pad2 field. The field represents how many "extra" dnode_phys_t slots a dnode consumes in its dnode block. This convention results in a value of 0 for 512 byte dnodes which preserves on-disk format compatibility with older software. Similarly, the in-memory dnode_t structure has a new dn_num_slots field to represent the total number of dnode_phys_t slots consumed on disk. Thus dn->dn_num_slots is 1 greater than the corresponding dnp->dn_extra_slots. This difference in convention was adopted because, unlike on-disk structures, backward compatibility is not a concern for in-memory objects, so we used a more natural way to represent size for a dnode_t. The default size for newly created dnodes is determined by the value of a new "dnodesize" dataset property. By default the property is set to "legacy" which is compatible with older software. Setting the property to "auto" will allow the filesystem to choose the most suitable dnode size. Currently this just sets the default dnode size to 1k, but future code improvements could dynamically choose a size based on observed workload patterns. Dnodes of varying sizes can coexist within the same dataset and even within the same dnode block. For example, to enable automatically-sized dnodes, run # zfs set dnodesize=auto tank/fish The user can also specify literal values for the dnodesize property. These are currently limited to powers of two from 1k to 16k. The power-of-2 limitation is only for simplicity of the user interface. Internally the implementation can handle any multiple of 512 up to 16k, and consumers of the DMU API can specify any legal dnode value. 
The size of a new dnode is determined at object allocation time and stored as a new field in the znode in-memory structure. New DMU interfaces are added to allow the consumer to specify the dnode size that a newly allocated object should use. Existing interfaces are unchanged to avoid having to update every call site and to preserve compatibility with external consumers such as Lustre. The new interface names are given below. The versions of these functions that don't take a dnodesize parameter now just call the _dnsize() versions with a dnodesize of 0, which means use the legacy dnode size. New DMU interfaces: dmu_object_alloc_dnsize() dmu_object_claim_dnsize() dmu_object_reclaim_dnsize() New ZAP interfaces: zap_create_dnsize() zap_create_norm_dnsize() zap_create_flags_dnsize() zap_create_claim_norm_dnsize() zap_create_link_dnsize() The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The spa_maxdnodesize() function should be used to determine the maximum bonus length for a pool. These are a few noteworthy changes to key functions: * The prototype for dnode_hold_impl() now takes a "slots" parameter. When the DNODE_MUST_BE_FREE flag is set, this parameter is used to ensure the hole at the specified object offset is large enough to hold the dnode being created. The slots parameter is also used to ensure a dnode does not span multiple dnode blocks. In both of these cases, if a failure occurs, ENOSPC is returned. Keep in mind, these failure cases are only possible when using DNODE_MUST_BE_FREE. If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0. dnode_hold_impl() will check if the requested dnode is already consumed as an extra dnode slot by a large dnode, in which case it returns ENOENT. * The function dmu_object_alloc() advances to the next dnode block if dnode_hold_impl() returns an error for a requested object. 
This is because the beginning of the next dnode block is the only location it can safely assume to either be a hole or a valid starting point for a dnode. * dnode_next_offset_level() and other functions that iterate through dnode blocks may no longer use a simple array indexing scheme. These now use the current dnode's dn_num_slots field to advance to the next dnode in the block. This is to ensure we properly skip the current dnode's bonus area and don't interpret it as a valid dnode. zdb --- The zdb command was updated to display a dnode's size under the "dnsize" column when the object is dumped. For ZIL create log records, zdb will now display the slot count for the object. ztest ----- Ztest chooses a random dnodesize for every newly created object. The random distribution is more heavily weighted toward small dnodes to better simulate real-world datasets. Unused bonus buffer space is filled with non-zero values computed from the object number, dataset id, offset, and generation number. This helps ensure that the dnode traversal code properly skips the interior regions of large dnodes, and that these interior regions are not overwritten by data belonging to other dnodes. A new test visits each object in a dataset. It verifies that the actual dnode size matches what was stored in the ztest block tag when it was created. It also verifies that the unused bonus buffer space is filled with the expected data patterns. ZFS Test Suite -------------- Added six new large dnode-specific tests, and integrated the dnodesize property into existing tests for zfs allow and send/recv. Send/Receive ------------ ZFS send streams for datasets containing large dnodes cannot be received on pools that don't support the large_dnode feature. A send stream with large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be unrecognized by an incompatible receiving pool so that the zfs receive will fail gracefully. 
While not implemented here, it may be possible to generate a backward-compatible send stream from a dataset containing large dnodes. The implementation may be tricky, however, because the send object record for a large dnode would need to be resized to a 512 byte dnode, possibly kicking in a spill block in the process. This means we would need to construct a new SA layout and possibly register it in the SA layout object. The SA layout is normally just sent as an ordinary object record. But if we are constructing new layouts while generating the send stream we'd have to build the SA layout object dynamically and send it at the end of the stream. For sending and receiving between pools that do support large dnodes, the drr_object send record type is extended with a new field to store the dnode slot count. This field was repurposed from unused padding in the structure. ZIL Replay ---------- The dnode slot count is stored in the uppermost 8 bits of the lr_foid field. The bits were unused as the object id is currently capped at 48 bits. Resizing Dnodes --------------- It should be possible to resize a dnode when it is dirtied if the current dnodesize dataset property differs from the dnode's size, but this functionality is not currently implemented. Clearly a dnode can only grow if there are sufficient contiguous unused slots in the dnode block, but it should always be possible to shrink a dnode. Growing dnodes may be useful to reduce fragmentation in a pool with many spill blocks in use. Shrinking dnodes may be useful to allow sending a dataset to a pool that doesn't support the large_dnode feature. Feature Reference Counting -------------------------- The reference count for the large_dnode pool feature tracks the number of datasets that have ever contained a dnode of size larger than 512 bytes. The first time a large dnode is created in a dataset the dataset is converted to an extensible dataset. 
This is a one-way operation and the only way to decrement the feature count is to destroy the dataset, even if the dataset no longer contains any large dnodes. The complexity of reference counting on a per-dnode basis was too high, so we chose to track it on a per-dataset basis similarly to the large_block feature. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3542	2018-08-12 00:45:53 +00:00
Alexander Leidinger	a079a34fd5	Extend the info about the limitations of datasets in jails. Reviewed by: allanjude Sponsored by: Essen Hackathon	2018-08-11 20:49:19 +00:00
Mark Johnston	0b56e7a8e9	Disable the D subroutines msgsize() and msgdsize(). They are specific to illumos and the corresponding DIF subroutines are already disabled on FreeBSD. Reported by: gnn	2018-08-10 19:23:20 +00:00
Matt Macy	648cfe57fd	Performance optimization of AVL tree comparator functions MFV: commit `ee36c709c3` Author: Gvozden Neskovic <neskovic@gmail.com> Date: Sat Aug 27 20:12:53 2016 +0200 perf: 2.75x faster ddt_entry_compare() First 256bits of ddt_key_t is a block checksum, which are expected to be close to random data. Hence, on average, comparison only needs to look at first few bytes of the keys. To reduce number of conditional jump instructions, the result is computed as: sign(memcmp(k1, k2)). Sign of an integer 'a' can be obtained as: `(0 < a) - (a < 0)` := {-1, 0, 1} , which is computed efficiently. Synthetic performance evaluation of original and new algorithm over 1G random keys on 2.6GHz Intel(R) Xeon(R) CPU E5-2660 v3: old 6.85789 s new 2.49089 s perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare() Compute the result directly instead of using conditionals perf: zfs_range_compare() Speedup between 1.1x - 2.5x, depending on compiler version and optimization level. perf: spa_error_entry_compare() `bcmp()` is not suitable for comparator use. Use `memcmp()` instead. perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare() perf: 2.8x faster zil_bp_compare() perf: 2.8x faster mze_compare() perf: faster dbuf_compare() perf: faster compares in spa_misc perf: 2.8x faster layout_hash_compare() perf: 2.8x faster space_reftree_compare() perf: libzfs: faster avl tree comparators perf: guid_compare() perf: dsl_deadlist_compare() perf: perm_set_compare() perf: 2x faster range_tree_seg_compare() perf: faster unique_compare() perf: faster vdev_cache _compare() perf: faster vdev_uberblock_compare() perf: faster fuid _compare() perf: faster zfs_znode_hold_compare() Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Richard Elling <richard.elling@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5033	2018-08-10 06:42:08 +00:00
Alexander Motin	07ddc55096	MFV r337223: 9580 Add a hash-table on top of nvlist to speed-up operations illumos/illumos-gate@2ec7644aab Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2018-08-03 01:52:25 +00:00
Alexander Motin	0285589b38	MFV 337214: 9621 Make createtxg and guid properties public illumos/illumos-gate@e8d4a73c86 Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Yuri Pankov <yuripv@yuripv.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Josh Paetzel <josh@tcbug.org>	2018-08-03 00:24:27 +00:00
Alexander Motin	d49e9be14f	MFV r337184: 9457 libzfs_import.c:add_config() has a memory leak A memory leak occurs on lines 209 and 213 because the config is not freed in the error case. The interface to add_config() seems less than ideal - it would be better if it copied any data necessary from the config and the caller freed it. illumos/illumos-gate@ddfe901b12 Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: sara hartse <sara.hartse@delphix.com>	2018-08-02 21:25:32 +00:00
Alexander Motin	2bce9a5316	MFV r337182: 9330 stack overflow when creating a deeply nested dataset Datasets that are deeply nested (~100 levels) are impractical. We just put a limit of 50 levels to newly created datasets. Existing datasets should work without a problem. illumos/illumos-gate@5ac95da7d6 Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>	2018-08-02 21:19:35 +00:00
Alexander Motin	ac879e61ad	9523 Large alloc in zdb can cause trouble 16MB alloc in zdb_embedded_block() can cause cores in certain situations (clang, gcc55). OsX commit: `ced236a5da` FreeBSD commit: https://svnweb.freebsd.org/base?view=revision&revision=326150 illumos/illumos-gate@03a4c2f4bf Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Jorgen Lundman <lundman@lundman.net> This is an update for r326150 (by avg), where this change comes from.	2018-08-02 20:44:07 +00:00
Alexander Motin	c423a5e6b8	MFV r337161: 9512 zfs remap poolname@snapname coredumps Only filesystems and volumes are valid "zfs remap" parameters: when passed a snapshot name zfs_remap_indirects() does not handle the EINVAL returned from libzfs_core, which results in failing an assertion and consequently crashing. illumos/illumos-gate@0b2e825398 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Approved by: Matt Ahrens <mahrens@delphix.com> Author: loli10K <ezomori.nozomu@gmail.com>	2018-08-02 19:13:45 +00:00
Alexander Motin	7fca1b93c4	Do not blindly include illumos kernel headers instead of user-space ones. It is not needed now, and I doubt it ever helped much, creating more confusion than good.	2018-08-02 18:55:55 +00:00
Alexander Motin	b59e9cd1c0	MFV r316926: 7955 libshare needs to initialize only those datasets being modified by the consumer illumos/illumos-gate@8a981c3356 `8a981c3356` https://www.illumos.org/issues/7955 Libshare currently initializes all available filesystems when doing any libshare operation. This requires iterating through all the filesystem multiple times, which is a huge performance problem for sharing and unsharing operations. Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Author: Daniel Hoffman <dj.hoffman@delphix.com> For FreeBSD this is practically a NOP, just a diff reduction.	2018-08-01 21:51:49 +00:00
Michael Tuexen	7bda966394	Add a dtrace provider for UDP-Lite. The dtrace provider for UDP-Lite is modeled after the UDP provider. This fixes the bug that UDP-Lite packets were triggering the UDP provider. Thanks to dteske@ for providing the dwatch module. Reviewed by: dteske@, markj@, rrs@ Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16377	2018-07-31 22:56:03 +00:00
Alexander Motin	200c27a75d	MFV r337014: 9421 zdb should detect and print out the number of "leaked" objects 9422 zfs diff and zdb should explicitly mark objects that are on the deleted queue illumos/illumos-gate@20b5dafb42 Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Matt Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>	2018-07-31 22:50:50 +00:00
Alexander Motin	0021e1c10c	MFV r336991, r337001: 9102 zfs should be able to initialize storage devices The first access to a disk block can incur a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore it is recommended that volumes be "thick provisioned", where supported by the platform (VMware). Thick provisioning is time consuming and often is ignored. If the thick provision step is omitted, customers will see suboptimal performance until we have written to all parts of the LUN. ZFS should be able to initialize any unused storage to remove any first-write penalty that exists. illumos/illumos-gate@094e47e980 Reviewed by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: George Wilson <george.wilson@delphix.com>	2018-07-31 21:06:04 +00:00
Alexander Motin	d1cf4052d0	MFV r336955: 9236 nuke spa_dbgmsg We should use zfs_dbgmsg instead of spa_dbgmsg. Or at least, metaslab_condense() should call zfs_dbgmsg because it's important and rare enough to always log. It's possible that the message in zio_dva_allocate() would be too high-frequency for zfs_dbgmsg. illumos/illumos-gate@21f7c81cc1 Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>	2018-07-31 00:47:27 +00:00
Alexander Motin	194000fa21	MFV r336950: 9290 device removal reduces redundancy of mirrors Mirrors are supposed to provide redundancy in the face of whole-disk failure and silent damage (e.g. some data on disk is not right, but ZFS hasn't detected the whole device as being broken). However, the current device removal implementation bypasses some of the mirror's redundancy. illumos/illumos-gate@3a4b1be953 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Tim Chase <tim@chase2k.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>	2018-07-31 00:25:39 +00:00
Alexander Motin	6413a6d31f	MFV r336946: 9238 ZFS Spacemap Encoding V2 The current space map encoding has the following disadvantages: [1] Assuming 512 sector size each entry can represent at most 16MB for a segment. This makes the encoding very inefficient for large regions of space. [2] As vdev-wide space maps have started to be used by new features (i.e. device removal, zpool checkpoint) we've started imposing limits in the vdevs that can be used with them based on the maximum addressable offset (currently 64PB for a top-level vdev). The new encoding remains backwards compatible with the old one. The introduced two-word entry format, besides extending the limits imposed by the single-entry layout, also includes a vdev field and some extra padding after its prefix. The extra padding after the prefix is reserved for future usage (e.g. new prefixes for future encodings or new fields for flags). The new vdev field not only makes the space maps more self-descriptive, but also opens the doors for pool-wide space maps. One final important note is that the number of bits used for vdevs is reduced to 24 bits for blkptrs. That was decided as we don't know of any setups that use more than 16M vdevs for the time being and we wanted to fit the vdev field in the space map. In addition that gives us some extra bits in dva_t. illumos/illumos-gate@17f11284b4 Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2018-07-30 23:47:38 +00:00
Alexander Motin	1960706625	MFV r336944: 9286 want refreservation=auto When a ZFS volume is created with zfs create -V (but without -s), the refreservation property is set to a value that is volsize plus the maximum size of metadata. If refreservation is ever set to another value, it is impossible to set it back to the automatically determined value. There are other cases where refreservation may be wrong. These include receiving a volume that was sent without properties and zfs clone. We need: zfs set refreservation=auto <volume> zfs clone -o refreservation=auto <volume> Each one would use the same function used by zfs create -V to determine the proper value for refreservation. illumos/illumos-gate@1c10ae76c0 Reviewed by: Allan Jude <allanjude@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Mike Gerdts <mike.gerdts@joyent.com>	2018-07-30 22:39:30 +00:00
Michael Tuexen	53e0911116	Improve TCP related tests for dtrace. Ensure that the TCP connections are terminated gracefully as expected by the test. Use appropriate numbers for sent/received packets. In addition, enable tst.localtcpstate.ksh, which should pass, but doesn't until https://reviews.freebsd.org/D16369 is committed. Reviewed by: markj@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16288	2018-07-22 10:50:59 +00:00
Michael Tuexen	be029a4979	Test that the dtrace UDP receive probe fires. This test ensures that the fix committed in https://svnweb.freebsd.org/changeset/base/336551 actually works. Reviewed by: dteske@, markj@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16046	2018-07-20 15:37:29 +00:00
Michael Tuexen	10b803a40d	Adjust comment to reality since r286171. Sponsored by: Netflix, Inc.	2018-07-15 20:42:47 +00:00
Michael Tuexen	e0f9b8233f	Don't require a local sshd for the local TCP state dtrace test This change is similar to the one done in r286171 for tst.ipv4localtcp.ksh. This not only reduces the requirements on the system used for testing but results also in a graceful teardown of the TCP connection. Reviewed by: gnn@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16276	2018-07-15 20:41:16 +00:00
Michael Tuexen	dc9f20b3f3	Fix the UDP tests for dtrace. The code imported from opensolaris was depending on ping supporting UDP for sending probes. Since this is not supported by ping on FreeBSD, use a perl script instead. The remote test requires the usage of ksh93, so state that in the shebang. Enable the local test, but keep the remote test disabled, since it requires a remote machine on the LAN. Reviewed by: markj@, gnn@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16268	2018-07-15 20:34:22 +00:00
Michael Tuexen	60e4fb3a3f	Return the intended return code. This bug was spotted by markj@ in D16268 because I copied this code part and used it there. So fix it. Sponsored by: Netflix, Inc.	2018-07-14 19:53:41 +00:00
Michael Tuexen	49d7124b18	Fix shebangs and execute bit of test scripts. Since we don't have /usr/bin/ksh, use a generic way of specifying ksh. Some of the tests only run with ksh93, so use this shell for these tests. Two of the tests don't have the execute bit set, so fix this, too. Reviewed by: markj@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16270	2018-07-14 19:49:14 +00:00
Eric van Gyzen	79d4ee83de	Fix markup in zfs(8); no content change Sponsored by: Dell EMC	2018-06-15 15:28:31 +00:00
Mark Johnston	ecbde90073	Process CUs with a language attribute of DW_LANG_Mips_Assembler. At the moment ctfconvert(1) does not do much with such CUs, but that may not be true in the future, and we run ctfconvert on several assembly files during the build. X-MFC with: r334883	2018-06-11 16:33:36 +00:00
Mark Johnston	c5fda9bac0	Don't process DWARF generated from non-C/C++ code. ctfconvert(1) is not designed to handle DWARF generated from such code, and will generally fail in non-obvious ways. Use an explicit check to help catch such potential failures. Reported by: Johannes Lundberg <johalun0@gmail.com> MFC after: 2 weeks	2018-06-09 15:10:49 +00:00
Sean Eric Fagan	69724399c4	This originated from ZFS On Linux, as `d4a72f2386`. During scans (scrubs or resilvers), it sorts the blocks in each transaction group by block offset; the result can be a significant improvement. (On my test system just now, which I put some effort to introduce fragmentation into the pool since I set it up yesterday, a scrub went from 1h2m to 33.5m with the changes.) I've seen similar ratios on production systems. Approved by: Alexander Motin Obtained from: ZFS On Linux Relnotes: Yes (improved scrub performance, with tunables) Differential Revision: https://reviews.freebsd.org/D15562	2018-06-08 17:38:28 +00:00
Matt Macy	1f52c1db90	ctf dwarf: don't report "no dwarf entry" as if it were an error	2018-05-19 18:50:58 +00:00
Matt Macy	d0ba1baed3	ctfconvert: silence useless enum has too many values warning	2018-05-19 06:31:17 +00:00
Sean Bruno	86784a0a22	Cleanup sundry clang warnings for code that is not upstream in illumos. https://github.com/illumos/illumos-gate/edit/master/usr/src/lib/libzfs/common/libzfs_sendrecv.c Patch our version of it to quiesce warnings until someone decides to sync up our code: libzfs_sendrecv.c:2555:30: warning: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat] sprintf(guidname, "%lu", thisguid); ~~~ ^~~~~~~~ %llu libzfs_sendrecv.c:2612:29: warning: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat] sprintf(guidname, "%lu", parent_fromsnap_guid); ~~~ ^~~~~~~~~~~~~~~~~~~~ %llu libzfs_sendrecv.c:2645:29: warning: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat] sprintf(guidname, "%lu", parent_fromsnap_guid); ~~~ ^~~~~~~~~~~~~~~~~~~~ %llu Reviewed by: allanjude Differential Revision: https://reviews.freebsd.org/D15325	2018-05-06 16:22:02 +00:00
Eitan Adler	4822188974	zpool(8): correct list of default properties in 'list'. The default provides output in the following form: ``` NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT ``` this corrects the man page. Also submitted upstream as https://github.com/openzfs/openzfs/pull/632/files (with slightly different changes needed)	2018-04-28 01:14:16 +00:00
Alexander Motin	e4d6c7fc17	MFV man pages update from r329502: 7614 zfs device evacuation/removal. MFC after: 3 days	2018-04-17 02:33:54 +00:00
Andriy Gapon	81f187e576	allow ZFS pool to have temporary name for duration of current import The change adds -t <name> option to zpool create and -t option to zpool import in its form with an old name and a new name. This allows to import (or create) a pool under a name that's different from its real, permanent name without affecting that name. This is useful when working with VM images or images of other physical systems if they happen to have a ZFS pool with the same name as the host system. The changes come from ZoL with some small tweaks. The porting has been done by julian. The change is being submitted to OpenZFS: https://github.com/openzfs/openzfs/pull/600 Submitted by: julian Reviewed by: smh MFC after: 2 weeks Sponsored by: Panzura (porting) Differential Revision: https://reviews.freebsd.org/D14972	2018-04-12 10:37:26 +00:00
Alexander Motin	849a7ce2d5	MFV r331712: 9280 Assertion failure while running removal_with_ganging test with 4K devices illumos/illumos-gate@243952c7ee Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matt Ahrens <Matt.Ahrens@delphix.com>	2018-03-28 23:17:29 +00:00
Alexander Motin	5c4561f332	MFV r331706: 9235 rename zpool_rewind_policy_t to zpool_load_policy_t illumos/illumos-gate@5dafeea3eb We want to be able to pass various settings during import/open of a pool, which are not only related to rewind. Instead of adding a new policy and duplicating a bunch of code, we should just rename rewind_policy to a more generic term like load_policy. For instance, we'd like to set spa->spa_import_flags from the nvlist, rather than from a flags parameter passed to spa_import as in some cases we want those flags not only for the import case, but also for the open case. One such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would allow zfs to open a pool when logs are missing. Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Pavel Zakharov <pavel.zakharov@delphix.com>	2018-03-28 22:29:06 +00:00
Alexander Motin	0b0c76bc58	MFV r331695, 331700: 9166 zfs storage pool checkpoint illumos/illumos-gate@8671400134 The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with exactly that. It can be thought of as a “pool-wide snapshot” (or a variation of extreme rewind that doesn’t corrupt your data). It remembers the entire state of the pool at the point that it was taken and the user can revert back to it later or discard it. Its generic use case is an administrator that is about to perform a set of destructive actions to ZFS as part of a critical procedure. She takes a checkpoint of the pool before performing the actions, then rewinds back to it if one of them fails or puts the pool into an unexpected state. Otherwise, she discards it. With the assumption that no one else is making modifications to ZFS, she basically wraps all these actions into a “high-level transaction”. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>	2018-03-28 22:01:27 +00:00
Alexander Motin	4bada7a0a2	Partial MFV r329753: 8809 libzpool should leverage work done in libfakekernel illumos/illumos-gate@f06dce2c1f Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Gordon Ross <gordon.w.ross@gmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Andrew Stormont <astormont@racktopsystems.com> We do not have libfakekernel, but need to reduce code divergence.	2018-03-28 20:41:15 +00:00
Conrad Meyer	2f68cdb944	ctfconvert: Fix minor memory leaks in STABS parser In an error case, free leaked objects. Does anything use STABS anymore? Probably not. Reported by: Coverity Sponsored by: Dell EMC Isilon	2018-03-27 22:49:06 +00:00
Conrad Meyer	52f72944b8	ctfconvert/ctfmerge: Fix a memory leak enumerating DWARF files Reported by: Coverity Sponsored by: Dell EMC Isilon	2018-03-26 23:20:37 +00:00
Conrad Meyer	f5147e312f	libctf: Don't construct pointers to out of bounds array offsets Just attempting to do the pointer arithmetic is undefined behavior. No functional change intended. Reported by: Coverity Sponsored by: Dell EMC Isilon	2018-03-26 22:02:36 +00:00
Conrad Meyer	e796cc77c5	libctf: Appease Coverity overrun warnings Rather than zeroing and reading the full union size into a smaller union member, just zero and read directly into the union. No functional change intended. Reported by: Coverity Sponsored by: Dell EMC Isilon	2018-03-26 21:57:44 +00:00
Alexander Motin	e76e77a972	MFV r331407: 9213 zfs: sytem typo illumos/illumos-gate@edc8ef7d92 Reviewed by: C Fraire <cfraire@me.com> Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk> Approved by: Joshua M. Clulow <josh@sysmgr.org> Author: Toomas Soome <tsoome@me.com>	2018-03-23 02:30:29 +00:00
Alexander Motin	b8436536c9	MFV r331400: 8484 Implement aggregate sum and use for arc counters In pursuit of improving performance on multi-core systems, we should implement fanned-out counters and use them to improve the performance of some of the arc statistics. These stats are updated extremely frequently, and can consume a significant amount of CPU time. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Paul Dagnelie <pcd@delphix.com>	2018-03-23 02:15:05 +00:00
Mark Johnston	df7165b88e	Give hidden visibility to symbols referenced by the DOF section. MFC after: 1 week	2018-03-19 19:32:05 +00:00
Mark Johnston	4ed6321f41	Use __syscall(2) rather than syscall(2) in syscall/tst.args.c. Some of mmap(2)'s arguments are 64 bits wide. MFC after: 3 days	2018-03-18 17:03:26 +00:00
Conrad Meyer	a8c03de86d	libdtrace: Fix another uninitialized dtt_flags UB Like r331073, eliminate a UB by fully initializing the struct with a designated initializer. Note that the similar src_dtt is not fully used, so a similar treatment was not absolutely required. I chose to leave it alone. It wouldn't hurt to do the same thing, though. Reported by: Coverity Sponsored by: Dell EMC Isilon	2018-03-16 21:10:36 +00:00
Conrad Meyer	1ad2da031e	libdtrace: Eliminate a minor UB by fully initializing parameter struct The dtt_flags value is dereferenced by dt_type_pointer() and must be initialized first. Reported by: Coverity Sponsored by: Dell EMC Isilon	2018-03-16 20:43:40 +00:00
Alan Somers	50b779bdc9	ZFS: fix adding vdevs to very large pools r323791 changed the return value of zpool_read_label. Error paths that previously returned 0 began to return -1 instead. However, not all error paths initialized errno. When adding vdevs to a very large pool, errno could be prepopulated with ENOMEM, causing the operation to fail. Fix the bug by setting errno=ENOENT in the case that no ZFS label is found. PR: 226096 Submitted by: Nikita Kozlov Reviewed by: avg MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D13088	2018-03-02 21:26:48 +00:00
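The bug described above is a general hazard: a function that returns -1 without setting `errno` leaves whatever stale value an earlier call stored there. A minimal sketch of the pattern follows, under the assumption that "no label found" is reported as `ENOENT`, as the commit message states. `read_label_sketch()` and its one-byte "label" marker are hypothetical stand-ins, not the real `zpool_read_label()` logic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a label reader: scan a buffer for a marker
 * byte. Every failure path sets errno explicitly, so callers never see
 * a stale value (for example ENOMEM) left behind by an earlier call.
 */
static int
read_label_sketch(const char *data, size_t len)
{
	assert(data != NULL);
	for (size_t i = 0; i < len; i++) {
		if (data[i] == 'L')
			return (0);	/* "label" found */
	}
	/* No label found: report it explicitly instead of leaving errno alone. */
	errno = ENOENT;
	return (-1);
}
```

A caller that inspects `errno` after a -1 return then sees `ENOENT` regardless of what `errno` held beforehand.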
Alan Somers	7d3761dc72	Don't declare __assfail as static It gets called by dmu_buf_init_user, which is inline but not static. So it needs global linkage itself. Reported by: GCC-6 MFC after: 17 days X-MFC-With: 329722	2018-02-25 14:29:43 +00:00
Alexander Motin	03d54eb339	MFV r329807: 8940 Sending an intra-pool resumable send stream may result in EXDEV illumos/illumos-gate@544132fce3 "zfs send -t <token>" for an incremental send should be able to resume successfully when sending to the same pool: a subtle issue in zfs_iter_children() doesn't currently allow this. Because resuming from a token requires "guid" -> "dataset" mapping (guid_to_name()), we have to walk the whole hierarchy to find the right snapshots to send. When resuming an incremental send both source and destination live in the same pool and have the same guid: this is where zfs_iter_children() gets confused and picks up the wrong snapshot, so we end up trying to send an incremental "destination@snap1 -> source@snap2" stream instead of "source@snap1 -> source@snap2": this fails with an "Invalid cross-device link" (EXDEV) error. Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Author: loli10K <ezomori.nozomu@gmail.com>	2018-02-22 04:01:55 +00:00
Alexander Motin	1ea10a60f9	MFV r329793, r329795: 9075 Improve ZFS pool import/load process and corrupted pool recovery illumos/illumos-gate@6f7938128a Some work has been done lately to improve the debugability of the ZFS pool load (and import) process. This includes: https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's To iterate on top of that, there's a few changes that were made to make the import process more resilient and crash free. One of the first tasks during the pool load process is to parse a config provided from userland that describes what devices the pool is composed of. A vdev tree is generated from that config, and then all the vdevs are opened. The Meta Object Set (MOS) of the pool is accessed, and several metadata objects that are necessary to load the pool are read. The exact configuration of the pool is also stored inside the MOS. Since the configuration provided from userland is external and might not accurately describe the vdev tree of the pool at the txg that is being loaded, it cannot be relied upon to safely operate the pool. For that reason, the configuration in the MOS is read early on. In the past, the two configurations were compared together and if there was a mismatch then the load process was aborted and an error was returned. The latter was a good way to ensure a pool does not get corrupted, however it made the pool load process needlessly fragile in cases where the vdev configuration changed or the userland configuration was outdated. Since the MOS is stored in 3 copies, the configuration provided by userland doesn't have to be perfect in order to read its contents. Hence, a new approach has been adopted: The pool is first opened with the untrusted userland configuration just so that the real configuration can be read from the MOS. 
The trusted MOS configuration is then used to generate a new vdev tree and the pool is re-opened. When the pool is opened with an untrusted configuration, writes are disabled to avoid accidentally damaging it. During reads, some sanity checks are performed on block pointers to see if each DVA points to a known vdev; when the configuration is untrusted, instead of panicking the system if those checks fail we simply avoid issuing reads to the invalid DVAs. This new two-step pool load process now allows rewinding pools across vdev tree changes such as device replacement, addition, etc. Loading a pool from an external config file in a clustering environment also becomes much safer now since the pool will import even if the config is outdated and didn't, for instance, register a recent device addition. With this code in place, it became relatively easy to implement a long-sought-after feature: the ability to import a pool with missing top level (i.e. non-redundant) devices. Note that since this almost guarantees some loss of data, this feature is for now restricted to a read-only import. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Author: Pavel Zakharov <pavel.zakharov@delphix.com>	2018-02-22 03:15:35 +00:00
Alexander Motin	613b0d87da	8942 zfs promote .../%recv should be an error illumos/illumos-gate@add927f8c8 Reported on the ZFSonLinux https://github.com/zfsonlinux/zfs/issues/4843, fixed by https://github.com/zfsonlinux/zfs/pull/6339: If we are in the middle of an incremental zfs receive, the child .../%recv will exist. If you concurrently run zfs promote .../%recv, it will "work", but then zfs gets confused. For example, there's no obvious way to destroy the containing filesystem (because it is now a clone of its invisible child). Attempting to do this promote should be an error. We could fix this by having zfs_ioc_promote() check if zc_name contains a %, similar to zfs_ioc_rename(). Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: loli10K <ezomori.nozomu@gmail.com>	2018-02-22 01:42:13 +00:00
Alexander Motin	502d18a8f1	MFV r329766: 8962 zdb should work on non-idle pools illumos/illumos-gate@e144c4e6c9 Currently `zdb` consistently fails to examine non-idle pools as it fails during the `spa_load()` process. The main problem seems to be that `spa_load_verify()` fails as can be seen below: $ sudo zdb -d -G dcenter zdb: can't open 'dcenter': I/O error ZFS_DBGMSG(zdb): spa_open_common: opening dcenter spa_load(dcenter): LOADING disk vdev '/dev/dsk/c4t11d0s0': best uberblock found for spa dcenter. txg 40824950 spa_load(dcenter): using uberblock with txg=40824950 spa_load(dcenter): UNLOADING spa_load(dcenter): RELOADING spa_load(dcenter): LOADING disk vdev '/dev/dsk/c3t10d0s0': best uberblock found for spa dcenter. txg 40824952 spa_load(dcenter): using uberblock with txg=40824952 spa_load(dcenter): FAILED: spa_load_verify failed [error=5] spa_load(dcenter): UNLOADING This change makes `spa_load_verify()` a dry run when run from `zdb`. This is done by creating a global flag in zfs and then setting it in `zdb`. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Pavel Zakharov <pavel.zakharov@delphix.com>	2018-02-22 00:42:12 +00:00
Alexander Motin	b17bfcde3d	9018 Replace kmem_cache_reap_now() with kmem_cache_reap_soon() illumos/illumos-gate@36a64e6284 To prevent kmem_cache reaping from blocking other system resources, turn kmem_cache_reap_now() (which blocks) into kmem_cache_reap_soon(). Callers to kmem_cache_reap_soon() should use kmem_cache_reap_active(), which exploits #9017's new taskq_empty(). Reviewed by: Bryan Cantrill <bryan@joyent.com> Reviewed by: Dan McDonald <danmcd@joyent.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Yuri Pankov <yuripv@yuripv.net> Author: Tim Kordas <tim.kordas@joyent.com> FreeBSD does not use taskqueue for kmem cache reaping, so this change is less dramatic than it is on Illumos, just limiting reaping to once per second. It may possibly be improved later, if needed.	2018-02-21 23:15:06 +00:00
Alexander Motin	24433f00ea	MFV r329502: 7614 zfs device evacuation/removal illumos/illumos-gate@5cabbc6b49 https://www.illumos.org/issues/7614: This project allows top-level vdevs to be removed from the storage pool with “zpool remove”, reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now “indirect”) vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become “obsolete” because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been “remapped” in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be “remapped” to their new (concrete) locations if possible. This process can be accelerated by using the “zfs remap” command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. Therefore, mirror and raidz devices can not be removed. 
Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Prashanth Sreenivasa <pks@delphix.com>	2018-02-21 16:51:02 +00:00
Andriy Gapon	cfb675a138	MFV r329718: 8520 7198 lzc_rollback_to should support rolling back to origin illumos/illumos-gate@95643f75d2 `95643f75d2` https://www.illumos.org/issues/8520 lzc_rollback_to() should support rolling back to a clone's origin. The current checks in zfs_ioc_rollback() would not allow that because the origin snapshot belongs to a different filesystem. The overly restrictive check was introduced in 7600, but it was not a regression as none of the existing tools provided a way to rollback to the origin. https://www.illumos.org/issues/7198 EINVAL is returned when a dataset does not have any snapshots, so there is nothing to roll back to. Although the code in zfs_do_rollback checks for that condition in advance, it's still possible that the snapshot(s) gets removed after the check and before the rollback sync task is executed. At the moment zfs command would crash when that happens. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 2 weeks	2018-02-21 15:12:14 +00:00
Alexander Motin	73c9b6d523	MFV r322231: 8430 dir_is_empty_readdir() doesn't properly handle error from fdopendir() illumos/illumos-gate@ba6e7e6505 `ba6e7e6505` https://www.illumos.org/issues/8430 we should close dirfd if fdopendir() fails. Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Sowrabha Gopal <sowrabha.gopal@delphix.com>	2018-02-21 02:21:22 +00:00
Alexander Motin	e5a4a83784	MFV r318941: 7446 zpool create should support efi system partition illumos/illumos-gate@7855d95b30 `7855d95b30` https://www.illumos.org/issues/7446 Since we support whole-disk configuration for boot pool, we also will need whole disk support with UEFI boot and for this, zpool create should create efi-system partition. I have borrowed the idea from oracle solaris, and introducing zpool create -B switch to provide a way to specify that boot partition should be created. However, there is still a question, how big should the system partition be. For time being, I have set default size 256MB (that's minimum size for FAT32 with 4k blocks). To support custom size, the set on creation "bootsize" property is created and so the custom size can be set as: zpool create -B -o bootsize=34MB rpool c0t0d0 After pool is created, the "bootsize" property is read only. When -B switch is not used, the bootsize defaults to 0 and is shown in zpool get output with value ''. Older zfs/zpool implementations are ignoring this property. https://www.illumos.org/rb/r/219/ Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Dan McDonald <danmcd@kebe.com> Author: Toomas Soome <tsoome@me.com> This commit makes no sense for FreeBSD, that is why I blocked the option, but it should be good to stay closer to upstream.	2018-02-21 00:18:57 +00:00
Alexander Motin	502b48823a	MFV r316918: 7990 libzfs: snapspec_cb() does not need to call zfs_strdup() illumos/illumos-gate@d8584ba6fb `d8584ba6fb` https://www.illumos.org/issues/7990 The snapspec_cb() callback function in libzfs does not need to call zfs_strdup(). Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Marcel Telka <marcel@telka.sk>	2018-02-20 20:46:27 +00:00
Alexander Motin	67aa4cb2b9	MFV r316902: 7745 print error if lzc_* is called before libzfs_core_init illumos/illumos-gate@7c13517fff `7c13517fff` https://www.illumos.org/issues/7745 The problem is that consumers of `libZFS_Core` that forget to call `libzfs_core_init()` before calling any other function of the library are having a hard time realizing their mistake. The library's internal file descriptor is declared as global static, which is ok, but it is not initialized explicitly; therefore, it defaults to 0, which is a valid file descriptor. If `libzfs_core_init()`, which explicitly initializes the correct fd, is skipped, the ioctl functions return errors that do not have anything to do with `libZFS_Core`, where the problem is actually located. Even though assertions for that existed within `libZFS_Core` for debug builds, they were never enabled because the `-DDEBUG` flag was missing from the compiler flags. This patch applies the following changes: 1. It adds `-DDEBUG` for debug builds of `libZFS_Core` and `libzfs`, to enable their assertions on debug builds. 2. It corrects an assertion within `libzfs`, where a function had been spelled incorrectly (`zpool_prop_unsupported()`) and nobody knew because the `-DDEBUG` flag was missing, and the preprocessor was taking that part of the code away. 3. The library's internal fd is initialized to `-1` and `VERIFY` assertions have been placed to check that the fd is not equal to `-1` before issuing any ioctl. It is important here to note, that the `VERIFY` assertions exist in both debug and non-debug builds. 4. In `libzfs_core_fini` we make sure to never decrement the refcount of our fd below 0, and also reset the fd to `-1` when no one refers to it. The reason for this, is for the rare case that the consumer closes all references but then calls one of the library's functions without using `libzfs_core_init()` first, and in the meantime, a previous call to `open()` decided to reuse our previous fd. 
This scenario would have passed our assertion in Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2018-02-20 20:40:55 +00:00
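Points 3 and 4 of the commit above describe a small state machine: a descriptor that starts at `-1`, a reference count that never drops below zero, and a reset to `-1` once the last reference is released. A minimal sketch of that idea follows. All `sketch_*` names are hypothetical, plain `assert()` stands in for `VERIFY` (which, unlike `assert()`, stays active in non-debug builds), and the real library opens and closes a device node, which is omitted here.

```c
#include <assert.h>

static int sketch_fd = -1;	/* -1 means "not initialized" */
static int sketch_refcount = 0;

/* Take a reference; the first caller installs the descriptor. */
static void
sketch_init(int opened_fd)
{
	if (sketch_refcount == 0)
		sketch_fd = opened_fd;
	sketch_refcount++;
}

/* Drop a reference without ever letting the count go below zero. */
static void
sketch_fini(void)
{
	if (sketch_refcount > 0)
		sketch_refcount--;
	/* Real code would close() the descriptor before forgetting it. */
	if (sketch_refcount == 0)
		sketch_fd = -1;
}

static int
sketch_is_ready(void)
{
	return (sketch_fd != -1);
}

/* Every entry point checks the descriptor before using it. */
static int
sketch_call(void)
{
	assert(sketch_fd != -1);	/* stands in for VERIFY */
	return (sketch_fd);
}
```

Resetting to `-1` on the last release matters because a later `open()` elsewhere in the process may be handed the same descriptor number, and a stale copy would then pass a "not equal to -1" check while referring to an unrelated file.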
Alexander Motin	499a9a917a	MFV r316901: 7730 libzfs`add_config() leaks config nvl when reading spare/l2cache devices illumos/illumos-gate@105686550e `105686550e` https://www.illumos.org/issues/7730 antares:root:~# mdb /usr/sbin/zpool > ::sysbp _exit > ::run import pool: data id: 2093977168778024605 state: ONLINE action: The pool can be imported using its name or numeric identifier. config: data ONLINE c6t0d0 ONLINE c6t1d0 ONLINE cache c6t2d0 mdb: stop on entry to _exit mdb: target stopped at: 0xfee556ba: nop mdb: You've got symbols! Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1 libnvpair.so.1 ] > ::findleaks -d BYTES LEAKED VMEM_SEG CALLER 4096 10 fda7b000 MMAP 8192 1 fea8d000 MMAP 8192 1 fe76d000 MMAP 8192 1 fe66e000 MMAP 4096 1 fe570000 MMAP 8192 1 fe470000 MMAP 4096 1 fe372000 MMAP 4096 1 fe273000 MMAP Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2018-02-20 20:37:01 +00:00
Alexander Motin	9f6122a6e7	MFV r316893: 7604 if volblocksize property is the default, it displays as "-" rather than 8K illumos/illumos-gate@4d86c0eab2 `4d86c0eab2` https://www.illumos.org/issues/7604 If a zvol has the default setting for the "volblocksize" property, it is 8KB. However, it is displayed as "-" (not present), rather than "8K". The problem was introduced by: commit 25228e830e86924a41243343b1de9daf2d7dd43a Author: Matthew Ahrens <mahrens@delphix.com> Date: Thu Nov 17 14:37:24 2016 -0800 7571 non-present readonly numeric ZFS props do not have default value which changed get_numeric_property() to indicate that readonly default properties are not present. However, zfs_prop_readonly() returns TRUE for both readonly and set-once properties (e.g. volblocksize). Amusingly, that commit essentially reverted: 6900484 default volblocksize is no longer being reported correctly from November 2009. However, that change was not correct either; the correct solution is to only do this check for "truly readonly" (i.e. not setonce) properties. $ zfs list -t volume -o name,volblocksize NAME VOLBLOCK domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/ archive - domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/ datafile - domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/ external - rpool/dump 128K rpool/swap 4K rpool/swap1 =============================================================================== Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2018-02-20 20:34:04 +00:00
Alexander Motin	806141acfc	MFV r316876: 7542 zfs_unmount failed with EZFS_UNSHARENFSFAILED illumos/illumos-gate@09c9e6dc9b `09c9e6dc9b` https://www.illumos.org/issues/7542 libshare keeps a cached copy of the sharetab listing in memory, which can become out of date if shares are destroyed or created while leaving a libzfs handle open. This results in a spurious unmounting failure when an NFS share exists but isn't in the stale libshare cache. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Amdur <matt.amdur@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Chris Williamson <chris.williamson@delphix.com>	2018-02-20 20:30:40 +00:00
Alexander Motin	6d03c5504b	MFV r316875: 7336 vfork and O_CLOEXEC causes zfs_mount EBUSY illumos/illumos-gate@873c4903a5 `873c4903a5` https://www.illumos.org/issues/7336 We can run into a problem where we call into zfs_mount, which in turn calls is_dir_empty, which opens the directory to try and make sure it's empty. The issue with the current approach is that it holds the directory open while it traverses it with readdir, which, due to subtle interaction with the Java JVM, vfork, and exec can cause a tricky race condition resulting in zfs_mount failures. The approach to resolving the issue in this patch is to drop the usage of readdir altogether, and instead rely on the fact that ZFS stores the number of entries contained in a directory using the st_size field of the stat structure. Thus, if the directory in question is a ZFS directory, we can check to see if it's empty by calling stat() and inspecting the st_size field of structure returned. =============================================================================== The root cause appears to be an interesting race between vfork, exec, and zfs_mount's usage of O_CLOEXEC when calling openat. Here's what is going on: 1. We call zfs_mount, and this in turn calls openat to check if the directory is empty, which results in opening the directory we're trying to mount onto, and increment v_count. 2. As we're in the middle of reading the directory, vfork is called by the JVM and proceeds to exec the jspawnhelper utility. As a result of the vfork, we take an additional hold on the directory, which increments v_count a second time. The semantics of vfork mean the parent process will wait for the child process to exit or exec before the parent can continue; at this point the parent is in the middle of zfs_mount, reading the directory to determine if it's empty or not. 3. The child process exec-ing jspawnhelper gets to the relvm call within exec_args (which is called by exec_common). 
relvm is the function that releases the parent process, allowing the parent to proceed. The problem is, at this point of calling relvm, the child hasn't yet called close_exec which is responsible for closing the file descriptors inherited from the parent process Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Prakash Surya <prakash.surya@delphix.com>	2018-02-20 20:26:48 +00:00
Alexander Motin	c2e79753fe	MFV r316873: 7233 dir_is_empty should open directory with CLOEXEC illumos/illumos-gate@d420209d9c `d420209d9c` https://www.illumos.org/issues/7233 This fixes a race where one thread is executing zfs_mount() while another thread forks and execs. If the fork occurs while the directory is open, the child process will inherit (but not necessarily close immediately) the open fd for the directory, preventing the mount. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Alex Reece <alex@delphix.com>	2018-02-20 20:17:19 +00:00
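The race described above can be narrowed by opening the directory with `O_CLOEXEC`, so that a child created by a concurrent fork has the descriptor closed when it execs. A minimal sketch of an emptiness check written that way follows. `dir_is_empty_sketch()` is a hypothetical helper, not the libzfs function, and it also closes the descriptor when `fdopendir()` fails, the cleanup called for in issue 8430 earlier in this log.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Return 1 if the directory has no entries besides "." and "..",
 * 0 if it has at least one, and -1 on error (errno set by the
 * failing call). Hypothetical helper for illustration only.
 */
static int
dir_is_empty_sketch(const char *path)
{
	struct dirent *dp;
	DIR *dirp;
	int empty = 1;
	int fd;

	assert(path != NULL);
	/* O_CLOEXEC: the descriptor is closed automatically across exec. */
	fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd == -1)
		return (-1);
	dirp = fdopendir(fd);
	if (dirp == NULL) {
		/* fdopendir() did not take ownership, so close fd here. */
		(void) close(fd);
		return (-1);
	}
	while ((dp = readdir(dirp)) != NULL) {
		if (strcmp(dp->d_name, ".") == 0 ||
		    strcmp(dp->d_name, "..") == 0)
			continue;
		empty = 0;
		break;
	}
	(void) closedir(dirp);	/* also closes the underlying fd */
	return (empty);
}
```

As the later commit for issue 7336 in this log explains, `O_CLOEXEC` alone does not cover the `vfork` case, because the descriptor stays open in the child until the exec path closes it.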
Alexander Motin	91e16b7e23	MFV r316872: 7502 ztest should run zdb with -G (debug mode) illumos/illumos-gate@c3c65d17f7 `c3c65d17f7` https://www.illumos.org/issues/7502 Right now ztest executes zdb without -G, so when it has errors, the messages are often not very helpful: Executing zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest zdb: can't open 'ztest': Operation not supported ztest: '/usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest' exit code 1 With -G, we'd have: /usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache -G ztest zdb: can't open 'ztest': Operation not supported ZFS_DBGMSG(zdb): spa_open_common: opening ztest spa_load(ztest): LOADING spa_load(ztest): FAILED: unable to parse config [error=48] spa_load(ztest): UNLOADING Which indicates where the error came from Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Author: Pavel Zakharov <pavel.zakharov@delphix.com>	2018-02-20 20:14:11 +00:00
Alan Somers	d4c225b01c	Fix memory leaks in zdb introduced by r329508 Reported by: Coverity CID: 1386185 MFC after: 3 weeks X-MFC-With: 329508 Sponsored by: Spectra Logic Corp	2018-02-20 19:54:06 +00:00
Alexander Motin	c156ddfcb6	MFV r324198: 8081 Compiler warnings in zdb illumos/illumos-gate@3f7978d02b `3f7978d02b` https://www.illumos.org/issues/8081 zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD, which uses -Werror, the only way to build it is to disable all compiler warnings. This makes it much harder to detect newly introduced bugs. We should clean up all the warnings. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Alan Somers <asomers@gmail.com>	2018-02-18 04:00:29 +00:00
Alexander Motin	6c5aa8d10e	MFV r323911: 8502 illumos#7955 broke delegated datasets when libshare is not present illumos/illumos-gate@1c18e8fbd8 `1c18e8fbd8` https://www.illumos.org/issues/8502 The code in lib/libzfs/common/libzfs_mount.c already basically handles the case when libshare is not installed. We just need to not fail in zfs_init_libshare_impl. I tested this in lx and things work as expected. I also tested there trying to set sharenfs and sharesmb on the delegated dataset. Neither is allowed from within a zone. The spew of msgs from a native zone is not ZFS specific. I see the same spew simply running the share command. Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Yuri Pankov <yuripv@gmx.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Jerry Jelinek <jerry.jelinek@joyent.com>	2018-02-18 01:42:17 +00:00
Alan Somers	ad4bbe575b	Fix "zpool add" crash when a replacing vdev has a spare child Fix an assertion in zpool that causes a crash when running any "zpool add" command on a spare that contains a replacing vdev with a spare child. This likely affects Illumos, too. PR: 225546 MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D14138	2018-02-09 16:08:57 +00:00
Alexander Motin	0d6993dfd3	MFV r328255: 8972 zfs holds: In scripted mode, do not pad columns with spaces illumos/illumos-gate@e9b7d6e7f7 https://www.illumos.org/issues/8972: 'zfs holds -H' does not properly output content in scripted mode. It uses a tab instead of two spaces, but it still pads column widths with spaces when it should not. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Allan Jude <allanjude@freebsd.org>	2018-01-22 06:00:45 +00:00
Alexander Motin	4eb2803697	MFV r328251: 8652 Tautological comparisons with ZPROP_INVAL illumos/illumos-gate@4ae5f5f06c https://www.illumos.org/issues/8652: Clang and GCC prefer to use unsigned ints to store enums. With Clang, that causes tautological comparison warnings when comparing a zfs_prop_t or zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error handling code is being silently removed as a result. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Alan Somers <asomers@gmail.com>	2018-01-22 05:52:39 +00:00
Alexander Motin	cf52647c41	MFV r328249: 8641 "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs illumos/illumos-gate@2ba5f978a4 https://www.illumos.org/issues/8641: "zpool clear" and "zinject -d" can both operate on specific vdevs, either leaf or interior. However, due to an oversight, neither works on a "spare" or "replacing" vdev. For example: sudo zpool create foo raidz1 c1t5000CCA000081D61d0 c1t5000CCA000186235d0 spare c1t5000CCA000094115d0 sudo zpool replace foo c1t5000CCA000186235d0 c1t5000CCA000094115d0 $ zpool status foo pool: foo state: ONLINE scan: resilvered 81.5K in 0h0m with 0 errors on Fri Sep 8 10:53:03 2017 config: NAME STATE READ WRITE CKSUM foo ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 c1t5000CCA000081D61d0 ONLINE 0 0 0 spare-1 ONLINE 0 0 0 c1t5000CCA000186235d0 ONLINE 0 0 0 c1t5000CCA000094115d0 ONLINE 0 0 0 spares c1t5000CCA000094115d0 INUSE currently in use $ sudo zinject -d spare-1 -A degrade foo cannot find device 'spare-1' in pool 'foo' $ sudo zpool clear foo spare-1 cannot clear errors for spare-1: no such device in pool Even though there was nothing to clear, those commands shouldn't have reported an error. By contrast, trying to clear "raidz1-0" works just fine: $ sudo zpool clear foo raidz1-0 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Alan Somers <asomers@gmail.com>	2018-01-22 04:37:04 +00:00
Alexander Motin	d26c97e2fb	MFV r328233: 8898 creating fs with checksum=skein on the boot pools fails ungracefully illumos/illumos-gate@9fa2266d9a https://www.illumos.org/issues/8898: # zfs create -o checksum=skein rpool/test internal error: Result too large Abort (core dumped) Not a big deal per se, but should be handled correctly. Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com> PR: 222199	2018-01-22 00:01:36 +00:00
Alexander Motin	9b347baa3a	MFV r328231: 8897 zpool online -e fails assertion when run on non-leaf vdevs illumos/illumos-gate@9a551dd645 https://www.illumos.org/issues/8897: # zpool online -e test mirror-1 Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online Abort (core dumped) Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'. Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408 Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Dan McDonald <danmcd@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2018-01-21 23:53:56 +00:00
Alexander Motin	442814b7e7	MFV r328220: 8677 Open-Context Channel Programs illumos/illumos-gate@a3b2868063 https://www.illumos.org/issues/8677 We want to be able to run channel programs outside of syncing context. This would greatly improve performance of channel programs that just gather information, as we won't have to wait for syncing context anymore. This feature should introduce the following: - A new command line flag in "zfs program" to specify our intention to run in open context. - A new flag/option within the channel program ioctl which selects the context. - Appropriate error handling whenever we try a channel program in open-context that contains zfs.sync* expressions. - Documentation for the new feature in the manual pages. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2018-01-21 23:02:05 +00:00
Mark Johnston	224e0c2f61	Add "jid" and "jailname" variables to DTrace. These return the jail ID and jail name for the traced process, respectively, and are analogous to "zonename" on Solaris/illumos. "zonename" is now aliased to "jailname". Also add some stress tests for the new variables. Submitted by: Domagoj Stolfa <domagoj.stolfa@gmail.com> Reviewed by: dteske (previous version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D13877	2018-01-12 19:59:46 +00:00
Mark Johnston	60eddb209b	Add a regression test for r327794. MFC after: 2 weeks	2018-01-10 21:40:36 +00:00
Mark Johnston	ff9ebfc4b6	Fix an off-by-one in dt_opt_setenv(). The bug would cause incorrect behaviour when attempting to override an already set environment variable with -x setenv, as long as the variable is not the last one in the array. Reported by: Samuel Lepetit <slepetit@apple.com> MFC after: 2 weeks	2018-01-10 21:37:11 +00:00
Mark Johnston	04006780d9	Complete support for dtrace's -x setenv option. This allows one to override the environment for processes created with dtrace -c. By default, the environment is inherited. This support was originally merged from illumos in r249367 but was lost when the commit was later reverted and then brought back piecemeal. Reported by: Samuel Lepetit <slepetit@apple.com> MFC after: 2 weeks	2017-12-03 16:57:28 +00:00
Mark Johnston	5577b8a709	Add an envp argument to proc_create(). This is needed to support dtrace's -x setenv option. MFC after: 2 weeks	2017-12-03 16:50:16 +00:00
Mark Johnston	f4b90f5a5c	Revert r326181 for now. We can't link an executable using -m32 until the lib32 phase of a buildworld, though the build works fine when executing make from cddl/usr.sbin/dtrace/tests. Some other solution will need to be found.	2017-11-27 17:54:17 +00:00
Andriy Gapon	718cb91ccc	zdb: follow-up to r326150, check if malloc succeeded Reported by: rpokala MFC after: 1 week X-MFC with: r326150	2017-11-25 09:47:31 +00:00
Mark Johnston	e96f62322b	Compile one of the uctf test programs with -m32. The err.user64mode.ksh test expects it to run as a 32-bit process. MFC after: 1 week	2017-11-24 19:57:13 +00:00
Mark Johnston	eb381edab7	Fix the type signature for sx(9) DTrace subroutines. MFC after: 1 week	2017-11-24 19:05:45 +00:00
Andriy Gapon	fd74a38251	zdb: use a heap allocation instead of a huge array on stack SPA_MAXBLOCKSIZE is 16 MB and having such a large object on the stack is not nice in general and it could cause some confusing failures in the single-user mode where the default stack size of 8 MB is used. I expect that the upstream would make the same change. MFC after: 1 week	2017-11-24 10:45:33 +00:00
Mark Johnston	fc81ca0045	Don't assume that we can resolve "main" in the ksh executable. MFC after: 1 week	2017-11-21 15:03:38 +00:00
Ed Maste	4e4805ddf1	dt_modtext: return error on archs lacking an implementation Reported by: mmel Reviewed by: markj MFC after: 1 week MFC with: r325042 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D13176	2017-11-21 03:15:32 +00:00
Mark Johnston	ad1633b34b	Take r313504 into account when recomputing the string table length. When we encounter a USDT probe in a weak symbol, we emit an alias for the probe function symbol. Such aliases are named differently from the aliases we emit for probes in local functions, so make sure to take that difference into account when resizing the output object file's string table. Otherwise, we underrun the string table buffer. PR: 223680	2017-11-16 07:14:29 +00:00
Ed Maste	732f2c69a1	libdtrace: replace "DOODAD" with more descriptive string Previously some unimplemented libdtrace routines printed the function, file and line number, followed by "DOODAD." That is not particularly informative, so replace it with a message reporting the actual issue. Sponsored by: The FreeBSD Foundation	2017-10-27 16:23:45 +00:00
Andriy Gapon	1c05a6ea6b	MFV r325013,r325034: 640 number_to_scaled_string is duplicated in several commands illumos/illumos-gate@0a0551200e `0a0551200e` https://www.illumos.org/issues/640 du(1), df(1m), ls(1), and swap(1m) all include a copy (it appears literally copied) of the 'number_to_scaled_string' function in their source. This should be moved to a shared library and all 4 commands should use this instead. FreeBSD note: of all libcmdutils functionality ZFS (and other illumos contrib code) currently uses only nicenum() function (which is similar to humanize_number but has some formatting differences). For this reason I decided to not port the whole library. As a result, nicenum.c from libcmdutils is compiled into libzfs and libzpool. This is a bit ugly, but works. If one day we are forced to create libillumos, then the file should be moved to that library. Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Yuri Pankov <yuripv@gmx.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Jason King <jason.brian.king@gmail.com> MFC after: 2 weeks	2017-10-27 12:37:22 +00:00
Alan Somers	63f8025d6a	Fix zpool_read_all_labels when vfs.aio.enable_unsafe=0 Previously, zpool_read_all_labels was trying to do 256KB reads, which are greater than the default MAXPHYS and therefore must go through the slow, unsafe AIO path. Shrink these reads to 112KB so they can use the safe, fast AIO path instead. MFC after: 1 week X-MFC-With: 324568 Sponsored by: Spectra Logic Corp	2017-10-25 16:01:19 +00:00
Mark Johnston	7421ff0751	Address some miscellaneous issues in the CTF man page. - Fix a number of typos. - Replace some illumos-specific references. - Note that a type definition of kind CTF_K_FUNCTION may be followed by a null type identifier in order to provide 4-byte alignment for the next type definition. MFC after: 2 weeks	2017-10-22 19:17:25 +00:00
Mark Johnston	ef36b3f756	MFV r323105 (partial): 8300 fix man page issues found by mandoc 1.14.1 illumos/illumos-gate@72d3dbb9ab `72d3dbb9ab` https://www.illumos.org/issues/8300 Prior to integrating the mdocml update to 1.14.1, fix issues found by new version, especially the "new sentence, new line" style rule. FreeBSD note: this revision merges only the changes to the CTF manual page. The changes to the ZFS pages cannot be applied directly. Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Yuri Pankov <yuri.pankov@nexenta.com> Discussed with: avg MFC after: 2 weeks	2017-10-22 18:32:28 +00:00
Alan Somers	b49e9abcf4	Optimize zpool_read_all_labels with AIO Read all labels in parallel instead of sequentially MFC after: 3 weeks X-MFC-With: 322854 Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D12495	2017-10-12 21:25:11 +00:00
Mark Johnston	11a7f121b8	Avoid adding an extra "0x" prefix before pointer formats. MFC after: 1 week	2017-10-06 18:29:00 +00:00
Andriy Gapon	ac63ac6859	zdb.8: replace with the slightly modified upstream version The upstream has converted their manual page to the same format as we have, so we can use the upstream version with minimal modifications. The current modifications are: - different zpool.cache path - different manual sections for zdb, zfs, zpool commands igor reports a few minor issues, it would be nice to fix them both in FreeBSD and in the upstream. MFC after: 3 weeks	2017-10-06 08:28:35 +00:00
Andriy Gapon	e117882ba2	MFV r322235: 8067 zdb should be able to dump literal embedded block pointer illumos/illumos-gate@4923c69fdd `4923c69fdd` FreeBSD note: the manual page is to be updated separately. https://www.illumos.org/issues/8067 Add an option to zdb to print a literal embedded block pointer supplied on the command line: zdb -E [-A] word0:word1:...:word15 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 3 weeks	2017-10-06 08:21:06 +00:00
Andriy Gapon	c40fbcbc18	MFV r316934: 7340 receive manual origin should override automatic origin illumos/illumos-gate@ed4e7a6a5c `ed4e7a6a5c` https://www.illumos.org/issues/7340 When -o origin=<snapshot> is specified as part of a ZFS receive, that origin should override the automatic detection in libzfs. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com> MFC after: 3 weeks	2017-10-06 08:17:12 +00:00
Andriy Gapon	3c7b32b890	MFV r316933: 5142 libzfs support raidz root pool (loader project) illumos/illumos-gate@d5f26ad812 `d5f26ad812` https://www.illumos.org/issues/5142 the current libzfs only allows simple disk and mirror setup for boot pool, as loader does support booting from raidz, this feature will remove raidz restriction from boot pool setup. FreeBSD note: we have long supported this feature, this commit only removes a small difference in libzfs. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Albert Lee <trisk@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Toomas Soome <tsoome@me.com> MFC after: 3 weeks	2017-10-06 08:15:37 +00:00
Andriy Gapon	6a880de6e0	MFV r316931: 6268 zfs diff confused by moving a file to another directory illumos/illumos-gate@aab04418a7 `aab04418a7` https://www.illumos.org/issues/6268 The zfs diff command presents a description of the changes that have occurred to files within a filesystem between two snapshots. If a file is renamed, the tool is capable of reporting this, e.g.: cd /some/zfs/dataset/subdir mv file0 file1 Will result in a diff record like: R /some/zfs/dataset/subdir/file0 -> /some/zfs/dataset/subdir/file1 Unfortunately, it seems that rename detection only uses the base filename to determine if a file has been renamed or simply modified. This leads to misreporting only the original filename, omitting the more relevant destination filename entirely. For example: cd /some/zfs/dataset/subdir mv file0 ../otherdir/file0 Will result in a diff entry: M /some/zfs/dataset/subdir/file0 But it should really emit: R /some/zfs/dataset/subdir/file0 -> /some/zfs/dataset/otherdir/file0 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Joshua M. Clulow <josh@sysmgr.org> MFC after: 3 weeks	2017-10-06 08:12:13 +00:00
Andriy Gapon	f079bf7ce3	MFV r316877: 7571 non-present readonly numeric ZFS props do not have default value illumos/illumos-gate@ad2760acbd `ad2760acbd` https://www.illumos.org/issues/7571 ZFS displays the default value for non-present readonly numeric (and index) properties. However, these properties' default values are not meaningful. Instead, we should display a "-", indicating that they are not present. For example, on a version-12 pool, the usedby* properties are not available, but they show up as the incorrect value "0": 1. zfs get all test12 ... test12 usedbysnapshots 0 - test12 usedbydataset 0 - test12 usedbychildren 0 - test12 usedbyrefreservation 0 - We will be introducing more sometimes-present numeric readonly properties, so it would be nice to fix this. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 3 weeks	2017-10-06 08:10:54 +00:00
Andriy Gapon	f51efc68f6	MFV r316864: 6392 zdb: introduce -V for verbatim import illumos/illumos-gate@dfd5965f7e `dfd5965f7e` FreeBSD note: the manual page is to be updated separately. https://www.illumos.org/issues/6392 When given a pool name via -e, zdb would attempt an import. If it failed, then it would attempt a verbatim import. This behavior is not always desirable so a -V switch is added to zdb to control the behavior. When specified, a verbatim import is done. Otherwise, the behavior is as it was previously, except no verbatim import is done on failure. `a5778ea242` Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Richard Yao <ryao@gentoo.org> MFC after: 3 weeks	2017-10-06 08:09:20 +00:00
Andriy Gapon	65512687a9	MFV r316862: 6410 teach zdb to perform object lookups by path illumos/illumos-gate@ed61ec1da9 `ed61ec1da9` FreeBSD note: this commit does not update the manual page. The original change includes conversion of the manual page from *roff format to mandoc format. So, it is hard to extract the content change from that. I am going to replace our zdb manual page, which is an earlier independent conversion, with a slightly modified version of the upstream page. https://www.illumos.org/issues/6410 This is primarily intended to ease debugging & testing ZFS when one is only interested in things like the on-disk location of a specific object's blocks, but doesn't know their object id. This allows doing things like the following (FreeBSD-based example): # zpool create -f foo da0 # dd if=/dev/zero of=/foo/1 bs=1M count=4 >/dev/null 2>&1 # zpool export foo # zdb -vvvvv -o "ZFS plain file" foo /1 Object lvl iblk dblk dsize lsize %full type 8 2 16K 128K 3.99M 4M 100.00 ZFS plain file (K=inherit) (Z=inherit) 168 bonus System attributes dnode flags: USED_BYTES USERUSED_ACCOUNTED dnode maxblkid: 31 path /1 uid 0 gid 0 atime Thu Apr 23 22:45:32 2015 mtime Thu Apr 23 22:45:32 2015 ctime Thu Apr 23 22:45:32 2015 crtime Thu Apr 23 22:45:32 2015 gen 7 mode 100644 size 4194304 parent 4 links 1 pflags 40800000004 Indirect blocks: 0 L1 DVA[0]=<0:c19200:600> DVA[1]=<0:10800019200:600> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/200P birth=7L/ Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Dan McDonald <danmcd@omniti.com> Author: Yuri Pankov <yuri.pankov@nexenta.com> MFC after: 3 weeks	2017-10-06 07:52:25 +00:00
Alan Somers	d22f655459	MFV r319743: 8108 zdb -l fails to read labels 2 and 3 illumos/illumos-gate@22c8b9583d `22c8b9583d` https://www.illumos.org/issues/8108 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com> MFC after: 3 weeks	2017-10-02 22:39:12 +00:00
Alan Somers	58dde1c585	MFV r316863: 3871 fix issues introduced by 3604 illumos/illumos-gate@de05b58863 `de05b58863` https://www.illumos.org/issues/3871 GCC 4.5.3 on Gentoo Linux did not like a few of the changes made in the issue 3604 patch. It printed an error and a couple of warnings: ../../cmd/zdb/zdb.c: In function 'dump_bpobj': ../../cmd/zdb/zdb.c:1257:3: error: 'for' loop initial declarations are only allowed in C99 mode ../../cmd/zdb/zdb.c:1257:3: note: use option -std=c99 or -std=gnu99 to compile your code ../../cmd/zdb/zdb.c: In function 'dump_deadlist': ../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format ../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Richard Yao <ryao@gentoo.org> MFC after: 3 weeks	2017-10-02 22:35:35 +00:00
Alan Somers	2091d1c3b9	MFV r316861: 6866 zdb -l should return non-zero if it fails to find any label illumos/illumos-gate@64723e3611 `64723e3611` https://www.illumos.org/issues/6866 Need this for #6865. To be generally more scripting-friendly, overload this issue with adding '-q' option which should skip printing any label information. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com> MFC after: 3 weeks	2017-10-02 22:13:20 +00:00
Alan Somers	66eafdf052	MFC r316858 7280 Allow changing global libzpool variables in zdb 7280 Allow changing global libzpool variables in zdb and ztest through command line illumos/illumos-gate@0e60744c98 `0e60744c98` https://www.illumos.org/issues/7280 zdb is very handy for diagnosing problems with a pool in a safe and quick way. When a pool is in a bad shape, we often want to disable some fail-safes, or adjust some tunables in order to open it. In the kernel, this is done by changing public variables in mdb. The goal of this feature is to add the same capability to zdb and ztest, so that they can change libzpool tunables from the command line. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Pavel Zakharov <pavel.zakharov@delphix.com> MFC after: 3 weeks	2017-10-02 22:02:04 +00:00
Andriy Gapon	620c2c801b	MFV r323913: 8600 ZFS channel programs - snapshot illumos/illumos-gate@2840dce1a0 `2840dce1a0` https://www.illumos.org/issues/8600 ZFS channel programs should be able to create snapshots. In addition to the base snapshot functionality, this will likely entail adding extra logic to handle edge cases which were formerly not possible, such as creating then destroying a snapshot in the same transaction sync. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Chris Williamson <chris.williamson@delphix.com> MFC after: 5 weeks X-MFC after: r324163	2017-10-02 11:32:08 +00:00
Andriy Gapon	a1f65a15ce	MFV r323912: 8592 ZFS channel programs - rollback illumos/illumos-gate@000cce6b6f `000cce6b6f` https://www.illumos.org/issues/8592 ZFS channel programs should be able to perform a rollback. This logic will probably look pretty similar to zfs.sync.destroy(). Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Brad Lewis <brad.lewis@delphix.com> MFC after: 5 weeks X-MFC after: r324163	2017-10-02 11:23:31 +00:00
Andriy Gapon	3e52a05570	MFV r323794: 8605 zfs channel programs: zfs.exists undocumented and non-working illumos/illumos-gate@5f39f884e2 `5f39f884e2` https://www.illumos.org/issues/8605 zfs.exists() in channel programs doesn't return any result, and should have a man page entry. Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Chris Williamson <chris.williamson@delphix.com> MFC after: 5 weeks X-MFC after: r324163	2017-10-01 16:51:05 +00:00
Andriy Gapon	bda88d07d9	MFV r323530,r323533,r323534: 7431 ZFS Channel Programs, and followups 7431 ZFS Channel Programs illumos/illumos-gate@dfc115332c `dfc115332c` https://www.illumos.org/issues/7431 ZFS channel programs (ZCP) adds support for performing compound ZFS administrative actions via Lua scripts in a sandboxed environment (with time and memory limits). This initial commit includes both base support for running ZCP scripts, and a small initial library of API calls which support getting properties and listing, destroying, and promoting datasets. Testing: in addition to the included unit tests, channel programs have been in use at Delphix for several months for batch destroying filesystems. The dsl_destroy_snaps_nvl() call has also been replaced with Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Chris Williamson <chris.williamson@delphix.com> 8552 ZFS LUA code uses floating point math illumos/illumos-gate@916c8d8811 `916c8d8811` https://www.illumos.org/issues/8552 In the LUA interpreter used by "zfs program", the lua format() function accidentally includes support for '%f' and friends, which can cause compilation problems when building on platforms that don't support floating-point math in the kernel (e.g. sparc). Support for '%f' and friends (%f %e %E %g %G) should be removed, since there's no way to supply a floating-point value anyway (all numbers in ZFS LUA are int64_t's). 
Reviewed by: Yuri Pankov <yuripv@gmx.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Dan McDonald <danmcd@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> 8590 memory leak in dsl_destroy_snapshots_nvl() illumos/illumos-gate@e6ab4525d1 `e6ab4525d1` https://www.illumos.org/issues/8590 In dsl_destroy_snapshots_nvl(), "snaps_normalized" is not freed after it is added to "arg". Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> FreeBSD notes: - zfs-program.8 manual page is taken almost as is from the vendor repository, no FreeBSD-ification done - fixed multiple instances of NULL being used where an integer is expected - replaced ETIME and ECHRNG with ETIMEDOUT and EDOM respectively This commit adds a modified version of Lua 5.2.4 under sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua, mirroring the upstream. See README.zfs in that directory for the description of Lua customizations. See zfs-program.8 on how to use the new feature. MFC after: 5 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D12528	2017-10-01 16:11:07 +00:00
Andriy Gapon	c13f1d82c8	MFV r323535: 8585 improve batching done in zil_commit() FreeBSD notes: - this MFV reverts FreeBSD commit r314549 to make the merge easier - at present our emulation of cv_timedwait_hires is rather poor, so I elected to use cv_timedwait_sbt directly Please see the differential revision for details. Unfortunately, I did not get any positive reviews, so there could be bugs in the FreeBSD-specific piece of the merge. Hence, the long MFC timeout. illumos/illumos-gate@1271e4b10d `1271e4b10d` https://www.illumos.org/issues/8585 The current implementation of zil_commit() can introduce significant latency, beyond what is inherent due to the latency of the underlying storage. The additional latency comes from two main problems: 1. When there are outstanding ZIL blocks being written (i.e. there's already a "writer thread" in progress), then any new calls to zil_commit() will block waiting for the currently outstanding ZIL blocks to complete. The blocks written for each "writer thread" are coined a "batch", and there can only ever be a single "batch" being written at a time. When a batch is being written, any new ZIL transactions will have to wait for the next batch to be written, which won't occur until the current batch finishes. As a result, the underlying storage may not be used as efficiently as possible. While "new" threads enter zil_commit() and are blocked waiting for the next batch, it's possible that the underlying storage isn't fully utilized by the current batch of ZIL blocks. In that case, it'd be better to allow these new threads to generate (and issue) a new ZIL block, such that it could be serviced by the underlying storage concurrently with the other ZIL blocks that are being serviced. 2. Any call to zil_commit() must wait for all ZIL blocks in its "batch" to complete, prior to zil_commit() returning. 
The size of any given batch is proportional to the number of ZIL transactions in the queue at the time that the batch starts processing the queue; which doesn't occur until the previous batch completes. Thus, if there's a lot of transactions in the queue, the batch could be composed of many ZIL blocks, and each call to zil_commit() will have to wait for all of these writes to complete (even if the thread calling zil_commit() only cared about one of the transactions in the batch). Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com> MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12355	2017-09-26 11:04:08 +00:00
Andriy Gapon	65b5f7c743	MFV r323790: 8567 Inconsistent return value in zpool_read_label illumos/illumos-gate@c861bfbd77 `c861bfbd77` https://www.illumos.org/issues/8567 If fstat64 fails, pread64 fails, or the label is unintelligible, zpool_read_label will return 0. But if malloc fails, it will return -1. For consistency, it should always return -1 on failure or 0 on success. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alan Somers <asomers@gmail.com> MFC after: 2 weeks	2017-09-20 07:23:50 +00:00
Bryan Drewery	b3b6d7b406	Fix the raise tests. - The exit probe was not appropriately filtered to only the known pid so it was firing on any random process that would exit rather than only the one we cared about. - The dtest script executes the tst.raise*.exe in the background from POSIX sh without job control. POSIX mandates that SIGINT be set to SIG_IGN in this case. The test executable never actually tested that SIGINT could be caught despite trying to block and delay the signal. So the SIGINT sent from raise() is never actually received since it is ignored. This could be fixed by calling 'trap - INT' from dtest before running the executable but I've opted to just use SIGUSR1 instead in these specific tests rather than adding more logic to test that SIGINT is not ignored at startup. These 2 issues meant that the tests would randomly work but only if a process coincidentally exited during the test. Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-09-15 19:48:48 +00:00
Andriy Gapon	8ac2314e00	MFV r323527: 5815 libzpool's panic function doesn't set global panicstr, ::status not as useful illumos/illumos-gate@fae6347731 `fae6347731` https://www.illumos.org/issues/5815 When panic() is called from within ztest, the mdb ::status command isn't as useful as it could be since the global panicstr variable isn't updated. We should modify the function to make sure panicstr is set, so ::status can present the error message just like it does on a failed assertion. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Gordon Ross <gordon.ross@nexenta.com> Reviewed by: Rich Lowe <richlowe@richlowe.net> Approved by: Dan McDonald <danmcd@omniti.com> Author: Prakash Surya <prakash.surya@delphix.com> MFC after: 4 weeks	2017-09-13 10:34:31 +00:00
Andriy Gapon	a345c0b23b	MFV r323523: 8331 zfs_unshare returns wrong error code for smb unshare failure illumos/illumos-gate@4f4378cc54 `4f4378cc54` https://www.illumos.org/issues/8331 zfs_unshare returns EZFS_UNSHARENFSFAILED on error for all share types. Reviewed by: Marcel Telka <marcel@telka.sk> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andrew Stormont <astormont@racktopsystems.com> MFC after: 4 weeks	2017-09-13 10:23:55 +00:00
Andriy Gapon	c9f7a4ab32	MFV r316932: 6280 libzfs: unshare_one() could fail with EZFS_SHARENFSFAILED illumos/illumos-gate@d1672efb6f `d1672efb6f` https://www.illumos.org/issues/6280 The unshare_one() in libzfs could fail with EZFS_SHARENFSFAILED at line 834 here: 831 /* make sure libshare initialized */ 832 if ((err = zfs_init_libshare(hdl, SA_INIT_SHARE_API)) != SA_OK) { 833 free(mntpt); /* don't need the copy anymore */ 834 return (zfs_error_fmt(hdl, EZFS_SHARENFSFAILED, 835 dgettext(TEXT_DOMAIN, "cannot unshare '%s': %s"), 836 name, _sa_errorstr(err))); 837 } The correct error should be EZFS_UNSHARENFSFAILED instead. Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Marcel Telka <marcel.telka@nexenta.com> MFC after: 4 weeks	2017-09-13 10:22:09 +00:00
Li-Wen Hsu	da3086df18	Fix DTrace test tst_inet_ntop_d: remove definitions that are already in libdtrace. We have D definitions for the named values in socket.h after r323253. Remove them from the test script to prevent a compilation failure. Reviewed by: markj, gnn Differential Revision: https://reviews.freebsd.org/D12334	2017-09-12 16:00:51 +00:00
Mark Johnston	b934b56429	Add a O_CLOEXEC use missed in r323166. PR: 199810 Reported by: Jukka A. Ukkonen <jau789@gmail.com> MFC after: 3 days	2017-09-12 14:38:10 +00:00
Andriy Gapon	90354c3200	MFV r323107: 8414 Implemented zpool scrub pause/resume illumos/illumos-gate@1702cce751 `1702cce751` FreeBSD note: rather than merging the zpool.8 update I copied the zpool scrub section from the illumos zpool.1m to FreeBSD zpool.8 almost verbatim. Now that the illumos page uses the mdoc format, it was an easier option. Perhaps the change is not in perfect compliance with the FreeBSD style, but I think that it is acceptable. https://www.illumos.org/issues/8414 This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167 Currently, there is no way to pause a scrub. Pausing may be useful when the pool is busy with other I/O to preserve bandwidth. Description This patch adds the ability to pause and resume scrubbing. This is achieved by maintaining a persistent on-disk scrub state. While the state is 'paused' we do not scrub any more blocks. We do however perform regular scan housekeeping such as freeing async destroyed and deadlist blocks while paused. Motivation and Context Scrubbing can be an I/O intensive operation and people have been asking for the ability to pause a scrub for a while. This allows one to preserve scrub progress while freeing up bandwidth for other I/O. Reviewed by: George Melikov <mail@gmelikov.ru> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Alek Pinchuk <apinchuk@datto.com> MFC after: 2 weeks	2017-09-09 11:00:07 +00:00
Mark Johnston	afd2f355fb	Use O_CLOEXEC when opening persistent handles in libdtrace. PR: 199810 Submitted by: jau@iki.fi MFC after: 1 week	2017-09-05 00:11:06 +00:00
Alan Somers	e6c8b3c98a	zfsd(8): Close a race condition when onlining a disk partition When inserting a partitioned disk, devfs and geom will announce the whole disk before they announce the partition. If the partition containing ZFS extends to one of the disk's extents, then zfsd will see a ZFS label on the whole disk and attempt to online it. ZFS is smart enough to activate the partition instead of the whole disk, but only if GEOM has already created the partition's provider. cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c Add a zpool_read_all_labels method. It's similar to zpool_read_label, but it will return the number of labels found. cddl/usr.sbin/zfsd/zfsd_event.cc When processing a DevFS CREATE event, only online a VDEV if we can read all four ZFS labels. Reviewed by: mav MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D11920	2017-08-24 19:48:41 +00:00
Mark Johnston	9b0ddce6bc	Use an updated copy of the CDDL header boilerplate from illumos. Reported by: Yuri Pankov <yuripv@gmx.com> X-MFC with: r322774	2017-08-21 22:26:49 +00:00
Mark Johnston	a3a7b74e18	Add a regression test for r322773. MFC after: 1 week	2017-08-21 21:58:42 +00:00
Mark Johnston	ce4da6fccc	Fix an off-by-two in the llquantize() action parameter validation. The aggregation created by llquantize() partitions values into buckets; the lower bound of the bucket containing the largest values is b^{m+1}, where b and m are the second and fourth parameters to the action, respectively. Bucket bounds are stored in a 64-bit integer, and so the llquantize() validation checks need to verify that b^{m+1} fits in 64 bits. However, it was only verifying that b^{m-1} fits in 64 bits, so certain parameter combinations could trigger assertion failures in libdtrace. PR: 219451 MFC after: 1 week	2017-08-21 21:56:02 +00:00
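The check described in the llquantize() entry above comes down to asking whether b^{m+1} is representable without overflowing. A minimal sketch in C, assuming the bucket bounds are treated as unsigned 64-bit values; `pow_fits_u64` and `llquantize_params_ok` are hypothetical names chosen for illustration and are not the libdtrace functions:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if base^exp fits in an unsigned 64-bit integer.
 * Hypothetical helper for illustration; not part of libdtrace.
 */
bool
pow_fits_u64(uint64_t base, unsigned int exp)
{
	uint64_t result = 1;

	if (base == 0)
		return (true);	/* 0^exp is 0 or 1, both fit */
	for (unsigned int i = 0; i < exp; i++) {
		/* The next multiply overflows if result > UINT64_MAX / base. */
		if (result > UINT64_MAX / base)
			return (false);
		result *= base;
	}
	return (true);
}

/*
 * The lower bound of the largest bucket is base^(high + 1), so that is
 * the power that has to fit, not base^(high - 1).
 */
bool
llquantize_params_ok(uint64_t base, unsigned int high)
{
	return (pow_fits_u64(base, high + 1));
}
```

Dividing `UINT64_MAX` by the base before each multiply avoids relying on wraparound to detect the overflow after the fact.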
Andriy Gapon	9c48e95dd9	MFV r322229: 7600 zfs rollback should pass target snapshot to kernel illumos/illumos-gate@77b171372e `77b171372e` https://www.illumos.org/issues/7600 At present, the kernel side code seems to blindly rollback to whatever happens to be the latest snapshot at the time when the rollback task is processed. The expected target's name should be passed to the kernel driver and the sync task should validate that the target exists and that it is the latest snapshot indeed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 3 weeks	2017-08-08 10:52:01 +00:00
Andriy Gapon	b4e4140d13	MFV r322223: 8378 crash due to bp in-memory modification of nopwrite block illumos/illumos-gate@b7edcb9408 `b7edcb9408` https://www.illumos.org/issues/8378 The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which we then nopwrite against. zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr to zgd_bp, dbuf_write_ready() could change db_blkptr, and dbuf_write_done() could remove the dirty record. dmu_sync() then sees the stale BP and that the dbuf is not dirty, so it is eligible for nop-writing. The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the db_mtx. We could still see a stale db_blkptr, but if it is stale then the dirty record will still exist and thus we won't attempt to nopwrite. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 2 weeks	2017-08-08 10:46:51 +00:00
Andriy Gapon	8b30f189d1	MFV r322217: 8418 zfs_prop_get_table() call in zfs_validate_name() is a no-op illumos/illumos-gate@e09ba01dcd `e09ba01dcd` https://www.illumos.org/issues/8418 The following line in zfs_validate_name() is just a no-op and it should be removed: 108 (void) zfs_prop_get_table(); Reviewed by: Vitaliy Gusev <gusev.vitaliy@icloud.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Marcel Telka <marcel@telka.sk> MFC after: 2 weeks	2017-08-08 10:30:49 +00:00
Ruslan Bukin	ca20f8ec29	o Replace __riscv__ with __riscv o Replace __riscv64 with (__riscv && __riscv_xlen == 64) This is required to support new GCC 7.1 compiler. This is compatible with current GCC 6.1 compiler. RISC-V is extensible ISA and the idea here is to have built-in define per each extension, so together with __riscv we will have some subset of these as well (depending on -march string passed to compiler): __riscv_compressed __riscv_atomic __riscv_mul __riscv_div __riscv_muldiv __riscv_fdiv __riscv_fsqrt __riscv_float_abi_soft __riscv_float_abi_single __riscv_float_abi_double __riscv_cmodel_medlow __riscv_cmodel_medany __riscv_cmodel_pic __riscv_xlen Reviewed by: ngie Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D11901	2017-08-07 14:09:57 +00:00
Mark Johnston	22e406c80b	Rework and simplify the ksyms(4) implementation. - Store the symbol table contents in an anonymous swap-backed object. Have mmap(/dev/ksyms) map that object, and stop mapping the symbol table into the calling process in ksyms_open(). Previously we would cache a pointer to the pmap of the opening process, and mmap(/dev/ksyms) would create a mapping using the physical address found by a pmap lookup at the initial mapping address. However, this assumes that the cached pmap is valid, which may not be the case. [1] - Remove the ksyms ioctl interface. It appears to have been added to work around a limitation in libelf that no longer exists; see r321842. Moreover, the interface is difficult to support and isn't present in illumos. Since ksyms was added specifically to support lockstat(1), it is expected that this removal won't have any real impact. - Simplify ksyms_read() to avoid unnecessary copying. - Don't call the device handle destructor if we fail to capture a snapshot of the kernel's symbol table. devfs will do that for us. Reported by: Ilja van Sprundel <ivansprundel@ioactive.com> [1] Reviewed by: kib (previous revision) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11789	2017-08-03 00:38:13 +00:00
Alexander Motin	f2d3f6918e	Add compat shim part missed at r305197. This fixes compatibility between old kernel and new ZFS tools. It seems to be tradition to forget it. :( PR: 221112 MFC after: 3 days	2017-08-02 10:33:47 +00:00
Mark Johnston	3c9308e82b	Remove local variables missed in r321842. X-MFC with: r321842	2017-08-01 04:52:03 +00:00
Mark Johnston	90ec6fd4ba	Let lockstat use ksyms(4)'s mmap interface. The workaround described in the deleted comment is no longer needed. MFC after: 1 week	2017-08-01 04:49:54 +00:00
Li-Wen Hsu	ad89d783a6	Add an auxiliary subroutine to generate some events for testing. This test also times out on a quiet system because nothing triggers the read probefunc during test execution. Reviewed by: gnn, markj, ngie Differential Revision: https://reviews.freebsd.org/D11731	2017-07-26 12:07:46 +00:00
Li-Wen Hsu	637ba06516	Modify glob patterns and expected output to match FreeBSD's implementation. Reviewed by: gnn, markj, ngie Differential Revision: https://reviews.freebsd.org/D11713	2017-07-25 13:14:02 +00:00
Li-Wen Hsu	5604d0f997	Make this test case accept basename() in a D script returning "" or ".". In Solaris, basename(1) and basename(3) both return "." when given an empty string (""), while in BSD (and Linux) basename(1) returns "" and basename(3) returns ".". While here, also change #!/usr/bin/ksh to #!/usr/bin/env ksh to find ksh in $PATH. Reviewed by: gnn, markj (earlier version), ngie (earlier version) Differential Revision: https://reviews.freebsd.org/D11707	2017-07-25 13:11:20 +00:00
Li-Wen Hsu	4ca0dfa6b0	Explicitly set dynamic variable buffer size. We added too many variable assignments in the BEGIN block, which exhausts the default auto-configured variable buffer space. The test VM has 4G of RAM, which should be enough for most cases, so it's reasonable to raise the limit for these cases. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D11676	2017-07-25 13:07:06 +00:00
Li-Wen Hsu	23833df483	Explicitly set dynamic variable buffer size. We added too many variable assignments in the BEGIN block, which exhausts the default auto-configured variable buffer space. The test VM has 4G of RAM, which should be enough for most cases, so it's reasonable to raise the limit for these cases. Reviewed by: gnn, markj, ngie Differential Revision: https://reviews.freebsd.org/D11674	2017-07-25 13:04:24 +00:00
Li-Wen Hsu	070a148127	Add an auxiliary subroutine to generate read(2) events while testing. Reviewed by: gnn, ngie Differential Revision: https://reviews.freebsd.org/D11673	2017-07-25 13:01:10 +00:00
Li-Wen Hsu	d83c70758a	Add a simple script which calls open(2) and others to generate events for testing. This test times out on a quiet system because nothing triggers the syscall::open:entry or syscall::: probes during test execution. Reviewed by: gnn, markj (earlier version) Differential Revision: https://reviews.freebsd.org/D11671	2017-07-25 12:58:03 +00:00
Li-Wen Hsu	b9de3393dd	Add a simple program which calls sigtimedwait(2) to generate events for testing. This test times out on a quiet system because nothing triggers the 'syscall::wait:entry' probe during test execution. Reviewed by: gnn, markj, ngie Differential Revision: https://reviews.freebsd.org/D11668	2017-07-25 12:52:32 +00:00
Enji Cooper	31ed01a2de	Fix whitespace on a line in fix(..) accidentally missed in r321424 MFC after: 1 month MFC with: r321424	2017-07-24 17:29:56 +00:00
Enji Cooper	f3305cae02	Style cleanup: delete spurious trailing whitespace MFC after: 1 month	2017-07-24 17:27:21 +00:00
Enji Cooper	aa52ad5489	Don't use incorrect hardcoded path to ksh -- use /usr/bin/env to find ksh instead MFC after: 1 month	2017-07-23 17:57:00 +00:00
Andriy Gapon	f9cdbaba8d	MFV r318946: 8021 ARC buf data scatter-ization illumos/illumos-gate@770499e185 `770499e185` https://www.illumos.org/issues/8021 The ARC buf data project (known simply as "ABD" since its genesis in the ZoL community) changes the way the ARC allocates `b_pdata` memory from using linear `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This improves ZFS's performance by helping to defragment the address space occupied by the ARC, in particular for cases where compressed ARC is enabled. It could also ease future work to allocate pages directly from `segkpm` for minimal- overhead memory allocations, bypassing the `kmem` subsystem. This is essentially the same change as the one which recently landed in ZFS on Linux, although they made some platform-specific changes while adapting this work to their codebase: 1. Implemented the equivalent of the `segkpm` suggestion for future work mentioned above to bypass issues that they've had with the Linux kernel memory allocator. 2. Changed the internal representation of the ABD's scatter/gather list so it could be used to pass I/O directly into Linux block device drivers. (This feature is not available in the illumos block device interface yet.) 
FreeBSD notes: - the actual (default) chunk size is 4KB (despite the text above saying 1KB) - we can try to reimplement ABDs, so that they are not permanently mapped into the KVA unless explicitly requested, especially on platforms with scarce KVA - we can try to use unmapped I/O and avoid intermediate allocation of a linear, virtual memory mapped buffer - we can try to avoid extra data copying by referring to chunks / pages in the original ABD Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Dan Kimmel <dan.kimmel@delphix.com> MFC after: 3 weeks	2017-06-20 17:39:24 +00:00
Andriy Gapon	b8d341fe26	MFV r319945,r319946: 8264 want support for promoting datasets in libzfs_core illumos/illumos-gate@a4b8c9aa65 `a4b8c9aa65` https://www.illumos.org/issues/8264 Oddly there is a lzc_clone function, but no lzc_promote function. Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@kebe.com> Approved by: Dan McDonald <danmcd@kebe.com> Author: Andrew Stormont <astormont@racktopsystems.com> MFC after: 1 week	2017-06-14 16:31:36 +00:00
Andriy Gapon	9260925dcd	MFV r319740: 8168 NULL pointer dereference in zfs_create() illumos/illumos-gate@690031d326 `690031d326` https://www.illumos.org/issues/8168 If we manage to export the pool on which we are creating a dataset (filesystem or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for which we never check the return value) we end up dereferencing a NULL pointer in libzfs`zpool_close(). This was discovered on ZFS on Linux. The same issue can be reproduced on Illumos running in parallel: while :; do zpool import -d /tmp testpool ; zpool export testpool ; done while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done Eventually this will result in several core dumps like this one: [root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244 Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1 libnvpair.so.1 ld.so.1 ] > ::stack libzfs.so.1`zpool_close+0x17(0, 0, 0, `8047450`) libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8) zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3) main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70) _start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b) > Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096 Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: loli10K <ezomori.nozomu@gmail.com> MFC after: 2 weeks	2017-06-09 15:30:41 +00:00
Andriy Gapon	ad2b1a296f	MFV r319744,r319745: 8269 dtrace stddev aggregation is normalized incorrectly illumos/illumos-gate@79809f9cf4 `79809f9cf4` https://www.illumos.org/issues/8269 It seems that currently normalization of stddev aggregation is done incorrectly. We divide both the sum of values and the sum of their squares by the normalization factor. But we should divide the sum of squares by the normalization factor squared to scale the original values properly. FreeBSD note: the actual change was committed in r316853, this commit adds the test files and record merge information. Reviewed by: Bryan Cantrill <bryan@joyent.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 1 week Sponsored by: Panzura	2017-06-09 15:16:39 +00:00
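The normalization rule in the stddev entry above can be illustrated with a small sketch. It assumes the aggregation keeps a running count, sum, and sum of squares, and it uses `double` arithmetic for readability rather than the integer arithmetic an in-tree implementation would need; `normalized_variance` is a hypothetical name, not a libdtrace function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Population variance of normalized samples, computed from the running
 * count, sum, and sum of squares of the raw samples. Dividing every
 * sample by `factor` scales the sum by 1/factor and the sum of squares
 * by 1/factor^2. Hypothetical helper; not the libdtrace code.
 */
double
normalized_variance(uint64_t count, double sum, double sumsq, double factor)
{
	assert(count != 0 && factor != 0.0);

	double mean = (sum / factor) / (double)count;
	double mean_sq = (sumsq / (factor * factor)) / (double)count;

	return (mean_sq - mean * mean);
}
```

For the raw samples 10, 20, 30 (sum 60, sum of squares 1400) and a factor of 10, this yields the variance of 1, 2, 3, which is 2/3. Dividing the sum of squares by the factor only once would instead give 140/3 - 4, a value that no longer corresponds to the scaled samples.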
Allan Jude	39b0b876dc	New sentences start on new lines, fix two violations Reviewed by: bcr Sponsored by: BSDCan Dev Summit	2017-06-08 01:39:17 +00:00
Allan Jude	dc379eca14	SHA-512 and Skein have been supported by the boot loader for some time. Submitted by: lifanov Reviewed by: bcr Sponsored by: BSDCan Dev Summit	2017-06-08 01:29:24 +00:00
Andriy Gapon	457711b02b	MFV r316922: 5380 receive of a send -p stream doesn't need to try renaming snapshots illumos/illumos-gate@471a88e499 `471a88e499` https://www.illumos.org/issues/5380 A stream created with zfs send -p -I contains properties of all snapshots of a given dataset as opposed to only properties of snapshots in a given range. Not only is this suboptimal, but the receive code also does not filter properties by the range. So, properties of earlier snapshots would be updated even though the snapshots themselves are not in the stream (just their properties). Given that modifying the snapshot properties requires a TXG sync and that the snapshots are updated one by one, the described behavior may lead to a severe performance penalty. Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 3 weeks	2017-05-24 22:30:21 +00:00
Andriy Gapon	b6be31c7ca	MFC r316908: 7541 zpool import/tryimport ioctl returns ENOMEM because provided buffer is too small for config illumos/illumos-gate@8b65a70b76 `8b65a70b76` https://www.illumos.org/issues/7541 When calling zpool import, zpool does a few ioctls to ZFS. zpool allocates a buffer in userland and passes it to the kernel so that ZFS can copy info into it. ZFS will use it to put the nvlist that describes the pool configuration. If the allocated buffer is too small, ZFS will return ENOMEM and the call will have to be redone. This wastes CPU time and slows down the import process. This happens very often for the ZFS_IOC_POOL_TRYIMPORT call. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Pavel Zakharov <pavel.zakharov@delphix.com> MFC after: 2 weeks	2017-05-24 21:32:35 +00:00
Andriy Gapon	5644318970	MFC r316904: 7729 libzfs_core`lzc_rollback() leaks result nvl illumos/illumos-gate@ac428481f9 `ac428481f9` https://www.illumos.org/issues/7729 libzfs_core`lzc_rollback() doesn't free the result nvl after lzc_ioctl() call. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Yuri Pankov <yuri.pankov@nexenta.com> MFC after: 2 weeks	2017-05-24 20:53:01 +00:00
Andriy Gapon	c65389d367	MFV r316860: 7545 zdb should disable reference tracking illumos/illumos-gate@4dd77f9e38 `4dd77f9e38` https://www.illumos.org/issues/7545 When evicting from the ARC, we manipulate some refcount_t's, e.g. arcs_size. When using zdb to examine a large amount of data (e.g. zdb -bb on a large pool with small blocks), the ARC may have a large number of entries. If reference tracking is enabled, there will be ~1 reference for each block in the ARC. When evicting, we decrement the refcount and have to search all the references to find the one that we are removing, which is very slow. Since zdb is typically used to find problems with the on-disk format, and not with the code it is running, we should disable reference tracking in zdb. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 2 weeks	2017-05-24 20:41:26 +00:00
Mark Johnston	b4a3f67bd6	Add a little helper program for tst.exitcore.ksh. sleep(1) is capsicumized, which means that we cannot rely on it to dump core as required by the test. MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-05-22 20:34:51 +00:00
Alexander Motin	c9bcbf3800	Fix time handling in cv_timedwait_hires(). pthread_cond_timedwait() receives absolute time, not relative. Passing wrong time there caused two threads of zdb to spin in a tight loop. MFC after: 1 week	2017-05-19 05:12:58 +00:00
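The distinction in the cv_timedwait_hires() entry above is that pthread_cond_timedwait() takes a deadline, not a duration. A minimal sketch of the conversion, assuming a non-negative relative timeout in nanoseconds; `abs_deadline` is a hypothetical helper and not the libzpool implementation:

```c
#include <stdint.h>
#include <time.h>

#define	NSEC_PER_SEC	1000000000LL

/*
 * Turn a relative timeout in nanoseconds into the absolute deadline
 * that pthread_cond_timedwait() expects, given the current time.
 * Assumes rel_nsec >= 0. Hypothetical helper for illustration.
 */
struct timespec
abs_deadline(struct timespec now, int64_t rel_nsec)
{
	struct timespec ts = now;

	ts.tv_sec += rel_nsec / NSEC_PER_SEC;
	ts.tv_nsec += rel_nsec % NSEC_PER_SEC;
	if (ts.tv_nsec >= NSEC_PER_SEC) {	/* carry into seconds */
		ts.tv_sec++;
		ts.tv_nsec -= NSEC_PER_SEC;
	}
	return (ts);
}
```

A caller would typically obtain `now` from clock_gettime(CLOCK_REALTIME, ...), since that is the clock pthread_cond_timedwait() measures against unless the condition variable's attributes select another one. Passing the relative value directly produces a deadline far in the past, so the wait returns immediately and the caller spins.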
Josh Paetzel	c78abb8b50	MFV 316894 7252 7628 compressed zfs send / receive illumos/illumos-gate@5602294fda `5602294fda` https://www.illumos.org/issues/7252 This feature includes code to allow a system with compressed ARC enabled to send data in its compressed form straight out of the ARC, and receive data in its compressed form directly into the ARC. https://www.illumos.org/issues/7628 We should have longer, more readable versions of the ZFS send / recv options. 7628 create long versions of ZFS send / receive options Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: David Quigley <dpquigl@davequigley.com> Reviewed by: Thomas Caputi <tcaputi@datto.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Dan Kimmel <dan.kimmel@delphix.com>	2017-04-25 17:57:43 +00:00
Josh Paetzel	ef18459108	MFV 316891 7386 zfs get does not work properly with bookmarks illumos/illumos-gate@edb901aab9 `edb901aab9` https://www.illumos.org/issues/7386 The zfs get command does not work with the bookmark parameter while it works properly with both filesystem and snapshot: # zfs get -t all -r creation rpool/test NAME PROPERTY VALUE SOURCE rpool/test creation Fri Sep 16 15:00 2016 - rpool/test@snap creation Fri Sep 16 15:00 2016 - rpool/test#bkmark creation Fri Sep 16 15:00 2016 - # zfs get -t all -r creation rpool/test@snap NAME PROPERTY VALUE SOURCE rpool/test@snap creation Fri Sep 16 15:00 2016 - # zfs get -t all -r creation rpool/test#bkmark cannot open 'rpool/test#bkmark': invalid dataset name # The zfs get command should be modified to work properly with bookmarks too. Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Marcel Telka <marcel@telka.sk>	2017-04-21 19:53:52 +00:00
Alan Somers	07bb15b440	MFV 316855 7900 zdb shouldn't print the path of a znode at verbosity < 5 Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Alan Somers <asomers@freebsd.org> illumos/illumos-gate@e548d2fa41 https://www.illumos.org/issues/7900 MFC after: 3 weeks Sponsored by: Spectra Logic Corp	2017-04-14 16:30:37 +00:00
Andriy Gapon	5ad79d9b20	dtrace: fix normalization of stddev aggregation To be upstreamed. Discussed with: Bryan Cantrill <bryancantrill@gmail.com> MFC after: 2 weeks Sponsored by: Panzura	2017-04-14 15:31:04 +00:00
Pedro F. Giffuni	9e98194646	MFV r316693: 8046 Let calloc() do the multiplication in libzfs_fru_refresh `5697e03e6e` https://www.illumos.org/issues/8046 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Pedro Giffuni <pfg@freebsd.org> MFC after: 3 days	2017-04-10 22:56:38 +00:00
Alexander Motin	3aef5b286a	MFV r315290, r315291: 7303 dynamic metaslab selection illumos/illumos-gate@8363e80ae7 https://github.com/illumos/illumos-gate/commit/8363e80ae72609660f6090766ca8c2c18 https://www.illumos.org/issues/7303 This change introduces a new weighting algorithm to improve metaslab selection. The new weighting algorithm relies on the SPACEMAP_HISTOGRAM feature. As a result, the metaslab weight now encodes the type of weighting algorithm used (size-based vs segment-based). This also introduces a new allocation tracing facility and two new dcmds to help debug allocation problems. Each zio now contains a zio_alloc_list_t structure that is populated as the zio goes through the allocations stage. Here's an example of how to use the tracing facility: > c5ec000::print zio_t io_alloc_list | ::walk list | ::metaslab_trace MSID DVA ASIZE WEIGHT RESULT VDEV - 0 400 0 NOT_ALLOCATABLE ztest.0a - 0 400 0 NOT_ALLOCATABLE ztest.0a - 0 400 0 ENOSPC ztest.0a - 0 200 0 NOT_ALLOCATABLE ztest.0a - 0 200 0 NOT_ALLOCATABLE ztest.0a - 0 200 0 ENOSPC ztest.0a 1 0 400 1 x 8M 17b1a00 ztest.0a > 1ff2400::print zio_t io_alloc_list | ::walk list | ::metaslab_trace MSID DVA ASIZE WEIGHT RESULT VDEV - 0 200 0 NOT_ALLOCATABLE mirror-2 - 0 200 0 NOT_ALLOCATABLE mirror-0 1 0 200 1 x 4M 112ae00 mirror-1 - 1 200 0 NOT_ALLOCATABLE mirror-2 - 1 200 0 NOT_ALLOCATABLE mirror-0 1 1 200 1 x 4M 112b000 mirror-1 - 2 200 0 NOT_ALLOCATABLE mirror-2 If the metaslab is using segment-based weighting then the WEIGHT column will display the number of segments available in the bucket where the allocation attempt was made.
Author: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Chris Siden <christopher.siden@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Don Brady <don.brady@intel.com> Approved by: Richard Lowe <richlowe@richlowe.net>	2017-03-24 09:37:00 +00:00
Mark Johnston	d935f34b8f	Fix memory leaks in error cases in libdtrace. Submitted by: Tom Rix <trix@juniper.net> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9705	2017-02-23 17:56:24 +00:00
Mark Johnston	74d9553e43	Fix a memory leak in an error case in libctf. Submitted by: Tom Rix <trix@juniper.net> MFC after: 1 week	2017-02-23 17:54:17 +00:00
Mark Johnston	281e4f2dd4	When patching USDT probes, use non-unique names for aliases of weak symbols. Aliases are normally given names that include a key that's unique for each input object file. This, for example, ensures that aliases for identically named local symbols in different object files don't conflict. However, in some cases the static linker will leave an undefined alias after merging identical weak symbols, resulting in a link error. A non-unique name allows the aliases to be merged as well. PR: 216871 X-MFC With: r313262	2017-02-10 02:01:32 +00:00
Mark Johnston	35bf9feb41	Search for _DTRACE_VERSION in sys/sdt.h rather than unistd.h. MFC after: 1 week	2017-02-05 02:45:35 +00:00
Mark Johnston	55c2fd519f	Avoid using Sun compiler-specific flags. MFC after: 1 week	2017-02-05 02:44:48 +00:00
Mark Johnston	273efb05a2	Fix a double free of libelf data buffers in the USDT link code. libdtrace needs to append to the input object files' string and symbol tables. Currently it does so by allocating a larger buffer, copying the existing sections into them, and swapping pointers in the libelf data descriptors. However, it also frees those buffers when its processing is complete, which leads to a double free since the elftoolchain libelf owns them and also frees them in elf_end(3). Instead, free the buffers originally allocated by libelf. MFC after: 2 weeks	2017-02-05 02:44:08 +00:00
Mark Johnston	e801af6fba	Use PC-relative relocations for USDT probe sites on i386 and amd64. When recording probe site addresses in the output DOF file, dtrace -G needs to emit relocations for the .SUNW_dof section in order to obtain the addresses of functions containing probe sites. DTrace expects the addresses to be relative to the base address of the final ELF file, and the amd64 USDT implementation was relying on some unspecified and incorrect behaviour in the base system GNU ld to achieve this. This change reimplements the probe site relocation handling to allow USDT to be used with lld and newer GNU binutils. Specifically, it makes use of R_X86_64_PC64/R_386_PC32 relocations to obtain the probe site address relative to the DOF file address, and adds and uses a new DOF relocation type which computes the final probe site address using these relative offsets. Reported by and discussed with: Rafael Espíndola MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D9374	2017-02-05 02:39:12 +00:00
Mark Johnston	3a3e3279e7	Avoid modifying the object string table when patching USDT probes. dtrace converts pairs of consecutive underscores in a probe name to dashes. When dtrace -G processes relocations corresponding to USDT probe sites, it performs this conversion on the corresponding symbol names prior to looking up the resulting probe names in the USDT provider definition. However, in so doing it would actually modify the input object's string table, which breaks the string suffix merging done by recent binutils. Because we don't care about the symbol name once the probe site is recorded, just perform the probe lookup using a temporary copy. Reported by: hrs, swills MFC after: 3 weeks	2016-12-20 18:25:41 +00:00
Mark Johnston	a46000e8ff	Consistently print D variable indices in decimal when disassembling. MFC after: 1 week	2016-12-20 05:45:52 +00:00
Mark Johnston	3c606f671e	Use the correct path to date(1). MFC after: 1 week Sponsored by: Dell EMC Isilon	2016-12-07 23:38:18 +00:00
Mark Johnston	058f5a9a47	Use the native data model instead of forcing ILP32 in tst.provregex3.ksh. MFC after: 1 week Sponsored by: Dell EMC Isilon	2016-12-07 23:37:51 +00:00
Mark Johnston	e0d70fc1dc	libdtrace: Don't use a read-only handle for enumerating pid probes. Enumeration of return probes involves disassembling subroutines in the target process, and ptrace(2) is currently used to read from the target process. libproc could read from the backing file instead to avoid this problem, but in the common case libdtrace will have a writeable handle on the process anyway. In particular, a writeable handle is needed to list USDT probes, and libdtrace will cache such a handle for processes that it controls via dtrace -c and -p.	2016-12-06 04:28:56 +00:00
Pedro F. Giffuni	69718b786d	Revert r253678, r253661: Fix a segfault in ctfmerge(1) due to a bug in GCC. The change was correct and the bug real, but upstream didn't adopt it and we want to remain in sync. When/if upstream does something about it we can bring their version. The bug in question was fixed in GCC 4.9 which is now the default in FreeBSD's ports. Our native gcc-4.2, which is still in use in some Tier-2 platforms also has a workaround so no end-user should be harmed by the revert.	2016-12-03 17:44:43 +00:00
Andriy Gapon	61158a7ce8	MFV r308989: 6428 set canmount=off on unmounted filesystem tries to unmount children This is a cosmetic and bookkeeping change as the actual change is already in FreeBSD. See r297521, r304520, r308985.	2016-11-24 10:11:09 +00:00
Andriy Gapon	9170c18bb9	revert r304520, set canmount=on is not supposed to mount the filesystem Not sure where I got the idea that it should. See https://github.com/openzfs/openzfs/pull/218 Reported by: mahrens Pointyhat to: avg MFC after: 5 days	2016-11-22 11:44:30 +00:00
Alexander Motin	14b5719f6a	After some ZIL changes 6 years ago, zil_slog_limit got partially broken because zl_itx_list_sz was not updated when async itx'es were upgraded to sync. Because of other changes made around that time, zl_itx_list_sz is not really required to implement the functionality, so this patch removes some unneeded broken code and variables. The original idea of zil_slog_limit was to reduce the chance of SLOG abuse by a single heavy logger, which increased latency for other (more latency-critical) loggers, by pushing the heavy log out into the main pool instead of SLOG. Besides a huge latency increase for heavy writers, this implementation caused a double write of all data, since the log records were explicitly prepared for SLOG. Since we now have an I/O scheduler, I've found it can be much more efficient to reduce the priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE to ZIO_PRIORITY_ASYNC_WRITE, while still leaving them on SLOG. The existing ZIL implementation had a problem with space efficiency when it had to write large chunks of data into log blocks of limited size. In some cases efficiency dropped to almost as low as 50%. In the case of a ZIL stored on spinning rust, that also cut log write speed in half, since the head had to uselessly fly over allocated but not written areas. This change improves the situation by offloading problematic operations from z_log_write() to zil_lwb_commit(), which knows the real state of log block allocation and can split large requests into pieces much more efficiently. As a side effect, it also removes one of the two data copy operations done by the ZIL code in the WR_COPIED case. While there, untangle and unify the code of the z_log_write() functions. Also, zfs_log_write(), like zvol_log_write(), can now handle writes crossing a block boundary, which may also improve efficiency if ZPL is made to do that. Sponsored by: iXsystems, Inc.	2016-11-17 21:01:27 +00:00
Mark Johnston	375c8b20dc	Remove the DTrace printt and typeref actions. These are FreeBSD-specific and were added in r178576 to provide the ability to pretty-print instances of compound types. However, the print action has long since been augmented to provide this functionality with a simpler interface. Discussed with: gnn Differential Revision: https://reviews.freebsd.org/D8478	2016-11-12 19:26:12 +00:00
Andriy Gapon	d5315b02cd	MFV r308222: 6051 lzc_receive: allow the caller to read the begin record illumos/illumos-gate@620f322510 https://www.illumos.org/issues/6051 Currently lzc_receive() requires that its snapname argument is a snapshot name (contains '@'). zfs receive allows specifying just a dataset name and would try to deduce the snapshot name from the stream. I propose to allow lzc_receive() to do the same. That seems to be quite easy to implement: it requires only a small amount of logic, and it does not require any additional system calls or any additional data from the stream. The benefit is that the new behavior would allow keeping the snapshot names the same between the sender and receiver at zero cost, without a need to pass the names out of band. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@icyb.net.ua> MFC after: 2 weeks	2016-11-03 09:24:27 +00:00
Andriy Gapon	97371ba2a9	zfsbootcfg: a simple tool to set next boot (one time) options for zfsboot (gpt)zfsboot will read one-time boot directives from a special ZFS pool area. The area was previously described as "Boot Block Header", but currently it is known as Pad2, marked as reserved and is zeroed out on pool creation. The new code interprets data in this area, if any, using the same format as boot.config. The area is immediately wiped out. Failure to parse the directives results in a reboot right after the cleanup. Otherwise the boot sequence proceeds as usual. zfsbootcfg writes zfsboot arguments specified on its command line to the Pad2 area of a disk identified by vfs.zfs.boot.primary_pool and vfs.zfs.boot.primary_vdev kenv variables that are set by loader during boot. Please see the manual page for more. Thanks to all who reviewed, contributed and made suggestions! There are many potential improvements to the feature, please see the review for details. Reviewed by: wblock (docs) Discussed with: jhb, tsoome MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D7612	2016-10-29 14:09:32 +00:00
Mark Johnston	6a4985f61c	Fix tst.args1.c on LP64 platforms. The untyped probe arguments have a width larger than int on such platforms, so printing their value without a cast can give unexpected results. MFC after: 1 week	2016-10-16 19:50:10 +00:00
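
The 3a3e3279e7 entry above describes converting pairs of consecutive underscores in a symbol name to dashes on a temporary copy, so that the input object's string table is left untouched. A minimal sketch of that idea follows. The helper name `probe_name_copy` and its interface are hypothetical, chosen for illustration only; this is not the libdtrace code from the commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a libdtrace function): return a newly
 * allocated copy of sym in which every pair of consecutive underscores
 * is replaced by a single dash. The input string is never written to,
 * so a string table shared with other symbols stays intact.
 * The caller must free() the result. Returns NULL if malloc() fails.
 */
static char *
probe_name_copy(const char *sym)
{
	size_t len, i = 0, j = 0;
	char *out;

	assert(sym != NULL);
	len = strlen(sym);
	/* The result is never longer than the input. */
	out = malloc(len + 1);
	if (out == NULL)
		return (NULL);

	while (i < len) {
		/* sym[i + 1] is at worst the terminating NUL, so this read is in bounds. */
		if (sym[i] == '_' && sym[i + 1] == '_') {
			out[j++] = '-';
			i += 2;
		} else {
			out[j++] = sym[i++];
		}
	}
	out[j] = '\0';
	return (out);
}
```

Called with a symbol name such as `transaction__start`, the helper returns a separate buffer holding `transaction-start`, which can be used for the provider lookup and then freed, while the original bytes remain available for suffix merging by the linker.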

1021 Commits