freebsd-skq

Author	SHA1	Message	Date
Baptiste Daroussin	f34d9e5d6b	Fix a bug introduced with parallel mounting of zfs Incorporate a fix from zol: `ab5036df1c` commit log from upstream: Fix race in parallel mount's thread dispatching algorithm Strategy of parallel mount is as follows. 1) Initial thread dispatching is to select sets of mount points that don't have dependencies on other sets, hence threads can/should run lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets to be mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that the initial thread dispatching in zfs_foreach_mountpoint() can be multi-threaded when it needs to be single-threaded, and this puts threads under race condition. This race appeared as mount/unmount issues on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8). `zfs unmount -a` which expects proper mount order can't unmount if the mounts were reordered by the race condition. There are currently two known patterns of input list `handles` in `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition. 1) #8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directories. There is a race between two POSIX threads A and B, * ThreadA for "/a" for test1 and "/a/b" * ThreadB for "/a" for test0/a and in case of #8833, ThreadA won the race. Two threads were created because "/a" wasn't considered as `"/a" contains "/a"`. 2) #8450 case where input is `/ /var/data /var/data/test` after sorting. 
The problem is that libzfs_path_contains() can't correctly handle an input list containing "/". There is a race between two POSIX threads A and B, * ThreadA for "/" and "/var/data/test" * ThreadB for "/var/data" and in case of #8450, ThreadA won the race. Two threads were created because "/var/data" wasn't considered as `"/" contains "/var/data"`. In other words, if there is (at least one) "/" in the input list, the initial thread dispatching must be single-threaded since every directory is a child of "/", meaning they all directly or indirectly depend on "/". In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result the initial thread dispatching creates another thread when it needs to be single-threaded. Fix a conditional in libzfs_path_contains() to consider above two. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> PR: 237517, 237397, 239243 Submitted by: Matthew D. Fuller <fullermd@over-yonder.net> (by email) MFC after: 3 days	2019-07-26 13:12:33 +00:00
Mark Johnston	d6eb98610f	Reference stdint.h types in ctf.5. MFC after: 1 week	2019-07-17 16:31:50 +00:00
Mariusz Zaborski	39d51a9400	DTrace: add a top level makefile to the new test suite Pointed out by: markj	2019-06-09 22:45:07 +00:00
Allan Jude	92e0d7f840	zpool.8: the comment property is not read-only The comment property was listed in the man page twice, once under the list of read-only properties, and again (correctly), under the list of user editable properties. PR: 238355 Reported by: Michael Zuo <muh.muhten@gmail.com> Sponsored by: Klara Systems	2019-06-06 01:32:00 +00:00
Mariusz Zaborski	75ed05ef7d	DTrace: create an amd64 test suite Create two tests checking if we can read uregs registers and if the rax register returns a correct number. Reviewed by: markj Discussed with: lwhsu MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20364	2019-06-05 22:32:26 +00:00
Alexander Motin	9b048dd219	MFV r348583: 9847 leaking dd_clones (DMU_OT_DSL_CLONES) objects illumos/illumos-gate@17fb938fd6 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>	2019-06-03 20:49:20 +00:00
Alexander Motin	eb106113c2	MFV r348580: 9559 zfs diff handles files on delete queue in fromsnap poorly illumos/illumos-gate@20633e304b Reviewed by: Joshua M. Clulow <josh@sysmgr.org> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Paul Dagnelie <pcd@delphix.com>	2019-06-03 20:40:32 +00:00
Alexander Motin	07a5c938c9	MFV r348578: 9962 zil_commit should omit cache thrash illumos/illumos-gate@cab3a55e15 Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Approved by: Joshua M. Clulow <josh@sysmgr.org> Author: Prakash Surya <prakash.surya@delphix.com>	2019-06-03 20:24:40 +00:00
Alexander Motin	ce88141b27	MFV r348568: 9466 add JSON output support to channel programs illumos/illumos-gate@5267591016 Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Alek Pinchuk <apinchuk@datto.com>	2019-06-03 19:15:06 +00:00
Alexander Motin	677ef2563d	MFV r348553: 9681 ztest failure in spa_history_log_internal due to spa_rename() illumos/illumos-gate@6aee0ad769 Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2019-06-03 18:32:56 +00:00
Alexander Motin	74f7070445	MFV r348552: 9682 page fault in dsl_async_clone_destroy() while opening pool illumos/illumos-gate@ade2c82828 Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2019-06-03 17:56:44 +00:00
Alexander Motin	c4af53b8f1	MFV r348534: 9616 Bogus error when attempting to set property on read-only pool illumos/illumos-gate@f62db44dbc Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andrew Stormont <astormont@racktopsystems.com>	2019-06-03 17:19:05 +00:00
Mark Johnston	f0e2814d34	Hook up the existing i386 DTrace tests to the build. Now that it's relatively easy to do so, we might as well. MFC after: 1 week Event: Waterloo Hackathon 2019	2019-05-22 03:42:03 +00:00
Mark Johnston	2df0edc13c	Make it possible to generate makefiles for arch-dependent DTrace tests. MFC after: 1 week Event: Waterloo Hackathon 2019	2019-05-22 03:10:23 +00:00
Leandro Lupori	ff7449d6f5	[PowerPC64] stand: fix build using clang 8 as compiler This change fixes "stand" build issues when using clang 8 as compiler. Submitted by: alfredo.junior_eldorado.org.br Reviewed by: jhibbits Differential Revision: https://reviews.freebsd.org/D20026	2019-05-20 19:21:35 +00:00
Allan Jude	19a9d4fa28	MFV/ZoL: `zfs userspace` ignored all unresolved UIDs after the first zfsonlinux/zfs@88cfff1824 zfs_main: fix `zfs userspace` squashing unresolved entries The `zfs userspace` squashes all entries with unresolved numeric values into a single output entry due to the comparison always made by the string name which is empty in case of unresolved IDs. Fix this by falling to a numerical comparison when either one of string values is not found. This then compares any numerical values after all with a name resolved. Signed-off-by: Pavel Boldin <boldin.pavel@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reported by: clusteradm Obtained from: ZFS-on-Linux MFC after: 3 days	2019-05-18 12:27:22 +00:00
Alexander Motin	d044b69950	Fix dataset name comparison in zfs_compare(). The code never returned a match when comparing two datasets (not snapshots). As a result, uu_avl_find(), called from zfs_callback(), never succeeded, allowing the same dataset to be added to the list multiple times, for example: # zfs get name pers pers pers@z pers@z NAME PROPERTY VALUE SOURCE pers name pers - pers name pers - pers@z name pers@z - With the patch: # zfs get name pers pers pers@z pers@z NAME PROPERTY VALUE SOURCE pers name pers - pers@z name pers@z - MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-05-08 01:35:43 +00:00
Li-Wen Hsu	132af14fde	Add a trailing empty line to match the test code output This is added to let these long-failing test cases pass, and for consistency. The test code should be fixed later to not output this extra empty line. Sponsored by: The FreeBSD Foundation	2019-04-29 03:50:21 +00:00
Michael Tuexen	6eb0062dd0	Some test scripts use ncat --sctp --listen port to run an SCTP discard server in the background. However, when running in the background, stdin is closed and ncat initiates a graceful shutdown of the SCTP association. This is not expected by the client. Therefore, the ncat-based discard server is replaced by a perl-based one. In addition, to remove the dependency from ncat, which needs to be installed via the nmap port, also the code testing for a free SCTP port is changed to use the perl-based client. Finally, remove some debug output from the report generated. Reviewed by: lwhsu@ Differential Revision: https://reviews.freebsd.org/D20086	2019-04-28 19:07:31 +00:00
Edward Tomasz Napierala	b9d89f5e2f	Drop -g from CFLAGS for zfsd(8). No idea why it was ever there. Reviewed by: kib, ngie, asomers MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19915	2019-04-16 12:25:15 +00:00
Edward Tomasz Napierala	93a07b2278	Make zfsd(8) build obey CFLAGS. Reviewed by: imp Obtained from: CheriBSD MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19865	2019-04-10 13:42:37 +00:00
Mark Johnston	5ee81b26a8	Ensure that we use a 64-bit value for the last mmap() argument. When using __syscall(2), the offset argument is passed on the stack on amd64. Previously only 32 bits were written, so the upper 32 bits were garbage and could cause the test to fail. MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-03-20 23:35:15 +00:00
Enji Cooper	a530b61063	Integrate cddl/usr.sbin/zfsd/tests into the FreeBSD test suite This change integrates the unit tests for zfsd into the test suite using the integration method described in r345203. This change removes the `LOCALBASE` includes added for the port version of googlemock/googletest, as well as unnecessary `LIBADD`/`DPADD` and `CXXFLAGS` defines, which are included in the `GTEST_CXXFLAGS` variable, as part of r345203. Reviewed by: asomers Approved by: emaste (mentor) MFC after: 2 months MFC with: r345203 Differential Revision: https://reviews.freebsd.org/D19552	2019-03-15 21:49:19 +00:00
Baptiste Daroussin	e7be6f4ed4	Fix a regression introduced in r344569 Import a fix from illumos (thanks Toomas Soomas for pointing at it) See https://www.illumos.org/issues/10205 for more details Illumos commit: `247b7da039` Submitted by: jack@gandi.net Reported by: cy Reviewed by: tsoome, cy, bapt Obtained from: Illumos	2019-02-27 13:49:41 +00:00
Baptiste Daroussin	c7851c5b7c	Fix regression introduced in r344569 Reported by: cy Tested by: cy Submitted by: Fatih Acar <fatih@gandi.net>	2019-02-27 07:55:53 +00:00
Sean Eric Fagan	50792eb553	Set process title during zfs send. This adds a '-V' option to 'zfs send', which sets the process title once a second to the progress information. This code has been in FreeNAS for a long time now; this is just upstreaming it here. It was originally written by delphij. Reviewed by: mav Obtained from: iXsystems, Inc Sponsored by: iXsystems, Inc Differential Revision: https://reviews.freebsd.org/D19184	2019-02-26 19:23:22 +00:00
Baptiste Daroussin	0b858c82d8	Implement parallel mounting for ZFS filesystem It was first implemented on Illumos and then ported to ZoL. This patch is a port to FreeBSD of the ZoL version. This patch also includes a fix for a race condition that was amended. With such a patch Delphix has seen a huge decrease in latency of the mount phase (https://github.com/openzfs/openzfs/commit/a3f0e2b569 for details). With the current change Gandi has measured improvements that are on par with those reported by Delphix. ZoL commits incorporated: `a10d50f999` `e63ac16d25` Reviewed by: avg, sef Approved by: avg, sef Obtained from: ZoL MFC after: 1 month Relnotes: yes Sponsored by: Gandi.net Differential Revision: https://reviews.freebsd.org/D19098	2019-02-26 08:18:34 +00:00
Leandro Lupori	559af1ec16	Increase ctfconvert buffer size Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D19353	2019-02-25 18:52:47 +00:00
Mark Johnston	9747bd8e02	MFV r344364: 9058 postmortem DTrace frequently broken under vmware illumos/illumos-gate@793bd7e361 MFC after: 1 week	2019-02-20 17:10:30 +00:00
Andriy Gapon	885b0f9e91	zpool.8: sort zpool status flags in the same order as in illumos manual Just in case, while I was here. MFC after: 1 week	2019-02-20 13:37:27 +00:00
Andriy Gapon	d97ff345cf	zpool.8: document -D flag for zpool status The description is taken from the illumos manual. Reported by: stilezy@gmail.com MFC after: 1 week	2019-02-20 13:34:16 +00:00
Andriy Gapon	30d6475b3f	fix userland illumos taskq code to pass relative timeout to cv_timedwait Unlike illumos, FreeBSD cv_timedwait requires a relative timeout. That applies both to the kernel illumos compatibility code and to the userland "fake kernel" code. MFC after: 2 weeks Sponsored by: Panzura	2019-02-20 13:19:08 +00:00
Kirk McKusick	88640c0e8b	Create new EINTEGRITY error with message "Integrity check failed". An integrity check such as a check-hash or a cross-correlation failed. The integrity error falls between EINVAL that identifies errors in parameters to a system call and EIO that identifies errors with the underlying storage media. EINTEGRITY is typically raised by intermediate kernel layers such as a filesystem or an in-kernel GEOM subsystem when they detect inconsistencies. Uses include allowing the mount(8) command to return a different exit value to automate the running of fsck(8) during a system boot. These changes make no use of the new error, they just add it. Later commits will be made for the use of the new error number and it will be added to additional manual pages as appropriate. Reviewed by: gnn, dim, brueffer, imp Discussed with: kib, cem, emaste, ed, jilles Differential Revision: https://reviews.freebsd.org/D18765	2019-01-17 06:35:45 +00:00
Andriy Gapon	4c325393f3	MFV r342532: 5882 Temporary pool names Note that this commit brings only formatting changes that were done during the final review of the illumos change, because FreeBSD got the main changes before illumos. illumos/illumos-gate@04e5635652 `04e5635652` https://www.illumos.org/issues/5882 This is an import of the temporary pool names functionality from ZoL: `e2282ef57e` `26b42f3f9d` `2f3ec90061` `00d2a8c92f` `83e9986f6e` `023bbe6f01` It is intended to assist the creation and management of virtual machines that have their rootfs on ZFS on hosts that also have their rootfs on ZFS. These situations cause SPA namespace collisions when the standard name rpool is used in both cases. The solution is either to give each guest pool a name unique to the host, which is not always desirable, or boot a VM environment containing an ISO image to install it, which is cumbersome. MFC after: 1 week Sponsored by: Panzura	2018-12-26 11:03:14 +00:00
Andriy Gapon	f050611e7f	MFV r342469: 9630 add lzc_rename and lzc_destroy to libzfs_core illumos/illumos-gate@049ba636fa `049ba636fa` https://www.illumos.org/issues/9630 Rename and destroy are very useful operations that deserve to be in libzfs_core. And they are not hard to implement too. MFC after: 2 weeks Relnotes: maybe	2018-12-26 10:37:41 +00:00
Yuri Pankov	65c9ed85e4	dtrace(1): remove reference to dtruss that was removed from base system in r300226. PR: 211618 Reviewed by: gnn, markj, 0mp Approved by: kib (mentor, implicit) Differential Revision: https://reviews.freebsd.org/D17762	2018-10-31 15:29:26 +00:00
Michael Tuexen	1e88cc8b59	Add support for send, receive and state-change DTrace providers for SCTP. They are based on what is specified in the Solaris DTrace manual for Solaris 11.4. Reviewed by: 0mp, dteske, markj Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16839	2018-08-22 21:23:32 +00:00
Matt Macy	d12e91d584	Make dnode definition uniform on !x86 gcc4 requires -fms-extensions to accept anonymous union members	2018-08-21 03:45:09 +00:00
Kyle Evans	7920ad944b	libbe(3): Move build goop back out of cddl/ Some background: in the GSoC project, libbe/Makefile lived in lib/libbe. I created projects/bectl branch, maintained the above for all of five minutes before I misread Makefile.inc1 and decided that it couldn't possibly build outside of cddl/, so I kicked the Makefile out into the cddl/ build and all was good. The misreading was of the bit where .WAIT is added to SUBDIR after lib, libexec but prior to building bin and cddl only during the install targets, which is the critical part. Fast forward- buildworld was still broken in my branch unbeknownst to me because I didn't nuke my OBJDIR. Combing through Makefile.inc1 eventually revealed the necessary magic to make sure that libbe's dependencies are specified well enough, and it becomes clear what needs done to make a non-cddl/ build work. This is an interesting prospect, because the build split is kind of annoying to work with. IGNORE_PRAGMA is added to avoid dropping WARNS by one more. This was previously pulled in via cddl/Makefile.inc.	2018-08-18 03:20:59 +00:00
Kyle Evans	f25a4e58ec	libbe(3): Remove -v from LDFLAGS -v is clearly not needed for linking, and it adds extra verbose information that is not necessary.	2018-08-18 03:08:54 +00:00
Mark Johnston	f0af0b312f	Add partial documentation for dtrace(1)'s -x configuration options. Some options are still missing descriptions, but they can be filled in over time. Submitted by: raichoo <raichoo@googlemail.com> Reviewed by: 0mp (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D16671	2018-08-16 19:28:44 +00:00
Will Andrews	450e5a4378	zfs: add ztest to the kyua test suite. This program is currently failing, and has been for >6 months on HEAD. Ideally, this should be run 24x7 in CI, to discover hard-to-find bugs that only manifest with concurrent i/o. Requested by: lwhsu, mmacy	2018-08-15 13:05:04 +00:00
Kyle Evans	f2fdf2a1dc	libbe(3)/bectl(8): Remove now-redundant include paths These were previously necessary because the libnvpair and libzfs_core includes were not installed into the SYSROOT, being a part of the copies target in include/Makefile rather than being installed with the library. This was fixed in r337696 and the headers are now installed properly, so we may let go of the cruft.	2018-08-13 05:01:19 +00:00
Kyle Evans	ce33c57d6c	Use INCS for non-sys/ libnvpair and libzfs_core includes While nothing was wrong with libnvpair.h, libzfs_core.h was only guarded by MK_CDDL rather than MK_CDDL && MK_ZFS. Rather than ugl'if'ying include/Makefile to impose the extra restriction, just move the non-sys/ includes into INCS with the respective lib builds. This has the added bonus of allowing third party packagers to try and split these libs out of the FreeBSD-runtime package, if they are so inclined. The sys/ include was left alone- generally userland libraries shouldn't install kernel headers. MFC after: 1 week	2018-08-13 03:38:32 +00:00
Matt Macy	cc0fbbb92e	MFV/ZoL: Implement large_dnode pool feature commit 50c957f702ea6d08a634e42f73e8a49931dd8055 Author: Ned Bass <bass6@llnl.gov> Date: Wed Mar 16 18:25:34 2016 -0700 Implement large_dnode pool feature Justification ------------- This feature adds support for variable length dnodes. Our motivation is to eliminate the overhead associated with using spill blocks. Spill blocks are used to store system attribute data (i.e. file metadata) that does not fit in the dnode's bonus buffer. By allowing a larger bonus buffer area the use of a spill block can be avoided. Spill blocks potentially incur an additional read I/O for every dnode in a dnode block. As a worst case example, reading 32 dnodes from a 16k dnode block and all of the spill blocks could issue 33 separate reads. Now suppose those dnodes have size 1024 and therefore don't need spill blocks. Then the worst case number of blocks read is reduced from 33 to two--one per dnode block. In practice spill blocks may tend to be co-located on disk with the dnode blocks so the reduction in I/O would not be this drastic. In a badly fragmented pool, however, the improvement could be significant. ZFS-on-Linux systems that make heavy use of extended attributes would benefit from this feature. In particular, ZFS-on-Linux supports the xattr=sa dataset property which allows file extended attribute data to be stored in the dnode bonus buffer as an alternative to the traditional directory-based format. Workloads such as SELinux and the Lustre distributed filesystem often store enough xattr data to force spill blocks when xattr=sa is in effect. Large dnodes may therefore provide a performance benefit to such systems. Other use cases that may benefit from this feature include files with large ACLs and symbolic links with long target names. Furthermore, this feature may be desirable on other platforms in case future applications or features are developed that could make use of a larger bonus buffer area. 
Implementation -------------- The size of a dnode may be a multiple of 512 bytes up to the size of a dnode block (currently 16384 bytes). A dn_extra_slots field was added to the current on-disk dnode_phys_t structure to describe the size of the physical dnode on disk. The 8 bits for this field were taken from the zero filled dn_pad2 field. The field represents how many "extra" dnode_phys_t slots a dnode consumes in its dnode block. This convention results in a value of 0 for 512 byte dnodes which preserves on-disk format compatibility with older software. Similarly, the in-memory dnode_t structure has a new dn_num_slots field to represent the total number of dnode_phys_t slots consumed on disk. Thus dn->dn_num_slots is 1 greater than the corresponding dnp->dn_extra_slots. This difference in convention was adopted because, unlike on-disk structures, backward compatibility is not a concern for in-memory objects, so we used a more natural way to represent size for a dnode_t. The default size for newly created dnodes is determined by the value of a new "dnodesize" dataset property. By default the property is set to "legacy" which is compatible with older software. Setting the property to "auto" will allow the filesystem to choose the most suitable dnode size. Currently this just sets the default dnode size to 1k, but future code improvements could dynamically choose a size based on observed workload patterns. Dnodes of varying sizes can coexist within the same dataset and even within the same dnode block. For example, to enable automatically-sized dnodes, run # zfs set dnodesize=auto tank/fish The user can also specify literal values for the dnodesize property. These are currently limited to powers of two from 1k to 16k. The power-of-2 limitation is only for simplicity of the user interface. Internally the implementation can handle any multiple of 512 up to 16k, and consumers of the DMU API can specify any legal dnode value. 
The size of a new dnode is determined at object allocation time and stored as a new field in the znode in-memory structure. New DMU interfaces are added to allow the consumer to specify the dnode size that a newly allocated object should use. Existing interfaces are unchanged to avoid having to update every call site and to preserve compatibility with external consumers such as Lustre. The new interface names are given below. The versions of these functions that don't take a dnodesize parameter now just call the _dnsize() versions with a dnodesize of 0, which means use the legacy dnode size. New DMU interfaces: dmu_object_alloc_dnsize() dmu_object_claim_dnsize() dmu_object_reclaim_dnsize() New ZAP interfaces: zap_create_dnsize() zap_create_norm_dnsize() zap_create_flags_dnsize() zap_create_claim_norm_dnsize() zap_create_link_dnsize() The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The spa_maxdnodesize() function should be used to determine the maximum bonus length for a pool. These are a few noteworthy changes to key functions: * The prototype for dnode_hold_impl() now takes a "slots" parameter. When the DNODE_MUST_BE_FREE flag is set, this parameter is used to ensure the hole at the specified object offset is large enough to hold the dnode being created. The slots parameter is also used to ensure a dnode does not span multiple dnode blocks. In both of these cases, if a failure occurs, ENOSPC is returned. Keep in mind, these failure cases are only possible when using DNODE_MUST_BE_FREE. If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0. dnode_hold_impl() will check if the requested dnode is already consumed as an extra dnode slot by a large dnode, in which case it returns ENOENT. * The function dmu_object_alloc() advances to the next dnode block if dnode_hold_impl() returns an error for a requested object. 
This is because the beginning of the next dnode block is the only location it can safely assume to either be a hole or a valid starting point for a dnode. * dnode_next_offset_level() and other functions that iterate through dnode blocks may no longer use a simple array indexing scheme. These now use the current dnode's dn_num_slots field to advance to the next dnode in the block. This is to ensure we properly skip the current dnode's bonus area and don't interpret it as a valid dnode. zdb --- The zdb command was updated to display a dnode's size under the "dnsize" column when the object is dumped. For ZIL create log records, zdb will now display the slot count for the object. ztest ----- Ztest chooses a random dnodesize for every newly created object. The random distribution is more heavily weighted toward small dnodes to better simulate real-world datasets. Unused bonus buffer space is filled with non-zero values computed from the object number, dataset id, offset, and generation number. This helps ensure that the dnode traversal code properly skips the interior regions of large dnodes, and that these interior regions are not overwritten by data belonging to other dnodes. A new test visits each object in a dataset. It verifies that the actual dnode size matches what was stored in the ztest block tag when it was created. It also verifies that the unused bonus buffer space is filled with the expected data patterns. ZFS Test Suite -------------- Added six new large dnode-specific tests, and integrated the dnodesize property into existing tests for zfs allow and send/recv. Send/Receive ------------ ZFS send streams for datasets containing large dnodes cannot be received on pools that don't support the large_dnode feature. A send stream with large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be unrecognized by an incompatible receiving pool so that the zfs receive will fail gracefully. 
While not implemented here, it may be possible to generate a backward-compatible send stream from a dataset containing large dnodes. The implementation may be tricky, however, because the send object record for a large dnode would need to be resized to a 512 byte dnode, possibly kicking in a spill block in the process. This means we would need to construct a new SA layout and possibly register it in the SA layout object. The SA layout is normally just sent as an ordinary object record. But if we are constructing new layouts while generating the send stream we'd have to build the SA layout object dynamically and send it at the end of the stream. For sending and receiving between pools that do support large dnodes, the drr_object send record type is extended with a new field to store the dnode slot count. This field was repurposed from unused padding in the structure. ZIL Replay ---------- The dnode slot count is stored in the uppermost 8 bits of the lr_foid field. The bits were unused as the object id is currently capped at 48 bits. Resizing Dnodes --------------- It should be possible to resize a dnode when it is dirtied if the current dnodesize dataset property differs from the dnode's size, but this functionality is not currently implemented. Clearly a dnode can only grow if there are sufficient contiguous unused slots in the dnode block, but it should always be possible to shrink a dnode. Growing dnodes may be useful to reduce fragmentation in a pool with many spill blocks in use. Shrinking dnodes may be useful to allow sending a dataset to a pool that doesn't support the large_dnode feature. Feature Reference Counting -------------------------- The reference count for the large_dnode pool feature tracks the number of datasets that have ever contained a dnode of size larger than 512 bytes. The first time a large dnode is created in a dataset the dataset is converted to an extensible dataset. 
This is a one-way operation and the only way to decrement the feature count is to destroy the dataset, even if the dataset no longer contains any large dnodes. The complexity of reference counting on a per-dnode basis was too high, so we chose to track it on a per-dataset basis similarly to the large_block feature. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3542	2018-08-12 00:45:53 +00:00
Kyle Evans	3f48dbd1cc	Merge libbe(3)/bectl(8) from projects/bectl into head bectl(8) is an administrative interface for working with ZFS boot environments, intended to provide a superset of the functionality provided by sysutils/beadm. libbe(3) is the back-end library that the required functionality has been pulled out into for later reuse. These were originally written for GSoC 2017 under the mentorship of allanjude@. bectl(8) has proven pretty stable in my testing, with the known bug documented in the man page. Relnotes: yes	2018-08-11 23:50:09 +00:00
Kyle Evans	35d2028fb8	libbe(3)/bectl(8): More SYSROOT/GCC build fixes - Missing include path - Fully specify libzfs's dependencies (except for deps pulled in by other deps) in Makefile.inc1 - Drop WARNS back down to 2 for libbe(3). I do this with much hesitation, but the libzfs headers are apparently a hot warning-filled mess as far as GCC 4.2 is concerned.	2018-08-11 22:45:39 +00:00
Alexander Leidinger	a079a34fd5	Extend the info about the limitations of datasets in jails. Reviewed by: allanjude Sponsored by: Essen Hackathon	2018-08-11 20:49:19 +00:00
Brad Davis	edb1df35b0	Fix the build by just installing systop since testing shows it works with: dwatch -X systop Reviewed by: kp Approved by: allanjude (mentor)	2018-08-11 16:06:32 +00:00
Devin Teske	37b0d996dc	dwatch(1): Add systop profile Provides a top-like view of syscall consumers. MFC after: 3 days X-MFC-to: stable/11 Sponsored by: Smule, Inc.	2018-08-11 06:32:31 +00:00
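
Note on f34d9e5d6b: the commit message describes a containment check that must treat two identical paths, and any path under "/", as belonging to the same dispatch group. The following is only a minimal sketch of that rule in Python, not the libzfs code itself; `path_contains` and `initial_groups` are hypothetical names chosen for illustration, standing in loosely for the roles the message assigns to libzfs_path_contains() and the non_descendant_idx() scan.

```python
def path_contains(path1: str, path2: str) -> bool:
    """Return True if path2 is path1 itself or lies beneath path1.

    Hypothetical helper modelled on the rule described in the commit
    message; it is not the actual libzfs implementation.
    """
    if path1 == path2:
        # "/a" contains "/a" (the duplicate top-level case, #8833)
        return True
    if path1 == "/":
        # every path depends on "/" (the root case, #8450)
        return True
    # Otherwise path2 must extend path1 at a directory boundary,
    # so "/a" contains "/a/b" but not "/ab".
    n = len(path1)
    return path2.startswith(path1) and path2[n:n + 1] == "/"


def initial_groups(sorted_mountpoints):
    """Split an already sorted list of mount points into independent groups.

    Each group starts at a path that is not contained by the first path of
    the previous group; one initial thread would be dispatched per group.
    """
    groups = []
    for mp in sorted_mountpoints:
        if groups and path_contains(groups[-1][0], mp):
            groups[-1].append(mp)
        else:
            groups.append([mp])
    return groups


if __name__ == "__main__":
    print(initial_groups(["/a", "/a", "/a/b"]))
    print(initial_groups(["/", "/var/data", "/var/data/test"]))
```

Under this rule both input lists quoted in the commit message collapse into a single group, which matches the message's statement that the initial dispatching has to be single-threaded in those cases.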

1176 Commits