commit 417104bdd3c7ce07ec58674dd078f9891c3bc780
Author: Ned Bass <bass6@llnl.gov>
Date: Thu Feb 26 12:24:11 2015 -0800
Use cached feature info in spa_add_feature_stats()
Avoid issuing I/O to the pool when retrieving feature flags information.
Trying to read the ZAPs from disk means that zpool clear would hang if
the pool is suspended and recovery would require a reboot. To keep the
feature stats resident in memory, we hang a cached nvlist off of the
spa. It is built up from disk the first time spa_add_feature_stats() is
called, and refreshed thereafter using the cached feature reference
counts. spa_add_feature_stats() gets called at pool import time so we
can be sure the cached nvlist will be available if the pool is later
suspended.
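A minimal userland sketch of the caching pattern described above; the names,
the fixed-size array standing in for the nvlist, and the stubbed ZAP lookup
are all illustrative, not the actual spa.c code:

    #include <stdint.h>
    #include <stdbool.h>

    #define NFEATURES 4

    static uint64_t incore_refcount[NFEATURES]; /* always resident in memory */
    static uint64_t cached_stats[NFEATURES];    /* stands in for the cached nvlist */
    static bool cache_built;

    static uint64_t
    zap_lookup_refcount(int f) /* stub standing in for the on-disk ZAP read */
    {
        return ((uint64_t)f);
    }

    static void
    feature_stats(uint64_t out[NFEATURES])
    {
        if (!cache_built) {
            /* First call (at pool import): build the cache from disk. */
            for (int f = 0; f < NFEATURES; f++)
                cached_stats[f] = zap_lookup_refcount(f);
            cache_built = true;
        } else {
            /*
             * Thereafter: refresh from the in-core refcounts; no I/O
             * is issued, so this is safe even if the pool is suspended.
             */
            for (int f = 0; f < NFEATURES; f++)
                cached_stats[f] = incore_refcount[f];
        }
        for (int f = 0; f < NFEATURES; f++)
            out[f] = cached_stats[f];
    }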
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3082
MFV:
commit ee36c709c3d5f7040e1bd11f5c75318aa03e789f
Author: Gvozden Neskovic <neskovic@gmail.com>
Date: Sat Aug 27 20:12:53 2016 +0200
perf: 2.75x faster ddt_entry_compare()
The first 256 bits of ddt_key_t are a block checksum, which is expected
to be close to random data. Hence, on average, a comparison only needs to
look at the first few bytes of the keys. To reduce the number of conditional
jump instructions, the result is computed as sign(memcmp(k1, k2)).
The sign of an integer 'a' can be obtained as `(0 < a) - (a < 0)`, which
yields {-1, 0, 1} and is computed efficiently. Synthetic performance
evaluation of the original and new algorithm over 1G random keys on a
2.6GHz Intel(R) Xeon(R) CPU E5-2660 v3:
old 6.85789 s
new 2.49089 s
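The trick is easiest to see in a standalone comparator. A minimal userland
sketch (illustrative only; the real ddt_entry_compare() operates on ddt_key_t
in the DDT code):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define KEY_BYTES 32 /* 256-bit checksum, as in ddt_key_t */

    static int
    key_compare(const void *a, const void *b)
    {
        int cmp = memcmp(a, b, KEY_BYTES);

        /*
         * AVL comparators must return exactly -1, 0, or 1;
         * (0 < cmp) - (cmp < 0) normalizes memcmp's result
         * without conditional branches.
         */
        return ((0 < cmp) - (cmp < 0));
    }

    int
    main(void)
    {
        uint8_t k1[KEY_BYTES] = { 0x01 };
        uint8_t k2[KEY_BYTES] = { 0x02 };

        printf("%d\n", key_compare(k1, k2)); /* prints -1 */
        return (0);
    }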
perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare()
Compute the result directly instead of using conditionals
perf: zfs_range_compare()
Speedup between 1.1x and 2.5x, depending on compiler version and
optimization level.
perf: spa_error_entry_compare()
`bcmp()` only reports equality, not ordering, so it is not suitable for
comparator use. Use `memcmp()` instead.
perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare()
perf: 2.8x faster zil_bp_compare()
perf: 2.8x faster mze_compare()
perf: faster dbuf_compare()
perf: faster compares in spa_misc
perf: 2.8x faster layout_hash_compare()
perf: 2.8x faster space_reftree_compare()
perf: libzfs: faster avl tree comparators
perf: guid_compare()
perf: dsl_deadlist_compare()
perf: perm_set_compare()
perf: 2x faster range_tree_seg_compare()
perf: faster unique_compare()
perf: faster vdev_cache_compare()
perf: faster vdev_uberblock_compare()
perf: faster fuid_compare()
perf: faster zfs_znode_hold_compare()
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Richard Elling <richard.elling@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5033
When doing a read from disk, ZFS creates three ZIOs: a zio_null(), the
logical zio_read(), and then a physical zio. Currently, each of these
results in a separate taskq_dispatch(zio_execute).
On high-read-IOPS workloads, this causes a significant performance
impact. By processing all three ZIOs in a single taskq entry, we reduce the
overhead of taskq locking and context switching. We accomplish this by
allowing zio_done() to return a "next zio to execute" to zio_execute().
This results in a ~12% performance increase for random reads, from
96,000 IOPS to 108,000 IOPS (with recordsize=8k, on SSDs).
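A userland sketch of the dispatch-saving shape (illustrative names; the real
change is in zio_execute()/zio_done()):

    #include <stddef.h>
    #include <stdio.h>

    struct zio_sim {
        const char *name;
        struct zio_sim *parent; /* zio to run once this one is done */
    };

    /* A stage returns the next zio to execute, or NULL to stop. */
    static struct zio_sim *
    zio_done_sim(struct zio_sim *z)
    {
        printf("done: %s\n", z->name);
        return (z->parent); /* run the parent inline, no taskq_dispatch() */
    }

    static void
    zio_execute_sim(struct zio_sim *z)
    {
        while (z != NULL)
            z = zio_done_sim(z); /* one taskq entry drives the whole chain */
    }

    int
    main(void)
    {
        struct zio_sim root = { "zio_null", NULL };
        struct zio_sim logical = { "logical zio_read", &root };
        struct zio_sim physical = { "physical zio", &logical };

        zio_execute_sim(&physical);
        return (0);
    }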
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-59292
Closes #7736
zfsonlinux/zfs@62840030a7
9580 Add a hash-table on top of nvlist to speed-up operations
illumos/illumos-gate@2ec7644aab
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
illumos/illumos-gate@843c2111b1
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@48dd5e630c
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@9ca527c3d3
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
9465 ARC check for 'anon_size > arc_c/2' can stall the system
illumos/illumos-gate@abe1fd01ce
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Don Brady <don.brady@delphix.com>
The zfs_dbuf_evict_key TSD (thread-specific data) is not necessary; we can
instead pass a flag down in a few places to prevent recursive dbuf eviction.
Making this change has three benefits:
1. The code semantics are easier to understand.
2. On Linux, performance is improved, because creating/removing TSD values
(by setting them to NULL vs non-NULL) is expensive, and we do it very often.
3. According to Nexenta, the current semantics can cause a deadlock when
concurrently calling dmu_objset_evict_dbufs() (which is rare today, but they
are working on a "parallel unmount" change that triggers this more easily)
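A minimal sketch of the flag-passing shape (illustrative only; the real
change threads the flag through the dbuf eviction paths):

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * After: the caller states its intent with an explicit argument,
     * instead of marking thread-specific data before the call.
     */
    static void
    dbuf_destroy_sim(bool evicting)
    {
        if (evicting) {
            /* Called from the eviction path: do not recurse. */
            printf("deferring nested eviction\n");
            return;
        }
        printf("normal destroy path\n");
    }

    int
    main(void)
    {
        dbuf_destroy_sim(false); /* ordinary caller */
        dbuf_destroy_sim(true);  /* eviction path */
        return (0);
    }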
illumos/illumos-gate@c2919acbea
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
9591 ms_shift can be incorrectly changed in MOS config for
indirect vdevs that have been historically expanded
illumos/illumos-gate@11f6a9680e
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
illumos/illumos-gate@99a19144e8
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
9438 Holes can lose birth time info if a block has a mix of birth times
Ultimately, the problem here is that when you truncate and write a file in
the same transaction group, the dbuf for the indirect block will be zeroed
out to deal with the truncation, and then written for the write. During
this process, we will lose hole birth time information for any holes in the
range. In the case where a dnode is being freed, we need to determine
whether the block should be converted to a higher-level hole in the zio
pipeline, and if so, do it when the dnode is being synced out.
illumos/illumos-gate@738e2a3ce3
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@dec267e7ea
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
9424 ztest failure: "unprotected error in call to Lua API (Invalid value type 'function' for key 'error')"
illumos/illumos-gate@fe3ba4d122
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
In the most fragmented real-world cases, this reduces memory used by the
mapping from ~1GB to ~50MB of RAM per 1TB of storage removed. Less
fragmented cases will typically also see around 50-100MB of RAM per 1TB
of storage.
illumos/illumos-gate@cfd63e1b1b
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
Datasets that are deeply nested (~100 levels) are impractical. We now impose
a limit of 50 levels on newly created datasets. Existing datasets should
continue to work without a problem.
illumos/illumos-gate@5ac95da7d6
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Continues what was started in 7801 (add more by-dnode routines) by fully
converting zvols to avoid unnecessary dnode_hold() calls. This saves a
small amount of CPU time and slightly improves latencies of operations
on zvols.
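The general shape of such a conversion, sketched against the DMU interfaces
(a sketch only, not the actual zvol diff):

    /* Before: each call resolves the object number to a dnode internally. */
    error = dmu_read(os, object, offset, size, buf, DMU_READ_NO_PREFETCH);

    /* After: hold the dnode once, e.g. when the zvol is opened... */
    error = dnode_hold(os, object, FTAG, &dn);

    /* ...reuse the held dnode for every I/O... */
    error = dmu_read_by_dnode(dn, offset, size, buf, DMU_READ_NO_PREFETCH);

    /* ...and release it once when the zvol is closed. */
    dnode_rele(dn, FTAG);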
illumos/illumos-gate@8dfe5547fb
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Richard Yao <richard.yao@prophetstor.com>
All objects after the last written or freed object are not supposed to
exist after receiving the stream. We should free them accordingly, as if
a freeobjects record for them had been included in the stream.
zfsonlinux/zfs@48fbb9ddbf
illumos/illumos-gate@7864b8192b
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
9464 txg_kick() fails to see that we are quiescing, forcing transactions
to their next stages without letting them accumulate changes
Ideally we would like txg_kick() to be triggered only when we are sure
that we are neither syncing nor quiescing any txg. This way we can kick
an open TXG into the quiescing state when we are sure that nothing else is
going on, and we would benefit from the different states running
concurrently.
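As a sketch, the guard looks like this (treating the txg_is_syncing()/
txg_is_quiescing() helpers and the txg_kick() signature as illustrative):

    /*
     * Only kick the open txg when neither a sync nor a quiesce is
     * already in progress, so the kicked txg can quiesce immediately
     * while other txgs sync, rather than piling up behind them.
     */
    if (!txg_is_syncing(dp) && !txg_is_quiescing(dp))
        txg_kick(dp);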
illumos/illumos-gate@fa41d87de9
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
Updates to indirect blocks of spacemaps can contribute significantly to
write inflation. Therefore we want to reduce the indirect block size of
spacemaps from 128K to 16K.
illumos/illumos-gate@221813c13b
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Albert Lee <trisk@forkgnu.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
9426 metaslab size can exceed offset addressable by spacemap
metaslab size can exceed the offset addressable by the spacemap. The vdev can
address up to 2^63 * SPA_MINBLOCKSIZE (512). A metaslab can address up to
2^47 * 2^vdev_ashift. Therefore we may need to increase the number of
metaslabs so that the maximum metaslab size is capped at the amount that
can be addressed by the spacemap. This should happen in
vdev_metaslab_set_size().
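Back-of-the-envelope sketch (variable names are illustrative):

    /*
     * A space map offset has 47 bits, in units of 2^ashift bytes, so
     * the largest metaslab it can fully describe is:
     */
    uint64_t max_ms_size = 1ULL << (47 + ashift); /* ashift=9 -> 64 PiB */

    /*
     * A vdev whose usable size exceeds ms_count * max_ms_size needs
     * more metaslabs rather than larger ones:
     */
    uint64_t min_ms_count = (vdev_asize + max_ms_size - 1) / max_ms_size;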
illumos/illumos-gate@b4bf0cf045
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Don Brady <don.brady@delphix.com>
9328 zap code can take advantage of c99
9329 panic in zap_leaf_lookup() due to concurrent zapification
illumos/illumos-gate@bf26014c55
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
9403 assertion failed in arc_buf_destroy() when concurrently reading block with checksum error
This assertion (VERIFY) failure was reported when reading a block. It turns
out the problem is that if we get an I/O error (ECKSUM in this case), and
there are multiple concurrent ARC reads of the same block (from different
clones), then the ARC will put multiple bufs on the same ANON hdr, which
isn't supposed to happen, and this then causes a panic when we try to
arc_buf_destroy() the buf.
illumos/illumos-gate@fa98e487a9
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Matthew Ahrens <mahrens@delphix.com>
9421 zdb should detect and print out the number of "leaked" objects
9422 zfs diff and zdb should explicitly mark objects that are on the deleted queue
illumos/illumos-gate@20b5dafb42
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>
9102 zfs should be able to initialize storage devices
The first access to a disk block can incur a performance penalty on some
platforms (e.g. AWS's EBS, VMware VMDKs). Therefore it is recommended that
volumes be "thick provisioned", where supported by the platform (VMware).
Thick provisioning is time-consuming and is often skipped. If the thick
provisioning step is omitted, customers will see suboptimal performance until
we have written to all parts of the LUN. ZFS should be able to initialize
any unused storage to remove any first-write penalty that exists.
illumos/illumos-gate@094e47e980
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@df477c0afa
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Paul Dagnelie <pcd@delphix.com>
This project's goal is to make read-heavy channel programs and zfs(1m)
administrative commands faster by caching all the metadata that they will
need in the dbuf layer. This will prevent the data from being evicted, so
that any future call to, e.g., 'zfs get all' rarely has to go to disk.
illumos/illumos-gate@adb52d9262
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
We should use zfs_dbgmsg instead of spa_dbgmsg. Or at least,
metaslab_condense() should call zfs_dbgmsg because it's important and rare
enough to always log. It's possible that the message in zio_dva_allocate()
would be too high-frequency for zfs_dbgmsg.
illumos/illumos-gate@21f7c81cc1
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
Currently vdev_label_sync() and vdev_uberblock_sync() take a zio_t and assume
that its io_private is a pointer to the good_writes count. They should
instead accept this argument explicitly.
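Roughly, the change in shape (signatures here are illustrative, not copied
from vdev_label.c):

    /* Before: the counter travels implicitly via the zio. */
    static void
    vdev_uberblock_sync(zio_t *zio, uberblock_t *ub, vdev_t *vd, int flags)
    {
        /* ...(*(uint64_t *)zio->io_private)++ on a good write... */
    }

    /* After: the caller hands the counter over explicitly. */
    static void
    vdev_uberblock_sync(zio_t *zio, uint64_t *good_writes, uberblock_t *ub,
        vdev_t *vd, int flags)
    {
        /* ...(*good_writes)++ on a good write... */
    }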
illumos/illumos-gate@a3b5583021
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
Mirrors are supposed to provide redundancy in the face of whole-disk failure
and silent damage (e.g. some data on disk is not right, but ZFS hasn't
detected the whole device as being broken). However, the current device
removal implementation bypasses some of the mirror's redundancy.
illumos/illumos-gate@3a4b1be953
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
On high-end systems running async sequential write workloads, especially
NUMA systems with flash or NVMe storage, one significant performance
bottleneck is selecting a metaslab to do allocations from. This process
can be parallelized, providing significant performance increases for
these workloads.
illumos/illumos-gate@f78cdc34af
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Paul Dagnelie <pcd@delphix.com>
The current space map encoding has the following disadvantages:
[1] Assuming a 512-byte sector size, each entry can represent a segment of
at most 16MB. This makes the encoding very inefficient for large regions of
space.
[2] As vdev-wide space maps have started to be used by new features (e.g.
device removal, zpool checkpoint), we've started imposing limits on the
vdevs that can be used with them, based on the maximum addressable offset
(currently 64PB for a top-level vdev).
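Both limits fall out of the old one-word entry layout (1 type bit, a 15-bit
run length, a 47-bit offset, and 1 debug bit, with run and offset counted in
sectors):

    max segment size = 2^15 sectors * 512 bytes          = 16 MiB
    max offset       = 2^47 sectors * 512 bytes = 2^56 B = 64 PiB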
The new encoding remains backwards compatible with the old one. The
introduced two-word entry format, besides extending the limits imposed by the
single-word layout, also includes a vdev field and some extra padding after
its prefix. The extra padding after the prefix is reserved for future usage
(e.g. new prefixes for future encodings or new fields for flags). The new
vdev field not only makes the space maps more self-descriptive, but also
opens the door to pool-wide space maps.
One final important note is that the number of bits used for vdevs in
blkptrs is reduced to 24. That was decided because we don't know of any
setups that use more than 16M vdevs for the time being, and we wanted the
vdev field to fit in the space map. In addition, that gives us some extra
bits in dva_t.
illumos/illumos-gate@17f11284b4
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
illumos/illumos-gate@b6bf6e1540
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
(r334844). Most of the changes involve moving some code around to
reduce conflicts with future merges. One of the missing changes
included a notification on scrub cancellation.
Approved by: mav
Sponsored by: iXsystems Inc
quotactl(2) mechanism. (Read-only at this point, however.)
In particular, this is to allow rpc.rquotad to query quotas
for NFS mounts, allowing users to see their quotas on the
hosts using the datasets.
The changes specifically:
* Add new RPC entry points for querying quotas.
* Change the library routines to allow non-UFS quotas.
* Change rquotad to check for quotas on mounted filesystems,
rather than being limited to entries in /etc/fstab.
* Lastly, add a VFS entry point for ZFS to query quotas.
Note that this makes one unavoidable behavioural change: if quotas
are enabled, then they can be queried, as opposed to the current
method of checking for quotas being specified in fstab. (With
ZFS, if there are user or group quotas, they're always used.)
Reviewed by: delphij, mav
Approved by: mav
Sponsored by: iXsystems Inc
Differential Revision: https://reviews.freebsd.org/D15886
d4a72f2386
During scans (scrubs or resilvers), it sorts the blocks in each transaction
group by block offset; the result can be a significant improvement. (On my
test system, into which I put some effort to introduce fragmentation since
setting it up yesterday, a scrub went from 1h2m to 33.5m with these
changes.) I've seen similar ratios on production systems.
Approved by: Alexander Motin
Obtained from: ZFS On Linux
Relnotes: Yes (improved scrub performance, with tunables)
Differential Revision: https://reviews.freebsd.org/D15562
When we're at our vnode limit, getnewvnode will call into the vnode LRU
cache to free up vnodes. If the vnode we try to recycle is a ZFS vnode we
end up, eventually, in zfs_rmnode. If the ZFS vnode we're recycling
represents something with extended attributes, zfs_rmnode will call
zfs_zget which will attempt to allocate another vnode. If the next vnode we
try to recycle is also a ZFS vnode representing something with extended
attributes, we can recurse further. This recursion is unbounded and can
overflow the stack.
In order to avoid this, restructure zfs_rmnode to simply add the extended
attribute directory's object ID to the unlinked set, thus not requiring the
allocation of a vnode. We then schedule a task that calls zfs_unlinked_drain
which will do the work of properly marking the vnodes for unlinking.
zfs_unlinked_drain is also called on mount so these will be cleaned up
there.
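A userland sketch of the fix's shape (illustrative only; the real code adds
the object to the ZFS unlinked set and drains it from a scheduled task):

    #include <stdio.h>

    #define MAX_WORK 16

    static int queue[MAX_WORK];
    static int head, tail;

    static void
    defer(int obj)
    {
        /* Like adding the xattr dir to the unlinked set: O(1), no vnode. */
        queue[tail++ % MAX_WORK] = obj;
    }

    static void
    drain(void)
    {
        /*
         * Like zfs_unlinked_drain(): iterative, so stack depth is bounded
         * no matter how much new work each item uncovers.
         */
        while (head != tail) {
            int obj = queue[head++ % MAX_WORK];
            printf("reclaim object %d\n", obj);
            if (obj > 1)
                defer(obj - 1); /* more work found: defer, don't recurse */
        }
    }

    int
    main(void)
    {
        defer(3);
        drain();
        return (0);
    }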
Reviewed by: avg, mav
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D15342
Even though there do appear to be more artificial frames, with a value of
12 stack traces no longer list at all. Revert until a better, more stable
value can be determined.
The 64-bit atomics defined for i386 are currently only available in
kernel space.
Found by: cy@
MFC after: 1 week
Sponsored by: Mellanox Technologies
While at it add missing _acq_ and _rel_ variants for 64-bit atomic
operations under i386.
Reviewed by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies