freebsd-nq/include/sys
Matthew Ahrens aa755b3549
Set aside a metaslab for ZIL blocks
Mixing ZIL and normal allocations has several problems:

1. The ZIL allocations are allocated, written to disk, and then a few
seconds later freed.  This leaves behind holes (free segments) where the
ZIL blocks used to be, which increases fragmentation, which negatively
impacts performance.

2. When under moderate load, ZIL allocations are of 128KB.  If the pool
is fairly fragmented, there may not be many free chunks of that size.
This causes ZFS to load more metaslabs to locate free segments of 128KB
or more.  The loading happens synchronously (from zil_commit()), and can
take around a second even if the metaslab's spacemap is cached in the
ARC.  All concurrent synchronous operations on this filesystem must wait
while the metaslab is loading.  This can cause a significant performance
impact.

3. If the pool is very fragmented, there may be zero free chunks of
128KB or more.  In this case, the ZIL falls back to txg_wait_synced(),
which has an enormous performance impact.

These problems can be eliminated by using a dedicated log device
("slog"), even one with the same performance characteristics as the
normal devices.

This change sets aside one metaslab from each top-level vdev that is
preferentially used for ZIL allocations (vdev_log_mg,
spa_embedded_log_class).  From an allocation perspective, this is
similar to having a dedicated log device, and it eliminates the
above-mentioned performance problems.

Log (ZIL) blocks can be allocated from the following locations.  Each
one is tried in order until the allocation succeeds:
1. dedicated log vdevs, aka "slog" (spa_log_class)
2. embedded slog metaslabs (spa_embedded_log_class)
3. other metaslabs in normal vdevs (spa_normal_class)

The space required for the embedded slog metaslabs is usually between
0.5% and 1.0% of the pool, and comes out of the existing 3.2% of "slop"
space that is not available for user data.

On an all-ssd system with 4TB storage, 87% fragmentation, 60% capacity,
and recordsize=8k, testing shows a ~50% performance increase on random
8k sync writes.  On even more fragmented systems (which hit problem #3
above and call txg_wait_synced()), the performance improvement can be
arbitrarily large (>100x).

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11389
2021-01-21 15:12:54 -08:00
..
crypto Extending FreeBSD UIO Struct 2021-01-20 21:27:30 -08:00
fm Avoid posting duplicate zpool events 2020-09-04 10:34:28 -07:00
fs record ioctl elapsed time in zpool history 2021-01-11 09:29:25 -08:00
lua FreeBSD: Reduce stack usage of Lua 2020-09-22 16:03:11 -07:00
sysevent Avoid installing kernel headers on FreeBSD 2020-06-27 17:40:14 -07:00
zstd zstd: track allocator statistics 2020-10-30 15:26:10 -07:00
abd_impl.h allow callers to allocate and provide the abd_t struct 2021-01-20 11:24:37 -08:00
abd.h allow callers to allocate and provide the abd_t struct 2021-01-20 11:24:37 -08:00
aggsum.h Reduce number of atomic_add() calls in aggsum 2020-02-06 13:21:06 -08:00
arc_impl.h dmu_zfetch: fix memory leak 2020-12-12 16:00:00 -08:00
arc.h Implement memory and CPU hotplug 2020-12-10 14:09:23 -08:00
avl_impl.h
avl.h Restore avl_update() calls and related functions 2020-06-03 09:49:32 -07:00
bitops.h Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
blkptr.h
bplist.h Fast Clone Deletion 2019-07-26 10:54:14 -07:00
bpobj.h Fast Clone Deletion 2019-07-26 10:54:14 -07:00
bptree.h
bqueue.h Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
btree.h Fix typos 2020-06-09 21:24:09 -07:00
dataset_kstats.h port async unlinked drain from illumos-nexenta 2019-02-12 10:41:15 -08:00
dbuf.h Improve zfs receive performance with lightweight write 2020-12-11 10:26:02 -08:00
ddt.h Appease GCC sprintf warnings found on Fedora 32/GCC 10.0.1 2020-08-24 10:32:59 -07:00
dmu_impl.h Remove UIO_ZEROCOPY functions structures 2020-10-30 10:00:33 -07:00
dmu_objset.h Improve zfs receive performance with lightweight write 2020-12-11 10:26:02 -08:00
dmu_recv.h filesystem_limit/snapshot_limit is incorrectly enforced against root 2020-07-11 17:18:02 -07:00
dmu_redact.h Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
dmu_send.h Add 'zfs send --saved' flag 2020-01-10 10:16:58 -08:00
dmu_traverse.h Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
dmu_tx.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dmu_zfetch.h zfetch: Don't issue new streams when old have not completed 2020-09-27 17:08:38 -07:00
dmu.h Set aside a metaslab for ZIL blocks 2021-01-21 15:12:54 -08:00
dnode.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dsl_bookmark.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dsl_crypt.h dmu_objset_from_ds must be called with dp_config_rwlock held 2020-03-12 10:55:02 -07:00
dsl_dataset.h implicit conversion from 'boolean_t' to 'ds_hold_flags_t' 2020-12-27 16:31:02 -08:00
dsl_deadlist.h Add fast path for zfs_ioc_space_snaps() handling of empty_bpobj 2019-08-20 11:34:52 -07:00
dsl_deleg.h Remove code for zfs remap 2019-06-24 16:44:01 -07:00
dsl_destroy.h Fast Clone Deletion 2019-07-26 10:54:14 -07:00
dsl_dir.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dsl_pool.h Eliminate Linux specific inode usage from common code 2019-12-11 11:53:57 -08:00
dsl_prop.h Support inheriting properties in channel programs 2020-01-22 17:03:17 -08:00
dsl_scan.h Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
dsl_synctask.h nowait synctask must succeed 2020-09-04 10:29:39 -07:00
dsl_userhold.h
edonr.h
efi_partition.h Fix typos in include/ 2019-08-30 09:53:15 -07:00
frame.h Linux 5.10 compat: frame.h renamed objtool.h 2020-11-02 22:01:10 +00:00
hkdf.h
Makefile.am Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
metaslab_impl.h Make metaslab class rotor and aliquot per-allocator. 2020-12-15 10:55:44 -08:00
metaslab.h Only examine best metaslabs on each vdev 2020-12-16 14:40:05 -08:00
mmp.h Add zfs_multihost_interval tunable handler for FreeBSD 2020-06-23 13:32:42 -07:00
mntent.h Add FreeBSD required defines to mntent.h 2019-11-30 15:49:09 -08:00
mod.h Replace ZFS on Linux references with OpenZFS 2020-10-08 20:10:13 -07:00
multilist.h Avoid extra taskq_dispatch() calls by DMU 2019-06-25 12:03:38 -07:00
note.h Update build system and packaging 2018-05-29 16:00:33 -07:00
nvpair_impl.h OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations 2018-07-30 11:30:03 -07:00
nvpair.h FreeBSD: make adjustments for the standalone environment 2020-10-13 21:05:49 -07:00
objlist.h Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
pathname.h Replace ZFS on Linux references with OpenZFS 2020-10-08 20:10:13 -07:00
qat.h QAT related bug fixes 2019-09-12 13:33:44 -07:00
range_tree.h Improve compatibility with C++ consumers 2020-06-06 12:54:04 -07:00
rrwlock.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
sa_impl.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
sa.h Extending FreeBSD UIO Struct 2021-01-20 21:27:30 -08:00
skein.h
spa_boot.h
spa_checkpoint.h Serialize ZTHR operations to eliminate races 2019-01-13 10:09:46 -08:00
spa_checksum.h
spa_impl.h Set aside a metaslab for ZIL blocks 2021-01-21 15:12:54 -08:00
spa_log_spacemap.h Log Spacemap Project 2019-07-16 10:11:49 -07:00
spa.h Set aside a metaslab for ZIL blocks 2021-01-21 15:12:54 -08:00
space_map.h Extend zdb to print inconsistencies in livelists and metaslabs 2020-07-14 17:51:05 -07:00
space_reftree.h Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
sysevent.h
txg_impl.h Fix typos in include/ 2019-08-30 09:53:15 -07:00
txg.h Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
u8_textprep_data.h
u8_textprep.h Throw const on some strings 2020-10-02 17:44:10 -07:00
uberblock_impl.h MMP interval and fail_intervals in uberblock 2019-03-21 12:47:57 -07:00
uberblock.h
uio_impl.h Extending FreeBSD UIO Struct 2021-01-20 21:27:30 -08:00
unique.h
uuid.h
vdev_disk.h Make struct vdev_disk_t be platform private 2020-06-16 11:43:33 -07:00
vdev_draid.h Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_file.h Add zfs_file_* interface, remove vnodes 2019-11-21 09:32:57 -08:00
vdev_impl.h Set aside a metaslab for ZIL blocks 2021-01-21 15:12:54 -08:00
vdev_indirect_births.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_indirect_mapping.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_initialize.h Add TRIM support 2019-03-29 09:13:20 -07:00
vdev_raidz_impl.h allow callers to allocate and provide the abd_t struct 2021-01-20 11:24:37 -08:00
vdev_raidz.h Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_rebuild.h Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_removal.h panic in removal_remap test on 4K devices 2019-06-13 13:12:39 -07:00
vdev_trim.h Trim L2ARC 2020-06-09 10:15:08 -07:00
vdev.h Set aside a metaslab for ZIL blocks 2021-01-21 15:12:54 -08:00
xvattr.h Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zap_impl.h
zap_leaf.h Fix ENOSPC in "Handle zap_add() failures in ..." 2018-04-18 14:19:50 -07:00
zap.h fat zap should prefetch when iterating 2019-06-12 13:13:09 -07:00
zcp_global.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_iter.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_prop.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_set.h Support setting user properties in a channel program 2020-02-14 13:41:42 -08:00
zcp.h filesystem_limit/snapshot_limit is incorrectly enforced against root 2020-07-11 17:18:02 -07:00
zfeature.h
zfs_acl.h Return an error code from zfs_acl_chmod_setattr 2019-11-01 10:19:11 -07:00
zfs_bootenv.h zfs label bootenv should store data as nvlist 2020-09-15 15:42:27 -07:00
zfs_context.h Introduce CPU_SEQID_UNSTABLE 2020-11-02 11:51:12 -08:00
zfs_debug.h Set aside a metaslab for ZIL blocks 2021-01-21 15:12:54 -08:00
zfs_delay.h Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_file.h Re-share zfsdev_getminor and zfs_onexit_fd_hold 2020-02-28 14:50:32 -08:00
zfs_fuid.h Replace sprintf()->snprintf() and strcpy()->strlcpy() 2020-06-07 11:42:12 -07:00
zfs_ioctl_impl.h Make zc_nvlist_src_size limit tunable 2020-08-18 09:33:55 -07:00
zfs_ioctl.h Cross-platform acltype 2020-10-13 21:25:48 -07:00
zfs_onexit.h Remove deduplicated send/receive code 2020-04-23 10:06:57 -07:00
zfs_project.h Minor diff reduction with ZoF in include/sys 2019-11-27 11:11:03 -08:00
zfs_quota.h File incorrectly zeroed when receiving incremental stream that toggles -L 2020-06-09 10:41:01 -07:00
zfs_ratelimit.h Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_refcount.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
zfs_rlock.h Add a "try" operation for range locks 2020-07-06 11:53:31 -07:00
zfs_sa.h Extending FreeBSD UIO Struct 2021-01-20 21:27:30 -08:00
zfs_stat.h
zfs_sysfs.h Fix in-kernel sysfs entries 2018-09-06 21:44:52 -07:00
zfs_vfsops.h Add 'zfs rename -u' to rename without remounting 2020-09-01 16:14:16 -07:00
zfs_vnops.h Extending FreeBSD UIO Struct 2021-01-20 21:27:30 -08:00
zfs_znode.h G/C struct znode -> z_moved 2020-11-10 12:42:47 -08:00
zil_impl.h make zil max block size tunable 2019-06-10 11:48:42 -07:00
zil.h zil_parse: make callback parameters const 2020-10-09 09:34:54 -07:00
zio_checksum.h
zio_compress.h Add zstd support to zfs 2020-08-20 10:30:06 -07:00
zio_crypt.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
zio_impl.h Add zstd support to zfs 2020-08-20 10:30:06 -07:00
zio_priority.h Add device rebuild feature 2020-07-03 11:05:50 -07:00
zio.h Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
zrlock.h
zthr.h Introduce names for ZTHRs 2020-07-29 09:43:33 -07:00
zvol_impl.h Fix problems in zvol_set_volmode_impl 2020-11-17 09:50:52 -08:00
zvol.h async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00