illumos/illumos-gate@12380e1e70https://www.illumos.org/issues/5269
When importing a pool (at boot or with zpool import) with many
filesystem, the process can take minutes. It doesn't matter whether
the pool has been exported cleanly or uncleanly. The problem is that
each dataset has its own log chain. On import, all datasets have to be
checked if there are logs to replay. The idea is to speed up this
process by paralellizing it.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Arne Jansen <jansen@webgods.de>
lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@nexenta.com>
Author: Max Grossman <max.grossman@delphix.com>
illumos/illumos-gate@643da460c8
illumos/illumos-gate@70163ac57ehttps://www.illumos.org/issues/5695
In dmu_sync_ready(), a hole block pointer will have it's logical size
explicitly set as it's necessary for replay purposes. To "undo" this,
dmu_sync_done() will zero out any hole that it finds. This becomes a
problem when using the "hole_birth" feature, as this will also wipe out
any birth time that might have happened to be set on the hole.
...
As a fix, the logic to zero out a hole is only applied to old style
holes with a birth time of zero. Holes created with the "hole_birth"
feature enabled will have a non-zero birth time, and will be skipped
(thus preserving the ltime, type, and level information as well).
In addition, zdb was updated to also print the ltime, type, and level
information for these new style holes. Previously, only the logical
birth time would be printed.
Author: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@2ec99e3e98
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@8df173054c
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@732885fca0
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@23367a2f2c
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>
illumod/illumos-gate@34e8acef00
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@98110f08fa
Reviewed by: Will Andrews <willa@spectralogic.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Justin Gibbs <justing@spectralogic.com>
illumos/illumos-gate@bc9014e6a8
arc_state_t stats and differentiate between "data" and "metadata"
Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@4076b1bf41
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <Chris.Williamson@delphix.com>
illumos/illumos-gate@89c86e3229
Currently, every buffer cached in the L2ARC is accompanied by a 240-byte
header in memory, leading to very high memory consumption when using very
large cache devices. These changes significantly reduce this overhead.
Currently:
L1-only header = 176 bytes
L1 + L2 or L2-only header = 176 bytes + 32 byte checksum + 32 byte l2hdr
= 240 bytes
Memory-optimized:
L1-only header = 176 bytes
L1 + L2 header = 176 bytes + 32 byte checksum = 208 bytes
L2-only header = 96 bytes + 32 byte checksum = 128 bytes
So overall:
Trunk Optimized
+-----------------+
L1-only | 176 B | 176 B | (same)
+-----------------+
L1 & L2 | 240 B | 208 B | (saved 32 bytes)
+-----------------+
L2-only | 240 B | 128 B | (saved 116 bytes)
+-----------------+
For an average blocksize of 8KB, this means that for the L2ARC, the ratio
of metadata to data has gone down from about 2.92% to 1.56%. For a
'storage optimized' EC2 instance with 1600GB of SSD and 60GB of RAM, this
means that we expect a completely full L2ARC to use (1600 GB * 0.0156) /
60GB = 41% of the available memory, down from 78%.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@34d7ce052c
5693 ztest fails in dbuf_verify: buf[i] == 0, due to dedup and bp_override
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@7f7ace3700
5661 ZFS: "compression = on" should use lz4 if feature is enabled
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by: Xin LI <delphij@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Justin T. Gibbs <justing@spectralogic.com>
illumos/illumos-gate@db1741f555
5630 stale bonus buffer in recycled dnode_t leads to data corruption
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Justin T. Gibbs <justing@spectralogic.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
illumos/illumos-gate@9d47dec048
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Justin T. Gibbs <justing@spectralogic.com>
illumos/illumos-gate@e57a022b8f
The check introduced by r285946 failed to add the dependency on
opt_kstack_pages.h which meant the default value for the platform instead
of the customised options KSTACK_PAGES=X was being tested.
Also wrap in #ifdef __FreeBSD__ for portability.
MFC after: 3 days
Sponsored by: Multiplay
When a process is exiting, there is a narrow window where p_cred may be
NULL while its threads are still executing. Specifically, the last thread
to exit a process sets the process state to PRS_ZOMBIE with the proc
spinlock held and then calls thread_exit(). thread_exit() drops the spin
lock, permitting the process to be reaped and thus causing its cred struct
to be released. However, the exiting thread may still cause DTrace probes
to fire by calling sched_throw(), resulting in a double fault if such a
probe enabling attempts to access the GID or UID DIF variables.
The thread's cred reference is not susceptible to this race since it is not
released until after the thread has exited.
MFC after: 1 week
As ZFS requires a more kernel stack pages than is the default on some
architectures e.g. i386, warn if KSTACK_PAGES is less than
ZFS_MIN_KSTACK_PAGES (which is 4 at the time of writing).
MFC after: 3 days
Sponsored by: Multiplay
This change refactors the existing create_thread() function to be more
generic. It replaces almost all of its arguments by a callback that can
be used to extract the thread ID and copy it out to the right place, but
also to perform additional initialization steps, such as setting the
trapframe. This also makes the difference between thr_new() and
thr_create() more clear in my opinion.
This function is going to be used by the CloudABI compatibility layer.
It looks like the OpenSolaris compatibility framework already provides a
function called thread_create(). Rename this function to
do_thread_create() and use a macro to deal with the namespacing
conflict. A similar approach is already used for thread_exit().
MFC after: 1 month
in lockstat.ko. This means that lockstat probes now have typed arguments and
will utilize SDT probe hot-patching support when it arrives.
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D2993
enabled. The cost of a timecounter read can be quite significant, and the
problem became more apparent after r284297, since that change resulted in
a call to lockstat_nsecs() for each acquisition of an rwlock read lock.
PR: 201642
Reviewed by: avg
Tested by: Jason Unovitch
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D3073
belongs to the kernel stack address range for the thread. Right now,
code checks that new frame is not farther then KSTACK_PAGES pages from
the current frame, which allows the address to point past the top of
the stack.
Reviewed by: andrew, emaste, markj
Differential revision: https://reviews.freebsd.org/D3108
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free
list) of either counter are still guarded with vnode interlock.
Reviewed by: kib (earlier version)
Tested by: pho
Previously several places were doing it on its own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.
Reviewed by: kib
macros on amd64 and i386. Move the definition to machine/param.h.
kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb
version to PINKERNEL().
On i386, correct the lowest kernel address. After the shared page was
introduced, USRSTACK no longer points to the last user address + 1 [*]
Submitted by: Oliver Pinter [*]
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
We now take z_teardown_lock as a writer to ensure that there is no I/O
while the filesystem state is in a flux. Also, zfs_suspend_fs() ->
zfsvfs_teardown() call zfs_unregister_callbacks() and zfs_resume_fs() ->
zfsvfs_setup() call zfs_unregister_callbacks(). Previously there was no
synchronization between those calls and the calls in the re-mounting
case. That could lead to concurrent execution and a crash.
PR: 180060
Differential Revision: https://reviews.freebsd.org/D2865
Suggested by: mahrens
Reviewed by: delphij, pho, mahrens, will
MFC after: 13 days
Sponsored by: ClusterHQ
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@46e1baa6cfhttps://www.illumos.org/issues/5911
Sometimes ZFS appears to hang while deleting a file. It is actually
making slow progress at the file deletion, but other operations
(administrative and writes via the data path) "hang" until the file
removal completes, which can take a long time if the file has many
blocks. The deletion (or most of it) happens in a single txg, and the
sync thread spends most of its time reading indirect blocks via this
stack trace:
swtch+0x141()
cv_wait+0x70()
zio_wait+0x5b()
dbuf_read+0x2c0()
free_children+0x50()
free_children+0x12a()
free_children+0x12a()
free_children+0x12a()
dnode_sync_free_range_impl+0xdf()
dnode_sync_free_range+0x52()
range_tree_vacate+0x65()
dnode_sync+0x1d8()
dmu_objset_sync_dnodes+0x77()
dmu_objset_sync+0x19f()
dsl_dataset_sync+0x51()
dsl_pool_sync+0x9a()
spa_sync+0x2ff()
txg_sync_thread+0x21f()
thread_start+8()
One way to reproduce the problem is if we are over the arc_meta_limit,
e.g. because lots of indirect blocks are pinned because we have L0
dbufs under them. It could be that most of the L1 indirects are cached,
in which case when dmu_free_long_range_impl() calls dmu_tx_hold_free(),
it will complete very quickly. This allows dmu_free_long_range_impl() to
put many (perhaps all of its) transactions in the same TXG. However,
dmu_free_long_range_impl() calls dnode_evict_dbufs (and
dnode_free_range()), which removes the L0 dbufs, thus reducing the hold
count on the L1 indirect blocks above it, allowing them to be evicted.
Because we are over the arc_meta_limit(), these L1 blocks will be
evicted ASAP. Thus when we get to syncing context, the L1 indirects are
no longer cached and must be read in.
Obtained from: illumos
MFC after: 15 days
Do that only when when fast inline versions are available.
At the moment that can be the case only in the kernel and not for all
platforms.
The original code uses the binary search and that's kept as a fallback.
This is a micro optimization.
Differential Revision: https://reviews.freebsd.org/D2839
Reviewed by: delphij, mahrens, mav
MFC after: 17 days