Double test device size for ztest(1).
Illumos issue:
5039 ztest should default to larger device sizes
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 2 weeks
Change dn->dn_dbufs from linked list to AVL tree.
Illumos issues:
4873 zvol unmap calls can take a very long time for larger datasets
MFC after: 2 weeks
Import Illumos changes to address the following Illumos issues:
4976 zfs should only avoid writing to a failing non-redundant
top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature
is enabled
4984 device selection should use fragmentation metric
MFC after: 2 weeks
Instead of asserting all zio's be properly aligned, only assert
on the logical ones.
Cap uberblocks at 8k, otherwise with ashift=17, there would be
only one uberblock.
This fixes a problem that zdb would trip assert on pools with
ashift >= 0xe (8k).
While there, also change the code so it only attempt to condense
space map unless the uncondensed size consumes greater than
zfs_metaslab_condense_block_threshold blocks.
Illumos issue:
4958 zdb trips assert on pools with ashift >= 0xe
MFC after: 2 weeks
Improve extreme rewind import.
When doing an "extreme rewind" import ("zpool import -XF"), we attempt
to verify all data in the pool, essentially scrubbing the entire pool.
The problem is that spa_load_verify_cb() issues an unbounded number of
concurrent scrub i/os. This can lead to all of memory being used for
these zio's, wedging the system. Like normal scrub, we need to put a
cap on the number of outstanding i/os, and have the traverse thread
block when we reach this cap.
For this purpose the cap can be very large (10,000) to optimize the
elevator algorithm. Three kernel tunables have been added:
vfs.zfs.spa_load_verify_maxinflight
vfs.zfs.spa_load_verify_metadata
vfs.zfs.spa_load_verify_data
The latter two tunables controls whether metadata and/or user data
when doing extreme rewind.
Make 'zpool import -T' imply scrub.
Make zpool import -T <txg> accept hexadecimal values for the txg when
prefixed with 0x.
Skip txg's for which there is no uberblock when doing extreme rewind.
Skip reading all user data twice by skipping prefetches when doing
extreme rewinds as we do not access via the ARC.
Illumos issues:
4970 need controls on i/o issued by zpool import -XF
4971 zpool import -T should accept hex values
4972 zpool import -T implies extreme rewind, and thus a scrub
4973 spa_load_retry retries the same txg
4974 spa_load_verify() reads all data twice
MFC after: 2 weeks
zpool status -x is used to identify pools that are exhibiting
errors or are otherwise unavailable, therefore non-native
block-size pools shouldn't be reported.
Also update man page to clarify other additional conditions
which won't cause a pool to be displayed under zpool status -x.
Sponsored by: Multiplay
Use reserved space for ZFS administrative commands.
We reserve 1/2^spa_slop_shift = 1/32 or 3.125% of pool space (or 32MB at
least) for system use. Most ZPL operations, e.g. write(2), creat(2), will
fail with ENOSPC if we fall below this.
Certain operations, e.g. file removal and most administrative actions,
still permitted until half of the slop space is used. This would allow
users to use these operations to free up space in the pool when pool is
close to full but half of slop space is still free.
A very restricted set of operations that frees up space or change quota
are always permitted, regardless of the amount of free space.
MFC after: 2 weeks
Refresh zpool list for each interval in order to produce fresh
output.
Illumos issue: 4966 zpool list iterator does not update output
MFC after: 2 weeks
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Discussed at: BSDcan
2915 DTrace in a zone should see "cpu", "curpsinfo", et al
2916 DTrace in a zone should be able to access fds[]
2917 DTrace in a zone should have limited provider access
MFC after: 2 weeks
Merge from r258379 missed the tests.
4248 dtrace(1M) should never create DOF with empty probes section
4249 Only probes from the first DTrace object file will be included
Illumos Revision: 54a20ab41aadcb81c53e72fc65886e964e9add59
MFC after: 5 days
Add a new zfs property, "redundant_metadata" which can have values "all" or
"most". The default will be "all", which is the current behavior. When set
to all, ZFS stores an extra copy of all metadata. If a single on-disk block
is corrupt, at worst a single block of user data (which is recordsize bytes
long) can be lost.
Setting to "most" will cause us to only store 1 copy of level-1 indirect
blocks of user data files. This can improve performance of random writes,
because less metadata has to be written. In practice, at worst about
100 blocks (of recordsize bytes each) of user data can be lost if a single
on-disk block is corrupt.
The exact behavior of which metadata blocks are stored redundantly may change
in future releases.
Illumos issue: 3835 zfs need not store 2 copies of all metadata
MFC after: 2 weeks
FreeBSD ZFS port unlike OpenSolaris does not use device IDs, and does not
implement respective devid_*() fuctions. It is pointless to open devices
just to close them back immediately.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
fail to attach to stripped binaries. With the _r_debug_postinit symbol,
dtrace(1) can now set a breakpoint in the victim process after it has
registered its DOF table(s) with the kernel. r_debug_state cannot be used
for this purpose since it is called before DOF is made available, in which
case dtrace(1) cannot create USDT probes before the program begins
execution.
MFC after: 2 weeks
and zdb(8) by growing the buffer on demand with a cap of 1GB (specified in
spa_history_create_obj()).
PR: bin/186574
Submitted by: Andrew Childs <lorne cons org nz> (with changes)
MFC after: 2 weeks
New ZFS property volmode and sysctl vfs.zfs.vol.mode allow switching ZVOL
between three modes:
geom -- existing fully functional behavior (default);
dev -- exposing volumes only as raw disk device file in devfs;
none -- not exposing volumes outside ZFS.
The "dev" mode is less functional (can't be partitioned, mounted, etc),
but it is faster, and in some scenarios with untrusted consumers safer.
It can be useful for NAS, VM block storages, etc.
The "none" mode may be convenient for backup servers, etc. that don't
need direct data access.
Due to the way ZVOL is integrated with main ZFS code, those property
and sysctl are checked only during pool import and volume creation.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
4248 dtrace(1M) should never create DOF with empty probes section
4249 Only probes from the first DTrace object file will be included
Illumos Revision: 4a20ab41aadcb81c53e72fc65886e964e9add59
Reference:
https://www.illumos.org/issues/4248https://www.illumos.org/issues/4249
Obtained from: Illumos
MFC after: 1 month
Fix a memory leak in uu_avl_pool_create: pthread_mutex_init without
a corresponding pthread_mutex_destroy. It shows up, among other
places, when doing "zfs list".
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation
concatenates the DOF tables into one section. Previously, the USDT init
code in drti.o would only look at the first table in the DOF section; with
this change, it iterates over all the tables, passing each DOF table to
the kernel.
PR: 186821
Submitted by: Fedor Indutny <fedor@indutny.com>
MFC after: 1 month
illumos/illumos-gate@6fb4854bed
This fixes the tst.resize1.d and tst.resize2.d DTrace tests, which have
been failing since r261122 since they were causing dtrace(1) to attempt to
allocate and use large amounts of memory, and get killed by the OOM killer
as a result.
MFC after: 1 month
"Manpages should start a new sentence on a new line. This makes it easier
for translators to track changes." -jhb
Approved by: jhb
MFC after: 3 days
Sponsored by: SupraNet Communications, Inc
hot spares. This should be MFC'd to all STABLE branches.
Upon the availability of zfsd, the zpool manpage on relevant branches should
be updated to remove this caveat and document hot spare's reliance on zfsd.
Approved by: avg
MFC after: 1 week
Sponsored by: SupraNet Communications
The limitation was introduced in r178556 without any note or comment.
It seems pretty artificial and now it leads to problems like the following:
$ dtrace -x bufsize=17m -n ...
dtrace: processing aborted: Memory allocation failure
OpenSolaris and illumos never had this limitation.
Sponsored by: HybridCluster
emitting the DIE for the type of that member. ctfconvert can not
handle this properly and will calculate a wrong member bit offset.
Same struct/union type from different .o file will be treated as
different types when their member bit offsets are different, and
gets added/merged multiple times. This will in turn cause many other
structs/pointers/typedefs that refer to the duplicated struct/union
gets added/merged multiple times and eventually causes numerous
duplicated CTF types in the kernel.debug file.
The simple workaround here is to make use of DW_AT_byte_size attribute
of the member DIE to calculate the bits occupied by the member's type,
without actually resolving the type.
attributes generated by Clang 3.4.
* Document how different compilers generate DW_AT_data_member_location
attributes differently.
* Document the quirks about DW_FORM_data[48].
"__anon__". This hack is used to workaround a issue that compilers
like GCC could generate DW_TAG_base_type DIE without a name.
Note that we didn't need this before because the old libdwarf
internally set all the unnamed DIE's name to "__anon__".
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
illumos/illumos-gate@43466aae47
NOTE: Make sure the boot code is updated if a zpool upgrade is
done on boot zpool.
MFC after: 2 weeks
3306 zdb should be able to issue reads in parallel
3321 'zpool reopen' command should be documented in the man page
and help message
illumos/illumos-gate@31d7e8fa33
FreeBSD porting notes: the kernel part of this changeset depends
on Solaris buf(9S) interfaces and are not really applicable for
our use. vdev_disk.c is patched as-is to reduce diverge from
upstream, but vdev_file.c is left intact.
MFC after: 2 weeks
SHT_RELA sections properly instead of assuming that the relocation section
is of type SHT_REL.
Submitted by: Prashanth Kumar <pra_udupi@yahoo.co.in> (original version)
MFC after: 1 month
4171 clean up spa_feature_*() interfaces
4172 implement extensible_dataset feature for use by other zpool
features
illumos/illumos-gate@2acef22db7
MFC after: 2 weeks
4168 ztest assertion failure in dbuf_undirty
4169 verbatim import causes zdb to segfa
4170 zhack leaves pool in ACTIVE state
illumos/illumos-gate@7fdd916c47
MFC after: 2 weeks
(64MB). Even if we would find one somehow, ZFS kernel code rejects such
devices. It is funny to look on attempts to read 4 256K vdev labels from
1.44MB floppy, though it is not very practical and quite slow.
expect the installed ksh binary to be named "ksh", which is not the case
when it's installed on FreeBSD via the shells/ksh93 port. Allow for it to be
"ksh93" as well so that the tests can actually pass.
4101 metaslab_debug should allow for fine-grained control
4102 space_maps should store more information about themselves
4103 space map object blocksize should be increased
4104 ::spa_space no longer works
4105 removing a mirrored log device results in a leaked object
4106 asynchronously load metaslab
illumos/illumos-gate@0713e232b7
Note that some tunables have been removed and some new tunables have
been added. Of particular note, FreeBSD-only knob
vfs.zfs.space_map_last_hope is removed as it was a nop for some time now
(after one of the previous merges from upstream).
MFC after: 11 days
Sponsored by: HybridCluster [merge]
illumos/illumos-gate@69962b5647
Please note the following changes:
- zio_ioctl has lost its priority parameter and now TRIM is executed
with 'now' priority
- some knobs are gone and some new knobs are added; not all of them are
exposed as tunables / sysctls yet
MFC after: 10 days
Sponsored by: HybridCluster [merge]
943 zio_interrupt ends up calling taskq_dispatch with TQ_SLEEP
illumos/illumos-gate@5aeb94743e
Essentially FreeBSD taskqueues already operate in a mode that
was added to Illumos with taskq_dispatch_ent change.
We even exposed the superior FreeBSD interface as taskq_dispatch_safe.
Now we just rename taskq_dispatch_safe to taskq_dispatch_ent and
struct struct ostask to taskq_ent_t, so that code differences will be
minimal.
After this change sys/cddl/compat/opensolaris/sys/taskq.h header is no
longer needed.
Note that this commit is not an MFV because the upstream change was not
individually committed to the vendor area.
MFC after: 8 days
On some architectures (powerpc), char is unsigned by default, which means
comparisons against -1 always fail, so the programs get stuck in an
infinite loop.
MFC after: 1 week
to ILP32. Otherwise dtrace -G will attempt to use it on amd64 if it can't
determine which data model to use, which happens when -64 is omitted and
no object files are provided, e.g. with
# dtrace -G -n BEGIN
This would result in a linker error, but now works properly.
Also remove an unnecessary #ifdef.
MFC after: 2 weeks
installed. Additionally, remove Solaris-specific sections and references,
and replace example outputs with output from lockstat on FreeBSD, since
lockstat's output contains stack traces.
This change also removes some examples that don't seem to work properly on
FreeBSD. The examples should be re-added when lockstat is fixed.
Reported by: avg
MFC after: 1 week
available under Oracle Solaris 11.
This includes an update to the ZFS(8) man page to reflect all the
available alias (snap, umount, and recv).
Initial changes obtained from ZFS On Linux + fixes for man page and cmd
help:
10b75496bbcf81b00a73
Obtained from: https://github.com/zfsonlinux/zfs
MFC after: 2 weeks
Sponsored by: Multiplay
null-separated strings to a single string. This can be used to print the
full arguments of a process using execsnoop (from the DTrace toolkit) or
with the following one-liner:
dtrace -n 'syscall::execve:return {trace(curpsinfo->pr_psargs);}'
Note that this relies on the process arguments being cached via the struct
proc, which means that it will not work for argvs longer than
kern.ps_arg_cache_limit. However, the following rather non-portable
script can be used to extract any argv at exec time:
fbt::kern_execve:entry
{
printf("%s", memstr(args[1]->begin_argv, ' ',
args[1]->begin_envv - args[1]->begin_argv));
}
The debug.dtrace.memstr_max sysctl limits the maximum argument size to
memstr(). Thanks to Brendan Gregg for helpful comments on freebsd-dtrace.
Tested by: Fabian Keil (earlier version)
MFC after: 2 weeks