Use calloc() instead of malloc() + bzero(). This also gets rid of a warning
because bzero is defined by strings.h which is not included in thread_pool.c.
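As a minimal sketch of the change described above (the struct and function names here are hypothetical; the real code in thread_pool.c is not shown in this log), the malloc() + bzero() pair collapses into one calloc() call that needs only stdlib.h:

```c
#include <stdlib.h>

/* Hypothetical job record; the real structure in thread_pool.c is not shown. */
struct job {
	int id;
	void *arg;
	struct job *next;
};

/*
 * Before (two steps, and bzero() needs <strings.h> to be declared):
 *	jobs = malloc(n * sizeof(*jobs));
 *	bzero(jobs, n * sizeof(*jobs));
 *
 * After: a single call that returns zero-filled memory.
 */
struct job *
alloc_jobs(size_t n)
{
	return (calloc(n, sizeof(struct job)));
}
```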
r264400:
NO_MAN= has been deprecated in favor of MAN= for some time, go ahead
and finish the job. ncurses is now the only Makefile in the tree that
uses it since it wasn't a simple mechanical change, and will be
addressed in a future commit.
r265836:
Remove last two NO_MAN= in the tree. In both of these cases, MAN= is
what is needed.
Treat D keywords as identifiers in certain postfix expressions. This allows
one to, for example, access the "provider" field of a struct g_consumer,
even though "provider" is a D keyword.
PR: 169657
The module load address always needs to be included when setting the dm_*_va
fields of dt_module_t. Previously, this was only done on architectures where
kernel modules have type ET_REL; this change fixes that. As a result, symbol
name resolution in the stack() action now works properly for kernel modules
on i386.
ZFS large block support. The default recordsize remains at 128KB.
A new tunable/sysctl variable, vfs.zfs.max_recordsize is added to
allow adjusting the permitted maximum record size, or
zfs_max_recordsize, with a default of 1MB. ZFS will not allow
setting recordsize greater than zfs_max_recordsize as a safety
belt, because larger recordsize means greater read and write
latency and more memory usage.
Please note that booting from datasets that have recordsize greater
than 128KB is not supported (but it is okay to enable the feature on
the pool).
A limited safety belt is provided for the mounted root filesystem, but
use caution when using a larger value.
Illumos issue:
5027 zfs large block support
Improve zdb -b performance:
- Reduce gethrtime() calls to one per 100 blkptrs;
- Skip manipulating the size-ordered tree;
- Issue more (10, previously 3) async reads;
- Use lighter weight testing in traverse_visitbp();
Illumos issue:
5243 zdb -b could be much faster
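The first item in the list above, reading the clock for only one block pointer in every hundred, can be sketched as follows. This is an illustration under stated assumptions, not the zdb code: the struct, the SAMPLE_EVERY constant, and the function name are hypothetical, and the standard C clock() stands in for gethrtime().

```c
#include <stdint.h>
#include <time.h>

#define	SAMPLE_EVERY	100	/* hypothetical: one clock read per 100 visits */

struct progress {
	uint64_t visited;	/* block pointers seen so far */
	uint64_t clock_reads;	/* how many times the clock was read */
	clock_t last;		/* most recent clock sample */
};

/* Called once per block pointer; reads the clock only on every 100th call. */
void
visit_block(struct progress *p)
{
	if (p->visited % SAMPLE_EVERY == 0) {
		p->last = clock();	/* stand-in for gethrtime() */
		p->clock_reads++;
	}
	p->visited++;
}
```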
Apply upstream 13597:3eac1e8e0f4c (git: illumos-gate@aa846ad9):
Initialize tqent_flags in the userland taskq implementation. Without
this the assertion of tq->tq_freelist != NULL may fail in taskq_destroy.
The problem is that tqent_flags is never initialized in the userland
implementation while the kernel one does initialize it. Without proper
initialization, the flag may have its lowest bit set, making it treated
as TQENT_FLAG_PREALLOC and never removing taskq_ent_t from tq_freelist.
- Limit ARC for zdb at 256MB. zdb does not typically revisit data
in the ARC.
- Increase default max_inflight from 200 to 1000 (can be overridden
by -I) so we can queue more I/Os when doing scrubbing.
- Print status while loading metaslabs for leak detection.
Illumos issues:
5169 zdb should limit its ARC size
5170 zdb -c should create more scrub i/os by default
5171 zdb should print status while loading metaslabs for leak detection
Iterate through all the children instead of returning an error when we
hit the first failure. This makes the error message report every device
that causes a problem rather than just the first one.
Illumos issue:
5118 When verifying or creating a storage pool, error messages only
show one device
Approved by: re (gjb)
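A minimal sketch of the pattern described in this change: check every child and collect the failures, rather than returning at the first one. The struct and function names are hypothetical stand-ins, not the zpool code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vdev child; nonzero err means the check failed. */
struct child {
	const char *name;
	int err;
};

/*
 * Check every child and record each failing name in bad[], which must have
 * room for n entries. Returns the number of failures instead of stopping
 * at the first one.
 */
size_t
check_all(const struct child *c, size_t n, const char **bad)
{
	size_t nbad = 0;

	for (size_t i = 0; i < n; i++) {
		if (c[i].err != 0)
			bad[nbad++] = c[i].name;	/* keep going */
	}
	return (nbad);
}
```

The caller can then print every entry of bad[] in a single error message.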
Preserve the errno value of an ioctl before calling free(3). Previously,
errno was very occasionally being clobbered, resulting in a bogus error from
dt_consume() and thus an error from dtrace(1).
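The errno fix above follows a common C idiom: save errno before any cleanup call and restore it afterwards. A minimal sketch, with a hypothetical helper name rather than the libdtrace code:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: release buf after a failed call while keeping the
 * errno value that describes the failure. Returns rv unchanged.
 */
int
fail_and_free(void *buf, int rv)
{
	int saved = errno;	/* capture before free() can change it */

	free(buf);
	errno = saved;		/* restore for the caller */
	return (rv);
}
```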
Don't try to use the 32-bit drti.o unless the data model is explicitly set
to ILP32. Otherwise dtrace -G will attempt to use it on amd64 if it can't
determine which data model to use, which happens when -64 is omitted and
no object files are provided, e.g. with
# dtrace -G -n BEGIN
This would result in a linker error, but now works properly.
Also remove an unnecessary #ifdef.
Import Illumos changes to address the following Illumos issues:
4976 zfs should only avoid writing to a failing non-redundant
top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature
is enabled
4984 device selection should use fragmentation metric
Don't report non-native block-size pools under zpool status -x
zpool status -x is used to identify pools that are exhibiting
errors or are otherwise unavailable, therefore non-native
block-size pools shouldn't be reported.
Also update man page to clarify other additional conditions
which won't cause a pool to be displayed under zpool status -x.
Sponsored by: Multiplay
Re-apply r248644. This fixes an annoying problem which caused dtrace -c to
fail to attach to stripped binaries. With the _r_debug_postinit symbol,
dtrace(1) can now set a breakpoint in the victim process after it has
registered its DOF table(s) with the kernel. r_debug_state cannot be used
for this purpose since it is called before DOF is made available, in which
case dtrace(1) cannot create USDT probes before the program begins
execution.
Add a function, memstr, which can be used to convert a buffer of
null-separated strings to a single string. This can be used to print the
full arguments of a process using execsnoop (from the DTrace toolkit) or
with the following one-liner:
dtrace -n 'syscall::execve:return {trace(curpsinfo->pr_psargs);}'
Note that this relies on the process arguments being cached via the struct
proc, which means that it will not work for argvs longer than
kern.ps_arg_cache_limit. However, the following rather non-portable
script can be used to extract any argv at exec time:
fbt::kern_execve:entry
{
	printf("%s", memstr(args[1]->begin_argv, ' ',
	    args[1]->begin_envv - args[1]->begin_argv));
}
The debug.dtrace.memstr_max sysctl limits the maximum argument size to
memstr().
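To illustrate the conversion memstr performs, here is a userland C sketch of the idea. It is not the DTrace implementation: the function name is hypothetical, and dropping the separator for a trailing NUL is an assumption made here for readability.

```c
#include <stddef.h>

/*
 * Copy len bytes from src to dst, replacing each NUL with sep, then
 * terminate the result. dst must hold at least len + 1 bytes.
 */
void
join_nul_separated(char *dst, const char *src, size_t len, char sep)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = (src[i] == '\0') ? sep : src[i];
	/* Assumption: drop the separator produced by a final NUL, if any. */
	if (len > 0 && src[len - 1] == '\0')
		i--;
	dst[i] = '\0';
}
```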
Instead of asserting that all zio's are properly aligned, only assert
on the logical ones.
Cap uberblocks at 8k, otherwise with ashift=17, there would be
only one uberblock.
This fixes a problem where zdb would trip an assert on pools with
ashift >= 0xe (8k).
While there, also change the code so that it only attempts to condense
a space map when its uncondensed size consumes more than
zfs_metaslab_condense_block_threshold blocks.
Illumos issue:
4958 zdb trips assert on pools with ashift >= 0xe
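The arithmetic behind the 8k uberblock cap above can be sketched as follows. The constants are assumptions for illustration: a 128 KB uberblock ring (which matches the statement that ashift=17 would leave only one uberblock) and a 1k minimum uberblock size. The names are hypothetical.

```c
#include <stdint.h>

#define	RING_SIZE	(128u * 1024)	/* assumed uberblock ring size */
#define	MIN_UB_SHIFT	10		/* assumed 1k minimum uberblock */
#define	MAX_UB_SHIFT	13		/* the 8k cap described above */

/* Number of uberblocks that fit in the ring for a given ashift. */
uint32_t
uberblock_count(unsigned ashift, int capped)
{
	unsigned shift = ashift < MIN_UB_SHIFT ? MIN_UB_SHIFT : ashift;

	if (capped && shift > MAX_UB_SHIFT)
		shift = MAX_UB_SHIFT;
	return (RING_SIZE >> shift);
}
```

Under these assumptions, ashift=17 yields a single uberblock without the cap and 16 with it.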
When our linker merges .SUNW_dof sections from multiple files, it simply
concatenates the DOF tables into one section. Previously, the USDT init
code in drti.o would only look at the first table in the DOF section; with
this change, it iterates over all the tables, passing each DOF table to
the kernel.
PR: 186821
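Walking a section that holds several concatenated tables, as the drti.o change above does, can be sketched like this. The header layout is a hypothetical simplification (a single size field); the real DOF header has more fields, and the callback stands in for handing each table to the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified table header; the real DOF header has more fields. */
struct tbl_hdr {
	uint64_t size;	/* total bytes in this table, header included */
};

/* Call fn once per table in the concatenated section; return the count. */
size_t
for_each_table(const unsigned char *sec, size_t seclen,
    void (*fn)(const unsigned char *, uint64_t))
{
	size_t off = 0, n = 0;

	while (seclen - off >= sizeof(struct tbl_hdr)) {
		struct tbl_hdr h;

		memcpy(&h, sec + off, sizeof(h));	/* avoid unaligned reads */
		if (h.size < sizeof(h) || h.size > seclen - off)
			break;				/* malformed: stop */
		if (fn != NULL)
			fn(sec + off, h.size);
		off += h.size;
		n++;
	}
	return (n);
}
```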
Improve extreme rewind import.
When doing an "extreme rewind" import ("zpool import -XF"), we attempt
to verify all data in the pool, essentially scrubbing the entire pool.
The problem is that spa_load_verify_cb() issues an unbounded number of
concurrent scrub i/os. This can lead to all of memory being used for
these zio's, wedging the system. Like normal scrub, we need to put a
cap on the number of outstanding i/os, and have the traverse thread
block when we reach this cap.
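The cap on outstanding i/os described above is a standard bounded-concurrency pattern. A minimal userland sketch with POSIX threads follows; the struct and function names are hypothetical and this is not the ZFS kernel code, which uses its own locking primitives.

```c
#include <pthread.h>

/* Hypothetical limiter; names are illustrative, not the ZFS ones. */
struct io_limiter {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	unsigned inflight;	/* i/os currently outstanding */
	unsigned max;		/* cap on outstanding i/os */
};

void
limiter_init(struct io_limiter *l, unsigned max)
{
	pthread_mutex_init(&l->lock, NULL);
	pthread_cond_init(&l->cv, NULL);
	l->inflight = 0;
	l->max = max;
}

/* Called by the traversal thread before issuing an i/o; blocks at the cap. */
void
limiter_enter(struct io_limiter *l)
{
	pthread_mutex_lock(&l->lock);
	while (l->inflight >= l->max)
		pthread_cond_wait(&l->cv, &l->lock);
	l->inflight++;
	pthread_mutex_unlock(&l->lock);
}

/* Called from the i/o completion path; wakes a blocked issuer. */
void
limiter_exit(struct io_limiter *l)
{
	pthread_mutex_lock(&l->lock);
	l->inflight--;
	pthread_cond_signal(&l->cv);
	pthread_mutex_unlock(&l->lock);
}
```

The traversal thread would call limiter_enter() before each i/o, and each completion callback would call limiter_exit().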
For this purpose the cap can be very large (10,000) to optimize the
elevator algorithm. Three kernel tunables have been added:
vfs.zfs.spa_load_verify_maxinflight
vfs.zfs.spa_load_verify_metadata
vfs.zfs.spa_load_verify_data
The latter two tunables control whether metadata and/or user data are
verified when doing an extreme rewind.
Make 'zpool import -T' imply scrub.
Make zpool import -T <txg> accept hexadecimal values for the txg when
prefixed with 0x.
Skip txg's for which there is no uberblock when doing extreme rewind.
Skip reading all user data twice by skipping prefetches when doing
extreme rewinds as we do not access via the ARC.
Illumos issues:
4970 need controls on i/o issued by zpool import -XF
4971 zpool import -T should accept hex values
4972 zpool import -T implies extreme rewind, and thus a scrub
4973 spa_load_retry retries the same txg
4974 spa_load_verify() reads all data twice
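For the hexadecimal txg handling in zpool import -T (issue 4971 above), one common way to accept both decimal and 0x-prefixed input in C is strtoull() with base 0. The sketch below is an assumption about the approach, not a copy of the zpool source, and parse_txg is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a txg argument. Base 0 accepts decimal and 0x-prefixed hex
 * (and, as a side effect, treats a leading 0 as octal).
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int
parse_txg(const char *arg, uint64_t *txg)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(arg, &end, 0);
	if (errno != 0 || end == arg || *end != '\0')
		return (-1);
	*txg = v;
	return (0);
}
```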