Commit Graph

517 Commits

Author SHA1 Message Date
delphij
81242229b8 MFC r271527: MFV r271511:
Use fnvlist_* to make code more readable.

Illumos issue:
    5135 zpool_find_import_cached() can use fnvlist_*

Approved by:	re (gjb)
2014-10-02 22:16:00 +00:00
delphij
fae0398507 MFC r271227: MFV r271225:
Iterate through all the children instead of returning error when we hit
the first error.  This makes the error message give more information
rather than just the first device that causes a problem.

Illumos issue:
    5118 When verifying or creating a storage pool, error messages only
       show one device

Approved by:	re (gjb)
2014-09-25 21:45:07 +00:00
smh
971865d1c7 MFC r271934:
Output boot code warning when zpool upgrade -a is used to add features.

PR:		188328
Approved by:	re (marius)
Sponsored by:	Multiplay
2014-09-24 09:59:48 +00:00
delphij
81bb44f925 MFC r271222:
Fix typo.

Submitted by:	Dmitry Morozovsky <marck rinet ru>
Approved by:	re (gjb)
2014-09-10 13:13:30 +00:00
markj
c044c8f131 MFC r269524:
Preserve the errno value of an ioctl before calling free(3). Previously,
errno was very occasionally being clobbered, resulting in a bogus error from
dt_consume() and thus an error from dtrace(1).
2014-08-20 14:57:55 +00:00
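The errno fix in c044c8f131 follows a common C pattern: save errno right after the failing call, release resources, then restore it. A minimal sketch of that pattern follows; the helper name `ioctl_and_free` is hypothetical and is not the libdtrace code.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: issue an ioctl whose argument lives in a
 * heap buffer, free the buffer, and keep the ioctl's errno intact.
 */
int
ioctl_and_free(int fd, unsigned long request, void *arg)
{
	int rv, saved_errno;

	rv = ioctl(fd, request, arg);
	saved_errno = errno;	/* capture before free(3) can change it */
	free(arg);
	errno = saved_errno;	/* callers now see the ioctl's error */
	return (rv);
}
```

Callers can then test the return value and inspect errno as if free(3) had never run.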
delphij
f0e0389097 MFC r269430: MFV r269426:
Double test device size for ztest(1).

Illumos issue:
    5039 ztest should default to larger device sizes
    Author: Matthew Ahrens <mahrens@delphix.com>
2014-08-18 05:13:46 +00:00
pfg
5b8c39f44b MFC r267875:
4251 libdtrace leaks open file handles

Illumos commit:		93ed8d0d4b068b95d0bb50d57bb854df462a8485
			(partial)
Reference:
https://www.illumos.org/issues/4251

Discussed with:	Robert Mustacchi
Obtained from:	Illumos
2014-08-16 00:52:13 +00:00
markj
599cdb2d59 MFC r257877:
Don't try to use the 32-bit drti.o unless the data model is explicitly set
to ILP32. Otherwise dtrace -G will attempt to use it on amd64 if it can't
determine which data model to use, which happens when -64 is omitted and
no object files are provided, e.g. with

# dtrace -G -n BEGIN

This would result in a linker error, but now works properly.

Also remove an unnecessary #ifdef.
2014-08-14 16:45:01 +00:00
rpaulo
bd71cae4c3 MFC r269776
Remove the BROKEN_LIBELF section.
2014-08-13 06:41:06 +00:00
delphij
63a479a6fb MFC r269229,269404,269466: MFV r269223:
Change dn->dn_dbufs from linked list to AVL tree.

Illumos issues:
  4873 zvol unmap calls can take a very long time for larger datasets
2014-08-12 00:53:03 +00:00
delphij
c14fd95fbc MFC r269118: MFV r269010:
Import Illumos changes to address the following Illumos issues:
  4976 zfs should only avoid writing to a failing non-redundant
       top-level vdev
  4978 ztest fails in get_metaslab_refcount()
  4979 extend free space histogram to device and pool
  4980 metaslabs should have a fragmentation metric
  4981 remove fragmented ops vector from block allocator
  4982 space_map object should proactively upgrade when feature
       is enabled
  4984 device selection should use fragmentation metric
2014-08-10 05:58:41 +00:00
delphij
2e7e16d0c1 MFC r269100:
Diff reduction against Illumos.
2014-08-08 19:14:49 +00:00
delphij
3247b8806e MFC r268621 (smh) + r268625:
Don't report non-native block-size pools under zpool status -x

zpool status -x is used to identify pools that are exhibiting
errors or are otherwise unavailable, therefore non-native
block-size pools shouldn't be reported.

Also update man page to clarify other additional conditions
which won't cause a pool to be displayed under zpool status -x.

Sponsored by:   Multiplay
2014-08-08 19:11:23 +00:00
delphij
ccb7d1d4f5 MFC the cddl/contrib/opensolaris/cmd/zpool portion of r267803 (joel):
mdoc: remove superfluous paragraph macros.
2014-08-08 19:06:24 +00:00
markj
c189d6694c MFC r265631:
Re-apply r248644. This fixes an annoying problem which caused dtrace -c to
fail to attach to stripped binaries. With the _r_debug_postinit symbol,
dtrace(1) can now set a breakpoint in the victim process after it has
registered its DOF table(s) with the kernel. r_debug_state cannot be used
for this purpose since it is called before DOF is made available, in which
case dtrace(1) cannot create USDT probes before the program begins
execution.
2014-08-08 15:21:43 +00:00
markj
2fd28e2373 MFC r256571:
Add a function, memstr, which can be used to convert a buffer of
null-separated strings to a single string. This can be used to print the
full arguments of a process using execsnoop (from the DTrace toolkit) or
with the following one-liner:

dtrace -n 'syscall::execve:return {trace(curpsinfo->pr_psargs);}'

Note that this relies on the process arguments being cached via the struct
proc, which means that it will not work for argvs longer than
kern.ps_arg_cache_limit. However, the following rather non-portable
script can be used to extract any argv at exec time:

fbt::kern_execve:entry
{
    printf("%s", memstr(args[1]->begin_argv, ' ',
        args[1]->begin_envv - args[1]->begin_argv));
}

The debug.dtrace.memstr_max sysctl limits the maximum argument size to
memstr().
2014-08-04 15:36:22 +00:00
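The conversion that 2fd28e2373 describes for memstr can be illustrated in userland C. This is only a sketch of the idea, assuming the separator replaces each interior NUL; the function name `join_nul_separated` is hypothetical and this is not the DTrace implementation.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userland analogue of memstr(): copy len bytes from buf,
 * replacing each NUL separator with sep, and return one C string.
 * The caller frees the result.
 */
char *
join_nul_separated(const char *buf, char sep, size_t len)
{
	char *out;
	size_t i;

	out = malloc(len + 1);
	if (out == NULL)
		return (NULL);
	for (i = 0; i < len; i++)
		out[i] = (buf[i] == '\0') ? sep : buf[i];
	/* A trailing NUL in the input ends the string instead of adding sep. */
	if (len > 0 && buf[len - 1] == '\0')
		out[len - 1] = '\0';
	out[len] = '\0';
	return (out);
}
```

Given the 11-byte buffer `"ls\0-l\0/tmp\0"` and a space separator, the function returns `"ls -l /tmp"`.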
delphij
6a949e106d MFC r268855: MFV r268848:
Instead of asserting all zio's be properly aligned, only assert
on the logical ones.

Cap uberblocks at 8k, otherwise with ashift=17, there would be
only one uberblock.

This fixes a problem that zdb would trip assert on pools with
ashift >= 0xe (8k).

While there, also change the code so it only attempts to condense the
space map if the uncondensed size consumes more than
zfs_metaslab_condense_block_threshold blocks.

Illumos issue:
  4958 zdb trips assert on pools with ashift >= 0xe
2014-08-02 03:56:06 +00:00
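The uberblock remark in 6a949e106d is easy to check with arithmetic. The sketch below assumes a 128K uberblock ring per label and a 1K minimum uberblock size; both constants and the function name are assumptions made for illustration, and only the 8k cap comes from the commit message.

```c
#include <stdint.h>

#define	RING_SIZE		(128ULL << 10)	/* assumed: 128K ring per label */
#define	MIN_UBERBLOCK_SHIFT	10		/* assumed: 1K minimum size */
#define	MAX_UBERBLOCK_SHIFT	13		/* 8k cap from the commit message */

/*
 * Number of uberblock slots in the ring for a given ashift,
 * with or without the 8k cap applied.
 */
uint64_t
uberblock_count(unsigned ashift, int capped)
{
	unsigned shift = ashift;

	if (shift < MIN_UBERBLOCK_SHIFT)
		shift = MIN_UBERBLOCK_SHIFT;
	if (capped && shift > MAX_UBERBLOCK_SHIFT)
		shift = MAX_UBERBLOCK_SHIFT;
	return (RING_SIZE >> shift);
}
```

Under these assumptions, ashift=17 yields a single slot without the cap and 16 slots with it, which matches the reasoning in the message.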
markj
c7af226b37 MFC r264486:
Use the correct format specifiers for wide characters and strings of wide
characters.
2014-07-29 21:21:16 +00:00
markj
2bf1f15393 MFC r262669:
When our linker merges .SUNW_dof sections from multiple files, it simply
concatenates the DOF tables into one section. Previously, the USDT init
code in drti.o would only look at the first table in the DOF section; with
this change, it iterates over all the tables, passing each DOF table to
the kernel.

PR:	186821
2014-07-29 18:31:27 +00:00
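The change in 2bf1f15393 amounts to walking every table in a concatenated section instead of stopping after the first. A minimal sketch of such a walk follows, assuming each table starts with a header that records its own total size. The `table_hdr_t` layout and the function name are hypothetical; the real DOF header differs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: the first field is the table's total size in bytes. */
typedef struct table_hdr {
	uint64_t th_size;
} table_hdr_t;

typedef void (*table_cb_t)(const uint8_t *table, size_t len, void *arg);

/*
 * Visit each table in a section made of back-to-back tables.
 * Returns the number of tables visited; stops at a malformed header.
 */
size_t
for_each_table(const uint8_t *sec, size_t seclen, table_cb_t cb, void *arg)
{
	table_hdr_t hdr;
	size_t off = 0, n = 0;

	while (seclen - off >= sizeof(hdr)) {
		memcpy(&hdr, sec + off, sizeof(hdr));
		if (hdr.th_size < sizeof(hdr) || hdr.th_size > seclen - off)
			break;	/* size is too small or runs past the section */
		if (cb != NULL)
			cb(sec + off, (size_t)hdr.th_size, arg);
		off += (size_t)hdr.th_size;
		n++;
	}
	return (n);
}
```

In the scenario the commit describes, the callback would be the step that hands one table to the kernel.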
delphij
2c88e211d5 MFC r268720: MFV r268714:
Improve extreme rewind import.

When doing an "extreme rewind" import ("zpool import -XF"), we attempt
to verify all data in the pool, essentially scrubbing the entire pool.
The problem is that spa_load_verify_cb() issues an unbounded number of
concurrent scrub i/os.  This can lead to all of memory being used for
these zio's, wedging the system. Like normal scrub, we need to put a
cap on the number of outstanding i/os, and have the traverse thread
block when we reach this cap.

For this purpose the cap can be very large (10,000) to optimize the
elevator algorithm.  Three kernel tunables have been added:

	vfs.zfs.spa_load_verify_maxinflight
	vfs.zfs.spa_load_verify_metadata
	vfs.zfs.spa_load_verify_data

The latter two tunables control whether metadata and/or user data are
verified when doing extreme rewind.

Make 'zpool import -T' imply scrub.

Make zpool import -T <txg> accept hexadecimal values for the txg when
prefixed with 0x.
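Accepting a 0x prefix for the txg can be sketched with strtoull(3). This is an illustration only: the helper name `parse_txg` is hypothetical, and whether zpool(8) selects the base this way is an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: decimal by default, hexadecimal when the
 * string starts with 0x or 0X. Returns 0 on success, -1 on error.
 */
int
parse_txg(const char *s, uint64_t *txg)
{
	char *end;
	unsigned long long v;
	int base = 10;

	if (strncmp(s, "0x", 2) == 0 || strncmp(s, "0X", 2) == 0)
		base = 16;
	errno = 0;
	v = strtoull(s, &end, base);
	/* Reject overflow, empty input, and trailing garbage. */
	if (errno != 0 || end == s || *end != '\0')
		return (-1);
	*txg = (uint64_t)v;
	return (0);
}
```

With this sketch, "0x1f" and "31" both parse to 31, while "12ab" is rejected.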

Skip txg's for which there is no uberblock when doing extreme rewind.

Skip reading all user data twice by skipping prefetches when doing
extreme rewinds as we do not access via the ARC.

Illumos issues:
  4970 need controls on i/o issued by zpool import -XF
  4971 zpool import -T should accept hex values
  4972 zpool import -T implies extreme rewind, and thus a scrub
  4973 spa_load_retry retries the same txg
  4974 spa_load_verify() reads all data twice
2014-07-29 05:49:16 +00:00
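The in-flight cap described in 2c88e211d5 is a classic bounded-counter pattern: the issuing thread blocks once the limit is reached, and each completion wakes it. A minimal pthread sketch follows. The type and function names are hypothetical, and the kernel code uses its own locking primitives rather than pthreads.

```c
#include <pthread.h>

/* Hypothetical cap on the number of outstanding I/Os. */
typedef struct inflight_cap {
	pthread_mutex_t	ic_lock;
	pthread_cond_t	ic_cv;
	unsigned	ic_inflight;
	unsigned	ic_max;
} inflight_cap_t;

void
inflight_cap_init(inflight_cap_t *ic, unsigned max)
{
	pthread_mutex_init(&ic->ic_lock, NULL);
	pthread_cond_init(&ic->ic_cv, NULL);
	ic->ic_inflight = 0;
	ic->ic_max = max;
}

/* Called by the issuing thread before it starts one I/O. */
void
inflight_cap_enter(inflight_cap_t *ic)
{
	pthread_mutex_lock(&ic->ic_lock);
	while (ic->ic_inflight >= ic->ic_max)
		pthread_cond_wait(&ic->ic_cv, &ic->ic_lock);
	ic->ic_inflight++;
	pthread_mutex_unlock(&ic->ic_lock);
}

/* Called from the I/O completion path. */
void
inflight_cap_exit(inflight_cap_t *ic)
{
	pthread_mutex_lock(&ic->ic_lock);
	ic->ic_inflight--;
	pthread_cond_signal(&ic->ic_cv);
	pthread_mutex_unlock(&ic->ic_lock);
}
```

A large limit, such as the 10,000 mentioned above, still bounds memory use while leaving the I/O scheduler plenty of requests to sort.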
delphij
70f7d126e3 MFC r268473: MFV r268455:
Use reserved space for ZFS administrative commands.
2014-07-23 00:49:35 +00:00
delphij
776d6cbee6 MFC r260156: MFV r260152:
4208 Typo in zfs_main.c: "posxiuser"
2014-07-23 00:46:56 +00:00
delphij
126b8ac4b4 MFC r268470: MFV r268454:
Refresh zpool list for each interval in order to produce fresh
output.

Illumos issue: 4966 zpool list iterator does not update output
2014-07-23 00:41:11 +00:00
delphij
3750365a9c MFC r268469: MFV r268453:
Diff reduction against Illumos.
2014-07-23 00:38:23 +00:00
delphij
adc65d02d1 MFC r268116:
- Fix handling of "new" style of ioctl in compatibility mode [1];
- Reorganize code and reduce diff from upstream;
- Improve forward compatibility shims for previous kernel.

Reported by:    sbruno [1]
2014-07-17 05:20:18 +00:00
delphij
4af6f088fb MFC r268126: MFV r268121:
4924 LZ4 Compression for metadata
2014-07-15 05:42:09 +00:00
delphij
5cce10db7b MFC r268123: MFV r268119:
4914 zfs on-disk bookmark structure should be named *_phys_t
2014-07-15 05:39:22 +00:00
delphij
4d09e20b95 MFC r268086: MFV r267570:
4756 metaslab_group_preload() could deadlock
2014-07-15 05:36:26 +00:00
delphij
bfdd43f2b5 MFC r268084: MFV r267568:
4891 want zdb option to dump all metadata
2014-07-15 05:28:58 +00:00
delphij
91643324a9 MFC r268079: MFV r267566:
4390 i/o errors when deleting filesystem/zvol can lead to space map
     corruption
2014-07-15 05:00:46 +00:00
delphij
9d1dc5bcc9 MFC r268075: MFV r267565:
4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks
2014-07-15 04:53:34 +00:00
delphij
5ed76404fe MFC r260142: MFV r258972:
4373 add block contents print to zstreamdump
2014-07-15 04:44:06 +00:00
delphij
38554d6064 MFC r266771: MFV r266766:
Add a new zfs property, "redundant_metadata" which can have values "all" or
"most".  The default will be "all", which is the current behavior.  When set
to all, ZFS stores an extra copy of all metadata.  If a single on-disk block
is corrupt, at worst a single block of user data (which is recordsize bytes
long) can be lost.

Setting to "most" will cause us to only store 1 copy of level-1 indirect
blocks of user data files.  This can improve performance of random writes,
because less metadata has to be written.  In practice, at worst about
100 blocks (of recordsize bytes each) of user data can be lost if a single
on-disk block is corrupt.

The exact behavior of which metadata blocks are stored redundantly may change
in future releases.

Illumos issue: 3835 zfs need not store 2 copies of all metadata
2014-07-15 04:39:55 +00:00
delphij
7afd1db032 MFC r267572: MFV r249332 (illumos-gate 14005:55fc53126003)
Illumos ZFS issues:
  3654 zdb should print number of ganged blocks
2014-07-15 04:33:11 +00:00
rpaulo
0cff4b03dd MFC 267929, 267937, 267939, 267940, 267941, 267942, 267987, 268006:
2915 DTrace in a zone should see "cpu", "curpsinfo", et al
2916 DTrace in a zone should be able to access fds[]
2917 DTrace in a zone should have limited provider access
4477 DTrace should speak JSON
Add stubs for CTF functions which are not yet implemented.
4474 DTrace Userland CTF Support
4475 DTrace userland Keyword
4476 DTrace tests should be better citizens
4479 pid provider types
4480 dof emulation is missing checks
4471 DTrace count() with histogram
4472 DTrace full width distribution histograms
4473 DTrace frequency trails
2014-07-12 22:56:41 +00:00
pfg
93d41081e3 MFC r267513:
Merge from r258379 missed the tests.

4248 dtrace(1M) should never create DOF with empty probes section
4249 Only probes from the first DTrace object file will be included

Illumos Revision:	54a20ab41aadcb81c53e72fc65886e964e9add59
2014-06-20 15:40:13 +00:00
delphij
c49d771309 MFC r266520:
Explicitly link libzfs against libavl as it is done in OpenSolaris
(4543:12bb2876a62e).  Without this, some third party applications
may break because of the lack of AVL related symbols.

The FreeBSD base system is not affected because the FreeBSD ZFS command
line tools are all linked against libavl, which hides the underlying
issue.

PR:	bin/183081
2014-05-27 18:22:52 +00:00
markj
644a04942b MFC r262329:
Define the KM_NORMALPRI flag for kmem_alloc(), as it is used in some
upstream DTrace code.

MFC r262330:
1452 DTrace buffer autoscaling should be less violent

illumos/illumos-gate@6fb4854bed
2014-05-25 18:19:57 +00:00
mav
058b9b78b9 MFC r265689:
Import adapted OpenSolaris' thread pool API implementation.

The thread pool is used by libzfs to implement parallel disk scanning.
Without this change our dummy wrapper made the `zpool import ZZZ` command
scan all disks sequentially from a single thread when searching for pools.
This change makes it use two threads per CPU, same as in OpenSolaris.

On a system with 200 HDDs this change reduces ZFS pool import time from 35
to 22 seconds.
2014-05-24 10:44:40 +00:00
mav
54ed85cbfe MFC r265821:
Comment out some pointless device open/close around reading device IDs.

The FreeBSD ZFS port, unlike OpenSolaris, does not use device IDs and does
not implement the respective devid_*() functions.  It is pointless to open
devices just to close them again immediately.
2014-05-24 10:41:37 +00:00
delphij
7cb0f49ed2 MFC r264835 (MFV r264829):
3897 zfs filesystem and snapshot limits
2014-05-09 07:12:31 +00:00
delphij
0fce5e81cc MFC r264669: MFV r264666:
4374 dn_free_ranges should use range_tree_t

illumos/illumos-gate@bf16b11e8d
2014-05-09 06:56:26 +00:00
mav
44963562b0 MFC r264145:
Add property and sysctl to control how ZVOLs are exposed to the OS.

New ZFS property volmode and sysctl vfs.zfs.vol.mode allow switching ZVOL
between three modes:
 geom -- existing fully functional behavior (default);
 dev -- exposing volumes only as raw disk device file in devfs;
 none -- not exposing volumes outside ZFS.

The "dev" mode is less functional (can't be partitioned, mounted, etc),
but it is faster, and in some scenarios with untrusted consumers safer.
It can be useful for NAS, VM block storages, etc.
The "none" mode may be convenient for backup servers, etc. that don't
need direct data access.

Due to the way ZVOL is integrated with main ZFS code, the property
and sysctl are checked only during pool import and volume creation.
2014-05-08 13:12:24 +00:00
smh
c84c4e2d80 MFC r264851
Eliminate use of the optarg global outside of the function that calls getopt.

Sponsored by:	Multiplay
2014-05-08 08:17:12 +00:00
markj
607e8b47f9 MFC r262542:
Move some files that are identical on i386 and amd64 to an x86 subdirectory
rather than keeping duplicate copies.
2014-05-03 16:08:52 +00:00
pfg
7c82f25917 MFC r264040:
4248 dtrace(1M) should never create DOF with empty probes section
4249 Only probes from the first DTrace object file will be included

Illumos Revision:	54a20ab41aadcb81c53e72fc65886e964e9add59

Reference:
https://www.illumos.org/issues/4248
https://www.illumos.org/issues/4249

Obtained from:	Illumos
2014-05-02 20:12:31 +00:00
delphij
e852cd6938 MFC r264467:
Handle zpool history blocks that grow beyond 128KB in zpool(8) and zdb(8)
by growing the buffer on demand, with a cap of 1GB (specified in
spa_history_create_obj()).

PR:		bin/186574
Submitted by:	Andrew Childs <lorne cons org nz> (with changes)
2014-04-28 06:11:03 +00:00
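Growing a buffer on demand with an upper bound, as e852cd6938 describes, might look like the sketch below. The 128KB starting size and 1GB cap come from the commit message; the doubling strategy and the name `grow_buffer` are assumptions, not the zpool(8) code.

```c
#include <stdlib.h>

#define	BUF_INITIAL	(128UL << 10)	/* 128KB starting size */
#define	BUF_MAX		(1UL << 30)	/* 1GB cap */

/*
 * Hypothetical helper: make sure *bufp can hold at least need bytes,
 * doubling the size as required but never exceeding BUF_MAX.
 * Returns 0 on success, -1 if need exceeds the cap or realloc fails.
 */
int
grow_buffer(char **bufp, size_t *sizep, size_t need)
{
	size_t newsize = (*sizep != 0) ? *sizep : BUF_INITIAL;
	char *nb;

	if (need > BUF_MAX)
		return (-1);
	while (newsize < need)
		newsize *= 2;
	if (newsize > BUF_MAX)
		newsize = BUF_MAX;
	if (newsize == *sizep)
		return (0);	/* already large enough */
	nb = realloc(*bufp, newsize);
	if (nb == NULL)
		return (-1);	/* old buffer is still valid */
	*bufp = nb;
	*sizep = newsize;
	return (0);
}
```

Start with a NULL buffer and a size of 0, then call the helper whenever a read reports that more room is needed.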
jmmv
797209d767 MFC r264741: Add placeholder Kyuafiles for various top-level hierarchies.
This is "make tinderbox" clean.
2014-04-28 04:20:14 +00:00
markj
ec059ac886 MFC r262596:
4478 dtrace_dof_maxsize is far too small

illumos/illumos-gate@d339a29bb4
2014-04-23 03:26:29 +00:00
delphij
d336d68dfb MFC r263889 (MFV r263887):
3993 zpool(1M) and zfs(1M) should support -p for "list" and "get"
4700 "zpool get" doesn't support -H or -o options
2014-04-11 01:27:33 +00:00