Commit Graph

1002 Commits

Author SHA1 Message Date
Andriy Gapon
99d058c8a7 MFV r282123: 5610 zfs clone from different source and target pools produces coredump
MFC after:	10 days
2015-04-28 07:42:28 +00:00
Andriy Gapon
28d15239af MFV r282124: 5393 spurious failures from dsl_dataset_hold_obj()
The actual bugfix was pro-actively committed in r275515.
This MFV is cosmetic, it just aligns code style with the upstream.

MFC after:	10 days
2015-04-28 07:37:38 +00:00
Andriy Gapon
39b6f1d6c1 nvpair_type_is_array: DATA_TYPE_INT8_ARRAY was not recognized
To do:	upstream (https://www.illumos.org/issues/5778)
MFC after:	10 days
2015-04-28 06:34:55 +00:00
Mark Johnston
8241ee3b2c Fix DTrace's panic() action.
It would previously call into some unfinished Solaris compatibility code and
return without actually calling panic(9). The compatibility code is
unneeded, however, so just remove it and have dtrace_panic() call vpanic(9)
directly.

Differential Revision:	https://reviews.freebsd.org/D2349
Reviewed by:	avg
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2015-04-24 03:19:30 +00:00
Xin LI
384f656a1a Remove vfs.zfs.snapshot_list_prefetch, the corresponding code was
gone in r248571 already.

MFC after:	1 week
2015-04-17 21:21:11 +00:00
Mark Johnston
67cf27b70f libdtrace: add support for lazyload mode.
Passing "-x lazyload" to dtrace -G during compilation causes dtrace(1) to
not link drti.o into the output object file, so the USDT probes are not created
during process startup. Instead, dtrace(1) will automatically discover and
create probes on the process' behalf when attaching.

Differential Revision:	https://reviews.freebsd.org/D2203
Reviewed by:		rpaulo
MFC after:		1 month
2015-04-08 02:36:37 +00:00
Alexander Motin
91b9f63738 Add DTrace probe to the new ARC reclaim cause added in r281026.
MFC after:	1 month
2015-04-05 14:45:52 +00:00
Alexander Motin
2e9ccb32a1 Make ZFS ARC track both KVA usage and fragmentation.
Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches certain threshold (3/4 on i386 or 16/17 otherwise).  FreeBSD has
even less KVA, but had no such limit on archs with direct map as amd64.
As result, on machines with a lot of RAM, during load with very small user-
space memory pressure, such as `zfs send`, it was possible to reach state,
when there is enough both physical RAM and KVA (I've seen up to 25-30%),
but no continuous KVA range to allocate even single 128KB I/O request.

Address this situation from two sides:
 - restore KVA usage limitations in a way the most close to Illumos;
 - introduce new requirement for KVA fragmentation, specifying that we
should have at least one sequential KVA range of zfs_max_recordsize bytes.

Experiments show that first limitation done alone is not sufficient.  On
machine with 64GB of RAM it is sometimes needed to drop up to half of ARC
size to get at leats one 1MB KVA chunk.  Statically limiting ARC to half
of KVA/RAM is too strict, so second limitation makes it to work in cycles:
accumulate trash up to certain critical mass, do massive spring-cleaning,
and then start littering again. :)

MFC after:	1 month
2015-04-03 14:45:48 +00:00
Andrew Turner
7572a8c8f1 Add the arm64 defines for cddl code.
Differential Revision:	https://reviews.freebsd.org/D2186
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
2015-04-01 08:31:56 +00:00
Alexander Motin
e5dcb72f45 Some cosmetic polishing. No functional change.
MFC after:	1 week
2015-03-29 20:28:18 +00:00
Mark Johnston
97f2f66479 Remove unused upstream DTrace provider implementations that are duplicates
of providers under sys/cddl/dev/. Also remove sdt_subr.c, which isn't used
in FreeBSD's SDT implementation.

Suggested by:	rwatson
2015-03-16 01:15:08 +00:00
Steven Hartland
208264283d Allow zvol_geom_worker to process BIO_DELETE's
If zvol_geom_start is called with a BIO_DELETE from a thread which can
sleep it queues it for later processing by the zvol_geom_worker. The
zvol_geom_worker didn't have a delete case so would simply loose the bio
hence preventing the original caller from every completing. In addition
an other unknown types would suffer the same fate.

Allow zvol_geom_worker to process BIO_DELETE's via zvol_strategy and
return unsupported for all unknown bio types.

MFC after:	2 weeks
Sponsored by:	Multiplay
2015-03-14 17:35:04 +00:00
Alexander Motin
0d45c37cb6 Make DIOCGATTR in device mode handle "GEOM::candelete".
MFC after:	3 days
2015-03-12 16:19:18 +00:00
Andrew Turner
4a8169d97b Add the MD parts of dtrace needed to use fbt on ARM. For this we need to
emulate the instructions used in function entry and exit.

For function entry ARM will use a push instruction to push up to 16
registers to the stack. While we don't expect all 16 to be used we need to
handle any combination the compiler may generate, even if it doesn't make
sense (e.g. pushing the program counter).

On function return we will either have a pop or branch instruction. The
former is similar to the push instruction, but with care to make sure we
update the stack pointer and program counter correctly in the cases they
are either in the list of registers or not. For branch we need to take the
24-bit offset, sign-extend it, and add that number of 4-byte words to the
program counter. Care needs to be taken as, due to historical reasons, the
address the branch is relative to is not the current instruction, but 8
bytes later.

This allows us to use the following probes on ARM boards:
  dtrace -n 'fbt::malloc:entry { stack() }'
and
  dtrace -n 'fbt:🆓return { stack() }'

Differential Revision:	https://reviews.freebsd.org/D2007
Reviewed by:	gnn, rpaulo
Sponsored by:	ABT Systems Ltd
2015-03-05 17:55:31 +00:00
George V. Neville-Neil
fcb5606706 Initial version of DTrace on ARM32.
Submitted by:	Howard Su based on work by Oleksandr Tymoshenko
Reviewed by:	ian, andrew, rpaulo, markj
2015-02-10 19:41:30 +00:00
Mark Johnston
3277b9a257 Fix a typo in r278137: make sure to free provider state.
X-MFC-With:     r278136
2015-02-08 03:55:12 +00:00
Pedro F. Giffuni
3ccccdc17d MFV r266995:
4767 dtrace_probe() always has the timestamp

Reference:
https://illumos.org/issues/4767

Obtained from:	Illumos
MFC after:	2 weeks
2015-02-03 20:06:30 +00:00
Pedro F. Giffuni
eadcd0fadf MFV r266993:
4469 DTrace helper tracing should be dynamic

Reference:
https://illumos.org/issues/4469

Obtained from:	Illumos
Phabric:	D1551
Reviewed by:	markj
MFC after:	2 weeks
2015-02-03 19:39:53 +00:00
Mark Johnston
c36bd253fa Continue to handle the case where state is NULL, though this currently
cannot happen on FreeBSD. r278136 overlooked the fact that a destructor
registered with devfs_set_cdevpriv(9) is invoked even in the case of an
error.

X-MFC-With:	r278136
2015-02-03 06:04:16 +00:00
Mark Johnston
ac21b651bf Diff reduction with illumos, in preparation for merging r266993 from the
vendor branch. No functional change.

MFC after:	1 week
2015-02-03 05:38:52 +00:00
Steven Hartland
370a13bfff Prevent inlining txg_quiesce
This allows dtrace to monitor the calls to txg_quiesce which can be really
helpful.

Also standardise __noinline order for arc_kmem_reap_now.

Sponsored by:	Multiplay
2015-02-02 00:17:36 +00:00
Mark Johnston
a70a59ea73 Don't attempt to disable enabled fasttrap probes in an exiting process.
There's no need to do so, and we can't hold an exiting process, so this
race can result in panics.

MFC after:	1 week
2015-01-30 05:03:23 +00:00
Mark Johnston
1eb8ad64ea In fasttrap_sigtrap(), use tdsendsignal() rather than tdksignal() to send
SIGTRAP. The latter requires that its thread argument be non-NULL, but
fasttrap_sigtrap() does not.

PR:		193593
MFC after:	1 week
Reported by:	danilo
2015-01-30 04:51:59 +00:00
Xin LI
63cffd61d1 MFV r255258:
Diff reduction with upstream.  The actual change was merged in r272483
already.

MFC after:	2 weeks
2015-01-28 08:56:48 +00:00
Will Andrews
b4e360d239 When creating or updating a node, use vfs_timestamp() for "now" instead
of gethrestime(), to allow the administrator to decide the appropriate
timestamp precision instead of always using nanosecond precision.
2015-01-24 00:43:02 +00:00
Will Andrews
bd3a7c08c4 Remove commented log messages. 2015-01-21 19:30:01 +00:00
Will Andrews
35b540bfb2 Ignore sync requests from the system syncher, i.e. VFS_SYNC(waitfor=MNT_LAZY).
ZFS already commits outstanding data every zfs_txg_timeout seconds, so these
syncs are unnecessarily intrusive.

Submitted by:	gibbs
Sponsored by:	Spectra Logic
MFSpectraBSD:	1105759 on 2014/12/11
2015-01-21 19:25:57 +00:00
Will Andrews
2a2c1d424a Eliminate an #ifdef illumos for zfs_ioc_rename().
Since allow_mounted is a FreeBSD-specific change, default to B_TRUE, then
locally check for the magic bit.  Unconditionally check allow_mounted below.
Convert the setting of allow_mounted to an explicit boolean.

MFC after:	1 week
Sponsored by:	Spectra Logic
MFSpectraBSD:	672578 (in part) on 2013/07/19
2015-01-21 19:20:36 +00:00
Will Andrews
55ddf051d8 Add vfs.zfs.reference_tracking_enable sysctl/tunable.
This is primarily for developer/debugging use; it enables built-in tagged
tracking of refcounts inside ZFS.  It can only be enabled from the loader,
since it modifies how in-core state is managed.  Default remains disabled.

MFC after:	1 week
Sponsored by:	Spectra Logic
2015-01-21 17:03:11 +00:00
Will Andrews
798cbb7523 Fix arc__shrink DTrace probe's to_free argument.
Remove the unnecessary #ifdef _KERNEL, which did not differ in the true or
false cases.  Actually set the value of to_free before using it.

MFC after:	1 week
Sponsored by:	Spectra Logic
2015-01-20 22:39:10 +00:00
Will Andrews
fe20fb9fb0 Use the "zfs_gfs" tag for GFS vnodes to make them easier to identify.
MFC after:	1 week
Sponsored by:	Spectra Logic
2015-01-20 22:31:26 +00:00
Alexander Motin
d6245e3d44 Allow skipping dmu_buf_will_dirty() call in dsl_dir_transfer_space().
dsl_dir_transfer_space() is mostly called after dsl_dir_diduse_space(),
which already calls dmu_buf_will_dirty() for the same dbuf and tx, so
its duplicate call in those cases will change nothing, only spend time.

Skipping this call by four times reduces time spent in dbuf_write_done()
and descendants, updating dataset statistics with several congested lock
acquisitions.  When rewriting 8K zvol blocks at 1GB/s rate, this reduces
CPU time spent inside dbuf_write_done(), according to profiling, from 45%
of 683K samples to 18% of 422K.

MFC after:	2 weeks
2015-01-20 13:09:12 +00:00
Steven Hartland
5eab7e5406 Clean ZFS spa config before syncing
A number of entries that can be present in the spa config shouldn't be saved
to disk so add a method to ensure this is case. Without this if the last
caller to vdev_config_generate requested stats then we can end up in the
cache file.

Also only skip a none writable pool in the cache file generation if its
active. This prevents unavailable pools incorrectly getting removed from
cache file.

Tested by:	delphij
MFC after:	2 weeks
Sponsored by:	Multiplay
2015-01-18 23:15:49 +00:00
Steven Hartland
bc96366c86 Mechanically convert cddl sun #ifdef's to illumos
Since the upstream for cddl code is now illumos not sun, mechanically
convert all sun #ifdef's to illumos #ifdef's which have been used in all
newer code for some time.

Also do a manual pass to correct the use if #ifdef comments as per style(9)
as well as few uses of #if defined(__FreeBSD__) vs #ifndef illumos.

MFC after:	1 month
Sponsored by:	Multiplay
2015-01-17 14:44:59 +00:00
Alexander Motin
38feff972b Fix overflow bug from r248577, turning 30s TRIM timeout into ~4s.
MFC after:	2 weeks
2015-01-14 16:22:00 +00:00
Alexander Motin
d4f46a775d Reimplement TRIM throttling added in r248577.
Previous throttling implementation approached problem from the wrong side.
It significantly limited useful delaying of TRIM requests and aggregation
potential, while not so much controlled TRIM burstiness under heavy load.

With this change random 4K write benchmarks (probably the worst case for
TRIM) show me IOPS increase by 20%, average latency reduction by 30%, peak
TRIM bursts reduction by 3 times and same peak TRIM map size (memory usage).

Also the new logic does not force map size down so heavily, really allowing
to keep deleted data for 32 TXG or 30 seconds under moderate load.  It was
practically impossible with old throttling logic, which pushed map down to
only 64 segments.

Reviewed by:	smh
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-01-14 09:39:57 +00:00
Alexander Motin
5b3a65823d Skip extra bcopy() when scrubbing vdev without redundancy.
According to profiler, this bcopy() can use about 10% of CPU time.

MFC after:	2 weeks
2015-01-12 22:38:55 +00:00
Alexander Motin
f5b85f6551 When aggregating TRIM segments, move the new one to the list end.
New segment at the list head may block all TRIM requests until txg of that
segment can be processed.  On my random I/O tests this change reduce peak
TRIM list length from 650 to 450 segments.  Hopefully it should reduce TRIM
burstiness when list processing is unblocked.

MFC after:	2 weeks
2015-01-11 16:36:39 +00:00
Alexander Motin
2de874ed23 Add LBA as secondary sort key for synchronous I/O requests.
On FreeBSD gethrtime() implemented via getnanouptime(), that has 1ms (1/hz)
precision.  It makes primary sort key (timestamp) collision very possible.
In such situations sorting by secondary key of LBA is much more reasonable
then by totally meaningless zio pointer value.

With this change on multi-threaded synchronous ZVOL read I've measured 10%
throughput increase and average latency reduction.

MFC after:	2 weeks
2015-01-11 00:26:18 +00:00
Alexander Motin
13ea8106d9 Use new optimized dmu_read_uio_dbuf() for ZVOLs in device mode.
This slightly reduces overhead by avoiding dnode_hold()/dnode_rele() calls.

MFC after:	2 weeks
2015-01-10 18:28:58 +00:00
Steven Hartland
8de799ea3a Correct zpool list displaying invalid EXPANDSZ for unavailable pool vdevs
When pools are unavailable their vdevs are also unavailable which means
that vdev_max_asize remains at the default zero. This default was being
used to calculate vs_esize resulting in a negative number as vdev_asize >
vdev_max_asize, which caused zpool list -v to display 16.0E for EXPANDSZ
of these vdevs.
2014-12-31 04:54:48 +00:00
Steven Hartland
51f529b50b Always sync the global ZFS config cache to reflect the new mosconfig
This fixes out of date zpool.cache for root pools, which can cause issues
such as confusion of zdb etc.

MFC after:	1 month
2014-12-23 09:31:24 +00:00
Steven Hartland
6831cf6aa6 Fix panic when resizing ZFS zvol's
Resizing a ZFS ZVOL with debug enabled would result in a panic due to
recursion on dp_config_rwlock.

The upstream change "3464 zfs synctask code needs restructuring" changed
zvol_set_volsize to avoid the recursion on dp_config_rwlock, but this was
missed when originally merged in by r248571 due to significant differences
in our codebases in this area.

These changes also relied on bring in changes from upstream:
3557 dumpvp_size is not updated correctly when a dump zvol's size is
changed, which where also not present.

In order to help prevent future issues in this area a direct comparison
and diff minimisation from current upstream version (b515258) of zvol.c.

Differential Revision:	https://reviews.freebsd.org/D1302
MFC after:	1 month
X-MFC-With:	r276063 & r276066
Sponsored by:	Multiplay
2014-12-22 18:39:38 +00:00
Steven Hartland
08c0f91ecc Refactor zvol locking to minimise diff with upstream
Use #define zfsdev_state_lock spa_namespace_lock instead of replacing all
zfsdev_state_lock with spa_namespace_lock to minimise changes from upstream.

Differential Revision:	D1302
MFC after:	1 month
X-MFC-With	r276063
Sponsored by:	Multiplay
2014-12-22 17:04:51 +00:00
Steven Hartland
39c478eac4 Standardise on illumos for #ifdef's in zvol.c
Also correct as per style(9) on the use of #ifdef comments.

This is a no-op change as pre-cursor to a full cleanup and merge with
upstream zvol changes.

Sponsored by:	Multiplay
2014-12-22 16:38:29 +00:00
Konstantin Belousov
789bdfdbc6 Handle MAKEENTRY cnp flag in the VOP_CREATE(). Curiously, some
fs, e.g. smbfs, already did it.

Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-21 13:29:33 +00:00
Xin LI
62722f7d5b Add missing continue: we can't proceed further if the
kernel does not panic with zfs_panic_recover.

Illumos issue:
    5438 zfs_blkptr_verify should continue after zfs_panic_recover

Reported by:	Coverity
CID:		1232014
2014-12-19 00:20:29 +00:00
Xin LI
7d073f8411 MFV r275914:
As of r270383, the dbuf_compare comparator compares the dbuf
attributes in the following order:

	db_level (indirect level)
	db_blkid (block number)
	db_state (current state)
	the address of the element

Because db_state is being considered before the element's state,
changing of db_state would affect balancedness of the AVL tree,
even when the address of element compares differently.  For
instance, in dbuf_create, db_state may be altered after the
node is inserted into the AVL tree and may break AVL tree
balancedness.

Instead of using db_state as a comparision critera (introduced
in r270383), consider it only when we are doing a lookup, that
is one of the two dbuf pointers contains DB_SEARCH.

Illumos issue:
    5422 preserve AVL invariants in dn_dbufs

MFC after:	2 weeks
2014-12-18 23:45:26 +00:00
Konstantin Belousov
6c21f6edb8 The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter().  Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by:	Chris Torek <torek@pi-coral.com>
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-18 10:01:12 +00:00
Xin LI
a771fba68c MFV r275783:
Convert ARC flags to use enum.  Previously, public flags are defined in
arc.h and private flags are defined in arc.c which can lead to confusion
and programming errors.

Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf'
or 'ab' because arc_buf_t are often named 'buf' as well.

Illumos issue:
    5369 arc flags should be an enum
    5370 consistent arc_buf_hdr_t naming scheme

MFC after:	2 weeks
2014-12-15 18:22:45 +00:00