Commit Graph

1111 Commits

Author SHA1 Message Date
Xin LI
8c20e2ff11 MFV r272494:
Make space_map_truncate() always do space_map_reallocate().  Without
this, setting space_map_max_blksz would cause panic for existing pool,
as dmu_objset_set_blocksize would fail if the object have multiple blocks.

Illumos issues:
   5164 space_map_max_blksz causes panic, does not work
   5165 zdb fails assertion when run on pool with recently-enabled
	spacemap_histogram feature

MFC after:	2 weeks
2014-10-04 08:05:39 +00:00
Steven Hartland
14a0d74ea8 Refactor ZFS ARC reclaim checks and limits
Remove previously added kmem methods in favour of defines which
allow diff minimisation between upstream code base.

Rebalance ARC free target to be vm_pageout_wakeup_thresh by default
which eliminates issue where ARC gets minimised instead of balancing
with VM pageout. The restores the target point prior to r270759.

Bring in missing upstream only changes which move unused code to
further eliminate code differences.

Add additional DTRACE probe to aid monitoring of ARC behaviour.

Enable upstream i386 code paths on platforms which don't define
UMA_MD_SMALL_ALLOC.

Fix mixture of byte an page values in arc_memory_throttle i386 code
path value assignment of available_memory.

PR:		187594
Review:		D702
Reviewed by:	avg
MFC after:	1 week
X-MFC-With:	r270759 & r270861
Sponsored by:	Multiplay
2014-10-03 20:34:55 +00:00
Steven Hartland
99140218aa Fix various issues with zvols
When performing snapshot renames we could deadlock due to the locking
in zvol_rename_minors. In order to avoid this use the same workaround
as zvol_open in zvol_rename_minors.

Add missing zvol_rename_minors to dsl_dataset_promote_sync.

Protect against invalid index into zv_name in zvol_remove_minors.

Replace zvol_remove_minor calls with zvol_remove_minors to ensure
any potential children are also renamed.

Don't fail zvol_create_minors if zvol_create_minor returns EEXIST.

Restore the valid pool check in zfs_ioc_destroy_snaps to ensure we
don't call zvol_remove_minors when zfs_unmount_snap fails.

PR:		193803
MFC after:	1 week
Sponsored by:	Multiplay
2014-10-03 14:49:48 +00:00
Marcelo Araujo
d8a5961f88 Fix failures and warnings reported by newpynfs20090424 test tool.
This fix addresses only issues with the pynfs reports, none of these
issues are know to create problems for extant real clients.

Submitted by:	Bart Hsiao <bart.hsiao@gmail.com>
Reworked by:	myself
Reviewed by:	rmacklem
Approved by:	rmacklem
Sponsored by:	QNAP Systems Inc.
2014-10-03 02:24:41 +00:00
Xin LI
43ac3722ac Diff reduction with kernel code: instruct the compiler that the data of
these types may be unaligned to their "normal" alignment and exercise
caution when accessing them.

PR:		194071
MFC after:	3 days
2014-10-02 00:13:08 +00:00
Will Andrews
fbce0221eb zfsvfs_create(): Refuse to mount datasets whose names are too long.
This is checked for in the zfs_snapshot_004_neg STF/ATF test (currently
still in projects/zfsd rather than head).

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:
- zfsvfs_create(): Check whether the objset name fits into
  statfs.f_mntfromname, and return ENAMETOOLONG if not.  Although
  the filesystem can be unmounted via the umount(8) command, any
  interface that relies on iterating on statfs (e.g. libzfs) will
  fail to find the filesystem by its objset name, and thus assume
  it's not mounted.  This causes "zfs unmount", "zfs destroy",
  etc. to fail on these filesystems, whether or not -f is passed.

MFC after:	1 month
Sponsored by:	Spectra Logic
MFSpectraBSD:	974872 on 2013/08/09
2014-10-01 14:12:02 +00:00
Xin LI
0b66c7c514 Fix a mismerge in r260183 which prevents snapshot zvol devices being
removed and re-instate the fix in r242862.

Reported by:	Leon Dang <ldang nahannisys com>, smh
MFC after:	3 days
2014-09-30 18:50:45 +00:00
Steven Hartland
8caa3daf35 Remove sys/types.h include as per style (9)
SDT requries sys/param.h due to use of NULL

Reported by:	Garrett
Sponsored by:	Multiplay
2014-09-18 20:38:18 +00:00
Steven Hartland
71f3caaf31 Add dtrace probe support for zfs SET_ERROR(..)
MFC after:	1 week
Sponsored by:	Multiplay
2014-09-18 20:00:36 +00:00
Will Andrews
91dda985cc Remove debug.zfs_flags in favor of the new vfs.zfs.debug_flags.
Replace TUNABLE_INT with CTLFLAG_RWTUN.

Submitted by:	avg (debug.zfs_flags removal), smh (TUNABLE_INT replacement)
2014-09-18 18:46:38 +00:00
Will Andrews
f8c2f66a6c Enable ZFS debug flags to be modified via vfs.zfs.debug_flags.
This is primarily only of interest to ZFS developers, but it makes it
easier to get additional debugging.

Submitted by:	gibbs
MFC after:	1 month
Sponsored by:	Spectra Logic
MFSpectraBSD:	517074 on 2011/12/15 (by will), 662343 on 2013/03/20 (by gibbs)
2014-09-18 16:55:41 +00:00
Will Andrews
cf0a1157d7 Reorder sysctls for spa.c global tunables; add sysctl for ccw_retry_interval.
MFC after:	1 month
Sponsored by:	Spectra Logic
2014-09-18 16:38:03 +00:00
Will Andrews
cf7a096e72 bpobj_iterate_impl(): Close a refcount leak iterating on a sublist.
If bpobj_space() returned non-zero here, the sublist would have been
left open, along with the bonus buffer hold it requires.  This call
does not invoke any calls to bpobj_close() itself.

This bug doesn't have any known vector, but was found on inspection.

MFC after:	1 week
Sponsored by:	Spectra Logic
Affects:	All ZFS versions starting 21 May 2010 (illumos cde58dbc)
MFSpectraBSD:	r1050998 on 2014/03/26
2014-09-18 15:37:53 +00:00
Steven Hartland
d1d469e22b Remove unused ZFS ARC functions
* arc_data_buf_alloc
* arc_data_buf_free

MFC after:	1 week
Sponsored by:	Multiplay
2014-09-18 10:46:51 +00:00
Justin Hibbits
e40a5cd3ec Fix the stack tracing for dtrace/powerpc.
Summary:
Fix the stack tracing for dtrace/powerpc by using the trapexit/asttrapexit
return address sentinels instead of checking within the kernel address space.

As part of this, I had to add new inline functions.  FBT traces the kernel, so
we have to have special case handling for this, since a trap will create a full
new trap frame, and there's no way to pass around the 'real' stack.  I handle
this by special-casing 'aframes == 0' with the trap frame.  If aframes counts
out to the trap frame, then assume we're looking for the full kernel trap frame,
so switch to the real stack pointer.

Test Plan: Tested on powerpc64

Reviewers: rpaulo, markj, nwhitehorn

Reviewed By: markj, nwhitehorn

Differential Revision: https://reviews.freebsd.org/D788

MFC after:	3 week
Relnotes:	Yes
2014-09-17 02:43:47 +00:00
Steven Hartland
a889b18c52 Added missing ZFS sysctls
* vfs.zfs.vdev.async_write_active_min_dirty_percent
* vfs.zfs.vdev.async_write_active_max_dirty_percent

Added validation of min / max for ZFS sysctl
* vfs.zfs.dirty_data_max_percent

MFC after:	3 days
2014-09-14 12:23:00 +00:00
Xin LI
f9290bc2c9 MFV r271518:
Correctly report hole at end of file.

When asked to find a hole, the DMU sees that there are no holes in the
object, and returns ESRCH.  The ZPL interprets this as "no holes before
the end of the file", and therefore inserts the "virtual hole" at the
end of the file.  Because DMU and ZPL have different ideas of where the
end of an object/file is, we will end up returning the end of file,
which is generally larger, instead of returning the end of object.

The fix is to handle the "virtual hole" in the DMU. If no hole is found,
the DMU will return a hole at the end of the file, rather than an error.

Illumos issue:
    5139 SEEK_HOLE failed to report a hole at end of file

MFC after:	1 week
2014-09-13 17:48:44 +00:00
Xin LI
dc147754b7 MFV r271517:
In zil_claim, don't issue warning if we get EBUSY (inconsistent) when
opening an objset, instead, ignore it silently.

Illumos issue:

    5140 message about "%recv could not be opened" is printed when booting after crash

MFC after:	1 week
2014-09-13 17:36:34 +00:00
Xin LI
be1b14a063 MFV r271515:
Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to
limit how many blocks can be free'ed before a new transaction group is
created.  The default is no limit (infinite), but we should probably have
a lower default, e.g. 100,000.

With this limit, we can guard against the case where ZFS could run out of
memory when destroying large numbers of blocks in a single transaction
group, as the entire DDT needs to be brought into memory.

Illumos issue:
    5138 add tunable for maximum number of blocks freed in one txg

MFC after:	2 weeks
2014-09-13 17:24:56 +00:00
Xin LI
ff0fc48bde MFV r271512:
Illumos issue:
    5136 fix write throttle comment in dsl_pool.c

MFC after:	2 weeks
2014-09-13 16:51:23 +00:00
Xin LI
263f396e2b MFV r271510:
Enforce 4K as smallest indirect block size (previously the smallest
indirect block size was 1K but that was never used).

This makes some space estimates more accurate and uses less memory
for some data structures.

Illumos issue:
    5141 zfs minimum indirect block size is 4K

MFC after:	2 weeks
2014-09-13 16:26:14 +00:00
Steven Hartland
3cdd9138c3 Persist vdev_resilver_txg changes to avoid panic caused by validation
vs a vdev_resilver_txg value from a previous resilver.

MFC after:	1 week
2014-09-11 16:21:51 +00:00
Gleb Smirnoff
27ad26d8c7 Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES(). 2014-09-10 12:36:41 +00:00
Alexander Motin
ee9534ed96 Make ZVOL writes in device mode support IO_SYNC flag.
MFC after:	1 month
2014-09-09 11:29:55 +00:00
Xin LI
817d804595 MFV r271223:
In dnode_sync(), do dnode_increase_indirection() before processing
the dn_next_nblkptr.

Illumos issue:
    5117 space map reallocation can cause corruption

MFC after:	3 days
2014-09-07 13:13:42 +00:00
Peter Wemm
d903c21a64 Move the restored #ifdef i386 test back inside the #ifdef _KERNEL block
where it originally was.
2014-08-31 09:05:02 +00:00
Steven Hartland
92ac3eb59f Ensure that ZFS ARC free memory checks include cached pages
Also restore kmem_used() check for i386 as it has KVA limits that the raw
page counts above don't consider

PR:		187594
Reviewed by:	peter
X-MFC-With: r270759
Review:	D700
Sponsored by:	Multiplay
2014-08-30 21:44:32 +00:00
Mateusz Guzik
6662ce5aab Add missing proctree locking to fill_kinfo_proc consumers.
This fixes r270444.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-08-30 03:10:55 +00:00
Steven Hartland
4d19f4ad1f Refactor ZFS ARC reclaim logic to be more VM cooperative
Prior to this change we triggered ARC reclaim when kmem usage passed 3/4
of the total available, as indicated by vmem_size(kmem_arena, VMEM_ALLOC).

This could lead large amounts of unused RAM e.g. on a 192GB machine with
ARC the only major RAM consumer, 40GB of RAM would remain unused.

The old method has also been seen to result in extreme RAM usage under
certain loads, causing poor performance and stalls.

We now trigger ARC reclaim when the number of free pages drops below the
value defined by the new sysctl vfs.zfs.arc_free_target, which defaults
to the value of vm.v_free_target.

Credit to Karl Denninger for the original patch on which this update was
based.

PR:		191510 and 187594
Tested by:	dteske
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Multiplay
2014-08-28 19:50:08 +00:00
Mark Johnston
35127d3c0f Restore the correct value when disabling probes. Otherwise the instrumented
tracepoints would continue to generate traps, which would be ignored but
could consume noticeable amounts of CPU if, say, all functions in the kernel
were instrumented.

X-MFC-With:	r270067
2014-08-24 17:10:47 +00:00
Xin LI
ec1b564650 Instead of using timestamp in the AVL, use the memory address when
comparing.

Illumos issue:
    5095 panic when adding a duplicate dbuf to dn_dbufs

MFC after:	3 days
2014-08-22 23:13:53 +00:00
Xin LI
fa4484104c MFV r270197:
Illumos issue:
    5066 remove support for non-ANSI compilation
    5068 Remove SCCSID() macro from <macros.h>

MFC after:	2 weeks
2014-08-22 22:13:36 +00:00
Xin LI
d291a3bd9c Provide compatibility shim for atomic_dec_64_nv.
X-MFC-with:	r270247
MFC after:	13 days
2014-08-21 08:25:46 +00:00
Xin LI
7c1db36b28 MFV r270196:
Illumos issue:
    5047 don't use atomic_*_nv if you discard the return value

MFC after:	2 weeks
2014-08-20 22:39:26 +00:00
Xin LI
249ddb42f6 MFC r270195:
Illumos issue:
    5045 use atomic_{inc,dec}_* instead of atomic_add_*

MFC after:	2 weeks
2014-08-20 21:44:48 +00:00
Xin LI
2bcc37f99c MFV r270193:
Illumos issues:
    5042 stop using deprecated atomic functions

MFC after:	2 weeks
2014-08-20 18:29:18 +00:00
Mark Johnston
266b4a78c2 Factor out the common code for function boundary tracing instead of
duplicating the entire implementation for both x86 and powerpc. This makes
it easier to add support for other architectures and has no functional
impact.

Phabric:	D613
Reviewed by:	gnn, jhibbits, rpaulo
Tested by:	jhibbits (powerpc)
MFC after:	2 weeks
2014-08-16 21:42:55 +00:00
Xin LI
60723bfe21 MFV r269542:
In vdev_get_stats, check that the vdev is not a hole before computing the
fragmentation.  This fixes a panic when removing log device.

Illumos issue:
    5049 panic when removing log device

Author:		Alex Reece <alex@delphix.com>
MFC after:	2 weeks
2014-08-05 00:07:21 +00:00
Mark Johnston
2661328745 Return 0 for the PPID of threads in process 0, as process 0 doesn't have a
parent process.

MFC after:	2 weeks
2014-08-04 19:02:30 +00:00
Xin LI
cd741a5e1d Revert r269404 and use cpu_ticks() for dbuf allocation.
Encode CPU's number by XOR'ing the CPU ID against the 64-bit cpu_ticks().

Reviewed by:	mav, gibbs
Differential Revision: https://phabric.freebsd.org/D521
MFC after:	2 weeks
2014-08-03 09:47:51 +00:00
Xin LI
1dcef10eac MFV r269427:
In dnode_children_t, use C99's "[]" idiom for declaring the variable
sized array dnc_children at the end of the structure.

This prevents the compiler from mistakenly optimizing away accesses
beyond the array's defined size.

Illumos issue:
    5038 Remove "old-style" flexible array usage in ZFS.
    Author: Justin T. Gibbs <justing@spectralogic.com>

MFC after:	2 weeks
2014-08-02 08:34:22 +00:00
Ian Lepore
c311f7078c When arm 64-bit atomic ops are available, define ARM_HAVE_ATOMIC64. Use
that symbol (which will be correct in both kernel and userland contexts)
rather than just __arm__ to decide whether to use a local implementation.
2014-08-02 03:44:27 +00:00
Ian Lepore
814f4c5896 Use the 64-bit atomics now provided by arm machine/atomic.h instead of
(conflicting) local versions.
2014-08-01 23:45:50 +00:00
Steven Hartland
6a369c018c Don't return ZIO_PIPELINE_CONTINUE from vdev_op_io_start methods
This prevents recursion of vdev_queue_io_done as per r265321 but
using a different method as recommended on the openzfs list.

We now use zio_interrupt(zio) and return ZIO_PIPELINE_STOP instead
of returning ZIO_PIPELINE_CONTINUE from vdev_*_io_start methods.

zio_vdev_io_start now ASSERTS the that vdev_op_io_start returns
ZIO_PIPELINE_STOP to ensure future changes don't reintroduce
ZIO_PIPELINE_CONTINUE returns.

Cleanup flow in vdev_geom_io_start while I'm here.

Also fix some cases not using SET_ERROR(..)

MFC after:	2 weeks
X-MFC-With:	r265321
2014-08-01 23:16:48 +00:00
Xin LI
125f68e708 Split gethrtime() and gethrtime_waitfree() and make the former use
nanouptime() instead of getnanouptime().  nanouptime(9) provides more
precise result at expense of being slower.

In r269223, gethrtime() is used as creation time of dbuf, which in turn
acts as portion of lookup key to maintain AVL invariant where there can
not be duplicate items.  Before this change, gethrtime() have preferred
better execution time by sacrificing precision, which may lead to panic
on busy systems with:

	panic: avl_find() succeeded inside avl_add()

Reported by:	allanjude, mav
PR:		kern/192284
MFC after:	11 days
X-MFC-with:	r269223
2014-08-01 22:33:23 +00:00
Rui Paulo
d18aa577d5 Copy strtolctype.h to sys/cddl/contrib/opensolaris/common/util to keep
the kernel self-contained.

Requested by:	jhb
2014-07-31 08:07:23 +00:00
Xin LI
9b046b421f MFV r269224:
Increase default ARC buf_hash_table size.  When typical block size is small,
the hash table could be too small, which would lead to long hash chains and
limit performance for cached reads.

A new loader tunable, vfs.zfs.arc_average_blocksize, have been added which
allows users to override the default assumption of average (typical) block
size.  Old default was 65536 (64 KiB) and new default is 8192 (8 KiB).

Illumos issue:
    5034 ARC's buf_hash_table is too small

MFC after:	2 weeks
2014-07-29 09:36:48 +00:00
Xin LI
a3cbca537e MFV r269223:
Change dn->dn_dbufs from linked list to AVL tree.

Illumos issues:
  4873 zvol unmap calls can take a very long time for larger datasets

MFC after:	2 weeks
2014-07-29 08:42:22 +00:00
Xin LI
343c95a24e Reschedule the 'deadman' callout after handling, this makes our
code behave more like it is on Solaris.

Reported by:	avg
Reviewed by:	avg, mav (but bugs are mine)

Differential Revision: https://phabric.freebsd.org/D457
2014-07-29 06:57:13 +00:00
Konstantin Belousov
fe0e9a63e0 Initialize zfs vnode v_hash when the vnode is allocated, instead of
postponing it to zfs_vget().  zfs_root() returned vnode with the
default value of v_hash, which caused inconsistent v_hash value when
root vnode was obtained from zfs_vget().

Nullfs allocated two upper vnodes for the root zfs vnode due to
different hashes, causing consistency problems.

Reported and tested by:	Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-28 14:24:18 +00:00