Commit Graph

1119 Commits

Author SHA1 Message Date
delphij
7137fdfbce Diff reduction with kernel code: instruct the compiler that the data of
these types may be unaligned to their "normal" alignment and exercise
caution when accessing them.

PR:		194071
MFC after:	3 days
2014-10-02 00:13:08 +00:00
will
1e6d91e484 zfsvfs_create(): Refuse to mount datasets whose names are too long.
This is checked for in the zfs_snapshot_004_neg STF/ATF test (currently
still in projects/zfsd rather than head).

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:
- zfsvfs_create(): Check whether the objset name fits into
  statfs.f_mntfromname, and return ENAMETOOLONG if not.  Although
  the filesystem can be unmounted via the umount(8) command, any
  interface that relies on iterating on statfs (e.g. libzfs) will
  fail to find the filesystem by its objset name, and thus assume
  it's not mounted.  This causes "zfs unmount", "zfs destroy",
  etc. to fail on these filesystems, whether or not -f is passed.

MFC after:	1 month
Sponsored by:	Spectra Logic
MFSpectraBSD:	974872 on 2013/08/09
2014-10-01 14:12:02 +00:00
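The length check described in 1e6d91e484 can be sketched as below. This is a minimal illustration, not the code from zfs_vfsops.c: the function name and the SKETCH_MNAMELEN constant are hypothetical, and the buffer size of 88 bytes is an assumption standing in for the real size of statfs.f_mntfromname.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the size of statfs.f_mntfromname (assumption). */
#define SKETCH_MNAMELEN 88

/*
 * Return 0 if the objset name, including its NUL terminator, fits into a
 * buffer of SKETCH_MNAMELEN bytes, or ENAMETOOLONG otherwise.
 */
int
sketch_check_mntfromname(const char *osname)
{
	if (strlen(osname) >= SKETCH_MNAMELEN)
		return (ENAMETOOLONG);
	return (0);
}
```

Rejecting the mount up front avoids the state the commit describes, where a truncated name in statfs makes the filesystem impossible to find by its objset name.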
delphij
915740a55d Fix a mismerge in r260183 which prevents snapshot zvol devices from being
removed, and reinstate the fix in r242862.

Reported by:	Leon Dang <ldang nahannisys com>, smh
MFC after:	3 days
2014-09-30 18:50:45 +00:00
smh
c466dfba5f Remove sys/types.h include as per style(9)
SDT requires sys/param.h due to use of NULL

Reported by:	Garrett
Sponsored by:	Multiplay
2014-09-18 20:38:18 +00:00
smh
7ce047b163 Add dtrace probe support for zfs SET_ERROR(..)
MFC after:	1 week
Sponsored by:	Multiplay
2014-09-18 20:00:36 +00:00
will
2fa5cd85e7 Remove debug.zfs_flags in favor of the new vfs.zfs.debug_flags.
Replace TUNABLE_INT with CTLFLAG_RWTUN.

Submitted by:	avg (debug.zfs_flags removal), smh (TUNABLE_INT replacement)
2014-09-18 18:46:38 +00:00
will
691a9f40b4 Enable ZFS debug flags to be modified via vfs.zfs.debug_flags.
This is primarily only of interest to ZFS developers, but it makes it
easier to get additional debugging.

Submitted by:	gibbs
MFC after:	1 month
Sponsored by:	Spectra Logic
MFSpectraBSD:	517074 on 2011/12/15 (by will), 662343 on 2013/03/20 (by gibbs)
2014-09-18 16:55:41 +00:00
will
7a7c171c68 Reorder sysctls for spa.c global tunables; add sysctl for ccw_retry_interval.
MFC after:	1 month
Sponsored by:	Spectra Logic
2014-09-18 16:38:03 +00:00
will
7288f5d2fc bpobj_iterate_impl(): Close a refcount leak iterating on a sublist.
If bpobj_space() returned non-zero here, the sublist would have been
left open, along with the bonus buffer hold it requires.  This call
does not invoke any calls to bpobj_close() itself.

This bug doesn't have any known vector, but was found on inspection.

MFC after:	1 week
Sponsored by:	Spectra Logic
Affects:	All ZFS versions starting 21 May 2010 (illumos cde58dbc)
MFSpectraBSD:	r1050998 on 2014/03/26
2014-09-18 15:37:53 +00:00
smh
fc398f1605 Remove unused ZFS ARC functions
* arc_data_buf_alloc
* arc_data_buf_free

MFC after:	1 week
Sponsored by:	Multiplay
2014-09-18 10:46:51 +00:00
jhibbits
2ae1525481 Fix the stack tracing for dtrace/powerpc.
Summary:
Fix the stack tracing for dtrace/powerpc by using the trapexit/asttrapexit
return address sentinels instead of checking within the kernel address space.

As part of this, I had to add new inline functions.  FBT traces the kernel, so
we have to have special case handling for this, since a trap will create a full
new trap frame, and there's no way to pass around the 'real' stack.  I handle
this by special-casing 'aframes == 0' with the trap frame.  If aframes counts
out to the trap frame, then assume we're looking for the full kernel trap frame,
so switch to the real stack pointer.

Test Plan: Tested on powerpc64

Reviewers: rpaulo, markj, nwhitehorn

Reviewed By: markj, nwhitehorn

Differential Revision: https://reviews.freebsd.org/D788

MFC after:	3 weeks
Relnotes:	Yes
2014-09-17 02:43:47 +00:00
smh
dfd30974e5 Added missing ZFS sysctls
* vfs.zfs.vdev.async_write_active_min_dirty_percent
* vfs.zfs.vdev.async_write_active_max_dirty_percent

Added validation of min / max for ZFS sysctl
* vfs.zfs.dirty_data_max_percent

MFC after:	3 days
2014-09-14 12:23:00 +00:00
delphij
387d8afb94 MFV r271518:
Correctly report hole at end of file.

When asked to find a hole, the DMU sees that there are no holes in the
object, and returns ESRCH.  The ZPL interprets this as "no holes before
the end of the file", and therefore inserts the "virtual hole" at the
end of the file.  Because DMU and ZPL have different ideas of where the
end of an object/file is, we will end up returning the end of file,
which is generally larger, instead of returning the end of object.

The fix is to handle the "virtual hole" in the DMU. If no hole is found,
the DMU will return a hole at the end of the file, rather than an error.

Illumos issue:
    5139 SEEK_HOLE failed to report a hole at end of file

MFC after:	1 week
2014-09-13 17:48:44 +00:00
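The "virtual hole" behaviour described in 387d8afb94 can be illustrated with a toy model. This is a sketch under stated assumptions, not DMU code: the object is modelled as an array of per-block flags, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: allocated[i] is true when block i of an object holds data.
 * Returns the index of the first hole at or after 'start'.  When no hole
 * exists, it reports a "virtual hole" at the end of the object (nblocks)
 * instead of signalling an error.
 */
size_t
sketch_next_hole(const bool *allocated, size_t nblocks, size_t start)
{
	for (size_t i = start; i < nblocks; i++) {
		if (!allocated[i])
			return (i);
	}
	return (nblocks);
}
```

Because the lower layer answers in terms of its own end of object, the caller no longer has to substitute its own (possibly larger) idea of where the file ends.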
delphij
9cdf61a6da MFV r271517:
In zil_claim, don't issue warning if we get EBUSY (inconsistent) when
opening an objset, instead, ignore it silently.

Illumos issue:

    5140 message about "%recv could not be opened" is printed when booting after crash

MFC after:	1 week
2014-09-13 17:36:34 +00:00
delphij
3a202e2324 MFV r271515:
Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to
limit how many blocks can be freed before a new transaction group is
created.  The default is no limit (infinite), but we should probably have
a lower default, e.g. 100,000.

With this limit, we can guard against the case where ZFS could run out of
memory when destroying large numbers of blocks in a single transaction
group, as the entire DDT needs to be brought into memory.

Illumos issue:
    5138 add tunable for maximum number of blocks freed in one txg

MFC after:	2 weeks
2014-09-13 17:24:56 +00:00
delphij
49c2133129 MFV r271512:
Illumos issue:
    5136 fix write throttle comment in dsl_pool.c

MFC after:	2 weeks
2014-09-13 16:51:23 +00:00
delphij
bd509415bb MFV r271510:
Enforce 4K as smallest indirect block size (previously the smallest
indirect block size was 1K but that was never used).

This makes some space estimates more accurate and uses less memory
for some data structures.

Illumos issue:
    5141 zfs minimum indirect block size is 4K

MFC after:	2 weeks
2014-09-13 16:26:14 +00:00
smh
c3c60bff50 Persist vdev_resilver_txg changes to avoid panic caused by validation
vs a vdev_resilver_txg value from a previous resilver.

MFC after:	1 week
2014-09-11 16:21:51 +00:00
glebius
5939c729a8 Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES().
2014-09-10 12:36:41 +00:00
mav
7797473e53 Make ZVOL writes in device mode support IO_SYNC flag.
MFC after:	1 month
2014-09-09 11:29:55 +00:00
delphij
52c7048527 MFV r271223:
In dnode_sync(), do dnode_increase_indirection() before processing
the dn_next_nblkptr.

Illumos issue:
    5117 space map reallocation can cause corruption

MFC after:	3 days
2014-09-07 13:13:42 +00:00
peter
3baf385084 Move the restored #ifdef i386 test back inside the #ifdef _KERNEL block
where it originally was.
2014-08-31 09:05:02 +00:00
smh
8d9d31d786 Ensure that ZFS ARC free memory checks include cached pages
Also restore kmem_used() check for i386 as it has KVA limits that the raw
page counts above don't consider

PR:		187594
Reviewed by:	peter
X-MFC-With: r270759
Review:	D700
Sponsored by:	Multiplay
2014-08-30 21:44:32 +00:00
mjg
4cf719a9ee Add missing proctree locking to fill_kinfo_proc consumers.
This fixes r270444.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-08-30 03:10:55 +00:00
smh
502601a540 Refactor ZFS ARC reclaim logic to be more VM cooperative
Prior to this change we triggered ARC reclaim when kmem usage passed 3/4
of the total available, as indicated by vmem_size(kmem_arena, VMEM_ALLOC).

This could leave large amounts of RAM idle: e.g. on a 192GB machine with
ARC as the only major RAM consumer, 40GB of RAM would remain unused.

The old method has also been seen to result in extreme RAM usage under
certain loads, causing poor performance and stalls.

We now trigger ARC reclaim when the number of free pages drops below the
value defined by the new sysctl vfs.zfs.arc_free_target, which defaults
to the value of vm.v_free_target.

Credit to Karl Denninger for the original patch on which this update was
based.

PR:		191510 and 187594
Tested by:	dteske
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Multiplay
2014-08-28 19:50:08 +00:00
markj
46bd89ef4c Restore the correct value when disabling probes. Otherwise the instrumented
tracepoints would continue to generate traps, which would be ignored but
could consume noticeable amounts of CPU if, say, all functions in the kernel
were instrumented.

X-MFC-With:	r270067
2014-08-24 17:10:47 +00:00
delphij
d89e74165b Instead of using timestamp in the AVL, use the memory address when
comparing.

Illumos issue:
    5095 panic when adding a duplicate dbuf to dn_dbufs

MFC after:	3 days
2014-08-22 23:13:53 +00:00
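The idea in d89e74165b, using the memory address as the final tiebreaker in a tree comparator, can be sketched as follows. The node type and function name are hypothetical, and the single `key` field is an assumption standing in for whatever fields the real comparison checks first.

```c
#include <stdint.h>

/* Hypothetical node type; the real dbuf structure has many more fields. */
struct sketch_node {
	uint64_t key;
};

/*
 * Three-way comparator for a sorted tree: order by key first, then fall
 * back to the node's memory address so that two distinct nodes never
 * compare equal.
 */
int
sketch_node_compare(const void *x, const void *y)
{
	const struct sketch_node *a = x;
	const struct sketch_node *b = y;

	if (a->key != b->key)
		return (a->key < b->key ? -1 : 1);
	if ((uintptr_t)a != (uintptr_t)b)
		return ((uintptr_t)a < (uintptr_t)b ? -1 : 1);
	return (0);
}
```

Unlike a timestamp, two live objects cannot share an address, so the comparator returns 0 only when a node is compared with itself.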
delphij
626a49e1d6 MFV r270197:
Illumos issue:
    5066 remove support for non-ANSI compilation
    5068 Remove SCCSID() macro from <macros.h>

MFC after:	2 weeks
2014-08-22 22:13:36 +00:00
delphij
6922e3fedf Provide compatibility shim for atomic_dec_64_nv.
X-MFC-with:	r270247
MFC after:	13 days
2014-08-21 08:25:46 +00:00
delphij
5a3c4456e4 MFV r270196:
Illumos issue:
    5047 don't use atomic_*_nv if you discard the return value

MFC after:	2 weeks
2014-08-20 22:39:26 +00:00
delphij
d8cd2ff335 MFC r270195:
Illumos issue:
    5045 use atomic_{inc,dec}_* instead of atomic_add_*

MFC after:	2 weeks
2014-08-20 21:44:48 +00:00
delphij
b248e9b18f MFV r270193:
Illumos issues:
    5042 stop using deprecated atomic functions

MFC after:	2 weeks
2014-08-20 18:29:18 +00:00
markj
ec83007481 Factor out the common code for function boundary tracing instead of
duplicating the entire implementation for both x86 and powerpc. This makes
it easier to add support for other architectures and has no functional
impact.

Phabric:	D613
Reviewed by:	gnn, jhibbits, rpaulo
Tested by:	jhibbits (powerpc)
MFC after:	2 weeks
2014-08-16 21:42:55 +00:00
delphij
a160f7bc63 MFV r269542:
In vdev_get_stats, check that the vdev is not a hole before computing the
fragmentation.  This fixes a panic when removing log device.

Illumos issue:
    5049 panic when removing log device

Author:		Alex Reece <alex@delphix.com>
MFC after:	2 weeks
2014-08-05 00:07:21 +00:00
markj
9e5713a930 Return 0 for the PPID of threads in process 0, as process 0 doesn't have a
parent process.

MFC after:	2 weeks
2014-08-04 19:02:30 +00:00
delphij
6ba22f8d1a Revert r269404 and use cpu_ticks() for dbuf allocation.
Encode CPU's number by XOR'ing the CPU ID against the 64-bit cpu_ticks().

Reviewed by:	mav, gibbs
Differential Revision: https://phabric.freebsd.org/D521
MFC after:	2 weeks
2014-08-03 09:47:51 +00:00
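The XOR encoding mentioned in 6ba22f8d1a might look roughly like the sketch below. The function name is hypothetical, the tick value is passed in rather than read from cpu_ticks(), and the 48-bit shift is an assumption made for this illustration; the commit message does not say where in the 64-bit value the CPU ID is placed.

```c
#include <stdint.h>

/*
 * Mix a CPU identifier into a 64-bit tick counter value by XOR.  The
 * shift of 48 bits is an assumption chosen for this sketch so that the
 * CPU ID perturbs the high bits of the value.
 */
uint64_t
sketch_creation_stamp(uint64_t ticks, uint32_t cpu_id)
{
	return (ticks ^ ((uint64_t)cpu_id << 48));
}
```

With this mixing, two CPUs that read the same tick value at the same moment still produce different stamps.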
delphij
6901832d85 MFV r269427:
In dnode_children_t, use C99's "[]" idiom for declaring the variable
sized array dnc_children at the end of the structure.

This prevents the compiler from mistakenly optimizing away accesses
beyond the array's defined size.

Illumos issue:
    5038 Remove "old-style" flexible array usage in ZFS.
    Author: Justin T. Gibbs <justing@spectralogic.com>

MFC after:	2 weeks
2014-08-02 08:34:22 +00:00
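The C99 "[]" idiom referred to in 6901832d85 is the flexible array member. A minimal sketch follows; the structure and function names are hypothetical and unrelated to the actual dnode_children_t layout.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure; only the trailing-array idiom matters here. */
struct sketch_children {
	size_t count;
	int slots[];	/* C99 flexible array member, must be last */
};

/*
 * Allocate the header plus 'count' trailing slots in one block and
 * initialise slot i to i.  Returns NULL on allocation failure; the
 * caller releases the block with free().
 */
struct sketch_children *
sketch_children_alloc(size_t count)
{
	struct sketch_children *c;

	c = malloc(sizeof(*c) + count * sizeof(c->slots[0]));
	if (c == NULL)
		return (NULL);
	c->count = count;
	for (size_t i = 0; i < count; i++)
		c->slots[i] = (int)i;
	return (c);
}
```

Declaring the member as `slots[1]` instead would tell the compiler the array has exactly one element, which is what allows it to treat accesses past index 0 as out of bounds.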
ian
e5aca7d143 When arm 64-bit atomic ops are available, define ARM_HAVE_ATOMIC64. Use
that symbol (which will be correct in both kernel and userland contexts)
rather than just __arm__ to decide whether to use a local implementation.
2014-08-02 03:44:27 +00:00
ian
105b4c5f48 Use the 64-bit atomics now provided by arm machine/atomic.h instead of
(conflicting) local versions.
2014-08-01 23:45:50 +00:00
smh
07ac26f9f6 Don't return ZIO_PIPELINE_CONTINUE from vdev_op_io_start methods
This prevents recursion of vdev_queue_io_done as per r265321 but
using a different method as recommended on the openzfs list.

We now use zio_interrupt(zio) and return ZIO_PIPELINE_STOP instead
of returning ZIO_PIPELINE_CONTINUE from vdev_*_io_start methods.

zio_vdev_io_start now ASSERTs that vdev_op_io_start returns
ZIO_PIPELINE_STOP to ensure future changes don't reintroduce
ZIO_PIPELINE_CONTINUE returns.

Cleanup flow in vdev_geom_io_start while I'm here.

Also fix some cases not using SET_ERROR(..)

MFC after:	2 weeks
X-MFC-With:	r265321
2014-08-01 23:16:48 +00:00
delphij
3582d4b4c2 Split gethrtime() and gethrtime_waitfree() and make the former use
nanouptime() instead of getnanouptime().  nanouptime(9) provides a more
precise result at the expense of being slower.

In r269223, gethrtime() is used as the creation time of a dbuf, which in turn
acts as part of the lookup key to maintain the AVL invariant that there
cannot be duplicate items.  Before this change, gethrtime() favored
execution speed over precision, which could lead to a panic
on busy systems with:

	panic: avl_find() succeeded inside avl_add()

Reported by:	allanjude, mav
PR:		kern/192284
MFC after:	11 days
X-MFC-with:	r269223
2014-08-01 22:33:23 +00:00
rpaulo
d088b79500 Copy strtolctype.h to sys/cddl/contrib/opensolaris/common/util to keep
the kernel self-contained.

Requested by:	jhb
2014-07-31 08:07:23 +00:00
delphij
26dcaa7a42 MFV r269224:
Increase default ARC buf_hash_table size.  When typical block size is small,
the hash table could be too small, which would lead to long hash chains and
limit performance for cached reads.

A new loader tunable, vfs.zfs.arc_average_blocksize, has been added which
allows users to override the default assumption of average (typical) block
size.  Old default was 65536 (64 KiB) and new default is 8192 (8 KiB).

Illumos issue:
    5034 ARC's buf_hash_table is too small

MFC after:	2 weeks
2014-07-29 09:36:48 +00:00
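The effect of the average block size on hash table sizing in 26dcaa7a42 can be worked through with a small sketch. The function name, the power-of-two growth, and the 4096-bucket starting size are assumptions for illustration; only the two block size defaults come from the commit message.

```c
#include <stdint.h>

/*
 * Pick a power-of-two bucket count large enough to give roughly one
 * bucket per cached block, assuming every block is 'avg_blocksize' bytes.
 * The starting size of 4096 buckets is an assumption for this sketch.
 */
uint64_t
sketch_hash_buckets(uint64_t mem_bytes, uint64_t avg_blocksize)
{
	uint64_t hsize = 1ULL << 12;

	if (avg_blocksize == 0)
		return (hsize);	/* avoid a loop that could never finish */
	while (hsize * avg_blocksize < mem_bytes)
		hsize <<= 1;
	return (hsize);
}
```

Under these assumptions, 8 GiB of memory yields 131072 buckets with the old 65536-byte default and 1048576 buckets with the new 8192-byte default, an eightfold increase that shortens hash chains when blocks are small.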
delphij
1747d793b5 MFV r269223:
Change dn->dn_dbufs from linked list to AVL tree.

Illumos issues:
  4873 zvol unmap calls can take a very long time for larger datasets

MFC after:	2 weeks
2014-07-29 08:42:22 +00:00
delphij
aae124248a Reschedule the 'deadman' callout after handling; this makes our
code behave more like it does on Solaris.

Reported by:	avg
Reviewed by:	avg, mav (but bugs are mine)

Differential Revision: https://phabric.freebsd.org/D457
2014-07-29 06:57:13 +00:00
kib
b6d2666057 Initialize zfs vnode v_hash when the vnode is allocated, instead of
postponing it to zfs_vget().  zfs_root() returned vnode with the
default value of v_hash, which caused inconsistent v_hash value when
root vnode was obtained from zfs_vget().

Nullfs allocated two upper vnodes for the root zfs vnode due to
different hashes, causing consistency problems.

Reported and tested by:	Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-28 14:24:18 +00:00
delphij
fb0b3d8e74 Add two sysctls for newly added tunables.
MFC after:	2 weeks
2014-07-26 19:07:08 +00:00
delphij
0f4faf42cb MFV r269010:
Import Illumos changes to address the following Illumos issues:
  4976 zfs should only avoid writing to a failing non-redundant
       top-level vdev
  4978 ztest fails in get_metaslab_refcount()
  4979 extend free space histogram to device and pool
  4980 metaslabs should have a fragmentation metric
  4981 remove fragmented ops vector from block allocator
  4982 space_map object should proactively upgrade when feature
       is enabled
  4984 device selection should use fragmentation metric

MFC after:	2 weeks
2014-07-26 10:20:48 +00:00
mav
381cd3e0c8 Make sysctls under vfs.zfs.zfetch writeable.
I don't see any reason for them to be read-only, while tuning them without
reboot is much more convenient for experiments.

MFC after:	2 weeks
2014-07-26 09:09:14 +00:00
delphij
c9e947a583 Transform the I/O when vdev_physical_ashift is greater than
SPA_MINBLOCKSHIFT.

MFC after:	2 weeks
2014-07-25 18:41:56 +00:00