Commit Graph

1500 Commits

Author SHA1 Message Date
Andriy Gapon
0ab1aa90fa fix locking in zfsctl_root_lookup
Dropping the root vnode's lock after VFS_ROOT() didn't really help the
fact that we acquired the lock while holding its child's, .zfs, lock
while performing the operaiton.
So, directly use zfs_zget() to get the root vnode.

While there simplify the code in zfsctl_freebsd_root_lookup.
We know that .zfs is always exclusively locked.
We know that there is already a reference on *vpp, so no need for an
extra one.
Account for the fact that .. lookup may ask for a different lock type,
not necessarily LK_EXCLUSIVE.  And handle a possible failure to acquire
the lock given the lock flags.

MFC after:	5 weeks
2016-05-16 15:28:39 +00:00
Andriy Gapon
705e6b8170 gfs_lookup_dot() does not have to acquire any locks
In fact, that was dangerous.  For example, zfsctl_snapshot_reclaim()
calls gfs_dir_lookup() on ".." path and that ends up calling
gfs_lookup_dot() which violated locking order by acquiring the parent's
directory vnode lock after the child's vnode lock.

Also, the previous behavior was inconsistent as gfs_dir_lookup()
returned a locked vnode for . and .. lookups, but not for any other.

Now gfs_lookup_dot() just references a resulting vnode and the locking
is done in its consumers, where necessary.
Note that we do not enable shared locking support for any gfs / zfsctl
vnodes.

This commit partially reverts r273641.

MFC after:	5 weeks
2016-05-16 15:13:16 +00:00
Andriy Gapon
7223645bd1 avoid deadlock between zfsctl_snapdir_lookup and zfsctl_snapshot_reclaim
The former acquired a snap vnode lock while holding sd_lock while the
latter does the opposite.

The solution is drop sd_lock before acquiring the vnode lock.  That
should be okay as we are still holding a lock on the 'snapshot'
directory in the exclusive mode.  That lock ensures that there are no
concurrent lookups in the directory and thus no concurrent mount attempts.

But now we have to account for the possibility that the snap vnode
might get reclaim after we drop sd_lock and before we can get
the node lock.  So, check for that case and retry.

MFC after:	5 weeks
2016-05-16 15:03:52 +00:00
Andriy Gapon
c6cd01d924 fix a vnode reference leak caused by illumos compat traverse()
This commit partially reverts r273641 which introduced the leak.
It did so to accomodate for some consumers of traverse() that expected
the starting vnode to stay as-is.  But that introduced the leak in the
case when a mounted filesystem was found and its root vnode was
returned.

r299914 removed the troublesome consumers and now there is no reason to
keep the starting vnode.  So, now the new rules are:
- if there is no mounted filesystem, then nothing is changed
- otherwise the starting vnode is always released
- the root vnode of the mounted filesystem is returned locked and
  referenced in the case of success

MFC after:	5 weeks
X-MFC after:	r299914
2016-05-16 12:15:19 +00:00
Andriy Gapon
20ec8b0f9b fix up r299902: mount_snapshot requires that the covered vnode is locked
Previously that was not strictly enforced.

MFC after:	4 weeks
X-MFC with:	r299902
2016-05-16 11:48:43 +00:00
Andriy Gapon
cf7aa80bbd zfsctl_ops_snapshot: remove methods should never be called
We pretend that snapshots mounted under .zfs are part of the original
filesystem and we try very hard to hide vnodes on top of which the snapshots
are mounted.  Given that I believe that the removed operations should
never be called.  They might have been called previously because
of issues fixed in r299906, r299908 and r299913.

MFC after:	5 weeks
2016-05-16 07:24:30 +00:00
Andriy Gapon
cb68fd3513 zfsctl_snapdir_lookup: always clear VV_ROOT flag of snapshot's root VV_ROOT
Previosuly we did that only if the snapshot was mounted earlier, its
root vnode got recycled and then we accessed it again.
We never cleared the flag for a freshly mounted snapshot.

That was very inconsistent and probably a source of some bugs.
Or maybe that painted over some bugs which might get revealed now.

We should consistently clear the flag because we try very hard to
pretend that snapshots auto-mounted under .zfs are part of their
original filesystem.  In other words, we try to hide the fact that they
are different filesystems / mountpoints.

MFC after:	5 weeks
2016-05-16 06:49:09 +00:00
Andriy Gapon
4df590b5b6 add zfs_vptocnp with special handling for snapshots under .zfs
The logic is similar to that already present in zfs_dirlook() to handle
a dot-dot lookup on a root vnode of a snapshot mounted under
.zfs/snapshot/.
illumos does not have an equivalent of vop_vptocnp, so there only the
lookup had to be patched up.

MFC after:	4 weeks
2016-05-16 06:40:51 +00:00
Andriy Gapon
a03fb1cf6a mount_snapshot: consolidate all error handling
This makes sure that the original vnode is always unlocked and released
if any error happens.

MFC after:	4 weeks
2016-05-16 06:30:25 +00:00
Andriy Gapon
3055925d42 zfsctl: fix several problems with reference counts
* Remove excessive references on a snapshot mountpoint vnode.
  zfsctl_snapdir_lookup() called VN_HOLD() on a vnode returned from
  zfsctl_snapshot_mknode() and the latter also had a call to VN_HOLD()
  on the same vnode.
  On top of that gfs_dir_create() already returns the vnode with the
  use count of 1 (set in getnewvnode).
  So there was 3 references on the vnode.

* mount_snapshot() should keep a reference to a covered vnode.
  That reference is owned by the mountpoint (mounted snapshot filesystem).

* Remove cryptic manipulations of a covered vnode in zfs_umount().
  FreeBSD dounmount() already does the right thing and releases the covered
  vnode.

PR:		207464
Reported by:	dustinwenz@ebureau.com
Tested by:	Howard Powell <hpowell@lighthouseinstruments.com>
MFC after:	3 weeks
2016-05-16 06:24:04 +00:00
John Baldwin
fdce57a042 Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
Enji Cooper
622282b3b8 Include arpa/inet.h to get the htonl(3) definition
MFC after: 2 weeks
Reported by: clang
Sponsored by: EMC / Isilon Storage Division
2016-05-13 11:15:33 +00:00
Conrad Meyer
3b56262303 compat/opensolaris: Don't redefined off64_t if already defined
A follow-up to r299456.

Reported by:	gjb
Sponsored by:	EMC / Isilon Storage Division
2016-05-11 16:05:32 +00:00
Alexander Motin
c59a902fa3 MFV r299453: 6765 zfs_zaccess_delete() comments do not accurately reflect
delete permissions for ACLs

Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Kevin Crowe <kevin.crowe@nexenta.com>

openzfs/openzfs@a40149b935
2016-05-11 13:53:29 +00:00
Alexander Motin
0eb65a5367 MFV r299451: 6764 zfs issues with inheritance flags during chmod(2) with
aclmode=passthrough

Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Albert Lee <trisk@nexenta.com>

openzfs/openzfs@1bcf0d240b
2016-05-11 13:50:34 +00:00
Alexander Motin
85a69dbf66 MFV r299449: 6763 aclinherit=restricted masks inherited permissions by group
perms (groupmask)

Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Albert Lee <trisk@nexenta.com>

openzfs/openzfs@eebb483d0c
2016-05-11 13:48:15 +00:00
Alexander Motin
2a219f349e MFV r299442: 6762 POSIX write should imply DELETE_CHILD on directories - and
some additional considerations

Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Kevin Crowe <kevin.crowe@nexenta.com>

openzfs/openzfs@d316fffc9c
2016-05-11 13:43:20 +00:00
Alexander Motin
42a54f9745 MFV r299440: 6736 ZFS per-vdev ZAPs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Joe Stein <joe.stein@delphix.com>

openzfs/openzfs@215198a6ad
2016-05-11 12:54:00 +00:00
Alexander Motin
7d54dbae83 MFV r299438: 6842 Fix empty xattr dir causing lockup
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chunwei Chen <tuxoko@gmail.com>

openzfs/openzfs@02525cd08f
2016-05-11 12:46:07 +00:00
Alexander Motin
d7ff478705 MFV r299436: 6843 Make xattr dir truncate and remove in one tx
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chunwei Chen <tuxoko@gmail.com>

openzfs/openzfs@399cc7d5d9
2016-05-11 12:43:54 +00:00
Alexander Motin
0b99ac761e MFV r299434: 6841 Undirty freed spill blocks
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Tim Chase <tim@chase2k.com>

openzfs/openzfs@445e67805d
2016-05-11 12:38:07 +00:00
Ruslan Bukin
d7dc6bae03 Implement FBT provider (MD part) for DTrace on MIPS.
Tested on MIPS64.

Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
2016-05-05 13:54:50 +00:00
Alan Somers
c9a807447d Fix a use-after-free when "zpool import" fails
clear vd->vdev_tsd in vdev_geom_close_locked instead of vdev_geom_detach.
In the latter function, it would fail to happen in certain circumstances
where cp->private was unset.  Ideally, the latter should never happen, but
it can happen when vdev open fails, or where spares are involved.

MFC after:	4 weeks
X-MFC-With:	298786
Sponsored by:	Spectra Logic Corp
2016-04-29 21:29:37 +00:00
Andriy Gapon
27b6c49726 add invpcid instruction to i386 dtrace disassembler tables
MFC after:	2 weeks
2016-04-29 15:45:22 +00:00
Alan Somers
663f649ff6 Refactor vdev_geom_attach and friends to reduce code duplication
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
	Move checks for provider's sectorsize and mediasize into a single
	location in vdev_geom_attach. Remove the zfs::vdev::taste class;
	it's ok to use the regular vdev class for tasting. Consolidate guid
	checks into a single location in vdev_attach_ok. Consolidate some
	error handling code from vdev_geom_attach into vdev_geom_detach,
	closing a resource leak of geom consumers in the process.

Reviewed by:	avg
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D5974
2016-04-29 15:23:51 +00:00
Mark Johnston
676a03fa6a Increase DTRACE_FUNCNAMELEN from 128 to 192.
This allows for the long function components encountered in www/firefox.
This constant is part of DTrace's userland ABI, so this change may not be
MFC'ed.

PR:	207735
2016-04-25 18:44:11 +00:00
Mark Johnston
328d8adb9b Allow DOF sections with excessively long probe function components.
Without this change, DTrace will refuse to load a DOF section if the
function component of any of its probes exceeds DTRACE_FUNCNAMELEN (128).
Probes in C++ programs can have very long function components. Rather than
rejecting all probes if a single probe exceeds the limit, simply skip the
invalid probe and emit a warning. This ensures that valid probes are
instantiated.

PR:		207735
MFC after:	2 weeks
2016-04-25 18:40:57 +00:00
Mark Johnston
cd8bbc382d Add a kern.dtrace.err_verbose sysctl to control dtrace_err_verbose.
When this flag is turned on, DOF and DIF validation errors are printed to
the kernel message buffer. This is useful for debugging.

Also remove the debug.dtrace.debug sysctl, which has no effect.
2016-04-25 18:09:36 +00:00
Andriy Gapon
2d69831b85 lahf/sahf are supported on some amd64 processors
While the instructions were not included into the original instruction
set, their support can be indicated by a special feature bit.
For example:
  CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU)
  ...
    AMD Features2=0x37ff<LAHF, ...>

Clang 3.8 uses lahf/sahf as a faster alternative to pushf/popf where
possible.

MFC after:	2 weeks
2016-04-22 13:44:12 +00:00
Andriy Gapon
dbbcddb426 MFV r298471: 6052 decouple lzc_create() from the implementation details
illumos/illumos-gate@26455f9efc
26455f9efc

https://www.illumos.org/issues/6052
  At the moment type parameter of lzc_create() is of dmu_objset_type_t type.
  That exposes an implementation detail and requires sys/fs/zfs.h to be included
  in libzfs_core.h creating unnecessary coupling between libzfs_core interface
  and ZFS internals.
  I think that dmu_objset_type_t should be replaced with a libzfs_core
  enumeration of supported dataset types.
  For ABI reasons the new enumeration could be bit-compatible with
  dmu_objset_type_t.
  For example:
      typedef enum {
          LZC_DST_ZFS = 2,
          LZC_DST_ZVOL
      } lzc_dataset_type_t;

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <andriy.gapon@clusterhq.com>

MFC after:	2 weeks
Sponsored by:	ClusterHQ
2016-04-22 13:00:27 +00:00
Mark Johnston
6c2806594b Make the second argument of dtrace_invop() a trapframe pointer.
Currently this argument is a pointer into the stack which is used by FBT
to fetch the first five probe arguments. On all non-x86 architectures it's
simply the trapframe address, so this change has no functional impact. On
amd64 it's a pointer into the trapframe such that stack[1 .. 5] gives the
first five argument registers, which are deliberately grouped together in
the amd64 trapframe definition.

A trapframe argument simplifies the invop handlers on !x86 and makes the
x86 FBT invop handler easier to understand. Moreover, it allows for invop
handlers that may want to modify the register set of the interrupted thread.
2016-04-17 23:08:47 +00:00
Andriy Gapon
e01dd79f9a zfs_rezget: z_vnode can not be NULL if zp is valid
MFC after:	3 weeks
2016-04-16 07:41:56 +00:00
Andriy Gapon
c2d36fc5cd zfs: enable vn_io_fault support
Note that now we have to account for possible partial writes
in dmu_write_uio_dbuf().  It seems that on illumos either all or none
of the data are expected to be written.  But the partial writes are
quite expected when vn_io_fault support is enabled.

Reviewed by:	kib
MFC after:	7 weeks
Differential Revision: https://reviews.freebsd.org/D2790
2016-04-16 07:35:53 +00:00
Alan Somers
739f4ae3b1 Don't corrupt ZFS label's physpath attribute when booting while a disk is missing
Prior to this change, vdev_geom_open_by_path would call vdev_geom_attach
prior to verifying the device's GUIDs.  vdev_geom_attach calls
vdev_geom_attrchange to set the physpath in the vdev object.  The result is
that if the disk could not be found, then the labels for other disks in the
same TLD would overwrite the missing disk's physpath with the physpath of
whichever disk currently has the same devname as the missing one used to
have.

MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
2016-04-15 16:36:17 +00:00
Alan Somers
c29088b5c7 Add more debugging statements in vdev_geom.c
Log a debugging message whenever geom functions fail in vdev_geom_attach.
Printing these messages is controlled by vfs.zfs.debug

MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
2016-04-14 23:14:41 +00:00
Alan Somers
f0ac053088 Update a debugging message in vdev_geom_open_by_guids for consistency with
similar messages elsewhere in the file.

MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
2016-04-14 19:20:31 +00:00
Alan Somers
4e3ab010a2 Fix rare double free in vdev_geom_attrchanged
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
	Don't drop the g_topology_lock before freeing old_physpath. That
	opens up a race where one thread can call vdev_geom_attrchanged,
	set old_physpath, drop the g_topology_lock, then block trying to
	acquire the SCL_STATE lock. Then another thread can come into
	vdev_geom_attrchanged, set old_physpath to the same value, and
	proceed to free it. When the first thread resumes, it will free
	the same location.

	It turns out that the SCL_STATE lock isn't needed. It was
	originally added by gibbs to protect vd->vdev_physpath while
	updating the same. However, the update process subsequently was
	switched to an atomic operation (a pointer swap). Now, there is
	no need for the SCL_STATE lock, and hence no need to drop the
	g_topology_lock.

Reviewed by:	delphij
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D5413
2016-04-12 19:11:14 +00:00
Andriy Gapon
c3249989ef l2arc: make sure that all writes honor ashift of a cache device
Previously uncompressed buffers did not obey that rule.

Type of b_asize is changed to uint64_t for consistency,
given that this is a zeta-byte filesystem.

l2arc_compress_buf is renamed to l2arc_transform_buf to better reflect
its new utility.  Now not only we ensure that a compressed buffer has
a size aligned to ashift, but we also allocate a properly sized
temporary buffer if the original buffer is not compressed and it has
an odd size.  This ensures that all I/O to the cache device is always
ashift-aligned, in terms of both a request offset and a request size.

If the aligned data is larger than the original data, then we have to use
a temporary buffer when reading it as well.

Also, enhance physical zio alignment checks using vdev_logical_ashift.
On FreeBSD we have this information, so we can make stricter assertions.

Reviewed by: smh, mav
MFC after:	1 month
Sponsored by:	ClusterHQ
Differential Revision: https://reviews.freebsd.org/D2789
2016-04-12 06:56:35 +00:00
Andriy Gapon
6a50036052 Revert r297396 Modify "4958 zdb trips assert on pools with ashift >= 0xe"
A better fix is following.
2016-04-12 06:54:18 +00:00
Alexander Motin
78aec5c610 MFV r297831: 6322 ZFS indirect block predictive prefetch
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Author: Alexander Motin <mav@FreeBSD.org>

Improve speculative prefetch of indirect blocks.

Scalability of many operations on wide ZFS pool can be limited by
requirement to prefetch indirect blocks first.  Recently added
asynchronous indirect block read partially helped, but did not
solve the problem completely.  This patch extends existing prefetcher
functionality to explicitly work with indirect blocks.

Before this change prefetcher issued reads for up to 8MB of data in
advance.  With this change it also issues indirect block reads
for up to 64MB of data in advance, so that when it will be time to
actually read those data, it can be done immediately.  Alike effect
can be achieved by just increasing maximal data prefetch distance,
but at higher memory cost.

Also this change introduces indirect block prefetch for rewrite
operations, that was never done before.  Previously ARC miss for
Indirect blocks regularly blocked rewrites, converting perfectly
aligned asynchronous operations into synchronous read-write pairs,
significantly reducing maximal rewrite speed.

While being there this issue was also fixed:
 - prefetch was done always, even if caching for the dataset was
completely disabled.

Testing on FreeBSD with zvol on top of 6x striped 2x mirrored pool
of 12 assorted HDDs shown me such performance numbers:
------- BEFORE --------
Write       491363677 bytes/sec
Read        312430631 bytes/sec
Rewrite      97680464 bytes/sec
-------- AFTER --------
Write       493524146 bytes/sec
Read        438598079 bytes/sec
Rewrite     277506044 bytes/sec

Closes #65
Closes #80

openzfs/openzfs@792fd28ac0
2016-04-11 21:09:15 +00:00
Steven Hartland
2dcee04b3a Only include sysctl in kernel build
Only include sysctl in kernel builds fixing warning about implicit
declaration of function 'sysctl_handle_int'.

PR:		204140
MFC after:	1 week
X-MFC-With:	r297813
Sponsored by:	Multiplay
2016-04-11 13:17:11 +00:00
Steven Hartland
7bc47b4ea3 Only include sysctl in kernel build
Only include sysctl in kernel builds fixing warning about implicit
declaration of function 'sysctl_handle_int'.

Sponsored by:	Multiplay
2016-04-11 08:57:54 +00:00
Andriy Gapon
1da2e1e353 zio: align use of "no dump" flag between use_uma and !use_uma cases
At the moment no ZFS buffers are included into a crash dump unless
ZFS_DEBUG (or INVARIANTS) kernel option is enabled.  That's not very
helpful for debugging of ZFS problems, because important information
often resides in metadata buffers.
This change switches the dumping behavior when UMA is used from the
illumos behavior to a more useful behavior that we have on FreeBSD
when ZFS buffers are allocated via malloc.

Reviewed by:	smh, mav
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D5892
2016-04-11 07:11:20 +00:00
Mark Johnston
b529028676 Implement support for boot-time DTrace.
This allows one to enable DTrace probes relatively early during boot,
during SI_SUB_DTRACE_ANON, before dtrace(1) can invoked. The desired
enabling is created using dtrace -A, which writes a /boot/dtrace.dof
file and uses nextboot(8) to ensure that DTrace kernel modules are loaded
and that the DOF file describing the enabling is loaded by loader(8)
during the subsequent boot. The trace output can then be fetched with
dtrace -a.

With this commit, boot-time DTrace is only functional on i386 and amd64: on
other architectures, the high-resolution timer frequency is initialized
during SI_SUB_CLOCKS and is thus not available when the anonymous
tracing state is initialized. On x86, the TSC is used and is thus available
earlier.

MFC after:	1 month
Relnotes:	yes
2016-04-10 01:25:48 +00:00
Mark Johnston
33b454938a Initialize SDT probes during SI_SUB_DTRACE_PROVIDER.
This is consistent with all other DTrace providers and ensures that
SDT probes are available for boot-time tracing.

MFC after:	2 weeks
2016-04-10 01:24:27 +00:00
Mark Johnston
e1e33ff912 Initialize DTrace hrtimer frequency during SI_SUB_CPU on i386 and amd64.
This allows the hrtimer to be used earlier during boot. This is required
for boot-time DTrace: anonymous enablings are created during
SI_SUB_DTRACE_ANON, which runs before APs are started. In particular,
the DTrace deadman timer requires that the hrtimer be functional.

MFC after:	2 weeks
2016-04-10 01:23:39 +00:00
Alexander Motin
eaee150e3f MFV r297760: 6418 zpool should have a label clearing command
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Author: Will Andrews <will@firepipe.net>

Closes #83
Closes #32

openzfs/openzfs@9663688425

FreeBSD already had `zpool labelclear` functionality, so this is mostly
just a diff reduction.

MFC after:	1 month
2016-04-09 20:30:50 +00:00
Andriy Gapon
c8ff459286 zio write issue threads should have lower (numerically greater) priority
This is because they might do data compression which is quite CPU
expensive.  The original code is correct for illumos, because there
a higher priority corresponds to a greater number.

MFC after:	2 weeks
2016-04-08 11:58:24 +00:00
Alexander Motin
309b1c7ade Alike to r293708 relax pool check in vdev_geom_open_by_path().
This made impossible spare disk open by known path, which kind of worked
only because the same fix was applied to vdev_geom_attach_by_guids() in
r293708.

MFC after:	1 week
2016-04-07 12:54:44 +00:00
Edward Tomasz Napierala
ae34b6ff96 Add four new RCTL resources - readbps, readiops, writebps and writeiops,
for limiting disk (actually filesystem) IO.

Note that in some cases these limits are not quite precise. It's ok,
as long as it's within some reasonable bounds.

Testing - and review of the code, in particular the VFS and VM parts - is
very welcome.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5080
2016-04-07 04:23:25 +00:00