Commit Graph

218 Commits

Author SHA1 Message Date
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Gleb Smirnoff
9ed01c32e0 All these files need sys/vmmeter.h, but now they got it implicitly
included via sys/pcpu.h.
2017-04-17 17:07:00 +00:00
Alexander Motin
3aef5b286a MFV r315290, r315291: 7303 dynamic metaslab selection
illumos/illumos-gate@8363e80ae7
https://github.com/illumos/illumos-gate/commit/8363e80ae72609660f6090766ca8c2c18

https://www.illumos.org/issues/7303

  This change introduces a new weighting algorithm to improve metaslab selection.
  The new weighting algorithm relies on the SPACEMAP_HISTOGRAM feature. As a result,
  the metaslab weight now encodes the type of weighting algorithm used
  (size-based vs segment-based).

  This also introduce a new allocation tracing facility and two new dcmds to help
  debug allocation problems. Each zio now contains a zio_alloc_list_t structure
  that is populated as the zio goes through the allocations stage. Here's an
  example of how to use the tracing facility:

> c5ec000::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
  MSID    DVA    ASIZE      WEIGHT             RESULT               VDEV
     -      0      400           0    NOT_ALLOCATABLE           ztest.0a
     -      0      400           0    NOT_ALLOCATABLE           ztest.0a
     -      0      400           0             ENOSPC           ztest.0a
     -      0      200           0    NOT_ALLOCATABLE           ztest.0a
     -      0      200           0    NOT_ALLOCATABLE           ztest.0a
     -      0      200           0             ENOSPC           ztest.0a
     1      0      400      1 x 8M            17b1a00           ztest.0a

> 1ff2400::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
  MSID    DVA    ASIZE      WEIGHT             RESULT               VDEV
     -      0      200           0    NOT_ALLOCATABLE           mirror-2
     -      0      200           0    NOT_ALLOCATABLE           mirror-0
     1      0      200      1 x 4M            112ae00           mirror-1
     -      1      200           0    NOT_ALLOCATABLE           mirror-2
     -      1      200           0    NOT_ALLOCATABLE           mirror-0
     1      1      200      1 x 4M            112b000           mirror-1
     -      2      200           0    NOT_ALLOCATABLE           mirror-2

  If the metaslab is using segment-based weighting then the WEIGHT column will
  display the number of segments available in the bucket where the allocation
  attempt was made.

Author: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2017-03-24 09:37:00 +00:00
Andriy Gapon
7fa27112f3 zfs: clean up unused files and definitions
MFC after:	1 month
X-MFC after:	r314048
2017-02-24 07:53:56 +00:00
Andriy Gapon
47c8e3d912 reimplement zfsctl (.zfs) support
The current code is written on top of GFS, a library with the generic
support for writing filesystems, which was ported from illumos.
Because of significant differences between illumos VFS and FreeBSD
VFS models, both the GFS and zfsctl code were heavily modified to
work on FreeBSD.  Nonetheless, they still contain quite a few ugly
hacks and bugs.

This is a reimplementation of the zfsctl code where the VFS-specific
bits are written from scratch and only the code that interacts with
the rest of ZFS is reused.

Some highlights.

We use two types of nodes, static and on-demand. The static nodes
are used for permanent directories like .zfs, .zfs/snapshot, etc. The
on-demand nodes are used for ephemeral directories that act as snapshot
mount points.
Initially only static nodes are created. Their vnodes are instantiated
when they are looked up. The on-demand nodes and vnodes are instantiated
as needed and the nodes are destroyed as soon as the corresponding
vnodes are reclaimed.
We also try very hard to ensure that uncovered snapshot vnodes do not
linger.  They are supposed to become inactive as soon as they are
uncovered and we try to recycle them immediately.
When a filesystem is unmounted all snapshots under .zfs are unmounted
first, then all vnodes are flushed and finally the static .zfs nodes
are destroyed.

There are some changes outside of zfsctl code too.
z_ctldir is never used directly (as it is an opaque pointer),
zfsctl_root() has to be used instead.  The function returns a locked
vnode now, so it accepts a lock flags parameter.  The function can
also fail now, e.g. during force unmounting, whereas previously it
was infallible.
zfsctl_root_lookup() is retired, instead of it VOP_LOOKUP() on the .zfs
vnode (obtained with zfsctl_root) is used.

Some ideas are picked from an independent work by will.

Reviewed by:	asomers, smh
MFC after:	1 month
Relnotes:	maybe
Differential Revision: https://reviews.freebsd.org/D7421
2017-02-21 17:47:08 +00:00
Mateusz Guzik
619ce4d72e Revert r309619 "ifndef atomic_cas_* in cddl code"
It was a temporary change to ease an import of native atomic_cas primitives.
Instead, atomic_fcmpset was devised with different semantics. See r311168.
2017-01-03 21:02:30 +00:00
Alexander Motin
c5f74c4873 Revert r310023 for now.
After another look my new variable mapping was not exactly right.
2016-12-15 08:03:16 +00:00
Alexander Motin
d686b07132 Reduce diff from Illumos by better variables mapping. 2016-12-13 16:20:10 +00:00
Mateusz Guzik
ef32958e5d ifndef atomic_cas_* in cddl code in preparation for native implementations
This is a temporary change to not require all architectures to import at once.

Discussed with:	jhb
2016-12-06 14:08:49 +00:00
Alan Cox
bba39b9ae3 Remove PG_CACHED-related fields from struct vmmeter, because they are no
longer used.  More precisely, they are always zero because the code that
decremented and incremented them no longer exists.

Bump __FreeBSD_version to mark this change.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8583
2016-11-22 18:13:46 +00:00
Alan Cox
7667839a7e Remove most of the code for implementing PG_CACHED pages. (This change does
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments.  Those changes will come later.)

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8497
2016-11-15 18:22:50 +00:00
Mark Johnston
9e579a58c3 Move implementations of uread() and uwrite() to the illumos compat layer.
MFC after:	1 week
2016-09-24 21:40:14 +00:00
Alexander Motin
20e45e033c Switch random_get_pseudo_bytes() shim to arc4rand().
Our shim for Solaris random_get_bytes() uses read_random(), that looks
reasonable, since it guaranties reliably seeded random data.  On the other
side Solaris random_get_pseudo_bytes() does not provide this guarantie,
and its original Solaris implementation is equivalent to our arc4rand(),
using software crypto without stressing slower hardware RNG.
2016-09-10 09:37:41 +00:00
Andriy Gapon
f79bc17233 zfs: honour and make use of vfs vnode locking protocol
ZFS POSIX Layer is originally written for Solaris VFS which is very
different from FreeBSD VFS.  Most importantly many things that FreeBSD VFS
manages on behalf of all filesystems are implemented in ZPL in a different
way.
Thus, ZPL contains code that is redundant on FreeBSD or duplicates VFS
functionality or, in the worst cases, badly interacts / interferes
with VFS.

The most prominent problem is a deadlock caused by the lock order reversal
of vnode locks that may happen with concurrent zfs_rename() and lookup().
The deadlock is a result of zfs_rename() not observing the vnode locking
contract expected by VFS.

This commit removes all ZPL internal locking that protects parent-child
relationships of filesystem nodes.  These relationships are protected
by vnode locks and the code is changed to take advantage of that fact
and to properly interact with VFS.

Removal of the internal locking allowed all ZPL dmu_tx_assign calls to
use TXG_WAIT mode.

Another victim, disputable perhaps, is ZFS support for filesystems with
mixed case sensitivity.  That support is not provided by the OS anyway,
so in ZFS it was a buch of dead code.

To do:
- replace ZFS_ENTER mechanism with VFS managed / visible mechanism
- replace zfs_zget with zfs_vget[f] as much as possible
- get rid of not really useful now zfs_freebsd_* adapters
- more cleanups of unneeded / unused code
- fix / replace .zfs support

PR:		209158
Reported by:	many
Tested by:	many (thank you all!)
MFC after:	5 days
Sponsored by:	HybridCluster / ClusterHQ
Differential Revision: https://reviews.freebsd.org/D6533
2016-08-05 06:23:06 +00:00
Nathan Whitehorn
96c85efb4b Replace a number of conflations of mp_ncpus and mp_maxid with either
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.

There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.

PR:		kern/210106
Reviewed by:	jhb
Approved by:	re (glebius)
2016-07-06 14:09:49 +00:00
Konstantin Belousov
4a0d95f810 Use vnlru_free(9) to implement dnlc_reduce_cache().
This apparently puts ARC back under the limits after the vnode pressure
rework in r291244, in particular due to the kmem exhaustion.

Based on patch by:	mckusick
Reviewed by:	avg, mckusick
Tested by:	allanjude, madpilot
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2016-06-17 17:34:28 +00:00
Andriy Gapon
c6cd01d924 fix a vnode reference leak caused by illumos compat traverse()
This commit partially reverts r273641 which introduced the leak.
It did so to accomodate for some consumers of traverse() that expected
the starting vnode to stay as-is.  But that introduced the leak in the
case when a mounted filesystem was found and its root vnode was
returned.

r299914 removed the troublesome consumers and now there is no reason to
keep the starting vnode.  So, now the new rules are:
- if there is no mounted filesystem, then nothing is changed
- otherwise the starting vnode is always released
- the root vnode of the mounted filesystem is returned locked and
  referenced in the case of success

MFC after:	5 weeks
X-MFC after:	r299914
2016-05-16 12:15:19 +00:00
Andriy Gapon
a03fb1cf6a mount_snapshot: consolidate all error handling
This makes sure that the original vnode is always unlocked and released
if any error happens.

MFC after:	4 weeks
2016-05-16 06:30:25 +00:00
Andriy Gapon
3055925d42 zfsctl: fix several problems with reference counts
* Remove excessive references on a snapshot mountpoint vnode.
  zfsctl_snapdir_lookup() called VN_HOLD() on a vnode returned from
  zfsctl_snapshot_mknode() and the latter also had a call to VN_HOLD()
  on the same vnode.
  On top of that gfs_dir_create() already returns the vnode with the
  use count of 1 (set in getnewvnode).
  So there was 3 references on the vnode.

* mount_snapshot() should keep a reference to a covered vnode.
  That reference is owned by the mountpoint (mounted snapshot filesystem).

* Remove cryptic manipulations of a covered vnode in zfs_umount().
  FreeBSD dounmount() already does the right thing and releases the covered
  vnode.

PR:		207464
Reported by:	dustinwenz@ebureau.com
Tested by:	Howard Powell <hpowell@lighthouseinstruments.com>
MFC after:	3 weeks
2016-05-16 06:24:04 +00:00
Conrad Meyer
3b56262303 compat/opensolaris: Don't redefined off64_t if already defined
A follow-up to r299456.

Reported by:	gjb
Sponsored by:	EMC / Isilon Storage Division
2016-05-11 16:05:32 +00:00
Andriy Gapon
e881d8757c remove emulation of VFS_HOLD and VFS_RELE from opensolaris compat
On FreeBSD VFS_HOLD/VN_RELE were mapped to MNT_REF/MNT_REL that
manipulate mnt_ref.  But the job of properly maintaining the reference
count is already automatically performed by insmntque(9) and
delmntque(9).  So, in effect all ZFS vnodes referenced the corresponding
mountpoint twice.

That was completely harmless, but we want to be very explicit about what
FreeBSD VFS APIs are used, because illumos VFS_HOLD and FreeBSD MNT_REF
provide quite different guarantees with respect to the held vfs_t /
mountpoint.  On illumos VFS_HOLD is sufficient to guarantee that
vfs_t.vfs_data stays valid.  On the other hand, on FreeBSD MNT_REF does
*not* provide the same guarantee about mnt_data.  We have to use
vfs_busy() to get that guarantee.

Thus, the calls to VFS_HOLD/VFS_RELE on vnode init and fini are removed.
VFS_HOLD calls are replaced with vfs_busy in the ioctl handlers.

And because vfs_busy has a richer interface that can not be dumbed down
in all cases it's better to explicitly use it rather than trying to mask
it behind VFS_HOLD.

This change fixes a panic that could result from a race between
zfs_umount() and zfs_ioc_rollback().  We observed a case where
zfsvfs_free() tried to destroy data that zfsvfs_teardown() was still
using.  That happened because there was nothing to prevent unmounting of
a ZFS filesystem that was in between zfs_suspend_fs() and
zfs_resume_fs().

Reviewed by:	kib, smh
MFC after:	3 weeks
Sponsored by:	ClusterHQ
Differential Revision: https://reviews.freebsd.org/D2794
2016-04-02 16:25:46 +00:00
Alexander Motin
fa9de58388 Fix small memory leak on attempt to access deleted snapshot.
MFC after:	3 days
2016-03-15 21:21:28 +00:00
Alexander Motin
8d0e2eb06b MFV r296529:
6672 arc_reclaim_thread() should use gethrtime() instead of ddi_get_lbolt()
6673 want a macro to convert seconds to nanoseconds and vice-versa

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Eli Rosenthal <eli.rosenthal@delphix.com>

illumos/illumos-gate@a8f6344fa0
2016-03-08 18:28:24 +00:00
Alexander Motin
1b63fd68f4 MFV r296505: 6531 Provide mechanism to artificially limit disk performance
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@97e8130957
2016-03-08 17:27:13 +00:00
Ruslan Bukin
28029b68c0 Welcome the RISC-V 64-bit kernel.
This is the final step required allowing to compile and to run RISC-V
kernel and userland from HEAD.

RISC-V is a completely open ISA that is freely available to academia
and industry.

Thanks to all the people involved! Special thanks to Andrew Turner,
David Chisnall, Ed Maste, Konstantin Belousov, John Baldwin and
Arun Thomas for their help.
Thanks to Robert Watson for organizing this project.

This project sponsored by UK Higher Education Innovation Fund (HEIF5) and
DARPA CTSRD project at the University of Cambridge Computer Laboratory.

FreeBSD/RISC-V project home: https://wiki.freebsd.org/riscv

Reviewed by:	andrew, emaste, kib
Relnotes:	Yes
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D4982
2016-01-29 15:12:31 +00:00
Xin LI
28ffe927c2 Expose an interface to determine if an ACE is inherited.
Submitted by:	sef
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3540
2015-09-04 00:14:20 +00:00
Mariusz Zaborski
347a39b4a6 Add support for the arrays in nvlist library.
- Add
  nvlist_{add,get,take,move,exists,free}_{number,bool,string,nvlist,
  descriptor} functions.
- Add support for (un)packing arrays.
- Add the nvl_array_next field to the nvlist structure.
  If an array is added by the nvlist_{move,add}_nvlist_array function
  this field will contains next element in the array.
- Add the nitems field to the nvpair and nvpair_header structure.
  This field contains number of elements in the array.
- Add special flag (NV_FLAG_IN_ARRAY) which is set if nvlist is a part of
  an array.
- Add special type (NV_TYPE_NVLIST_ARRAY_NEXT).This type is used only
  on packing/unpacking.
- Add new API for traversing arrays (nvlist_get_array_next).
- Add the nvlist_get_pararr function which combines the
  nvlist_get_array_next and nvlist_get_parent functions. If nvlist is in
  the array it will return next element from array. If nvlist is last
  element in array or it isn't in array it will return his
  container (parent). This function should simplify traveling over nvlist.
- Add tests for new features.
- Add documentation for new functions.
- Add my copyright.
- Regenerate the sys/cddl/compat/opensolaris/sys/nvpair.h file.

PR:		191083
Reviewed by:	allanjude (doc)
Approved by:	pjd (mentor)
2015-08-15 06:34:49 +00:00
Alexander Motin
498b9d6c63 Fix r286574 build in user-space. 2015-08-10 12:25:26 +00:00
Alexander Motin
f13e9e1470 MFV r277427: 5445 Add more visibility via arcstats; specifically
arc_state_t stats and differentiate between "data" and "metadata"

Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@4076b1bf41
2015-08-10 10:59:58 +00:00
Ed Schouten
5a170c1b0e Add an API for easily creating userspace threads in kernelspace.
This change refactors the existing create_thread() function to be more
generic. It replaces almost all of its arguments by a callback that can
be used to extract the thread ID and copy it out to the right place, but
also to perform additional initialization steps, such as setting the
trapframe. This also makes the difference between thr_new() and
thr_create() more clear in my opinion.

This function is going to be used by the CloudABI compatibility layer.

It looks like the OpenSolaris compatibility framework already provides a
function called thread_create(). Rename this function to
do_thread_create() and use a macro to deal with the namespacing
conflict. A similar approach is already used for thread_exit().

MFC after:	1 month
2015-07-20 10:20:04 +00:00
Mateusz Guzik
8a08cec166 Create a dedicated function for ensuring that cdir and rdir are populated.
Previously several places were doing it on its own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.

Reviewed by:	kib
2015-07-11 16:22:48 +00:00
Mateusz Guzik
f131759f54 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
Andriy Gapon
de93769f1d compat nvpair.h: make sure that the names are mangled only for kernel
Currently there is no good reason to mangle the userland API.
The change was introduced in eac1d566b4,
r279437.  Also see https://reviews.freebsd.org/D1881.

I am still convinced that nv should not have introduced intentionally
conflicting API.

Discussed with:	rstone
X-MFC with:	r279437
Sponsored by:	ClusterHQ
2015-06-07 08:54:25 +00:00
Andrew Turner
7572a8c8f1 Add the arm64 defines for cddl code.
Differential Revision:	https://reviews.freebsd.org/D2186
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
2015-04-01 08:31:56 +00:00
Ryan Stone
296a8144aa Allow Illumos code to co-exist with nv(9)
Differential Revision:		https://reviews.freebsd.org/D1881
Reviewed by:			jfv, will
Suggested by:			pjd
MFC after:			1 month
Sponsored by:			Sandvine Inc
2015-03-01 00:22:45 +00:00
Will Andrews
34abed55f3 NSEC_TO_TICK(usec) -> NSEC_TO_TICK(nsec) 2015-01-20 22:29:27 +00:00
Will Andrews
c9c5e04711 Remove unused strdup() #define. 2015-01-20 22:27:45 +00:00
Andriy Gapon
036a8c5dac remove opensolaris cyclic code, replace with high-precision callouts
In the old days callout(9) had 1 tick precision and that was inadequate
for some uses, e.g. DTrace profile module, so we had to emulate cyclic
API and behavior.  Now we can directly use callout(9) in the very few
places where cyclic was used.

Differential Revision:	https://reviews.freebsd.org/D1161
Reviewed by:	gnn, jhb, markj
MFC after:	2 weeks
2014-12-07 11:21:41 +00:00
Konstantin Belousov
6e646651d3 Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 18:01:51 +00:00
Josh Paetzel
14127f5b21 This change addresses 4 bugs in ZFS exposed by Richard Kojedzinszky's
crash.sh script attached to FreeNAS bug 4109:
https://bugs.freenas.org/issues/4109

Three are in the snapshot layer:
a) AVG explains in his notes: https://wiki.freebsd.org/AvgVfsSolarisVsFreeBSD

"VOP_INACTIVE must not do any destructive actions to a vnode
and its filesystem node, nor invalidate them in any way."
gfs_vop_inactive and zfsctl_snapshot_inactive did just that. In
OpenSolaris VOP_INACTIVE is much closer to FreeBSD's VOP_RECLAIM.
Rename & move them to gfs_vop_reclaim and zfsctl_snapshot_reclaim
and merge in the requisite vnode_destroy from zfsctl_common_reclaim.

b) gfs_lookup_dot and various zfsctl functions do not honor the
FreeBSD VFS convention of only locking from the root downward. When
looking up ".." the convention is to drop the current leaf vnode lock before
acquiring the directory vnode and then subsequently re-acquiring the lock on the
leaf vnode. This fixes that in all the places that our exercised by crash.sh.

c) The snapshot may already be unmounted when the directory vnode is reclaimed.
Check for this case and return.

One in the common layer:
d) Callers of traverse expect the reference to the vnode passed in to be
maintained. Don't release it.

This last one may be an unclear contract. There may in fact be some callers that
do expect the reference to be dropped on success in addition to callers that
expect it to be released. In this case a further audit of the callers is needed
and a consensus on the correct behavior.

PR:	184677
Submitted by:	kmacy
Reviewed by:	delphij, will, avg
MFC after:	2 weeks
Sponsored by:	iXsystems
2014-10-25 17:42:44 +00:00
Andriy Gapon
ab26525af2 make userland __assfail from opensolaris compat honor 'aok' variable
This should allow zdb -A option to actually make difference.

MFC after:	2 weeks
2014-10-07 14:15:50 +00:00
Steven Hartland
14a0d74ea8 Refactor ZFS ARC reclaim checks and limits
Remove previously added kmem methods in favour of defines which
allow diff minimisation between upstream code base.

Rebalance ARC free target to be vm_pageout_wakeup_thresh by default
which eliminates issue where ARC gets minimised instead of balancing
with VM pageout. The restores the target point prior to r270759.

Bring in missing upstream only changes which move unused code to
further eliminate code differences.

Add additional DTRACE probe to aid monitoring of ARC behaviour.

Enable upstream i386 code paths on platforms which don't define
UMA_MD_SMALL_ALLOC.

Fix mixture of byte an page values in arc_memory_throttle i386 code
path value assignment of available_memory.

PR:		187594
Review:		D702
Reviewed by:	avg
MFC after:	1 week
X-MFC-With:	r270759 & r270861
Sponsored by:	Multiplay
2014-10-03 20:34:55 +00:00
Steven Hartland
8caa3daf35 Remove sys/types.h include as per style (9)
SDT requries sys/param.h due to use of NULL

Reported by:	Garrett
Sponsored by:	Multiplay
2014-09-18 20:38:18 +00:00
Steven Hartland
71f3caaf31 Add dtrace probe support for zfs SET_ERROR(..)
MFC after:	1 week
Sponsored by:	Multiplay
2014-09-18 20:00:36 +00:00
Steven Hartland
92ac3eb59f Ensure that ZFS ARC free memory checks include cached pages
Also restore kmem_used() check for i386 as it has KVA limits that the raw
page counts above don't consider

PR:		187594
Reviewed by:	peter
X-MFC-With: r270759
Review:	D700
Sponsored by:	Multiplay
2014-08-30 21:44:32 +00:00
Steven Hartland
4d19f4ad1f Refactor ZFS ARC reclaim logic to be more VM cooperative
Prior to this change we triggered ARC reclaim when kmem usage passed 3/4
of the total available, as indicated by vmem_size(kmem_arena, VMEM_ALLOC).

This could lead large amounts of unused RAM e.g. on a 192GB machine with
ARC the only major RAM consumer, 40GB of RAM would remain unused.

The old method has also been seen to result in extreme RAM usage under
certain loads, causing poor performance and stalls.

We now trigger ARC reclaim when the number of free pages drops below the
value defined by the new sysctl vfs.zfs.arc_free_target, which defaults
to the value of vm.v_free_target.

Credit to Karl Denninger for the original patch on which this update was
based.

PR:		191510 and 187594
Tested by:	dteske
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Multiplay
2014-08-28 19:50:08 +00:00
Xin LI
d291a3bd9c Provide compatibility shim for atomic_dec_64_nv.
X-MFC-with:	r270247
MFC after:	13 days
2014-08-21 08:25:46 +00:00
Xin LI
cd741a5e1d Revert r269404 and use cpu_ticks() for dbuf allocation.
Encode CPU's number by XOR'ing the CPU ID against the 64-bit cpu_ticks().

Reviewed by:	mav, gibbs
Differential Revision: https://phabric.freebsd.org/D521
MFC after:	2 weeks
2014-08-03 09:47:51 +00:00
Ian Lepore
c311f7078c When arm 64-bit atomic ops are available, define ARM_HAVE_ATOMIC64. Use
that symbol (which will be correct in both kernel and userland contexts)
rather than just __arm__ to decide whether to use a local implementation.
2014-08-02 03:44:27 +00:00
Ian Lepore
814f4c5896 Use the 64-bit atomics now provided by arm machine/atomic.h instead of
(conflicting) local versions.
2014-08-01 23:45:50 +00:00