freebsd-skq

Author	SHA1	Message	Date
Andriy Gapon	7fa27112f3	zfs: clean up unused files and definitions MFC after: 1 month X-MFC after: r314048	2017-02-24 07:53:56 +00:00
Andriy Gapon	47c8e3d912	reimplement zfsctl (.zfs) support The current code is written on top of GFS, a library with the generic support for writing filesystems, which was ported from illumos. Because of significant differences between illumos VFS and FreeBSD VFS models, both the GFS and zfsctl code were heavily modified to work on FreeBSD. Nonetheless, they still contain quite a few ugly hacks and bugs. This is a reimplementation of the zfsctl code where the VFS-specific bits are written from scratch and only the code that interacts with the rest of ZFS is reused. Some highlights. We use two types of nodes, static and on-demand. The static nodes are used for permanent directories like .zfs, .zfs/snapshot, etc. The on-demand nodes are used for ephemeral directories that act as snapshot mount points. Initially only static nodes are created. Their vnodes are instantiated when they are looked up. The on-demand nodes and vnodes are instantiated as needed and the nodes are destroyed as soon as the corresponding vnodes are reclaimed. We also try very hard to ensure that uncovered snapshot vnodes do not linger. They are supposed to become inactive as soon as they are uncovered and we try to recycle them immediately. When a filesystem is unmounted all snapshots under .zfs are unmounted first, then all vnodes are flushed and finally the static .zfs nodes are destroyed. There are some changes outside of zfsctl code too. z_ctldir is never used directly (as it is an opaque pointer), zfsctl_root() has to be used instead. The function returns a locked vnode now, so it accepts a lock flags parameter. The function can also fail now, e.g. during force unmounting, whereas previously it was infallible. zfsctl_root_lookup() is retired, instead of it VOP_LOOKUP() on the .zfs vnode (obtained with zfsctl_root) is used. Some ideas are picked from an independent work by will. Reviewed by: asomers, smh MFC after: 1 month Relnotes: maybe Differential Revision: https://reviews.freebsd.org/D7421	2017-02-21 17:47:08 +00:00
Mateusz Guzik	619ce4d72e	Revert r309619 "ifndef atomic_cas_* in cddl code" It was a temporary change to ease an import of native atomic_cas primitives. Instead, atomic_fcmpset was devised with different semantics. See r311168.	2017-01-03 21:02:30 +00:00
Alexander Motin	c5f74c4873	Revert r310023 for now. After another look my new variable mapping was not exactly right.	2016-12-15 08:03:16 +00:00
Alexander Motin	d686b07132	Reduce diff from Illumos by better variables mapping.	2016-12-13 16:20:10 +00:00
Mateusz Guzik	ef32958e5d	ifndef atomic_cas_* in cddl code in preparation for native implementations This is a temporary change to not require all architectures to import at once. Discussed with: jhb	2016-12-06 14:08:49 +00:00
Alan Cox	bba39b9ae3	Remove PG_CACHED-related fields from struct vmmeter, because they are no longer used. More precisely, they are always zero because the code that decremented and incremented them no longer exists. Bump __FreeBSD_version to mark this change. Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8583	2016-11-22 18:13:46 +00:00
Alan Cox	7667839a7e	Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497	2016-11-15 18:22:50 +00:00
Mark Johnston	9e579a58c3	Move implementations of uread() and uwrite() to the illumos compat layer. MFC after: 1 week	2016-09-24 21:40:14 +00:00
Alexander Motin	20e45e033c	Switch random_get_pseudo_bytes() shim to arc4rand(). Our shim for Solaris random_get_bytes() uses read_random(), that looks reasonable, since it guaranties reliably seeded random data. On the other side Solaris random_get_pseudo_bytes() does not provide this guarantie, and its original Solaris implementation is equivalent to our arc4rand(), using software crypto without stressing slower hardware RNG.	2016-09-10 09:37:41 +00:00
Andriy Gapon	f79bc17233	zfs: honour and make use of vfs vnode locking protocol ZFS POSIX Layer is originally written for Solaris VFS which is very different from FreeBSD VFS. Most importantly many things that FreeBSD VFS manages on behalf of all filesystems are implemented in ZPL in a different way. Thus, ZPL contains code that is redundant on FreeBSD or duplicates VFS functionality or, in the worst cases, badly interacts / interferes with VFS. The most prominent problem is a deadlock caused by the lock order reversal of vnode locks that may happen with concurrent zfs_rename() and lookup(). The deadlock is a result of zfs_rename() not observing the vnode locking contract expected by VFS. This commit removes all ZPL internal locking that protects parent-child relationships of filesystem nodes. These relationships are protected by vnode locks and the code is changed to take advantage of that fact and to properly interact with VFS. Removal of the internal locking allowed all ZPL dmu_tx_assign calls to use TXG_WAIT mode. Another victim, disputable perhaps, is ZFS support for filesystems with mixed case sensitivity. That support is not provided by the OS anyway, so in ZFS it was a buch of dead code. To do: - replace ZFS_ENTER mechanism with VFS managed / visible mechanism - replace zfs_zget with zfs_vget[f] as much as possible - get rid of not really useful now zfs_freebsd_* adapters - more cleanups of unneeded / unused code - fix / replace .zfs support PR: 209158 Reported by: many Tested by: many (thank you all!) MFC after: 5 days Sponsored by: HybridCluster / ClusterHQ Differential Revision: https://reviews.freebsd.org/D6533	2016-08-05 06:23:06 +00:00
Nathan Whitehorn	96c85efb4b	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
Konstantin Belousov	4a0d95f810	Use vnlru_free(9) to implement dnlc_reduce_cache(). This apparently puts ARC back under the limits after the vnode pressure rework in r291244, in particular due to the kmem exhaustion. Based on patch by: mckusick Reviewed by: avg, mckusick Tested by: allanjude, madpilot Sponsored by: The FreeBSD Foundation Approved by: re (gjb)	2016-06-17 17:34:28 +00:00
Andriy Gapon	c6cd01d924	fix a vnode reference leak caused by illumos compat traverse() This commit partially reverts r273641 which introduced the leak. It did so to accomodate for some consumers of traverse() that expected the starting vnode to stay as-is. But that introduced the leak in the case when a mounted filesystem was found and its root vnode was returned. r299914 removed the troublesome consumers and now there is no reason to keep the starting vnode. So, now the new rules are: - if there is no mounted filesystem, then nothing is changed - otherwise the starting vnode is always released - the root vnode of the mounted filesystem is returned locked and referenced in the case of success MFC after: 5 weeks X-MFC after: r299914	2016-05-16 12:15:19 +00:00
Andriy Gapon	a03fb1cf6a	mount_snapshot: consolidate all error handling This makes sure that the original vnode is always unlocked and released if any error happens. MFC after: 4 weeks	2016-05-16 06:30:25 +00:00
Andriy Gapon	3055925d42	zfsctl: fix several problems with reference counts * Remove excessive references on a snapshot mountpoint vnode. zfsctl_snapdir_lookup() called VN_HOLD() on a vnode returned from zfsctl_snapshot_mknode() and the latter also had a call to VN_HOLD() on the same vnode. On top of that gfs_dir_create() already returns the vnode with the use count of 1 (set in getnewvnode). So there was 3 references on the vnode. * mount_snapshot() should keep a reference to a covered vnode. That reference is owned by the mountpoint (mounted snapshot filesystem). * Remove cryptic manipulations of a covered vnode in zfs_umount(). FreeBSD dounmount() already does the right thing and releases the covered vnode. PR: 207464 Reported by: dustinwenz@ebureau.com Tested by: Howard Powell <hpowell@lighthouseinstruments.com> MFC after: 3 weeks	2016-05-16 06:24:04 +00:00
Conrad Meyer	3b56262303	compat/opensolaris: Don't redefined off64_t if already defined A follow-up to r299456. Reported by: gjb Sponsored by: EMC / Isilon Storage Division	2016-05-11 16:05:32 +00:00
Andriy Gapon	e881d8757c	remove emulation of VFS_HOLD and VFS_RELE from opensolaris compat On FreeBSD VFS_HOLD/VN_RELE were mapped to MNT_REF/MNT_REL that manipulate mnt_ref. But the job of properly maintaining the reference count is already automatically performed by insmntque(9) and delmntque(9). So, in effect all ZFS vnodes referenced the corresponding mountpoint twice. That was completely harmless, but we want to be very explicit about what FreeBSD VFS APIs are used, because illumos VFS_HOLD and FreeBSD MNT_REF provide quite different guarantees with respect to the held vfs_t / mountpoint. On illumos VFS_HOLD is sufficient to guarantee that vfs_t.vfs_data stays valid. On the other hand, on FreeBSD MNT_REF does not provide the same guarantee about mnt_data. We have to use vfs_busy() to get that guarantee. Thus, the calls to VFS_HOLD/VFS_RELE on vnode init and fini are removed. VFS_HOLD calls are replaced with vfs_busy in the ioctl handlers. And because vfs_busy has a richer interface that can not be dumbed down in all cases it's better to explicitly use it rather than trying to mask it behind VFS_HOLD. This change fixes a panic that could result from a race between zfs_umount() and zfs_ioc_rollback(). We observed a case where zfsvfs_free() tried to destroy data that zfsvfs_teardown() was still using. That happened because there was nothing to prevent unmounting of a ZFS filesystem that was in between zfs_suspend_fs() and zfs_resume_fs(). Reviewed by: kib, smh MFC after: 3 weeks Sponsored by: ClusterHQ Differential Revision: https://reviews.freebsd.org/D2794	2016-04-02 16:25:46 +00:00
Alexander Motin	fa9de58388	Fix small memory leak on attempt to access deleted snapshot. MFC after: 3 days	2016-03-15 21:21:28 +00:00
Alexander Motin	8d0e2eb06b	MFV r296529: 6672 arc_reclaim_thread() should use gethrtime() instead of ddi_get_lbolt() 6673 want a macro to convert seconds to nanoseconds and vice-versa Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Eli Rosenthal <eli.rosenthal@delphix.com> illumos/illumos-gate@a8f6344fa0	2016-03-08 18:28:24 +00:00
Alexander Motin	1b63fd68f4	MFV r296505: 6531 Provide mechanism to artificially limit disk performance Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Prakash Surya <prakash.surya@delphix.com> illumos/illumos-gate@97e8130957	2016-03-08 17:27:13 +00:00
Ruslan Bukin	28029b68c0	Welcome the RISC-V 64-bit kernel. This is the final step required allowing to compile and to run RISC-V kernel and userland from HEAD. RISC-V is a completely open ISA that is freely available to academia and industry. Thanks to all the people involved! Special thanks to Andrew Turner, David Chisnall, Ed Maste, Konstantin Belousov, John Baldwin and Arun Thomas for their help. Thanks to Robert Watson for organizing this project. This project sponsored by UK Higher Education Innovation Fund (HEIF5) and DARPA CTSRD project at the University of Cambridge Computer Laboratory. FreeBSD/RISC-V project home: https://wiki.freebsd.org/riscv Reviewed by: andrew, emaste, kib Relnotes: Yes Sponsored by: DARPA, AFRL Sponsored by: HEIF5 Differential Revision: https://reviews.freebsd.org/D4982	2016-01-29 15:12:31 +00:00
Xin LI	28ffe927c2	Expose an interface to determine if an ACE is inherited. Submitted by: sef Reviewed by: trasz MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3540	2015-09-04 00:14:20 +00:00
Mariusz Zaborski	347a39b4a6	Add support for the arrays in nvlist library. - Add nvlist_{add,get,take,move,exists,free}_{number,bool,string,nvlist, descriptor} functions. - Add support for (un)packing arrays. - Add the nvl_array_next field to the nvlist structure. If an array is added by the nvlist_{move,add}_nvlist_array function this field will contains next element in the array. - Add the nitems field to the nvpair and nvpair_header structure. This field contains number of elements in the array. - Add special flag (NV_FLAG_IN_ARRAY) which is set if nvlist is a part of an array. - Add special type (NV_TYPE_NVLIST_ARRAY_NEXT).This type is used only on packing/unpacking. - Add new API for traversing arrays (nvlist_get_array_next). - Add the nvlist_get_pararr function which combines the nvlist_get_array_next and nvlist_get_parent functions. If nvlist is in the array it will return next element from array. If nvlist is last element in array or it isn't in array it will return his container (parent). This function should simplify traveling over nvlist. - Add tests for new features. - Add documentation for new functions. - Add my copyright. - Regenerate the sys/cddl/compat/opensolaris/sys/nvpair.h file. PR: 191083 Reviewed by: allanjude (doc) Approved by: pjd (mentor)	2015-08-15 06:34:49 +00:00
Alexander Motin	498b9d6c63	Fix r286574 build in user-space.	2015-08-10 12:25:26 +00:00
Alexander Motin	f13e9e1470	MFV r277427: 5445 Add more visibility via arcstats; specifically arc_state_t stats and differentiate between "data" and "metadata" Reviewed by: Basil Crow <basil.crow@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Bayard Bell <bayard.bell@nexenta.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com> illumos/illumos-gate@4076b1bf41	2015-08-10 10:59:58 +00:00
Ed Schouten	5a170c1b0e	Add an API for easily creating userspace threads in kernelspace. This change refactors the existing create_thread() function to be more generic. It replaces almost all of its arguments by a callback that can be used to extract the thread ID and copy it out to the right place, but also to perform additional initialization steps, such as setting the trapframe. This also makes the difference between thr_new() and thr_create() more clear in my opinion. This function is going to be used by the CloudABI compatibility layer. It looks like the OpenSolaris compatibility framework already provides a function called thread_create(). Rename this function to do_thread_create() and use a macro to deal with the namespacing conflict. A similar approach is already used for thread_exit(). MFC after: 1 month	2015-07-20 10:20:04 +00:00
Mateusz Guzik	8a08cec166	Create a dedicated function for ensuring that cdir and rdir are populated. Previously several places were doing it on its own, partially incorrectly (e.g. without the filedesc locked) or even actively harmful by populating jdir or assigning rootvnode without vrefing it. Reviewed by: kib	2015-07-11 16:22:48 +00:00
Mateusz Guzik	f131759f54	fd: make 'rights' a manadatory argument to fget* functions	2015-07-05 19:05:16 +00:00
Andriy Gapon	de93769f1d	compat nvpair.h: make sure that the names are mangled only for kernel Currently there is no good reason to mangle the userland API. The change was introduced in eac1d566b46edef765754203bef22c75c1699966, r279437. Also see https://reviews.freebsd.org/D1881. I am still convinced that nv should not have introduced intentionally conflicting API. Discussed with: rstone X-MFC with: r279437 Sponsored by: ClusterHQ	2015-06-07 08:54:25 +00:00
Andrew Turner	7572a8c8f1	Add the arm64 defines for cddl code. Differential Revision: https://reviews.freebsd.org/D2186 Reviewed by: emaste Sponsored by: The FreeBSD Foundation	2015-04-01 08:31:56 +00:00
Ryan Stone	296a8144aa	Allow Illumos code to co-exist with nv(9) Differential Revision: https://reviews.freebsd.org/D1881 Reviewed by: jfv, will Suggested by: pjd MFC after: 1 month Sponsored by: Sandvine Inc	2015-03-01 00:22:45 +00:00
Will Andrews	34abed55f3	NSEC_TO_TICK(usec) -> NSEC_TO_TICK(nsec)	2015-01-20 22:29:27 +00:00
Will Andrews	c9c5e04711	Remove unused strdup() #define.	2015-01-20 22:27:45 +00:00
Andriy Gapon	036a8c5dac	remove opensolaris cyclic code, replace with high-precision callouts In the old days callout(9) had 1 tick precision and that was inadequate for some uses, e.g. DTrace profile module, so we had to emulate cyclic API and behavior. Now we can directly use callout(9) in the very few places where cyclic was used. Differential Revision: https://reviews.freebsd.org/D1161 Reviewed by: gnn, jhb, markj MFC after: 2 weeks	2014-12-07 11:21:41 +00:00
Konstantin Belousov	6e646651d3	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
Josh Paetzel	14127f5b21	This change addresses 4 bugs in ZFS exposed by Richard Kojedzinszky's crash.sh script attached to FreeNAS bug 4109: https://bugs.freenas.org/issues/4109 Three are in the snapshot layer: a) AVG explains in his notes: https://wiki.freebsd.org/AvgVfsSolarisVsFreeBSD "VOP_INACTIVE must not do any destructive actions to a vnode and its filesystem node, nor invalidate them in any way." gfs_vop_inactive and zfsctl_snapshot_inactive did just that. In OpenSolaris VOP_INACTIVE is much closer to FreeBSD's VOP_RECLAIM. Rename & move them to gfs_vop_reclaim and zfsctl_snapshot_reclaim and merge in the requisite vnode_destroy from zfsctl_common_reclaim. b) gfs_lookup_dot and various zfsctl functions do not honor the FreeBSD VFS convention of only locking from the root downward. When looking up ".." the convention is to drop the current leaf vnode lock before acquiring the directory vnode and then subsequently re-acquiring the lock on the leaf vnode. This fixes that in all the places that our exercised by crash.sh. c) The snapshot may already be unmounted when the directory vnode is reclaimed. Check for this case and return. One in the common layer: d) Callers of traverse expect the reference to the vnode passed in to be maintained. Don't release it. This last one may be an unclear contract. There may in fact be some callers that do expect the reference to be dropped on success in addition to callers that expect it to be released. In this case a further audit of the callers is needed and a consensus on the correct behavior. PR: 184677 Submitted by: kmacy Reviewed by: delphij, will, avg MFC after: 2 weeks Sponsored by: iXsystems	2014-10-25 17:42:44 +00:00
Andriy Gapon	ab26525af2	make userland __assfail from opensolaris compat honor 'aok' variable This should allow zdb -A option to actually make difference. MFC after: 2 weeks	2014-10-07 14:15:50 +00:00
Steven Hartland	14a0d74ea8	Refactor ZFS ARC reclaim checks and limits Remove previously added kmem methods in favour of defines which allow diff minimisation between upstream code base. Rebalance ARC free target to be vm_pageout_wakeup_thresh by default which eliminates issue where ARC gets minimised instead of balancing with VM pageout. The restores the target point prior to r270759. Bring in missing upstream only changes which move unused code to further eliminate code differences. Add additional DTRACE probe to aid monitoring of ARC behaviour. Enable upstream i386 code paths on platforms which don't define UMA_MD_SMALL_ALLOC. Fix mixture of byte an page values in arc_memory_throttle i386 code path value assignment of available_memory. PR: 187594 Review: D702 Reviewed by: avg MFC after: 1 week X-MFC-With: r270759 & r270861 Sponsored by: Multiplay	2014-10-03 20:34:55 +00:00
Steven Hartland	8caa3daf35	Remove sys/types.h include as per style (9) SDT requries sys/param.h due to use of NULL Reported by: Garrett Sponsored by: Multiplay	2014-09-18 20:38:18 +00:00
Steven Hartland	71f3caaf31	Add dtrace probe support for zfs SET_ERROR(..) MFC after: 1 week Sponsored by: Multiplay	2014-09-18 20:00:36 +00:00
Steven Hartland	92ac3eb59f	Ensure that ZFS ARC free memory checks include cached pages Also restore kmem_used() check for i386 as it has KVA limits that the raw page counts above don't consider PR: 187594 Reviewed by: peter X-MFC-With: r270759 Review: D700 Sponsored by: Multiplay	2014-08-30 21:44:32 +00:00
Steven Hartland	4d19f4ad1f	Refactor ZFS ARC reclaim logic to be more VM cooperative Prior to this change we triggered ARC reclaim when kmem usage passed 3/4 of the total available, as indicated by vmem_size(kmem_arena, VMEM_ALLOC). This could lead large amounts of unused RAM e.g. on a 192GB machine with ARC the only major RAM consumer, 40GB of RAM would remain unused. The old method has also been seen to result in extreme RAM usage under certain loads, causing poor performance and stalls. We now trigger ARC reclaim when the number of free pages drops below the value defined by the new sysctl vfs.zfs.arc_free_target, which defaults to the value of vm.v_free_target. Credit to Karl Denninger for the original patch on which this update was based. PR: 191510 and 187594 Tested by: dteske MFC after: 1 week Relnotes: yes Sponsored by: Multiplay	2014-08-28 19:50:08 +00:00
Xin LI	d291a3bd9c	Provide compatibility shim for atomic_dec_64_nv. X-MFC-with: r270247 MFC after: 13 days	2014-08-21 08:25:46 +00:00
Xin LI	cd741a5e1d	Revert r269404 and use cpu_ticks() for dbuf allocation. Encode CPU's number by XOR'ing the CPU ID against the 64-bit cpu_ticks(). Reviewed by: mav, gibbs Differential Revision: https://phabric.freebsd.org/D521 MFC after: 2 weeks	2014-08-03 09:47:51 +00:00
Ian Lepore	c311f7078c	When arm 64-bit atomic ops are available, define ARM_HAVE_ATOMIC64. Use that symbol (which will be correct in both kernel and userland contexts) rather than just __arm__ to decide whether to use a local implementation.	2014-08-02 03:44:27 +00:00
Ian Lepore	814f4c5896	Use the 64-bit atomics now provided by arm machine/atomic.h instead of (conflicting) local versions.	2014-08-01 23:45:50 +00:00
Xin LI	125f68e708	Split gethrtime() and gethrtime_waitfree() and make the former use nanouptime() instead of getnanouptime(). nanouptime(9) provides more precise result at expense of being slower. In r269223, gethrtime() is used as creation time of dbuf, which in turn acts as portion of lookup key to maintain AVL invariant where there can not be duplicate items. Before this change, gethrtime() have preferred better execution time by sacrificing precision, which may lead to panic on busy systems with: panic: avl_find() succeeded inside avl_add() Reported by: allanjude, mav PR: kern/192284 MFC after: 11 days X-MFC-with: r269223	2014-08-01 22:33:23 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00

1 2 3 4 5

215 Commits