freebsd-nq

Author	SHA1	Message	Date
Steven Hartland	92ac3eb59f	Ensure that ZFS ARC free memory checks include cached pages Also restore kmem_used() check for i386 as it has KVA limits that the raw page counts above don't consider PR: 187594 Reviewed by: peter X-MFC-With: r270759 Review: D700 Sponsored by: Multiplay	2014-08-30 21:44:32 +00:00
Steven Hartland	4d19f4ad1f	Refactor ZFS ARC reclaim logic to be more VM cooperative Prior to this change we triggered ARC reclaim when kmem usage passed 3/4 of the total available, as indicated by vmem_size(kmem_arena, VMEM_ALLOC). This could lead large amounts of unused RAM e.g. on a 192GB machine with ARC the only major RAM consumer, 40GB of RAM would remain unused. The old method has also been seen to result in extreme RAM usage under certain loads, causing poor performance and stalls. We now trigger ARC reclaim when the number of free pages drops below the value defined by the new sysctl vfs.zfs.arc_free_target, which defaults to the value of vm.v_free_target. Credit to Karl Denninger for the original patch on which this update was based. PR: 191510 and 187594 Tested by: dteske MFC after: 1 week Relnotes: yes Sponsored by: Multiplay	2014-08-28 19:50:08 +00:00
Xin LI	d291a3bd9c	Provide compatibility shim for atomic_dec_64_nv. X-MFC-with: r270247 MFC after: 13 days	2014-08-21 08:25:46 +00:00
Xin LI	cd741a5e1d	Revert r269404 and use cpu_ticks() for dbuf allocation. Encode CPU's number by XOR'ing the CPU ID against the 64-bit cpu_ticks(). Reviewed by: mav, gibbs Differential Revision: https://phabric.freebsd.org/D521 MFC after: 2 weeks	2014-08-03 09:47:51 +00:00
Ian Lepore	c311f7078c	When arm 64-bit atomic ops are available, define ARM_HAVE_ATOMIC64. Use that symbol (which will be correct in both kernel and userland contexts) rather than just __arm__ to decide whether to use a local implementation.	2014-08-02 03:44:27 +00:00
Ian Lepore	814f4c5896	Use the 64-bit atomics now provided by arm machine/atomic.h instead of (conflicting) local versions.	2014-08-01 23:45:50 +00:00
Xin LI	125f68e708	Split gethrtime() and gethrtime_waitfree() and make the former use nanouptime() instead of getnanouptime(). nanouptime(9) provides more precise result at expense of being slower. In r269223, gethrtime() is used as creation time of dbuf, which in turn acts as portion of lookup key to maintain AVL invariant where there can not be duplicate items. Before this change, gethrtime() have preferred better execution time by sacrificing precision, which may lead to panic on busy systems with: panic: avl_find() succeeded inside avl_add() Reported by: allanjude, mav PR: kern/192284 MFC after: 11 days X-MFC-with: r269223	2014-08-01 22:33:23 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Steven Hartland	82ce008538	Reintroduce priority for the TRIM ZIOs instead of using the "NOW" priority The changes how TRIM requests are generated to use ZIO_TYPE_FREE + a priority instead of ZIO_TYPE_IOCTL, until processed by vdev_geom; only then is it translated the required geom values. This reduces the amount of changes required for FREE requests to be supported by the new IO scheduler. This also eliminates the need for a specific DKIOCTRIM. Also fixed FREE vdev child IO's from running ZIO_STAGE_VDEV_IO_DONE as part of their schedule. As the new IO scheduler can result in a request to execute one type of IO to actually run a different type of IO it requires that zio_trim requests are processed without holding the trim map lock (tm->tm_lock), as the free request execute call may result in write request running hence triggering a trim_map_write_start call, which takes the trim map lock and hence would result in recused on no-recursive sx lock. This is based off avg's original work, so credit to him. MFC after: 1 month	2014-04-30 17:46:29 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
Mark Johnston	dc0f030e51	Define the KM_NORMALPRI flag for kmem_alloc(), as it is used in some upstream DTrace code. It indicates that the kernel memory allocator need not attempt to satisfy non-blocking allocations in low-memory conditions. This has no direct equivalent in the malloc(9) flags, so it is just defined to 0 for now.	2014-02-22 05:13:35 +00:00
Alexander Motin	77e2eaf5b8	Fix build after r260234 by converting ddi_get_lbolt64() from inline into a macro. Otherwise compiler complains that hz variable used there either undefined or defined twice, thanks to header mess caused by compat shims.	2014-01-05 19:07:42 +00:00
Alexander Motin	99e2428636	Remove extra conversion to nanoseconds from ddi_get_lbolt64(). As result this uses one multiplication and shifts instead of one division and two multiplications.	2014-01-03 18:08:31 +00:00
Andriy Gapon	f77ffe1b22	zfs: add zfs_freebsd_putpages this should be more optimal than writing pages one-by-one via zfs_write -> update_pages in the case of multi-page putpages call MFC after: 16 days	2013-11-29 15:39:39 +00:00
Andriy Gapon	fdbcc95a47	zfs: make zfs_map_page / zfs_unmap_page public MFC after: 15 days	2013-11-29 15:33:40 +00:00
Andriy Gapon	02727275c5	opensolaris compat: add taskq_wait emulation MFC after: 10 days	2013-11-28 19:17:11 +00:00
Andriy Gapon	2a4704ab01	MFV r255255: 4045 zfs write throttle & i/o scheduler performance work illumos/illumos-gate@69962b5647 Please note the following changes: - zio_ioctl has lost its priority parameter and now TRIM is executed with 'now' priority - some knobs are gone and some new knobs are added; not all of them are exposed as tunables / sysctls yet MFC after: 10 days Sponsored by: HybridCluster [merge]	2013-11-26 09:57:14 +00:00
Andriy Gapon	34140e78ab	734 taskq_dispatch_prealloc() desired 943 zio_interrupt ends up calling taskq_dispatch with TQ_SLEEP illumos/illumos-gate@5aeb94743e Essentially FreeBSD taskqueues already operate in a mode that was added to Illumos with taskq_dispatch_ent change. We even exposed the superior FreeBSD interface as taskq_dispatch_safe. Now we just rename taskq_dispatch_safe to taskq_dispatch_ent and struct struct ostask to taskq_ent_t, so that code differences will be minimal. After this change sys/cddl/compat/opensolaris/sys/taskq.h header is no longer needed. Note that this commit is not an MFV because the upstream change was not individually committed to the vendor area. MFC after: 8 days	2013-11-26 09:26:18 +00:00
Andriy Gapon	2dbdedbc46	opensolaris taskq: some cosmetic changes - drop trailing whitespace - remove redundant "extern" from function declarations - remove unused macro MFC after: 1 week	2013-11-26 09:10:01 +00:00
Andriy Gapon	a776a1c1c5	sdt: add support for solaris/illumos style DTRACE_PROBE macros The new macros are implemented in terms of SDT_PROBE_DEFINE and SDT_PROBE. Probes defined in this way will appear under SDT provider named "sdt". Parameter types are exposed via SDT_PROBE_ARGTYPE. This is something that illumos does not have by default. This kind of SDT probes is already present in ZFS code, so those probes will now be available if KDTRACE_HOOKS options is enabled. A potential future illumos compatibility enhancement is to encode a provider name as a prefix in a probe name. Reviewed by: markj MFC after: 3 weeks X-MFC after: r258622	2013-11-26 08:49:53 +00:00
Xin LI	e8de677c74	MFV r247844 (illumos-gate 13975:ef6409bc370f) Illumos ZFS issues: 3582 zfs_delay() should support a variable resolution 3584 DTrace sdt probes for ZFS txg states Provide a compatibility shim for Solaris's cv_timedwait_hires to help aid future porting. Approved by: re (ZFS blanket)	2013-09-10 01:46:47 +00:00
Pawel Jakub Dawidek	ab568de789	Handle cases where capability rights are not provided. Reported by: kib	2013-09-05 11:58:12 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
Jeff Roberson	5df87b21d3	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
Andriy Gapon	c319ea15f4	opensolaris code: translate INVARIANTS to DEBUG and ZFS_DEBUG Do this by forcing inclusion of sys/cddl/compat/opensolaris/sys/debug_compat.h via -include option into all source files from OpenSolaris. Note that this -include option must always be after -include opt_global.h. Additionally, remove forced definition of DEBUG for some modules and fix their build without DEBUG. Also, meaning of DEBUG was overloaded to enable WITNESS support for some OpenSolaris (primarily ZFS) locks. Now this overloading is removed and that use of DEBUG is replaced with a new option OPENSOLARIS_WITNESS. MFC after: 17 days	2013-08-06 15:51:56 +00:00
Steven Hartland	3666c4917b	style(9) fixes MFC after: 2 days	2013-06-29 23:39:38 +00:00
Steven Hartland	43e695497c	Switch ZFS mutex_owner macro to use sx_xholder as its now exported via sx.h MFC after: 1 week	2013-06-21 15:55:03 +00:00
Martin Matuska	f1b5c26470	MFV r248217: Merge change from vendor to reduce diff only. ZFS dtrace probes are not supported on FreeBSD yet. Illumos ZFS issues: 3598 want to dtrace when errors are generated in zfs MFC after: 3 weeks	2013-04-06 10:39:38 +00:00
Steven Hartland	89e5b43079	TRIM cache devices based on time instead of TXGs. Currently, the trim module uses the same algorithm for data and cache devices when deciding to issue TRIM requests, based on how far in the past the TXG is. Unfortunately, this is not ideal for cache devices, because the L2ARC doesn't use the concept of TXGs at all. In fact, when using a pool for reading only, the L2ARC is written but the TXG counter doesn't increase, and so no new TRIM requests are issued to the cache device. This patch fixes the issue by using time instead of the TXG number as the criteria for trimming on cache devices. The basic delay principle stays the same, but parameters are expressed in seconds instead of TXGs. The new parameters are named trim_l2arc_limit and trim_l2arc_batch, and both default to 30 second. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `17122c31ac` MFC after: 2 weeks	2013-03-21 10:29:05 +00:00
Martin Matuska	520268fb97	MFC @248493	2013-03-19 11:09:15 +00:00
John Baldwin	3cf3b9f097	Partially revert r195702. Deferring stops is now implemented via a set of calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls. - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly. - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.	2013-03-18 17:23:58 +00:00
Martin Matuska	a03fbc7ecf	MFC @248093	2013-03-09 11:57:51 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Attilio Rao	c934116100	Merge from vmc-playground: Introduce a new KPI that verifies if the page cache is empty for a specified vm_object. This KPI does not make assumptions about the locking in order to be used also for building assertions at init and destroy time. It is mostly used to hide implementation details of the page cache. Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: alc (vm_radix based version) Tested by: flo, pho, jhb, davide	2013-03-09 02:05:29 +00:00
Martin Matuska	dce1a726f2	WiP merge of libzfs_core (MFV r238590, r238592) not yet working, ioctl handling needs to be changed	2013-03-05 08:09:53 +00:00
Rui Paulo	e0dffa2de2	Remove the extra parenthesis from the cv_init() macro. They are not necessary because we already use parenthesis in zfs_cv_init(). This fixes a long standing bug where there would be an extra ")" at the end of the string. This extra parenthesis would show up in the WCHAN of the process (top, stty status, etc.).	2013-03-03 06:42:36 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
Martin Matuska	e70664bafc	MFV v242732: Merge the ZFS I/O deadman thread from vendor (illumos). This feature panics the system on hanging ZFS I/O, helps debugging and resumes failed service. The panic behavior can be controlled with the loader-only tunables: vfs.zfs.deadman_enabled (enable or disable panic on stalled ZFS I/O) vfs.zfs.deadman_synctime (expiration time for stalled ZFS I/O) By default, ZFS I/O deadman is enabled by default on amd64 and i386 excluding virtual guest machines. Illumos ZFS issues: 3246 ZFS I/O deadman thread References: https://www.illumos.org/issues/3246 MFC after: 2 weeks	2013-02-25 12:33:31 +00:00
Xin LI	ef17620fc8	MFV r245512: * Illumos zfs issue #3035 [1] LZ4 compression support in ZFS. LZ4 is a new high-speed BSD-licensed compression algorithm created by Yann Collet that delivers very high compression and decompression performance compared to lzjb (>50% faster on compression, >80% faster on decompression and around 3x faster on compression of incompressible data), while giving better compression ratio [1]. This version of LZ4 corresponds to upstream's [2] revision 85. Please note that for obvious reasons this is not backward read compatible. This means once a pool have LZ4 compressed data, these data can no longer be read by older ZFS implementations. Local changes: - On-stack hash table disabled and using kernel slab allocator instead, at this time. This requires larger kernel thread stack for zio workers. This may change in the future should we adjusted the zio workers' thread stack size. - likely and unlikely will be undefined if they are already defined, this is required for i386 XEN build. - Removed De Bruijn sequence based __builtin_ctz family of builtins in favor of the latter. Both GCC and clang supports these builtins. - Changed the way the LZ4 code detects endianness. - Manual pages modifications to mention the feature based on Illumos counterpart. - Boot loader changes to make it support LZ4 decompression. [1] https://www.illumos.org/issues/3035 [2] http://code.google.com/p/lz4/source/list Obtained from: Illumos (13921:9d721847e469) Tested on: FreeBSD/amd64 MFC after: 1 month	2013-02-09 06:39:28 +00:00
Andriy Gapon	5583e07188	solaris compat: remove KM_ZERO - there is no such flag in Solaris and derivatives - the flag was added in an unrelated change - the flag is not used The proper way to allocate zeroed out memory is to use kmem_zalloc. MFC after: 3 days	2013-02-02 11:41:05 +00:00
Steven Hartland	7150222c0a	Renamed zfs trim stats removing duplicate zio_trim identifier from the name Added description option to kstats. Added descriptions for zio_trim kstats PR: kern/173113 Submitted by: Steven Hartland Reviewed by: pjd Approved by: pjd MFC after: 2 weeks	2012-12-12 16:14:14 +00:00
Andriy Gapon	59e407dfbf	opensolaris compat: terminate cmn_err mesages with a new line MFC after: 6 days	2012-11-24 13:10:36 +00:00
Andriy Gapon	f8abf4a1e4	opensolaris compat: clear VI_MOUNT before returning if mount_snapshot fails To do: investigate if it would be possible to use normal vfs_domount here. Reviewed by: kib MFC after: 19 days	2012-11-04 14:27:31 +00:00
Andriy Gapon	8d041ea733	opensolaris_lookup: use vfs_busy in traverse before calling VFS_ROOT ... to ensure that we have a valid mountpoint during the call. Reviewed by: kib MFC after: 19 days	2012-11-04 14:16:18 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Pawel Jakub Dawidek	bcb77be2b7	Add TRIM support. The code builds a map of regions that were freed. On every write the code consults the map and eventually removes ranges that were freed before, but are now overwritten. Freed blocks are not TRIMed immediately. There is a tunable that defines how many txg we should wait with TRIMming freed blocks (64 by default). There is a low priority thread that TRIMs ranges when the time comes. During TRIM we keep in-flight ranges on a list to detect colliding writes - we have to delay writes that collide with in-flight TRIMs in case something will be reordered and write will reached the disk before the TRIM. We don't have to do the same for in-flight writes, as colliding writes just remove ranges to TRIM. Sponsored by: multiplay.co.uk This work includes some important fixes and some improvements obtained from the zfsonlinux project, including TRIMming entire vdevs on pool create/add/attach and on pool import for spare and cache vdevs. Obtained from: zfsonlinux Submitted by: Etienne Dechamps <etienne.dechamps@ovh.net>	2012-09-23 19:40:58 +00:00
Martin Matuska	4c5238d576	Merge recent zfs vendor changes, sync code and adjust userland DEBUG. Illumos issued covered: 1884 Empty "used" field for zfs *space commands 3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument is zero 3028 zfs {group,user}space -n prints (null) instead of numeric GID/UID 3048 zfs {user,group}space [-s\|-S] is broken 3049 zfs {user,group}space -t doesn't really filter the results 3060 zfs {user,group}space -H output isn't tab-delimited 3061 zfs {user,group}space -o doesn't use specified fields order 3064 usr/src/cmd/zpool/zpool_main.c misspells "successful" 3093 zfs {user,group}space's -i is noop 3098 zfs userspace/groupspace fail without saying why when run as non-root References: https://www.illumos.org/issues/ + [issue_id] Obtained from: illumos (vendor/illumos, vendor/illumos-sys) MFC after: 2 weeks	2012-09-12 18:05:43 +00:00

1 2 3 4

174 Commits