freebsd-skq

Author	SHA1	Message	Date
br	e45cf9cd75	Add a central location for exclusion checks. We check here if function is excluded from FBT instrumentation. Reviewed by: andrew, emaste, markj Differential Revision: https://reviews.freebsd.org/D2899	2015-07-01 14:09:59 +00:00
avg	2aa336795a	MFV r284412: 5911 ZFS "hangs" while deleting file Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Reviewed by: Alek Pinchuk <alek@nexenta.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@46e1baa6cf https://www.illumos.org/issues/5911 Sometimes ZFS appears to hang while deleting a file. It is actually making slow progress at the file deletion, but other operations (administrative and writes via the data path) "hang" until the file removal completes, which can take a long time if the file has many blocks. The deletion (or most of it) happens in a single txg, and the sync thread spends most of its time reading indirect blocks via this stack trace: swtch+0x141() cv_wait+0x70() zio_wait+0x5b() dbuf_read+0x2c0() free_children+0x50() free_children+0x12a() free_children+0x12a() free_children+0x12a() dnode_sync_free_range_impl+0xdf() dnode_sync_free_range+0x52() range_tree_vacate+0x65() dnode_sync+0x1d8() dmu_objset_sync_dnodes+0x77() dmu_objset_sync+0x19f() dsl_dataset_sync+0x51() dsl_pool_sync+0x9a() spa_sync+0x2ff() txg_sync_thread+0x21f() thread_start+8() One way to reproduce the problem is if we are over the arc_meta_limit, e.g. because lots of indirect blocks are pinned because we have L0 dbufs under them. It could be that most of the L1 indirects are cached, in which case when dmu_free_long_range_impl() calls dmu_tx_hold_free(), it will complete very quickly. This allows dmu_free_long_range_impl() to put many (perhaps all of its) transactions in the same TXG. However, dmu_free_long_range_impl() calls dnode_evict_dbufs (and dnode_free_range()), which removes the L0 dbufs, thus reducing the hold count on the L1 indirect blocks above it, allowing them to be evicted. Because we are over the arc_meta_limit(), these L1 blocks will be evicted ASAP. Thus when we get to syncing context, the L1 indirects are no longer cached and must be read in. Obtained from: illumos MFC after: 15 days	2015-06-19 06:58:05 +00:00
avg	760c76460f	illumos compat: use flsl/flsll for highbit/highbit64 Do that only when fast inline versions are available. At the moment that can be the case only in the kernel and not for all platforms. The original code uses the binary search and that's kept as a fallback. This is a micro optimization. Differential Revision: https://reviews.freebsd.org/D2839 Reviewed by: delphij, mahrens, mav MFC after: 17 days	2015-06-19 06:41:53 +00:00
glebius	5b81a20433	o Un-inline vm_pager_get_pages(), vm_pager_get_pages_async(). o Provide an extensive set of assertions for input array of pages. o Remove now duplicate assertions from different pagers. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-06-17 22:44:27 +00:00
avg	cd7e51a58a	Revert r284511 because it caused build failures on many platforms The problem is that when inline versions of flsl and flsll are not available, then libkern.h must be included for their declarations in kernel sources. The fix would be trivial, but I would like to figure out first if it even makes sense to use the libkern provided implementations. Reported by: bz Pointyhat to: avg	2015-06-17 17:16:06 +00:00
avg	36c82a1ae4	l2arc: pass correct size to trim requests b_size is a logical size of a buffer in memory, b_asize is its physical size that accounts for possible compression. Currently the latter is the best approximation for the allocated, on-disk size. L2ARC TRIM support was committed a few weeks before L2ARC compression was imported, so originally the code was correct, because b_size was the size. Further thoughts. Given that the cache device is being overwritten in a circular fashion it is not clear if a TRIM per each evicted L2ARC buffer has any benefits. Maybe it would be sufficient to issue a single trim request for the whole device when it is loaded, e.g. after a bootup, or when it is unloaded, e.g. before a shutdown. At least as long as L2ARC is not persistent across reboots. Discussed with: smh MFC after: 19 days	2015-06-17 12:28:13 +00:00
avg	1574191a79	illumos compat: use flsl/flsll for highbit/highbit64 This is a micro optimization. The upstream code uses the binary search. Differential Revision: https://reviews.freebsd.org/D2839 Reviewed by: delphij, mav MFC after: 15 days	2015-06-17 12:05:04 +00:00
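The trade-off behind the highbit/highbit64 change can be sketched in plain C. This is only an illustration, not the committed code: the function names are hypothetical, and the GCC/Clang builtin `__builtin_clzll` stands in for a fast inline `flsll`. Both functions return the 1-based index of the highest set bit, or 0 when no bit is set.

```c
#include <stdint.h>

/*
 * Binary-search variant: narrow down the position of the highest set bit
 * by halving the candidate range at each step.
 * Returns a 1-based bit index, or 0 if no bit is set.
 */
static int
highbit64_search(uint64_t i)
{
	int h = 1;

	if (i == 0)
		return (0);
	if (i & 0xffffffff00000000ULL) {
		h += 32;
		i >>= 32;
	}
	if (i & 0xffff0000) {
		h += 16;
		i >>= 16;
	}
	if (i & 0xff00) {
		h += 8;
		i >>= 8;
	}
	if (i & 0xf0) {
		h += 4;
		i >>= 4;
	}
	if (i & 0xc) {
		h += 2;
		i >>= 2;
	}
	if (i & 0x2)
		h += 1;
	return (h);
}

/*
 * "Find last set" variant. Assumption: __builtin_clzll (GCC/Clang) is used
 * here as a stand-in for an inline flsll(); it is undefined for 0, so that
 * case is handled explicitly.
 */
static int
highbit64_fls(uint64_t i)
{
	return (i == 0 ? 0 : 64 - __builtin_clzll(i));
}
```

The two variants agree on every input; the second one typically compiles down to a single bit-scan instruction, which is why it only pays off where a fast inline version exists.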
avg	635065f17b	MFV r284036: 5961 Fix stack overflow in zfs_create_fs illumos/illumos-gate@c701fde691 Author: glebius MFC after: 11 days	2015-06-12 11:10:49 +00:00
avg	35511df052	MFV r284030: 5818 zfs {ref}compressratio is incorrect with 4k sector size illumos/illumos-gate@81cd5c555f Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 17 days	2015-06-12 10:57:05 +00:00
avg	34d03f1f2c	MFV r283534: 5515 dataset user hold doesn't reject empty tags illumos/illumos-gate@752fd8dabc Author: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> MFC after: 10 days	2015-06-12 10:52:53 +00:00
avg	aced9fcb65	MFV r284040: check that datasets are snapshots 5946 zfs_ioc_space_snaps must check that firstsnap and lastsnap refer to snapshots 5945 zfs_ioc_send_space must ensure that fromsnap refers to a snapshot Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gordon.ross@nexenta.com> illumos/illumos-gate@24218bebb4 Note that the upstream commit is modified during MFV: in the upstream the check is done by inspecting ds_is_snapshot field while in FreeBSD we call dsl_dataset_is_snapshot(). This is because illumos/illumos-gate@bc9014e6a8 (r277428 in vendor-sys/illumos) is not MFV-ed yet. MFC after: 10 days	2015-06-12 10:41:24 +00:00
br	ab3ad78145	Don't re-define LOCORE when dtrace is built-in to the kernel.	2015-06-10 09:59:26 +00:00
avg	630db52ab7	compat nvpair.h: make sure that the names are mangled only for kernel Currently there is no good reason to mangle the userland API. The change was introduced in `eac1d566b4`, r279437. Also see https://reviews.freebsd.org/D1881. I am still convinced that nv should not have introduced intentionally conflicting API. Discussed with: rstone X-MFC with: r279437 Sponsored by: ClusterHQ	2015-06-07 08:54:25 +00:00
kib	ae73c05fd5	Add missed {}. Noted by: Morten Rodal <morten@rodal.no> MFC after: 2 weeks	2015-05-27 19:28:14 +00:00
kib	d77dbf3761	Right now, dounmount() is called with unreferenced mount point. Nothing stops a parallel unmount to succeed before the given call to dounmount() checks and locks the covered vnode. Prevent dounmount() from acting on the freed (although type-stable) memory by changing the interface to require the mount point to be referenced. dounmount() consumes the reference on return, regardless of the successful or erroneous result. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:22:50 +00:00
avg	2c52296aaa	zfs: fixes for a full stream received into an existing dataset - this should fail early unless the force flag is set - if the force flag is set then any local modifications including snapshots should be undone See: https://www.illumos.org/issues/5912 See: https://reviews.csiden.org/r/220/ Reviewed by: mahrens, Paul Dagnelie <pcd@delphix.com> MFC after: 15 days Sponsored by: ClusterHQ	2015-05-25 11:56:57 +00:00
avg	511007870d	dsl_dataset_promote_check: ensure that shared snaps do not become too long ... after they are transferred from the old origin to the new one. See: https://www.illumos.org/issues/5909 See: https://reviews.csiden.org/r/219/ Reviewed by: mahrens MFC after: 10 days Sponsored by: ClusterHQ	2015-05-25 11:48:15 +00:00
kib	e1f68e3cfb	Remove excess Giant acquisition around the dounmount() call. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-25 09:08:19 +00:00
markj	6928f32018	Remove unused references to calltrap. MFC after: 3 days	2015-05-25 01:22:56 +00:00
jkim	318c4f97e6	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
smh	6394c7af86	Add copyright info missing from r282205 Add the copyright info missing from ZoL origin version. MFC after: 2 days Sponsored by: Multiplay	2015-05-14 08:13:01 +00:00
avg	9f5ffddd44	zfs ioctls: use fget_write / fget_read instead of getf wrapper for fget This allows to ensure that we do not write to a file that was opened for reading only or vice versa. Also, use the correct capability in zfs_ioc_send_new(). Differential Revision: https://reviews.freebsd.org/D2382 Reviewed by: delphij MFC after: 17 days Sponsored by: ClusterHQ	2015-05-11 10:07:31 +00:00
markj	6ede9cbbef	Remove some commented-out upstream code for handling traps from usermode DTrace probes. This handling is already done in trap() on i386 and amd64.	2015-05-10 22:27:48 +00:00
jhibbits	c50204f333	Fix a couple bugs in 64-bit powerpc fasttrap argument retrieval. Found by code inspection.	2015-05-10 04:33:01 +00:00
avg	b15170de41	MFV r282630: 5809 Blowaway full receive in v1 pool causes kernel panic MFC after: 5 days	2015-05-08 14:03:14 +00:00
avg	897e19b7f1	zfs: do not hold an extra reference on a root vnode while a filesystem is mounted At present zfs_domount() acquires a reference on the filesystem's root vnode and that reference is kept until zfs_umount. The latter calls vflush(rootrefs = 1) to dispose of the extra reference. There is no explanation of why that reference is kept - what problem it solves or what behavior it improves. Also, that logic is FreeBSD specific. There is one real problem with that reference, though. zfs recv -F may receive a full, non-incremental stream to a mounted filesystem. In that case the received root object is likely to have a different z_gen attribute value. Because of that, zfs_rezget will leave the previous root znode and vnode disassociated from the actual object (z_sa_hdl == NULL). Thus, future calls to VFS_ROOT() -> zfs_root() will produce a new vnode-znode pair, while the old one will be kept alive by the outstanding reference. So, the outstanding reference will not actually be for the new root vnode (or, more precisely, vnodes - because a root vnode may be recycled and a newer one can be created). As a result, when vflush(rootrefs = 1) is called there will be two problems: - a leaked reference on the old root vnode preventing a graceful unmount - insufficient references on the actual root vnode leading to a crash upon access to the vnode after it is destroyed by vgone() + vdrop() The second issue will actually override the first one. Differential Revision: https://reviews.freebsd.org/D2353 Reviewed by: delphij, kib, smh MFC after: 17 days	2015-05-05 11:01:06 +00:00
avg	509c82b2c2	dmu_recv_end_check: don't leak hold if dsl_destroy_snapshot_check_impl fails The leak may happen if !drc_newfs && drc_force and there is an error iterating through snapshots or any of snapshot checks fails. See https://www.illumos.org/issues/5870 See https://reviews.csiden.org/r/206/ Reviewed by: mahrens (as mahrens@delphix.com) MFC after: 15 days Sponsored by: ClusterHQ	2015-05-05 10:56:16 +00:00
smh	a5d3a9fc06	Fix misuse of input argument in traverse_visitbp In traverse_visitbp(), the input argument dnp is modified in the middle to point to a temporary buffer. Originally this doesn't matter, because no user of TRAVERSE_POST dereferences it. However, in fbeddd6 a piece of code is added dereferencing dnp after the modification, creating a possible bug. We fix this by creating a new local variable cdnp for the DMU_OT_DNODE case, so we don't modify the input argument. Also we introduce different local variables in the DMU_OT_OBJSET case to prevent confusion between the input argument. Obtained from: zfsonlinux (a585f2f844ed3d4270221fed88f5e494eb55d932) MFC after: 2 weeks Sponsored by: Multiplay	2015-04-28 22:46:58 +00:00
avg	70befe4134	replace a comment about zfs recv -F corner case with a longer, more detailed one The old comment in zfs_rezget explains what situation the code handles, the new comment also describes how the situation can arise. Also, re-join a line that became sufficiently short some time ago. Differential Revision: https://reviews.freebsd.org/D2352 Reviewed by: delphij, smh MFC after: 12 days	2015-04-28 09:19:40 +00:00
avg	879cb055bd	zfs_onexit_fd_hold: return EBADF even if devfs_get_cdevpriv gave ENOENT /dev/zfs always has per-open data, so when it is missing the file descriptor is for some other file. Returning ENOENT in this case is confusing as a variety of other conditions (like a missing dataset) may result in the same error. It's better to consistently return EBADF for any problems with the file descriptor. Note that zfs_onexit_fd_hold() is used with 'automatic cleanup fd' - when that fd is closed, typically because a process is terminated, some cleanup action is taken by ZFS driver. E.g. a temporary snapshot hold is released. Perhaps, it would even be worthwhile changing devfs_get_cdevpriv() to return EBADF if there is no associated data. Differential Revision: https://reviews.freebsd.org/D2370 Reviewed by: delphij, smh MFC after: 12 days	2015-04-28 09:11:47 +00:00
avg	65773effb9	dsl_dir_rename_check: return EXDEV on cross-pool rename attempt Obtained from: zfsonlinux/zfs@9063f65476 Obtained from: Boris Protopopov <boris.protopopov@actifio.com> MFC after: 10 days	2015-04-28 08:04:16 +00:00
avg	293d1ac7ce	MFV r282123: 5610 zfs clone from different source and target pools produces coredump MFC after: 10 days	2015-04-28 07:42:28 +00:00
avg	7d7084af66	MFV r282124: 5393 spurious failures from dsl_dataset_hold_obj() The actual bugfix was pro-actively committed in r275515. This MFV is cosmetic, it just aligns code style with the upstream. MFC after: 10 days	2015-04-28 07:37:38 +00:00
avg	7112bf7b61	nvpair_type_is_array: DATA_TYPE_INT8_ARRAY was not recognized To do: upstream (https://www.illumos.org/issues/5778) MFC after: 10 days	2015-04-28 06:34:55 +00:00
rwatson	ed8c3c1f90	Adjust PROF_ARTIFICIAL_FRAMES in the DTrace profile provider on ARM to skip 10, rather than 9, frames. This appears to work quite well in practice on the BeagleBone Black, so remove a comment about the value being bogus and replace it with a slightly less negative one. However, the number of frames to skip is quite sensitive to details of the timer and interrupt handling paths, so this is necessarily fragile -- but no more so than on x86. Sponsored by: DARPA, AFRL	2015-04-25 15:43:12 +00:00
markj	9385b9197d	Fix DTrace's panic() action. It would previously call into some unfinished Solaris compatibility code and return without actually calling panic(9). The compatibility code is unneeded, however, so just remove it and have dtrace_panic() call vpanic(9) directly. Differential Revision: https://reviews.freebsd.org/D2349 Reviewed by: avg MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division	2015-04-24 03:19:30 +00:00
delphij	9420bc6c71	Remove vfs.zfs.snapshot_list_prefetch, the corresponding code was gone in r248571 already. MFC after: 1 week	2015-04-17 21:21:11 +00:00
markj	1d6ffde4f4	libdtrace: add support for lazyload mode. Passing "-x lazyload" to dtrace -G during compilation causes dtrace(1) to not link drti.o into the output object file, so the USDT probes are not created during process startup. Instead, dtrace(1) will automatically discover and create probes on the process' behalf when attaching. Differential Revision: https://reviews.freebsd.org/D2203 Reviewed by: rpaulo MFC after: 1 month	2015-04-08 02:36:37 +00:00
mav	e5f58186a5	Add DTrace probe to the new ARC reclaim cause added in r281026. MFC after: 1 month	2015-04-05 14:45:52 +00:00
mav	326429ebdd	Make ZFS ARC track both KVA usage and fragmentation. Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage reaches certain threshold (3/4 on i386 or 16/17 otherwise). FreeBSD has even less KVA, but had no such limit on archs with direct map as amd64. As result, on machines with a lot of RAM, during load with very small user- space memory pressure, such as `zfs send`, it was possible to reach state, when there is enough both physical RAM and KVA (I've seen up to 25-30%), but no continuous KVA range to allocate even single 128KB I/O request. Address this situation from two sides: - restore KVA usage limitations in a way the most close to Illumos; - introduce new requirement for KVA fragmentation, specifying that we should have at least one sequential KVA range of zfs_max_recordsize bytes. Experiments show that first limitation done alone is not sufficient. On machine with 64GB of RAM it is sometimes needed to drop up to half of ARC size to get at least one 1MB KVA chunk. Statically limiting ARC to half of KVA/RAM is too strict, so second limitation makes it to work in cycles: accumulate trash up to certain critical mass, do massive spring-cleaning, and then start littering again. :) MFC after: 1 month	2015-04-03 14:45:48 +00:00
andrew	3046fbea8a	Add the arm64 defines for cddl code. Differential Revision: https://reviews.freebsd.org/D2186 Reviewed by: emaste Sponsored by: The FreeBSD Foundation	2015-04-01 08:31:56 +00:00
markj	acaed6413c	Import a missing piece of commit b8fac8e162eda7e98d from illumos-gate. This adds an upper bound, dtrace_ustackdepth_max, to the number of frames traversed when computing the userland stack depth. Some programs - notably firefox - are otherwise able to trigger an infinite loop in dtrace_getustack_common(), causing a panic. MFC after: 1 week	2015-03-30 03:55:51 +00:00
mav	4a52d77cdd	Some cosmetic polishing. No functional change. MFC after: 1 week	2015-03-29 20:28:18 +00:00
markj	d854427d60	Remove unused upstream DTrace provider implementations that are duplicates of providers under sys/cddl/dev/. Also remove sdt_subr.c, which isn't used in FreeBSD's SDT implementation. Suggested by: rwatson	2015-03-16 01:15:08 +00:00
rwatson	9f48720b49	Now that DTrace stack traces handle exception frames better, skip fewer stack frames for FBT 'entry' probes on ARM. MFC after: 3 days Sponsored by: DARPA, AFRL	2015-03-15 15:19:02 +00:00
rwatson	26e3e9bb99	On ARM, unlike some other architectures, saved $pc values from in-kernel traps do appear in the regular call stack, rather than only in a special trap frame, so we don't need to inject the trap-frame $pc into a returned stack trace in DTrace. MFC after: 3 days Sponsored by: DARPA, AFRL	2015-03-15 15:17:34 +00:00
rwatson	ec1dafc898	Replace the completely arbitrary '3' with '9' for the number of frames to skip using the DTrace 'profile' provider on ARM. This causes stack traces to skip various driver-and callout-related things as they do on x86, where the likewise arbitrary values are '6' (32-bit) and '10' (64-bit) for similar sorts of reasons. MFC after: 3 days Sponsored by: DARPA, AFRL	2015-03-15 14:12:40 +00:00
smh	bc1c82b63e	Allow zvol_geom_worker to process BIO_DELETE's If zvol_geom_start is called with a BIO_DELETE from a thread which can sleep it queues it for later processing by the zvol_geom_worker. The zvol_geom_worker didn't have a delete case so would simply lose the bio hence preventing the original caller from ever completing. In addition any other unknown types would suffer the same fate. Allow zvol_geom_worker to process BIO_DELETE's via zvol_strategy and return unsupported for all unknown bio types. MFC after: 2 weeks Sponsored by: Multiplay	2015-03-14 17:35:04 +00:00
mav	397e76aa8d	Make DIOCGATTR in device mode handle "GEOM::candelete". MFC after: 3 days	2015-03-12 16:19:18 +00:00
gnn	c531c598ba	Add support for walltimestamp to DTrace on ARM.	2015-03-07 04:38:25 +00:00
andrew	43cb9d5cd8	dtrace_cas32 and dtrace_casptr should return the data loaded from target not the new value. Sponsored by: ABT Systems Ltd	2015-03-05 18:03:42 +00:00
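The compare-and-swap contract described here (return what was loaded from the target, whether or not the swap happened) can be sketched with C11 atomics. This is an illustration only; `cas32_return_old` is a hypothetical name and the code is not the ARM assembly used by DTrace.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Compare-and-swap that reports the value observed in *target.
 * If *target == cmp, newval is stored and cmp (the old value) is returned.
 * Otherwise nothing is stored and the current contents are returned.
 */
static uint32_t
cas32_return_old(_Atomic uint32_t *target, uint32_t cmp, uint32_t newval)
{
	uint32_t observed = cmp;

	/*
	 * On failure, atomic_compare_exchange_strong() writes the value it
	 * found in *target into 'observed'; on success 'observed' still
	 * equals cmp, which is exactly the value that was loaded.
	 */
	atomic_compare_exchange_strong(target, &observed, newval);
	return (observed);
}
```

A caller decides whether the swap succeeded by comparing the return value with `cmp`, which is different from an interface that returns a success flag.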
andrew	ff8a1038b7	Add the MD parts of dtrace needed to use fbt on ARM. For this we need to emulate the instructions used in function entry and exit. For function entry ARM will use a push instruction to push up to 16 registers to the stack. While we don't expect all 16 to be used we need to handle any combination the compiler may generate, even if it doesn't make sense (e.g. pushing the program counter). On function return we will either have a pop or branch instruction. The former is similar to the push instruction, but with care to make sure we update the stack pointer and program counter correctly in the cases they are either in the list of registers or not. For branch we need to take the 24-bit offset, sign-extend it, and add that number of 4-byte words to the program counter. Care needs to be taken as, due to historical reasons, the address the branch is relative to is not the current instruction, but 8 bytes later. This allows us to use the following probes on ARM boards: dtrace -n 'fbt::malloc:entry { stack() }' and dtrace -n 'fbt::free:return { stack() }' Differential Revision: https://reviews.freebsd.org/D2007 Reviewed by: gnn, rpaulo Sponsored by: ABT Systems Ltd	2015-03-05 17:55:31 +00:00
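The branch emulation described in this message (24-bit offset, sign extension, word scaling, and the 8-byte bias) can be written out as a small C sketch. `arm_branch_target` is a hypothetical helper, not the function from the commit, and it ignores condition codes and the link bit.

```c
#include <stdint.h>

/*
 * Compute the destination of an ARM (A32) B/BL instruction.
 *   pc   - address of the branch instruction itself
 *   insn - the 32-bit instruction word
 */
static uint32_t
arm_branch_target(uint32_t pc, uint32_t insn)
{
	/* The low 24 bits hold a signed offset counted in 4-byte words. */
	int32_t offset = (int32_t)(insn & 0x00ffffff);

	/* Sign-extend from 24 bits to 32 bits. */
	if (offset & 0x00800000)
		offset -= 0x01000000;

	/*
	 * The offset is relative to the branch address plus 8, not to the
	 * branch instruction itself. Unsigned arithmetic wraps as intended.
	 */
	return (pc + 8 + (uint32_t)(offset * 4));
}
```

For example, the instruction word `0xeafffffe` encodes an offset of -2 words, so it branches back to its own address.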
nwhitehorn	048b34a391	Fix build after unifying DAR/DEAR storage in trap frame.	2015-03-05 17:02:22 +00:00
rwatson	d81f712ba9	Don't allow DTrace's FBT on ARM to instrument undefinedinstruction(), as this would lead to DTrace reentrance. Sponsored by: DARPA, AFRL	2015-03-05 07:40:41 +00:00
andrew	f4b588d2fc	Fix the dtrace ARM atomic compare-and-set functions. These functions are expected to return the data in the memory location pointed at by target after the operation. The FreeBSD atomic functions previously used return either 0 or 1 to indicate if the comparison succeeded or not respectively. With this change only ARMv6 and later are supported by these functions. Sponsored by: ABT Systems Ltd	2015-03-01 10:04:14 +00:00
rstone	eac1d566b4	Allow Illumos code to co-exist with nv(9) Differential Revision: https://reviews.freebsd.org/D1881 Reviewed by: jfv, will Suggested by: pjd MFC after: 1 month Sponsored by: Sandvine Inc	2015-03-01 00:22:45 +00:00
andrew	c560f15fa8	Use the ARM unwinder with dtrace to extract the stack when asked. With this dtrace is able to display a stack trace similar to the one below. # dtrace -p 603 -n 'tcp:kernel::receive { stack(); }' 0 70 :receive kernel`ip_input+0x140 kernel`netisr_dispatch_src+0xb8 kernel`ether_demux+0x1c4 kernel`ether_nh_input+0x3a8 kernel`netisr_dispatch_src+0xb8 kernel`ether_input+0x60 kernel`cpsw_intr_rx+0xac kernel`intr_event_execute_handlers+0x128 kernel`ithread_loop+0xb4 kernel`fork_exit+0x84 kernel`swi_exit kernel`swi_exit Tested by: gnn Sponsored by: ABT Systems Ltd	2015-02-19 12:20:21 +00:00
gnn	909002dc4f	Clean up machine dependent code for DTrace on ARM. Submitted by: markj	2015-02-11 17:27:37 +00:00
gnn	b9be305241	Initial version of DTrace on ARM32. Submitted by: Howard Su based on work by Oleksandr Tymoshenko Reviewed by: ian, andrew, rpaulo, markj	2015-02-10 19:41:30 +00:00
markj	58755fa7f1	Fix a typo in r278137: make sure to free provider state. X-MFC-With: r278136	2015-02-08 03:55:12 +00:00
pfg	8156f15777	MFV r266995: 4767 dtrace_probe() always has the timestamp Reference: https://illumos.org/issues/4767 Obtained from: Illumos MFC after: 2 weeks	2015-02-03 20:06:30 +00:00
pfg	8afbe75ac6	MFV r266993: 4469 DTrace helper tracing should be dynamic Reference: https://illumos.org/issues/4469 Obtained from: Illumos Phabric: D1551 Reviewed by: markj MFC after: 2 weeks	2015-02-03 19:39:53 +00:00
markj	602435445d	Continue to handle the case where state is NULL, though this currently cannot happen on FreeBSD. r278136 overlooked the fact that a destructor registered with devfs_set_cdevpriv(9) is invoked even in the case of an error. X-MFC-With: r278136	2015-02-03 06:04:16 +00:00
markj	c436419aea	Diff reduction with illumos, in preparation for merging r266993 from the vendor branch. No functional change. MFC after: 1 week	2015-02-03 05:38:52 +00:00
smh	976cdd533d	Prevent inlining txg_quiesce This allows dtrace to monitor the calls to txg_quiesce which can be really helpful. Also standardise __noinline order for arc_kmem_reap_now. Sponsored by: Multiplay	2015-02-02 00:17:36 +00:00
markj	9b9b961a04	Don't attempt to disable enabled fasttrap probes in an exiting process. There's no need to do so, and we can't hold an exiting process, so this race can result in panics. MFC after: 1 week	2015-01-30 05:03:23 +00:00
markj	d127a2030e	In fasttrap_sigtrap(), use tdsendsignal() rather than tdksignal() to send SIGTRAP. The latter requires that its thread argument be non-NULL, but fasttrap_sigtrap() does not. PR: 193593 MFC after: 1 week Reported by: danilo	2015-01-30 04:51:59 +00:00
delphij	262c956ed5	MFV r255258: Diff reduction with upstream. The actual change was merged in r272483 already. MFC after: 2 weeks	2015-01-28 08:56:48 +00:00
will	9f59ab6868	When creating or updating a node, use vfs_timestamp() for "now" instead of gethrestime(), to allow the administrator to decide the appropriate timestamp precision instead of always using nanosecond precision.	2015-01-24 00:43:02 +00:00
will	ea98094037	Remove commented log messages.	2015-01-21 19:30:01 +00:00
will	dd05d56b15	Ignore sync requests from the system syncher, i.e. VFS_SYNC(waitfor=MNT_LAZY). ZFS already commits outstanding data every zfs_txg_timeout seconds, so these syncs are unnecessarily intrusive. Submitted by: gibbs Sponsored by: Spectra Logic MFSpectraBSD: `1105759` on 2014/12/11	2015-01-21 19:25:57 +00:00
will	d9678db8e5	Eliminate an #ifdef illumos for zfs_ioc_rename(). Since allow_mounted is a FreeBSD-specific change, default to B_TRUE, then locally check for the magic bit. Unconditionally check allow_mounted below. Convert the setting of allow_mounted to an explicit boolean. MFC after: 1 week Sponsored by: Spectra Logic MFSpectraBSD: 672578 (in part) on 2013/07/19	2015-01-21 19:20:36 +00:00
will	7cb7f93bfe	Add vfs.zfs.reference_tracking_enable sysctl/tunable. This is primarily for developer/debugging use; it enables built-in tagged tracking of refcounts inside ZFS. It can only be enabled from the loader, since it modifies how in-core state is managed. Default remains disabled. MFC after: 1 week Sponsored by: Spectra Logic	2015-01-21 17:03:11 +00:00
will	f9be6e3f25	Fix arc__shrink DTrace probe's to_free argument. Remove the unnecessary #ifdef _KERNEL, which did not differ in the true or false cases. Actually set the value of to_free before using it. MFC after: 1 week Sponsored by: Spectra Logic	2015-01-20 22:39:10 +00:00
will	5593793474	Use the "zfs_gfs" tag for GFS vnodes to make them easier to identify. MFC after: 1 week Sponsored by: Spectra Logic	2015-01-20 22:31:26 +00:00
will	3c6834bfdd	NSEC_TO_TICK(usec) -> NSEC_TO_TICK(nsec)	2015-01-20 22:29:27 +00:00
will	4bd09b7583	Remove unused strdup() #define.	2015-01-20 22:27:45 +00:00
mav	24b6750419	Allow skipping dmu_buf_will_dirty() call in dsl_dir_transfer_space(). dsl_dir_transfer_space() is mostly called after dsl_dir_diduse_space(), which already calls dmu_buf_will_dirty() for the same dbuf and tx, so its duplicate call in those cases will change nothing, only spend time. Skipping this call by four times reduces time spent in dbuf_write_done() and descendants, updating dataset statistics with several congested lock acquisitions. When rewriting 8K zvol blocks at 1GB/s rate, this reduces CPU time spent inside dbuf_write_done(), according to profiling, from 45% of 683K samples to 18% of 422K. MFC after: 2 weeks	2015-01-20 13:09:12 +00:00
smh	3d07512cea	Clean ZFS spa config before syncing A number of entries that can be present in the spa config shouldn't be saved to disk so add a method to ensure this is the case. Without this if the last caller to vdev_config_generate requested stats then they can end up in the cache file. Also only skip a non-writable pool in the cache file generation if it's active. This prevents unavailable pools incorrectly getting removed from cache file. Tested by: delphij MFC after: 2 weeks Sponsored by: Multiplay	2015-01-18 23:15:49 +00:00
smh	55c26d898b	Mechanically convert cddl sun #ifdef's to illumos Since the upstream for cddl code is now illumos not sun, mechanically convert all sun #ifdef's to illumos #ifdef's which have been used in all newer code for some time. Also do a manual pass to correct the use of #ifdef comments as per style(9) as well as few uses of #if defined(__FreeBSD__) vs #ifndef illumos. MFC after: 1 month Sponsored by: Multiplay	2015-01-17 14:44:59 +00:00
mav	ebab1754a3	Fix overflow bug from r248577, turning 30s TRIM timeout into ~4s. MFC after: 2 weeks	2015-01-14 16:22:00 +00:00
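The commit message does not say how the overflow arose. One arithmetic path that would produce exactly this symptom is a nanosecond timeout truncated to 32 bits, sketched below. The mechanism, the function names, and the `NANOSEC` constant are all assumptions made for illustration, not a description of the code in r248577.

```c
#include <stdint.h>

#define	NANOSEC	1000000000ULL	/* nanoseconds per second */

/* Intended behaviour: keep the product in 64 bits. */
static uint64_t
timeout_ns_wide(uint32_t seconds)
{
	return ((uint64_t)seconds * NANOSEC);
}

/*
 * Hypothetical buggy behaviour: the product is squeezed through a
 * 32-bit value, so only the result modulo 2^32 survives.
 */
static uint64_t
timeout_ns_truncated(uint32_t seconds)
{
	return ((uint32_t)((uint64_t)seconds * NANOSEC));
}
```

With 30 seconds, the wide version yields 30,000,000,000 ns, while the truncated one yields 30,000,000,000 mod 2^32 = 4,230,196,224 ns, or about 4.2 seconds.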
mav	7d12e026a4	Reimplement TRIM throttling added in r248577. Previous throttling implementation approached problem from the wrong side. It significantly limited useful delaying of TRIM requests and aggregation potential, while not so much controlled TRIM burstiness under heavy load. With this change random 4K write benchmarks (probably the worst case for TRIM) show me IOPS increase by 20%, average latency reduction by 30%, peak TRIM bursts reduction by 3 times and same peak TRIM map size (memory usage). Also the new logic does not force map size down so heavily, really allowing to keep deleted data for 32 TXG or 30 seconds under moderate load. It was practically impossible with old throttling logic, which pushed map down to only 64 segments. Reviewed by: smh MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2015-01-14 09:39:57 +00:00
mav	b53f0399f2	Skip extra bcopy() when scrubbing vdev without redundancy. According to profiler, this bcopy() can use about 10% of CPU time. MFC after: 2 weeks	2015-01-12 22:38:55 +00:00
mav	dbc5bca893	When aggregating TRIM segments, move the new one to the list end. New segment at the list head may block all TRIM requests until txg of that segment can be processed. On my random I/O tests this change reduce peak TRIM list length from 650 to 450 segments. Hopefully it should reduce TRIM burstiness when list processing is unblocked. MFC after: 2 weeks	2015-01-11 16:36:39 +00:00
mav	76f4c8c5ee	Add LBA as secondary sort key for synchronous I/O requests. On FreeBSD gethrtime() implemented via getnanouptime(), that has 1ms (1/hz) precision. It makes primary sort key (timestamp) collision very possible. In such situations sorting by secondary key of LBA is much more reasonable than by totally meaningless zio pointer value. With this change on multi-threaded synchronous ZVOL read I've measured 10% throughput increase and average latency reduction. MFC after: 2 weeks	2015-01-11 00:26:18 +00:00
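The ordering described here can be sketched as a three-level comparator. The `io_req` structure and `io_req_compare` function are hypothetical stand-ins for the real zio queue code; the pointer comparison is kept only as a final tie-break so that distinct requests never compare equal.

```c
#include <stdint.h>

/* Hypothetical request record: only the fields used for ordering. */
struct io_req {
	uint64_t timestamp;	/* coarse enqueue time, may collide */
	uint64_t offset;	/* on-disk offset (LBA) of the request */
};

/*
 * Order by timestamp first, then by offset, then by address.
 * Returns -1, 0 or 1 in the usual comparator convention.
 */
static int
io_req_compare(const void *a, const void *b)
{
	const struct io_req *x = a;
	const struct io_req *y = b;

	if (x->timestamp != y->timestamp)
		return (x->timestamp < y->timestamp ? -1 : 1);
	if (x->offset != y->offset)
		return (x->offset < y->offset ? -1 : 1);
	if ((uintptr_t)x != (uintptr_t)y)
		return ((uintptr_t)x < (uintptr_t)y ? -1 : 1);
	return (0);
}
```

When many requests share a 1 ms timestamp, the offset key issues them in ascending on-disk order instead of in an order determined by where each request happened to be allocated.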
mav	fd6827a197	Use new optimized dmu_read_uio_dbuf() for ZVOLs in device mode. This slightly reduces overhead by avoiding dnode_hold()/dnode_rele() calls. MFC after: 2 weeks	2015-01-10 18:28:58 +00:00
smh	6344c4aa26	Correct zpool list displaying invalid EXPANDSZ for unavailable pool vdevs When pools are unavailable their vdevs are also unavailable which means that vdev_max_asize remains at the default zero. This default was being used to calculate vs_esize resulting in a negative number as vdev_asize > vdev_max_asize, which caused zpool list -v to display 16.0E for EXPANDSZ of these vdevs.	2014-12-31 04:54:48 +00:00
markj	7ea63e4fb4	Restore the trap type argument to the DTrace trap hook, removed in r268600. It's redundant at the moment since it can be obtained from the trapframe on the architectures where DTrace is supported, but this won't be the case with ARM.	2014-12-23 15:38:19 +00:00
smh	325b63f583	Always sync the global ZFS config cache to reflect the new mosconfig This fixes out of date zpool.cache for root pools, which can cause issues such as confusion of zdb etc. MFC after: 1 month	2014-12-23 09:31:24 +00:00
smh	6be4fb3ccf	Fix panic when resizing ZFS zvols Resizing a ZFS ZVOL with debug enabled would result in a panic due to recursion on dp_config_rwlock. The upstream change "3464 zfs synctask code needs restructuring" changed zvol_set_volsize to avoid the recursion on dp_config_rwlock, but this was missed when originally merged in by r248571 due to significant differences in our codebases in this area. These changes also relied on bringing in changes from upstream: 3557 dumpvp_size is not updated correctly when a dump zvol's size is changed, which were also not present. In order to help prevent future issues in this area, a direct comparison and diff minimisation against the current upstream version (b515258) of zvol.c was done. Differential Revision: https://reviews.freebsd.org/D1302 MFC after: 1 month X-MFC-With: r276063 & r276066 Sponsored by: Multiplay	2014-12-22 18:39:38 +00:00
smh	9ab3bd130f	Refactor zvol locking to minimise diff with upstream Use #define zfsdev_state_lock spa_namespace_lock instead of replacing all zfsdev_state_lock with spa_namespace_lock to minimise changes from upstream. Differential Revision: D1302 MFC after: 1 month X-MFC-With r276063 Sponsored by: Multiplay	2014-12-22 17:04:51 +00:00
smh	d9885d34c5	Standardise on illumos for #ifdef's in zvol.c Also correct as per style(9) on the use of #ifdef comments. This is a no-op change as pre-cursor to a full cleanup and merge with upstream zvol changes. Sponsored by: Multiplay	2014-12-22 16:38:29 +00:00
kib	4e541c8756	Handle MAKEENTRY cnp flag in the VOP_CREATE(). Curiously, some fs, e.g. smbfs, already did it. Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-21 13:29:33 +00:00
delphij	c6587f31b0	Add missing continue: we can't proceed further if the kernel does not panic with zfs_panic_recover. Illumos issue: 5438 zfs_blkptr_verify should continue after zfs_panic_recover Reported by: Coverity CID: 1232014	2014-12-19 00:20:29 +00:00
delphij	2d307c57cf	MFV r275914: As of r270383, the dbuf_compare comparator compares the dbuf attributes in the following order: db_level (indirect level) db_blkid (block number) db_state (current state) the address of the element Because db_state is being considered before the element's state, changing of db_state would affect balancedness of the AVL tree, even when the address of element compares differently. For instance, in dbuf_create, db_state may be altered after the node is inserted into the AVL tree and may break AVL tree balancedness. Instead of using db_state as a comparison criterion (introduced in r270383), consider it only when we are doing a lookup, that is one of the two dbuf pointers contains DB_SEARCH. Illumos issue: 5422 preserve AVL invariants in dn_dbufs MFC after: 2 weeks	2014-12-18 23:45:26 +00:00
kib	77c9d3f4e8	The VOP_LOOKUP() implementations for CREATE op do not put the name into namecache, to avoid cache thrashing when doing large operations. E.g., tar archive extraction is not usually followed by access to many of the files created. Right now, each VOP_LOOKUP() implementation explicitly knows about this quirk and tests for both MAKEENTRY flag presence and op != CREATE to make the call to cache_enter(). Centralize the handling of the quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP. VFS now sets NOCACHE flag for CREATE namei() calls. Note that the change in semantics is backward-compatible and could be merged to the stable branch, and is compatible with non-changed third-party filesystems which correctly handle MAKEENTRY. Suggested by: Chris Torek <torek@pi-coral.com> Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-18 10:01:12 +00:00
delphij	55ed102bbc	MFV r275783: Convert ARC flags to use enum. Previously, public flags are defined in arc.h and private flags are defined in arc.c which can lead to confusion and programming errors. Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf' or 'ab' because arc_buf_t are often named 'buf' as well. Illumos issue: 5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme MFC after: 2 weeks	2014-12-15 18:22:45 +00:00
delphij	a96c405009	MFV r275551: Remove "dbuf phys" db->db_data pointer aliases. Use function accessors that cast db->db_data to the appropriate "phys" type, removing the need for clients of the dmu buf user API to keep properly typed pointer aliases to db->db_data in order to conveniently access their data. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: In zap_leaf() and zap_leaf_byteswap, now that the pointer alias field l_phys has been removed, use the db_data field in an on stack dmu_buf_t to point to the leaf's phys data. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: Remove the db_user_data_ptr_ptr field from dbuf and all logic to maintain it. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: Modify the DMU buf user API to remove the ability to specify a db_data aliasing pointer (db_user_data_ptr_ptr). 
cddl/contrib/opensolaris/cmd/zdb/zdb.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Create and use the new "phys data" accessor functions dsl_dir_phys(), dsl_dataset_phys(), zap_m_phys(), zap_f_phys(), and zap_leaf_phys(). 
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Remove now unused "phys pointer" aliases to db->db_data from clients of the DMU buf user API. Illumos issue: 5314 Remove "dbuf phys" db->db_data pointer aliases in ZFS MFC after: 2 weeks	2014-12-15 07:52:23 +00:00
delphij	a715ee11ee	MFV r275550: In addition to r273158, make the code in spa_sync() that checks if the current TXG is a no-op TXG less fragile. Illumos issue: 5347 idle pool may run itself out of space MFC after: 2 weeks	2014-12-15 05:10:55 +00:00
delphij	9959a8151c	MFV r275549: Add a loader tunable, vfs.zfs.arc_meta_min, which controls how much metadata ZFS should keep in ARC at minimum. In arc_evict(), when doing recycle, take more factors into account by applying the following policy: 1. If no evictable data, evict metadata; 2. If no evictable metadata, evict data; 3. If we hit arc_meta_limit, evict metadata; 4. If we haven't hit arc_meta_min, evict data; 5* (Illumos only, not present in new FreeBSD code, yet) evict the oldest cached element from data and metadata. (FreeBSD) evict the data type specified by caller, which is the existing behavior. Note that because of our split locks (implemented in r205231 to improve scalability by reducing lock contention), implementing the fifth Illumos behavior will not be cheap, so for now just implement 1-4 and fall back to current behavior for 5. Illumos issue: 5368 ARC should cache more metadata MFC after: 2 months (assuming we don't find a better solution)	2014-12-15 04:51:36 +00:00
delphij	7b50e3bb4b	MFV r247174: Expose arc_meta_limit, et al via kstats. Note that as a result, vfs.zfs.arc_meta_used is removed. The existing vfs.zfs.arc_meta_limit sysctl/tunable is retained with a SYSCTL_PROC wrapper. Illumos ZFS issues: 3561 arc_meta_limit should be exposed via kstats Relnotes: yes MFC after: 2 weeks	2014-12-13 19:17:28 +00:00
delphij	57f4abab04	MFV r275548: Verify that the block pointer is structurally valid, before attempting to read it in. It can only be invalid in the case of a ZFS bug, but this change will help identify such bugs in a more transparent way, by panic'ing with a relevant message, rather than indexing off the end of an array or something. Illumos issue: 5349 verify that block pointer is plausible before reading MFC after: 2 weeks	2014-12-13 02:08:18 +00:00
delphij	3fde98966c	MFV r275546: Reduce scrub activity when there is enough dirty data, namely when dirty data is more than zfs_vdev_async_write_active_min_dirty_percent (once we start to increase the number of concurrent async writes). While there, also correct a rounding error which would make scrub end up pausing for (zfs_txg_timeout + 1) seconds instead of the desired zfs_txg_timeout seconds. Illumos issue: 5351 scrub goes for an extra second each txg 5352 scrub should pause when there is some dirty data MFC after: 2 weeks	2014-12-13 01:39:24 +00:00
delphij	33da909d10	MFV r275545: If zio_checksum_error() returns other than ECKSUM (e.g. EINVAL), it does not fill in the "zio_bad_cksum_t *info" parameter. Caller should not attempt to use it in this case. Illumos issue: 5348 zio_checksum_error() only fills in info if ECKSUM MFC after: 2 weeks	2014-12-13 01:26:06 +00:00
delphij	b6fe3b0b2d	MFV r275544: Clean up some duplicated code in dnode_sync() around freeing spill blocks. Illumos issue: 5350 clean up code in dnode_sync() MFC after: 2 weeks	2014-12-13 01:18:23 +00:00
delphij	917c282900	MFV r275543: Remove always true tests for ds->ds_phys' presence. Clean up assertions in dsl_dataset_disown. Remove unreachable code in dsl_dataset_disown(). Illumos issue: 5310 Remove always true tests for non-NULL ds->ds_phys MFC after: 2 weeks	2014-12-13 01:14:59 +00:00
delphij	12b51b69b5	MFV r275542: If a dnode has a spill block and there is an error while accessing a data block then traverse_dnode() loses information about that error and returns a status of visiting the spill block. This issue is discovered by Spectra Logic. Illumos issue: 5311 traverse_dnode may report success when it should not Original author: gibbs MFC after: 2 weeks	2014-12-13 01:10:17 +00:00
delphij	713658ad35	MFV r275540: When importing a pool, don't assume that the pool configuration passed at vdev_load is always valid. It's possible to get a stale configuration that comes with extra vdevs, where metaslab_init() would fail because a lower layer returns an error. Change the code to make metaslab_init() handle and return errors from the lower layer, pass them back to the upper layer and handle them there. Illumos issue: 5213 panic in metaslab_init due to space_map_open returning ENXIO MFC after: 2 weeks	2014-12-08 06:04:42 +00:00
avg	15bfe2d262	remove opensolaris cyclic code, replace with high-precision callouts In the old days callout(9) had 1 tick precision and that was inadequate for some uses, e.g. DTrace profile module, so we had to emulate cyclic API and behavior. Now we can directly use callout(9) in the very few places where cyclic was used. Differential Revision: https://reviews.freebsd.org/D1161 Reviewed by: gnn, jhb, markj MFC after: 2 weeks	2014-12-07 11:21:41 +00:00
andrew	4b9dde8b13	Apply the same fix in r274697 to the ARM case.	2014-12-06 12:03:09 +00:00
delphij	7cb9fbedb1	MFV r275535: Unexpand ISP2() and MSEC2NSEC(). Illumos issue: 5255 uts shouldn't open-code ISP2 MFC after: 2 weeks	2014-12-06 09:38:28 +00:00
delphij	9df66c4cfb	MFV r275534: Sync with Illumos. This has no effect on FreeBSD. Illumos issue: 5285 pass in cpu_pause_func via pause_cpus MFC after: 2 weeks	2014-12-06 09:14:46 +00:00
delphij	a561c01fd5	MFC r275533: Sync with Illumos. This has no effect on FreeBSD. Illumos issue: 5100 sparc build failed after 5004 MFC after: 2 weeks	2014-12-06 09:11:13 +00:00
delphij	f85c3d6050	Use %d instead of %u for error number. This way we see ERESTART as -1 not 4294967295 when doing DTrace. MFC after: 2 weeks	2014-12-05 22:56:10 +00:00
delphij	5bff99fba8	Fix a regression introduced in r274337 (large block support) In dsl_dataset_hold_obj() we used zap_contains(.., DS_FIELD_LARGE_BLOCKS) to determine whether the extensible (zapifyed) dataset has large blocks. The code expects the result to be either 0 (found) or ENOENT (not found), but it reused the variable 'err', which later code expects to be 0. Fix this by adopting a code construct similar to the one used later for DS_FIELD_BOOKMARK_NAMES, which uses a temporary variable zaperr to catch errors from zap_* routines. Reported by: Peter J. Creath (on FreeNAS; FreeNAS bug #6848) Illumos issue: 5393 spurious failures from dsl_dataset_hold_obj() Reviewed by: mahrens Sponsored by: iXsystems, Inc. X-MFC with: r274337	2014-12-05 18:29:01 +00:00
mav	7884d9292a	Add GET LBA STATUS command support to CTL. It is implemented for LUNs backed by ZVOLs in "dev" mode and files. GEOM has no such API, so for LUNs backed by raw devices all LBAs will be reported as mapped/unknown. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-12-04 11:34:19 +00:00
avg	926bdf39ae	zfs_putpages: actually update mtime and ctime Reported by: Paul Koch <paul.koch@akips.com> Tested by: Paul Koch <paul.koch@akips.com> MFC after: 2 weeks	2014-12-02 11:44:56 +00:00
delphij	cbde4886c5	Revert r273060 per discussion with avg@ as we need to make L2ARC aware of 4K devices and this one is not the right fix anyway.	2014-11-26 02:20:25 +00:00
dim	f38628840c	Fix the following -Werror warning from clang 3.5.0, while building cddl/lib/libctf: In file included from cddl/contrib/opensolaris/common/ctf/ctf_create.c:31: In file included from sys/cddl/contrib/opensolaris/uts/common/sys/sysmacros.h:34: sys/cddl/contrib/opensolaris/uts/common/sys/isa_defs.h:334:9: warning: '_ILP32' macro redefined [-Wmacro-redefined] #define _ILP32 ^ <built-in>:26:9: note: previous definition is here #define _ILP32 1 ^ 1 warning generated. This is because clang 3.5.0 started predefining _ILP32 and __ILP32__ for the i386 arch. (Earlier versions already predefined _LP64 and __LP64__ for the x86_64 arch.) Reviewed by: emaste, avg, smh, delphij, markj Differential Revision: https://reviews.freebsd.org/D1187	2014-11-19 07:44:21 +00:00
delphij	4a8d07956d	Make vfs.zfs.max_recordsize read-write at runtime. MFC after: 2 weeks	2014-11-18 22:35:19 +00:00
delphij	533328434b	Add a tunable for spa_slop_shift which controls how much space we would reserve by default. Tuning is not recommended. MFC after: 2 weeks	2014-11-18 18:52:38 +00:00
delphij	46a768faae	Allow tuning zfs_max_recordsize via loader tunable. Tuning is NOT recommended. Requested by: Slawa Olhovchenkov <slw zxy spb ru> MFC after: 2 weeks	2014-11-18 18:40:01 +00:00
avg	9d71a483ee	l2arc: restore correct rounding up of asize of compressed data This rounding up was lost in a mismerge of illumos code. See r268075 MFV r267565. After that commit zio_compress_data() no longer performs any compressed size adjustment, so it needs to be done externally. On FreeBSD we round up the size using vdev_ashift rather than SPA_MINBLOCKSIZE so that 4KB devices are properly supported. Additionally, zero out the buffer tail only if compression succeeds. The compression is considered successful if the size of compressed data after rounding up to account for the vdev ashift is less than the original data size. It does not make sense to have the data compressed if all the savings are lost to rounding up. With the new zio_compress_data() it could have been possible that the rounded compressed size would be greater than the original size and thus we could zero beyond the allocated buffer if the zeroing code was kept at the original place. Discussed with: delphij, gibbs MFC after: 2 weeks X-MFC with: r274627	2014-11-17 14:45:42 +00:00
avg	a55c441715	Revert r269093 which introduced physical zio alignment transform Size of physical ZIOs must never be implicitly adjusted, it's a responsibility of a caller to make sure that such a ZIO has proper offset and size. Discussed with: delphij, gibbs MFC after: 2 weeks	2014-11-17 14:16:02 +00:00
smh	b64d477633	Disable TRIM on file backed ZFS vdevs and fix TRIM on init After r265152 TRIM requests are ZIO_TYPE_FREE instead of ZIO_TYPE_IOCTL; this meant file backed vdevs attempted to process the ZIO as a write, causing a panic. We now disable TRIM on file backed vdevs and ASSERT the ZIO types supported by each vdev type to ensure we explicitly support the ZIO type being processed. Also ensure that TRIM on init is not processed for devices which declare they don't support TRIM via vdev_notrim. PR: 195061, 194976, 191573 Sponsored by: Multiplay	2014-11-17 11:32:10 +00:00
kib	b4ef709604	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
delphij	fe03d9b9d2	MFV r274273: ZFS large block support. Please note that booting from datasets that have recordsize greater than 128KB is not supported (but it's Okay to enable the feature on the pool). This may remain unchanged because of memory constraints. A limited safety belt is provided for the mounted root filesystem, but caution is advised. Illumos issue: 5027 zfs large block support MFC after: 1 month	2014-11-10 08:20:21 +00:00
delphij	0d7beefb91	MFV r274272 and diff reduction with upstream. Illumos issue: 5244 zio pipeline callers should explicitly invoke next stage Tested with: ztest plus ZFS over GELI configuration MFC after: 1 month	2014-11-09 07:37:00 +00:00
delphij	e284683f74	MFV r274271: Improve zdb -b performance: - Reduce gethrtime() call to 1/100th of blkptr's; - Skip manipulating the size-ordered tree; - Issue more (10, previously 3) async reads; - Use lighter weight testing in traverse_visitbp(); Illumos issue: 5243 zdb -b could be much faster MFC after: 2 weeks	2014-11-08 07:30:40 +00:00
avg	b98f85d480	fix l2arc compression buffers leak We have observed that arc_release() can be called concurrently with a l2arc in-flight write. Also, we have observed that arc_hdr_destroy() can be called from arc_write_done() for a zio with ZIO_FLAG_IO_REWRITE flag in similar circumstances. Previously the l2arc headers would be freed while leaking their associated compression buffers. Now the buffers are placed on l2arc_free_on_write list for delayed freeing. This is similar to what was already done to arc buffers that were supposed to be freed concurrently with in-flight writes of those buffers. In addition to fixing the discovered leaks this change also adds some protective code to assert that a compression buffer associated with a l2arc header is never leaked. A new kstat l2_cdata_free_on_write is added. It keeps a count of delayed compression buffer frees which previously would have been leaks. Tested by: Vitalij Satanivskij <satan@ukr.net> et al Requested by: many MFC after: 2 weeks Sponsored by: HybridCluster / ClusterHQ	2014-11-06 11:08:02 +00:00
mav	e22f45febc	Add to CTL support for logical block provisioning threshold notifications. For ZVOL-backed LUNs this allows to inform initiators if storage's used or available spaces get above/below the configured thresholds. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-11-06 00:48:36 +00:00
jpaetzel	0584337fe8	This change addresses 4 bugs in ZFS exposed by Richard Kojedzinszky's crash.sh script attached to FreeNAS bug 4109: https://bugs.freenas.org/issues/4109 Three are in the snapshot layer: a) AVG explains in his notes: https://wiki.freebsd.org/AvgVfsSolarisVsFreeBSD "VOP_INACTIVE must not do any destructive actions to a vnode and its filesystem node, nor invalidate them in any way." gfs_vop_inactive and zfsctl_snapshot_inactive did just that. In OpenSolaris VOP_INACTIVE is much closer to FreeBSD's VOP_RECLAIM. Rename & move them to gfs_vop_reclaim and zfsctl_snapshot_reclaim and merge in the requisite vnode_destroy from zfsctl_common_reclaim. b) gfs_lookup_dot and various zfsctl functions do not honor the FreeBSD VFS convention of only locking from the root downward. When looking up ".." the convention is to drop the current leaf vnode lock before acquiring the directory vnode and then subsequently re-acquiring the lock on the leaf vnode. This fixes that in all the places that are exercised by crash.sh. c) The snapshot may already be unmounted when the directory vnode is reclaimed. Check for this case and return. One in the common layer: d) Callers of traverse expect the reference to the vnode passed in to be maintained. Don't release it. This last one may be an unclear contract. There may in fact be some callers that do expect the reference to be dropped on success in addition to callers that expect it to be released. In this case a further audit of the callers is needed and a consensus on the correct behavior. PR: 184677 Submitted by: kmacy Reviewed by: delphij, will, avg MFC after: 2 weeks Sponsored by: iXsystems	2014-10-25 17:42:44 +00:00
jhibbits	a2a568a9e6	Whitespace X-MFC-with: r273570 MFC after: 1 week	2014-10-24 03:34:21 +00:00
jhibbits	251119f0e6	Three updates to PowerPC FBT: * Use a constant to define the number of stack frames in a probe exception. * Only allow function symbols in powerpc64 ('.' prefixed) * Set the fbtp_roffset for return probes, so the correct dtrace_probe call is made. MFC after: 1 week	2014-10-24 03:33:01 +00:00
hselasky	49c137f7be	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
delphij	5165eb2973	Add tunable vfs.zfs.space_map_blksz for space map's maximum block size. MFC after: 2 weeks	2014-10-18 22:11:10 +00:00
davide	e88bd26b3f	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
smh	48431b7da6	Prevent ZFS leaking pool free space When processing async destroys ZFS would leak space every txg timeout (5 seconds by default), if no writes occurred, until the pool is totally full. At this point it would be unfixable without a pool recreation. In addition if the machine was rebooted with the pool in this situation would fail to import on boot, hanging indefinitely, as the import process requires the ability to write data to the pool. Any attempts to query the pool status during the hung import would not return as the import holds the pool lock. The only way to import such a pool would be to specify -o readonly=on to the zpool import. zdb -bb <pool> can be used to check for "deferred free" size which is where this lost space will be counted. MFC after: 3 days Sponsored by: Multiplay	2014-10-16 02:23:27 +00:00
delphij	970ba4b3da	Use write_psize instead of write_asize when doing vdev_space_update. Without this change the accounting of L2ARC usage would be wrong and give 16EB free space because the number became negative and overflows. Obtained from: FreeNAS (issue #6239) MFC after: 2 weeks	2014-10-13 20:39:51 +00:00
delphij	275ab8166b	Add a tunable for arc_shrink_shift (vfs.zfs.arc_shrink_shift) that controls how much fraction, 1/2^arc_shrink_shift, should be reclaimed when there is memory pressure. Submitted by: Richard Kojedzinszky <krichy at tvnetwork.hu> MFC after: 2 weeks	2014-10-13 05:34:10 +00:00
delphij	0dc921b942	MFV r272804: Refactor the code and stop restore_object from creating two transactions. Illumos issue: 3693 restore_object uses at least two transactions to restore an object MFC after: 2 weeks	2014-10-09 07:52:51 +00:00
delphij	85c930be77	MFV r272803: Illumos issue: 5175 implement dmu_read_uio_dbuf() to improve cached read performance MFC after: 2 weeks	2014-10-09 07:18:40 +00:00
avg	e2d6ebec55	l2arc_write_buffers: reduce headroom value FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS data lists (currently both are 16) while illumos has just a single list of each kind. headroom determines how much data is scanned on a single list during each run of the l2arc feed thread. Because FreeBSD has more lists we proportionally decrease the limit. Reviewed by: Brendan Gregg (earlier version) MFC after: 2 weeks Sponsored by: HybridCluster	2014-10-07 16:08:21 +00:00
avg	9dd0ee9433	revert r272702: wrong (earlier) change was committed	2014-10-07 16:06:10 +00:00
avg	a402f22912	reduce L2ARC_WRITE_SIZE on FreeBSD FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS data lists (currently both are 16) while illumos has just a single list of each kind. L2ARC_WRITE_SIZE determines the default value of l2arc_write_max which defines limits on how much data is scanned and written to a cache device during each run of the l2arc feed thread. The limits are applied on the per buffer list basis. Because FreeBSD has more lists we proportionally reduce the limits. Reviewed by: Brendan Gregg (earlier version) MFC after: 2 weeks Sponsored by: HybridCluster	2014-10-07 14:30:24 +00:00
avg	073bd9bd13	make userland __assfail from opensolaris compat honor 'aok' variable This should allow the zdb -A option to actually make a difference. MFC after: 2 weeks	2014-10-07 14:15:50 +00:00
delphij	004a50f8bd	MFV r272591: Use loaned ARC buffer for zfs receive to avoid copy. Illumos issue: 5162 zfs recv should use loaned arc buffer to avoid copy MFC after: 2 weeks	2014-10-06 07:29:17 +00:00
delphij	3f54b74af4	MFV r272585: Split the godfather zio into CPU number's to reduce lock contention. Illumos issue: 5176 lock contention on godfather zio MFC after: 2 weeks	2014-10-06 07:03:17 +00:00
delphij	37185e7390	MFV r272501: Illumos issue: 5177 remove dead code from dsl_scan.c MFC after: 2 weeks	2014-10-06 05:46:51 +00:00
delphij	bd5e6432da	MFV r272500: Don't inherit flags other than DS_FLAG_CI_DATASET and DS_FLAG_INCONSISTENT when cloning. This prevents DS_FLAG_DEFER_DESTROY being inherited from a clone that is marked for deferred destroy, which causes snapshots of the clone being destroyed when getting a hold or clone. Illumos issue: 5150 zfs clone of a defer_destroy snapshot causes strangeness MFC after: 1 week	2014-10-06 05:42:20 +00:00
delphij	60bf73506e	Don't make nested definition for range_seg_cache. Reported by: ian MFC after: 1 week X-MFC-With: r272506	2014-10-04 15:42:52 +00:00
delphij	87516a919a	MFV r272499: Illumos issue: 5174 add sdt probe for blocked read in dbuf_read() MFC after: 2 weeks	2014-10-04 08:55:08 +00:00
delphij	8bbc3504dc	Add a new sysctl, vfs.zfs.vol.unmap_enabled, which allows the system administrator to toggle whether ZFS should ignore UNMAP requests. Illumos issue: 5149 zvols need a way to ignore DKIOCFREE MFC after: 2 weeks	2014-10-04 08:51:57 +00:00
delphij	cd42e24ad6	Diff reduction with upstream. The code change is not really applicable to FreeBSD. Illumos issue: 5148 zvol's DKIOCFREE holds zfsdev_state_lock too long MFC after: 1 month	2014-10-04 08:41:23 +00:00
delphij	096b5dca21	MFV r272496: Add tunable for number of metaslabs per vdev (vfs.zfs.vdev.metaslabs_per_vdev). The default remains at 200. Illumos issue: 5161 add tunable for number of metaslabs per vdev MFC after: 2 weeks	2014-10-04 08:29:48 +00:00
delphij	3dd244b458	MFV r272495: In arc_kmem_reap_now(), reap range_seg_cache too to reclaim memory in response of memory pressure. Illumos issue: 5163 arc should reap range_seg_cache MFC after: 1 week	2014-10-04 08:14:10 +00:00
delphij	64595536f1	MFV r272494: Make space_map_truncate() always do space_map_reallocate(). Without this, setting space_map_max_blksz would cause a panic for existing pools, as dmu_objset_set_blocksize would fail if the object has multiple blocks. Illumos issues: 5164 space_map_max_blksz causes panic, does not work 5165 zdb fails assertion when run on pool with recently-enabled spacemap_histogram feature MFC after: 2 weeks	2014-10-04 08:05:39 +00:00
smh	f2543cb01c	Refactor ZFS ARC reclaim checks and limits Remove previously added kmem methods in favour of defines which allow diff minimisation between upstream code base. Rebalance ARC free target to be vm_pageout_wakeup_thresh by default which eliminates issue where ARC gets minimised instead of balancing with VM pageout. This restores the target point prior to r270759. Bring in missing upstream only changes which move unused code to further eliminate code differences. Add additional DTRACE probe to aid monitoring of ARC behaviour. Enable upstream i386 code paths on platforms which don't define UMA_MD_SMALL_ALLOC. Fix mixture of byte and page values in arc_memory_throttle i386 code path value assignment of available_memory. PR: 187594 Review: D702 Reviewed by: avg MFC after: 1 week X-MFC-With: r270759 & r270861 Sponsored by: Multiplay	2014-10-03 20:34:55 +00:00
smh	c04b03f65b	Fix various issues with zvols When performing snapshot renames we could deadlock due to the locking in zvol_rename_minors. In order to avoid this use the same workaround as zvol_open in zvol_rename_minors. Add missing zvol_rename_minors to dsl_dataset_promote_sync. Protect against invalid index into zv_name in zvol_remove_minors. Replace zvol_remove_minor calls with zvol_remove_minors to ensure any potential children are also renamed. Don't fail zvol_create_minors if zvol_create_minor returns EEXIST. Restore the valid pool check in zfs_ioc_destroy_snaps to ensure we don't call zvol_remove_minors when zfs_unmount_snap fails. PR: 193803 MFC after: 1 week Sponsored by: Multiplay	2014-10-03 14:49:48 +00:00
araujo	4100d9b0a6	Fix failures and warnings reported by newpynfs20090424 test tool. This fix addresses only issues with the pynfs reports, none of these issues are known to create problems for extant real clients. Submitted by: Bart Hsiao <bart.hsiao@gmail.com> Reworked by: myself Reviewed by: rmacklem Approved by: rmacklem Sponsored by: QNAP Systems Inc.	2014-10-03 02:24:41 +00:00
delphij	7137fdfbce	Diff reduction with kernel code: instruct the compiler that the data of these types may be unaligned to their "normal" alignment and exercise caution when accessing them. PR: 194071 MFC after: 3 days	2014-10-02 00:13:08 +00:00
will	1e6d91e484	zfsvfs_create(): Refuse to mount datasets whose names are too long. This is checked for in the zfs_snapshot_004_neg STF/ATF test (currently still in projects/zfsd rather than head). sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c: - zfsvfs_create(): Check whether the objset name fits into statfs.f_mntfromname, and return ENAMETOOLONG if not. Although the filesystem can be unmounted via the umount(8) command, any interface that relies on iterating on statfs (e.g. libzfs) will fail to find the filesystem by its objset name, and thus assume it's not mounted. This causes "zfs unmount", "zfs destroy", etc. to fail on these filesystems, whether or not -f is passed. MFC after: 1 month Sponsored by: Spectra Logic MFSpectraBSD: 974872 on 2013/08/09	2014-10-01 14:12:02 +00:00
delphij	915740a55d	Fix a mismerge in r260183 which prevents snapshot zvol devices being removed and re-instate the fix in r242862. Reported by: Leon Dang <ldang nahannisys com>, smh MFC after: 3 days	2014-09-30 18:50:45 +00:00
smh	c466dfba5f	Remove sys/types.h include as per style(9) SDT requires sys/param.h due to use of NULL Reported by: Garrett Sponsored by: Multiplay	2014-09-18 20:38:18 +00:00
smh	7ce047b163	Add dtrace probe support for zfs SET_ERROR(..) MFC after: 1 week Sponsored by: Multiplay	2014-09-18 20:00:36 +00:00
will	2fa5cd85e7	Remove debug.zfs_flags in favor of the new vfs.zfs.debug_flags. Replace TUNABLE_INT with CTLFLAG_RWTUN. Submitted by: avg (debug.zfs_flags removal), smh (TUNABLE_INT replacement)	2014-09-18 18:46:38 +00:00
will	691a9f40b4	Enable ZFS debug flags to be modified via vfs.zfs.debug_flags. This is primarily only of interest to ZFS developers, but it makes it easier to get additional debugging. Submitted by: gibbs MFC after: 1 month Sponsored by: Spectra Logic MFSpectraBSD: 517074 on 2011/12/15 (by will), 662343 on 2013/03/20 (by gibbs)	2014-09-18 16:55:41 +00:00
will	7a7c171c68	Reorder sysctls for spa.c global tunables; add sysctl for ccw_retry_interval. MFC after: 1 month Sponsored by: Spectra Logic	2014-09-18 16:38:03 +00:00
will	7288f5d2fc	bpobj_iterate_impl(): Close a refcount leak iterating on a sublist. If bpobj_space() returned non-zero here, the sublist would have been left open, along with the bonus buffer hold it requires. This call does not invoke any calls to bpobj_close() itself. This bug doesn't have any known vector, but was found on inspection. MFC after: 1 week Sponsored by: Spectra Logic Affects: All ZFS versions starting 21 May 2010 (illumos cde58dbc) MFSpectraBSD: r1050998 on 2014/03/26	2014-09-18 15:37:53 +00:00
smh	fc398f1605	Remove unused ZFS ARC functions * arc_data_buf_alloc * arc_data_buf_free MFC after: 1 week Sponsored by: Multiplay	2014-09-18 10:46:51 +00:00
jhibbits	2ae1525481	Fix the stack tracing for dtrace/powerpc. Summary: Fix the stack tracing for dtrace/powerpc by using the trapexit/asttrapexit return address sentinels instead of checking within the kernel address space. As part of this, I had to add new inline functions. FBT traces the kernel, so we have to have special case handling for this, since a trap will create a full new trap frame, and there's no way to pass around the 'real' stack. I handle this by special-casing 'aframes == 0' with the trap frame. If aframes counts out to the trap frame, then assume we're looking for the full kernel trap frame, so switch to the real stack pointer. Test Plan: Tested on powerpc64 Reviewers: rpaulo, markj, nwhitehorn Reviewed By: markj, nwhitehorn Differential Revision: https://reviews.freebsd.org/D788 MFC after: 3 weeks Relnotes: Yes	2014-09-17 02:43:47 +00:00
smh	dfd30974e5	Added missing ZFS sysctls * vfs.zfs.vdev.async_write_active_min_dirty_percent * vfs.zfs.vdev.async_write_active_max_dirty_percent Added validation of min / max for ZFS sysctl * vfs.zfs.dirty_data_max_percent MFC after: 3 days	2014-09-14 12:23:00 +00:00
delphij	387d8afb94	MFV r271518: Correctly report hole at end of file. When asked to find a hole, the DMU sees that there are no holes in the object, and returns ESRCH. The ZPL interprets this as "no holes before the end of the file", and therefore inserts the "virtual hole" at the end of the file. Because DMU and ZPL have different ideas of where the end of an object/file is, we will end up returning the end of file, which is generally larger, instead of returning the end of object. The fix is to handle the "virtual hole" in the DMU. If no hole is found, the DMU will return a hole at the end of the file, rather than an error. Illumos issue: 5139 SEEK_HOLE failed to report a hole at end of file MFC after: 1 week	2014-09-13 17:48:44 +00:00
delphij	9cdf61a6da	MFV r271517: In zil_claim, don't issue warning if we get EBUSY (inconsistent) when opening an objset, instead, ignore it silently. Illumos issue: 5140 message about "%recv could not be opened" is printed when booting after crash MFC after: 1 week	2014-09-13 17:36:34 +00:00
delphij	3a202e2324	MFV r271515: Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to limit how many blocks can be free'ed before a new transaction group is created. The default is no limit (infinite), but we should probably have a lower default, e.g. 100,000. With this limit, we can guard against the case where ZFS could run out of memory when destroying large numbers of blocks in a single transaction group, as the entire DDT needs to be brought into memory. Illumos issue: 5138 add tunable for maximum number of blocks freed in one txg MFC after: 2 weeks	2014-09-13 17:24:56 +00:00
delphij	49c2133129	MFV r271512: Illumos issue: 5136 fix write throttle comment in dsl_pool.c MFC after: 2 weeks	2014-09-13 16:51:23 +00:00
delphij	bd509415bb	MFV r271510: Enforce 4K as smallest indirect block size (previously the smallest indirect block size was 1K but that was never used). This makes some space estimates more accurate and uses less memory for some data structures. Illumos issue: 5141 zfs minimum indirect block size is 4K MFC after: 2 weeks	2014-09-13 16:26:14 +00:00
smh	c3c60bff50	Persist vdev_resilver_txg changes to avoid panic caused by validation vs a vdev_resilver_txg value from a previous resilver. MFC after: 1 week	2014-09-11 16:21:51 +00:00
glebius	5939c729a8	Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES().	2014-09-10 12:36:41 +00:00
mav	7797473e53	Make ZVOL writes in device mode support IO_SYNC flag. MFC after: 1 month	2014-09-09 11:29:55 +00:00
delphij	52c7048527	MFV r271223: In dnode_sync(), do dnode_increase_indirection() before processing the dn_next_nblkptr. Illumos issue: 5117 space map reallocation can cause corruption MFC after: 3 days	2014-09-07 13:13:42 +00:00
peter	3baf385084	Move the restored #ifdef i386 test back inside the #ifdef _KERNEL block where it originally was.	2014-08-31 09:05:02 +00:00
smh	8d9d31d786	Ensure that ZFS ARC free memory checks include cached pages Also restore kmem_used() check for i386 as it has KVA limits that the raw page counts above don't consider PR: 187594 Reviewed by: peter X-MFC-With: r270759 Review: D700 Sponsored by: Multiplay	2014-08-30 21:44:32 +00:00
mjg	4cf719a9ee	Add missing proctree locking to fill_kinfo_proc consumers. This fixes r270444. Pointy hat: mjg Reported by: many MFC after: 1 week	2014-08-30 03:10:55 +00:00
smh	502601a540	Refactor ZFS ARC reclaim logic to be more VM cooperative Prior to this change we triggered ARC reclaim when kmem usage passed 3/4 of the total available, as indicated by vmem_size(kmem_arena, VMEM_ALLOC). This could lead to large amounts of unused RAM e.g. on a 192GB machine with ARC the only major RAM consumer, 40GB of RAM would remain unused. The old method has also been seen to result in extreme RAM usage under certain loads, causing poor performance and stalls. We now trigger ARC reclaim when the number of free pages drops below the value defined by the new sysctl vfs.zfs.arc_free_target, which defaults to the value of vm.v_free_target. Credit to Karl Denninger for the original patch on which this update was based. PR: 191510 and 187594 Tested by: dteske MFC after: 1 week Relnotes: yes Sponsored by: Multiplay	2014-08-28 19:50:08 +00:00
markj	46bd89ef4c	Restore the correct value when disabling probes. Otherwise the instrumented tracepoints would continue to generate traps, which would be ignored but could consume noticeable amounts of CPU if, say, all functions in the kernel were instrumented. X-MFC-With: r270067	2014-08-24 17:10:47 +00:00
delphij	d89e74165b	Instead of using timestamp in the AVL, use the memory address when comparing. Illumos issue: 5095 panic when adding a duplicate dbuf to dn_dbufs MFC after: 3 days	2014-08-22 23:13:53 +00:00
delphij	626a49e1d6	MFV r270197: Illumos issue: 5066 remove support for non-ANSI compilation 5068 Remove SCCSID() macro from <macros.h> MFC after: 2 weeks	2014-08-22 22:13:36 +00:00
delphij	6922e3fedf	Provide compatibility shim for atomic_dec_64_nv. X-MFC-with: r270247 MFC after: 13 days	2014-08-21 08:25:46 +00:00
delphij	5a3c4456e4	MFV r270196: Illumos issue: 5047 don't use atomic_*_nv if you discard the return value MFC after: 2 weeks	2014-08-20 22:39:26 +00:00
delphij	d8cd2ff335	MFC r270195: Illumos issue: 5045 use atomic_{inc,dec}_* instead of atomic_add_* MFC after: 2 weeks	2014-08-20 21:44:48 +00:00
delphij	b248e9b18f	MFV r270193: Illumos issues: 5042 stop using deprecated atomic functions MFC after: 2 weeks	2014-08-20 18:29:18 +00:00
markj	ec83007481	Factor out the common code for function boundary tracing instead of duplicating the entire implementation for both x86 and powerpc. This makes it easier to add support for other architectures and has no functional impact. Phabric: D613 Reviewed by: gnn, jhibbits, rpaulo Tested by: jhibbits (powerpc) MFC after: 2 weeks	2014-08-16 21:42:55 +00:00
delphij	a160f7bc63	MFV r269542: In vdev_get_stats, check that the vdev is not a hole before computing the fragmentation. This fixes a panic when removing log device. Illumos issue: 5049 panic when removing log device Author: Alex Reece <alex@delphix.com> MFC after: 2 weeks	2014-08-05 00:07:21 +00:00
markj	9e5713a930	Return 0 for the PPID of threads in process 0, as process 0 doesn't have a parent process. MFC after: 2 weeks	2014-08-04 19:02:30 +00:00
delphij	6ba22f8d1a	Revert r269404 and use cpu_ticks() for dbuf allocation. Encode CPU's number by XOR'ing the CPU ID against the 64-bit cpu_ticks(). Reviewed by: mav, gibbs Differential Revision: https://phabric.freebsd.org/D521 MFC after: 2 weeks	2014-08-03 09:47:51 +00:00
delphij	6901832d85	MFV r269427: In dnode_children_t, use C99's "[]" idiom for declaring the variable sized array dnc_children at the end of the structure. This prevents the compiler from mistakenly optimizing away accesses beyond the array's defined size. Illumos issue: 5038 Remove "old-style" flexible array usage in ZFS. Author: Justin T. Gibbs <justing@spectralogic.com> MFC after: 2 weeks	2014-08-02 08:34:22 +00:00
ian	e5aca7d143	When arm 64-bit atomic ops are available, define ARM_HAVE_ATOMIC64. Use that symbol (which will be correct in both kernel and userland contexts) rather than just __arm__ to decide whether to use a local implementation.	2014-08-02 03:44:27 +00:00
ian	105b4c5f48	Use the 64-bit atomics now provided by arm machine/atomic.h instead of (conflicting) local versions.	2014-08-01 23:45:50 +00:00
smh	07ac26f9f6	Don't return ZIO_PIPELINE_CONTINUE from vdev_op_io_start methods This prevents recursion of vdev_queue_io_done as per r265321 but using a different method as recommended on the openzfs list. We now use zio_interrupt(zio) and return ZIO_PIPELINE_STOP instead of returning ZIO_PIPELINE_CONTINUE from vdev_*_io_start methods. zio_vdev_io_start now ASSERTs that vdev_op_io_start returns ZIO_PIPELINE_STOP to ensure future changes don't reintroduce ZIO_PIPELINE_CONTINUE returns. Cleanup flow in vdev_geom_io_start while I'm here. Also fix some cases not using SET_ERROR(..) MFC after: 2 weeks X-MFC-With: r265321	2014-08-01 23:16:48 +00:00

1429 Commits