freebsd-nq

Author	SHA1	Message	Date
Andriy Gapon	113ce413ba	MFV r329313: 8857 zio_remove_child() panic due to already destroyed parent zio illumos/illumos-gate@d6e1c446d7 `d6e1c446d7` https://www.illumos.org/issues/8857 I had an OS panic on one of our servers: ffffff01809128c0 vpanic() ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80) ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80) ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0, ffffff3373370908) ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0) ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0) ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140) ffffff0180912b40 thread_start+8() It panicked here: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/ zio.c#430 pio->io_lock is DEAD, thus a panic. Further analysis shows the "pio" (parent zio of "cio") has already been destroyed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Youzhong Yang <youzhong@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: George Wilson <george.wilson@delphix.com> PR: 223803 Tested by: shiva.bhanujan@quorum.com MFC after: 2 weeks	2018-02-15 14:46:29 +00:00
Alan Somers	d64bae1f1a	Implement .vop_pathconf and .vop_getacl for the .zfs ctldir zfsctl_common_pathconf will report all the same variables that regular ZFS volumes report. zfsctl_common_getacl will report an ACL equivalent to 555, except that you can't read xattrs or edit attributes. Fixes a bug where "ls .zfs" will occasionally print something like: ls: .zfs/.: Operation not supported PR: 225793 Reviewed by: avg MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D14365	2018-02-14 15:49:31 +00:00
Alexander Motin	bc55f7298e	Add sysctls for dnode block and indirect block shifts. MFC after: 2 weeks	2018-02-09 23:29:50 +00:00
Andriy Gapon	f11548e1da	remove a duplicate assignment There should be no functional change. MFC after: 1 week	2018-02-08 13:22:40 +00:00
Jeff Roberson	e2068d0bcd	Use per-domain locks for vm page queue free. Move paging control from global to per-domain state. Protect reservations with the free lock from the domain that they belong to. Refactor to make vm domains more of a first class object. Reviewed by: markj, kib, gallatin Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14000	2018-02-06 22:10:07 +00:00
Andriy Gapon	be4be0e5ca	zfs: move a utility function, ioflags, closer to its consumers No functional change. MFC after: 1 week	2018-02-05 14:19:36 +00:00
Andriy Gapon	6c1e03251e	ZFS ARC: restore illumos uses of 'needfree' that were removed in r325851 This is purely a cosmetic change to have a more complete copy of ifdef-ed out illumos code. MFC after: 1 week	2018-02-02 12:57:33 +00:00
Andriy Gapon	1fc46a9ba0	zfs_rezget: drop cached pages before doing anything else We did that in the case of success to prevent the use of stale cached data, but it makes even less sense to keep the cached data when we fail. Ideally, we should call vgone() on the vnode in the case of zfs_rezget failure, but the current lock order prevents us from doing that. The change also rearranges the order of unlinked check and the size change check. While there, add missing SET_ERROR in one of the error paths. MFC after: 2 weeks	2018-01-31 14:44:51 +00:00
Alexander Motin	1dbc0fc352	MFV r328253: 8835 Speculative prefetch in ZFS not working for misaligned reads illumos/illumos-gate@5cb8d943bc https://www.illumos.org/issues/8835: Sequential reads not aligned to block size are not detected by ZFS prefetcher as sequential, killing prefetch and severely hurting performance. It is caused by dmu_zfetch() in case of misaligned sequential accesses being called with overlap of one block. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Allan Jude <allanjude@freebsd.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Alexander Motin <mav@FreeBSD.org>	2018-01-22 05:57:14 +00:00
Alexander Motin	4eb2803697	MFV r328251: 8652 Tautological comparisons with ZPROP_INVAL illumos/illumos-gate@4ae5f5f06c https://www.illumos.org/issues/8652: Clang and GCC prefer to use unsigned ints to store enums. With Clang, that causes tautological comparison warnings when comparing a zfs_prop_t or zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error handling code is being silently removed as a result. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Alan Somers <asomers@gmail.com>	2018-01-22 05:52:39 +00:00
Alexander Motin	b5eb78f824	MFV r328247: 8959 Add notifications when a scrub is paused or resumed illumos/illumos-gate@301fd1d6f2 Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Sean Eric Fagan <sef@ixsystems.com>	2018-01-22 04:31:48 +00:00
Alexander Motin	abe7ff88e1	MFV r328245: 8856 arc_cksum_is_equal() doesn't take into account ABD-logic illumos/illumos-gate@01a059ee0c https://www.illumos.org/issues/8856: arc_cksum_is_equal() calls zio_push_transform() that requires abd_t* (second arg), but a void* is passed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Roman Strashkin <roman.strashkin@nexenta.com>	2018-01-22 04:23:48 +00:00
Alexander Motin	226cd471b4	MFV r328229: 8930 zfs_zinactive: do not remove the node if the filesystem is readonly illumos/illumos-gate@93c618e0f4 https://www.illumos.org/issues/8930: We normally remove an unlinked node when its last user goes away and the node becomes inactive. However, we should not do that if the filesystem is mounted read-only including the case where it has its readonly property set. The node will remain on the unlinked queue, so it will not be leaked. One particular scenario is when we receive an incremental stream into a mounted read-only filesystem and that stream contains an unlinked file (still on the unlinked queue). If that file is opened before the receive and some time later after the receive it becomes inactive we would remove it and, thus, modify the read-only filesystem. As a result, the filesystem would diverge from its source and further incremental receives would not be possible (without forcing a rollback). Another related scenario, that may or may not be possible depending on an OS / VFS policy, is when an open file is unlinked, then the filesystem is remounted read-only, and then the file is closed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Andriy Gapon <avg@FreeBSD.org>	2018-01-21 23:49:17 +00:00
Alexander Motin	272f833cde	MFV r328227: 8909 8585 can cause a use-after-free kernel panic illumos/illumos-gate@94ddd0900a https://www.illumos.org/issues/8909: There's a race condition that exists if `zil_free_lwb` races with either `zil_commit_waiter_timeout` and/or `zil_lwb_flush_vdevs_done`. Here's an example panic due to this bug: > ::status debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40 operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc) image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513 panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in mo dule "zfs" due to a NULL pointer dereference dump content: kernel pages only > $c zio_shrink+0x12() zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20) zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8) zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8) zil_commit+0x80(ffffff03dcd15cc0, 9a9) zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0) fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0) write+0x250(42, fffffd7ff4832000, 2000) sys_syscall+0x177() If there's an outstanding lwb that's in `zil_commit_waiter_timeout` waiting to timeout, waiting on it's waiter's CV, we must be sure not to call `zil_free_lwb`. If we end up calling `zil_free_lwb`, then that LWB may be freed and can result in a use-after-free situation where the stale lwb pointer stored in the `zil_commit_waiter_t` structure of the thread waiting on the waiter's CV is used. A similar situation can occur if an lwb is issued to disk, and thus in the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called while the disk is servicing that lwb. In this situation, the lwb will be freed by `zil_free_lwb`, which will result in a use-after-free situation when the lwb's zio completes, and `zil_lwb_flush_vdevs_done` is called. This race condition is prevented in `zil_close` by calling `zil_commit` before `zil_free_lwb` is called, which will ensure all outstanding (i.e. all lwb's in the `LWB_STATE_OPEN` and/or `LWB_STATE_ISSUED` states) reach the `LWB_STATE_DONE` state before the lwb's are freed (`zil_commit` will not return untill all the lwb's are `LWB_STATE_DONE`). Further, this race condition is prevented in `zil_sync` by only calling `zil_free_lwb` for lwb's that do not have their `lwb_buf` pointer set. All lwb's not in the `LWB_STATE_DONE` state will have a non-null value for this pointer; the pointer is only cleared in `zil_lwb_flush_vdevs_done`, at which point the lwb's state will be changed to `LWB_STATE_DONE`. This race is present in `zil_suspend`, leading to this bug. At first glance, it would appear as though this would not be true because `zil_suspend` will call `zil_commit`, just like `zil_close`, but the problem is that `zil_suspend` will set the zilog's `zl_suspend` field prior to calling `zil_commit`. Further, in `zil_commit`, if `zl_suspend` is set, `zil_commit` will take a special branch of logic and use `txg_wait_synced` instead of performing the normal `zil_commit` logic. This call to `txg_wait_synced` might be good enough for the data to reach disk safely before it returns, but it does not ensure that all outstanding lwb's reach the `LWB_STATE_DONE` state before it returns. This is because, if there's an lwb "stuck" in `zil_commit_waiter_timeout`, waiting for it's lwb to timeout, it will maintain a non-null value for it's `lwb_buf` field and thus `zil_sync` will not free that lwb. Thus, even though the lwb's data is already on disk, the lwb will be left lingering, waiting on the CV, and will eventually timeout and be issued to disk even though the write is unnesseary. So, after `zil_commit` is called from `zil_suspend`, we incorrectly assume that there are not outstanding lwb's, and proceed to free all lwb's found on the zilog's lwb list. As a result, we free the lwb that will later be used `zil_commit_waiter_timeout`. Reviewed by: John Kennedy <jwk404@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com>	2018-01-21 23:18:42 +00:00
Alexander Motin	d252c97fc0	MFV r328225: 8603 rename zilog's "zl_writer_lock" to "zl_issuer_lock" illumos/illumos-gate@cf07d3da99 https://www.illumos.org/issues/8603: To help make the ZIL's code more understandable, it was suggested that the zilog_t's "zl_writer_lock" field should be renamed to "zl_issuer_lock". Reviewed by: C Fraire <cfraire@me.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com>	2018-01-21 23:11:20 +00:00
Alexander Motin	442814b7e7	MFV r328220: 8677 Open-Context Channel Programs illumos/illumos-gate@a3b2868063 https://www.illumos.org/issues/8677 We want to be able to run channel programs outside of synching context. This would greatly improve performance of channel program that just gather information, as we won't have to wait for synching context anymore. This feature should introduce the following: - A new command line flag in "zfs program" to specify our intention to run in open context. - A new flag/option within the channel program ioctl which selects the context. - Appropriate error handling whenever we try a channel program in open-context that contains zfs.sync* expressions. - Documentation for the new feature in the manual pages. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2018-01-21 23:02:05 +00:00
Andriy Gapon	e09304d8f3	zfs: no need to check that size of zfs_cmd_t is not greater than IOCPARM_MAX Nowadays we do not pass zfs_cmd_t directly through the ioctl interface. Instead a small zfs_iocparm_t object is passed and the command is explicitly copied in and out. So, the check has become irrelevant. MFC after: 3 weeks Sponsored by: Panzura	2018-01-21 11:19:18 +00:00
Mark Johnston	94a889089b	Use the thread's ucred struct when fetching jid or jailname. Reported by: mjg X-MFC with: r327888	2018-01-14 17:55:40 +00:00
Mark Johnston	224e0c2f61	Add "jid" and "jailname" variables to DTrace. These return the jail ID and jail name for the traced process, respectively, and are analogous to "zonename" on Solaris/illumos. "zonename" is now aliased to "jailname". Also add some stress tests for the new variables. Submitted by: Domagoj Stolfa <domagoj.stolfa@gmail.com> Reviewed by: dteske (previous version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D13877	2018-01-12 19:59:46 +00:00
Andriy Gapon	be5060c116	zfs_mount: restore a bit of ifdef-out illumos code And correctly mark the end of the replacement FreeBSD code. MFC after: 1 week	2018-01-09 13:43:04 +00:00
Jeff Roberson	ad5b0f5b51	Fix arc after r326347 broke various memory limit queries. Use UMA features rather than kmem arena size to determine available memory. Initialize the UMA limit to LONG_MAX to avoid spurious wakeups on boot before the real limit is set. PR: 224330 (partial), 224080 Reviewed by: markj, avg Sponsored by: Netflix / Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13494	2018-01-02 04:35:56 +00:00
Dimitry Andric	e42df78a1a	Remove obsolete register keyword from opensolaris's sysmacros.h. When compiling zfsd with recent clang, it leads to a warning about the register storage class being incompatible with C++17. MFC after: 3 days	2017-12-24 19:17:15 +00:00
John Baldwin	f27d3a8a72	Don't return early for non-failure for one of the EMLINK checks. r326987 enabled two #if 0'd-out EMLINK checks in zfs_link_create() for link overflow. However, one of the checks (when the vnode adding a link is a directory such as for mkdir) always returned even if the link did not overflow. Change this to only return early if it needs to report an EMLINK error. Reported by: db, shurd Sponsored by: Chelsio Communications	2017-12-19 23:54:44 +00:00
John Baldwin	b501cc5da6	Rework pathconf handling for FIFOs. On the one hand, FIFOs should respect other variables not supported by the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.). These values are fs-specific and must come from a fs-specific method. On the other hand, filesystems that support FIFOs are required to support _PC_PIPE_BUF on directory vnodes that can contain FIFOs. Given this latter requirement, once the fs-specific VOP_PATHCONF method supports _PC_PIPE_BUF for directories, it is also suitable for FIFOs permitting a single VOP_PATHCONF method to be used for both FIFOs and non-FIFOs. To that end, retire all of the FIFO-specific pathconf methods from filesystems and change FIFO-specific vnode operation switches to use the existing fs-specific VOP_PATHCONF method. For fifofs, set it's VOP_PATHCONF to VOP_PANIC since it should no longer be used. While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that only filesystems supporting FIFOs will report a value. In addition, only report a valid _PC_PIPE_BUF for directories and FIFOs. Discussed with: bde Reviewed by: kib (part of a larger patch) MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D12572	2017-12-19 22:39:05 +00:00
John Baldwin	599afe53a8	Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf(). Having all filesystems fall through to default values isn't always correct and these values can vary for different filesystem implementations. Most of these changes just use the existing default values with a few exceptions: - Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact permissions check this claims for chown(). - Use NANDFS_NAME_LEN for NAME_MAX for nandfs. - Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to indicate hard links aren't supported. Requested by: bde (though perhaps not this exact implementation) Reviewed by: kib (earlier version) MFC after: 1 month Sponsored by: Chelsio Communications	2017-12-19 19:51:36 +00:00
John Baldwin	697a86b6bf	Adjust ZFS' link count handling for ino64. - Define a ZFS_LINK_MAX as the ZFS version of LINK_MAX which is set to UINT64_MAX to match the on-disk format. - Enable the currently #if 0'd code to check for link overflows and return EMLINK. - Don't clamp the link count reported in stat() to LINK_MAX as that is still the 16-bit limit, but report the full link counts. Also, avoid possibly overflowing the reported link count to 0 when adjusting the link count to account for ".snapshot". - Update the LINK_MAX reported by pathconf() to report ZFS_LINK_MAX rather than LINK_MAX (but clamped to LONG_MAX for 32-bit systems). Reviewed by: avg (earlier version) Sponsored by: Chelsio Communications	2017-12-19 19:07:24 +00:00
Mark Johnston	6c7828a280	Avoid CPU migration in dtrace_gethrtime() on x86. dtrace_gethrtime() may be called outside of probe context, and in particular, from the DTRACEIOC_BUFSNAP handler. Disable interrupts rather than using sched_pin() to help ensure that we don't call any external functions when in probe context. PR: 218452 MFC after: 1 week	2017-12-18 17:26:24 +00:00
Mark Johnston	7a177c2d5e	Unregister the ARC lowmem event handler earlier in arc_fini(). Otherwise a poorly timed lowmem event may attempt to acquire a destroyed lock. Unregister the handler before destroying the ARC reclaim thread. Reported by: gjb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D13480	2017-12-17 18:21:40 +00:00
Mark Johnston	a981eff82e	MFV r326785: 8880 improve DTrace error checking illumos/illumos-gate@2cf374268f `2cf374268f` https://www.illumos.org/issues/8880 Reviewed by: Tim Kordas <tim.kordas@joyent.com> Reviewed by: Bryan Cantrill <bryan@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Dan McDonald <danmcd@joyent.com> Author: Jerry Jelinek <jerry.jelinek@joyent.com> MFC after: 1 week	2017-12-12 22:08:34 +00:00
Mark Johnston	f74deabac8	Correct initialization of pc on powerpc. PR: 224293 Submitted by: Breno Leitao <breno.leitao@gmail.com> X-MFC with: r326774 Pointy hat: markj	2017-12-12 20:41:11 +00:00
Mark Johnston	5bab623438	Pass the trap frame to fasttrap hooks. The DTrace fasttrap entry points expect a struct reg containing the register values of the calling thread. Perform the conversion in fasttrap rather than in the trap handler: this reduces the number of ifdefs and avoids wasting stack space for traps that don't involve DTrace. MFC after: 2 weeks	2017-12-11 19:21:39 +00:00
Warner Losh	f5b24e1c9f	Mark two things as unused (since they are only sometimes used) and toss in a DECONST to remove a const in some tricky code that would require too extensive a change to unwind otherwise. Sponsored by: Netflix	2017-12-03 04:55:33 +00:00
Warner Losh	1227a4f4ea	Fix all warnings related to geli and ZFS support on x86. Default WARNS to 0 still, since there's still some warnings on other architectures. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D13301	2017-12-02 00:07:37 +00:00
Alan Somers	3613ea59c7	Fix assertion when ZFS fails to open certain devices "panic: vdev_geom_close_locked: cp->private is NULL" This panic will result if ZFS fails to open a device due to either of the following reasons: 1) The device's sector size is greater than 8KB. 2) ZFS wants to open the device RW, but it can't be opened for writing. The solution is to change the initialization order to ensure that the assertion will be satisfied. PR: 221066 Reported by: David NewHamlet <wheelcomplex@gmail.com> Reviewed by: avg MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D13278	2017-11-30 15:36:06 +00:00
Alan Somers	fb20566033	Revert r326399 Accidentally committed wrong file Pointy hat to: asomers Sponsored by: Spectra Logic Corp	2017-11-30 15:34:55 +00:00
Alan Somers	de65823b48	Fix assertion when ZFS fails to open certain devices "panic: vdev_geom_close_locked: cp->private is NULL" This panic will result if ZFS fails to open a device due to either of the following reasons: 1) The device's sector size is greater than 8KB. 2) ZFS wants to open the device RW, but it can't be opened for writing. The solution is to change the initialization order to ensure that the assertion will be satisfied. PR: 221066 Reported by: David NewHamlet <wheelcomplex@gmail.com> Reviewed by: avg MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D13278	2017-11-30 15:28:29 +00:00
Mark Johnston	0037455148	Don't use pcpu_find() to determine if a CPU ID is valid. This addresses assertion failures after r326218. MFC after: 1 week	2017-11-27 18:42:23 +00:00
Mark Johnston	e9f63df76d	Duplicate helpers after disabling inherited tracepoints during a fork. We may create probes in the nascent child process, so we first need to ensure that any inherited tracepoints are first removed. Otherwise the probe sites will not be in the state expected by fasttrap, and it won't be able to enable the probes. MFC after: 2 weeks	2017-11-23 14:29:07 +00:00
Justin Hibbits	efa8edd5bb	PowerPC has 12 artificial frames for the profiler It may need to be different between AIM and Book-E, this was tested only on Book-E (64- and 32-bit) MFC after: 3 weeks	2017-11-22 01:53:59 +00:00
Andriy Gapon	7bcc2cfc86	zfs_write: fix problem with writes appearing to succeed when over quota The problem happens when the writes have offsets and sizes aligned with a filesystem's recordsize (maximum block size). In this scenario dmu_tx_assign() would fail because of being over the quota, but the uio would already be modified in the code path where we copy data from the uio into a borrowed ARC buffer. That makes an appearance of a partial write, so zfs_write() would return success and the uio would be modified consistently with writing a single block. That bug can result in a data loss because the writes over the quota would appear to succeed while the actual data is being discarded. This commit fixes the bug by ensuring that the uio is not changed until after all error checks are done. To achieve that the code now uses uiocopy() + uioskip() as in the original illumos design. We can do that now that uiocopy() has been updated in r326067 to use vn_io_fault_uiomove(). Reported by: mav Analyzed by: mav Reviewed by: mav Pointyhat to: avg (myself) MFC after: 1 week X-MFC after: r326067 X-Erratum: wanted	2017-11-21 18:28:14 +00:00
Andriy Gapon	bf71fe61af	make illumos uiocopy use vn_io_fault_uiomove uiocopy() is currently unused, its purpose is copy data from a uio without modifying the uio. It was in use before the vn_io_fault support was added to ZFS, at which point our code diverged from the illumos code a little bit. Because ZFS is the only (potential) user of the function we are free to modify it to better suit ZFS needs. The intention behind this change is to remove the differences introduced earlier in zfs_write(). While here, re-implement uioskip() using uiomove() with uio_segflg == UIO_NOCOPY. The story of uioskip is the same as with uiocopy. Reviewed by: mav MFC after: 1 week	2017-11-21 18:01:43 +00:00
Mark Johnston	e9a2e17d1b	Avoid holding the process in uread() and uwrite(). In general, higher-level code will atomically verify that the process is not exiting and hold the process. In one case, we were using uwrite() to copy a probed instruction to a per-thread scratch space block, but copyout() can be used for this purpose instead; this change effectively reverts r227291. MFC after: 1 week	2017-11-16 07:25:12 +00:00
Baptiste Daroussin	c07d14deb5	remove the poor emulation of the IllumOS needfree global variable to prevent the ARC reclaim thread running longer than needed. Update the arc::needfree dtrace probe triggered in arc_lowmem() to also report the value we may want to free. Submitted by: Nikita Kozlov <nikita.kozlov at blade-group.com> Reviewed by: avg Approved by: avg MFC after: 3 weeks Sponsored by: blade Differential Revision: https://reviews.freebsd.org/D12163	2017-11-15 12:48:36 +00:00
Andriy Gapon	da1bfa506f	MFV r325609: 7531 Assign correct flags to prefetched buffers illumos/illumos-gate@2729521654 `2729521654` https://www.illumos.org/issues/7531 I found that some buffers that could be L2ARC eligible are not flagged such, leading to some performance impact. As a test I ran the same IO workload 10 times in a raw. It is a metadata only workload (files listing). l2arc_noprefetch=0. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: benrubson <ben.rubson@gmail.com> MFC after: 8 days	2017-11-09 18:22:42 +00:00
Andriy Gapon	885a5425f3	MFV r325607: 8607 zfs: variable set but not used illumos/illumos-gate@b852c2f543 `b852c2f543` https://www.illumos.org/issues/8607 Reviewed by: Yuri Pankov <yuripv@gmx.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Toomas Soome <tsoome@me.com> MFC after: 1 week	2017-11-09 18:14:42 +00:00
Andriy Gapon	2e06836fee	MFV r325605: 8713 Buffer overflow in dsl_dataset_name() illumos/illumos-gate@f37ae9a714 `f37ae9a714` https://www.illumos.org/issues/8713 If we're creating a pool with version >= SPA_VERSION_DSL_SCRUB (v11) we need to account for additional space needed by the origin dataset which will also be snapshotted: "poolname"+"/"+"$ORIGIN"+"@"+"$ORIGIN". Enforce this limit in pool_namecheck(). Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: loli10K <ezomori.nozomu@gmail.com> MFC after: 1 week	2017-11-09 18:12:21 +00:00
Andriy Gapon	96ed2690df	Disable posix_fallocate(2) for ZFS The generic (naive) implementation of posix_fallocate cannot provide the standard mandated guarantee that overwrites would never fail due to the lack of free space. The fundamental reason is the copy-on-write architecture of ZFS. Other features like compression and deduplication can also increase the size difference between the (pre-)allocated dummy content and the future content. So, until ZFS can properly implement the feature it's better to report that it is unsupported rather than providing an ersatz implementation. Please note that EINVAL is used to report that the underlying file system does not support the operation (POSIX.1-2008). illumos and ZoL seem to do the same. MFC after: 3 weeks Sponsored by: Panzura	2017-11-02 13:49:08 +00:00
Andriy Gapon	40d47bb4eb	vdev_geom_close: close errored consumer even if vdev_reopening is set If vdev_geom_close doesn't close the consumer, then the subsequent call to vdev_geom_open() would be just a NOP and would always return success. Thus, at present vdev_reopen() would always succeed for vdev_geom devices even if the underlying provider is in error state. The problem was introduced as a result of an optimization in rS308055. The most significant manifistation of the problem is that zio_vdev_io_done() --> vdev_probe() --> SPA_ASYNC_PROBE --> spa_async_probe() --> vdev_reopen() chain of calls and events becomes a NOP as well. This chain is invoked when zio_vdev_io_done() detects an "unexpected" error from the lower level I/O. Additionally, that call path may race with SPA_ASYNC_REMOVE path because of the asynchronous nature of them both. So, the SPA_ASYNC_PROBE may erroneously mark a vdev as being healthy after SPA_ASYNC_REMOVE marked it as removed. Reviewed by: asomers, mav MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12731	2017-10-31 10:15:03 +00:00
Alan Somers	659058b06f	Fix the error message when creating a zpool on a too-small device Don't check for SPA_MINDEVSIZE in vdev_geom_attach when opening by path. It's redundant with the check in vdev_open, and failing to attach here results in the wrong error message being printed. However, still check for it in some other situations: * When opening by guids, so we don't get bogged down reading from slow devices like floppy drives. * In vdev_geom_read_pool_label for the same reason, because we iterate over all providers. * If the caller requests that we verify the guid, because then we'll have to read from the device before vdev_open verifies the size. PR: 222227 Reported by: Marie Helene Kvello-Aune <marieheleneka@gmail.com> Reviewed by: avg, mav MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D12531	2017-10-23 23:05:29 +00:00
Mateusz Guzik	5a17c5524f	sdt: make all sdt probe sites test one variable This saves on cache misses at the expense of a slight grow of .text. Note this is a bandaid for lack of hotpatching. Discussed with: markj	2017-10-22 20:22:23 +00:00
Andriy Gapon	13bacc7144	remove spa_sync_on assert from spa_async_thread_vd Unlike spa_async_thread that can get started only from spa_sync() spa_async_thread_vd can get started from other contexts. Additionally, spa_async_thread_vd does not really depend on spa sync being enabled. The incorrect assert could be triggered by importing a pool in the read-only mode and then disconnecting one of its disks. In this case spa_sync_on was false because the pool was read-only and spa_async_thread_vd was started to handle SPA_ASYNC_REMOVE event. Note: spa_async_thread_vd() currently exists only in FreeBSD, it was split out of spa_async_thread() in r253990. Discussed with: mav MFC after: 2 weeks	2017-10-19 16:36:07 +00:00
Andriy Gapon	1dce7acd37	illumos mutex_init: use SX_NEW instead of bzero There should be no functional change, but SX_NEW seems to be more idiomatic to the use-case. MFC after: 2 weeks X-MFC note: stable/11 only	2017-10-09 07:44:09 +00:00
Andriy Gapon	e117882ba2	MFV r322235: 8067 zdb should be able to dump literal embedded block pointer illumos/illumos-gate@4923c69fdd `4923c69fdd` FreeBSD note: the manual page is to be updated separately. https://www.illumos.org/issues/8067 Add an option to zdb to print a literal embedded block pointer supplied on the command line: zdb -E [-A] word0:word1:...:word15 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 3 weeks	2017-10-06 08:21:06 +00:00
Andriy Gapon	a4bc304da5	remove heuristic error detection from ddi_strto() Zero, <TYPE>_MIN and <TYPE>_MAX values can result from valid conversions. They don't necessarily imply any error. Since we do not have any reliable error signaling from libkern's strto(), it's better to always assume success rather than to report an error when there is none. Reviewed by: tsoome MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12565	2017-10-05 12:25:18 +00:00
Andriy Gapon	fed20fc736	really unbreak kernel builds on sparc64 and powerpc64 after r324163, ZFS Channel Programs This commit also reverts r324178 that did not fix the problem on powerpc64 where char is usigned. MFC after: 4 weeks X-MFC with: r324163	2017-10-05 06:39:57 +00:00
Andriy Gapon	620c2c801b	MFV r323913: 8600 ZFS channel programs - snapshot illumos/illumos-gate@2840dce1a0 `2840dce1a0` https://www.illumos.org/issues/8600 ZFS channel programs should be able to create snapshots. In addition to the base snapshot functionality, this will likely entail adding extra logic to handle edge cases which were formerly not possible, such as creating then destroying a snapshot in the same transaction sync. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Chris Williamson <chris.williamson@delphix.com> MFC after: 5 weeks X-MFC after: r324163	2017-10-02 11:32:08 +00:00
Andriy Gapon	a1f65a15ce	MFV r323912: 8592 ZFS channel programs - rollback illumos/illumos-gate@000cce6b6f `000cce6b6f` https://www.illumos.org/issues/8592 ZFS channel programs should be able to perform a rollback. This logic will probably look pretty similar to zfs.sync.destroy(). Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Brad Lewis <brad.lewis@delphix.com> MFC after: 5 weeks X-MFC after: r324163	2017-10-02 11:23:31 +00:00
Andriy Gapon	8cdc315e9e	MFV r323795: 8604 Avoid unnecessary work search in VFS when unmounting snapshots illumos/illumos-gate@ed992b0aac `ed992b0aac` https://www.illumos.org/issues/8604 Every time we want to unmount a snapshot (happens during snapshot deletion or renaming) we unnecessarily iterate through all the mountpoints in the VFS layer (see zfs_get_vfs). Ideally we would just put a hold on the snapshot and access its respective VFS resource directly. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com> FreeBSD note: I added a FreeBSD specific function getzfsvfs_ref() which is like getzfsvfs() but returns a filesystem referenced, not busied. We want a busied filesystem in most cases, because we access its private data and, thus, we need to prevent the filesystem from being unmounted and its private data destroyed. But in some cases we can either get away with just a referenced filesystem or we must not busy the filesystem. Unmounting the filesystem is one of such cases. MFC after: 5 weeks X-MFC after: r324163	2017-10-02 11:15:32 +00:00
Andriy Gapon	55b73eb9c5	fix incorrect use of getzfsvfs_impl in r324163, ZFS Channel Programs getzfsvfs_impl() returns a referenced, not busied, filesystem, so the matching call is vfs_rel, not vfs_unbusy. MFC after: 4 weeks X-MFC with: r324163	2017-10-02 11:07:48 +00:00
Andriy Gapon	6a2f82bdf8	unbreak kernel builds on sparc64 and powerpc after r324163, ZFS Channel Programs The custom iscntrl() in ZFS Lua code expects a signed argumnet, so remove the harmful cast. Reported by: ian MFC after: 5 weeks X-MFC with: r324163	2017-10-01 20:12:30 +00:00
Andriy Gapon	3e52a05570	MFV r323794: 8605 zfs channel programs: zfs.exists undocumented and non-working illumos/illumos-gate@5f39f884e2 `5f39f884e2` https://www.illumos.org/issues/8605 zfs.exists() in channel programs doesn't return any result, and should have a man page entry. Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Chris Williamson <chris.williamson@delphix.com> MFC after: 5 weeks X-MFC after: r324163	2017-10-01 16:51:05 +00:00
Andriy Gapon	529b326e63	MFV r323531: 8521 nvlist memory leak in get_clones_stat() and spa_load_best() illumos/illumos-gate@7d3000f774 `7d3000f774` https://www.illumos.org/issues/8521 Yuri reported this to the mailing list: doing a `reboot -d` on current illumos-gate HEAD gives the following ":: findleaks -dv" output: findleaks: maximum buffers => 301061 findleaks: actual buffers => 297587 findleaks: findleaks: potential pointers => 29289774 findleaks: dismissals => 26242305 (89.5%) findleaks: misses => 331153 ( 1.1%) findleaks: dups => 2419681 ( 8.2%) findleaks: follows => 296635 ( 1.0%) findleaks: findleaks: peak memory usage => 7353 kB findleaks: elapsed CPU time => 1.5 seconds findleaks: elapsed wall time => 2.0 seconds findleaks: CACHE LEAKED BUFCTL CALLER ffffff03d222b008 120 ffffff03ef7ceb78 nv_alloc_sys+0x1f ffffff03d222a448 123 ffffff03f4150cc8 nv_alloc_sys+0x1f ffffff03d222b448 5 ffffff03f28bd598 nv_alloc_sys+0x1f ffffff03d222b888 87 ffffff03f28c10f0 nv_alloc_sys+0x1f ffffff03d222c008 21 ffffff03f4139310 nv_alloc_sys+0x1f ffffff03d222b888 43 ffffff040ef3f3e8 nv_alloc_sys+0x1f ffffff03d222c008 120 ffffff03f4591e58 nv_alloc_sys+0x1f ffffff03d222b008 121 ffffff03f352c068 nv_alloc_sys+0x1f ffffff03d222a448 112 ffffff03f414e5f8 nv_alloc_sys+0x1f ffffff03d222b008 119 ffffff03ee92fdc0 nv_alloc_sys+0x1f ffffff03d222b888 46 ffffff03f28c1378 nv_alloc_sys+0x1f ffffff03d222b448 4 ffffff03f28c7708 nv_alloc_sys+0x1f ffffff03d222c008 20 ffffff03f2a6e7e8 nv_alloc_sys+0x1f Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuripv@gmx.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Pavel Zakharov <pavel.zakharov@delphix.com> MFC after: 5 weeks X-MFC after: r324163	2017-10-01 16:41:05 +00:00
Andriy Gapon	550374efe6	revert r324166, it has an unrelated change in it	2017-10-01 16:37:54 +00:00
Andriy Gapon	7d5c6491f0	MFV r323531: 8521 nvlist memory leak in get_clones_stat() and spa_load_best() illumos/illumos-gate@7d3000f774 `7d3000f774` https://www.illumos.org/issues/8521 Yuri reported this to the mailing list: doing a `reboot -d` on current illumos-gate HEAD gives the following ":: findleaks -dv" output: findleaks: maximum buffers => 301061 findleaks: actual buffers => 297587 findleaks: findleaks: potential pointers => 29289774 findleaks: dismissals => 26242305 (89.5%) findleaks: misses => 331153 ( 1.1%) findleaks: dups => 2419681 ( 8.2%) findleaks: follows => 296635 ( 1.0%) findleaks: findleaks: peak memory usage => 7353 kB findleaks: elapsed CPU time => 1.5 seconds findleaks: elapsed wall time => 2.0 seconds findleaks: CACHE LEAKED BUFCTL CALLER ffffff03d222b008 120 ffffff03ef7ceb78 nv_alloc_sys+0x1f ffffff03d222a448 123 ffffff03f4150cc8 nv_alloc_sys+0x1f ffffff03d222b448 5 ffffff03f28bd598 nv_alloc_sys+0x1f ffffff03d222b888 87 ffffff03f28c10f0 nv_alloc_sys+0x1f ffffff03d222c008 21 ffffff03f4139310 nv_alloc_sys+0x1f ffffff03d222b888 43 ffffff040ef3f3e8 nv_alloc_sys+0x1f ffffff03d222c008 120 ffffff03f4591e58 nv_alloc_sys+0x1f ffffff03d222b008 121 ffffff03f352c068 nv_alloc_sys+0x1f ffffff03d222a448 112 ffffff03f414e5f8 nv_alloc_sys+0x1f ffffff03d222b008 119 ffffff03ee92fdc0 nv_alloc_sys+0x1f ffffff03d222b888 46 ffffff03f28c1378 nv_alloc_sys+0x1f ffffff03d222b448 4 ffffff03f28c7708 nv_alloc_sys+0x1f ffffff03d222c008 20 ffffff03f2a6e7e8 nv_alloc_sys+0x1f Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuripv@gmx.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Pavel Zakharov <pavel.zakharov@delphix.com> MFC after: 5 weeks X-MFC after: r324163	2017-10-01 16:34:16 +00:00
Andriy Gapon	bda88d07d9	MFV r323530,r323533,r323534: 7431 ZFS Channel Programs, and followups 7431 ZFS Channel Programs illumos/illumos-gate@dfc115332c `dfc115332c` https://www.illumos.org/issues/7431 ZFS channel programs (ZCP) adds support for performing compound ZFS administrative actions via Lua scripts in a sandboxed environment (with time and memory limits). This initial commit includes both base support for running ZCP scripts, and a small initial library of API calls which support getting properties and listing, destroying, and promoting datasets. Testing: in addition to the included unit tests, channel programs have been in use at Delphix for several months for batch destroying filesystems. The dsl_destroy_snaps_nvl() call has also been replaced with Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Chris Williamson <chris.williamson@delphix.com> 8552 ZFS LUA code uses floating point math illumos/illumos-gate@916c8d8811 `916c8d8811` https://www.illumos.org/issues/8552 In the LUA interpreter used by "zfs program", the lua format() function accidentally includes support for '%f' and friends, which can cause compilation problems when building on platforms that don't support floating-point math in the kernel (e.g. sparc). Support for '%f' friends (%f %e %E %g %G) should be removed, since there's no way to supply a floating-point value anyway (all numbers in ZFS LUA are int64_t's). Reviewed by: Yuri Pankov <yuripv@gmx.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Dan McDonald <danmcd@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> 8590 memory leak in dsl_destroy_snapshots_nvl() illumos/illumos-gate@e6ab4525d1 `e6ab4525d1` https://www.illumos.org/issues/8590 In dsl_destroy_snapshots_nvl(), "snaps_normalized" is not freed after it is added to "arg". Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> FreeBSD notes: - zfs-program.8 manual page is taken almost as is from the vendor repository, no FreeBSD-ification done - fixed multiple instances of NULL being used where an integer is expected - replaced ETIME and ECHRNG with ETIMEDOUT and EDOM respectively This commit adds a modified version of Lua 5.2.4 under sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua, mirroring the upstream. See README.zfs in that directory for the description of Lua customizations. See zfs-program.8 on how to use the new feature. MFC after: 5 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D12528	2017-10-01 16:11:07 +00:00
Mark Johnston	47f11baaca	Use C99 initializers for DTrace provider methods. This makes the definitions easier to read and more cscope-friendly. MFC after: 1 week	2017-09-27 17:46:38 +00:00
Andriy Gapon	443efc868c	fix r324011, MFV of r323535, 8585 improve batching done in zil_commit() I managed to commit an older version of the change. Plus, even the latest version was not ready for userland compilation. Reported by: "O. Hartmann" <ohartmann@walstatt.org>, cy MFC after: 1 week X-MFC with: r324011	2017-09-26 15:38:16 +00:00
Andriy Gapon	c13f1d82c8	MFV r323535: 8585 improve batching done in zil_commit() FreeBSD notes: - this MFV reverts FreeBSD commit r314549 to make the merge easier - at present our emulation of cv_timedwait_hires is rather poor, so I elected to use cv_timedwait_sbt directly Please see the differential revision for details. Unfortunately, I did not get any positive reviews, so there could be bugs in the FreeBSD-specific piece of the merge. Hence, the long MFC timeout. illumos/illumos-gate@1271e4b10d `1271e4b10d` https://www.illumos.org/issues/8585 The current implementation of zil_commit() can introduce significant latency, beyond what is inherent due to the latency of the underlying storage. The additional latency comes from two main problems: 1. When there's outstanding ZIL blocks being written (i.e. there's already a "writer thread" in progress), then any new calls to zil_commit() will block waiting for the currently oustanding ZIL blocks to complete. The blocks written for each "writer thread" is coined a "batch", and there can only ever be a single "batch" being written at a time. When a batch is being written, any new ZIL transactions will have to wait for the next batch to be written, which won't occur until the current batch finishes. As a result, the underlying storage may not be used as efficiently as possible. While "new" threads enter zil_commit() and are blocked waiting for the next batch, it's possible that the underlying storage isn't fully utilized by the current batch of ZIL blocks. In that case, it'd be better to allow these new threads to generate (and issue) a new ZIL block, such that it could be serviced by the underlying storage concurrently with the other ZIL blocks that are being serviced. 2. Any call to zil_commit() must wait for all ZIL blocks in its "batch" to complete, prior to zil_commit() returning. The size of any given batch is proportional to the number of ZIL transaction in the queue at the time that the batch starts processing the queue; which doesn't occur until the previous batch completes. Thus, if there's a lot of transactions in the queue, the batch could be composed of many ZIL blocks, and each call to zil_commit() will have to wait for all of these writes to complete (even if the thread calling zil_commit() only cared about one of the transactions in the batch). Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com> MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12355	2017-09-26 11:04:08 +00:00
Ian Lepore	a78b4d1462	Use nstosbt() instead of multiplying by SBT_1NS to avoid roundoff errors. Differential Revision: https://reviews.freebsd.org/D11779	2017-09-25 15:03:27 +00:00
Andriy Gapon	f94aa61c33	MFV r323917: 8648 Fix range locking in ZIL commit codepath illumos/illumos-gate@42b1411172 `42b1411172` https://www.illumos.org/issues/8648 I'm opening this bug to track integration of the following ZFS on Linux commit into illumos: commit `f763c3d1df` Author: LOLi <loli10K@users.noreply.github.com> Date: Mon Aug 21 17:59:48 2017 +0200 Fix range locking in ZIL commit codepath Since OpenZFS 7578 (`1b7c1e5`) if we have a ZVOL with logbias=throughput we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr offset and length to the offset and length of the BIO from zvol_write()->zvol_log_write(): these offset and length are later used to take a range lock in zillog->zl_get_data function: zvol_get_data(). Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in dbuf_unoverride() is no longer valid because now dmu_sync() is called from zilog's get_data functions holding a partial lock on the dbuf. Fix this by taking a range lock on the whole block in zvol_get_data(). Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Alexander Motin <mav@FreeBSD.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: LOLi <loli10K@users.noreply.github.com> MFC after: 10 days	2017-09-22 08:27:27 +00:00
Andriy Gapon	1c6ea90df5	MFV r323914: 8661 remove "zil-cw2" dtrace probe illumos/illumos-gate@bd9d3f9046 `bd9d3f9046` https://www.illumos.org/issues/8661 The "zil-cw1" dtrace probe was previously removed in 8558, and the "zil-cw2" probe should have been removed in that patch as well. Unfortunately, the "zil- cw2" was not removed in 8558, so this bug is to track it's removal. Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com> MFC after: 1 week	2017-09-22 08:21:14 +00:00
Alan Somers	cd037f075c	MFV r323789: 8473 scrub does not detect errors on active spares illumos/illumos-gate@554675eee7 `554675eee7` https://www.illumos.org/issues/8473 Scrubbing is supposed to detect and repair all errors in the pool. However, it wrongly ignores active spare devices. The problem can easily be reproduced in OpenZFS at git rev 0ef125d with these commands: truncate -s 64m /tmp/a /tmp/b /tmp/c sudo zpool create testpool mirror /tmp/a /tmp/b spare /tmp/c sudo zpool replace testpool /tmp/a /tmp/c /bin/dd if=/dev/zero bs=1024k count=63 oseek=1 conv=notrunc of=/tmp/c sync sudo zpool scrub testpool zpool status testpool # Will show 0 errors, which is wrong sudo zpool offline testpool /tmp/a sudo zpool scrub testpool zpool status testpool # Will show errors on /tmp/c, # which should've already been fixed FreeBSD head is partially affected: the first scrub will detect some errors, but the second scrub will detect more. Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> MFC after: 1 week Sponsored by: Spectra Logic Corp	2017-09-20 16:31:00 +00:00
Andriy Gapon	aacd0b4bb2	add vfs_zfs.abd_chunk_size tunable It is reported that the default value of 4KB results in a substantial memory use overhead (at least, on some configurations). Using 1KB seems to reduce the overhead significantly. PR: 222377 Reported by: Sean Chittenden <sean@chittenden.org> MFC after: 1 week	2017-09-20 08:36:31 +00:00
Andriy Gapon	3d5487d981	fix memory leak in g_bio zone introduced in r320452, another ABD fallout I overlooked the fact that that ZIO_IOCTL_PIPELINE does not include ZIO_STAGE_VDEV_IO_DONE stage. We do allocate a struct bio for an ioctl zio (a disk cache flush), but we never freed it. This change splits bio handling into two groups, one for normal read/write i/o that passes data around and, thus, needs the abd data tranform; the other group is for "data-less" i/o such as trim and cache flush. PR: 222288 Reported by: Dan Nelson <dnelson@allantgroup.com> Tested by: Borja Marcos <borjam@sarenet.es> MFC after: 10 days	2017-09-20 08:27:21 +00:00
Andriy Gapon	8c9377cde7	MFV r323792: 8602 remove unused "dp_early_sync_tasks" field from "dsl_pool" structure illumos/illumos-gate@2bcb545854 `2bcb545854` https://www.illumos.org/issues/8602 When I landed the fix for 8558, I incorrectly added the "dp_early_sync_tasks" field to the "dsl_pool" structure. This field is used in DelphixOS, but not in illumos. It was incorrectly pulled into illumos, so this bug is to remove it from the structure. Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com> MFC after: 1 week	2017-09-20 07:26:52 +00:00
Andriy Gapon	cbc785c293	dounmount: do not release the mount point's reference on the covered vnode As long as mnt_ref is not zero there can be a consumer that might try to access mnt_vnodecovered. For this reason the covered vnode must not be freed until mnt_ref goes to zero. So, move the release of the covered vnode to vfs_mount_destroy. Reviewed by: kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D12329	2017-09-14 08:47:06 +00:00
Andriy Gapon	86261a95ed	slightly simplify zfs_vptocnp It's not necessary to look up the parent's ID to check if the node is the root node of the filesystem. MFC after: 2 weeks	2017-09-13 07:09:58 +00:00
Toomas Soome	b40aaca6dd	loader should support large_dnode The zfsonlinux feature large_dnode is not yet supported by the loader. Reviewed by: avg, allanjude Differential Revision: https://reviews.freebsd.org/D12288	2017-09-12 13:45:04 +00:00
Andriy Gapon	1a2ddb2997	fix a fallout from the ZTOV tightening, r323479 MFC after: 13 days X-MFC with: r323479	2017-09-12 13:21:14 +00:00
Andriy Gapon	bcab65cab5	zfsctl_snapdir_lookup should be able to handle an uncovered vnode The uncovered vnode is possible because there is no guarantee that its hold count would go to zero (and it would be inactivated and reclaimed) immediately after a covering filesystem is unmounted. So, such a vnode should be expected and it is possible to re-use it without any trouble. MFC after: 3 weeks Sponsored by: Panzura	2017-09-12 06:06:58 +00:00
Andriy Gapon	c09d0da8d1	zfs_ctldir: remove obsolete / bogus ARGSUSED lint directives None of the tagged functions had unused parameters. MFC after: 1 week	2017-09-12 06:05:30 +00:00
Andriy Gapon	65b38f7311	zfsvfs_hold: assert that the busied filesystem can not be unmounted This is a FreeBSD specific feature. MFC after: 3 weeks Sponsored by: Panzura	2017-09-12 06:04:50 +00:00
Andriy Gapon	d092f79489	zfs_get_vfs: reference a requested filesystem instead of vfs_busy-ing it The only consumer of zfs_get_vfs, zfs_unmount_snap, does not need the filesystem to be busy, it just need a reference that it can pass to dounmount. Also, previously the code was racy as it unbusied the filesystem before taking a reference on it. Now the code should be simpler and safer. MFC after: 2 weeks Sponsored by: Panzura	2017-09-12 06:04:01 +00:00
Andriy Gapon	f7519dbb76	zfs: tighten debug versions of ZTOV and VTOZ MFC after: 2 weeks Sponsored by: Panzura	2017-09-12 06:02:21 +00:00
Andriy Gapon	970165f190	MFV r323111: 8569 problem with inline functions in abd.h illumos/illumos-gate@37e84ab74e `37e84ab74e` https://www.illumos.org/issues/8569 C [C99] has peculiar rules for inline functions that are different from the C++ rules. Unlike C++ where inline is "fire and forget", in C a programmer must pay attention to the function's storage class / visibility. The main problem is with the case where a compiler decides to not inline a call to the function declared as inline. Some relevant links: - http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15831.html - http://www.drdobbs.com/the-new-c-inline-functions/184401540 The summary is that either the inline functions should be declared 'static inline' or one of the compilation units (.c files) must provide a callable externally visible function definition. In the former case, the compiler would automatically create a local non-inlined function instance in every compilation unit where it's needed. In the latter case the single external definition is used to satisfy any non-inlined calls in all compilation units. As things stand right now, we can get an undefined reference error under certain combinations of compilers and compiler options. For example, this is what I get on FreeBSD when compiling with clang 4.0.0 and -O1: In function `abd_free': /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:385: undefined reference to `abd_is_linear' Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 1 week	2017-09-11 12:15:49 +00:00
Andriy Gapon	25625d8746	Revert r322601, Mark ZFS ABD inline functions static An alternative fix is to be merged from illumos shortly.	2017-09-11 12:08:20 +00:00
Andriy Gapon	3d9a0e564d	MFV r323110: 8558 lwp_create() returns EAGAIN on system with more than 80K ZFS filesystems illumos/illumos-gate@216d7723a1 `216d7723a1` https://www.illumos.org/issues/8558 On a system with more than 80K ZFS filesystems, we've seen cases where lwp_create() will start to fail by returning EAGAIN. The problem being, for each of those 80K ZFS filesystems, a taskq will be created for each dataset as part of the ZIL for each dataset. For each of these taskq's, a kernel thread will be created which results in 24KB being allocated for each thread. With enough of these 24KB allocations, we eventually exhaust the memory region set aside for these allocations. Currently, segkpsize is set to a value of 2GB, which means we can only support about 80K filesystems; 2GB / 24KB = ~80K. The lwp_create() failure comes into play due to the fact that LWP creation also allocates 24KB from this same region of memory. Thus, if we've exhausted this region of memory due to the number of ZIL taskq's, there won't be any memory avaible to allow the call to lwp_create() to succeed. FreeBSD note: I haven't created sysctl-s for the new ZIL clean parameters. Let's add them if anyone requires to tune them. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com> MFC after: 3 weeks	2017-09-11 11:31:43 +00:00
Andriy Gapon	90354c3200	MFV r323107: 8414 Implemented zpool scrub pause/resume illumos/illumos-gate@1702cce751 `1702cce751` FreeBSD note: rather than merging the zpool.8 update I copied the zpool scrub section from the illumos zpool.1m to FreeBSD zpool.8 almost verbatim. Now that the illumos page uses the mdoc format, it was an easier option. Perhaps the change is not in perfect compliance with the FreeBSD style, but I think that it is acceptible. https://www.illumos.org/issues/8414 This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167 Currently, there is no way to pause a scrub. Pausing may be useful when the pool is busy with other I/O to preserve bandwidth. Description This patch adds the ability to pause and resume scrubbing. This is achieved by maintaining a persistent on-disk scrub state. While the state is 'paused' we do not scrub any more blocks. We do however perform regular scan housekeeping such as freeing async destroyed and deadlist blocks while paused. Motivation and Context Scrub pausing can be an I/O intensive operation and people have been asking for the ability to pause a scrub for a while. This allows one to preserve scrub progress while freeing up bandwidth for other I/O. Reviewed by: George Melikov <mail@gmelikov.ru> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Alek Pinchuk <apinchuk@datto.com> MFC after: 2 weeks	2017-09-09 11:00:07 +00:00
Kurt Lidl	a8273e4371	Enable dtrace support for mips64 and the ERL kernel config Turn on the required options in the ERL config file, and ensure that the fbt module is listed as a dependency for mips in the modules/dtrace/dtraceall/dtraceall.c file. PR: 220346 Reviewed by: gnn, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12227	2017-09-06 03:19:52 +00:00
Alan Somers	0df3d85af4	Fix remounting ZFS filesystem with "zfs mount" "zfs mount -o" passes a list of mount options directly to nmount(2) after sanity checking them. In particular, zfs(8) will refuse to mount an already existing file system unless "remount" is specified in the option list. However, the "remount" option only exists in Illumos. FreeBSD's equivalent is "update". PR: 221985 Reviewed by: avg MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D12233	2017-09-05 19:40:04 +00:00
Baptiste Daroussin	91f1caeccb	Add sysctls for arc shrinking and growing values The default value for arc_no_grow_shift may not be optimal when using several GiB ARC. Expose it via sysctl allows users to tune it easily. Also expose arc_grow_retry via sysctl for the same reason. The default value of 60s might, in case of intensive load, be too long. Submitted by: Nikita Kozlov <nikita.kozlov@blade-group.com> Reviewed by: mav, manu, bapt MFC after: 2 weeks Sponsored by: blade Differential Revision: https://reviews.freebsd.org/D12144	2017-08-31 13:02:17 +00:00
Ed Maste	3c3d2ba6fe	zfs: do not advertise edonr which is not yet supported illumos 4185 ("add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R") was intentionally merged only partially in r289422, without adding support for skein, sha512 and edonr on FreeBSD. Support for skein and sha512 was added later on, but edonr is still not implemented in FreeBSD. Prior to this commit zfs(8) correctly rejected edonr, but with an error message that claimed support: fk@r500 ~ $zfs set checksum=edonr tank cannot set property for 'tank': 'checksum' must be one of 'on \| off \| fletcher2 \| fletcher4 \| sha256 \| sha512 \| skein \| edonr' PR: 204055 Submitted by: Fabian Keil Approved by: allanjude Obtained from: ElectroBSD MFC after: 1 week	2017-08-29 22:24:22 +00:00
John Baldwin	ac3b479ec8	Add a guard around _ILP32 for mips. This is already done for other architectures in this file and fixes the build with clang.	2017-08-21 17:45:06 +00:00
John Baldwin	b99836cea9	Mark ZFS ABD inline functions static. When built with -fno-inline-functions zfs.ko contains undefined references to these functions if they are only marked inline. Reviewed by: avg (earlier version) MFC after: 1 week Sponsored by: Chelsio Communications	2017-08-16 23:40:32 +00:00
Alan Somers	69b14f7acd	Fix some ZFS debugging messages sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Be more careful about the use of provider names vs vdev names in ZFS_LOG statements. MFC after: 3 weeks Sponsored by: Spectra Logic Corp	2017-08-15 15:20:04 +00:00
Andriy Gapon	984d43cca5	MFV r322242: 8373 TXG_WAIT in ZIL commit path illumos/illumos-gate@d28671a3b0 `d28671a3b0` https://www.illumos.org/issues/8373 The code that writes ZIL blocks uses dmu_tx_assign(TXG_WAIT) to assign a transaction to a transaction group. That seems to be logically incorrect as writing of the ZIL block does not introduce any new dirty data. Also, when there is a lot of dirty data, the call can introduce significant delays into the ZIL commit path, thus affecting all synchronous writes. Additionally, ARC throttling may affect the ZIL writing. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 2 weeks	2017-08-08 11:26:03 +00:00
Andriy Gapon	2653426e89	MFV r322240: 8491 uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS illumos/illumos-gate@79c2b812ee `79c2b812ee` https://www.illumos.org/issues/8491 The zpool checkpoint feature in DxOS added a new field in the uberblock. The Multi-Modifier Protection Pull Request from ZoL adds two new fields in the uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279). As these two changes come from two different sources and once upstreamed and deployed will introduce an incompatibility with each other we want to upstream a change that will reserve the padding for both of them so integration goes smoothly and everyone gets both features. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Olaf Faaland <faaland1@llnl.gov> Approved by: Gordon Ross <gwr@nexenta.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com> MFC after: 3 weeks	2017-08-08 11:21:58 +00:00
Andriy Gapon	6f2f8727e3	MFV r322238: 7915 checks in l2arc_evict could use some cleaning up illumos/illumos-gate@267ae6c3a8 `267ae6c3a8` https://www.illumos.org/issues/7915 l2arc_evict() is strictly serialized with respect to l2arc_write_buffers() and l2arc_write_done(). Normally, l2arc_evict() and l2arc_write_buffers() are called from the same thread, so they can not be concurrent. Also, l2arc_write_buffers() uses zio_wait() on the parent zio of all cache zio-s. That ensures that l2arc_write_done() is completed before l2arc_write_buffers() returns. Finally, if a cache device is removed, then l2arc_evict() is called under SCL_ALL in the exclusive mode. That ensures that it can not be concurrent with the normal L2ARC accesses to the device (including writing and evicting buffers). Given the above, some checks and actions in l2arc_evict() do not make sense. For instance, it must never encounter the write head header let alone remove it from the buffer list. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 2 weeks	2017-08-08 11:19:14 +00:00
Andriy Gapon	3cf2ea1aea	MFV r322236: 8126 ztest assertion failed in dbuf_dirty due to dn_nlevels changing illumos/illumos-gate@dcb6872c56 `dcb6872c56` https://www.illumos.org/issues/8126 The sync thread is concurrently modifying dn_phys->dn_nlevels while dbuf_dirty() is trying to assert something about it, without holding the necessary lock. We need to move this assertion further down in the function, after we have acquired the dn_struct_rwlock. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 2 weeks	2017-08-08 11:14:40 +00:00
Andriy Gapon	98a9c8e68a	zfs: no need for __DECONST after abd constification in r322233 Note that vdev_label_write_pad2() is FreeBSD specific. MFC after: 2 weeks X-MFC after: r322233	2017-08-08 11:07:34 +00:00

1 2 3 4 5 ...

1915 Commits