freebsd-dev

Author	SHA1	Message	Date
Andriy Gapon	a03fb1cf6a	mount_snapshot: consolidate all error handling This makes sure that the original vnode is always unlocked and released if any error happens. MFC after: 4 weeks	2016-05-16 06:30:25 +00:00
Andriy Gapon	3055925d42	zfsctl: fix several problems with reference counts * Remove excessive references on a snapshot mountpoint vnode. zfsctl_snapdir_lookup() called VN_HOLD() on a vnode returned from zfsctl_snapshot_mknode() and the latter also had a call to VN_HOLD() on the same vnode. On top of that gfs_dir_create() already returns the vnode with the use count of 1 (set in getnewvnode). So there was 3 references on the vnode. * mount_snapshot() should keep a reference to a covered vnode. That reference is owned by the mountpoint (mounted snapshot filesystem). * Remove cryptic manipulations of a covered vnode in zfs_umount(). FreeBSD dounmount() already does the right thing and releases the covered vnode. PR: 207464 Reported by: dustinwenz@ebureau.com Tested by: Howard Powell <hpowell@lighthouseinstruments.com> MFC after: 3 weeks	2016-05-16 06:24:04 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem on with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Enji Cooper	622282b3b8	Include arpa/inet.h to get the htonl(3) definition MFC after: 2 weeks Reported by: clang Sponsored by: EMC / Isilon Storage Division	2016-05-13 11:15:33 +00:00
Conrad Meyer	3b56262303	compat/opensolaris: Don't redefined off64_t if already defined A follow-up to r299456. Reported by: gjb Sponsored by: EMC / Isilon Storage Division	2016-05-11 16:05:32 +00:00
Alexander Motin	c59a902fa3	MFV r299453: 6765 zfs_zaccess_delete() comments do not accurately reflect delete permissions for ACLs Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Author: Kevin Crowe <kevin.crowe@nexenta.com> openzfs/openzfs@a40149b935	2016-05-11 13:53:29 +00:00
Alexander Motin	0eb65a5367	MFV r299451: 6764 zfs issues with inheritance flags during chmod(2) with aclmode=passthrough Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Author: Albert Lee <trisk@nexenta.com> openzfs/openzfs@1bcf0d240b	2016-05-11 13:50:34 +00:00
Alexander Motin	85a69dbf66	MFV r299449: 6763 aclinherit=restricted masks inherited permissions by group perms (groupmask) Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Author: Albert Lee <trisk@nexenta.com> openzfs/openzfs@eebb483d0c	2016-05-11 13:48:15 +00:00
Alexander Motin	2a219f349e	MFV r299442: 6762 POSIX write should imply DELETE_CHILD on directories - and some additional considerations Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Author: Kevin Crowe <kevin.crowe@nexenta.com> openzfs/openzfs@d316fffc9c	2016-05-11 13:43:20 +00:00
Alexander Motin	42a54f9745	MFV r299440: 6736 ZFS per-vdev ZAPs Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Don Brady <don.brady@intel.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Joe Stein <joe.stein@delphix.com> openzfs/openzfs@215198a6ad	2016-05-11 12:54:00 +00:00
Alexander Motin	7d54dbae83	MFV r299438: 6842 Fix empty xattr dir causing lockup Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Chunwei Chen <tuxoko@gmail.com> openzfs/openzfs@02525cd08f	2016-05-11 12:46:07 +00:00
Alexander Motin	d7ff478705	MFV r299436: 6843 Make xattr dir truncate and remove in one tx Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Chunwei Chen <tuxoko@gmail.com> openzfs/openzfs@399cc7d5d9	2016-05-11 12:43:54 +00:00
Alexander Motin	0b99ac761e	MFV r299434: 6841 Undirty freed spill blocks Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Tim Chase <tim@chase2k.com> openzfs/openzfs@445e67805d	2016-05-11 12:38:07 +00:00
Ruslan Bukin	d7dc6bae03	Implement FBT provider (MD part) for DTrace on MIPS. Tested on MIPS64. Sponsored by: DARPA, AFRL Sponsored by: HEIF5	2016-05-05 13:54:50 +00:00
Alan Somers	c9a807447d	Fix a use-after-free when "zpool import" fails clear vd->vdev_tsd in vdev_geom_close_locked instead of vdev_geom_detach. In the latter function, it would fail to happen in certain circumstances where cp->private was unset. Ideally, the latter should never happen, but it can happen when vdev open fails, or where spares are involved. MFC after: 4 weeks X-MFC-With: 298786 Sponsored by: Spectra Logic Corp	2016-04-29 21:29:37 +00:00
Andriy Gapon	27b6c49726	add invpcid instruction to i386 dtrace disassembler tables MFC after: 2 weeks	2016-04-29 15:45:22 +00:00
Alan Somers	663f649ff6	Refactor vdev_geom_attach and friends to reduce code duplication sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Move checks for provider's sectorsize and mediasize into a single location in vdev_geom_attach. Remove the zfs::vdev::taste class; it's ok to use the regular vdev class for tasting. Consolidate guid checks into a single location in vdev_attach_ok. Consolidate some error handling code from vdev_geom_attach into vdev_geom_detach, closing a resource leak of geom consumers in the process. Reviewed by: avg MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D5974	2016-04-29 15:23:51 +00:00
Mark Johnston	676a03fa6a	Increase DTRACE_FUNCNAMELEN from 128 to 192. This allows for the long function components encountered in www/firefox. This constant is part of DTrace's userland ABI, so this change may not be MFC'ed. PR: 207735	2016-04-25 18:44:11 +00:00
Mark Johnston	328d8adb9b	Allow DOF sections with excessively long probe function components. Without this change, DTrace will refuse to load a DOF section if the function component of any of its probes exceeds DTRACE_FUNCNAMELEN (128). Probes in C++ programs can have very long function components. Rather than rejecting all probes if a single probe exceeds the limit, simply skip the invalid probe and emit a warning. This ensures that valid probes are instantiated. PR: 207735 MFC after: 2 weeks	2016-04-25 18:40:57 +00:00
Mark Johnston	cd8bbc382d	Add a kern.dtrace.err_verbose sysctl to control dtrace_err_verbose. When this flag is turned on, DOF and DIF validation errors are printed to the kernel message buffer. This is useful for debugging. Also remove the debug.dtrace.debug sysctl, which has no effect.	2016-04-25 18:09:36 +00:00
Andriy Gapon	2d69831b85	lahf/sahf are supported on some amd64 processors While the instructions were not included into the original instruction set, their support can be indicated by a special feature bit. For example: CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU) ... AMD Features2=0x37ff<LAHF, ...> Clang 3.8 uses lahf/sahf as a faster alternative to pushf/popf where possible. MFC after: 2 weeks	2016-04-22 13:44:12 +00:00
Andriy Gapon	dbbcddb426	MFV r298471: 6052 decouple lzc_create() from the implementation details illumos/illumos-gate@26455f9efc `26455f9efc` https://www.illumos.org/issues/6052 At the moment type parameter of lzc_create() is of dmu_objset_type_t type. That exposes an implementation detail and requires sys/fs/zfs.h to be included in libzfs_core.h creating unnecessary coupling between libzfs_core interface and ZFS internals. I think that dmu_objset_type_t should be replaced with a libzfs_core enumeration of supported dataset types. For ABI reasons the new enumeration could be bit-compatible with dmu_objset_type_t. For example: typedef enum { LZC_DST_ZFS = 2, LZC_DST_ZVOL } lzc_dataset_type_t; Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Andriy Gapon <andriy.gapon@clusterhq.com> MFC after: 2 weeks Sponsored by: ClusterHQ	2016-04-22 13:00:27 +00:00
Mark Johnston	6c2806594b	Make the second argument of dtrace_invop() a trapframe pointer. Currently this argument is a pointer into the stack which is used by FBT to fetch the first five probe arguments. On all non-x86 architectures it's simply the trapframe address, so this change has no functional impact. On amd64 it's a pointer into the trapframe such that stack[1 .. 5] gives the first five argument registers, which are deliberately grouped together in the amd64 trapframe definition. A trapframe argument simplifies the invop handlers on !x86 and makes the x86 FBT invop handler easier to understand. Moreover, it allows for invop handlers that may want to modify the register set of the interrupted thread.	2016-04-17 23:08:47 +00:00
Andriy Gapon	e01dd79f9a	zfs_rezget: z_vnode can not be NULL if zp is valid MFC after: 3 weeks	2016-04-16 07:41:56 +00:00
Andriy Gapon	c2d36fc5cd	zfs: enable vn_io_fault support Note that now we have to account for possible partial writes in dmu_write_uio_dbuf(). It seems that on illumos either all or none of the data are expected to be written. But the partial writes are quite expected when vn_io_fault support is enabled. Reviewed by: kib MFC after: 7 weeks Differential Revision: https://reviews.freebsd.org/D2790	2016-04-16 07:35:53 +00:00
Alan Somers	739f4ae3b1	Don't corrupt ZFS label's physpath attribute when booting while a disk is missing Prior to this change, vdev_geom_open_by_path would call vdev_geom_attach prior to verifying the device's GUIDs. vdev_geom_attach calls vdev_geom_attrchange to set the physpath in the vdev object. The result is that if the disk could not be found, then the labels for other disks in the same TLD would overwrite the missing disk's physpath with the physpath of whichever disk currently has the same devname as the missing one used to have. MFC after: 4 weeks Sponsored by: Spectra Logic Corp	2016-04-15 16:36:17 +00:00
Alan Somers	c29088b5c7	Add more debugging statements in vdev_geom.c Log a debugging message whenever geom functions fail in vdev_geom_attach. Printing these messages is controlled by vfs.zfs.debug MFC after: 4 weeks Sponsored by: Spectra Logic Corp	2016-04-14 23:14:41 +00:00
Alan Somers	f0ac053088	Update a debugging message in vdev_geom_open_by_guids for consistency with similar messages elsewhere in the file. MFC after: 4 weeks Sponsored by: Spectra Logic Corp	2016-04-14 19:20:31 +00:00
Alan Somers	4e3ab010a2	Fix rare double free in vdev_geom_attrchanged sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Don't drop the g_topology_lock before freeing old_physpath. That opens up a race where one thread can call vdev_geom_attrchanged, set old_physpath, drop the g_topology_lock, then block trying to acquire the SCL_STATE lock. Then another thread can come into vdev_geom_attrchanged, set old_physpath to the same value, and proceed to free it. When the first thread resumes, it will free the same location. It turns out that the SCL_STATE lock isn't needed. It was originally added by gibbs to protect vd->vdev_physpath while updating the same. However, the update process subsequently was switched to an atomic operation (a pointer swap). Now, there is no need for the SCL_STATE lock, and hence no need to drop the g_topology_lock. Reviewed by: delphij MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D5413	2016-04-12 19:11:14 +00:00
Andriy Gapon	c3249989ef	l2arc: make sure that all writes honor ashift of a cache device Previously uncompressed buffers did not obey that rule. Type of b_asize is changed to uint64_t for consistency, given that this is a zeta-byte filesystem. l2arc_compress_buf is renamed to l2arc_transform_buf to better reflect its new utility. Now not only we ensure that a compressed buffer has a size aligned to ashift, but we also allocate a properly sized temporary buffer if the original buffer is not compressed and it has an odd size. This ensures that all I/O to the cache device is always ashift-aligned, in terms of both a request offset and a request size. If the aligned data is larger than the original data, then we have to use a temporary buffer when reading it as well. Also, enhance physical zio alignment checks using vdev_logical_ashift. On FreeBSD we have this information, so we can make stricter assertions. Reviewed by: smh, mav MFC after: 1 month Sponsored by: ClusterHQ Differential Revision: https://reviews.freebsd.org/D2789	2016-04-12 06:56:35 +00:00
Andriy Gapon	6a50036052	Revert r297396 Modify "4958 zdb trips assert on pools with ashift >= 0xe" A better fix is following.	2016-04-12 06:54:18 +00:00
Alexander Motin	78aec5c610	MFV r297831: 6322 ZFS indirect block predictive prefetch Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Author: Alexander Motin <mav@FreeBSD.org> Improve speculative prefetch of indirect blocks. Scalability of many operations on wide ZFS pool can be limited by requirement to prefetch indirect blocks first. Recently added asynchronous indirect block read partially helped, but did not solve the problem completely. This patch extends existing prefetcher functionality to explicitly work with indirect blocks. Before this change prefetcher issued reads for up to 8MB of data in advance. With this change it also issues indirect block reads for up to 64MB of data in advance, so that when it will be time to actually read those data, it can be done immediately. Alike effect can be achieved by just increasing maximal data prefetch distance, but at higher memory cost. Also this change introduces indirect block prefetch for rewrite operations, that was never done before. Previously ARC miss for Indirect blocks regularly blocked rewrites, converting perfectly aligned asynchronous operations into synchronous read-write pairs, significantly reducing maximal rewrite speed. While being there this issue was also fixed: - prefetch was done always, even if caching for the dataset was completely disabled. Testing on FreeBSD with zvol on top of 6x striped 2x mirrored pool of 12 assorted HDDs shown me such performance numbers: ------- BEFORE -------- Write 491363677 bytes/sec Read 312430631 bytes/sec Rewrite 97680464 bytes/sec -------- AFTER -------- Write 493524146 bytes/sec Read 438598079 bytes/sec Rewrite 277506044 bytes/sec Closes #65 Closes #80 openzfs/openzfs@792fd28ac0	2016-04-11 21:09:15 +00:00
Steven Hartland	2dcee04b3a	Only include sysctl in kernel build Only include sysctl in kernel builds fixing warning about implicit declaration of function 'sysctl_handle_int'. PR: 204140 MFC after: 1 week X-MFC-With: r297813 Sponsored by: Multiplay	2016-04-11 13:17:11 +00:00
Steven Hartland	7bc47b4ea3	Only include sysctl in kernel build Only include sysctl in kernel builds fixing warning about implicit declaration of function 'sysctl_handle_int'. Sponsored by: Multiplay	2016-04-11 08:57:54 +00:00
Andriy Gapon	1da2e1e353	zio: align use of "no dump" flag between use_uma and !use_uma cases At the moment no ZFS buffers are included into a crash dump unless ZFS_DEBUG (or INVARIANTS) kernel option is enabled. That's not very helpful for debugging of ZFS problems, because important information often resides in metadata buffers. This change switches the dumping behavior when UMA is used from the illumos behavior to a more useful behavior that we have on FreeBSD when ZFS buffers are allocated via malloc. Reviewed by: smh, mav MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D5892	2016-04-11 07:11:20 +00:00
Mark Johnston	b529028676	Implement support for boot-time DTrace. This allows one to enable DTrace probes relatively early during boot, during SI_SUB_DTRACE_ANON, before dtrace(1) can invoked. The desired enabling is created using dtrace -A, which writes a /boot/dtrace.dof file and uses nextboot(8) to ensure that DTrace kernel modules are loaded and that the DOF file describing the enabling is loaded by loader(8) during the subsequent boot. The trace output can then be fetched with dtrace -a. With this commit, boot-time DTrace is only functional on i386 and amd64: on other architectures, the high-resolution timer frequency is initialized during SI_SUB_CLOCKS and is thus not available when the anonymous tracing state is initialized. On x86, the TSC is used and is thus available earlier. MFC after: 1 month Relnotes: yes	2016-04-10 01:25:48 +00:00
Mark Johnston	33b454938a	Initialize SDT probes during SI_SUB_DTRACE_PROVIDER. This is consistent with all other DTrace providers and ensures that SDT probes are available for boot-time tracing. MFC after: 2 weeks	2016-04-10 01:24:27 +00:00
Mark Johnston	e1e33ff912	Initialize DTrace hrtimer frequency during SI_SUB_CPU on i386 and amd64. This allows the hrtimer to be used earlier during boot. This is required for boot-time DTrace: anonymous enablings are created during SI_SUB_DTRACE_ANON, which runs before APs are started. In particular, the DTrace deadman timer requires that the hrtimer be functional. MFC after: 2 weeks	2016-04-10 01:23:39 +00:00
Alexander Motin	eaee150e3f	MFV r297760: 6418 zpool should have a label clearing command Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Author: Will Andrews <will@firepipe.net> Closes #83 Closes #32 openzfs/openzfs@9663688425 FreeBSD already had `zpool labelclear` functionality, so this is mostly just a diff reduction. MFC after: 1 month	2016-04-09 20:30:50 +00:00
Andriy Gapon	c8ff459286	zio write issue threads should have lower (numerically greater) priority This is because they might do data compression which is quite CPU expensive. The original code is correct for illumos, because there a higher priority corresponds to a greater number. MFC after: 2 weeks	2016-04-08 11:58:24 +00:00
Alexander Motin	309b1c7ade	Alike to r293708 relax pool check in vdev_geom_open_by_path(). This made impossible spare disk open by known path, which kind of worked only because the same fix was applied to vdev_geom_attach_by_guids() in r293708. MFC after: 1 week	2016-04-07 12:54:44 +00:00
Edward Tomasz Napierala	ae34b6ff96	Add four new RCTL resources - readbps, readiops, writebps and writeiops, for limiting disk (actually filesystem) IO. Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds. Testing - and review of the code, in particular the VFS and VM parts - is very welcome. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5080	2016-04-07 04:23:25 +00:00
Wojciech Macek	1c7c13aa0e	Implement dtrace_getupcstack in ARM64 Allow using DTRACE for performance analysis of userspace applications - the function call stack can be captured. This is almost an exact copy of AMD64 solution. Obtained from: Semihalf Sponsored by: Cavium Reviewed by: emaste, gnn, jhibbits Differential Revision: https://reviews.freebsd.org/D5779	2016-04-06 05:13:36 +00:00
Andriy Gapon	e881d8757c	remove emulation of VFS_HOLD and VFS_RELE from opensolaris compat On FreeBSD VFS_HOLD/VN_RELE were mapped to MNT_REF/MNT_REL that manipulate mnt_ref. But the job of properly maintaining the reference count is already automatically performed by insmntque(9) and delmntque(9). So, in effect all ZFS vnodes referenced the corresponding mountpoint twice. That was completely harmless, but we want to be very explicit about what FreeBSD VFS APIs are used, because illumos VFS_HOLD and FreeBSD MNT_REF provide quite different guarantees with respect to the held vfs_t / mountpoint. On illumos VFS_HOLD is sufficient to guarantee that vfs_t.vfs_data stays valid. On the other hand, on FreeBSD MNT_REF does not provide the same guarantee about mnt_data. We have to use vfs_busy() to get that guarantee. Thus, the calls to VFS_HOLD/VFS_RELE on vnode init and fini are removed. VFS_HOLD calls are replaced with vfs_busy in the ioctl handlers. And because vfs_busy has a richer interface that can not be dumbed down in all cases it's better to explicitly use it rather than trying to mask it behind VFS_HOLD. This change fixes a panic that could result from a race between zfs_umount() and zfs_ioc_rollback(). We observed a case where zfsvfs_free() tried to destroy data that zfsvfs_teardown() was still using. That happened because there was nothing to prevent unmounting of a ZFS filesystem that was in between zfs_suspend_fs() and zfs_resume_fs(). Reviewed by: kib, smh MFC after: 3 weeks Sponsored by: ClusterHQ Differential Revision: https://reviews.freebsd.org/D2794	2016-04-02 16:25:46 +00:00
Alexander Motin	baf8ceac9e	MFV r297506: 6738 zfs send stream padding needs documentation Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Eli Rosenthal <eli.rosenthal@delphix.com> illumos/illumos-gate@c20404ff77	2016-04-02 08:36:24 +00:00
Alexander Motin	4cf6fde5e8	MFV r297504: 6681 zfs list burning lots of time in dodefault() via dsl_prop_* Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Alex Wilson <alex.wilson@joyent.com> illumos/illumos-gate@d09e4475f6	2016-04-02 08:28:46 +00:00
Gleb Smirnoff	a8b2b39cce	Fix an error in r292373. Use proper count to update "pages in" counter. Noticed by: pfg via Coverity	2016-03-31 21:15:00 +00:00
Alexander Motin	30a0e024ee	Plug open count leak on zvol rename. MFC after: 2 weeks	2016-03-30 16:54:18 +00:00
Alexander Motin	b39dea9308	Switch from using make_dev_p() to make_dev_s() to close races.	2016-03-30 16:48:57 +00:00
Alexander Motin	86b0daa373	Modify "4958 zdb trips assert on pools with ashift >= 0xe". Unlike Illumos FreeBSD has concept of logical ashift, that specifies really minimal vdev block size that can be accessed. This knowledge allows properly pad physical I/O and correctly assert its alignment. This change fixes L2ARC write errors when device has logical sector size above 512 bytes. MFC after: 1 month	2016-03-29 19:18:34 +00:00
Alexander Motin	78b127f2fc	Pass through error code from make_dev_p(). ENAMETOOLONG is much more informative in logs then ENXIO. MFC after: 1 week	2016-03-28 08:12:29 +00:00
Alexander Motin	52c9b0b539	Unify ignoring EEXIST from zvol_create_minor(). This fixes creation of zvol devices for snapshots during zfs receive, that previously failed with "ZFS WARNING: Unable to create ZVOL" message. This solution is not perfect, but IMHO better then it was before. MFC after: 2 weeks	2016-03-24 10:10:41 +00:00
Mark Johnston	48cc2d5e22	Remove unused variables dtrace_in_probe and dtrace_in_probe_addr.	2016-03-17 18:55:54 +00:00
Alexander Motin	fa9de58388	Fix small memory leak on attempt to access deleted snapshot. MFC after: 3 days	2016-03-15 21:21:28 +00:00
Alexander Motin	5db0866658	Make ZFS ignore stripe sizes above SPA_MAXASHIFT (8KB). If device has stripe size bigger then maximal sector size supported by ZFS, there is nothing can be done to avoid read-modify-write cycles. Taking that stripe size into account will only reduce space efficiency and pointlessly bother user with warnings that can not be fixed. Discussed with: smh	2016-03-10 16:39:46 +00:00
Alexander Motin	eef192d85c	Make ZFS more picky to GEOM stripe sizes and offsets. Use of misaligned or non-power-of-2 stripes is not really useful for ZFS, since increased ashift won't help to avoid read-modify-write cycles, and only reduce pool space efficiency and compression rates.	2016-03-10 14:18:14 +00:00
Alexander Motin	a151f3a7ef	MFV r296609: 6370 ZFS send fails to transmit some holes Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Stefan Ring <stefanrin@gmail.com> Reviewed by: Steven Burgess <sburgess@datto.com> Reviewed by: Arne Jansen <sensille@gmx.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com> In certain circumstances, "zfs send -i" (incremental send) can produce a stream which will result in incorrect sparse file contents on the target. The problem manifests as regions of the received file that should be sparse (and read a zero-filled) actually contain data from a file that was deleted (and which happened to share this file's object ID). Note: this can happen only with filesystems (not zvols, because they do not free (and thus can not reuse) object IDs). Note: This can happen only if, since the incremental source (FromSnap), a file was deleted and then another file was created, and the new file is sparse (i.e. has areas that were never written to and should be implicitly zero-filled). We suspect that this was introduced by 4370 (applies only if hole_birth feature is enabled), and made worse by 5243 (applies if hole_birth feature is disabled, and we never send any holes). The bug is caused by the hole birth feature. When an object is deleted and replaced, all the holes in the object have birth time zero. However, zfs send cannot tell that the holes are new since the file was replaced, so it doesn't send them in an incremental. As a result, you can end up with invalid data when you receive incremental send streams. As a short-term fix, we can always send holes with birth time 0 (unless it's a zvol or a dataset where we can guarantee that no objects have been reused). Closes #37 openzfs/openzfs@adef853162	2016-03-10 09:01:19 +00:00
Alexander Motin	7370229e8d	Add new IOCTL compat shims for ABI breakage caused by r296510: MFV r296505: 6531 Provide mechanism to artificially limit disk performance	2016-03-09 11:16:15 +00:00
Alexander Motin	8d0e2eb06b	MFV r296529: 6672 arc_reclaim_thread() should use gethrtime() instead of ddi_get_lbolt() 6673 want a macro to convert seconds to nanoseconds and vice-versa Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Eli Rosenthal <eli.rosenthal@delphix.com> illumos/illumos-gate@a8f6344fa0	2016-03-08 18:28:24 +00:00
Alexander Motin	468bca03ef	MFV r296527: 6659 nvlist_free(NULL) is a no-op Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Marcel Telka <marcel@telka.sk> Approved by: Robert Mustacchi <rm@joyent.com> Author: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> illumos/illumos-gate@aab83bb83b	2016-03-08 18:11:38 +00:00
Alexander Motin	26802705d3	MFV r296522: 6541 Pool feature-flag check defeated if "verify" is included in the dedup property value Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: ilovezfs <ilovezfs@icloud.com> illumos/illumos-gate@971640e6aa	2016-03-08 17:58:02 +00:00
Alexander Motin	178f2c2b8e	MFV r296520: 6562 Refquota on receive doesn't account for overage Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Dan McDonald <danmcd@omniti.com> illumos/illumos-gate@5f7a8e6d75	2016-03-08 17:53:42 +00:00
Alexander Motin	7a90077752	MFV r296518: 5027 zfs large block support (add copyright) Author: Matthew Ahrens <matt@mahrens.org> illumos/illumos-gate@c3d26abc9e	2016-03-08 17:51:09 +00:00
Alexander Motin	c892984b84	MFV r296515: 6536 zfs send: want a way to disable setting of DRR_FLAG_FREERECORDS Reviewed by: Anil Vijarnia <avijarnia@racktopsystems.com> Reviewed by: Kim Shrier <kshrier@racktopsystems.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andrew Stormont <astormont@racktopsystems.com> illumos/illumos-gate@880094b606	2016-03-08 17:43:21 +00:00
Alexander Motin	253159febf	MFV r296513: 6450 scrub/resilver unnecessarily traverses snapshots created after the scrub started Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@38d6103674	2016-03-08 17:34:58 +00:00
Alexander Motin	4427252c14	MFV r296511: 6537 Panic on zpool scrub with DEBUG kernel Reviewed by: Steve Gonczi <gonczi@comcast.net> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Gary Mills <gary_mills@fastmail.fm> illumos/illumos-gate@8c04a1fa3f	2016-03-08 17:32:24 +00:00
Alexander Motin	1b63fd68f4	MFV r296505: 6531 Provide mechanism to artificially limit disk performance Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Prakash Surya <prakash.surya@delphix.com> illumos/illumos-gate@97e8130957	2016-03-08 17:27:13 +00:00
Mark Johnston	9610c89750	Fix a couple of silly mistakes in r291962. - Handle the case where no DOF helper is provided. This occurs with the currently-unused DTRACEHIOC_ADD ioctl. - Fix some checks that prevented the loading DOF in the (non-default) lazyload mode.	2016-03-08 00:46:03 +00:00
Mark Johnston	380344a7af	Fix fasttrap tracepoint locking. Upstream, tracepoints are protected by per-CPU mutexes. An unlinked tracepoint may be freed once all the tracepoint mutexes have been acquired and released - this is done in fasttrap_mod_barrier(). This mechanism was not properly ported: in some places, the proc lock is used in place of a tracepoint lock, and in others the locking is omitted entirely. This change implements tracepoint locking with an rmlock, where the read lock is used in fasttrap probe context. As a side effect, this fixes a recursion on the proc lock when the raise action is used from a userland probe. MFC after: 1 month	2016-03-08 00:43:03 +00:00
Mark Johnston	6b1bddce00	Remove the fasttrap implementation for sparc. Other machine-dependent code required for DTrace on sparc is not present in the tree, so there's no point to keeping the fasttrap code.	2016-03-08 00:18:46 +00:00
Mark Johnston	acaa855f6e	MFV r296306: 6604 harden DIF bounds checking Reviewed by: Alex Wilson <alex.wilson@joyent.com> Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Bryan Cantrill <bryan@joyent.com> illumos/illumos-gate@1c0cef67db MFC after: 2 weeks	2016-03-08 00:14:14 +00:00
Steven Hartland	e283644b87	Removed unused label and fix mutex_exit order Remove unused done label from zfs_setacl fixing PVS-Studio V729. Fix mutex_exit order to mirror the mutex_enter order. MFC after: 1 week Sponsored by: Multiplay	2016-02-25 03:01:24 +00:00
Svatopluk Kraus	35a0bc1260	As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no need to include it explicitly when <vm/vm_param.h> is already included. Suggested by: alc Reviewed by: alc Differential Revision: https://reviews.freebsd.org/D5379	2016-02-22 09:08:04 +00:00
Warner Losh	c55f57071a	Create an API to reset a struct bio (g_reset_bio). This is mandatory for all struct bio you get back from g_{new,alloc}_bio. Temporary bios that you create on the stack or elsewhere should use this before first use of the bio, and between uses of the bio. At the moment, it is nothing more than a wrapper around bzero, but that may change in the future. The wrapper also removes one place where we encode the size of struct bio in the KBI.	2016-02-17 17:16:02 +00:00
Michal Meloun	7a308c64b4	ARM: Rename remaining ARMv4 specific function in DTrace code. I missed it in r295319. Pointed by: tuexen	2016-02-06 11:16:15 +00:00
Andriy Gapon	984777c43f	MFV r294821: 6529 Properly handle updates of variably-sized SA entries. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Tim Chase <tim@chase2k.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Andriy Gapon <avg@icyb.net.ua> illumos/illumos-gate@e7e978b1f7 During the update process in sa_modify_attrs(), the sizes of existing variably-sized SA entries are obtained from sa_lengths[]. The case where a variably-sized SA was being replaced neglected to increment the index into sa_lengths[], so subsequent variable-length SAs would be rewritten with the wrong length. This patch adds the missing increment operation so all variably-sized SA entries are stored with their correct lengths. Another problem was that index into attr_desc[] was increased even when an attribute was removed. If that attribute was not the last attribute, then the last attribute was lost.	2016-02-01 15:40:40 +00:00
Alan Somers	d4b9233a96	Add a sysctl to allow ZFS pools backed by zvols Change 294329 removed the ability to build ZFS pools that are backed by zvols, because having that ability (even if it's not used) leads to deadlocks. By popular demand, I'm adding an off-by-default sysctl to reenable that ability. Reviewed by: lidl, delphij MFC after: Never Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D4998	2016-01-29 17:08:26 +00:00
Ruslan Bukin	28029b68c0	Welcome the RISC-V 64-bit kernel. This is the final step required allowing to compile and to run RISC-V kernel and userland from HEAD. RISC-V is a completely open ISA that is freely available to academia and industry. Thanks to all the people involved! Special thanks to Andrew Turner, David Chisnall, Ed Maste, Konstantin Belousov, John Baldwin and Arun Thomas for their help. Thanks to Robert Watson for organizing this project. This project sponsored by UK Higher Education Innovation Fund (HEIF5) and DARPA CTSRD project at the University of Cambridge Computer Laboratory. FreeBSD/RISC-V project home: https://wiki.freebsd.org/riscv Reviewed by: andrew, emaste, kib Relnotes: Yes Sponsored by: DARPA, AFRL Sponsored by: HEIF5 Differential Revision: https://reviews.freebsd.org/D4982	2016-01-29 15:12:31 +00:00
Alexander Motin	5a97f48082	MFV r294819: 6495 Fix mutex leak in dmu_objset_find_dp Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Albert Lee <trisk@omniti.com> Author: Steven Hartland <steven.hartland@multiplay.co.uk> illumos/illumos-gate@2bad22584d	2016-01-26 13:45:41 +00:00
Alexander Motin	1cb4625f18	MFV r294816: 4986 receiving replication stream fails if any snapshot exceeds refquota Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gordon.ross@nexenta.com> Author: Dan McDonald <danmcd@omniti.com> illumos/illumos-gate@5878fad70d	2016-01-26 13:37:30 +00:00
Alexander Motin	75b810aee6	MFV r294814: 6393 zfs receive a full send as a clone Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Paul Dagnelie <pcd@delphix.com> illumos/illumos-gate@68ecb2ec93 This allows to do a full (non-incremental send) and receive it as a clone of an existing dataset. It can leverage nopwrite to share blocks with the origin. This can be used to change the relationship of datasets on the target. For example, maybe on the source you have: A ---- B ---- C And you have sent to the target a full of B, and the incremental B->C: B ---- C You later realize that you want to have A on the target. You will have to do a full send of A, but nopwrite can save you space on the target if you receive it as a clone of B, assuming that A and B have some blocks inxi common: B ---- C \ A	2016-01-26 13:14:39 +00:00
Alexander Motin	d2385b31f5	MFV r294812: 6434 sa_find_sizes() may compute wrong SA header size Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Andriy Gapon <avg@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: James Pan <jiaming.pan@yahoo.com> illumos/illumos-gate@3502ed6e7c	2016-01-26 13:03:01 +00:00
Alexander Motin	49b7f6ef02	MFV r294810: 6414 vdev_config_sync could be simpler Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Will Andrews <will@firepipe.net> illumos/illumos-gate@eb5bb58421	2016-01-26 12:58:58 +00:00
Alexander Motin	70c71b4722	MFV r294808: 6421 Add missing multilist_destroy calls to arc_fini Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Jorgen Lundman <lundman@lundman.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com> illumos/illumos-gate@57deb23282	2016-01-26 12:54:03 +00:00
Alexander Motin	6c941579b9	MFV r294806: 6388 Failure of userland copy should return EFAULT Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Richard Yao <ryao@gentoo.org> illumos/illumos-gate@c71c00bbe8	2016-01-26 12:52:16 +00:00
Alexander Motin	8ad8374efe	MFV r294804: 6386 Fix function call with uninitialized value in vdev_inuse Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Richard Yao <ryao@gentoo.org> illumos/illumos-gate@5bdd995ddb	2016-01-26 12:50:14 +00:00
Alexander Motin	02404a5ad2	MFV r294802: 6334 Cannot unlink files when over quota Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Simon Klinkert <simon.klinkert@gmail.com> illumos/illumos-gate@6575bca013	2016-01-26 12:48:10 +00:00
Alexander Motin	2360c716f9	MFV r294800: 6385 Fix unlocking order in zfs_zget Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Andriy Gapon <avg@freebsd.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Richard Yao <ryao@gentoo.org> illumos/illumos-gate@eaef6a96de	2016-01-26 12:44:49 +00:00
Alexander Motin	81754f9788	MFV r294798: 6292 exporting a pool while an async destroy is running can leave entries in the deferred tree Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Fabian Keil <fk@fabiankeil.de> Approved by: Gordon Ross <gordon.ross@nexenta.com> illumos/illumos-gate@a443cc80c7	2016-01-26 12:37:23 +00:00
Alexander Motin	82abccb272	MFV r294796: 6319 assertion failed in zio_ddt_write: bp->blk_birth == txg Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> illumos/illumos-gate@b39b744be7 This is revert of 5693.	2016-01-26 12:33:58 +00:00
Alexander Motin	27a8d05bd7	MFV r294793: 6367 spa_config_tryenter incorrectly handles the multiple-lock case Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Prashanth Sreenivasa <prashksp@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk> Approved by: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@e495b6e673	2016-01-26 12:28:53 +00:00
Edward Tomasz Napierala	aa9b057c08	Fix ru_oublocks accounting for ZFS. There are two code paths that can be called from zfs_write() - one of them, through dmu_write(), was handled correctly; the other wasn't. Reviewed by: avg@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D4923	2016-01-23 12:13:09 +00:00
Alan Somers	34a484f353	Quell harmless CID about unchecked return value in nvlist_get_guids. The return value doesn't need to be checked, because nvlist_get_guid's callers check the returned values of the guids. Coverity CID: 1341869 MFC after: 1 week X-MFC-With: 292066 Sponsored by: Spectra Logic Corp	2016-01-19 23:16:24 +00:00
Alan Somers	f7b60097b5	Disallow zvol-backed ZFS pools Using zvols as backing devices for ZFS pools is fraught with panics and deadlocks. For example, attempting to online a missing device in the presence of a zvol can cause a panic when vdev_geom tastes the zvol. Better to completely disable vdev_geom from ever opening a zvol. The solution relies on setting a thread-local variable during vdev_geom_open, and returning EOPNOTSUPP during zvol_open if that thread-local variable is set. Remove the check for MUTEX_HELD(&zfsdev_state_lock) in zvol_open. Its intent was to prevent a recursive mutex acquisition panic. However, the new check for the thread-local variable also fixes that problem. Also, fix a panic in vdev_geom_taste_orphan. For an unknown reason, this function was set to panic. But it can occur that a device disappears during tasting, and it causes no problems to ignore this departure. Reviewed by: delphij MFC after: 1 week Relnotes: yes Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D4986	2016-01-19 17:00:25 +00:00
Dimitry Andric	9516209bf2	MFV r294101: 6527 Possible access beyond end of string in zpool comment Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Gordon Ross <gwr@nexenta.com> illumos/illumos-gate@2bd7a8d078 This fixes erroneous double increments of the 'check' variable in a loop in spa_prop_validate(). I ran into this in the clang380-import branch, where clang 3.8.0 warns about it. (It is already fixed there.) MFC after: 3 days	2016-01-15 21:45:53 +00:00
Alan Somers	cbedc01c9a	Fix race condition involving ZFS remove events When a ZFS drive disappears, ZFS sends a resource.fs.zfs.removed event to userland. A userland program like zfsd(8) can use that event, for example to activate a hotspare. The current code contains a race condition: vdev_geom will sent the sysevent _before_ spa.c would update the vdev's status, causing userland processes to see pool state that does not reflect the device removal. This change moves the sysevent to spa.c, closing the race. Reviewed by: delphij, Sean Eric Fagan MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D4902	2016-01-14 18:19:05 +00:00
Alan Somers	53f6862723	Fix importing l2arc device by guid After r292066, vdev_geom verifies both the vdev and pool guids of device labels during open. However, spare and l2arc devices don't have pool guids, so opening them by guid will fail (opening by path, when the pathname is known, still succeeds). This change allows a vdev to be opened by guid if the label contains no pool_guid, which is the case for inactive spares and l2arc devices. PR: 292066 Reported by: delphij Reviewed by: delphij, smh MFC after: 2 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D4861	2016-01-11 22:15:46 +00:00
Alan Somers	4e7787a9e9	Record physical path information in ZFS Vdevs sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: If available, record the physical path of a vdev in ZFS meta-data. Do this both when opening the vdev, and when receiving an attribute change notification from GEOM. Make vdev_geom_close() synchronous instead of deferring its work to a GEOM event handler. There is no benefit to deferring the work and this prevents a future open call from referencing a consumer that is scheduled for destruction. The close followed by an immediate open will occur during a vdev reprobe triggered by any type of I/O error. Consolidate vdev_geom_close() and vdev_geom_detach() into vdev_geom_close() and vdev_geom_close_locked(). This also moves the cross linking operations between vdev and GEOM consumer into a single place (linking in vdev_geom_attach() and unlinking in vdev_geom_close_locked()). Submitted by: gibbs, asomers MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D4524	2016-01-11 17:57:26 +00:00
Steven Hartland	9697b154f2	Fix const conversion warning in lz4_decompress Fix const conversion warning in lz4_decompress which shows when warnings are enabled (to be done later). MFC after: 2 weeks X-MFC-With: r293268 Sponsored by: Multiplay	2016-01-06 20:28:09 +00:00
Allan Jude	7a3f5d11fb	Replace sys/crypto/sha2/sha2.c with lib/libmd/sha512c.c cperciva's libmd implementation is 5-30% faster The same was done for SHA256 previously in r263218 cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation Extend sbin/md5 to create sha384(1) Chase dependancies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h} Reviewed by: cperciva, des, delphij Approved by: secteam, bapt (mentor) MFC after: 2 weeks Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3929	2015-12-27 17:33:59 +00:00
Andrew Turner	06ef48781d	Be stricter on which functions we can probe with FBT. We now only check the first instruction to see if it's either a pushm with lr, or a sub with sp. The former is the common case, with the latter used with va_args. This removes 12 probes. These are all hand-written assembly, with a few C functions with no stack usage. Submitted by: Howard Su <howard0su@gmail.com> Differential Revision: https://reviews.freebsd.org/D4419	2015-12-23 17:54:19 +00:00
Mark Johnston	8ff6d9dd22	Support an arbitrary number of arguments to DTrace syscall probes. Rather than pushing all eight possible arguments into dtrace_probe()'s stack frame, make the syscall_args struct for the current syscall available via the current thread. Using a custom getargval method for the systrace provider, this allows any syscall argument to be fetched, even in kernels that have modified the maximum number of system call arguments. Sponsored by: EMC / Isilon Storage Division	2015-12-17 00:00:27 +00:00
Gleb Smirnoff	f17f88d3e0	Fix breakage caused by r292373 in ZFS/FUSE/NFS/SMBFS. With the new VOP_GETPAGES() KPI the "count" argument counts pages already, and doesn't need to be translated from bytes to pages. While here make it consistent that rbehind and rahead are updated only if we doesn't return error. Pointy hat to: glebius	2015-12-16 23:48:50 +00:00
Mark Johnston	feea513564	Remove the unused systrace device file and fix style bugs. MFC after: 1 week	2015-12-16 23:46:27 +00:00
Gleb Smirnoff	b0cd20172d	A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceeded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strenghtening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-12-16 21:30:45 +00:00
Alan Somers	670ffd5e5c	Change an important error message from ZFS_LOG to printf Submitted by: gibbs MFC after: 4 weeks Sponsored by: Spectra Logic Corp	2015-12-11 00:04:13 +00:00
Alan Somers	62ac7dd2bf	During vdev_geom_open, require that the vdev guids match the device's label except during split, add, or create operations. This fixes a bug where the wrong disk could be returned, and higher layers of ZFS would immediately eject it again. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: o When opening by GUID, require both the pool and vdev GUIDs to match. While it is highly unlikely for two vdevs to have the same vdev GUIDs, the ZFS storage pool allocator only guarantees they are unique within a pool. o Modify the open behavior to: - If we are opening a vdev that hasn't previously been opened, open by path without checking GUIDs. - Otherwise, open by path and verify GUIDs. - If that fails, search all geom providers for a device with matching GUIDs. - If that fails, return ENOENT. Submitted by: gibbs, asomers Reviewed by: smh MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D4486	2015-12-10 21:46:21 +00:00
Mark Johnston	1639290749	MFV r289003: 6271 dtrace caused excessive fork time Author: Bryan Cantrill <bryan@joyent.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Gordon Ross <gwr@nexenta.com> illumos/illumos-gate@7bd3c1d12d	2015-12-07 21:49:32 +00:00
Mark Johnston	6e0f204c3f	Modify DTRACEHIOC_ADDDOF to copy the DOF section from the target process. r281257 added support for lazyload mode by allowing dtrace(1) to register a DOF section on behalf of a traced process. This was implemented by having libdtrace copy the DOF section into a heap-allocated buffer and passing its address to the ioctl handler. However, DTrace uses the DOF section address as a lookup key in certain cases, so the ioctl handler should be given the target process' DOF section address instead. This change modifies the ADDDOF handler to copy the DOF section in from the target process, rather than from dtrace(1).	2015-12-07 21:44:05 +00:00
Mark Johnston	711fbd17ec	Add helper functions proc_readmem() and proc_writemem(). These helper functions can be used to read in or write a buffer from or to an arbitrary process' address space. Without them, this can only be done using proc_rwmem(), which requires the caller to fill out a uio. This is onerous and results in code duplication; the new functions provide a simpler interface which is sufficient for most existing callers of proc_rwmem(). This change also adds a manual page for proc_rwmem() and the new functions. Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D4245	2015-12-07 21:33:15 +00:00
Andrew Turner	5cde34a0ff	Allow the artificial profile frames to be adjusted as needed by the user. While here update for armv6 to a tested value. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: stat Differential Revision: https://reviews.freebsd.org/D4315	2015-12-05 10:00:01 +00:00
Andrew Turner	c218815337	Move the check to see if we are tracing a function with the DTrace Function Boundary Trace to assembly to reduce the overhead of these checks. Submitted by: Howard Su <howard0su@gmail.com> Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D4266	2015-12-05 09:32:36 +00:00
Stanislav Sedov	314eeef290	Make the number of fasttrap probes and the size of the trace points hash table tunable via sysctl or kernel tunables. Illumos allows this parameters to be changed via the fasttrap.conf configuration file, but FreeBSD code hardcoded the parameters. Expose them under the kern.dtrace.fasttrap sysctl tree. MFC after: 2 weeks	2015-12-01 00:24:54 +00:00
Mark Johnston	4d7296f9aa	Fix a bug in the amd64 dtrace_getarg() implementation: when unwinding the stack, take into account the copy of rsi pushed between the breakpoint trapframe and the dtrace_invop frame. Prior to r287644, this was covered by the fact that sizeof(struct amd64_frame) was 24 rather than 16. Reported by: smh	2015-11-19 05:33:15 +00:00
Steven Hartland	465fed1c17	Switch zfs_panic_recover to panic for bad DVA As reported by Coverity a null pointer de-reference panic would be triggered when zfs_recover was set so switch to straight panic as it can never be recovered. Reported by: Coverity Scan MFC after: 1 X-MFC-With: r290401 Sponsored by: Multiplay	2015-11-06 20:45:19 +00:00
Steven Hartland	d0d400133f	Provide information about bad DVA Provide information about which vdev has an issue with a bad DVA. MFC after: 1 week Sponsored by: Multiplay	2015-11-05 17:12:41 +00:00
Steven Hartland	ab66c9067a	Allow zfs_recover to be changed at runtime MFC after: 1 week Sponsored by: Multiplay	2015-11-05 17:00:42 +00:00
Andrew Turner	e5ca5f2abd	Fix the open solaris atomic functions on arm64. Without this we may use the wrong value in the comparison, leading to incorrectly setting the new value. This has been observed in the ZFS code. Without this we can lose track of the reference count in a zrlock object. We should move to use the generic atomic functions, however as this has been observed I would prefer to have this working, then move to the generic functions. PR: 204037 Sponsored by: ABT Systems Ltd	2015-11-05 16:55:27 +00:00
Andriy Gapon	c34d46ff59	zfs: allow the lookup of extended attributes of an unlinked file That's required for extattr_get_fd(2) and the like to work properly. PR: 203201 MFC after: 17 days	2015-11-02 10:07:21 +00:00
Andriy Gapon	abc37121c4	l2arc: do not call trim_map_free() for blocks with zero b_asize b_asize can be zero if the block is compressed into an empty block (ZIO_COMPRESS_EMPTY) and the trim code asserts that meaningless zero-sized trimming is not attempted. The logic for calling trim_map_free() is extracted into a new function l2arc_trim() to minimize code duplication. PR: 203473 Reported by: Willem Jan Withagen <wjw@digiware.nl> Tested by: Willem Jan Withagen <wjw@digiware.nl> MFC after: 11 days	2015-10-30 12:00:34 +00:00
John Baldwin	2f99bcce1e	Rename remaining linux32 symbols such as linux_sysent[] and linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with linux64.ko. While here, add support for linux64 binaries to systrace. - Update NOPROTO entries in amd64/linux/syscalls.master to match the main table to fix systrace build. - Add a special case for union l_semun arguments to the systrace generation. - The systrace_linux32 module now only builds the systrace_linux32.ko. module on amd64. - Add a new systrace_linux module that builds on both i386 and amd64. For i386 it builds the existing systrace_linux.ko. For amd64 it builds a systrace_linux.ko for 64-bit binaries. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D3954	2015-10-22 21:28:20 +00:00
Alexander Motin	6b513e2853	MFV r289561: 6328 Fix cstyle errors in zfs codebase Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Jorgen Lundman <lundman@lundman.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com> illumos/illumos-gate@9a686fbc18	2015-10-19 08:25:37 +00:00
Alexander Motin	62ed65eb78	MFV r289526: 5561 support root pools on EFI/GPT partitioned disks 5125 update zpool/libzfs to manage bootable whole disk pools (EFI/GPT labeled disks) Reviewed by: Jean McCormack <jean.mccormack@nexenta.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Hans Rosenfeld <hans.rosenfeld@nexenta.com> illumos/illumos-gate@1a902ef862 This is NOP changes for FreeBSD.	2015-10-18 18:08:33 +00:00
Alexander Motin	ab866a3d61	Fix ZFS ABI compat shims for `zfs receive` after r289362. Difference appeared much less drammatic then seemed originally.	2015-10-17 07:32:46 +00:00
Alexander Motin	43f774f296	MFV r289310: 4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@45818ee124 This is only a partial merge of respective ZFS infrastructure changes. At this moment FreeBSD kernel has no those crypto algorithms, so the parts of the code to enable them are commented out. When they are implemented, it will be trivial to plug them in.	2015-10-16 14:45:21 +00:00
Alexander Motin	c70e61feed	MFV r289312: 2605 want to resume interrupted zfs send Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Xin Li <delphij@freebsd.org> Reviewed by: Arne Jansen <sensille@gmx.net> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@9c3fd1216f For more info, see: - slides http://www.slideshare.net/MatthewAhrens/openzfs-send-and-receive - video https://www.youtube.com/watch?v=iY44jPMvxog - manpage changes (for zfs resume -s and zfs send -t) - upcoming talk at the OpenZFS Developer Summit The TL;DR is: Use "zfs receive -s" to save the partially received state on failure. On failure, get the receive token with "zfs get receive_resume_token <fs>" Resume the send with "zfs send -t <token_value>" Relnotes: yes	2015-10-15 08:47:32 +00:00
Alexander Motin	3dbe12b067	MFV r289308: 6267 dn_bonus evicted too early Reviewed by: Richard Yao <ryao@gentoo.org> Reviewed by: Xin LI <delphij@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Justin T. Gibbs <gibbs@FreeBSD.org> illumos/illumos-gate@d2058105c6	2015-10-14 10:38:05 +00:00
Alexander Motin	fc6f8dee4c	MFV r289306: 6295 metaslab_condense's dbgmsg should include vdev id Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@freebsd.org> Reviewed by: Xin Li <delphij@freebsd.org> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Joe Stein <joe.stein@delphix.com> illumos/illumos-gate@daec38ecb4	2015-10-14 10:31:50 +00:00
Alexander Motin	422891c28a	MFV r289304: 6293 ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign() Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@8fe00bfb87	2015-10-14 10:28:29 +00:00
Alexander Motin	95e20e65d3	MFV r289298: 6286 ZFS internal error when set large block on bootfs Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@6de9bb5603	2015-10-14 07:50:08 +00:00
Alexander Motin	a4256278bf	MFV r289296: 6288 dmu_buf_will_dirty could be faster Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@0f2e7d03b8	2015-10-14 07:45:44 +00:00
Alexander Motin	0f45d37812	MFV r289294: 5219 l2arc_write_buffers() may write beyond target_sz Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Saso Kiselkov <skiselkov@gmail.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk> Reviewed by: Justin Gibbs <gibbs@FreeBSD.org> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Andriy Gapon <avg@freebsd.org> illumos/illumos-gate@d7d9a6d919	2015-10-14 07:37:02 +00:00
Alexander Motin	2269f420b2	FreeBSD-specific addition to r289191.	2015-10-12 18:15:25 +00:00
Alexander Motin	fea4f2108f	MFV r289188: 6281 prefetching should apply to 1MB reads Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Alexander Motin <mav@freebsd.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Gordon Ross <gordon.ross@nexenta.com> Author: George Wilson <george.wilson@delphix.com> illumos/illumos-gate@632802744e	2015-10-12 15:48:45 +00:00
Alexander Motin	558dcd4e42	MFV r289187: 6251 add tunable to disable free_bpobj processing Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Albert Lee <trisk@omniti.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Garrett D'Amore <garrett@damore.org> Author: George Wilson <george.wilson@delphix.com> illumos/illumos-gate@139510fb6e	2015-10-12 15:44:44 +00:00
Alexander Motin	72b6ad9bb5	MFV r289185: 6250 zvol_dump_init() can hold txg open Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Albert Lee <trisk@omniti.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Garrett D'Amore <garrett@damore.org> Author: George Wilson <george.wilson@delphix.com> illumos/illumos-gate@b10bba7246	2015-10-12 15:39:03 +00:00
Alexander Motin	ec5a8cf7c0	Restore original array_rd_sz semantics. Before r278702 prefetch was blocked for I/Os > 1MB, after -- >= 1MB. 1MB I/Os are used for bulk operations in CTL (XCOPY, VERIFY), and disabling prefetch for them reduced the performance. This is temporary local patch, that should be replaced when upstreamed. Discussed with: mahrens MFC after: 3 days	2015-10-03 11:05:58 +00:00
Mark Johnston	f7c3db2537	MFV r288408: 6266 harden dtrace_difo_chunksize() with respect to malicious DIF illumos/illumos-gate@395c7a3dcf Reviewed by: Alex Wilson <alex.wilson@joyent.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Bryan Cantrill <bryan@joyent.com> MFC after: 1 week	2015-09-30 05:24:22 +00:00
Andriy Gapon	a26cc6c081	sdt: static-ize couple of variables MFC after: 11 days	2015-09-29 12:14:22 +00:00
Andriy Gapon	ab8d248801	sdt module does not seem to actually use any symbol from opensolaris module MFC after: 11 days	2015-09-29 12:13:31 +00:00
Andriy Gapon	3bd9b9a600	std: it is important that func name is never an empty string otherwise DTRACE_ANCHORED() returns false and that makes stack() insert a bogus frame at the top. For example: dtrace -n 'test:dtrace_test::sdttest { stack(); } This change is not really a solution, but just a work-around. The real solution is to record the probe's call site and to use that for resolving a function name. PR: 195222 MFC after: 22 days	2015-09-29 12:02:23 +00:00
Andriy Gapon	09999d92b1	sdt: start checking version field when parsing probe definitions This is an extra safety measure. MFC after: 21 days	2015-09-29 11:58:21 +00:00
Andriy Gapon	c9d71814d5	dtrace_getarg: remove stray return statement on amd64, powerpc MFC after: 10 days	2015-09-29 11:55:26 +00:00
Andriy Gapon	9b977fcea2	define aok in libnvpair which is linked to all zfs libraries that need aok This removes the circular dependency of libnvpair on libzfs / libzpool. PR: 199811 Obtained from: bapt MFC after: 23 days	2015-09-28 15:25:36 +00:00
Xin LI	8012d6910c	MFV r288063: make dataset property de-registration operation O(1) A change to a property on a dataset must be propagated to its descendants in case that property is inherited. For datasets whose information is not currently loaded into memory (e.g. a snapshot that isn't currently mounted), there is nothing to do; the property change will take effect the next time that dataset is loaded. To handle updates to datasets that are in-core, ZFS registers a callback entry for each property of each loaded dataset with the dsl directory that holds that dataset. There is a dsl directory associated with each live dataset that references both the live dataset and any snapshots of the live dataset. A property change is effected by doing a traversal of the tree of dsl directories for a pool, starting at the directory sourcing the change, and invoking these callbacks. The current implementation both registers and de-registers properties individually for each loaded dataset. While registration for a property is O(1) (insert into a list), de-registration is O(n) (search list and then remove). The 'n' for de-registration, however, is not limited to the size (number of snapshots + 1) of the dsl directory. The eviction portion of the life cycle for the in core state of datasets is asynchronous, which allows multiple copies of the dataset information to be in-core at once. Only one of these copies is active at any time with the rest going through tear down processing, but all copies contribute to the cost of performing a dsl_prop_unregister(). One way to create multiple, in-flight copies of dataset information is by performing "zfs list" operations from multiple threads concurrently. In-core dataset information is loaded on demand and then evicted when reference counts drops to zero. For datasets that are not mounted, there is no persistent reference count to keep them resident. So, a list operation will load them, compute the information required to do the list operation, and then evict them. When performing this operation from multiple threads it is possible that some of the in-core dataset information will be reused, but also possible to lose the race and load the dataset again, even while the same information is being torn down. Compounding the performance issue further is a change made for illumos issue 5056 which made dataset eviction single threaded. In environments using automation to manage ZFS datasets, it is now possible to create enough of a backlog of dataset evictions to consume excessive amounts of kernel memory and to bog down the system. The fix employed here is to make property de-registration O(1). With this change in place, it is hoped that a single thread is more than sufficient to handle eviction processing. If it isn't, the problem can be solved by increasing the number of threads devoted to the eviction taskq. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_prop.h: Associate dsl property callback records with both the dsl directory and the dsl dataset that is registering the callback. Both connections are protected by the dsl directory's "dd_lock". When linking callbacks into a dsl directory, group them by the property type. This helps reduce the space penalty for the double association (the property name pointer is stored once per dsl_dir instead of in each record) and reduces the number of strcmp() calls required to do callback processing when updating a single property. Property types are stored in a linked list since currently ZFS registers a maximum of 10 property types for each dataset. Note that the property buckets/records associated with a dsl directory are created on demand, but only freed when the dsl directory is freed. Given the static nature of property types and their small number, there is no benefit to freeing the few bytes of memory used to represent the property record earlier. When a property record becomes empty, the dsl directory is either going to become unreferenced a little later in this thread of execution, or there is a high chance that another dataset is going to be loaded that would recreate the bucket anyway. Replace dsl_prop_unregister() with dsl_prop_unregister_all(). All callers of dsl_prop_unregister() are trying to remove all property registrations for a given dsl dataset anyway. By changing the API, we can avoid doing any lookups of callbacks by property type and just traverse the list of all callbacks for the dataset and free each one. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c: Replace use of dsl_prop_unregister() with the new dsl_prop_unregister_all() API. illumos/illumos-gate@03bad06fbb Author: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Illumos issue: 6171 dsl_prop_unregister() slows down dataset eviction https://www.illumos.org/issues/6171 MFC after: 2 weeks	2015-09-25 01:05:44 +00:00
Andriy Gapon	e88445a48b	MFV r287817: 6220 memleak in l2arc on debug build `c546f36aa8` https://www.illumos.org/issues/6220 5408 introduced a memleak in l2arc, namely the member b_thawed gets leaked when an arc_hdr is realloced from full to l2only. Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: George Wilson <george@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Arne Jansen <sensille@gmx.net>	2015-09-21 12:23:01 +00:00
Xin LI	b4f6099b9f	MFV r287623: 5997 FRU field not set during pool creation and never updated ZFS already supports storing the vdev FRU in a vdev property. There is code in libzfs to work with this property, and there is code in the zfs-retire FMA module that looks for that information. But there is no code actually setting or updating the FRU. To address this, ZFS is changed to send a handful of new events whenever a vdev is added, attached, cleared, or onlined, as well as when a pool is created or imported. Note that syseventd is not currently available on FreeBSD and thus some work is needed to actually support the new ZFS events (e.g. in zfsd) to actually use this capability, this changeset is mostly a diff reduction from upstream. illumos/illumos-gate@1437283407 Illumos issues: 5997 FRU field not set during pool creation and never updated https://www.illumos.org/issues/5997	2015-09-13 07:15:14 +00:00
Xin LI	011ecb128f	Note r286552 as merged and reduce diff against upstream.	2015-09-13 06:49:42 +00:00
Xin LI	653809335f	MFV r287699: 6214 zpools going south In r286570 (MFV of r277426) an unprotected write to b_flags to set the compression mode was introduced. This would open a race window where data is partially decompressed, modified, checksummed and written to the pool, resulting in pool corruption due to the partial decompression. Prevent this by reintroducing b_compress illumos/illumos-gate@d4cd038c92 Illumos issues: 6214 zpools going south https://www.illumos.org/issues/6214	2015-09-12 09:56:23 +00:00
Xin LI	3e691a57db	MFV r287684: 6091 avl_add doesn't assert on non-debug builds Use assfail() from libuutil instead of ASSERT() in userland AVL avl_add. illumos/illumos-gate@faa2b6be2f Illumos issues: 6091 avl_add doesn't assert on non-debug builds https://www.illumos.org/issues/6091	2015-09-12 08:50:43 +00:00
Xin LI	8c4f41ff34	MFV r287624: 5987 zfs prefetch code needs work Rewrite the ZFS prefetch code to detect only forward, sequential streams. The following kstats have been added: kstat.zfs.misc.arcstats.sync_wait_for_async How many sync reads have waited for async read to complete. (less is better) kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch How many demand read didn't have to wait for I/O because of predictive prefetch. (more is better) zfetch kstats have been similified to hits, misses, and max_streams, with max_streams representing times when we were not able to create new stream because we already have the maximum number of sequences for a file. The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap have been replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes to prefetch per stream. illumos/illumos-gate@cf6106c8a0 Illumos ZFS issues: 5987 zfs prefetch code needs work https://www.illumos.org/issues/5987	2015-09-12 08:35:51 +00:00
Mark Johnston	1e954a7c63	Remove the arg0 field from struct amd64_frame. Its existence was a bug, since on amd64 the first argument to a function is generally not on the stack. Revert an old DTrace bug fix to some code that assumed that sizeof(struct amd64_frame) == 16. Reviewed by: jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3255	2015-09-11 03:31:22 +00:00
Mark Johnston	e8baaa998c	MFV r283513: 5930 fasttrap_pid_enable() panics when prfind() fails in forking process Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Gordon Ross <gordon.ross@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Bryan Cantrill <bryan@joyent.com> illumos/illumos-gate@9df7e4e12e	2015-09-11 03:06:34 +00:00
Mark Johnston	2275da185c	MFV r283512: 3599 dtrace_dynvar tail calls can blow stack Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Gordon Ross <gordon.ross@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Bryan Cantrill <bryan@joyent.com> illumos/illumos-gate@d47448f09a	2015-09-11 03:04:24 +00:00
Xin LI	28ffe927c2	Expose an interface to determine if an ACE is inherited. Submitted by: sef Reviewed by: trasz MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3540	2015-09-04 00:14:20 +00:00
Allan Jude	de84a5132c	Apply the noline attribute to vdev_queue_max_async_writes This makes it possible to analyze the performance of the new ZFS write throttle with dtrace PR: 200316 Submitted by: Lacey Powers <lacey.leanne@gmail.com> Reviewed by: avg, smh, delphij (no objection) Approved by: bapt (mentor) MFC after: 1 month Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3472	2015-08-31 23:10:42 +00:00
Xin LI	9053fe148b	Fix a buffer overrun which may lead to data corruption, introduced in r286951 by reinstating changes in r274628. In l2arc_compress_buf(), we allocate a buffer to stash away the compressed data in 'cdata', allocated of l2hdr->b_asize bytes. We then ask zio_compress_data() to compress the buffer, b_l1hdr.b_tmp_cdata, which is of l2hdr->b_asize bytes, and have the compressed size (or original size, if compress didn't gain enough) stored in csize. To pad the buffer to fit the optimal write size, we round up the compressed size to L2 device's vdev_ashift. Illumos code rounds up the size by at most SPA_MINBLOCKSIZE. Because we know csize <= b_asize, and b_asize is integer multiple of SPA_MINBLOCKSIZE, we are guaranteed that the rounded up csize would be <= b_asize. However, this is not necessarily true when we round up to 1 << vdev_ashift, because it could be larger than SPA_MINBLOCKSIZE. So, in the worst case scenario, we are overwriting at most (1 << vdev_ashift - SPA_MINBLOCKSIZE) bytes of memory next to the compressed data buffer. Andriy's original change in r274628 reorganized the code a little bit, by moving the padding to after we determined that the compression was beneficial. At which point, we would check rounded size against the allocated buffer size, and the buffer overrun would not be possible.	2015-08-29 09:22:32 +00:00
Xin LI	253d699d3c	In r286705 (Illumos 5960/a2cdcdd), a separate thread is created with curproc as parent. In the case of a send or receive, the curproc would be the userland application that issues the ioctl. This would trigger an assertion failure introduced in Solaris compatibility shims in r196458 when kernel is compiled with INVARIANTS. Fix this by using p0 (proc0 or kernel) as the parent thread when creating the kernel threads.	2015-08-29 08:16:57 +00:00
Andriy Gapon	9f2d1b28df	MFV (partial) r286889: 5692 expose the number of hole blocks in a file FreeBSD porting notes: - only kernel-side changes are merged - the new ioctl is not actually implemented yet - thus, the goal is to synchronize DMU code illumos/illumos-gate@2bcf0248e9 https://www.illumos.org/issues/5692 we would like to expose the number of hole (sparse) blocks in a file. this can be useful to for example if you want to fill in the holes with some data; knowing the number of holes in advances allows you to report progress on hole filling. We could use SEEK_HOLE to do that but it would be O(n) where n is the number of holes present in the file. Author: Max Grossman <max.grossman@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Approved by: Richard Lowe <richlowe@richlowe.net>	2015-08-24 09:48:50 +00:00
Andriy Gapon	21fc429242	spa_import_rootpool: prevent lock and resource leak The lock leak could lead to a deadlock later. PR: 198563 Submitted by: Fabian Keil <fk@fabiankeil.de> MFC after: 1 week	2015-08-24 08:44:44 +00:00
Andriy Gapon	082fcc9ed2	account for ashift when gathering buffers to be written to l2arc device The change that introduced the L2ARC compression support also introduced a bug where the on-disk size of the selected buffers could end up larger than the target size if the ashift is greater than 9. This was because the buffer selection could did not take into account the fact that on-disk size could be larger than the in-memory buffer size due to the alignment requirements. At the moment b_asize is a misnomer as it does not always represent the allocated size: if a buffer is compressed, then the compressed size is properly rounded (on FreeBSD), but if the compression fails or it is not applied, then the original size is kept and it could be smaller than what ashift requires. For the same reasons arcstat_l2_asize and the reported used space on the cache device could be smaller than the actual allocated size if ashift > 9. That problem is not fixed by this change. This change only ensures that l2ad_hand is not advanced by more than target_sz. Otherwise we would overwrite active (unevicted) L2ARC buffers. That problem is manifested as growing l2_cksum_bad and l2_io_error counters. This change also changes 'p' prefix to 'a' prefix in a few places where variables represent allocated rather than physical size. The resolved problem could also result in the reported allocated size being greater than the cache device's capacity, because of the overwritten buffers (more than one buffer claiming the same disk space). This change is already in ZFS-on-Linux: zfsonlinux/zfs@ef56b0780c PR: 198242 PR: 195746 (possibly related) Reviewed by: mahrens (https://reviews.csiden.org/r/229/) Tested by: gkontos@aicom.gr (most recently) MFC after: 15 days X-MFC note: patch does not apply as is at the moment Relnotes: yes Sponsored by: ClusterHQ Differential Revision: https://reviews.freebsd.org/D2764 Reviewed by: noone (@FreeBSD.org)	2015-08-24 08:10:52 +00:00
Andriy Gapon	243f5e3085	try to fix lor between z_teardown_lock and spa_namespace_lock The lock order reversal and a resulting deadlock were introduced in r285021 / D2865. The problem is that zfs_register_callbacks() calls dsl_prop_get_integer() that has to acquire spa_namespace_lock. At the same time, spa_config_sync() is called with spa_namespace_lock held and then it performs ZFS vnode operations that acquire z_teardown_lock in the reader mode. So, fix the problem by using dsl_prop_get_int_ds() instead of dsl_prop_get_integer(). The former does not need to look up the pool and the dataset by name. Reported by: many Reviewed by: delphij Tested by: delphij, Jens Schweikhardt <schweikh@schweikhardt.net> MFC after: 5 days X-MFC with: r285021	2015-08-21 08:17:44 +00:00
Andriy Gapon	b985dac5ff	fix a mismerge in r286539 (MFV 286538: 5562 ZFS sa_handle's violate...) PR: 202358 X-MFC with: r286539 X-MFC attn: mav	2015-08-21 08:04:56 +00:00
Alexander Motin	602015fd15	Restore part of r274628, reverted at r286776. Submitted by: avg	2015-08-20 07:41:33 +00:00
Mariusz Zaborski	347a39b4a6	Add support for the arrays in nvlist library. - Add nvlist_{add,get,take,move,exists,free}_{number,bool,string,nvlist, descriptor} functions. - Add support for (un)packing arrays. - Add the nvl_array_next field to the nvlist structure. If an array is added by the nvlist_{move,add}_nvlist_array function this field will contains next element in the array. - Add the nitems field to the nvpair and nvpair_header structure. This field contains number of elements in the array. - Add special flag (NV_FLAG_IN_ARRAY) which is set if nvlist is a part of an array. - Add special type (NV_TYPE_NVLIST_ARRAY_NEXT).This type is used only on packing/unpacking. - Add new API for traversing arrays (nvlist_get_array_next). - Add the nvlist_get_pararr function which combines the nvlist_get_array_next and nvlist_get_parent functions. If nvlist is in the array it will return next element from array. If nvlist is last element in array or it isn't in array it will return his container (parent). This function should simplify traveling over nvlist. - Add tests for new features. - Add documentation for new functions. - Add my copyright. - Regenerate the sys/cddl/compat/opensolaris/sys/nvpair.h file. PR: 191083 Reviewed by: allanjude (doc) Approved by: pjd (mentor)	2015-08-15 06:34:49 +00:00
Alexander Motin	d8928f479b	Remove some random accumulated diff from Illumos. Submitted by: avg (partially)	2015-08-14 13:43:12 +00:00
Alexander Motin	6cb8dbf791	2618 arc.c mistypes in the comments Reviewed by: Jason King <jason.brian.king@gmail.com> Reviewed by: Josef Sipek <jeffpc@josefsipek.net> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Bart Coddens <bart.coddens@gmail.com> illumos/illumos-gate@fc98fea58e	2015-08-14 13:10:30 +00:00
Alexander Motin	997d864ce0	Fix r286766 build with debug.	2015-08-14 11:47:53 +00:00
Alexander Motin	ab4d08c3d3	Fix minor mismerge sometimes earlier.	2015-08-14 09:48:23 +00:00
Alexander Motin	5ba12a280a	MFV r286765: 5817 change type of arcs_size from uint64_t to refcount_t Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Prakash Surya <prakash.surya@delphix.com> illumos/illumos-gate@2fd872a734 As a way to make it more difficult to introduce bugs into the ARC, and to make it easier to diagnose issues when bugs do creep in, it would be beneficial to change the type of the arc_state_t's arcs_size field to be a refcount_t instead of a uint64_t. This would allow us to make stricter checks when incrementing and decrementing the value with debugging enabled, but still fallback to simple, fast atomic operations when debugging is disabled.	2015-08-14 09:39:23 +00:00
Alexander Motin	ab4930d98c	MFV r285025: 6033 arc_adjust() should search MFU lists for oldest buffer when adjusting MFU size. illumos/illumos-gate@31c46cf23c https://www.illumos.org/issues/6033 When we're looking for the list containing oldest buffer we never actually look at the MFU lists even when we try to evict from MFU. looks like a copy paste error, the fix is here: Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Xin Li <delphij@delphij.net> Reviewed by: Prakash Surya <me@prakashsurya.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Alek Pinchuk <alek@nexenta.com> Obtained from: illumos	2015-08-14 09:33:46 +00:00
Alexander Motin	e0360e14d2	MFV r277431: 5497 lock contention on arcs_mtx Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Prakash Surya <prakash.surya@delphix.com> illumos/illumos-gate@244781f10d This patch attempts to reduce lock contention on the current arc_state_t mutexes. These mutexes are used liberally to protect the number of LRU lists within the ARC (e.g. ARC_mru, ARC_mfu, etc). The granularity at which these locks are acquired has been shown to greatly affect the performance of highly concurrent, cached workloads.	2015-08-14 09:31:07 +00:00
Alexander Motin	267b62ec43	Revert part of r205231, introducing multiple ARC state locks. This local implementation will be replaced by one from Illumos to reduce code divergence and make further merges easier.	2015-08-14 09:25:54 +00:00
Alexander Motin	49114ce463	MFV 286711: 6096 ZFS_SMB_ACL_RENAME needs to cleanup better Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Gordon Ross <gordon.w.ross@gmail.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Approved by: Robert Mustacchi <rm@joyent.com> illumos/illumos-gate@8f5190a540	2015-08-13 00:13:55 +00:00
Alexander Motin	3b1f51e911	MFV 286709: 6093 zfsctl_shares_lookup should only VN_RELE() on zfs_zget() success Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Dan McDonald <danmcd@omniti.com> illumos/illumos-gate@0f92170f1e	2015-08-13 00:10:36 +00:00
Alexander Motin	0d0def87fe	MFV 286707: 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@ca0cc3918a A ZFS feature flags (large blocks) tracks its refcounts as the number of datasets that have ever used the feature. Several features of this type are planned to be added (new checksum functions). This code should be made common infrastructure rather than duplicating the code for each feature.	2015-08-12 23:59:17 +00:00
Alexander Motin	b696497df0	MFV r286704: 5960 zfs recv should prefetch indirect blocks 5925 zfs receive -o origin= Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com> While running 'zfs recv' we noticed that every 128th 8K block required a read. We were seeing that restore_write() was calling dmu_tx_hold_write() and the indirect block was not cached. We should prefetch upcoming indirect blocks to avoid having to go to disk and blocking the restore_write(). Allow an incremental send stream to be received as a clone, even if the stream does not mark it as a clone.	2015-08-12 22:41:06 +00:00
Alexander Motin	d0687a01d7	MFV r284763: 5981 Deadlock in dmu_objset_find_dp illumos/illumos-gate@1d3f896f54 https://www.illumos.org/issues/5981 When dmu_objset_find_dp gets called with a read lock held, it fans out the work to the task queue. Each task in turn acquires its own read lock before calling the callback. If during this process anyone tries to a acquire a write lock, it will stall all read lock requests.Thus the tasks will never finish, the read lock of the caller will never get freed and the write lock never acquired. deadlock. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Arne Jansen <jansen@webgods.de>	2015-08-12 19:10:29 +00:00
Alexander Motin	101a6d4eac	MFV r284762: 5269 zpool import slow illumos/illumos-gate@12380e1e70 https://www.illumos.org/issues/5269 When importing a pool (at boot or with zpool import) with many filesystem, the process can take minutes. It doesn't matter whether the pool has been exported cleanly or uncleanly. The problem is that each dataset has its own log chain. On import, all datasets have to be checked if there are logs to replay. The idea is to speed up this process by paralellizing it. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Arne Jansen <jansen@webgods.de>	2015-08-12 18:47:30 +00:00
Alexander Motin	ebf527de10	MFV r286682: 5765 add support for estimating send stream size with lzc_send_space when source is a bookmark Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@nexenta.com> Author: Max Grossman <max.grossman@delphix.com> illumos/illumos-gate@643da460c8	2015-08-12 18:23:08 +00:00
Alexander Motin	2d41b1006f	MFV r286224: 5695 dmu_sync'ed holes do not retain birth time illumos/illumos-gate@70163ac57e https://www.illumos.org/issues/5695 In dmu_sync_ready(), a hole block pointer will have it's logical size explicitly set as it's necessary for replay purposes. To "undo" this, dmu_sync_done() will zero out any hole that it finds. This becomes a problem when using the "hole_birth" feature, as this will also wipe out any birth time that might have happened to be set on the hole. ... As a fix, the logic to zero out a hole is only applied to old style holes with a birth time of zero. Holes created with the "hole_birth" feature enabled will have a non-zero birth time, and will be skipped (thus preserving the ltime, type, and level information as well). In addition, zdb was updated to also print the ltime, type, and level information for these new style holes. Previously, only the logical birth time would be printed. Author: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com>	2015-08-12 17:21:41 +00:00
Alexander Motin	ef00c83db1	Fix set of sign extension bugs in r286625.	2015-08-12 08:36:58 +00:00
Alexander Motin	b3fc966389	Fix assertion panic caused by combination of r286598 and TRIM.	2015-08-11 19:15:55 +00:00
Alexander Motin	3caed89878	Fix r286625 build on i386.	2015-08-11 12:38:01 +00:00
Alexander Motin	a3b3a9752c	Fix minor mismerge in r286574.	2015-08-11 12:22:16 +00:00
Alexander Motin	c350858a50	MFV r277425: 5376 arc_kmem_reap_now() should not result in clearing arc_no_grow Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@2ec99e3e98	2015-08-11 10:39:19 +00:00
Alexander Motin	6be7d38913	Remove extra lock, that IMO only creates potential problems now.	2015-08-11 09:18:51 +00:00
Alexander Motin	1af86496cb	MFV 286604: 5812 assertion failed in zrl_tryenter(): zr_owner==NULL Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@8df173054c	2015-08-10 21:36:51 +00:00
Alexander Motin	799f47828d	MFV 286602: 5810 zdb should print details of bpobj Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@732885fca0	2015-08-10 21:32:40 +00:00
Alexander Motin	c70c15ffa9	MFV 286599: 5808 spa_check_logs is not necessary on readonly pools Reviewed by: George Wilson <george@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@23367a2f2c	2015-08-10 21:19:42 +00:00
Alexander Motin	f7bf11ab59	MFV 286597: 5701 zpool list reports incorrect "alloc" value for cache devices Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Prakash Surya <prakash.surya@delphix.com> illumos/illumos-gate@a52fc310ba	2015-08-10 21:13:59 +00:00
Alexander Motin	2e92f38b63	Local addition and mismerge fix for r286579.	2015-08-10 20:34:46 +00:00
Alexander Motin	de8b7ceff1	MFV 286588: 5820 verify failed in zio_done(): BP_EQUAL(bp, io_bp_orig) Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matthew Ahrens <mahrens@delphix.com> illumod/illumos-gate@34e8acef00	2015-08-10 19:38:07 +00:00
Alexander Motin	57f7c5acf5	MFV 286586: 5746 more checksumming in zfs send Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@98110f08fa	2015-08-10 19:32:58 +00:00
Alexander Motin	9b4b955150	MFV r277430: 5313 Allow I/Os to be aggregated across ZIO priority classes Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Will Andrews <willa@SpectraLogic.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com> illumos/illumos-gate@fe319232d2	2015-08-10 12:39:10 +00:00
Alexander Motin	498b9d6c63	Fix r286574 build in user-space.	2015-08-10 12:25:26 +00:00
Alexander Motin	0702ce1a52	Fix r286570 build with debug.	2015-08-10 11:52:54 +00:00
Alexander Motin	83a6947e11	MFV r277428: 5056 ZFS deadlock on db_mtx and dn_holds Reviewed by: Will Andrews <willa@spectralogic.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Justin Gibbs <justing@spectralogic.com> illumos/illumos-gate@bc9014e6a8	2015-08-10 11:30:07 +00:00
Alexander Motin	f13e9e1470	MFV r277427: 5445 Add more visibility via arcstats; specifically arc_state_t stats and differentiate between "data" and "metadata" Reviewed by: Basil Crow <basil.crow@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Bayard Bell <bayard.bell@nexenta.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com> illumos/illumos-gate@4076b1bf41	2015-08-10 10:59:58 +00:00
Alexander Motin	c908dc6f4b	MFV r277426: 5408 managing ZFS cache devices requires lots of RAM Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Don Brady <dev.fs.zfs@gmail.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Chris Williamson <Chris.Williamson@delphix.com> illumos/illumos-gate@89c86e3229 Currently, every buffer cached in the L2ARC is accompanied by a 240-byte header in memory, leading to very high memory consumption when using very large cache devices. These changes significantly reduce this overhead. Currently: L1-only header = 176 bytes L1 + L2 or L2-only header = 176 bytes + 32 byte checksum + 32 byte l2hdr = 240 bytes Memory-optimized: L1-only header = 176 bytes L1 + L2 header = 176 bytes + 32 byte checksum = 208 bytes L2-only header = 96 bytes + 32 byte checksum = 128 bytes So overall: Trunk Optimized +-----------------+ L1-only \| 176 B \| 176 B \| (same) +-----------------+ L1 & L2 \| 240 B \| 208 B \| (saved 32 bytes) +-----------------+ L2-only \| 240 B \| 128 B \| (saved 116 bytes) +-----------------+ For an average blocksize of 8KB, this means that for the L2ARC, the ratio of metadata to data has gone down from about 2.92% to 1.56%. For a 'storage optimized' EC2 instance with 1600GB of SSD and 60GB of RAM, this means that we expect a completely full L2ARC to use (1600 GB * 0.0156) / 60GB = 41% of the available memory, down from 78%.	2015-08-10 10:34:23 +00:00

... 2 3 4 5 6 ...

1642 Commits