freebsd-skq

Author	SHA1	Message	Date
br	229f59004e	Implement FBT provider (MD part) for DTrace on MIPS. Tested on MIPS64. Sponsored by: DARPA, AFRL Sponsored by: HEIF5	2016-05-05 13:54:50 +00:00
asomers	440da718c9	Fix a use-after-free when "zpool import" fails clear vd->vdev_tsd in vdev_geom_close_locked instead of vdev_geom_detach. In the latter function, it would fail to happen in certain circumstances where cp->private was unset. Ideally, the latter should never happen, but it can happen when vdev open fails, or where spares are involved. MFC after: 4 weeks X-MFC-With: 298786 Sponsored by: Spectra Logic Corp	2016-04-29 21:29:37 +00:00
avg	df42baabd1	add invpcid instruction to i386 dtrace disassembler tables MFC after: 2 weeks	2016-04-29 15:45:22 +00:00
asomers	ccca204851	Refactor vdev_geom_attach and friends to reduce code duplication sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Move checks for provider's sectorsize and mediasize into a single location in vdev_geom_attach. Remove the zfs::vdev::taste class; it's ok to use the regular vdev class for tasting. Consolidate guid checks into a single location in vdev_attach_ok. Consolidate some error handling code from vdev_geom_attach into vdev_geom_detach, closing a resource leak of geom consumers in the process. Reviewed by: avg MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D5974	2016-04-29 15:23:51 +00:00
markj	9db9137b34	Increase DTRACE_FUNCNAMELEN from 128 to 192. This allows for the long function components encountered in www/firefox. This constant is part of DTrace's userland ABI, so this change may not be MFC'ed. PR: 207735	2016-04-25 18:44:11 +00:00
markj	c95af0fcd3	Allow DOF sections with excessively long probe function components. Without this change, DTrace will refuse to load a DOF section if the function component of any of its probes exceeds DTRACE_FUNCNAMELEN (128). Probes in C++ programs can have very long function components. Rather than rejecting all probes if a single probe exceeds the limit, simply skip the invalid probe and emit a warning. This ensures that valid probes are instantiated. PR: 207735 MFC after: 2 weeks	2016-04-25 18:40:57 +00:00
markj	3ce95b9c7b	Add a kern.dtrace.err_verbose sysctl to control dtrace_err_verbose. When this flag is turned on, DOF and DIF validation errors are printed to the kernel message buffer. This is useful for debugging. Also remove the debug.dtrace.debug sysctl, which has no effect.	2016-04-25 18:09:36 +00:00
avg	ca977737eb	lahf/sahf are supported on some amd64 processors While the instructions were not included into the original instruction set, their support can be indicated by a special feature bit. For example: CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU) ... AMD Features2=0x37ff<LAHF, ...> Clang 3.8 uses lahf/sahf as a faster alternative to pushf/popf where possible. MFC after: 2 weeks	2016-04-22 13:44:12 +00:00
avg	2a47164901	MFV r298471: 6052 decouple lzc_create() from the implementation details illumos/illumos-gate@26455f9efc https://www.illumos.org/issues/6052 At the moment type parameter of lzc_create() is of dmu_objset_type_t type. That exposes an implementation detail and requires sys/fs/zfs.h to be included in libzfs_core.h creating unnecessary coupling between libzfs_core interface and ZFS internals. I think that dmu_objset_type_t should be replaced with a libzfs_core enumeration of supported dataset types. For ABI reasons the new enumeration could be bit-compatible with dmu_objset_type_t. For example: typedef enum { LZC_DST_ZFS = 2, LZC_DST_ZVOL } lzc_dataset_type_t; Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Andriy Gapon <andriy.gapon@clusterhq.com> MFC after: 2 weeks Sponsored by: ClusterHQ	2016-04-22 13:00:27 +00:00
markj	b9677d1249	Make the second argument of dtrace_invop() a trapframe pointer. Currently this argument is a pointer into the stack which is used by FBT to fetch the first five probe arguments. On all non-x86 architectures it's simply the trapframe address, so this change has no functional impact. On amd64 it's a pointer into the trapframe such that stack[1 .. 5] gives the first five argument registers, which are deliberately grouped together in the amd64 trapframe definition. A trapframe argument simplifies the invop handlers on !x86 and makes the x86 FBT invop handler easier to understand. Moreover, it allows for invop handlers that may want to modify the register set of the interrupted thread.	2016-04-17 23:08:47 +00:00
avg	17bf78ca0c	zfs_rezget: z_vnode can not be NULL if zp is valid MFC after: 3 weeks	2016-04-16 07:41:56 +00:00
avg	095cceed90	zfs: enable vn_io_fault support Note that now we have to account for possible partial writes in dmu_write_uio_dbuf(). It seems that on illumos either all or none of the data are expected to be written. But the partial writes are quite expected when vn_io_fault support is enabled. Reviewed by: kib MFC after: 7 weeks Differential Revision: https://reviews.freebsd.org/D2790	2016-04-16 07:35:53 +00:00
asomers	417ecd5e08	Don't corrupt ZFS label's physpath attribute when booting while a disk is missing Prior to this change, vdev_geom_open_by_path would call vdev_geom_attach prior to verifying the device's GUIDs. vdev_geom_attach calls vdev_geom_attrchange to set the physpath in the vdev object. The result is that if the disk could not be found, then the labels for other disks in the same TLD would overwrite the missing disk's physpath with the physpath of whichever disk currently has the same devname as the missing one used to have. MFC after: 4 weeks Sponsored by: Spectra Logic Corp	2016-04-15 16:36:17 +00:00
asomers	0ee7b1f1ff	Add more debugging statements in vdev_geom.c Log a debugging message whenever geom functions fail in vdev_geom_attach. Printing these messages is controlled by vfs.zfs.debug MFC after: 4 weeks Sponsored by: Spectra Logic Corp	2016-04-14 23:14:41 +00:00
asomers	c6604882ed	Update a debugging message in vdev_geom_open_by_guids for consistency with similar messages elsewhere in the file. MFC after: 4 weeks Sponsored by: Spectra Logic Corp	2016-04-14 19:20:31 +00:00
asomers	2a0941fb20	Fix rare double free in vdev_geom_attrchanged sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Don't drop the g_topology_lock before freeing old_physpath. That opens up a race where one thread can call vdev_geom_attrchanged, set old_physpath, drop the g_topology_lock, then block trying to acquire the SCL_STATE lock. Then another thread can come into vdev_geom_attrchanged, set old_physpath to the same value, and proceed to free it. When the first thread resumes, it will free the same location. It turns out that the SCL_STATE lock isn't needed. It was originally added by gibbs to protect vd->vdev_physpath while updating the same. However, the update process subsequently was switched to an atomic operation (a pointer swap). Now, there is no need for the SCL_STATE lock, and hence no need to drop the g_topology_lock. Reviewed by: delphij MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D5413	2016-04-12 19:11:14 +00:00
avg	43050e97c2	l2arc: make sure that all writes honor ashift of a cache device Previously uncompressed buffers did not obey that rule. Type of b_asize is changed to uint64_t for consistency, given that this is a zettabyte filesystem. l2arc_compress_buf is renamed to l2arc_transform_buf to better reflect its new utility. Now we not only ensure that a compressed buffer has a size aligned to ashift, but we also allocate a properly sized temporary buffer if the original buffer is not compressed and it has an odd size. This ensures that all I/O to the cache device is always ashift-aligned, in terms of both a request offset and a request size. If the aligned data is larger than the original data, then we have to use a temporary buffer when reading it as well. Also, enhance physical zio alignment checks using vdev_logical_ashift. On FreeBSD we have this information, so we can make stricter assertions. Reviewed by: smh, mav MFC after: 1 month Sponsored by: ClusterHQ Differential Revision: https://reviews.freebsd.org/D2789	2016-04-12 06:56:35 +00:00
avg	665a947e1b	Revert r297396 Modify "4958 zdb trips assert on pools with ashift >= 0xe" A better fix is following.	2016-04-12 06:54:18 +00:00
mav	00165201ff	MFV r297831: 6322 ZFS indirect block predictive prefetch Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Author: Alexander Motin <mav@FreeBSD.org> Improve speculative prefetch of indirect blocks. Scalability of many operations on wide ZFS pool can be limited by requirement to prefetch indirect blocks first. Recently added asynchronous indirect block read partially helped, but did not solve the problem completely. This patch extends existing prefetcher functionality to explicitly work with indirect blocks. Before this change prefetcher issued reads for up to 8MB of data in advance. With this change it also issues indirect block reads for up to 64MB of data in advance, so that when it will be time to actually read those data, it can be done immediately. Alike effect can be achieved by just increasing maximal data prefetch distance, but at higher memory cost. Also this change introduces indirect block prefetch for rewrite operations, that was never done before. Previously ARC miss for Indirect blocks regularly blocked rewrites, converting perfectly aligned asynchronous operations into synchronous read-write pairs, significantly reducing maximal rewrite speed. While being there this issue was also fixed: - prefetch was done always, even if caching for the dataset was completely disabled. Testing on FreeBSD with zvol on top of 6x striped 2x mirrored pool of 12 assorted HDDs shown me such performance numbers: ------- BEFORE -------- Write 491363677 bytes/sec Read 312430631 bytes/sec Rewrite 97680464 bytes/sec -------- AFTER -------- Write 493524146 bytes/sec Read 438598079 bytes/sec Rewrite 277506044 bytes/sec Closes #65 Closes #80 openzfs/openzfs@792fd28ac0	2016-04-11 21:09:15 +00:00
smh	175f9d7444	Only include sysctl in kernel build Only include sysctl in kernel builds fixing warning about implicit declaration of function 'sysctl_handle_int'. PR: 204140 MFC after: 1 week X-MFC-With: r297813 Sponsored by: Multiplay	2016-04-11 13:17:11 +00:00
smh	d86049c572	Only include sysctl in kernel build Only include sysctl in kernel builds fixing warning about implicit declaration of function 'sysctl_handle_int'. Sponsored by: Multiplay	2016-04-11 08:57:54 +00:00
avg	05bba575e3	zio: align use of "no dump" flag between use_uma and !use_uma cases At the moment no ZFS buffers are included into a crash dump unless ZFS_DEBUG (or INVARIANTS) kernel option is enabled. That's not very helpful for debugging of ZFS problems, because important information often resides in metadata buffers. This change switches the dumping behavior when UMA is used from the illumos behavior to a more useful behavior that we have on FreeBSD when ZFS buffers are allocated via malloc. Reviewed by: smh, mav MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D5892	2016-04-11 07:11:20 +00:00
markj	61ede4b2ec	Implement support for boot-time DTrace. This allows one to enable DTrace probes relatively early during boot, during SI_SUB_DTRACE_ANON, before dtrace(1) can be invoked. The desired enabling is created using dtrace -A, which writes a /boot/dtrace.dof file and uses nextboot(8) to ensure that DTrace kernel modules are loaded and that the DOF file describing the enabling is loaded by loader(8) during the subsequent boot. The trace output can then be fetched with dtrace -a. With this commit, boot-time DTrace is only functional on i386 and amd64: on other architectures, the high-resolution timer frequency is initialized during SI_SUB_CLOCKS and is thus not available when the anonymous tracing state is initialized. On x86, the TSC is used and is thus available earlier. MFC after: 1 month Relnotes: yes	2016-04-10 01:25:48 +00:00
markj	9cabb44981	Initialize SDT probes during SI_SUB_DTRACE_PROVIDER. This is consistent with all other DTrace providers and ensures that SDT probes are available for boot-time tracing. MFC after: 2 weeks	2016-04-10 01:24:27 +00:00
markj	33baaeb76f	Initialize DTrace hrtimer frequency during SI_SUB_CPU on i386 and amd64. This allows the hrtimer to be used earlier during boot. This is required for boot-time DTrace: anonymous enablings are created during SI_SUB_DTRACE_ANON, which runs before APs are started. In particular, the DTrace deadman timer requires that the hrtimer be functional. MFC after: 2 weeks	2016-04-10 01:23:39 +00:00
mav	6dcdb61ae3	MFV r297760: 6418 zpool should have a label clearing command Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Author: Will Andrews <will@firepipe.net> Closes #83 Closes #32 openzfs/openzfs@9663688425 FreeBSD already had `zpool labelclear` functionality, so this is mostly just a diff reduction. MFC after: 1 month	2016-04-09 20:30:50 +00:00
avg	f54441416c	zio write issue threads should have lower (numerically greater) priority This is because they might do data compression which is quite CPU expensive. The original code is correct for illumos, because there a higher priority corresponds to a greater number. MFC after: 2 weeks	2016-04-08 11:58:24 +00:00
mav	b9aa8fc68c	Alike to r293708 relax pool check in vdev_geom_open_by_path(). This made impossible spare disk open by known path, which kind of worked only because the same fix was applied to vdev_geom_attach_by_guids() in r293708. MFC after: 1 week	2016-04-07 12:54:44 +00:00
trasz	825d80e01c	Add four new RCTL resources - readbps, readiops, writebps and writeiops, for limiting disk (actually filesystem) IO. Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds. Testing - and review of the code, in particular the VFS and VM parts - is very welcome. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5080	2016-04-07 04:23:25 +00:00
wma	f5a4347e1c	Implement dtrace_getupcstack in ARM64 Allow using DTRACE for performance analysis of userspace applications - the function call stack can be captured. This is almost an exact copy of AMD64 solution. Obtained from: Semihalf Sponsored by: Cavium Reviewed by: emaste, gnn, jhibbits Differential Revision: https://reviews.freebsd.org/D5779	2016-04-06 05:13:36 +00:00
avg	4a4b7f856b	remove emulation of VFS_HOLD and VFS_RELE from opensolaris compat On FreeBSD VFS_HOLD/VFS_RELE were mapped to MNT_REF/MNT_REL that manipulate mnt_ref. But the job of properly maintaining the reference count is already automatically performed by insmntque(9) and delmntque(9). So, in effect all ZFS vnodes referenced the corresponding mountpoint twice. That was completely harmless, but we want to be very explicit about what FreeBSD VFS APIs are used, because illumos VFS_HOLD and FreeBSD MNT_REF provide quite different guarantees with respect to the held vfs_t / mountpoint. On illumos VFS_HOLD is sufficient to guarantee that vfs_t.vfs_data stays valid. On the other hand, on FreeBSD MNT_REF does not provide the same guarantee about mnt_data. We have to use vfs_busy() to get that guarantee. Thus, the calls to VFS_HOLD/VFS_RELE on vnode init and fini are removed. VFS_HOLD calls are replaced with vfs_busy in the ioctl handlers. And because vfs_busy has a richer interface that can not be dumbed down in all cases it's better to explicitly use it rather than trying to mask it behind VFS_HOLD. This change fixes a panic that could result from a race between zfs_umount() and zfs_ioc_rollback(). We observed a case where zfsvfs_free() tried to destroy data that zfsvfs_teardown() was still using. That happened because there was nothing to prevent unmounting of a ZFS filesystem that was in between zfs_suspend_fs() and zfs_resume_fs(). Reviewed by: kib, smh MFC after: 3 weeks Sponsored by: ClusterHQ Differential Revision: https://reviews.freebsd.org/D2794	2016-04-02 16:25:46 +00:00
mav	ba917196d7	MFV r297506: 6738 zfs send stream padding needs documentation Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Eli Rosenthal <eli.rosenthal@delphix.com> illumos/illumos-gate@c20404ff77	2016-04-02 08:36:24 +00:00
mav	73011e007a	MFV r297504: 6681 zfs list burning lots of time in dodefault() via dsl_prop_* Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Alex Wilson <alex.wilson@joyent.com> illumos/illumos-gate@d09e4475f6	2016-04-02 08:28:46 +00:00
glebius	970d563531	Fix an error in r292373. Use proper count to update "pages in" counter. Noticed by: pfg via Coverity	2016-03-31 21:15:00 +00:00
mav	06c88b0e22	Plug open count leak on zvol rename. MFC after: 2 weeks	2016-03-30 16:54:18 +00:00
mav	b85bec3458	Switch from using make_dev_p() to make_dev_s() to close races.	2016-03-30 16:48:57 +00:00
mav	50470342d9	Modify "4958 zdb trips assert on pools with ashift >= 0xe". Unlike Illumos FreeBSD has concept of logical ashift, that specifies really minimal vdev block size that can be accessed. This knowledge allows properly pad physical I/O and correctly assert its alignment. This change fixes L2ARC write errors when device has logical sector size above 512 bytes. MFC after: 1 month	2016-03-29 19:18:34 +00:00
mav	88bdb04f5d	Pass through error code from make_dev_p(). ENAMETOOLONG is much more informative in logs than ENXIO. MFC after: 1 week	2016-03-28 08:12:29 +00:00
mav	0c48717363	Unify ignoring EEXIST from zvol_create_minor(). This fixes creation of zvol devices for snapshots during zfs receive, that previously failed with "ZFS WARNING: Unable to create ZVOL" message. This solution is not perfect, but IMHO better than it was before. MFC after: 2 weeks	2016-03-24 10:10:41 +00:00
markj	6b1b5da758	Remove unused variables dtrace_in_probe and dtrace_in_probe_addr.	2016-03-17 18:55:54 +00:00
mav	7521c72f77	Fix small memory leak on attempt to access deleted snapshot. MFC after: 3 days	2016-03-15 21:21:28 +00:00
mav	9a3b845122	Make ZFS ignore stripe sizes above SPA_MAXASHIFT (8KB). If a device has a stripe size bigger than the maximal sector size supported by ZFS, nothing can be done to avoid read-modify-write cycles. Taking that stripe size into account will only reduce space efficiency and pointlessly bother the user with warnings that cannot be fixed. Discussed with: smh	2016-03-10 16:39:46 +00:00
mav	9ba6cd4eb0	Make ZFS more picky about GEOM stripe sizes and offsets. Use of misaligned or non-power-of-2 stripes is not really useful for ZFS, since an increased ashift won't help to avoid read-modify-write cycles and will only reduce pool space efficiency and compression rates.	2016-03-10 14:18:14 +00:00
mav	4d9e23ad72	MFV r296609: 6370 ZFS send fails to transmit some holes Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Stefan Ring <stefanrin@gmail.com> Reviewed by: Steven Burgess <sburgess@datto.com> Reviewed by: Arne Jansen <sensille@gmx.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com> In certain circumstances, "zfs send -i" (incremental send) can produce a stream which will result in incorrect sparse file contents on the target. The problem manifests as regions of the received file that should be sparse (and read as zero-filled) actually contain data from a file that was deleted (and which happened to share this file's object ID). Note: this can happen only with filesystems (not zvols, because they do not free (and thus can not reuse) object IDs). Note: This can happen only if, since the incremental source (FromSnap), a file was deleted and then another file was created, and the new file is sparse (i.e. has areas that were never written to and should be implicitly zero-filled). We suspect that this was introduced by 4370 (applies only if hole_birth feature is enabled), and made worse by 5243 (applies if hole_birth feature is disabled, and we never send any holes). The bug is caused by the hole birth feature. When an object is deleted and replaced, all the holes in the object have birth time zero. However, zfs send cannot tell that the holes are new since the file was replaced, so it doesn't send them in an incremental. As a result, you can end up with invalid data when you receive incremental send streams. As a short-term fix, we can always send holes with birth time 0 (unless it's a zvol or a dataset where we can guarantee that no objects have been reused). Closes #37 openzfs/openzfs@adef853162	2016-03-10 09:01:19 +00:00
mav	8e3791996d	Add new IOCTL compat shims for ABI breakage caused by r296510: MFV r296505: 6531 Provide mechanism to artificially limit disk performance	2016-03-09 11:16:15 +00:00
mav	6126cff85d	MFV r296529: 6672 arc_reclaim_thread() should use gethrtime() instead of ddi_get_lbolt() 6673 want a macro to convert seconds to nanoseconds and vice-versa Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Eli Rosenthal <eli.rosenthal@delphix.com> illumos/illumos-gate@a8f6344fa0	2016-03-08 18:28:24 +00:00
mav	2ad69bb39b	MFV r296527: 6659 nvlist_free(NULL) is a no-op Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Marcel Telka <marcel@telka.sk> Approved by: Robert Mustacchi <rm@joyent.com> Author: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> illumos/illumos-gate@aab83bb83b	2016-03-08 18:11:38 +00:00
mav	f255ec5f1b	MFV r296522: 6541 Pool feature-flag check defeated if "verify" is included in the dedup property value Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: ilovezfs <ilovezfs@icloud.com> illumos/illumos-gate@971640e6aa	2016-03-08 17:58:02 +00:00
mav	a5f2c3291c	MFV r296520: 6562 Refquota on receive doesn't account for overage Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Dan McDonald <danmcd@omniti.com> illumos/illumos-gate@5f7a8e6d75	2016-03-08 17:53:42 +00:00
mav	eb06d29389	MFV r296518: 5027 zfs large block support (add copyright) Author: Matthew Ahrens <matt@mahrens.org> illumos/illumos-gate@c3d26abc9e	2016-03-08 17:51:09 +00:00

1491 Commits