freebsd-skq

Author	SHA1	Message	Date
Alexander Motin	4cd20c3b08	Add vfs.zfs.zio.taskq_batch_pct tunable. MFC after: 1 week	2019-11-05 15:19:05 +00:00
Andriy Gapon	ec03988887	fix up r354333, make zfsproc visible to dtrace, rename to system_proc I overlooked the fact that zfsproc is required by dtrace modules that use illumos compatible taskq KPI. So, move the symbol definition to the opensolaris module that provides compatibility support for both ZFS and DTrace. Also, rename zfsproc to system_proc to reflect that it is not specific to ZFS. Reported by: ae MFC after: 5 weeks X-MFC with: ae	2019-11-05 14:34:59 +00:00
Andriy Gapon	eb819923ec	zfs: enable SPA_PROCESS on the kernel side The purpose of this change is to group kernelthreads specific to a particular ZFS pool under a kernel process. There can be many dozens of threads per pool. This change improves observability of those threads. This change consists of several subchanges: 1. illumos taskq_create_proc can now pass its process parameter to taskqueue. Also, use zfsproc instead of NULL for taskq_create. Caveat: zfsproc might not be initialized yet. But in that case it is still NULL, so not worse than before. 2. illumos sys/proc.h: kthread id is stored in t_did field, not t_tid. 3. zfs: enable SPA_PROCESS on the kernel side. The change is a bit hairy as newproc() is implemented privately to spa.c. I couldn't think of a better way to populate process name than to poke inside the argument for the process routine. 4. illumos thread_create: allow assigning thread to process other than zfsproc. 5. zfs: expose spa_proc to other users, assign sync and quiesce threads to it. Pool-specific threads created using (relatively new) zthr mechanism are still assigned to the zfskern process rather than to a respective zpool-xxx process. I am going to address this a bit later. Reviewed by: no one MFC after: 5 weeks Relnotes: perhaps Differential Revision: https://reviews.freebsd.org/D9720	2019-11-04 13:30:37 +00:00
Toomas Soome	79a4bf8975	loader: factor out label and uberblock load from vdev_probe, add MMP checks Clean up the label read.	2019-11-03 21:19:52 +00:00
Toomas Soome	0c0a882c7a	loader: we do not support booting from pool with log device If pool has log device, stop there and tell about it.	2019-11-03 13:25:47 +00:00
Toomas Soome	abca0bd501	loader: calculate physical vdev psize from asize Since physical device asize is calculated from psize and the asize is stored in pool label, we can use asize to set the value of psize, which is used to calculate the location of the pool labels. MFC after: 1 week	2019-11-03 11:09:06 +00:00
Toomas Soome	24e1a7ac77	r354253 did miss the fact that libzpool is built as fake kernel We build libzpool as kernel like, use _FAKE_KERNEL check to include kernel api in libzpool.	2019-11-02 21:02:54 +00:00
Toomas Soome	25cf531ecd	r354253 did miss lz4.c from sys/cddl/boot/zfs.	2019-11-02 15:08:19 +00:00
Toomas Soome	e499793e76	Remove duplicate lz4 implementations Port illumos change: https://www.illumos.org/issues/11667 Move lz4.c out of zfs tree to opensolaris/common/lz4, adjust it to be usable from kernel/stand/userland builds, so we can use just one single source. Add lz4.h to declare lz4_compress() and lz4_decompress(). MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D22037	2019-11-02 12:28:04 +00:00
Alexander Motin	a4d5fcadd8	FreeBSD'fy ZFS zlib zalloc/zfree callbacks. The previous code came from OpenSolaris, which in my understanding require allocation size to be known to free memory. To store that size previous code allocated additional 8 byte header. But I have noticed that zlib with present settings allocates 64KB context buffers for each call, that could be efficiently cached by UMA, but addition of those 8 bytes makes them fall back to physical RAM allocations, that cause huge overhead and lock congestion on small blocks. Since FreeBSD's free() does not have the size argument, switching to it solves the problem, increasing write speed to ZVOLs with 4KB block size and GZIP compression on my 40-threads test system from ~60MB/s to ~600MB/s. MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-10-29 21:25:19 +00:00
Toomas Soome	903fe2b762	loader: zio_checksum_verify should check byteswap We do have both native and byteswap checksum callbacks in place but the selection is not wired. MFC after: 1 week	2019-10-27 08:35:29 +00:00
Alan Somers	1af3a11218	MFZoL: Avoid retrieving unused snapshot props This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it to take input parameters that alter the way looping through the list of snapshots is performed. The idea here is to restrict functions that throw away some of the snapshots returned by the ioctl to a range of snapshots that these functions actually use. This improves efficiency and execution speed for some rollback and send operations. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #8077 zfsonlinux/zfs@4c0883fb4a MFC after: 2 weeks	2019-10-26 17:11:02 +00:00
Konstantin Belousov	5b87ecc643	Assert that vnode_pager_setsize() is called with the vnode exclusively locked except for filesystems that set the MNTK_VMSETSIZE_BUG, Set the flag for ZFS. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 16:21:24 +00:00
Andriy Gapon	b6528d546f	MFV r353637: 10844 Serialize ZTHR operations to eliminate races illumos/illumos-gate@6a316e1f6d `6a316e1f6d` https://www.illumos.org/issues/10844 ZoL 61c3391acc9 Serialize ZTHR operations to eliminate races Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com> Obtained from: illumos, ZoL MFC after: 3 weeks	2019-10-16 09:29:01 +00:00
Andriy Gapon	428c45f156	MFV r353630: 10809 Performance optimization of AVL tree comparator functions illumos/illumos-gate@c4ab0d3f46 `c4ab0d3f46` https://www.illumos.org/issues/10809 Port ZoL ee36c709c3d Performance optimization of AVL tree comparator functions This is a followup to r337567 that imported the ZoL commit directly into FreeBSD. It seems that at the time we did not have some of the earlier changes, so some pieces of the ZoL change were not applicable. Also, the illumos version got a few style cleanups. Some changes were missed or incorrectly merged (e.g., vdev_cache_lastused_compare and metaslab_rangesize_compare). Obtained from: ZoL, illumos MFC after: 25 days X-MFC after: r353634	2019-10-16 09:20:08 +00:00
Andriy Gapon	786c532a8f	MFV r348596: 9689 zfs range lock code should not be zpl-specific illumos/illumos-gate@7931524763 FreeBSD note: some tweaking was needed to avoid a conflict with sys/rangelock.h. Author: Matthew Ahrens <mahrens@delphix.com> Obtained from: illumos MFC after: 3 weeks	2019-10-16 09:04:53 +00:00
Andriy Gapon	f6a4b91c75	MFV r353628: 10842 Mutex leak in dsl_dataset_hold_obj() illumos/illumos-gate@ad027c0ff9 `ad027c0ff9` https://www.illumos.org/issues/10842 ZoL d10b2f1d35b Mutex leak in dsl_dataset_hold_obj() Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Author: Jorgen Lundman <lundman@lundman.net> Obtained from: illumos, ZoL MFC after: 15 days	2019-10-16 07:57:58 +00:00
Andriy Gapon	0c4f60b734	MFV r353619: 9691 fat zap should prefetch when iterating illumos/illumos-gate@52abb70e07 `52abb70e07` https://www.illumos.org/issues/9691 When iterating over a ZAP object, we're almost always certain to iterate over the entire object. If there are multiple leaf blocks, we can realize a performance win by issuing reads for all the leaf blocks in parallel when the iteration begins. For example, if we have 10,000 snapshots, "zfs destroy -nv pool/fs@1%9999" can take 30 minutes when the cache is cold. This change provides a >3x performance improvement, by issuing the reads for all ~64 blocks of each ZAP object in parallel. Author: Matthew Ahrens <mahrens@delphix.com> Obtained from: illumos MFC after: 2 weeks	2019-10-16 07:09:00 +00:00
Andriy Gapon	9efb961d9a	MFV r353617: 9425 allow channel programs to be stopped via signals illumos/illumos-gate@d0cb1fb926 `d0cb1fb926` https://www.illumos.org/issues/9425 Problem Statement ZFS Channel program scripts currently require a timeout, so that hung or long-running scripts return a timeout error instead of causing ZFS to get wedged. This limit can currently be set up to 100 million Lua instructions. Even with a limit in place, it would be desirable to have a sys admin (support engineer) be able to cancel a script that is taking a long time. Proposed Solution Make it possible to abort a channel program by sending an interrupt signal.In the underlying txg_wait_sync function, switch the cv_wait to a cv_wait_sig to catch the signal. Once a signal is encountered, the dsl_sync_task function can install a Lua hook that will get called before the Lua interpreter executes a new line of code. The dsl_sync_task can resume with a standard txg_wait_sync call and wait for the txg to complete. Meanwhile, the hook will abort the script and indicate that the channel program was canceled. The kernel returns a EINTR to indicate that the channel program run was canceled. FreeBSD note: the return value of cv_wait_sig() has inverted meaning between us and illumos. Author: Don Brady <don.brady@delphix.com> Obtained from: illumos MFC after: 4 weeks	2019-10-16 07:00:18 +00:00
Andriy Gapon	179e6dab09	MFV r353615: 9485 Optimize possible split block search space illumos/illumos-gate@a21fe34979 `a21fe34979` https://www.illumos.org/issues/9485 Port this commit from ZoL: `4589f3ae4c` Author: Brian Behlendorf <behlendorf1@llnl.gov> Obtained from: illumos, ZoL MFC after: 3 weeks	2019-10-16 06:43:22 +00:00
Andriy Gapon	7149963e95	MFV r353613: 10731 zfs: NULL pointer errors FreeBSD already had these changes locally. This commit removes a small formatting difference. MFC after: 1 week	2019-10-16 06:38:05 +00:00
Andriy Gapon	6cb9ab2bad	MFC r353611: 10330 merge recent ZoL vdev and metaslab changes illumos/illumos-gate@a0b03b161c `a0b03b161c` https://www.illumos.org/issues/10330 3 recent ZoL changes in the vdev and metaslab code which we can pull over: PR 8324 c853f382db 8324 Change target size of metaslabs from 256GB to 16GB PR 8290 b194fab0fb 8290 Factor metaslab_load_wait() in metaslab_load() PR 8286 419ba59145 8286 Update vdev_is_spacemap_addressable() for new spacemap encoding Author: Serapheim Dimitropoulos <serapheimd@gmail.com> Obtained from: illumos, ZoL MFC after: 2 weeks	2019-10-16 06:26:51 +00:00
Andriy Gapon	b399ca755a	MFV r353608: 10165 libzpool: passing argument 1 to restrict-qualified parameter illumos/illumos-gate@f91fcf59ac `f91fcf59ac` https://www.illumos.org/issues/10165 Author: Toomas Soome <tsoome@me.com> MFC after: 10 days	2019-10-16 06:09:00 +00:00
Andriy Gapon	67f8ab8ebb	fix up r353565, somehow a few files did not get committed MFC after: 3 weeks X-MFC with: r353565	2019-10-15 15:52:01 +00:00
Andriy Gapon	6f2721b907	MFV r353561: 10343 ZoL: Prefix all refcount functions with zfs_ illumos/illumos-gate@e914ace2e9 `e914ace2e9` https://www.illumos.org/issues/10343 On the openzfs feature/porting matrix, this is listed as: prefix to refcount funcs/types Having these changes will make it easier to share other work across the different ZFS operating systems. PR 7963 424fd7c3e Prefix all refcount functions with zfs_ PR 7885 & 7932 c13060e47 Linux 4.19-rc3+ compat: Remove refcount_t compat PR 5823 & 5842 4859fe796 Linux 4.11 compat: avoid refcount_t name conflict Author: Tim Schumacher <timschumi@gmx.de> Obtained from: illumos, ZoL MFC after: 3 weeks	2019-10-15 15:09:36 +00:00
Andriy Gapon	563db1a947	MFV r353558: 10572 10579 Fix race in dnode_check_slots_free() illumos/illumos-gate@aa02ea0194 `aa02ea0194` 10572 Fix race in dnode_check_slots_free() https://www.illumos.org/issues/10572 The Fix from ZoL: Currently, dnode_check_slots_free() works by checking dn->dn_type in the dnode to determine if the dnode is reclaimable. However, there is a small window of time between dnode_free_sync() in the first call to dsl_dataset_sync() and when the useraccounting code is run when the type is set DMU_OT_NONE, but the dnode is not yet evictable, leading to crashes. This patch adds the ability for dnodes to track which txg they were last dirtied in and adds a check for this before performing the reclaim. This patch also corrects several instances when dn_dirty_link was treated as a list_node_t when it is technically a multilist_node_t. 10579 Don't allow dnode allocation if dn_holds != 0 https://www.illumos.org/issues/10579 The fix from ZoL: This patch simply fixes a small bug where dnode_hold_impl() could attempt to allocate a dnode that was in the process of being freed, but which still had active references. This patch simply adds the required check. Author: Tom Caputi <tcaputi@datto.com> Reported by: delphij MFC after: 2 weeks X-MFC with: r353176	2019-10-15 14:29:18 +00:00
Andriy Gapon	4368589338	MFV r353551: 10452 ZoL: merge in large dnode feature fixes illumos/illumos-gate@946342a260 `946342a260` https://www.illumos.org/issues/10452 illumos is missing a few small follow up ZoL bug fixes for the large dnode feature. We should pull those in. Those commits are in the ZoL tree as (newest to oldest): PR 8435 - 75d6b7ddca269542279975f716a343bb40a79baf - Add missing copyright notice to large_dnode tests PR 7433 - e14a32b1c844d924b9f093375c0badcf10f61741 - Fix object reclaim when using large dnodes PR 6616 - 48fbb9ddbf2281911560dfbc2821aa8b74127315 - Free objects when receiving full stream as clone PR 6695 - 39f56627ae988d09b4e3803c01c22b2026b2310e - receive_freeobjects() skips freeing some object Portions contributed by: Ned Bass <bass6@llnl.gov> Portions contributed by: Tom Caputi <tcaputi@datto.com> Author: Fabian Grünbichler <f.gruenbichler@proxmox.com> Obtained from: illumos, ZoL MFC after: 2 weeks X-MFC with: r353176	2019-10-15 14:20:11 +00:00
Jeff Roberson	0012f373e4	(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594	2019-10-15 03:45:41 +00:00
Mateusz Guzik	8fd727827c	zfs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:41:30 +00:00
Andriy Gapon	d0a9542494	fix up r353340, don't assume that fcmpset has strong semantics fcmpset can have two kinds of semantics, weak and strong. For practical purposes, strong semantics means that if fcmpset fails then the reported current value is always different from the expected value. Weak semantics means that the reported current value may be the same as the expected value even though fcmpset failed. That's a so called "sporadic" failure. I originally implemented atomic_cas expecting strong semantics, but many platforms actually have weak one. Reported by: pkubaj (not confirmed if same issue) Discussed with: kib, mjg MFC after: 19 days X-MFC with: r353340	2019-10-11 17:01:02 +00:00
Alan Somers	34e9a37f4d	MFZol: Fix performance of "zfs recv" with many deletions This patch fixes 2 issues with the DMU free throttle implemented in dmu_free_long_range(). The first issue is that get_next_chunk() was calculating the number of L1 blocks the free would dirty incorrectly. In some cases involving extremely large files, this code would greatly overestimate the number of affected L1 blocks, causing excessive calls to txg_wait_open(). This patch corrects the calculation. The second issue is that the free throttle uses the total number of free'd blocks in all (open, quiescing, and syncing) txgs to determine whether to throttle. This causes large frees (such as those created by the first issue) to cause 4 txg syncs before any further frees were allowed to proceed. This patch ensures that the accounting is done entirely in a per-txg fashion, so that frees from a given txg don't affect those that immediately follow it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> zfsonlinux/zfs@f4c594da94 Freeing throttle should account for holes Deletion throttle currently does not account for holes in a file. This means that it can activate when it shouldn't. To fix it we switch the throttle to be based on the number of L1 blocks we will have to dirty when freeing Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> zfsonlinux/zfs@65282ee9e0 Submitted by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: allanjude MFC after: 2 weeks Sponsored by: Axcient Differential Revision: https://reviews.freebsd.org/D21895	2019-10-11 14:59:28 +00:00
Andriy Gapon	d0c0856f63	emulate illumos membar_producer with atomic_thread_fence_rel membar_producer is supposed to be a store-store barrier. Also, in the code that FreeBSD has ported from illumos membar_producer is used only with regular stores to regular memory (with respect to caching). We do not have an MI primitive for the store-store barrier, so atomic_thread_fence_rel is the closest we have as it provides (load \| store) -> store barrier. Previously, membar_producer was an empty function call on all 32-bit arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was inadequate. On other platforms, such as amd64, arm64, i386, powerpc64, sparc64, membar_producer was implemented using stronger primitives than required for a store-store barrier with respect to regular memory access. For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO). On powerpc64 we now use recommended lwsync instead of eieio. On sparc64 FreeBSD uses TSO mode. On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this is an improvement, actually. After this change we can drop opensolaris_atomic.S for aarch64, amd64, powerpc64 and sparc64 as all required atomic operations have either direct or light-weight mapping to FreeBSD native atomic operations. Discussed with: kib MFC after: 4 weeks	2019-10-10 07:39:41 +00:00
Andriy Gapon	f5c4c7209b	cleanup of illumos compatibility atomics atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms. Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have it. The only exception is sparc64 that provides MD atomic_cas_32 and atomic_cas_64. This is slightly inefficient as fcmpset reports whether the operation updated the target and that information is not needed for cas. Nevertheless, there is less code to maintain and to add for new platforms. Also, the operations are done inline now as opposed to function calls before. atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms that provide it. casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as they have no users. atomic_mtx that is used to emulate 64-bit atomics on platforms that lack them is defined only on those platforms. As a result, platform specific opensolaris_atomic.S files have lost most of their code. The only exception is i386 where the compat+contrib code provides 64-bit atomics for userland use. That code assumes availability of cmpxchg8b instruction. FreeBSD does not have that assumption for i386 userland and does not provide 64-bit atomics. Hopefully, this can and will be fixed. MFC after: 3 weeks	2019-10-09 11:26:36 +00:00
Andriy Gapon	ac99b25298	zfs: use atomic_load_64 to read atomic variable in dmu_object_alloc_impl As long as we support ZFS on 32-bit platforms we should do this for all 64-bit variables that are modified in a lockless fashion using atomic operations. Otherwise, there is a risk of a reading a torn value. Here is a rationale for why I am doing this in dmu_object_alloc_impl: - it's very recent code - the code deals with object IDs and a number of objects in a file system can overflow 32 bits - incorrect allocation of an object ID may result in hard to debug problems - fixing all plain reads of 64-bit atomic variables is not a trivial undertaking to do in one shot, so I chose to do it incrementally MFC after: 3 weeks X-MFC after: r353301, r353176	2019-10-08 11:27:48 +00:00
Andriy Gapon	3251c5ae51	fix up r353168, add atomic_swap_64 to i386 version of opensolaris_atomic.S The compatibility code for the atomic operations in ZFS code is a bit messy. In some cases the native definitions are directly made available, in some cases there are emulated operations in opensolaris_atomic.c and in yet other cases there are atomic operations implemented in assembly that were obtained from OpenSolaris / illumos. This commit adds atomic_swap_64 for use with i386 userland. The code is copied from illumos. I am not sure why FreeBSD does not provide that operation natively. Maybe because we try (or pretend) to support processors that did not have the necessary instructions. While here I also added atomic_load_64 for the same reasons. This is original code based on iilumos atomic_swap_64 and FreeBSD atomic_load_acq_64_i586. Pointyhat to: avg MFC after: 1 week	2019-10-07 12:53:27 +00:00
Andriy Gapon	862c20fd89	MFV r350898, r351075: 8423 8199 7432 Implement large_dnode pool feature 8423 8199 7432 Implement large_dnode pool feature 7432 Large dnode pool feature 8199 multi-threaded dmu_object_alloc() 8423 Implement large_dnode pool feature 10406 large_dnode changes broke zfs recv of legacy stream llumos/illumos-gate@54811da5ac `54811da5ac` https://www.illumos.org/issues/8423 https://www.illumos.org/issues/8199 https://www.illumos.org/issues/7432 illumos/illumos-gate@811964cd9f `811964cd9f` https://www.illumos.org/issues/10406 ZoL issues: Improved dnode allocation #6564 Clean up large dnode code #6262 Fix dnode_hold() freeing dnode behavior #8172 Fix dnode allocation race #6414, #6439 Partial: Raw sends must be able to decrease nlevels #6821, #6864 Remove unnecessary txg syncs from receive_object() Closes #7197 This updates FreeBSD large_dnode code (that was imported from ZoL) to a version that was committed to illumos. It has some cleanups, improvements and fixes comparing to what we have in FreeBSD now. I think that the most significant update is 8199 multi-threaded dmu_object_alloc(). This commit reverts r351077 that was a revert of r351074 and r351076 and restores those changes. Required atomic operations should be available now on all platforms where we build ZFS. Obtained from: illumos MFC after: 3 weeks	2019-10-07 08:14:45 +00:00
Andriy Gapon	f1fd3654e7	ZFS: unconditionally use atomic_swap_64 Previously, the code used a plain store on platforms that lacked atomic_swap_64 and possibly some other platforms as the condition worked only if atomic_swap_64 was a macro. MFC after: 1 week X-MFC after: r353166, r353167	2019-10-07 08:00:54 +00:00
Andriy Gapon	cf78c4beae	ZFS: add emulation of atomic_swap_64 and atomic_load_64 Some 32-bit platforms do not provide 64-bit atomic operations that ZFS requires, either in userland or at all. We emulate those operations for those platforms using a mutex. That is not entirely correct and it's very efficient. Besides, the loads are plain loads, so torn values are possible. Nevertheless, the emulation seems to work for some definition of work. This change adds atomic_swap_64, which is already used in ZFS code, and atomic_load_64 that can be used to prevent torn reads. MFC after: 1 week	2019-10-07 07:54:34 +00:00
Mateusz Guzik	818631b634	zfs: add root vnode caching This replaces the approach added in r338927. See r353150. Sponsored by: The FreeBSD Foundation	2019-10-06 22:16:00 +00:00
Mariusz Zaborski	5eb65c4ce5	dtrace: 64-bits registers support The registers in ilumos and FreeBSD have a different number. In the illumos, last 32-bits register defined is SS an in FreeBSD is GS. While translating register we should comper it to the highest one. PR: 240358 Reported by: lwhsu@ MFC after: 2 weeks	2019-10-04 16:17:00 +00:00
Andriy Gapon	912c3fe715	ZFS: add bookmark renaming The feature is implemented as an extension of the existing ZFS_IOC_RENAME ioctl. Both the userland and the DSL interfaces support renaming only a single bookmark at a time. As of now, there is no ZCP interface to the new functionality. I am going to add it once the DSL interface passes a test of time. This change picks up support for zfs_ioc_namecheck_t::ENTITY_NAME that was added to ZoL as part of Redacted Send/Receive feature by Paul Dagnelie <pcd@delphix.com>. This is needed to allow a bookmark name in zc_name. Discussed with: mahrens Reviewed by: bcr (man page) Sponsored by: CyberSecure Differential Revision: https://reviews.freebsd.org/D21795	2019-10-03 11:08:45 +00:00
Alexander Motin	e9f4580d92	Improve latency of synchronous 128KB writes. Before my ZIL space optimization few years ago 128KB writes were logged as two 64KB+ records in two 128KB log blocks. After that change it became ~124KB+/4KB+ in two 128KB log blocks to free space in the second block for another record. Unfortunately in case of 128KB only writes, when space in the second block remained unused, that change increased write latency by imbalancing checksum computation time between parallel threads. This change introduces new 68KB log block size, used for both writes below 67KB and 128KB-sharp writes. Writes of 68-127KB are still using one 128KB block to not increase processing overhead. Writes above 131KB are still using full 128KB blocks, since possible saving there is small. Mixed loads will likely also fall back to previous 128KB, since code uses maximum of the last 10 requested block sizes. On a simple 128KB write test with queue depth of 1 this change demonstrates ~15-20% performance improvement. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-10-01 20:09:25 +00:00
Mark Johnston	9093dd9a66	Implement x86 dtrace_invop_(un)init() in C. There is no reason for these routines to be written in assembly. In the ports of DTrace to other platforms, they are already written in C. No functional change intended. MFC after: 1 week Sponsored by: Netflix	2019-09-23 15:08:17 +00:00
Andriy Gapon	38a1def12f	MFZoL: Retire send space estimation via ZFS_IOC_SEND Add a small wrapper around libzfs_core's lzc_send_space() to libzfs so that every legacy ZFS_IOC_SEND consumer, along with their userland counterpart estimate_ioctl(), can leverage ZFS_IOC_SEND_SPACE to request send space estimation. The legacy functionality in zfs_ioc_send() is left untouched for compatibility purposes. Obtained from: ZoL Obtained from: zfsonlinux/zfs@cf7684bc8d Author: loli10K <ezomori.nozomu@gmail.com> MFC after: 2 weeks	2019-09-22 08:44:41 +00:00
Andriy Gapon	6caa629e73	fix dsl_scan_ds_clone_swapped logic It was incorrect with respect to swapping dataset IDs both in the on-disk ZAP object and the in-memory queue. In both cases, if only ds1 was already present, then it would be first replaced with ds2 and then ds2 would be replaced back with ds1. Also, both cases did not properly handle a situation where both ds1 and ds2 are already queued. A duplicate insertion would be attempted and its failure would result in a panic. This change has also been submitted to ZoL as zfsonlinux/zfs@dd262c9 PR: 239566 Reported by: pascal.guitierrez@gmail.com MFC after: 4 days Sponsored by: CyberSecure	2019-09-19 09:43:56 +00:00
Alexander Motin	f4897c94dd	Fix typo, setting hidden flag instead of reparse. Submitted by: Ryan Moeller <ryan@ixsystems.com> MFC after: 3 days Sponsored by: iXsystems, Inc.	2019-09-18 19:33:08 +00:00
Mateusz Guzik	a8c8e44bf0	vfs: manage mnt_ref with atomics New primitive is introduced to denote sections can operate locklessly on aspects of struct mount, but which can also be disabled if necessary. This provides an opportunity to start scaling common case modifications while providing stable state of the struct when facing unmount, write suspendion or other events. mnt_ref is the first counter to start being managed in this manner with the intent to make it per-cpu. Reviewed by: kib, jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21425	2019-09-16 21:31:02 +00:00
Mark Johnston	e8bcf6966b	Revert r352406, which contained changes I didn't intend to commit.	2019-09-16 15:04:45 +00:00
Mark Johnston	41fd4b9422	Fix a couple of nits in r352110. - Remove a dead variable from the amd64 pmap_extract_and_hold(). - Fix grammar in the vm_page_wire man page. Reported by: alc Reviewed by: alc, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21639	2019-09-16 15:03:12 +00:00
Jeff Roberson	c75757481f	Replace redundant code with a few new vm_page_grab facilities: - VM_ALLOC_NOCREAT will grab without creating a page. - vm_page_grab_valid() will grab and page in if necessary. - vm_page_busy_acquire() automates some busy acquire loops. Discussed with: alc, kib, markj Tested by: pho (part of larger branch) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21546	2019-09-10 19:08:01 +00:00
Jeff Roberson	4cdea4a853	Use the sleepq lock rather than the page lock to protect against wakeup races with page busy state. The object lock is still used as an interlock to ensure that the identity stays valid. Most callers should use vm_page_sleep_if_busy() to handle the locking particulars. Reviewed by: alc, kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21255	2019-09-10 18:27:45 +00:00
Mark Johnston	fee2a2fa39	Change synchonization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficent as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accomodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Andriy Gapon	b539c9bfbd	ZFS: Always refuse receving non-resume stream when resume state exists This fixes a hole in the situation where the resume state is left from receiving a new dataset and, so, the state is set on the dataset itself (as opposed to %recv child). Additionally, distinguish incremental and resume streams in error messages. This was also committed to ZoL: zfsonlinux/zfs@ebeb6f23bf MFC after: 2 weeks Sponsored by: CyberSecure	2019-09-04 07:33:22 +00:00
Mateusz Guzik	e3c3248cc7	vfs: implement usecount implying holdcnt vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting freed and usecount to denote it is actively used. Previously all operations bumping usecount would also bump holdcnt, which is not necessary. We can detect if usecount is already > 1 (in which case holdcnt is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves on atomic ops. Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21471	2019-09-03 15:42:11 +00:00
Mark Johnston	08cfa56ea3	Extend uma_reclaim() to permit different reclamation targets. The page daemon periodically invokes uma_reclaim() to reclaim cached items from each zone when the system is under memory pressure. This is important since the size of these caches is unbounded by default. However it also results in bursts of high latency when allocating from heavily used zones as threads miss in the per-CPU caches and must access the keg in order to allocate new items. With r340405 we maintain an estimate of each zone's usage of its (per-NUMA domain) cache of full buckets. Start making use of this estimate to avoid reclaiming the entire cache when under memory pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only items in excess of the estimate are reclaimed. Draining a zone reclaims all of the cached full buckets (the previous behaviour of uma_reclaim()), and may further drain the per-CPU caches in extreme cases. Now, when under memory pressure, the page daemon will trim zones rather than draining them. As a result, heavily used zones do not incur bursts of bucket cache misses following reclamation, but large, unused caches will be reclaimed as before. Reviewed by: jeff Tested by: pho (an earlier version) MFC after: 2 months Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16667	2019-09-01 22:22:43 +00:00
Mateusz Guzik	e9fff745a7	zfs: fix snapshot dir destruction after introducion of VOP_NEED_INACTIVE Reported by: lwhsu PR: 240221 Sponsored by: The FreeBSD Foundation	2019-08-31 13:24:22 +00:00
Konstantin Belousov	6470c8d3db	Rework v_object lifecycle for vnodes. Current implementation of vnode_create_vobject() and vnode_destroy_vobject() is written so that it prepared to handle the vm object destruction for live vnode. Practically, no filesystems use this, except for some remnants that were present in UFS till today. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result all of them get rid of the v_object in reclaim. Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is valid vm object till the vnode reclamation. Remove code from vnode_create_vobject() to handle races with the parallel destruction. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412	2019-08-29 07:50:25 +00:00
Andriy Gapon	4b4d1b818e	zfs_ioc_snapshot: check user-prop permissions on snapshotted datasets Previously, the permissions were checked on the pool which was obviously incorrect. After this change, zfs_check_userprops() only validates the properties without any permission checks. The permissions are checked individually for each snapshotted dataset. This was also committed to ZoL: zfsonlinux/zfs@e6203d2 Reported by: CyberSecure MFC after: 1 week Sponsored by: CyberSecure	2019-08-29 07:19:06 +00:00
Mark Johnston	772dd133c6	Avoid direct accesses of the vm_page wire_count field. No functional change intended. Sponsored by: Netflix	2019-08-28 18:01:54 +00:00
Jeff Roberson	cf27e0d125	Use an atomic reference count for paging in progress so that callers do not require the object lock. Reviewed by: markj Tested by: pho (as part of a larger branch) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21311	2019-08-19 23:09:38 +00:00
Andriy Gapon	97b342a310	zfs_vget: fix vnode reference count leak in error path If vn_lock() failed, then the function returned the error but the vnode obtained via zfs_zget() was never released. MFC after: 10 days Sponsored by: Panzura	2019-08-17 09:23:03 +00:00
Andriy Gapon	5a75f51a1f	Revert r351076 and r351074 because of atomic_swap_64 on 32-bit platforms Trying to sort it out.	2019-08-15 15:27:58 +00:00
Andriy Gapon	30f7381b8e	MFV r351075: 10406 large_dnode changes broke zfs recv of legacy stream illumos/illumos-gate@811964cd9f `811964cd9f` https://www.illumos.org/issues/10406 The large dnode changes from 8423 caused problems in zfs recv for a legacy stream. This manifests when attempting to mount the received stream, but the problem is in the receive code. We missed the following commit from ZoL which fixes this. commit da2feb42fb5c7a8c1e1cc67f7a880da9d8e97bc2 Author: Tom Caputi <tcaputi@datto.com> Date: Thu Jun 28 17:55:11 2018 -0400 Fix 'zfs recv' of non large_dnode send streams Currently, there is a bug where older send streams without the DMU_BACKUP_FEATURE_LARGE_DNODE flag are not handled correctly. The code in receive_object() fails to handle cases where drro->drr_dn_slots is set to 0, which is always the case when the sending code does not support this feature flag. This patch fixes the issue by ensuring that that a value of 0 is treated as DNODE_MIN_SLOTS. Author: Tom Caputi <tcaputi@datto.com> MFC after: 3 weeks X-MFC after: r351074	2019-08-15 15:11:20 +00:00
Andriy Gapon	93132b76cd	MFV r350898: 8423 8199 7432 Implement large_dnode pool feature 8423 8199 7432 Implement large_dnode pool feature 8423 Implement large_dnode pool feature 8199 multi-threaded dmu_object_alloc() 7432 Large dnode pool feature llumos/illumos-gate@54811da5ac `54811da5ac` https://www.illumos.org/issues/8423 https://www.illumos.org/issues/8199 https://www.illumos.org/issues/7432 ZoL issues: Improved dnode allocation #6564 Clean up large dnode code #6262 Fix dnode_hold() freeing dnode behavior #8172 Fix dnode allocation race #6414, #6439 Partial: Raw sends must be able to decrease nlevels #6821, #6864 Remove unnecessary txg syncs from receive_object() Closes #7197 This updates FreeBSD large_dnode code (that was imported from ZoL) to a version that was committed to illumos. It has some cleanups, improvements and fixes comparing to what we have in FreeBSD now. I think that the most significant update is 8199 multi-threaded dmu_object_alloc(). Obtained from: illumos MFC after: 3 weeks	2019-08-15 14:57:27 +00:00
Andriy Gapon	4139761bb5	MFV r350896: 6585 sha512, skein, and edonr have an unenforced dependency on extensible dataset illumos/illumos-gate@892586e8a1 `892586e8a1` https://www.illumos.org/issues/6585 In any pool without the extensible dataset feature flag already enabled, creating a dataset with dedup set to use one of the new checksums would result in the following panic as soon as any data was added: panic[cpu0]/thread=ffffff0006761c40: feature_get_refcount(spa, feature, &refcount) != 48 (0x30 != 0x30), file: ../../common/fs/zfs/zfeature.c line 390 ffffff0006761830 fffffffffba8fbdd () ffffff0006761890 zfs:feature_do_action+11a () ffffff00067618c0 zfs:spa_feature_incr+1e () ffffff0006761920 zfs:dmu_object_zapify+b7 () ffffff00067619b0 zfs:dsl_dataset_activate_feature+97 () ffffff0006761a20 zfs:dsl_dataset_sync+ba () ffffff0006761ab0 zfs:dsl_pool_sync+153 () ffffff0006761b70 zfs:spa_sync+26e () ffffff0006761c20 zfs:txg_sync_thread+227 () ffffff0006761c30 unix:thread_start+8 () Inspection showed that feature->fi_feature was 7, which is the value of SPA_FEATURE_EXTENSIBLE_DATASET in the spa_feature enum. Testing shows that the panic can be prevented by explicitly setting extensible dataset as a dependency for the sha512, edonr, and skein feature flags. Alternatively, the new checksums code could possibly be changed to obviate the need for the dependency. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: ilovezfs <ilovezfs@icloud.com> Note that FreeBSD does not support ednor yet. MFC after: 2 weeks	2019-08-12 11:42:16 +00:00
Andriy Gapon	cfe94339f2	a stop gap fix for a race between dnode_hold and dnode_sync_free The race was introduced in r337669, the large dnode feature import from ZoL. The problem was debugged by ZoL developers and then, independently, on FreeBSD. The fix is an early proposal by Brian Behlendorf: `50f32ed74e` This fix never went into ZoL. A larger change that was committed later included a different solution because of the re-worked code. Ideally, we want to revert this fix and re-synchronize FreeBSD large dnode code with that in illumos (or newer ZoL). illumos has a later import of the feature from ZoL that does not have the bug. PR: 236480 Obtained from: Brian Behlendorf <behlendorf1@llnl.gov> Submitted by: ncrogers@gmail.com (patch adaptation) Reported by: ncrogers@gmail.com Tested by: ncrogers@gmail.com, Dennis Noordsij <dennis.noordsij@alumni.helsinki.fi>, Julien Cigar <julien@perdition.city> MFC after: 10 days	2019-08-12 10:30:00 +00:00
Toomas Soome	b1b9326846	loader: support com.delphix:removing We should support removing vdev from boot pool. Update loader zfs reader to support com.delphix:removing. Reviewed by: allanjude MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18901	2019-08-08 18:08:13 +00:00
Xin LI	0ed1d6fb00	Allow Kernel to link in both legacy libkern/zlib and new sys/contrib/zlib, with an eventual goal to convert all legacl zlib callers to the new zlib version: * Move generic zlib shims that are not specific to zlib 1.0.4 to sys/dev/zlib. * Connect new zlib (1.2.11) to the zlib kernel module, currently built with Z_SOLO. * Prefix the legacy zlib (1.0.4) with 'zlib104_' namespace. * Convert sys/opencrypto/cryptodeflate.c to use new zlib. * Remove bundled zlib 1.2.3 from ZFS and adapt it to new zlib and make it depend on the zlib module. * Fix Z_SOLO build of new zlib. PR: 229763 Submitted by: Yoshihiro Ota <ota j email ne jp> Reviewed by: markm (sys/dev/zlib/zlib_kmod.c) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D19706	2019-08-01 06:35:33 +00:00
Mark Johnston	61f2f0bae6	Fix FASTTRAPIOC_GETINSTR. This ioctl is used when a breakpoint is encountered while disassembling a symbol in the target process. Since only one DTrace consumer can toggle or enumerate fasttrap probes from a given process at time, this ioctl does not appear to be used in practice.	2019-07-17 16:38:29 +00:00
Mark Johnston	eeacb3b02f	Merge the vm_page hold and wire mechanisms. The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released. This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions. No functional change is intended. __FreeBSD_version is bumped. Reviewed by: alc, kib Discussed with: jeff Discussed with: jhb, np (cxgbe) Tested by: pho (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19247	2019-07-08 19:46:20 +00:00
Alexander Motin	419110374a	Avoid extra taskq_dispatch() calls by DMU. DMU sync code calls taskq_dispatch() for each sublist of os_dirty_dnodes and os_synced_dnodes. Since the number of sublists by default is equal to number of CPUs, it will dispatch equal, potentially large, number of tasks, waking up many CPUs to handle them, even if only one or few of sublists actually have any work to do. This change adds check for empty sublists to avoid this.	2019-06-25 18:35:23 +00:00
Alexander Motin	3bae917061	Minimize aggsum_compare(&arc_size, arc_c) calls. For busy ARC situation when arc_size close to arc_c is desired. But then it is quite likely that aggsum_compare(&arc_size, arc_c) will need to flush per-CPU buckets to find exact comparison result. Doing that often in a hot path penalizes whole idea of aggsum usage there, since it replaces few simple atomic additions with dozens of lock acquisitions. Replacing aggsum_compare() with aggsum_upper_bound() in code increasing arc_p when ARC is growing (arc_size < arc_c) according to PMC profiles allows to save ~5% of CPU time in aggsum code during sequential write to 12 ZVOLs with 16KB block size on large dual-socket system. I suppose there some minor arc_p behavior change due to lower precision of the new code, but I don't think it is a big deal, since it should affect only very small window in time (aggsum buckets are flushed every second) and in ARC size (buckets are limited to 10 average ARC blocks per CPU). MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-06-14 20:04:28 +00:00
Alexander Motin	b3b3aa2e29	Alike to ZoL disable metaslab allocation tracing code. It is too generous to collect in production debug traces that can only be read with kernel debugger. Illumos includes special code in their mdb debugger to read it, we don't. MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-06-14 19:57:32 +00:00
Alexander Motin	284e53a401	Properly align struct multilist_sublist to cache line. Manual Illumos alignment does not fit us due to different kmutex_t size. MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-06-14 17:09:39 +00:00
Alexander Motin	913095dc56	Move write aggregation memory copy out of vq_lock. Memory copy is too heavy operation to do under the congested lock. Moving it out reduces congestion by many times to almost invisible. Since the original zio removed from the queue, and the child zio is not executed yet, I don't see why would the copy need protection. My guess it just remained like this from the time when lock was not dropped here, which was added later to fix lock ordering issue. Multi-threaded sequential write tests with both HDD and SSD pools with ZVOL block sizes of 4KB, 16KB, 64KB and 128KB all show major reduction of lock congestion, saving from 15% to 35% of CPU time and increasing throughput from 10% to 40%. Reviewed by: ahrens, behlendorf, ryao MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-06-13 01:21:32 +00:00
Alexander Motin	35251e9c28	Fix comparison signedness in arc_is_overflowing(). When ARC size is very small, aggsum_lower_bound(&arc_size) may return negative values, that due to unsigned comparison caused delays, waiting for arc_adjust() to "fix" it by calling aggsum_value(&arc_size). Use of signed comparison there fixes the problem. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-06-07 20:59:24 +00:00
Alexander Motin	61586dd647	Explicitly start ARC adjustment on limits change. While formally it is not necessary, but the sooner it start, the sooner it finish, and supposedly less disturbing for workload it will be. MFC after: 2 weeks	2019-06-07 19:03:17 +00:00
Andriy Gapon	d8b12a2162	Restore ARC MFU/MRU pressure Before r305323 (MFV r302991: 6950 ARC should cache compressed data) arc_read() code did this for access to a ghost buffer: arc_adapt() (from arc_get_data_buf()) arc_access(hdr, hash_lock) I.e., we first checked access to the MFU ghost/MRU ghost buffer and adapt MFU/MRU sizes (in arc_adapt()) and next move buffer from the ghost state to regular. After r305323 the sequence is different: arc_access(hdr, hash_lock); arc_hdr_alloc_pabd(hdr); I.e., we first move the buffer from the ghost state in arc_access() and then we check access to buffer in ghost state (in arc_hdr_alloc_pabd() -> arc_get_data_abd() -> arc_get_data_impl() -> arc_adapt()). This is incorrect: arc_adapt() never see access to the ghost buffer because arc_access() already migrated the buffer from the ghost state to regular. So, the fix is to restore a call to arc_adapt() before arc_access() and to suppress the call to arc_adapt() after arc_access(). Submitted by: Slawa Olhovchenkov <slw@zxy.spb.ru> MFC after: 2 weeks Sponsored by: Integros [integros.com] Differential Revision: https://reviews.freebsd.org/D19094	2019-06-07 06:35:42 +00:00
Mark Johnston	c080655467	Fix a race between fasttrap and the user breakpoint handler. When disabling the last enabled userspace probe, fasttrap clears the function pointers which hook in to the breakpoint handler. If a traced thread hit a fasttrap breakpoint before it was removed, we must ensure that it is able to call the hook; otherwise fasttrap will not consume the trap and SIGTRAP will be delievered to the thread. Synchronize with such threads by ensuring that they load the hook pointer with interrupts disabled, and by completing an SMP rendezvous after removing breakpoints and before clearing the pointers. Reported by: Alexander Alexeev <Alexander.Alexeev@dell.com> Tested by: Alexander Alexeev (earlier version) Reviewed by: cem, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20526	2019-06-06 16:03:25 +00:00
Mariusz Zaborski	8da024d941	dtrace: 64-bits registers support The registers in ilumos and FreeBSD have a different number. In the illumos, last 32-bits register defined is SS an in FreeBSD is GS. This off-by-one caused the uregs array to returns the wrong 64-bits register on amd64. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20363	2019-06-05 22:29:05 +00:00
Alexander Motin	0b5319dda0	MFV r348585: 9683 Allow bypassing devid in vdev_disk_open() illumos/illumos-gate@6fe4f3002c Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Pavel Zakharov <pavel.zakharov@delphix.com> This is irrelevant to FreeBSD, just to reduce divergence.	2019-06-03 20:55:52 +00:00
Alexander Motin	9b048dd219	MFV r348583: 9847 leaking dd_clones (DMU_OT_DSL_CLONES) objects illumos/illumos-gate@17fb938fd6 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>	2019-06-03 20:49:20 +00:00
Alexander Motin	07a5c938c9	MFV r348578: 9962 zil_commit should omit cache thrash illumos/illumos-gate@cab3a55e15 Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Approved by: Joshua M. Clulow <josh@sysmgr.org> Author: Prakash Surya <prakash.surya@delphix.com>	2019-06-03 20:24:40 +00:00
Alexander Motin	a66a7143d4	MFV r348576: 9963 Seperate tunable for disabling ZIL vdev flush illumos/illumos-gate@f8fdf68125 Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com>	2019-06-03 20:05:43 +00:00
Alexander Motin	c9719c9a6d	MFV r348573: 9993 zil writes can get delayed in zio pipeline illumos/illumos-gate@2258ad0b75 Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: George Wilson <george.wilson@delphix.com>	2019-06-03 19:25:53 +00:00
Alexander Motin	4d6afba5e0	MFV r348555: 9690 metaslab of vdev with no space maps was flushed during removal illumos/illumos-gate@4e75ba6826 Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2019-06-03 19:03:24 +00:00
Alexander Motin	1b61262505	MFC r348554: 9688 aggsum_fini leaks memory illumos/illumos-gate@29bf2d68be Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Jorgen Lundman <lundman@lundman.net> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com>	2019-06-03 19:00:24 +00:00
Alexander Motin	677ef2563d	MFV r348553: 9681 ztest failure in spa_history_log_internal due to spa_rename() illumos/illumos-gate@6aee0ad769 Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2019-06-03 18:32:56 +00:00
Alexander Motin	74f7070445	MFV r348552: 9682 page fault in dsl_async_clone_destroy() while opening pool illumos/illumos-gate@ade2c82828 Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com>	2019-06-03 17:56:44 +00:00
Alexander Motin	2eff60e998	MFV r348551: 9862 fix typo in comment in vdev_impl.h illumos/illumos-gate@84927f52bd Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Robert Mustacchi <rm@joyent.com> Author: Allan Jude <allanjude@freebsd.org>	2019-06-03 17:44:47 +00:00
Alexander Motin	bd2ae688a4	MFV r348550: 1700 Add SCSI UNMAP support illumos/illumos-gate@047c81d31d Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Dan McDonald <danmcd@joyent.com> Author: Saso Kiselkov <saso.kiselkov@nexenta.com> This is irrelevant to FreeBSD, just a diff reduction.	2019-06-03 17:43:32 +00:00
Alexander Motin	d40f6a585a	MFV r348548: 9617 too-frequent TXG sync causes excessive write inflation illumos/illumos-gate@7928f4baf4 Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2019-06-03 17:40:11 +00:00
Alexander Motin	b3ed2d08e4	MFV r348537: 8601 memory leak in get_special_prop() illumos/illumos-gate@e19b450bec Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: John Gallagher <john.gallagher@delphix.com>	2019-06-03 17:29:57 +00:00
Alexander Motin	c066dcc074	MFV r348535: 9677 panic from zio_write_gang_block() when creating dump device on fragmented rpool illumos/illumos-gate@7341a7de4f Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Brad Lewis <brad.lewis@delphix.com>	2019-06-03 17:27:25 +00:00
Allan Jude	ad579d984f	Fix assertion in ZFS TRIM code Due to an attempt to check two conditions at once in a macro not designed as such, the assertion would always evaluate to true. #define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) do { \ const TYPE __left = (TYPE)(LEFT); \ const TYPE __right = (TYPE)(RIGHT); \ if (!(__left OP __right)) \ assfail3(#LEFT " " #OP " " #RIGHT, \ (uintmax_t)__left, #OP, (uintmax_t)__right, \ __FILE__, __LINE__); \ _NOTE(CONSTCOND) } while (0) #define ASSERT3U(x, y, z) VERIFY3_IMPL(x, y, z, uint64_t) Mean that we compared: left = (type == ZIO_TYPE_FREE \|\| psize) OP = "<=" right = (SPA_MAXBLOCKSIZE) If the type was not FREE, 0 is less than SPA_MAXBLOCKSIZE (16MB) If the type is ZIO_TYPE_FREE, 1 is less than SPA_MAXBLOCKSIZE The constraint on psize (physical size of the FREE operation) is never checked against SPA_MAXBLOCKSIZE Reported by: Ka Ho Ng <khng300@gmail.com> Reviewed by: kevans MFC after: 2 weeks Sponsored by: Klara Systems	2019-05-29 20:34:35 +00:00
Justin Hibbits	b2aea1ad8f	powerpc/dtrace: Fix fbt function probing for ELFv2 '.' function names exist only in ELFv1. ELFv2 does away with function descriptors, and look more like they do on powerpc(32) and most other platforms, as direct function pointers. Stop blacklisting regular function names in ELFv2. Submitted by: Brandon Bergren Differential Revision: https://reviews.freebsd.org/D20346	2019-05-27 03:18:56 +00:00
Alexander Motin	83d0c3846d	Allocate buffers smaller then ABD chunk size as linear. This allows to reduce memory waste by letting UMA to put multiple small buffers into one memory page slab. The page sharing means that UMA may not be able to free memory page when some of buffers are freed, but alternatively memory used by that buffer would just be wasted from the beginning. This change follows alike change in ZoL, but unlike Linux (according to my understanding of it from comments) FreeBSD never shares slabs bigger then one memory page, so this should be even less invasive then there. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-05-22 18:43:48 +00:00
Allan Jude	57e9b361ba	Fix typo in r348068	2019-05-21 21:39:03 +00:00
Allan Jude	4e7d8292ec	ZFS: Make deadman tunables no longer read-only This allows the user to enable, disable, and adjust the I/O deadman at runtime. This can be especially useful when a pool is backed by remote storage (such as iscsi, ggated, etc). PR: 221906 Submitted by: Fabian Keil <fk@fabiankeil.de> Obtained from: ElectroBSD MFC after: 1 week Sponsored by: Klara Systems Event: Waterloo Hackathon 2019	2019-05-21 21:26:18 +00:00
Conrad Meyer	e2e050c8ef	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
Justin Hibbits	d69b94bab0	powerpc/dtrace: Actually fix stack traces Fix stack unwinding such that requesting N stack frames in lockstat will actually give you N frames, not anywhere from 0-3 as had been before. lockstat prints the mutex function instead of the caller as the reported locker, but the stack frame is detailed enough to find the real caller. MFC after: 2 weeks	2019-05-17 19:57:08 +00:00
Konstantin Belousov	7c5a46a1bc	Remove resolver_qual from DEFINE_IFUNC/DEFINE_UIFUNC macros. In all practical situations, the resolver visibility is static. Requested by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: so (emaste) Differential revision: https://reviews.freebsd.org/D20281	2019-05-16 22:20:54 +00:00
Alexander Motin	7763842174	Add mutex_destroy() missed in r334844. MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-04-26 19:02:21 +00:00
Alexander Motin	32d8034f77	Fix minor mismerges. No functional change. MFC after: 1 week	2019-04-26 18:25:59 +00:00
Alexander Motin	48ecceba1e	Change the way FreeBSD GID inheritance is hacked. I believe previous ifdef caused NULL dereference in later zfs_log_create() on attempt to create file inside directory belonging to ephemeral group created on illumos, trying to write to log information about GID domain of the newly created file, inheriting the ephemeral GID. This patch reuses original illumos SGID code with exception that due to lack of ID mapping code on FreeBSD ephemeral GID will turn into GID_NOBODY by another ifdef inside zfs_fuid_map_id(). MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-04-19 15:44:45 +00:00
Justin Hibbits	e9aae3496e	powerpc/dtrace: Fix dtrace powerpc asm, and simplify stack walking Fix some execution bugs in the dtrace powerpc asm. addme pulls in the carry flag which we don't want, and the result wasn't recorded anyways, so the following beq to check for exit condition wasn't checking the right condition. Simplify the stack walking in dtrace_isa.c, so there's only a single walker that handles both pc and sp. This should make it easier to follow, and any bugfix that may be needed for walking only needs to be made in one place instead of two now. MFC after: 2 weeks	2019-04-13 03:32:21 +00:00
Mariusz Zaborski	a1304030b8	Introduce funlinkat syscall that always us to check if we are removing the file associated with the given file descriptor. Reviewed by: kib, asomers Reviewed by: cem, jilles, brooks (they reviewed previous version) Discussed with: pjd, and many others Differential Revision: https://reviews.freebsd.org/D14567	2019-04-06 09:34:26 +00:00
Conrad Meyer	a8a16c7128	Replace read_random(9) with more appropriate arc4rand(9) KPIs Reviewed by: ae, delphij Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19760	2019-04-04 01:02:50 +00:00
Pawel Jakub Dawidek	af4f9e5f00	If the autoexpand pool property is turned on and vdev is healthy try to expand the pool automatically when we detect underlying GEOM provider size change. Obtained from: Fudo Security Tested in: AWS	2019-03-30 07:29:20 +00:00
Andriy Gapon	76a63ee510	Revert r345410, VOP_FSYNC change in ZFS vdev_file I overlooked the fact that that VOP_FSYNC() call is not a FreeBSD VFS call, but a macro that provides an illumos-compatible wrapper for the FreeBSD operation. PR: 236475 Reported by: lwhsu Pointyhat to: avg	2019-03-22 17:44:47 +00:00
Andriy Gapon	c1581f4f4d	ZFS vdev_file: use correct value for waitfor parameter of VOP_FSYNC PR: 236475 Reported by: asomers MFC after: 2 weeks	2019-03-22 09:11:45 +00:00
Mark Johnston	a4b59d3db6	Use an explicit comparison with VM_GUEST_NO. Reported by: jhb MFC with: r345359 Sponsored by: The FreeBSD Foundation	2019-03-21 20:07:50 +00:00
Mark Johnston	e362e590f9	Don't attempt to measure TSC skew when running as a VM guest. It simply doesn't work in general since VCPUs may migrate between physical cores. The approach used to measure skew also doesn't make much sense in a VM. PR: 218452 MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-03-21 02:52:22 +00:00
Alexander Motin	6bb46107d8	MFV r336930: 9284 arc_reclaim_thread has 2 jobs `arc_reclaim_thread()` calls `arc_adjust()` after calling `arc_kmem_reap_now()`; `arc_adjust()` signals `arc_get_data_buf()` to indicate that we may no longer be `arc_is_overflowing()`. The problem is, `arc_kmem_reap_now()` can take several seconds to complete, has no impact on `arc_is_overflowing()`, but due to how the code is structured, can impact how long the ARC will remain in the `arc_is_overflowing()` state. The fix is to use seperate threads to: 1. keep `arc_size` under `arc_c`, by calling `arc_adjust()`, which improves `arc_is_overflowing()` 2. keep enough free memory in the system, by calling `arc_kmem_reap_now()` plus `arc_shrink()`, which improves `arc_available_memory()`. illumos/illumos-gate@de753e34f9 Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@joyent.com> Reviewed by: Tim Kordas <tim.kordas@joyent.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Brad Lewis <brad.lewis@delphix.com>	2019-03-15 18:59:04 +00:00
Simon J. Gerraty	f5fdf82d82	Add _PC_ACL_* to vop_stdpathconf This avoid EINVAL from tmpfs etc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D19512	2019-03-11 20:40:56 +00:00
Alexander Motin	aa8676f25d	Revert minor part of r344934. I tried to save some CPU time on hopeless aggregation attempts, but it seems the condition I added is overly strict, blocking also aggregation of optional I/Os in cases which previously were possible. Revert just to be safe. MFC after: 1 month	2019-03-11 17:39:09 +00:00
Alexander Motin	5ca679e3c4	MFV/ZoL: Disable LBA weighting on files and SSDs The LBA weighting makes sense on rotational media where the outer tracks have twice the bandwidth of the inner tracks. However, it is detrimental on nonrotational media such as solid state disks, where the only effect is to ensure that metaslabs enter the best-fit allocation behavior sooner, which is detrimental to performance. It also makes no sense on files where the underlying filesystem can arrange things however it wants. Author: Richard Yao <ryao@gentoo.org> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3712 zfsonlinux/zfs@fb40095f5f To reduce code divergence this merge replaces equivalent but different FreeBSD code detecting non-rotating medium vdevs. MFC after: 1 month	2019-03-08 21:13:45 +00:00
Alexander Motin	673544c3dd	Add separate aggregation limit for non-rotating media. Before sequential scrub patches ZFS never aggregated I/Os above 128KB. Sequential scrub bumped that to 1MB, which motivation I understand for spinning disks, since it should reduce number of head seeks. But for SSDs it makes much less sense to me, especially on FreeBSD, where due to MAXPHYS limitation device will likely still see bunch of 128KB I/Os instead of one large. Having more strict aggregation limit allows to avoid allocation of large memory buffer and memcpy to/from it, that is a serious problem when bandwidth reaches few GB/s. MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-03-08 19:38:52 +00:00
Alexander Motin	3a3ba532e7	MFV/ZoL: Fix zfs_vdev_aggregation_limit bounds checking Update the bounds checking for zfs_vdev_aggregation_limit so that it has a floor of zero and a maximum value of the supported block size for the pool. Additionally add an early return when zfs_vdev_aggregation_limit equals zero to disable aggregation. For very fast solid state or memory devices it may be more expensive to perform the aggregation than to issue the IO immediately. Author: Brian Behlendorf <behlendorf1@llnl.gov> zfsonlinux/zfs@a58df6f536 MFV/ZoL: Cap maximum aggregate IO size Commit 8542ef8 allowed optional IOs to be aggregated beyond the specified aggregation limit. Since the aggregation limit was also used to enforce the maximum block size, setting `zfs_vdev_aggregation_limit=16777216` could result in an attempt to allocate an ABD larger than 16M. Author: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6259 Closes #6270 zfsonlinux/zfs@2d678f779a	2019-03-08 18:49:27 +00:00
Alexander Motin	ede8782611	Improve entropy for ZFS taskqueue selection. I just found that at least on Skylake CPUs cpu_ticks() never returns odd values, only even, and possibly has even bigger step (176/2?), that makes its lower bits very bad entropy source, leaving half of taskqueues unused. Switch to sbinuptime(), closer to upstreams, mitigates the problem by the rate conversion working as kind of hash function. In case that is somehow not enough (timer rate is too low or too divisible) mix in curcpu. MFC after: 1 week	2019-03-07 22:56:39 +00:00
Alexander Motin	551b7d3a29	Add respective tunables to few ZFS sysctls. MFC after: 1 week	2019-03-07 01:24:08 +00:00
Pawel Jakub Dawidek	b8da50d526	Improve readability of the code by making it explicit where the 'c' variable starts. It is also more consistent with similar code in this file.	2019-03-01 05:54:13 +00:00
Mark Johnston	8e7127fd91	Fix fasttrap_sig{trap,segv}(). - Don't leak the ksiginfo structure. - Hold the proc lock when sending a signal in fasttrap_sigsegv(). MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-02-26 18:20:41 +00:00
Mark Johnston	5563c675b3	Revert r344587. The fasttrap_isa.h header is needed by libdtrace, not just the kernel.	2019-02-26 17:33:56 +00:00
Mark Johnston	df59ed0787	Remove illumos-specific code from the x86 fasttrap_isa.c. The file has not been touched upstream in over a decade, and the nature of the code means that a lot of FreeBSD-specific bits are required. Remove the dead code to improve readability. No functional change intended. Discussed with: cem MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-02-26 16:34:43 +00:00
Mark Johnston	6829dae12b	Remove stub fasttrap implementations. No platforms except i386, amd64 and powerpc implement fasttrap; the fasttrap files for other arches do not contain any code and bloat the output from cscope, so just remove them. MFC after: 1 week	2019-02-26 16:31:47 +00:00
Mark Johnston	f23e684bbf	Commit a missing piece of r344452. MFC with: r344452	2019-02-21 22:56:54 +00:00
Mark Johnston	4f1b715c84	Fix a tracepoint lookup race in fasttrap_pid_probe(). fasttrap hooks the userspace breakpoint handler; the hook looks up the breakpoint address in a hash table of tracepoints. It is possible for the tracepoint to be removed by a different thread in between the breakpoint trap and the hash table lookup, in which case SIGTRAP gets delivered to the target process. Fix the problem by adding a per-process generation counter that gets incremented when a tracepoint belonging to that process is removed. Then, when a lookup fails, the trapping instruction is restarted if the thread's counter doesn't match that of the process. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19273	2019-02-21 22:54:17 +00:00
Pawel Jakub Dawidek	2691ae3230	Simplify the code. No functional changes. Reviewed by: rpokala	2019-02-20 00:25:45 +00:00
Pawel Jakub Dawidek	91853b8546	Simplify the code.	2019-02-19 23:53:33 +00:00
Pawel Jakub Dawidek	01e21ead90	Correct typo in the comment.	2019-02-19 23:44:00 +00:00
Pawel Jakub Dawidek	99ab63b69d	Change assertion to log the incorrect io_type we've got.	2019-02-19 23:43:15 +00:00
Pawel Jakub Dawidek	36d43b5dfe	Grabage-collect no longer used variable.	2019-02-19 23:41:23 +00:00
Pawel Jakub Dawidek	11c8759337	The way ZFS searches for its vdevs is the following: first it looks for a vdev that has the same name as the one stored in metadata and that has all VDEV labels in place. If it cannot find a GEOM provider with the given name and all VDEV labels it will scan all GEOM providers for the best match (the most VDEV labels available), but here the name is ignored. In case the ZFS pool is created, eg. using GPT partition label: # zpool create tank /dev/gpt/tank everything works, and on every import ZFS will pick /dev/gpt/tank and not /dev/da0p4. The problem occurs when da0p4 is extended and ZFS is unable to find all VDEV labels in /dev/gpt/tank anymore (the VDEV labels stored at the end of the partition are now somewhere else). In this case it will scan all GEOM providers and will pick the first one with the best match, ie. da0p4. Fix this problem by checking the VDEV/provider name even if we get the same match. If the name is the same as the one we have in pool's metadata, prefer this GEOM provider. Reported by: oshogbo, Michal Mroz <m.mroz@fudosecurity.com> Tested by: Michal Mroz <m.mroz@fudosecurity.com> Obtained from: Fudo Security	2019-02-19 23:35:55 +00:00
Pawel Jakub Dawidek	d793cf7019	In the vdev_geom_open_by_path() function we assume that vdev path starts with "/dev/". Make sure this is the case.	2019-02-19 23:22:39 +00:00
Alexander Motin	ed0a3e8637	s/Maximal/Maximum/ in sysctl description. Submitted by: smh MFC after: 1 week	2019-02-04 20:09:22 +00:00
Alexander Motin	ef08154150	Add missed tunables/sysctls for some new vdev variables. While there, make few existing sysctls writeable, since there is no reason not to. MFC after: 1 week	2019-02-04 16:13:41 +00:00
Alexander Motin	54cde30f92	Remove BIO_ORDERED flag from BIO_FLUSH sent by ZFS. In all cases where ZFS sends BIO_FLUSH, it first waits for all related writes to complete, so its BIO_FLUSH does not care about strict ordering. Removal of one makes life much easier at least for NVMe driver, which hardware has no concept of request ordering, relying completely on software. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-01-30 17:39:44 +00:00
Mariusz Zaborski	db009dddfd	zfs: allow to change cache flush sysctl There is no reason for this variable to be tunable. This variable is used as a barrier in few places. Discussed with: pjd MFC after: 2 weeks Sponsored by: Fudo Security	2019-01-26 13:53:00 +00:00
Sean Eric Fagan	82e20c0a72	Change ZFS quotas to return EINVAL when not present (matches man page). UFS will return EINVAL when quotas are not enabled on a filesystem; ZFS' equivalent involves not having quotas (there is not way to enable or disable quotas as such). My initial implementation had it return ENOENT, but quotactl(2) indicates EINVAL is more appropriate. MFC after: 2 weeks Approved by: mav Reviewed by: markj Reported by: Emrion <kmachine@free.fr> Sponsored by: iXsystems Inc PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234413	2019-01-11 02:53:46 +00:00
Matt Macy	27e05a1902	zfsboot: support newer ZFS versions declare v3 objset size/layout to fix userboot and possibly other loader issues - fix for userboot assertion failure in zfs_dev_close in free due to out of bounds write - fix for zfs_alloc / zfs_free mismatch assertion failure when booting GPT on BIOS	2019-01-03 22:49:11 +00:00
Andriy Gapon	4c325393f3	MFV r342532: 5882 Temporary pool names Note that this commit brings only formatting changes that were done during the final review of the illumos change, because FreeBSD got the main changes before illumos. illumos/illumos-gate@04e5635652 `04e5635652` https://www.illumos.org/issues/5882 This is an import of the temporary pool names functionality from ZoL: `e2282ef57e` `26b42f3f9d` `2f3ec90061` `00d2a8c92f` `83e9986f6e` `023bbe6f01` It is intended to assist the creation and management of virtual machines that have their rootfs on ZFS on hosts that also have their rootfs on ZFS. These situations cause SPA namespace collisions when the standard name rpool is used in both cases. The solution is either to give each guest pool a name unique to the host, which is not always desireable, or boot a VM environment containing an ISO image to install it, which is cumbersome. MFC after: 1 week Sponsored by: Panzura	2018-12-26 11:03:14 +00:00
Andriy Gapon	f050611e7f	MFV r342469: 9630 add lzc_rename and lzc_destroy to libzfs_core illumos/illumos-gate@049ba636fa `049ba636fa` https://www.illumos.org/issues/9630 Rename and destroy are very useful operations that deserve to be in libzfs_core. And they are not hard to implement too. MFC after: 2 weeks Relnotes: maybe	2018-12-26 10:37:41 +00:00
Mateusz Guzik	8ca79fbd4a	dtrace: fix userspace access on boxes with SMAP dtrace has its own routines which were not updated after SMAP support got implemented. Use ifunc just like for other routines. This in particular fixes ustack(). Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18542	2018-12-13 20:09:38 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Toomas Soome	7aaf685ba7	zfs: we can boot from dataset with large_dnode enabled loader has been supporting large_dnode for some time, no need to block the feature for boot dataset. Reviewed by: avg MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D18391	2018-12-03 19:35:21 +00:00
Mark Johnston	6d2e2df764	Ensure that directory entry padding bytes are zeroed. Directory entries must be padded to maintain alignment; in many filesystems the padding was not initialized, resulting in stack memory being copied out to userspace. With the ino64 work there are also some explicit pad fields in struct dirent. Add a subroutine to clear these bytes and use it in the in-tree filesystems. The NFS client is omitted for now as it was fixed separately in r340787. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-11-23 22:24:59 +00:00
Alexander Motin	eecd0a1856	Revert r340096: 9952 Block size change during zfs receive drops spill block It was reported, and I easily reproduced it, that this change triggers panic when receiving replication stream with enabled embedded blocks, when short file compressing into one embedded block changes its block size. I am not sure that the problem is in this particuler patch, not just triggered by it, but since investigation and fix will take some time, I've decided to revert this for now. PR: 198457, 233277	2018-11-21 18:18:57 +00:00
Mark Johnston	544e0a4f69	Use taskqueue_quiesce(9) to implement taskq_wait(). PR: 227784 Reviewed by: cem MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17975	2018-11-21 17:19:08 +00:00
Justin Hibbits	cfebc0faa7	DTrace/powerpc: Fix FBT return probes The FBT fuction boundary prober was setting one return probe marker value, but the dtrace handler was expecting another. This causes a hang when tracing return probes.	2018-11-21 16:47:11 +00:00

1 2 3 4 5 ...

2290 Commits