freebsd-dev

Author	SHA1	Message	Date
Andriy Gapon	69bac03666	MFV r308990: 7181 race between zfs_mount and zfs_ioc_rollback illumos/illumos-gate@90f2c094b3 `90f2c094b3` https://www.illumos.org/issues/7181 zfsvfs_setup() is called in both zfs_mount and zfs_resume_fs paths. dmu_objset_set_user(zfsvfs->z_os, zfsvfs) is called early in zfsvfs_setup() before the setup is actually completed, thus an under-constructed zfsvfs becomes visible. Additionally, there is nothing to serialize the two call paths. As a result two threads can step on each other's toes. assertion failed: zilog->zl_clean_taskq == NULL, file: ../../common/fs/zfs/zil.c, line: 1772 > $c vpanic() 0xfffffffffbdf6928() zil_open+0x45(ffffff1bbc5dd000, fffffffff7993880) zfsvfs_setup+0x84(ffffffb378d77000, 0) zfs_resume_fs+0x132(ffffffb378d77000, ffffffb37ddcf000) zfs_ioc_rollback+0x96(ffffffb37ddcf000, ffffff01dcdc4cd0, ffffff01aa091000) zfsdev_ioctl+0x215(10a00000000, 5a19, 80465f8, 100003, ffffff01ab318368, ffffff0004b59e58) cdev_ioctl+0x39(10a00000000, 5a19, 80465f8, 100003, ffffff01ab318368, ffffff0004b59e58) spec_ioctl+0x60(ffffff0197737700, 5a19, 80465f8, 100003, ffffff01ab318368, ffffff0004b59e58) fop_ioctl+0x55(ffffff0197737700, 5a19, 80465f8, 100003, ffffff01ab318368, ffffff0004b59e58) ioctl+0x9b(7, 5a19, 80465f8) sys_syscall32+0x1f7() > ffffff1bbc5dd000::print objset_t os_zil os_zil = 0xffffff1c053cf7c0 > 0xffffff1c053cf7c0::print zilog_t zl_clean_taskq Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Author: Andriy Gapon <andriy.gapon@clusterhq.com> MFC after: 2 weeks	2016-11-24 10:34:42 +00:00
Andriy Gapon	b55ae64b50	MFV r308988: 7199, 7200 dsl_dataset_rollback_sync may try to free already free blocks 7199 dsl_dataset_rollback_sync may try to free already free blocks 7200 no blocks must be born in a txg after a snapshot is created illumos/illumos-gate@bfaed0b91e `bfaed0b91e` https://www.illumos.org/issues/7199 dsl_dataset_rollback_sync may try to free already freed blocks when it calls dsl_destroy_head_sync_impl to destroy a temporary clone. That happens if a snapshot to which we are rolling back and from which the clone is created has some ZIL records. https://www.illumos.org/issues/7200 No new blocks must be born in a dataset in the same TXG after a snapshot of the dataset is taken. Those blocks would have the same blk_birth as the dataset's ds_prev_snap_txg and as such they would be presumed to belong to the snapshot while in fact they do not. All the datasets must be clean before sync tasks are run, so the described scenario may happen only if one of the sync tasks dirties the dataset and another sync task takes its snapshot. Then, there will be another sync pass because of the dirty data and the new blocks will be born in the same TXG when the data is written out. It seems that almost all of the existing sync tasks modify only MOS and do not dirty any objsets. The only exception that I've been able to identify so far is the rollback which can modify an objset when it zeroes out the objset's ZIL. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com> Author: Andriy Gapon <andriy.gapon@clusterhq.com> MFC after: 3 weeks	2016-11-24 10:29:21 +00:00
Andriy Gapon	239c22b73d	MFV r308987: 7180 potential race between zfs_suspend_fs+zfs_resume_fs and zfs_ioc_rename illumos/illumos-gate@690041b9ca `690041b9ca` https://www.illumos.org/issues/7180 If a filesystem is not unmounted while the rename is being performed, then, for example, a concurrent zfs rollback may call zfs_suspend_fs followed by zfs_resume_fs on the same filesystem. The latter takes the filesystem's name as an argument. If the filesystem name changes as a result of the rename, then dmu_objset_hold(osname, zfsvfs, &os) call in zfs_resume_fs would fail resulting in a kernel panic. So far I have been able to reproduce this problem on FreeBSD where zfs rename has -u option that skips the unmounting before doing the renaming. But I think that in theory the same problem can occur on illumos as well, because the unmounting is done in userland before invoking the rename ioctl and there could be a race with, e.g., zfs mount. panic: solaris assert: dmu_objset_hold(osname, zfsvfs, &zfsvfs->z_os) == 0 (0x2 == 0x0), file: /usr/devel/svn/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c, line: 2210 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004df30710 vpanic() at vpanic+0x182/frame 0xfffffe004df30790 panic() at panic+0x43/frame 0xfffffe004df307f0 assfail3() at assfail3+0x2c/frame 0xfffffe004df30810 zfs_resume_fs() at zfs_resume_fs+0xb9/frame 0xfffffe004df30860 zfs_ioc_rollback() at zfs_ioc_rollback+0x61/frame 0xfffffe004df308a0 zfsdev_ioctl() at zfsdev_ioctl+0x65c/frame 0xfffffe004df30940 devfs_ioctl_f() at devfs_ioctl_f+0x156/frame 0xfffffe004df309a0 kern_ioctl() at kern_ioctl+0x246/frame 0xfffffe004df30a00 sys_ioctl() at sys_ioctl+0x171/frame 0xfffffe004df30ae0 amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe004df30bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe004df30bf0 Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> MFC after: 2 weeks	2016-11-24 10:21:22 +00:00
Andriy Gapon	d15b9428bb	further fix zfs_lock() diagnostics It was very wrong to look at the vnode and znode internals without having locked the vnode first. Reported by: pho Tested by: pho MFC after: 1 week X-MFC with: r308887	2016-11-24 09:00:51 +00:00
George V. Neville-Neil	cdaa8777f7	Add tunable to disable destructive dtrace Submitted by: Joerg Pernfuss <code.jpe@gmail.com> Reviewed by: rstone, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8624	2016-11-23 22:50:20 +00:00
Alan Cox	bba39b9ae3	Remove PG_CACHED-related fields from struct vmmeter, because they are no longer used. More precisely, they are always zero because the code that decremented and incremented them no longer exists. Bump __FreeBSD_version to mark this change. Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8583	2016-11-22 18:13:46 +00:00
Andriy Gapon	17055fcda7	fix unsafe modification of zfs_vnodeops when DIAGNOSTIC is enabled The idea was to avoid a false assertion in zfs_lock, but it was implemented very dangerously and incorrectly. Reported by: pho Tested by: pho MFC after: 1 week	2016-11-20 14:00:50 +00:00
Andriy Gapon	2ec31e84cc	zfs: fix up after the removal of PG_CACHED pages in r308691 PR: 214629 Reported by: mshirk@daemon-security.com Reviewed by: alc Tested by: Shawn Webb <shawn.webb@hardenedbsd.org> X-MFC with: 308691	2016-11-19 08:12:57 +00:00
Mark Johnston	188011dbf2	Support fetching RFLAGS in fasttrap_getreg(). MFC after: 1 week	2016-11-18 03:11:11 +00:00
Alexander Motin	14b5719f6a	After some ZIL changes 6 years ago zil_slog_limit got partially broken due to zl_itx_list_sz not updated when async itx'es upgraded to sync. Actually because of other changes about that time zl_itx_list_sz is not really required to implement the functionality, so this patch removes some unneeded broken code and variables. Original idea of zil_slog_limit was to reduce chance of SLOG abuse by single heavy logger, that increased latency for other (more latency critical) loggers, by pushing heavy log out into the main pool instead of SLOG. Beside huge latency increase for heavy writers, this implementation caused double write of all data, since the log records were explicitly prepared for SLOG. Since we now have I/O scheduler, I've found it can be much more efficient to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG. Existing ZIL implementation had problem with space efficiency when it has to write large chunks of data into log blocks of limited size. In some cases efficiency stopped to almost as low as 50%. In case of ZIL stored on spinning rust, that also reduced log write speed in half, since head had to uselessly fly over allocated but not written areas. This change improves the situation by offloading problematic operations from z_log_write() to zil_lwb_commit(), which knows real situation of log blocks allocation and can split large requests into pieces much more efficiently. Also as side effect it removes one of two data copy operations done by ZIL code WR_COPIED case. While there, untangle and unify code of z_log_write() functions. Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing block boundary, that may also improve efficiency if ZPL is made to do that. Sponsored by: iXsystems, Inc.	2016-11-17 21:01:27 +00:00
Alexander Motin	eb9bfc257d	Revert r307392: I've found a way to avoid big allocations completely.	2016-11-17 20:44:51 +00:00
Alan Cox	7667839a7e	Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497	2016-11-15 18:22:50 +00:00
Mark Johnston	375c8b20dc	Remove the DTrace printt and typeref actions. These are FreeBSD-specific and were added in r178576 to provide the ability to pretty-print instances of compound types. However, the print action has long since been augmented to provide this functionality with a simpler interface. Discussed with: gnn Differential Revision: https://reviews.freebsd.org/D8478	2016-11-12 19:26:12 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Oleksandr Tymoshenko	d30e308465	Fix include order as required post r308415	2016-11-07 20:02:18 +00:00
Alexander Motin	8acf168aab	Fix ZIL records ordering when ZVOL opened both with and without FSYNC. Before this an earlier writes to a ZVOL opened without FSYNC could get to ZIL after later writes to the same ZVOL opened with FSYNC. Fix this by replicating functionality of ZPL (zv_sync_cnt equivalent to z_sync_cnt), marking all log records sync if anybody opened the ZVOL with FSYNC. MFC after: 2 weeks	2016-11-01 16:03:31 +00:00
Alexander Motin	2d1d8f4c8f	Pass to zvol_log_truncate() same sync values as to zvol_log_write(). Surplus marking of TX_TRUNCATE records as sync could result in putting them into ZIL before previous writes if ones were async. MFC after: 2 weeks	2016-11-01 12:47:19 +00:00
Alexander Motin	74a148f46f	Add sysctls for zfs_immediate_write_sz and zvol_immediate_write_sz.	2016-10-29 23:25:12 +00:00
Andriy Gapon	97371ba2a9	zfsbootcfg: a simple tool to set next boot (one time) options for zfsboot (gpt)zfsboot will read one-time boot directives from a special ZFS pool area. The area was previously described as "Boot Block Header", but currently it is known as Pad2, marked as reserved and is zeroed out on pool creation. The new code interprets data in this area, if any, using the same format as boot.config. The area is immediately wiped out. Failure to parse the directives results in a reboot right after the cleanup. Otherwise the boot sequence proceeds as usual. zfsbootcfg writes zfsboot arguments specified on its command line to the Pad2 area of a disk identified by vfs.zfs.boot.primary_pool and vfs.zfs.boot.primary_vdev kenv variables that are set by loader during boot. Please see the manual page for more. Thanks to all who reviewed, contributed and made suggestions! There are many potential improvements to the feature, please see the review for details. Reviewed by: wblock (docs) Discussed with: jhb, tsoome MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D7612	2016-10-29 14:09:32 +00:00
Alexander Motin	471cf6ce7d	Add vdev_reopening support to vdev_geom. It allows to avoid extra GEOM providers flapping without significant need. Since GEOM got resize support, we don't need to reopen provider to get new size. If provider was orphaned and no longer valid, ZFS should already know that, and in such case reopen should be done in full as expected. MFC after: 2 weeks	2016-10-28 17:05:14 +00:00
Alexander Motin	f106f43aa2	Matching GUIDs, handle possible race on vdev detach. In case of vdev detach, causing top level mirror vdev destruction, leaf vdev changes its GUID to one of the destroyed mirror, that creates race condition when GUID in vdev label may not match one in the pool config. This change replicates logic nuance of vdev_validate() by adding special exception, matching the vdev GUID against the top level vdev GUID. Since this exception is not completely reliable (may give false positives if we fail to erase label on detached vdev), use it only as last resort. Quick way to reproduce this scenario now is detach vdev from a pool with enabled autoextend. During vdev detach autoextend logic tries to reopen remaining vdev, that always fails now since in-memory configuration is already updated, while on-disk labels are not yet. MFC after: 2 weeks	2016-10-28 16:21:31 +00:00
Alexander Motin	4be4cba048	Improve few debugging log messages.	2016-10-28 15:30:10 +00:00
Andriy Gapon	539fc86f2e	3746 ZRLs are racy illumos/illumos-gate@260af64db7 `260af64db7` https://www.illumos.org/issues/3746 From the original change log: It was possible for a reference to be added even with the lock held, and for references added just after a lock release to be lost. This bug was also independently found and reported in wesunsolve.net issues 6985013 6995524. In zrl_add(), always use an atomic operation to update the refcount. The mutex in the ZRL only guarantees that wakeups occur for waiters on the lock. It offers no protection against concurrent updates of the refcount. The only refcount transition that is safe to perform without an atomic operation is from ZRL_LOCKED back to 0, since this can only be performed by the thread which has the ZRL locked. Authored by: Will Andrews <will@freebsd.org> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Reviewed by: Pavel Zakharov <pavel.zakha@gmail.com> Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Justin T. Gibbs <gibbs@scsiguy.com> Approved by: Matt Ahrens <mahrens@delphix.com> Author: Youzhong Yang <yyang@mathworks.com> PR: 204037 MFC after: 1 week	2016-10-27 07:38:07 +00:00
Alexander Motin	f0cbbdecbc	Fix panic after ZVOL renamed to name invalid for DEVFS. MFC after: 2 weeks	2016-10-24 12:24:24 +00:00
Alexander Motin	9be66df1e1	Add vfs.zfs.zil_log_limit sysctl. It is at least partially broken now, but that is another question.	2016-10-16 18:49:15 +00:00
Alexander Motin	a059d8ccbc	Optimize ZIL itx memory allocation on FreeBSD. These allocations can reach up to 128KB, while FreeBSD kernel allocator can cache allocations only up to 64KB. To avoid expensive allocations for each large ZIL write use caching zio_buf_alloc() allocator instead. To make it possible de-inline few instances of zil_itx_destroy().	2016-10-16 10:43:12 +00:00
Alexander Motin	1899e205d1	MFV r307314: 6988 spa_sync() spends half its time in dmu_objset_do_userquota_updates Using a benchmark which creates 2 million files in one TXG, I observe that the thread running spa_sync() is on CPU almost the entire time we are syncing, and therefore can be a performance bottleneck. About 50% of the time in spa_sync() is in dmu_objset_do_userquota_updates(). The problem is that dmu_objset_do_userquota_updates() calls zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was modified (or created). In this benchmark, all the files are owned by the same user/group, so all 2 million calls to zap_increment_int() are modifying the same entry in the zap. The same issue exists for the DMU_GROUPUSED_OBJECT. We should keep an in-memory map from user to space delta while we are syncing, and when we finish, iterate over the in-memory map and modify the ZAP once per entry. This reduces the number of calls to zap_increment_int() from "number of objects modified" to "number of owners/groups of modified files". This reduced the time spent in spa_sync() in the file create benchmark by ~33%, from 11 seconds to 7 seconds. Closes #107 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Jinshan Xiong <jinshan.xiong@intel.com> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@5fc46359c5	2016-10-14 12:03:04 +00:00
Alexander Motin	b3a8b04807	MFV r307313: 5120 zfs should allow large block/gzip/raidz boot pool (loader project) Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Toomas Soome <tsoome@me.com> openzfs/openzfs@c8811bd3e2 FreeBSD still does not support booting from gzip-compressed datasets, so keep one chunk of this commit out.	2016-10-14 12:01:33 +00:00
Konstantin Belousov	5975e53d40	Fix a race in vm_page_busy_sleep(9). Suppose that we have an exclusively busy page, and a thread which can accept shared-busy page. In this case, typical code waiting for the page xbusy state to pass is again: VM_OBJECT_WLOCK(object); ... if (vm_page_xbusied(m)) { vm_page_lock(m); VM_OBJECT_WUNLOCK(object); <---1 vm_page_busy_sleep(p, "vmopax"); goto again; } Suppose that the xbusy state owner locked the object, unbusied the page and unlocked the object after we are at the line [1], but before we executed the load of the busy_lock word in vm_page_busy_sleep(). If it happens that there is still no waiters recorded for the busy state, the xbusy owner did not acquired the page lock, so it proceeded. More, suppose that some other thread happen to share-busy the page after xbusy state was relinquished but before the m->busy_lock is read in vm_page_busy_sleep(). Again, that thread only needs vm_object lock to proceed. Then, vm_page_busy_sleep() reads busy_lock value equal to the VPB_SHARERS_WORD(1). In this case, all tests in vm_page_busy_sleep(9) pass and we are going to sleep, despite the page being share-busied. Update check for m->busy_lock == VPB_UNBUSIED in vm_page_busy_sleep(9) to also accept shared-busy state if we only wait for the xbusy state to pass. Merge sequential if()s with the same 'then' clause in vm_page_busy_sleep(). Note that the current code does not share-busy pages from parallel threads, the only way to have more that one sbusy owner is right now is to recurse. Reported and tested by: pho (previous version) Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8196	2016-10-13 14:41:05 +00:00
Konstantin Belousov	f71d08566c	Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. Reported and tested by: jkim Discussed with: mjg Sponsored by: The FreeBSD Foundation	2016-10-07 11:38:28 +00:00
Andriy Gapon	6f98c83306	implement zfs_vptocnp() using z_parent property This should allow vn_fullpath() to work even when vfs name cache is disabled for zfs, which is the case when zfs properties like casesensitivity and normalization are set non-default values. The new code should be 100% reliable for directories and "mostly" reliable for files, that is, when hardlinks across directories are not used. Reported by: Frederic Chardon <chardon.frederic@gmail.com> Reviewed by: kib (vfs contract) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8146	2016-10-07 06:29:24 +00:00
Andriy Gapon	9ba3abc30e	zfs: fix a wrong assertion for extended attributes For the extended attributes the order between z_teardown_lock and the vnode lock is different. The bug was triggered only with DIAGNOSTIC turned on. This fix is developed in cooperation with avos. PR: 213112 Reported by: avos Tested by: avos MFC after: 1 week	2016-10-04 08:09:25 +00:00
Mark Johnston	4538cee5bf	Allow tracing of functions prefixed by "__". This restriction was inherited from upstream but is not relevant on FreeBSD. Furthermore, it hindered the tracing of locking primitive subroutines. MFC after: 1 week	2016-10-02 00:35:00 +00:00
Alexander Motin	863ef2ca62	Add #ifdef _KERNEL around send_holes_without_birth_time sysctl. Reported by: avg@	2016-09-29 17:48:53 +00:00
Alexander Motin	226a11f81e	MFV r306423: 7402 Create tunable to ignore hole_birth feature Until we can resolve the numerous hole_birth bugs that have cropped up recently, and come up with a way going forwards to protect users from corruption, we should disable the hole_birth feature. Using a tunable allows those who are confident that their data is correct to continue to take advantage of the feature. Closes #188 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>	2016-09-29 00:00:37 +00:00
Alexander Motin	bb97118138	MFV r306422: 7254 ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedobjs dsl_dataset_space is looking at the ds_bp's fill count while dmu_objset_write_ready() is concurrently modifying it. This fix adds an rrwlock to protect the ds_bp. Closes #180 Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>	2016-09-28 23:54:47 +00:00
Mark Johnston	9e579a58c3	Move implementations of uread() and uwrite() to the illumos compat layer. MFC after: 1 week	2016-09-24 21:40:14 +00:00
Andriy Gapon	d26312a4e4	fix vnode lock assertion for extended attributes directory Background. In ZFS a file with extended attributes has a special directory associated with it where each extended attribute is a file. The attribute's name is a file name and its value is a file content. When the ownership of a file with extended attributes is changed, ZFS also changes ownership of the special directory. This is where the bug was hit. The bug was introduced in r209158. Nota bene. ZFS vnode locks are typically acquired before z_teardown_lock (i.e., before ZFS_ENTER). But this is not the case for the vnodes that represent the extended attribute directory and files. Those are always locked after ZFS_ENTER. This is confusing and fragile. PR: 212702 Reported by: Christian Fuss to FreeNAS Tested by: mav MFC after: 1 week	2016-09-24 08:13:15 +00:00
Mark Johnston	36f5d07745	Re-check the systrace probe ID before calling dtrace_probe(). Otherwise there exists a narrow window during which a syscall probe can be disabled and cause a concurrently-running thread to call dtrace_probe() with an invalid probe ID. Reported by: ngie MFC after: 1 week Sponsored by: Dell EMC Isilon	2016-09-22 23:22:53 +00:00
Allan Jude	c2b475d0ee	MFV r268120: 4936 lz4 could theoretically overflow a pointer with a certain input illumos/illumos-gate@58d0718061 Reviewed by: delphij MFC after: 2 weeks Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D7850	2016-09-11 17:48:06 +00:00
Alexander Motin	20e45e033c	Switch random_get_pseudo_bytes() shim to arc4rand(). Our shim for Solaris random_get_bytes() uses read_random(), which looks reasonable, since it guarantees reliably seeded random data. On the other side Solaris random_get_pseudo_bytes() does not provide this guarantee, and its original Solaris implementation is equivalent to our arc4rand(), using software crypto without stressing slower hardware RNG.	2016-09-10 09:37:41 +00:00
Alexander Motin	4605bf63c4	MFV r305562: 7259 DS_FIELD_LARGE_BLOCKS is unused The DS_FIELD_LARGE_BLOCKS macro has been unused since the integration of this patch: commit ca0cc3918a1789fa839194af2a9245f801a06b1a Author: Matthew Ahrens <mahrens@delphix.com> Date: Fri Jul 24 09:53:55 2015 -0700 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> This patch simply removes this macro from dsl_dataset.h. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-07 20:09:24 +00:00
Alexander Motin	de1fdddeda	MFV r305560: 7278 tuning zfs_arc_max does not impact arc_c_min When changing zfs_arc_max (e.g. as zdb does), it may be set to less than the default arc_c_min. arc_c_min should decrease to not be more than arc_c_max, but it doesn't; therefore tuning of arc_c_max is ineffective. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@608764bead	2016-09-07 20:05:10 +00:00
Andriy Gapon	1a82707cd7	fix zfs pool creation accidentally broken by r305331 The upstream change introduced a new load state, SPA_LOAD_CREATE, and vdev_geom code needs to be aware of it. Tested by: cy MFC after: 1 week X-MFC with: r305331	2016-09-06 06:09:12 +00:00
Alexander Motin	9b9258a12a	Missed FreeBSD-specific piece of r305338.	2016-09-03 11:17:33 +00:00
Alexander Motin	d7e781bda3	MFC r305337: 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object Using a benchmark which has 32 threads creating 2 million files in the same directory, on a machine with 16 CPU cores, I observed poor performance. I noticed that dmu_tx_hold_zap() was using about 30% of all CPU, and doing dnode_hold() 7 times on the same object (the ZAP object that is being held). dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the dnode_t that we already have in hand, rather than repeatedly calling dnode_hold(). To do this, we need to pass the dnode_t down through all the intermediate calls that dmu_tx_hold_zap() makes, making these routines take the dnode_t* rather than an objset_t* and a uint64_t object number. In particular, the following routines will need to have analogous *_by_dnode() variants created: dmu_buf_hold_noread() dmu_buf_hold() zap_lookup() zap_lookup_norm() zap_count_write() zap_lockdir() zap_count_write() This can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around the object allocation issue. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds. Sponsored by: Intel Corp. Closes #109 Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@d3e523d489	2016-09-03 11:00:29 +00:00
Alexander Motin	4ad4b70e77	MFV r305336: 7247 zfs receive of deduplicated stream fails This resolves two 'zfs recv' issues. First, when receiving into an existing filesystem, a snapshot created during the receive process is not added to the guid->dataset map for the stream, resulting in failed lookups for deduped streams when a WRITE_BYREF record refers to a snapshot received earlier in the stream. Second, the newly created snapshot was also not set properly, referencing the snapshot before the new receiving dataset rather than the existing filesystem. Closes #159 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Author: Chris Williamson <chris.williamson@delphix.com> openzfs/openzfs@b09697c8c1	2016-09-03 10:59:05 +00:00
Alexander Motin	070da3f779	MFV r305335: 7003 zap_lockdir() should tag hold zap_lockdir() / zap_unlockdir() should take a "void *tag" argument which tags the hold on the zap. This will help diagnose programming errors which misuse the hold on the ZAP. Sponsored by: Intel Corp. Closes #108 Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@0780b3eab5	2016-09-03 10:58:14 +00:00
Alexander Motin	d3ec2cdb4a	MFV r304157: 7230 add assertions to dmu_send_impl() to verify that stream includes BEGIN and END records illumos/illumos-gate@12b90ee2d3 https://github.com/illumos/illumos-gate/commit/12b90ee2d3b10689fc45f4930d2392f5fe1d9cfa https://www.illumos.org/issues/7230 A test failure occurred where a send stream had only a BEGIN record. This should not be possible if the send returns without error. Prevented this from happening in the future by adding an assertion to dmu_send_impl() to verify that if the function returns 0 (success) both a BEGIN and END record are present. Did this by adding flags to dmu_sendarg_t (indicating whether BEGIN or END records sent), having dump_record() set flags appropriately, adding VERIFY statement to dmu_send_impl(). Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matt Krantz <matt.krantz@delphix.com>	2016-09-03 10:10:58 +00:00
Alexander Motin	7aafc9d4c8	MFV r304156: 7235 remove unused func dsl_dataset_set_blkptr illumos/illumos-gate@bd56f80007 https://github.com/illumos/illumos-gate/commit/bd56f80007857b960e0981ed0797ad8ec844a96b https://www.illumos.org/issues/7235 The function dsl_dataset_set_blkptr() is unused. We should remove it. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2016-09-03 10:09:23 +00:00
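
Note on commit 539fc86f2e (3746 ZRLs are racy): the message says the ZRL mutex only guarantees wakeups, so every refcount update must be atomic except the transition from locked back to 0. The following is a minimal userland sketch of that rule using C11 atomics. It is not the zrlock.c code: the `sketch_` names and the -1 locked sentinel are assumptions made for illustration, and the blocking and wakeup path is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel and type names; assumptions, not the kernel's. */
#define	SKETCH_ZRL_LOCKED	(-1)

typedef struct sketch_zrl {
	atomic_int refcount;	/* >= 0: reference count; -1: locked */
} sketch_zrl_t;

/* Take a reference. Fails only while the lock is held. */
static bool
sketch_zrl_tryadd(sketch_zrl_t *zrl)
{
	int n = atomic_load(&zrl->refcount);

	while (n != SKETCH_ZRL_LOCKED) {
		/* A failed CAS reloads n, so the locked check is repeated. */
		if (atomic_compare_exchange_weak(&zrl->refcount, &n, n + 1))
			return (true);
	}
	return (false);
}

/* Drop a reference with an atomic decrement. */
static void
sketch_zrl_remove(sketch_zrl_t *zrl)
{
	int prev = atomic_fetch_sub(&zrl->refcount, 1);

	assert(prev > 0);
	(void)prev;
}

/* Lock only when there are no references: atomic 0 -> LOCKED. */
static bool
sketch_zrl_trylock(sketch_zrl_t *zrl)
{
	int expected = 0;

	return (atomic_compare_exchange_strong(&zrl->refcount, &expected,
	    SKETCH_ZRL_LOCKED));
}

/* Only the lock holder can perform LOCKED -> 0, so nothing races it. */
static void
sketch_zrl_unlock(sketch_zrl_t *zrl)
{
	assert(atomic_load(&zrl->refcount) == SKETCH_ZRL_LOCKED);
	atomic_store(&zrl->refcount, 0);
}
```

In this sketch a caller that finds the lock held gets `false` back and must retry; the real code would instead sleep until the holder unlocks.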

1628 Commits