freebsd-skq

Author	SHA1	Message	Date
mm	4e7dd14720	Fix null pointer dereference in zfs_freebsd_setacl(). Prevents unprivileged users from panicking the kernel by calling __acl_delete_*() on files or directories inside a ZFS mount. MFC after: 3 days	2017-03-02 23:23:28 +00:00
mav	aa7ab7b7d4	Execute last ZIO of log commit synchronously. For short transactions overhead of context switch can be too large. Skipping it gives significant latency reduction. For large ones, including multiple ZIOs, latency is less critical, while throughput there may become limited by checksumming speed of single CPU core. To get best of both cases, execute last ZIO directly from calling thread context to save latency, while all others (if there are any) enqueue to taskqueues in traditional way. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2017-03-02 07:55:47 +00:00
mav	3ef0c861a5	Completely skip cache flushing for not supporting log devices. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2017-03-02 07:50:06 +00:00
ae	dcf8479255	Do not invoke the resize event when previous provider's size was zero. This is similar to r303637 fix for geom_disk. Reported by: avg Tested by: avg MFC after: 1 week	2017-03-01 18:03:32 +00:00
jpaetzel	78d6756cdb	MFV 314276 7570 tunable to allow zvol SCSI unmap to return on commit of txn to ZIL illumos/illumos-gate@1c9272b861 https://www.illumos.org/issues/7570 Based on the discovery that every unmap waits for the commit of the txn to the ZIL, introducing a very high latency to unmap commands, this behavior was made into a tunable zvol_unmap_sync_enabled and set to false. The net impact of this change is that by default SCSI unmap commands will result in space being freed within the zvol (today they are ignored and returned with good status). However, unlike the code today, instead of 18+ms per unmap, they take about 30us. With the testing done on NTFS against a Win2k12 target, the new behavior should work seamlessly. Files on the zvol that have already been set with the zfree application will continue to write 0's when deleted, and any new files created since zvol creation will send unmap commands when deleted. This behavior exists today, but with this change the unmap commands will be processed and result in reclaim of space. Author: Stephen Blinick <stephen.blinick@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Approved by: Robert Mustacchi <rm@joyent.com>	2017-02-25 20:01:17 +00:00
avg	ad87994751	l2arc: try to fix write size calculation broken by Compressed ARC commit While there, make a change to not evict a first buffer outside the requested eviction range. To do: - give more consistent names to the size variables - upstream to OpenZFS PR: 216178 Reported by: lev Tested by: lev MFC after: 2 weeks	2017-02-25 17:03:48 +00:00
avg	d895ce4e14	zfs: call spa_deadman on a taskqueue thread callout(9) prohibits callout functions from sleeping. illumos mutexes are emulated using sx(9). spa_deadman() calls vdev_deadman() and the latter acquires vq_lock. As a result we can get a more confusing panic instead of a specific panic or no panic: sleepq_add: td 0xfffff80019669960 to sleep on wchan 0xfffff8001cff4d88 with sleeping prohibited This change adds another level of indirection where the deadman callout schedules spa_deadman() to be executed on taskqueue_thread. While there, use callout_schedule() instead of callout_reset() in spa_sync(). Discussed with: mav MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9762	2017-02-25 16:45:53 +00:00
jpaetzel	2140000a07	MFV 314243 6676 Race between unique_insert() and unique_remove() causes ZFS fsid change illumos/illumos-gate@40510e8eba https://www.illumos.org/issues/6676 The fsid of zfs filesystems might change after reboot or remount. The problem seems to be caused by a race between unique_insert() and unique_remove(). The unique_remove() is called from dsl_dataset_evict() which is now an asynchronous thread. In a case the dsl_dataset_evict() thread is very slow and calls unique_remove() too late we will end up with changed fsid on zfs mount. This problem is very likely caused by #5056. Steps to Reproduce Note: I'm able to reproduce this always on a single core (virtual) machine. On multicore machines it is not so easy to reproduce. # uname -a SunOS openindiana 5.11 illumos-633aa80 i86pc i386 i86pc Solaris # zfs create rpool/TEST # FS=$(echo ::fsinfo | mdb -k | grep TEST | awk '{print $1}') # echo $FS::print vfs_t vfs_fsid | mdb -k vfs_fsid = { vfs_fsid.val = [ 0x54d7028a, 0x70311508 ] } # zfs umount rpool/TEST # zfs mount rpool/TEST # FS=$(echo ::fsinfo | mdb -k | grep TEST | awk '{print $1}') # echo $FS::print vfs_t vfs_fsid | mdb -k vfs_fsid = { vfs_fsid.val = [ 0xd9454e49, 0x6b36d08 ] } # Impact The persistent fsid (filesystem id) is essential for proper NFS functionality. If the fsid of a filesystem changes on remount (or after reboot) the NFS clients might not be able to automatically recover from such event and the manual remount of the NFS filesystems on every NFS client might be needed. Author: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Dan Vatca <dan.vatca@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com>	2017-02-25 14:45:54 +00:00
avg	b50f2868e5	zfs: clean up unused files and definitions MFC after: 1 month X-MFC after: r314048	2017-02-24 07:53:56 +00:00
tsoome	b2b5986a97	loader: update symlink support in zfs reader As the current zfs file system is providing symlink via system attributes, need to update the code accordingly. Note, as the zfsboot code does not free the memory at this time, the object list will put some stress on the boot2 heap, eventually we should address the issue. Reviewed by: allanjude, smh Approved by: allanjude (mentor) Differential Revision: https://reviews.freebsd.org/D9706	2017-02-22 22:00:50 +00:00
avg	70167f709d	zfs: move zio_taskq_basedc under SYSDC That knob is useless without SDC (or alike) scheduling class support. That is, it's unused on FreeBSD. MFC after: 4 days	2017-02-21 21:11:58 +00:00
avg	47967f0879	zfs: lower priority of zio_write_issue threads by four The difference of one was insignificant because zio_write_issue threads ended up on the same run queues as other zio threads. See sys/priority.h and sys/runq.h for more details. Add a comment describing FreeBSD priority considerations and restore the illumos variant of the code for comparison. Obtained from: Panzura MFC after: 2 weeks Sponsored by: Panzura	2017-02-21 21:09:21 +00:00
avg	ef19eda2c7	reimplement zfsctl (.zfs) support The current code is written on top of GFS, a library with the generic support for writing filesystems, which was ported from illumos. Because of significant differences between illumos VFS and FreeBSD VFS models, both the GFS and zfsctl code were heavily modified to work on FreeBSD. Nonetheless, they still contain quite a few ugly hacks and bugs. This is a reimplementation of the zfsctl code where the VFS-specific bits are written from scratch and only the code that interacts with the rest of ZFS is reused. Some highlights. We use two types of nodes, static and on-demand. The static nodes are used for permanent directories like .zfs, .zfs/snapshot, etc. The on-demand nodes are used for ephemeral directories that act as snapshot mount points. Initially only static nodes are created. Their vnodes are instantiated when they are looked up. The on-demand nodes and vnodes are instantiated as needed and the nodes are destroyed as soon as the corresponding vnodes are reclaimed. We also try very hard to ensure that uncovered snapshot vnodes do not linger. They are supposed to become inactive as soon as they are uncovered and we try to recycle them immediately. When a filesystem is unmounted all snapshots under .zfs are unmounted first, then all vnodes are flushed and finally the static .zfs nodes are destroyed. There are some changes outside of zfsctl code too. z_ctldir is never used directly (as it is an opaque pointer), zfsctl_root() has to be used instead. The function returns a locked vnode now, so it accepts a lock flags parameter. The function can also fail now, e.g. during force unmounting, whereas previously it was infallible. zfsctl_root_lookup() is retired, instead of it VOP_LOOKUP() on the .zfs vnode (obtained with zfsctl_root) is used. Some ideas are picked from an independent work by will. Reviewed by: asomers, smh MFC after: 1 month Relnotes: maybe Differential Revision: https://reviews.freebsd.org/D7421	2017-02-21 17:47:08 +00:00
jpaetzel	641c2d24c5	MFV 313876 7504 kmem_reap hangs spa_sync and administrative tasks illumos/illumos-gate@405a5a0f5c https://github.com/illumos/illumos-gate/commit/405a5a0f5c3ab36cb76559467d1a62ba648bd80 https://www.illumos.org/issues/7504 We see long spa_sync(). We are waiting to hold dp_config_rwlock for writer. Some other thread holds dp_config_rwlock for reader, then calls arc_get_data_buf(), which finds that arc_is_overflowing()==B_TRUE. So it waits (while holding dp_config_rwlock for reader) for arc_reclaim_thread to signal arc_reclaim_waiters_cv. Before signaling, arc_reclaim_thread does arc_kmem_reap_now(), which takes ~seconds. Author: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com>	2017-02-17 17:52:12 +00:00
markj	b12a88ae0f	Directly include needed headers rather than relying on pollution. We get machine/cpu.h via kmem.h -> proc.h -> _vm_domain.h -> seq.h. Reported by: Ryan Libby Sponsored by: Dell EMC Isilon X-MFC with: r313841	2017-02-17 03:27:20 +00:00
markj	2a3b2874a6	Prevent CPU migration when checking the DTrace nofault flag on x86. dtrace_trap() consumes page and protection faults triggered by code running in DTrace probe context. Such faults occur with interrupts disabled and are detected using a per-CPU flag. Regular faults cause dtrace_trap() to be called with interrupts enabled, and nothing was ensuring that the flag was read from the correct CPU. This may result in dtrace_trap() consuming unrelated page and protection faults when DTrace is enabled, causing the fault handler to return without actually having handled the fault. Diagnosed by: Ryan Libby <rlibby@gmail.com> MFC after: 3 days Sponsored by: Dell EMC Isilon	2017-02-16 23:05:20 +00:00
jpaetzel	32882dfcac	MFV 313786 7500 Simplify dbuf_free_range by removing dn_unlisted_l0_blkid illumos/illumos-gate@653af1b809 https://www.illumos.org/issues/7500 With the integration of: commit 0f6d88aded0d165f5954688a9b13bac76c38da84 Author: Alex Reece <alex@delphix.com> Date: Sat Jul 26 13:40:04 2014 -0800 4873 zvol unmap calls can take a very long time for larger datasets the dnode's dn_bufs field was changed from a list to a tree. As a result, the dn_unlisted_l0_blkid field is no longer necessary. Author: Stephen Blinick <stephen.blinick@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Gordon Ross <gordon.w.ross@gmail.com>	2017-02-16 19:00:09 +00:00
markj	9836e32a79	Use pget() instead of pfind() in fasttrap_pid_{enable,disable}(). Suggested by: mjg MFC after: 1 week	2017-02-15 06:07:01 +00:00
markj	dfae3be6c7	Check for an exiting process when enabling PID provider probes. MFC after: 1 week	2017-02-15 01:35:26 +00:00
avg	2e9e9620d9	remove l2_padding_needed statistic from zfs arc It became obsolete when the Compressed ARC support was committed. MFC after: 1 week	2017-02-12 19:45:30 +00:00
avg	764d943c05	check remaining space in zfs implementations of vptocnp PR: 216939 Submitted by: Iouri V. Ivliev <fbsd@any.com.ru> MFC after: 1 week	2017-02-12 19:40:59 +00:00
asomers	ab48acbf00	Fix setting birthtime in ZFS sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c * In zfs_freebsd_setattr, if the caller wants to set the birthtime, set the bits that zfs_setattr expects * In zfs_setattr, if XAT_CREATETIME is set, set xoa_createtime, expected by zfs_xvattr_set. The two levels of indirection seem excessive, but it minimizes diffs vs OpenZFS. * In zfs_setattr, check for overflow of va_birthtime (from delphij) * Remove red herring in zfs_getattr sys/cddl/contrib/opensolaris/uts/common/sys/vnode.h * Un-booby-trap some macros New tests are under review at https://github.com/pjd/pjdfstest/pull/6 Reviewed by: avg MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D9353	2017-02-09 21:30:53 +00:00
gnn	761aa7c014	Fix the ifdef protection and remove superfluous extern statements Reported by: Konstantin Belousov MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-02-07 01:21:18 +00:00
markj	a327df1219	Ensure that the DOF string length is divisible by 2. It is an ASCII encoding of a hexadecimal representation of the DOF file used to enable anonymous tracing, so its length should always be even. MFC after: 1 week	2017-02-05 02:47:34 +00:00
markj	d879975195	Use PC-relative relocations for USDT probe sites on i386 and amd64. When recording probe site addresses in the output DOF file, dtrace -G needs to emit relocations for the .SUNW_dof section in order to obtain the addresses of functions containing probe sites. DTrace expects the addresses to be relative to the base address of the final ELF file, and the amd64 USDT implementation was relying on some unspecified and incorrect behaviour in the base system GNU ld to achieve this. This change reimplements the probe site relocation handling to allow USDT to be used with lld and newer GNU binutils. Specifically, it makes use of R_X86_64_PC64/R_386_PC32 relocations to obtain the probe site address relative to the DOF file address, and adds and uses a new DOF relocation type which computes the final probe site address using these relative offsets. Reported by and discussed with: Rafael Espíndola MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D9374	2017-02-05 02:39:12 +00:00
gnn	da0598537e	Files which implement the new random number system code for DTrace Submitted by: Graeme Jenkinson MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-02-03 22:40:13 +00:00
gnn	45b8e2daa4	Replace the implementation of DTrace's RAND subroutine for generating low-quality random numbers with a modern implementation (xoroshiro128+) that is capable of generating better quality randomness without compromising performance. Submitted by: Graeme Jenkinson Reviewed by: markj MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9051	2017-02-03 22:26:19 +00:00
markj	3a9f9a6da6	Sync the x86 dis_tables.c with upstream. This corresponds to the following illumos issues: 5755 want support for Intel FMA instrs 5756 want support for Intel BMI1 instrs 5757 want support for Intel BMI2 instrs 5758 want support for Intel AVX2 instrs 7204 Want broadwell rdseed and adx support 7208 Want stac/clac disasm support 7733 Need SHA Instruction dis support 7756 dis can't handle x86 SSE 3 instructions 7757 want avx2 disasm tests 7758 want SSE 4.1 disasm tests MFC after: 2 weeks	2017-02-03 03:22:47 +00:00
bapt	bd0b52fc1f	Revert crap accidentally committed	2017-01-28 16:31:23 +00:00
bapt	02ac05d572	Revert r312923 a better approach will be taken later	2017-01-28 16:30:14 +00:00
markj	f41402441d	Fix an off-by-one in an assertion on fasttrap tracepoint sizes. FASTTRAP_MAX_INSTR_SIZE is the largest valid value of a tracepoint, so correct the assertion accordingly. This limit was hit with a 15-byte NOP. Reported by: bdrewery MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-01-27 17:58:41 +00:00
markj	8ae74334c6	Fix initialization of "p" after r312658. CID: 1369410	2017-01-25 16:35:57 +00:00
markj	8632319c4b	Remove the DTRACEHIOC_ADD ioctl. This ioctl has been considered legacy by upstream since the DTrace code was first imported, and is unused. The removal also allows some simplification of dtrace_helper_slurp(). Also remove a bogus copyout in the DTRACEHIOC_ADDDOF handler. Due to a bug, it would overwrite an in-memory copy of the DOF header rather than the passed-in DOF helper. Moreover, DTRACEHIOC_ADDDOF already copies the helper back out automatically since its argument has the IOC_OUT attribute.	2017-01-23 02:21:06 +00:00
jpaetzel	a7aeec2d89	MFV 312436 6569 large file delete can starve out write ops illumos/illumos-gate@ff5177ee8b https://www.illumos.org/issues/6569 The core issue I've found is that there is no throttle for how many deletes get assigned to one TXG. As a result when deleting large files we end up filling consecutive TXGs with deletes/frees, then write throttling other (more important) ops. There is an easy test case for this problem. Try deleting several large files (at least 1/2 TB) while you do write ops on the same pool. What we've seen is performance of these write ops (let's call it sideload I/O) would drop to zero. More specifically the problem is that dmu_free_long_range_impl() can/will fill up all of the dirty data in the pool "instantly", before many of the sideload ops can get in. So sideload performance will be impacted until all the files are freed. The solution we have tested at Nexenta (with positive results) creates a relatively simple throttle for how many "free" ops we let into one TXG. However this solution exposes other problems that should also be addressed. If we are to slow down freeing of data that means one has to wait even longer (assuming vnode ref count of 1) to get shell back after an rm or for NFS thread to finish the free-ing op. To avoid this the proposed solution is to call zfs_inactive() async for "large" files. Async freeing then begs for the reclaimed space to be accounted for in the zpool's "freeing" prop. The other issue with having a longer delete is the inability to export/unmount for a longer period of time. The proposed solution is to interrupt freeing of blocks when a fs is unmounted. Author: Alek Pinchuk <alek@nexenta.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed by: avg Differential Revision: D9008	2017-01-20 15:01:04 +00:00
andrew	7547465849	Use the kernel stack in the ARM FBT DTrace provider. This is used to find the fifth argument to functions being traced, however there was an error where the userspace stack was being used. This may be invalid leading to a kernel panic if this address is unmapped. Submitted by: Graeme Jenkinson <graeme.jenkinson@cl.cam.ac.uk> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9229	2017-01-18 13:27:24 +00:00
markj	a7ded95b7e	Have DTrace handle faults when dereferencing a lock object pointer. MFC after: 1 week	2017-01-11 01:18:06 +00:00
markj	adb95404d4	Ignore LC_SLEEPABLE when testing whether a mutex is adaptive. MFC after: 1 week	2017-01-11 01:15:55 +00:00
mjg	056dc629c1	Revert r309619 "ifndef atomic_cas_* in cddl code" It was a temporary change to ease an import of native atomic_cas primitives. Instead, atomic_fcmpset was devised with different semantics. See r311168.	2017-01-03 21:02:30 +00:00
markj	792f2dc38d	Remove the "unused" DIF subroutine index left after r308582. These indices are input to a build-time script that generates code to validate subroutine names.	2017-01-03 00:24:12 +00:00
markj	1bd7e208ec	Remove an obsolete pragma from dtrace.h. It triggers a compiler warning and has been removed upstream. MFC after: 1 week	2016-12-27 23:31:32 +00:00
gnn	64ddb8fc27	Remove extra DOF_SEC_XLIMPORT from the DOF_SEC_ISLOADABLE macro MFC after: 2 weeks Sponsored by: DARPA, AFRL	2016-12-16 20:44:14 +00:00
mav	d828ed76b5	Revert r310023 for now. After another look my new variable mapping was not exactly right.	2016-12-15 08:03:16 +00:00
mav	6d84de4da4	Reduce diff from Illumos by better variables mapping.	2016-12-13 16:20:10 +00:00
mav	f780634303	Postpone ZVOL media/block size caching till first open. At least on FreeBSD there is no legal way to access media or get its size without opening device/provider first. Postponing this caching allows skipping several disk seeks per ZVOL/snapshot during import. For HDD pool with 1 ZVOL in dev mode with 1000 snapshots this reduces pool import time from 40 to 10 seconds. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2016-12-11 19:50:39 +00:00
mav	40f50a36ce	Add missed vfs.zfs.zfetch.max_idistance sysctl.	2016-12-10 21:19:27 +00:00
markj	93c0c5f137	Don't create FBT probes for lock owner methods. These functions may be called in DTrace probe context, so they cannot be safely traced. Moreover, they are currently only used by DTrace, so their corresponding FBT probes are not particularly useful. MFC after: 2 weeks	2016-12-10 03:13:11 +00:00
markj	94473fd3c7	Consistently use fbt_excluded() on all architectures. MFC after: 2 weeks	2016-12-10 03:11:05 +00:00
mav	59a1ac276a	Fix spa_alloc_tree sorting by offset in r305331. Original commit "7090 zfs should improve allocation order" declares alloc queue sorted by time and offset. But in practice io_offset is always zero, so sorting happened only by time, while order of writes with equal time was completely random. On Illumos this did not matter much thanks to using high resolution timestamps. On FreeBSD due to using much faster but low resolution timestamps it caused bad data placement on disks, affecting further read performance. This change switches zio_timestamp_compare() from comparing uninitialized io_offset to really populated io_bookmark values. I haven't decided yet what to do with timestamps, but on simple tests this change gives the same performance results by just making the code work as declared. MFC after: 1 week	2016-12-08 15:58:03 +00:00
gnn	ebeff40c9a	Fix a kernel panic in DTrace's rw_iswriter subroutine. On FreeBSD the sense of rw_write_held() and rw_iswriter() were reversed, probably due to a cut and paste error. Using rw_iswriter() would cause the kernel to panic. Reviewed by: markj MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D8718	2016-12-07 07:27:47 +00:00
mjg	73f53b1e61	ifndef atomic_cas_* in cddl code in preparation for native implementations This is a temporary change to not require all architectures to import at once. Discussed with: jhb	2016-12-06 14:08:49 +00:00


1691 Commits