freebsd-skq

Author	SHA1	Message	Date
asomers	03e85c2d9b	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c When a da or ada device disappears, outstanding IOs fail with ENXIO, not EIO. The check for EIO was probably copied from Illumos, where that is indeed the correct errno. Without this change, pulling a busy drive from a zpool would usually turn it into UNAVAIL, even though pulling an idle drive would turn it into REMOVED. With this change, it is REMOVED every time. Also, vdev_geom_io_intr shouldn't do zfs_post_remove, because that results in devd getting two resource.fs.zfs.removed events. The comment said that the event had to be sent directly instead of through the async removal thread because "the DE engine is using this information to discard prevoius I/O errors". However, the fact that vdev_geom_io_intr was never actually sending the events until now, and that vdev_geom_orphan never sent them at all, and that vdev_geom_orphan usually gets called about 2 seconds after the actual removal, means that FreeBSD's userland can cope with a late event just fine. Approved by: ken (mentor) Sponsored by: Spectra Logic Corporation MFC after: 4 weeks	2013-12-12 00:27:22 +00:00
markj	f8785b45de	Correct the check for errors from proc_rwmem(). MFC after: 2 weeks	2013-12-11 04:31:40 +00:00
mav	057ae4aad3	Don't even try to read vdev labels from devices smaller than SPA_MINDEVSIZE (64MB). Even if we would find one somehow, ZFS kernel code rejects such devices. It is funny to watch attempts to read four 256K vdev labels from a 1.44MB floppy, though it is not very practical and quite slow.	2013-12-10 12:36:44 +00:00
delphij	375701af53	Expose spa_asize_inflation. X-MFC-With: r258632	2013-12-06 23:49:16 +00:00
avg	430acb4217	zfs: add zfs_freebsd_putpages this should be more optimal than writing pages one-by-one via zfs_write -> update_pages in the case of multi-page putpages call MFC after: 16 days	2013-11-29 15:39:39 +00:00
avg	16f88ac15b	zfs: add dmu_write_pages variant for freebsd The freebsd variant of dmu_write_pages is hidden under _KERNEL to avoid needlessly pulling in vm_page_t declaration. Besides, this function seems to be useless for ZFS userland counterpart. MFC after: 15 days	2013-11-29 15:34:43 +00:00
avg	7a0711c338	zfs: make zfs_map_page / zfs_unmap_page public MFC after: 15 days	2013-11-29 15:33:40 +00:00
avg	63dbff5d06	drop ZUT_OBJ, zfs unit testing driver never materialized in freebsd MFC after: 5 days	2013-11-29 15:32:53 +00:00
avg	89468053e3	zfs mappedread_sf: assert that a page is never partially valid ZFS never partially validates or invalidates a page. The higher level VM should not do that either. mappedread_sf correct operation depends on a page being either fully valid or invalid. MFC after: 7 days	2013-11-29 12:19:52 +00:00
avg	47f145913e	MFV r258665: 4347 ZPL can use dmu_tx_assign(TXG_WAIT) illumos/illumos-gate@e722410c49 MFC after: 9 days X-MFC after: r258632	2013-11-28 19:44:36 +00:00
avg	9932b97e88	MFV r258371,r258372: 4101 metaslab_debug should allow for fine-grained control 4101 metaslab_debug should allow for fine-grained control 4102 space_maps should store more information about themselves 4103 space map object blocksize should be increased 4104 ::spa_space no longer works 4105 removing a mirrored log device results in a leaked object 4106 asynchronously load metaslab illumos/illumos-gate@0713e232b7 Note that some tunables have been removed and some new tunables have been added. Of particular note, FreeBSD-only knob vfs.zfs.space_map_last_hope is removed as it was a nop for some time now (after one of the previous merges from upstream). MFC after: 11 days Sponsored by: HybridCluster [merge]	2013-11-28 19:37:22 +00:00
avg	8c62dc6efa	opensolaris compat: add taskq_wait emulation MFC after: 10 days	2013-11-28 19:17:11 +00:00
avg	db4cf528fc	fix a serious bug in r258632: offset parameter must be set in zio In illumos all ioctl zio-s are "global" at the moment. That is they act on a whole disk, e.g. a cache flush command, and thus do not need either offset or size parameters. FreeBSD, on the other hand, has support for TRIM command and that command requires proper offset and size parameters. Without this fix all TRIM commands act on the start of any disk or partition used by ZFS destroying any data there. Pointyhat to: avg Tested by: sbruno MFC after: 3 days X-MFC with: r258632 Sponsored by: HybridCluster	2013-11-28 08:48:49 +00:00
avg	1729bafbfe	fix debug.zfs_flags sysctl description in r258638 Pointyhat to: avg MFC after: 3 days	2013-11-26 10:57:09 +00:00
avg	aab537bd02	expose zfs_flags as debug.zfs_flags r/w tunable and sysctl This knob is purposefully hidden under debug. MFC after: 5 days Sponsored by: HybridCluster	2013-11-26 10:46:43 +00:00
avg	b0eda9bba9	MFV r258376: 3964 L2ARC should always compress metadata buffers illumos/illumos-gate@e4be62a2b7 MFC after: 10 days Sponsored by: HybridCluster [merge]	2013-11-26 10:14:23 +00:00
avg	5f1125f515	MFV r255256: 3954 metaslabs continue to load even after hitting zfs_mg_alloc_failure limit 4080 zpool clear fails to clear pool 4081 need zfs_mg_noalloc_threshold illumos/illumos-gate@22e30981d8 MFC after: 10 days Sponsored by: HybridCluster [merge]	2013-11-26 10:02:02 +00:00
avg	51e896cc78	MFV r255255: 4045 zfs write throttle & i/o scheduler performance work illumos/illumos-gate@69962b5647 Please note the following changes: - zio_ioctl has lost its priority parameter and now TRIM is executed with 'now' priority - some knobs are gone and some new knobs are added; not all of them are exposed as tunables / sysctls yet MFC after: 10 days Sponsored by: HybridCluster [merge]	2013-11-26 09:57:14 +00:00
avg	fb0b2e19b5	MFV r247578: 3581 spa_zio_taskq[ZIO_TYPE_FREE][ZIO_TASKQ_ISSUE]->tq_lock is piping hot illumos/illumos-gate@ec94d32216 MFC after: 9 days Sponsored by: HybridCluster [merge]	2013-11-26 09:45:48 +00:00
avg	d1f2a86401	734 taskq_dispatch_prealloc() desired 943 zio_interrupt ends up calling taskq_dispatch with TQ_SLEEP illumos/illumos-gate@5aeb94743e Essentially FreeBSD taskqueues already operate in a mode that was added to Illumos with taskq_dispatch_ent change. We even exposed the superior FreeBSD interface as taskq_dispatch_safe. Now we just rename taskq_dispatch_safe to taskq_dispatch_ent and struct ostask to taskq_ent_t, so that code differences will be minimal. After this change sys/cddl/compat/opensolaris/sys/taskq.h header is no longer needed. Note that this commit is not an MFV because the upstream change was not individually committed to the vendor area. MFC after: 8 days	2013-11-26 09:26:18 +00:00
avg	a1e8ebde3f	opensolaris taskq: some cosmetic changes - drop trailing whitespace - remove redundant "extern" from function declarations - remove unused macro MFC after: 1 week	2013-11-26 09:10:01 +00:00
avg	37cef93b68	sdt: add support for solaris/illumos style DTRACE_PROBE macros The new macros are implemented in terms of SDT_PROBE_DEFINE and SDT_PROBE. Probes defined in this way will appear under SDT provider named "sdt". Parameter types are exposed via SDT_PROBE_ARGTYPE. This is something that illumos does not have by default. This kind of SDT probes is already present in ZFS code, so those probes will now be available if KDTRACE_HOOKS options is enabled. A potential future illumos compatibility enhancement is to encode a provider name as a prefix in a probe name. Reviewed by: markj MFC after: 3 weeks X-MFC after: r258622	2013-11-26 08:49:53 +00:00
avg	71889a5eff	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
pjd	b9061d4b3d	When append-only, immutable or read-only flag is set don't allow for hard links creation. This matches UFS behaviour. Reported by: Oleg Ginzburg <olevole@olevole.ru> MFC after: 1 month	2013-11-25 21:17:14 +00:00
attilio	7ee4e910ce	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
avg	5499e7013c	MFV r258378: 4089 NULL pointer dereference in arc_read() illumos/illumos-gate@57815f6b95 Tested by: adrian MFC after: 4 days	2013-11-20 11:52:32 +00:00
avg	64efe866e5	MFV r258377: 4088 use after free in arc_release() illumos/illumos-gate@ccc22e1304 MFC after: 5 days	2013-11-20 11:47:50 +00:00
jhibbits	008d8ce40f	Fix the function search space. Submitted by: Howard Su	2013-11-20 01:33:13 +00:00
avg	92a63ca12f	zfs page_busy: fix the boundaries of the cleared range This is a fix for a regression introduced in r246293. vm_page_clear_dirty expects the range to have DEV_BSIZE aligned boundaries, otherwise it extends them. Thus it can happen that the whole page is marked clean while actually having some small dirty region(s). This commit makes the range properly aligned and ensures that only the clean data is marked as such. It would be interesting to evaluate how much benefit clearing with DEV_BSIZE granularity produces. Perhaps instead we should clear the whole page when it is completely overwritten and not bother clearing any bits if only a portion of a page is written. Reported by: George Hartzell <hartzell@alerce.com>, Richard Todd <rmtodd@servalan.servalan.com> Tested by: George Hartzell <hartzell@alerce.com> Reviewed by: kib MFC after: 5 days	2013-11-19 18:43:47 +00:00
mav	6479b7a632	Reenable vfs.zfs.zio.use_uma for amd64, disabled at r209261. On machines with several CPUs and enough RAM this can easily double ZFS performance or halve CPU usage. It was disabled three years ago due to memory and KVA exhaustion reports, but our VM subsystem got improved a lot since that time, hopefully enough to make another try.	2013-11-19 11:19:07 +00:00
asomers	5b09abdb34	opensolaris/uts/common/dtrace/fasttrap.c Fix several problems that can cause panics on kldload and kldunload. * kproc_create(fasttrap_pid_cleanup_cb, ...) gets called before fasttrap_provs.fth_table gets allocated. This can lead to a panic on module load, because fasttrap_pid_cleanup_cb references fasttrap_provs.fth_table. Move kproc_create down after the point that fasttrap_provs.fth_table gets allocated, and modify the error handling accordingly. * dtrace_fasttrap_{fork,exec,exit} weren't getting NULLed until after fasttrap_provs.fth_table got freed. That caused panics on module unload because fasttrap_exec_exit calls fasttrap_provider_retire, which references fasttrap_provs.fth_table. NULL those function pointers earlier. * There wasn't any code to destroy the fasttrap_{tpoints,provs,procs}.fth_table mutexes on module unload, leading to a resource leak when WITNESS is enabled. Destroy those mutexes during fasttrap_unload(). Reviewed by: markj Approved by: ken (mentor) Sponsored by: Spectra Logic MFC after: 4 weeks	2013-11-18 16:51:56 +00:00
smh	3078082cfe	Fix ZFS deadlock when sending a snapshot which is mounted. MFC after: 1 week Sponsored by: Multiplay	2013-11-18 11:28:19 +00:00
markj	19a7950d1d	The fasttrap ioctl used to create probes takes a variable-sized argument. It was not being correctly copied into the kernel on FreeBSD, and as a result, probes with multiple probe sites were not being created properly. To fix this, change the ioctl definition so that the fasttrap ioctl handler is responsible for copying in userland data. Submitted by: Prashanth Kumar <pra_udupi@yahoo.co.in> MFC after: 1 month	2013-11-18 03:24:50 +00:00
mav	6e9db1ae07	Introduce allocation cache to store LZ4 compression contexts without kicking VM subsystem twice for every written record. Tests on 24-core system show double reduction of CPU time spent on copying single large well-compressed file. This patch is not really needed on illumos (while not harm either) since their memory allocator by default uses caching for all requests up to 128K. Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>	2013-11-14 15:54:54 +00:00
markj	2547e15155	Use suword32 and suword64 instead of copyout(9). This fixes a bug in the emulation of the call instruction caused by reversing the uaddr and kaddr arguments when copying data out to userland: the suword* functions take the uaddr as the first argument whereas copyout(9) takes the kaddr as the first argument. This also partially undoes the fixes from r257143. Submitted by: Prashanth Kumar <pra_udupi@yahoo.co.in> (original version) MFC after: 1 month	2013-11-05 06:13:46 +00:00
markj	a5fb1fbfd8	Remove references to an unused fasttrap probe hook, and remove the corresponding x86 trap type. Userland DTrace probes are currently handled by the other fasttrap hooks (dtrace_pid_probe_ptr and dtrace_return_probe_ptr). Discussed with: rpaulo	2013-10-31 02:35:00 +00:00
markj	130419d137	Do some cleanup of the SDT code. In particular, * Remove the unused sdt cdev. * Don't bother keeping a list of probes in struct sdt_prov; it's not needed. * Invoke sdt_load and sdt_unload from the module handler instead of registering separate SYSINITs. * Keep to within 80 columns. * Check for errors from dtrace_unregister().	2013-10-26 06:23:51 +00:00
markj	2dcf53c15b	Fix a couple of bugs in the fasttrap emulation of a "push %rbp" instruction: the code was trying to save the stack pointer rather than the frame pointer, and the arguments to copyout(9) were reversed, so nothing ended up being saved on the stack. This would cause process crashes when the pid provider was being used to instrument calls of a function starting with this instruction. Reported by: symbolics@gmx.com Tested by: symbolics@gmx.com (earlier version) MFC after: 2 weeks	2013-10-26 03:21:54 +00:00
jhibbits	fc498ec178	ELF PowerPC64 ABI puts the LR save word at 16 byte offset, not 8.	2013-10-25 00:17:12 +00:00
smh	5c7a6f5d92	Improve ZFS N-way mirror read performance by using load and locality information. The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members in the mirror. It assumes the devices are of equal performance and that spreading the requests randomly over both drives will be sufficient to saturate them. In practice this results in the leaf vdevs being underutilized. The new algorithm takes into account the following additional factors: * Load of the vdevs (number of outstanding I/O requests) * The locality of last queued I/O vs the new I/O request. Within the locality calculation additional knowledge about the underlying vdev is considered such as; is the device backing the vdev a rotating media device. This results in performance increases across the board as well as significant increases for predominantly streaming loads and for configurations which don't have evenly performing devices. The following are results from a setup with 3 Way Mirror with 2 x HD's and 1 x SSD from a basic test running multiple parallel dd's.
With pre-fetch disabled (vfs.zfs.prefetch_disable=1): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s With pre-fetch enabled (vfs.zfs.prefetch_disable=0): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s In addition to the performance changes the code was also restructured, with the help of Justin Gibbs, to provide a more logical flow which also ensures vdev loads are only calculated from the set of valid candidates. The following additional sysctls were added to allow the administrator to tune the behaviour of the load algorithm: * vfs.zfs.vdev.mirror.rotating_inc * vfs.zfs.vdev.mirror.rotating_seek_inc * vfs.zfs.vdev.mirror.rotating_seek_offset * vfs.zfs.vdev.mirror.non_rotating_inc * vfs.zfs.vdev.mirror.non_rotating_seek_inc These changes were based on work started by the zfsonlinux developers: https://github.com/zfsonlinux/zfs/pull/1487 Reviewed by: gibbs, mav, will MFC after: 2 weeks Sponsored by: Multiplay	2013-10-23 09:54:58 +00:00
smh	1c55b38aeb	Use the vdev's ashift to calculate the supported min block size passed to zio_compress_data(..) when compressing l2arc buffers. This eliminates l2arc I/O errors, which resulted in very poor performance on vdev's configured with block size greater than 512b due to compression assuming a smaller min block size than the vdev supports. MFC after: 2 days	2013-10-22 13:31:36 +00:00
mav	4219fc0074	Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. The currently defined safety requirements are: - caller should not hold any locks and should be reenterable; - callee should not depend on GEOM dual-threaded concurrency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting above requirements new provider and consumer flags added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). Capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of requirements are not met, request is queued to g_up or g_down thread same as before. Such GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability disk(9) KPI got new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk drivers got it set now thanks to earlier CAM locking work. This change more than doubles peak block storage performance on systems with many CPUs, together with earlier CAM locking changes reaching more than 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc.
MFC after: 2 months	2013-10-22 08:22:19 +00:00
markj	041e0d0c57	When fetching function arguments out of a frame on amd64, explicitly select the register based on the argument index rather than relying on the fields in struct reg to be in the right order. This assumption is incorrect on FreeBSD and generally led to bogus argument values for the sixth argument of PID and USDT probes; the first five are passed directly to dtrace_probe() via the fasttrap trap handler and so were correctly handled. MFC after: 2 weeks	2013-10-21 04:15:55 +00:00
markj	3ecc6f1298	Add a function, memstr, which can be used to convert a buffer of null-separated strings to a single string. This can be used to print the full arguments of a process using execsnoop (from the DTrace toolkit) or with the following one-liner: dtrace -n 'syscall::execve:return {trace(curpsinfo->pr_psargs);}' Note that this relies on the process arguments being cached via the struct proc, which means that it will not work for argvs longer than kern.ps_arg_cache_limit. However, the following rather non-portable script can be used to extract any argv at exec time: fbt::kern_execve:entry { printf("%s", memstr(args[1]->begin_argv, ' ', args[1]->begin_envv - args[1]->begin_argv)); } The debug.dtrace.memstr_max sysctl limits the maximum argument size to memstr(). Thanks to Brendan Gregg for helpful comments on freebsd-dtrace. Tested by: Fabian Keil (earlier version) MFC after: 2 weeks	2013-10-16 01:39:26 +00:00
jhibbits	0b9629ab6a	Add fasttrap for PowerPC. This is the last piece of the dtrace/ppc puzzle. It's incomplete, it doesn't contain full instruction emulation, but it should be sufficient for most cases. MFC after: 1 month	2013-10-15 15:00:29 +00:00
avg	1446c5336b	MFV r255257: 4082 zfs receive gets EFBIG from dmu_tx_hold_free() illumos change 14172:be36a38bac3d: illumos ZFS issues: 4082 zfs receive gets EFBIG from dmu_tx_hold_free() Please note that this change is slightly different from r255257, because it is merged out of order with other (larger) upstream changes. PR: kern/182570 Reported by: Keith White <kwhite@site.uottawa.ca> Tested by: Keith White <kwhite@site.uottawa.ca> Approved by: re (glebius) MFC after: 1 week X-MFC after: r254753	2013-10-10 09:53:46 +00:00
markj	4e3872abc7	Initialize and free the DTrace taskqueue in the dtrace module load/unload handlers rather than in the dtrace device open/close methods. The current approach can cause a panic if the device is closed while the taskqueue thread is active, or if a kernel module containing a provider is unloaded while retained enablings are present and the dtrace device isn't opened. Submitted by: gibbs (original version) Reviewed by: gibbs Approved by: re (glebius) MFC after: 2 weeks	2013-10-08 12:56:46 +00:00
delphij	038b37b952	Improve lzjb decompress performance by reorganizing the code to tighten the copy loop. Submitted by: Denis Ahrens <denis h3q com> MFC after: 2 weeks Approved by: re (gjb)	2013-10-08 01:38:24 +00:00
gibbs	82601b02ea	Optimize the block size used on ZFS cache devices as is already done for data and log devices. Reported by: Dmitryy Makarov Submitted by: smh Reviewed by: gibbs Approved by: re (delphij) MFC after: 2 weeks	2013-09-21 03:52:08 +00:00
delphij	d04a7f0144	MFV r254750: Add support of Illumos dumps on zvol over RAID-Z. Note that this only adds the features. FreeBSD would still need more work to support dumping on zvols. Illumos ZFS issues: 2932 support crash dumps to raidz, etc. pools MFC after: 1 month Approved by: re (ZFS blanket)	2013-09-21 00:17:26 +00:00
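
The load- and locality-based mirror read selection described in commit 5c7a6f5d92 can be illustrated with a small sketch. This is not the kernel code: the `Leaf` class and the `load_score` and `pick_leaf` functions are hypothetical names, the way the weights are combined is an assumption, and the numeric values are placeholders rather than FreeBSD defaults. Only the five tunable names come from the commit message.

```python
from dataclasses import dataclass

# Placeholder weights named after the sysctls listed in the commit message.
# The numeric values are assumptions for illustration, not FreeBSD defaults.
ROTATING_INC = 0
ROTATING_SEEK_INC = 5
ROTATING_SEEK_OFFSET = 1 << 20
NON_ROTATING_INC = 0
NON_ROTATING_SEEK_INC = 1


@dataclass
class Leaf:
    """Hypothetical stand-in for one leaf vdev of a mirror."""
    name: str
    pending: int       # number of outstanding I/O requests
    last_offset: int   # offset of the last queued I/O
    rotating: bool     # True for rotating media, False for SSD-like devices


def load_score(leaf, offset):
    """Lower is better: queue depth plus a locality penalty (assumed form)."""
    if offset == leaf.last_offset:
        # The new request continues exactly where the last one was queued.
        return leaf.pending + (ROTATING_INC if leaf.rotating else NON_ROTATING_INC)
    if not leaf.rotating:
        return leaf.pending + NON_ROTATING_SEEK_INC
    if abs(offset - leaf.last_offset) < ROTATING_SEEK_OFFSET:
        # A short seek on rotating media is charged half the full seek penalty.
        return leaf.pending + ROTATING_SEEK_INC // 2
    return leaf.pending + ROTATING_SEEK_INC


def pick_leaf(leaves, offset):
    """Pick the candidate leaf with the lowest score for this read offset."""
    return min(leaves, key=lambda leaf: load_score(leaf, offset))
```

With these placeholder weights, a sequential read stays on a lightly loaded hard disk, while a distant read moves to the SSD even though its queue is deeper.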
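
Commit 92a63ca12f notes that vm_page_clear_dirty extends a range whose boundaries are not DEV_BSIZE aligned, so the caller has to pass an aligned range that covers only data known to be clean. A minimal sketch of that arithmetic follows, assuming the fix shrinks the range inward; the function name `shrink_to_blocks` is hypothetical, and 512 is the usual FreeBSD value of DEV_BSIZE.

```python
DEV_BSIZE = 512  # usual FreeBSD device block size


def shrink_to_blocks(off, length):
    """Shrink [off, off + length) inward to DEV_BSIZE boundaries.

    Returns (start, size) of the aligned sub-range, or None when the range
    does not fully cover any DEV_BSIZE block.
    """
    start = -(-off // DEV_BSIZE) * DEV_BSIZE        # round start up
    end = (off + length) // DEV_BSIZE * DEV_BSIZE   # round end down
    if end <= start:
        return None
    return (start, end - start)
```

For example, a write covering bytes 100 to 1099 fully overwrites only the block at 512 to 1023, so only that block may be marked clean.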

949 Commits