freebsd-dev

Author	SHA1	Message	Date
Roger Pau Monné	83c2fa73e6	xen-blkfront: fix memory leak in xbd_connect error path If gnttab_grant_foreign_access() fails for any of the indirection pages, the code breaks out of both loops without freeing the local variable indirectpages, causing a memory leak. Submitted by: Pratyush Yadav <pratyush@freebsd.org> Differential Review: https://reviews.freebsd.org/D16136	2018-07-30 11:27:51 +00:00
Roger Pau Monné	8b19549b0e	xen-blkfront: fix length check Length is an unsigned integer, so checking against < 0 doesn't make sense. While there, also make it clear that a length of 0 always succeeds. Submitted by: Pratyush Yadav <pratyush@freebsd.org> Differential Review: https://reviews.freebsd.org/D16045	2018-07-30 11:15:20 +00:00
Pedro F. Giffuni	ac2fffa4b7	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
Pedro F. Giffuni	26c1d774b5	dev: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these is likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the surrounding code could handle NULL values.	2018-01-13 22:30:30 +00:00
Jason A. Harmening	eb36b1d0bc	Clean up MD pollution of bus_dma.h: --Remove special-case handling of sparc64 bus_dmamap* functions. Replace with a more generic mechanism that allows MD busdma implementations to generate inline mapping functions by defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This is currently useful for sparc64, x86, and arm64, which all implement non-load dmamap operations as simple wrappers around map objects which may be bus- or device-specific. --Remove NULL-checked bus_dmamap macros. Implement the equivalent NULL checks in the inlined x86 implementation. For non-x86 platforms, these checks are a minor pessimization as those platforms do not currently allow NULL maps. NULL maps were originally allowed on arm64, which appears to have been the motivation behind adding arm[64]-specific barriers to bus_dma.h, but that support was removed in r299463. --Simplify the internal interface used by the bus_dmamap_load* variants and move it to bus_dma_internal.h --Fix some drivers that directly include sys/bus_dma.h despite the recommendations of bus_dma(9) Reviewed by: kib (previous revision), marius Differential Revision: https://reviews.freebsd.org/D10729	2017-07-01 05:35:29 +00:00
Roger Pau Monné	e5d27b37e3	xen/blkfront: correctly detach a disk with active users Call disk_gone when the backend switches to the "Closing" state and blkfront still has pending users. This allows the disk to be detached, and will call into xbd_closing by itself when the geom layout cleanup has finished. Reported by: bapt Tested by: manu Reviewed by: bapt Sponsored by: Citrix Systems R&D MFC after: 1 week Differential revision: https://reviews.freebsd.org/D10772	2017-05-19 08:11:15 +00:00
Roger Pau Monné	8dee0e9bd6	xen: add support for canceled suspend When running on Xen, it's possible that a suspend request to the hypervisor fails (return from HYPERVISOR_suspend different than 0). This means that the suspend hasn't succeeded, and the resume procedure needs to properly handle this case. First of all, when such situation happens there's no need to reset the vector callback, hypercall page, shared info, event channels or grant table, because their state is preserved. Also, the PV drivers don't need to be reset to the initial state, since the connection with the backend has not been interrupted. Submitted by: Liuyingdong <liuyingdong@huawei.com> Reviewed by: royger MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9635	2017-03-07 09:16:51 +00:00
Dimitry Andric	085def3f0a	In xbd_connect(), use correct scanf conversion specifiers for the feature_barrier and feature_flush variables. Otherwise, adjacent variables on the stack, such as sector_size, may be overwritten, with disastrous results. Note that I did not see a good reason to revert the addition of zero checks introduced in r310013. Better safe than sorry. PR: 215209 Tested by: royger MFC after: 3 days	2016-12-14 19:28:19 +00:00
Colin Percival	93954c2da3	Check that blkfront devices have a non-zero number of sectors and a non-zero sector size. Such a device would be a virtual disk of zero bytes; clearly not useful, and not something we should try to attach. As a fortuitous side effect, checking that these values are non-zero here results in them not becoming zero later in the function. This odd behaviour began with r309124 (clang 3.9.0) but is challenging to debug; making any changes to this function whatsoever seems to affect the llvm optimizer behaviour enough to make the unexpected zeroing of the sector_size variable cease. PR: 215209 Security: The potential for variables to unexpectedly become zero has worrying consequences for security in general, but not so much in this particular context.	2016-12-13 06:54:13 +00:00
Pedro F. Giffuni	453130d9bf	sys/dev: minor spelling fixes. Most affect comments, very few have user-visible effects.	2016-05-03 03:41:25 +00:00
Alexander Motin	d5d7399d2d	Pass through some new block device features. MFC after: 1 month	2016-04-03 11:18:20 +00:00
Colin Percival	bcccdfa37b	Don't dereference a pointer immediately after determining that it is equal to NULL. [1] While I'm here, s/xb/xbd/ (the name changed a long time ago but this instance wasn't corrected). Reported by: PVS-Studio [1]	2016-02-14 13:42:16 +00:00
Colin Percival	cbb261aec7	Add two more assertions to catch busdma problems. Each segment provided by busdma to the blkfront driver must be an integer number of sectors, and must be aligned in memory on a "sector" boundary. Having these assertions yesterday would have made finding the bug fixed in r293698 somewhat easier.	2016-01-11 21:02:30 +00:00
Roger Pau Monné	a55a04a892	xen-blkfront: add support for unmapped IO Using unmapped IO is really beneficial when running inside of a VM, since it avoids IPIs to other vCPUs in order to invalidate the mappings. This patch adds unmapped IO support to blkfront. The following test results have been obtained when running on a Xen host without HAP: PVHVM 3165.84 real 6354.17 user 4483.32 sys PVHVM with unmapped IO 2099.46 real 4624.52 user 2967.38 sys This is because when running using shadow page tables TLB flushes and range invalidations are much more expensive, so using unmapped IO provides a very important performance boost. Sponsored by: Citrix Systems R&D MFC after: 2 weeks X-MFC-with: r290610 dev/xen/blkfront/blkfront.c: - Add and announce support for unmapped IO.	2015-11-09 12:22:44 +00:00
Roger Pau Monné	f4576dd975	x86/dma_bounce: revert r289834 and r289836 The new load_ma implementation can cause dereferences when used with certain drivers, back it out until the reason is found: Fatal trap 12: page fault while in kernel mode cpuid = 11; apic id = 03 fault virtual address = 0x30 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff808a2d22 stack pointer = 0x28:0xfffffe07cc737710 frame pointer = 0x28:0xfffffe07cc737790 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 13 (g_down) trap number = 12 panic: page fault cpuid = 11 KDB: stack backtrace: #0 0xffffffff80641647 at kdb_backtrace+0x67 #1 0xffffffff80606762 at vpanic+0x182 #2 0xffffffff806067e3 at panic+0x43 #3 0xffffffff8084eef1 at trap_fatal+0x351 #4 0xffffffff8084f0e4 at trap_pfault+0x1e4 #5 0xffffffff8084e82f at trap+0x4bf #6 0xffffffff80830d57 at calltrap+0x8 #7 0xffffffff8063beab at _bus_dmamap_load_ccb+0x1fb #8 0xffffffff8063bc51 at bus_dmamap_load_ccb+0x91 #9 0xffffffff8042dcad at ata_dmaload+0x11d #10 0xffffffff8042df7e at ata_begin_transaction+0x7e #11 0xffffffff8042c18e at ataaction+0x9ce #12 0xffffffff802a220f at xpt_run_devq+0x5bf #13 0xffffffff802a17ad at xpt_action_default+0x94d #14 0xffffffff802c0024 at adastart+0x8b4 #15 0xffffffff802a2e93 at xpt_run_allocq+0x193 #16 0xffffffff802c0735 at adastrategy+0xf5 #17 0xffffffff80554206 at g_disk_start+0x426 Uptime: 2m29s	2015-10-26 14:50:35 +00:00
Roger Pau Monné	ee74891fc7	blkfront: add support for unmapped IO Using unmapped IO is really beneficial when running inside of a VM, since it avoids IPIs to other vCPUs in order to invalidate the mappings. This patch adds unmapped IO support to blkfront. The following test results have been obtained when running on a Xen host without HAP: PVHVM 3165.84 real 6354.17 user 4483.32 sys PVHVM with unmapped IO 2099.46 real 4624.52 user 2967.38 sys This is because when running using shadow page tables TLB flushes and range invalidations are much more expensive, so using unmapped IO provides a very important performance boost. Sponsored by: Citrix Systems R&D MFC after: 2 weeks X-MFC-with: r289834	2015-10-23 15:46:42 +00:00
Roger Pau Monné	1a52c10530	Update Xen headers from 4.2 to 4.6 Pull the latest headers for Xen which allow us to add support for ARM and use new features in FreeBSD. This is a verbatim copy of the xen/include/public so every header which doesn't exist anymore in the Xen repositories has been dropped. Note the interface version hasn't been bumped, it will be done in a follow-up. Although, it requires fixes in the code to get it compiled: - sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so drop it. - {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on xen/interface/event_channel.h, so include it. - {amd64,i386}/{amd64,i386}/support.S: It's not necessary to include machine/intr_machdep.h. This is also fixing build compilation with the new headers. - dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segment has been dropped. So directly use struct blkif_request_segment Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if __XEN_INTERFACE_VERSION__ is not set. This allows us to catch any file where xen/xen-os.h is not correctly included. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3805 Sponsored by: Citrix Systems R&D	2015-10-06 11:29:44 +00:00
Roger Pau Monné	f8f1bb83f7	xen: allow disabling PV disks and nics Introduce two new loader tunables that can be used to disable PV disks and PV nics at boot time. They default to 0 and should be set to 1 (or any number different from 0) in order to disable the PV devices: hw.xen.disable_pv_disks=1 hw.xen.disable_pv_nics=1 in /boot/loader.conf will disable both PV disks and nics. Sponsored by: Citrix Systems R&D Tested by: Karl Pielorz <kpielorz_lst@tdx.co.uk> MFC after: 1 week	2015-08-21 15:53:08 +00:00
John Baldwin	3c790178c5	Remove some more vestiges of the Xen PV domu support. Specifically, use vtophys() directly instead of vtomach() and retire the no-longer-used headers <machine/xenfunc.h> and <machine/xenvar.h>. Reported by: bde (stale bits in <machine/xenfunc.h>) Reviewed by: royger (earlier version) Differential Revision: https://reviews.freebsd.org/D3266	2015-08-06 17:07:21 +00:00
Colin Percival	aaebf69062	Add support for Xen blkif indirect segment I/Os. This makes it possible for the blkfront driver to perform I/Os of up to 2 MB, subject to support from the blkback to which it is connected and the initiation of such large I/Os by the rest of the kernel. In practice, the I/O size is increased from 40 kB to 128 kB. The changes to xen/interface/io/blkif.h consist merely of merging updates from the upstream Xen repository. In dev/xen/blkfront/block.h we add some convenience macros and structure fields used for indirect-page I/Os: The device records its negotiated limit on the number of indirect pages used, while each I/O command structure gains permanently allocated page(s) for indirect page references and the Xen grant references for those pages. In dev/xen/blkfront/blkfront.c we now check in xbd_queue_cb whether a request is small enough to handle without an indirection page, and either follow the previous behaviour or use new code for issuing an indirect segment I/O. In xbd_connect we read the size of indirect segment I/Os supported by the backend and select the maximum size we will use; then allocate the pages and Xen grant references for each I/O command structure. In xbd_free those grants and pages are released. A new loader tunable, hw.xbd.xbd_enable_indirect, can be set to 0 in order to disable this functionality; it works by pretending that the backend does not support this feature. Some backends exhibit a loss of performance with large I/Os, so users may wish to test with and without this functionality enabled. Reviewed by: royger MFC after: 3 days Relnotes: yes	2015-07-30 03:50:01 +00:00
Colin Percival	7a70e15235	Rename mksegarray to xbd_mksegarray for consistency with other function names in this file. Submitted by: royger	2015-06-23 06:50:03 +00:00
Colin Percival	ad935ed241	Garbage collect comments and a macro which related to the pre-r284296 support for a "segment block" extension in FreeBSD's Xen blkfront/blkback drivers. This commit should not result in any functional changes.	2015-06-21 06:52:03 +00:00
Colin Percival	91fb36cfa8	Move the bus_dma_tag creation and per-transaction data allocation from xbd_initialize to xbd_connect. Both of these initialization steps need to know what the maximum possible I/O size will be, and when we gain support for indirect segment I/Os we won't know that value until we reach xbd_connect. Since none of this data is used before xbd_connect completes, moving the initialization is harmless. This commit should not result in any functional changes.	2015-06-21 05:36:58 +00:00
Colin Percival	0115209538	If we fail to allocate memory, pass ENOMEM as the error code, not the "error" variable (which is always zero at this point).	2015-06-21 05:32:56 +00:00
Colin Percival	d0ecc14d49	Refactor xbd_queue_cb, extracting the code which converts bus_dma segments into blkif segments, and moving it into a new function. This will be used by upcoming support for indirect-segment blkif requests. This commit should not result in any functional changes.	2015-06-20 00:02:03 +00:00
Colin Percival	d33a1217bd	Minor clean up to xbd_queue_cb: * nsegs must be at most BLKIF_MAX_SEGMENTS_PER_REQUEST (since we specify that limit to bus_dma_tag_create), so KASSERT that rather than silently adjusting the request. * block_segs is now a synonym for nsegs, so garbage collect that variable. * nsegs is never read during or after the while loop, so remove the dead decrement from the loop. These were all left behind from the pre-r284296 support for a "segment block" extension.	2015-06-19 22:40:58 +00:00
Roger Pau Monné	112cacaee4	xen-blk{front/back}: remove broken FreeBSD extensions The FreeBSD extension adds a new request type, called blkif_segment_block which has a size of 112 bytes for both i386 and amd64. This is fine on amd64, since requests have a size of 112B there also. But this is not true for i386, where requests have a size of 108B. So on i386 we basically overrun the ring slot when queuing a request of type blkif_segment_block_t, which is very bad. Remove this extension (including a cleanup of the public blkif.h header file) from blkfront and blkback. Sponsored by: Citrix Systems R&D Tested-by: cperciva	2015-06-12 07:50:34 +00:00
Roger Pau Monné	77b6916d2e	Revert r269814: blkfront: add support for unmapped IO Current busdma code for unmapped bios will not properly align the segment size, causing corruption on blkfront devices. Revert the commit until busdma code is fixed. Reported by: mav MFC after: 1 day	2014-09-04 14:56:24 +00:00
Roger Pau Monné	38232e93d8	blkfront: add support for unmapped IO Using unmapped IO is really beneficial when running inside of a VM, since it avoids IPIs to other vCPUs in order to invalidate the mappings. This patch adds unmapped IO support to blkfront. The following test results have been obtained when running on a Xen host without HAP: PVHVM 3165.84 real 6354.17 user 4483.32 sys PVHVM with unmapped IO 2099.46 real 4624.52 user 2967.38 sys This is because when running using shadow page tables TLB flushes and range invalidations are much more expensive, so using unmapped IO provides a very important performance boost. Sponsored by: Citrix Systems R&D Tested by: robak MFC after: 1 week PR: 191173 dev/xen/blkfront/blkfront.c: - Add and announce support for unmapped IO.	2014-08-11 15:37:02 +00:00
Justin T. Gibbs	a371f519da	Allow FreeBSD to be booted from CDROM media on XenServer 6.2 and prior releases. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (gjb) sys/dev/xen/blkfront/blkfront.c: On XenServer versions up to and including 6.2, paravirtualized CDROM support is broken. When running in an HVM domain, ignore paravirtualized instances of CDROM media, and instead rely on native drivers attaching to emulated hardware. This functions correctly on all currently known Xen based platforms.	2013-10-13 02:34:20 +00:00
Colin Percival	48a1ceed53	If reading a virtual-device value fails, attempt to read a virtual-device-ext value. Some hosts do not publish "extended" disk IDs via virtual-device in an attempt to avoid confusing old blkfront drivers, and without this change we failed to attach such disks. In particular, this commit allows all 24 ephemeral disks on EC2 hs1.8xlarge instances to be used, instead of only the first 15. MFC after: 3 days	2013-08-30 01:46:56 +00:00
Justin T. Gibbs	76acc41fb7	Implement vector callback for PVHVM and unify event channel implementations Re-structure Xen HVM support so that: - Xen is detected and hypercalls can be performed very early in system startup. - Xen interrupt services are implemented using FreeBSD's native interrupt delivery infrastructure. - the Xen interrupt service implementation is shared between PV and HVM guests. - Xen interrupt handlers can optionally use a filter handler in order to avoid the overhead of dispatch to an interrupt thread. - interrupt load can be distributed among all available CPUs. - the overhead of accessing the emulated local and I/O apics on HVM is removed for event channel port events. - a similar optimization can eventually, and fairly easily, be used to optimize MSI. Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure, and misc Xen cleanups: Sponsored by: Spectra Logic Corporation Unification of PV & HVM interrupt infrastructure, bug fixes, and misc Xen cleanups: Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D sys/x86/x86/local_apic.c: sys/amd64/include/apicvar.h: sys/i386/include/apicvar.h: sys/amd64/amd64/apic_vector.S: sys/i386/i386/apic_vector.s: sys/amd64/amd64/machdep.c: sys/i386/i386/machdep.c: sys/i386/xen/exception.s: sys/x86/include/segments.h: Reserve IDT vector 0x93 for the Xen event channel upcall interrupt handler. On Hypervisors that support the direct vector callback feature, we can request that this vector be called directly by an injected HVM interrupt event, instead of a simulated PCI interrupt on the Xen platform PCI device. This avoids all of the overhead of dealing with the emulated I/O APIC and local APIC. It also means that the Hypervisor can inject these events on any CPU, allowing upcalls for different ports to be handled in parallel. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: Map Xen per-vcpu area during AP startup. 
sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: Increase the FreeBSD IRQ vector table to include space for event channel interrupt sources. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: Remove Xen HVM per-cpu variable data. These fields are now allocated via the dynamic per-cpu scheme. See xen_intr.c for details. sys/amd64/include/xen/hypercall.h: sys/dev/xen/blkback/blkback.c: sys/i386/include/xen/xenvar.h: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/xen/gnttab.c: Prefer FreeBSD primitives to Linux ones in Xen support code. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: sys/dev/xen/balloon/balloon.c: sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/console/xencons_ring.c: sys/dev/xen/control/control.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/xenpci/xenpci.c: sys/i386/i386/machdep.c: sys/i386/include/pmap.h: sys/i386/include/xen/xenfunc.h: sys/i386/isa/npx.c: sys/i386/xen/clock.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/xen_rtc.c: sys/xen/evtchn/evtchn_dev.c: sys/xen/features.c: sys/xen/gnttab.c: sys/xen/gnttab.h: sys/xen/hvm.h: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_if.m: sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusvar.h: sys/xen/xenstore/xenstore.c: sys/xen/xenstore/xenstore_dev.c: sys/xen/xenstore/xenstorevar.h: Pull common Xen OS support functions/settings into xen/xen-os.h. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: Remove constants, macros, and functions unused in FreeBSD's Xen support. sys/xen/xen-os.h: sys/i386/xen/xen_machdep.c: sys/x86/xen/hvm.c: Introduce new functions xen_domain(), xen_pv_domain(), and xen_hvm_domain(). These are used in favor of #ifdefs so that FreeBSD can dynamically detect and adapt to the presence of a hypervisor.
The goal is to have an HVM optimized GENERIC, but more is necessary before this is possible. sys/amd64/amd64/machdep.c: sys/dev/xen/xenpci/xenpcivar.h: sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/sys/kernel.h: Refactor magic ioport, Hypercall table and Hypervisor shared information page setup, and move it to a dedicated HVM support module. HVM mode initialization is now triggered during the SI_SUB_HYPERVISOR phase of system startup. This currently occurs just after the kernel VM is fully set up which is just enough infrastructure to allow the hypercall table and shared info page to be properly mapped. sys/xen/hvm.h: sys/x86/xen/hvm.c: Add definitions and a method for configuring Hypervisor event delivery via a direct vector callback. sys/amd64/include/xen/xen-os.h: sys/x86/xen/hvm.c: sys/conf/files: sys/conf/files.amd64: sys/conf/files.i386: Adjust kernel build to reflect the refactoring of early Xen startup code and Xen interrupt services. sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: sys/dev/xen/control/control.c: sys/dev/xen/evtchn/evtchn_dev.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/xen/xenstore/xenstore.c: sys/xen/evtchn/evtchn_dev.c: sys/dev/xen/console/console.c: sys/dev/xen/console/xencons_ring.c Adjust drivers to use new xen_intr_*() API. sys/dev/xen/blkback/blkback.c: Since blkback defers all event handling to a taskqueue, convert this task queue to a "fast" taskqueue, and schedule it via an interrupt filter. This avoids an unnecessary ithread context switch. sys/xen/xenstore/xenstore.c: The xenstore driver is MPSAFE. Indicate as much when registering its interrupt handler. sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbusvar.h: Remove unused event channel APIs. sys/xen/evtchn.h: Remove all kernel Xen interrupt service API definitions from this file. It is now only used for structure and ioctl definitions related to the event channel userland device driver.
Update the definitions in this file to match those from NetBSD. Implementing this interface will be necessary for Dom0 support. sys/xen/evtchn/evtchnvar.h: Add a header file for implementation internal APIs related to managing event channels event delivery. This is used to allow, for example, the event channel userland device driver to access low-level routines that typical kernel consumers of event channel services should never access. sys/xen/interface/event_channel.h: sys/xen/xen_intr.h: Standardize on the evtchn_port_t type for referring to an event channel port id. In order to prevent low-level event channel APIs from leaking to kernel consumers who should not have access to this data, the type is defined twice: Once in the Xen provided event_channel.h, and again in xen/xen_intr.h. The double declaration is protected by __XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared twice within a given compilation unit. sys/xen/xen_intr.h: sys/xen/evtchn/evtchn.c: sys/x86/xen/xen_intr.c: sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/xenpci/xenpcivar.h: New implementation of Xen interrupt services. This is similar in many respects to the i386 PV implementation with the exception that events bound to event channel ports (i.e. not IPI, virtual IRQ, or physical IRQ) are further optimized to avoid mask/unmask operations that aren't necessary for these edge triggered events. Stubs exist for supporting physical IRQ binding, but will need additional work before this implementation can be fully shared between PV and HVM. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: sys/i386/xen/mp_machdep.c sys/x86/xen/hvm.c: Add support for placing vcpu_info into an arbitrary memory page instead of using HYPERVISOR_shared_info->vcpu_info. This allows the creation of domains with more than 32 vcpus. sys/i386/i386/machdep.c: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/exception.s: Add support for new event channel implementation.	2013-08-29 19:52:18 +00:00
Colin Percival	d1688fc3f1	Remove duplicate dev.xbd.*.max_requests sysctl added in r252260. Approved by: gibbs	2013-08-27 19:10:36 +00:00
Justin T. Gibbs	9985113b61	In the Xen block front driver, take advantage of backends that support cache flush and write barrier commands. sys/dev/xen/blkfront/block.h: Add per-command flag that specifies that the I/O queue must be frozen after this command is dispatched. This is used to implement "single-stepping". Remove the unused per-command flag that indicates a polled command. Add block device instance flags to record backend features. Add a block device instance flag to indicate the I/O queue is frozen until all outstanding I/O completes. Enhance the queue API to allow the number of elements in a queue to be interrogated. Prefer "inline" to "__inline". sys/dev/xen/blkfront/blkfront.c: Formalize queue freeze semantics by adding methods for both global and command-associated queue freezing. Provide mechanism to freeze the I/O queue until all outstanding I/O completes. Use this to implement barrier semantics (BIO_ORDERED) when the backend does not support BLKIF_OP_WRITE_BARRIER commands. Implement BIO_FLUSH as either a BLKIF_OP_FLUSH_DISKCACHE command or a 0 byte write barrier. Currently, all publicly available backends perform a diskcache flush when processing barrier commands, and this frontend behavior matches what is done in Linux. Simplify code by using new queue length API. Report backend features during device attach and via sysctl. Submitted by: Roger Pau Monné Submitted by: gibbs (Merge with new driver queue API, sysctl support)	2013-06-26 20:39:07 +00:00
Justin T. Gibbs	b834eea697	sys/dev/xen/blkfront/blkfront.c: In xbd_thaw(), fix inverted logic to verify the queue is frozen before attempting a thaw. MFC after: 1 week	2013-06-16 16:01:24 +00:00
Justin T. Gibbs	127a9483ed	Properly track the different reasons new I/O is temporarily disabled, and only re-enable I/O when all reasons have cleared. sys/dev/xen/blkfront/block.h: In the block front driver softc, replace the boolean XBDF_FROZEN flag with a count of commands and driver global issues that freeze the I/O queue. So long as xbd_qfrozen_cnt is non-zero, I/O is halted. Add flags to xbd_flags for tracking grant table entry and free command resource shortages. Each of these classes can increment xbd_qfrozen_cnt at most once. Add a command flag (XBDCF_ASYNC_MAPPING) that is set whenever the initial mapping attempt of a command fails with EINPROGRESS. sys/dev/xen/blkfront/blkfront.c: In xbd_queue_cb(), use new XBDCF_ASYNC_MAPPING flag to definitively know if an async bus dmamap load has occurred. Add xbd_freeze() and xbd_thaw() helper methods for managing xbd_qfrozen_cnt and use them to implement all queue freezing logic. Add missing "thaw" to restart I/O processing once grant references become available. Sponsored by: Spectra Logic Corporation	2013-06-15 04:51:31 +00:00
Justin T. Gibbs	e2c1fe9009	Improve debugger visibility into queuing functions by removing the macro scheme for defining inline command queuing functions. Prefer enums to #defines. sys/dev/xen/blkfront/block.h Replace inline function generation performed by the XBDQ_COMMAND_QUEUE() macro with single instances of each inline function (init, enqueue, dequeue, remove). This was made possible by using queue indexes instead of bit flags in the command structure, and passing the index enum as an argument to the functions. Improve panic/assert messages in the queue functions. Combine queue data and stats into a single data structure and declare an array of them instead of each queue individually. Convert command flags, softc state, and softc flags to enums. sys/dev/xen/blkfront/blkfront.c Mechanical adjustments for new queue api. Sponsored by: Spectra Logic Corporation MFC after: 1 week	2013-06-14 17:00:58 +00:00
Justin T. Gibbs	7283d23698	sys/dev/xen/blkfront/blkfront.c: Remove dead code. Sponsored by: Spectra Logic Corporation MFC after: 1 week	2013-06-01 04:07:56 +00:00
Justin T. Gibbs	d9fab01d7b	sys/dev/xen/blkfront/blkfront.c: Remove local, and incorrect, definition for the value of an invalid grant reference. Extract ring cleanup code into xbd_free_ring() function for symmetry with xbd_alloc_ring(). This process also eliminated an initialized but unused variable. Sponsored by: Spectra Logic Corporation MFC after: 1 week	2013-06-01 04:02:51 +00:00
Justin T. Gibbs	cdf5d66f2f	Style changes. No intended functional changes. o rename flush_requests => xbd_flush_requests o rename xbd_setup_ring => xbd_alloc_ring Sponsored by: Spectra Logic Corporation MFC after: 1 week	2013-05-31 22:33:28 +00:00
Justin T. Gibbs	fac3fd8015	Style cleanups. No intended functional changes. o Group functions by their functionality. o Remove superfluous declarations. o Remove more unused (#ifdef'd out) code. Sponsored by: Spectra Logic Corporation	2013-05-31 22:21:37 +00:00
Justin T. Gibbs	33eebb6a75	Style cleanups. No intended functional changes. o This driver is the "xbd" driver, not the "blkfront", "blkif", "xbf", or "xb" driver. Use the "xbd_" naming conventions for all functions, structures, and constants. o The prevailing convention for structure fields in this driver is to prefix them with an abbreviation of the structure type. Update "recently added" fields to match this style. o Remove unused data structures. o Remove superfluous casts. o Make a pass over the whole driver and bring it closer to style(9) conformance. Sponsored by: Spectra Logic Corporation MFC after: 1 week	2013-05-31 21:05:07 +00:00
Justin T. Gibbs	5e58295a1f	Apply the ad* => ada* IDE device name transition to the Xen block front driver. Submitted by: Bei Guan <gbtju85@gmail.com> Reviewed by: gibbs MFC after: 1 week	2013-05-31 04:43:19 +00:00
Kenneth D. Merry	c3fb2891f0	Fix a bug which causes a panic in daopen(). The panic is caused by a da(4) instance going away while GEOM is still probing it. In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue. While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic. The solution is to add a callback to the GEOM disk code that is called when all of its resources are cleaned up. This is implemented inside GEOM by adding an optional callback that is called when all consumers have detached from a provider, and the provider is about to be deleted. scsi_cd.c, scsi_da.c: In the register routine for the cd(4) and da(4) routines, acquire a reference to the CAM peripheral instance just before we call disk_create(). Use the new GEOM disk d_gone() callback to register a callback (dadiskgonecb()/cddiskgonecb()) that decrements the peripheral reference count once GEOM has finished cleaning up its resources. In the cd(4) driver, clean up open and close behavior slightly. GEOM makes sure we only get one open() and one close call, so there is no need to set an open flag and decrement the reference count if we are not the first open. In the cd(4) driver, use cam_periph_release_locked() in a couple of error scenarios to avoid extra mutex calls. geom.h: Add a new, optional, providergone callback that is called when a provider is about to be deleted. geom_disk.h: Add a new d_gone() callback to the GEOM disk interface. Bump the DISK_VERSION to version 2. This probably should have been done after a couple of previous changes, especially the addition of the d_getattr() callback. geom_disk.c: Add a providergone callback for the disk class, g_disk_providergone(), that calls the user's d_gone() callback if it exists. Bump the DISK_VERSION to 2. geom_subr.c: In g_destroy_provider(), call the providergone callback if it has been provided. In g_new_geomf(), propagate the class's providergone callback to the new geom instance. blkfront.c: Callers of disk_create() are supposed to pass in DISK_VERSION, not an explicit disk API version number. Update the blkfront driver to do that. disk.9: Update the disk(9) man page to include information on the new d_gone() callback, as well as the previously added d_getattr() callback, d_descr field, and HBA PCI ID fields. MFC after: 5 days	2012-06-24 04:29:03 +00:00
Justin T. Gibbs	0d17232400	Correct failure to attach the PV block front device on Citrix XenServer configurations that advertise the multi-page ring extension, but only allow a single page of ring space. sys/dev/xen/blkfront/blkfront.c: If only one page of ring space is being used, do not publish in the XenStore the number of pages in use (1), via either of the supported multi-page ring extension schemes. Single page operation is the same with or without the ring-page extension being negotiated. Relying on the legacy behavior avoids an incompatible difference in how the two ring-page extension schemes that are out in the wild deal with the base case of a single page. The Amazon/Red Hat drivers use the same XenStore variable as if the extension was not negotiated. The Citrix drivers assume the new ring reference XenStore variables will be available. Reported by: Oliver Schonefeld <schonefeld@ids-mannheim.de> MFC after: 3 days	2012-03-25 14:20:43 +00:00
Scott Long	6ac6f295b0	Final pass at having devices use their bus parent for dma tags. The remaining drivers that haven't been converted have various problems or complexities that will be dealt with later. This list includes: hptrr, hptmv, hpt27xx - device aggregation across multiple parents drm - want to talk to the maintainer first tsec, sec - Openfirmware devices, not sure if changes are warranted fatm - Done except for unused testing code usb - want to talk to the maintainer first ce, cp, ctau, cx - Significant driver changes needed to convey parent info There are also devices tucked into architecture subtrees that I'll leave for the respective maintainers to deal with.	2012-03-12 19:29:35 +00:00
Justin T. Gibbs	443cc4d407	Fix a bug in the calculation of the maximum I/O request size. The previous code did not limit the I/O request size based on the maximum number of segments supported by the back-end. In current practice, since the only back-end supporting chained requests is the FreeBSD implementation, this limit was never exceeded. sys/dev/xen/blkfront/block.h: Add two macros, XBF_SEGS_TO_SIZE() and XBF_SIZE_TO_SEGS(), to centralize the logic of reserving a segment to deal with non-page-aligned I/Os. sys/dev/xen/blkfront/blkfront.c: o When negotiating transfer parameters, limit the max_request_size we use and publish, if it is greater than the maximum, unaligned, I/O we can support with the number of segments advertised by the backend. o Don't unilaterally reduce the I/O size published to the disk layer by a single page. max_request_size is already properly limited in the transfer parameter negotiation code. o Fix typos in printf strings: "max_requests_segments" -> "max_request_segments" "specificed" -> "specified" MFC after: 1 day	2012-02-16 21:58:47 +00:00
Justin T. Gibbs	8b8bfa3567	Enhance documentation, improve interoperability, and fix defects in FreeBSD's front and back Xen blkif interface drivers. sys/dev/xen/blkfront/block.h: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkback/blkback.c: Replace FreeBSD specific multi-page ring implementation with support for both the Citrix and Amazon/RedHat versions of this extension. sys/dev/xen/blkfront/blkfront.c: o Add a per-instance sysctl tree that exposes all negotiated transport parameters (ring pages, max number of requests, max request size, max number of segments). o In blkfront_vdevice_to_unit() add a missing return statement so that we properly identify the unit number for high numbered xvd devices. sys/dev/xen/blkback/blkback.c: o Add static dtrace probes for several events in this driver. o Defer connection shutdown processing until the front-end enters the closed state. This avoids prematurely tearing down the connection when buggy front-ends transition to the closing state, even though the device is open and they veto the close request from the tool stack. o Add nodes for maximum request size and the number of active ring pages to the existing, per-instance, sysctl tree. o Miscellaneous style cleanup. sys/xen/interface/io/blkif.h: o Add extensive documentation of the XenStore nodes used to implement the blkif interface. o Document the startup sequence between a front and back driver. o Add structures and documentation for the "discard" feature (AKA Trim). o Cleanup some definitions related to FreeBSD's request number/size/segment-limit extension. sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkback/blkback.c: sys/xen/xenbus/xenbusvar.h: Add the convenience function xenbus_get_otherend_state() and use it to simplify some logic in both block-front and block-back. MFC after: 1 day	2012-02-15 06:45:49 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Justin T. Gibbs	06a630f65d	Add suspend/resume support to the Xen blkfront driver. Sponsored by: BQ Internet sys/dev/xen/blkfront/block.h: sys/dev/xen/blkfront/blkfront.c: Remove now unused blkif_vdev_t from the blkfront soft. sys/dev/xen/blkfront/blkfront.c: o In blkfront_suspend(), indicate the desire to suspend by changing the softc connected state to SUSPENDED, and then wait for any I/O pending on the remote peer to drain. Cancel suspend processing if I/O does not drain within 30 seconds. o Enable and update blkfront_resume(). Since I/O is drained prior to the suspension of the VM, the complicated recovery process performed by other Xen blkfront implementations is avoided. We simply tear down the connection to our old peer, and then re-connect. o In blkif_initialize(), fix a resource leak and botched return if we cannot allocate shadow memory for our requests. o In blkfront_backend_changed(), correct our response to the XenbusStateInitialised state. This state indicates that our backend peer has published sufficient data for blkfront to publish ring information and other XenStore data, not that a connection can occur. Blkfront now will only perform connection processing in response to the XenbusStateConnected state. This corrects an issue where blkfront connected before the backend was ready during resume processing. Approved by: re MFC after: 1 week	2011-09-21 00:02:44 +00:00


70 Commits