freebsd-nq

Author	SHA1	Message	Date
Neel Natu	7a902ec0ec	Add a check to validate that memory BARs of passthru devices are 4KB aligned. Also, the MSI-x table offset is not required to be 4KB aligned so take this into account when computing the pages occupied by the MSI-x tables.	2014-02-18 19:00:15 +00:00
John Baldwin	a96b8b801a	Tweak the handling of PCI capabilities in emulated devices to remove the non-standard zero capability list terminator. Instead, track the start and end of the most recently added capability and use that to adjust the previous capability's next pointer when a capability is added and to determine the range of config registers belonging to PCI capability registers. Reviewed by: neel	2014-02-18 03:00:20 +00:00
Neel Natu	d84882ca8f	Allow PCI devices to be configured on all valid bus numbers from 0 to 255. This is done by representing each bus as root PCI device in ACPI. The device implements the _BBN method to return the PCI bus number to the guest OS. Each PCI bus keeps track of the resources that is decodes for devices configured on the bus: i/o, mmio (32-bit) and mmio (64-bit). These windows are advertised to the guest via the _CRS object of the root device. Bus 0 is treated specially since it consumes the I/O ports to access the PCI config space [0xcf8-0xcff]. It also decodes the legacy I/O ports that are consumed by devices on the LPC bus. For this reason the LPC bridge can be configured only on bus 0. The bus number can be specified using the following command line option to bhyve(8): "-s <bus>:<slot>:<func>,<emul>[,<config>]" Discussed with: grehan@ Reviewed by: jhb@	2014-02-14 21:34:08 +00:00
John Baldwin	3cbf3585cb	Enhance the support for PCI legacy INTx interrupts and enable them in the virtio backends. - Add a new ioctl to export the count of pins on the I/O APIC from vmm to the hypervisor. - Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for ISA interrupts. - Populate the MP Table with I/O interrupt entries for any PCI INTx interrupts. - Create a _PRT table under the PCI root bridge in ACPI to route any PCI INTx interrupts appropriately. - Track which INTx interrupts are in use per-slot so that functions that share a slot attempt to distribute their INTx interrupts across the four available pins. - Implicitly mask INTx interrupts if either MSI or MSI-X is enabled and when the INTx DIS bit is set in a function's PCI command register. Either assert or deassert the associated I/O APIC pin when the state of one of those conditions changes. - Add INTx support to the virtio backends. - Always advertise the MSI capability in the virtio backends. Submitted by: neel (7) Reviewed by: neel MFC after: 2 weeks	2014-01-29 14:56:48 +00:00
John Baldwin	d2bc4816c5	Remove support for legacy PCI devices. These haven't been needed since support for LPC uart devices was added and it conflicts with upcoming patches to add PCI INTx support. Reviewed by: neel	2014-01-27 22:26:15 +00:00
John Baldwin	e6c8bc291a	Rework the DSDT generation code a bit to generate more accurate info about LPC devices. Among other things, the LPC serial ports now appear as ACPI devices. - Move the info for the top-level PCI bus into the PCI emulation code and add ResourceProducer entries for the memory ranges decoded by the bus for memory BARs. - Add a framework to allow each PCI emulation driver to optionally write an entry into the DSDT under the \_SB_.PCI0 namespace. The LPC driver uses this to write a node for the LPC bus (\_SB_.PCI0.ISA). - Add a linker set to allow any LPC devices to write entries into the DSDT below the LPC node. - Move the existing DSDT block for the RTC to the RTC driver. - Add DSDT nodes for the AT PIC, the 8254 ISA timer, and the LPC UART devices. - Add a "SuperIO" device under the LPC node to claim "system resources" aling with a linker set to allow various drivers to add IO or memory ranges that should be claimed as a system resource. - Add system resource entries for the extended RTC IO range, the registers used for ACPI power management, the ELCR, PCI interrupt routing register, and post data register. - Add various helper routines for generating DSDT entries. Reviewed by: neel (earlier version)	2014-01-02 21:26:59 +00:00
Neel Natu	4f8be175d5	Add an API to deliver message signalled interrupts to vcpus. This allows callers treat the MSI 'addr' and 'data' fields as opaque and also lets bhyve implement multiple destination modes: physical, flat and clustered. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com) Reviewed by: grehan@	2013-12-16 19:59:31 +00:00
Neel Natu	ac7304a758	Add an ioctl to assert and deassert an ioapic pin atomically. This will be used to inject edge triggered legacy interrupts into the guest. Start using the new API in device models that use edge triggered interrupts: viz. the 8254 timer and the LPC/uart device emulation. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-11-23 03:56:03 +00:00
Neel Natu	565bbb8698	Move the ioapic device model from userspace into vmm.ko. This is needed for upcoming in-kernel device emulations like the HPET. The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to manipulate the ioapic pin state. Discussed with: grehan@ Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-11-12 22:51:03 +00:00
Neel Natu	c8afb9bc3f	Fix an off-by-one error when iterating over the emulated PCI BARs. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-11-06 22:35:52 +00:00
Neel Natu	ea7f1c8cd2	Add support for PCI-to-ISA LPC bridge emulation. If the LPC bus is attached to a virtual machine then we implicitly create COM1 and COM2 ISA devices. Prior to this change the only way of attaching a COM port to the virtual machine was by presenting it as a PCI device that is mapped at the legacy I/O address 0x3F8 or 0x2F8. There were some issues with the original approach: - It did not work at all with UEFI because UEFI will reprogram the PCI device BARs and remap the COM1/COM2 ports at non-legacy addresses. - OpenBSD GENERIC kernel does not create a /dev/console because it expects the uart device at the legacy 0x3F8/0x2F8 address to be an ISA device. - It was functional with a FreeBSD guest but caused the console to appear on /dev/ttyu2 which was not intuitive. The uart emulation is now independent of the bus on which it resides. Thus it is possible to have uart devices on the PCI bus in addition to the legacy COM1/COM2 devices behind the LPC bus. The command line option to attach ISA COM1/COM2 ports to a virtual machine is "-s <bus>,lpc -l com1,stdio". The command line option to create a PCI-attached uart device is: "-s <bus>,uart[,stdio]" The command line option to create PCI-attached COM1/COM2 device is: "-S <bus>,uart[,stdio]". This style of creating COM ports is deprecated. Discussed with: grehan Reviewed by: grehan Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com) M share/examples/bhyve/vmrun.sh AM usr.sbin/bhyve/legacy_irq.c AM usr.sbin/bhyve/legacy_irq.h M usr.sbin/bhyve/Makefile AM usr.sbin/bhyve/uart_emul.c M usr.sbin/bhyve/bhyverun.c AM usr.sbin/bhyve/uart_emul.h M usr.sbin/bhyve/pci_uart.c M usr.sbin/bhyve/pci_emul.c M usr.sbin/bhyve/inout.c M usr.sbin/bhyve/pci_emul.h M usr.sbin/bhyve/inout.h AM usr.sbin/bhyve/pci_lpc.c AM usr.sbin/bhyve/pci_lpc.h	2013-10-29 00:18:11 +00:00
Peter Grehan	2a8d400a2e	Allow a 4-byte write to PCI config space to overlap the 2 read-only bytes at the start of a PCI capability. This is the sequence that OpenBSD uses when enabling MSI interrupts, and works fine on real h/w. In bhyve, convert the 4 byte write to a 2-byte write to the r/w area past the first 2 r/o bytes of a capability. Reviewed by: neel Approved by: re@ (blanket)	2013-10-09 23:53:21 +00:00
Neel Natu	318224bbe6	Merge projects/bhyve_npt_pmap into head. Make the amd64/pmap code aware of nested page table mappings used by bhyve guests. This allows bhyve to associate each guest with its own vmspace and deal with nested page faults in the context of that vmspace. This also enables features like accessed/dirty bit tracking, swapping to disk and transparent superpage promotions of guest memory. Guest vmspace: Each bhyve guest has a unique vmspace to represent the physical memory allocated to the guest. Each memory segment allocated by the guest is mapped into the guest's address space via the 'vmspace->vm_map' and is backed by an object of type OBJT_DEFAULT. pmap types: The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT. The PT_X86 pmap type is used by the vmspace associated with the host kernel as well as user processes executing on the host. The PT_EPT pmap is used by the vmspace associated with a bhyve guest. Page Table Entries: The EPT page table entries as mostly similar in functionality to regular page table entries although there are some differences in terms of what bits are used to express that functionality. For e.g. the dirty bit is represented by bit 9 in the nested PTE as opposed to bit 6 in the regular x86 PTE. Therefore the bitmask representing the dirty bit is now computed at runtime based on the type of the pmap. Thus PG_M that was previously a macro now becomes a local variable that is initialized at runtime using 'pmap_modified_bit(pmap)'. An additional wrinkle associated with EPT mappings is that older Intel processors don't have hardware support for tracking accessed/dirty bits in the PTE. This means that the amd64/pmap code needs to emulate these bits to provide proper accounting to the VM subsystem. This is achieved by using the following mapping for EPT entries that need emulation of A/D bits: Bit Position Interpreted By PG_V 52 software (accessed bit emulation handler) PG_RW 53 software (dirty bit emulation handler) PG_A 0 hardware (aka EPT_PG_RD) PG_M 1 hardware (aka EPT_PG_WR) The idea to use the mapping listed above for A/D bit emulation came from Alan Cox (alc@). The final difference with respect to x86 PTEs is that some EPT implementations do not support superpage mappings. This is recorded in the 'pm_flags' field of the pmap. TLB invalidation: The amd64/pmap code has a number of ways to do invalidation of mappings that may be cached in the TLB: single page, multiple pages in a range or the entire TLB. All of these funnel into a single EPT invalidation routine called 'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and sends an IPI to the host cpus that are executing the guest's vcpus. On a subsequent entry into the guest it will detect that the EPT has changed and invalidate the mappings from the TLB. Guest memory access: Since the guest memory is no longer wired we need to hold the host physical page that backs the guest physical page before we can access it. The helper functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose. PCI passthru: Guest's with PCI passthru devices will wire the entire guest physical address space. The MMIO BAR associated with the passthru device is backed by a vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that have one or more PCI passthru devices attached to them. Limitations: There isn't a way to map a guest physical page without execute permissions. This is because the amd64/pmap code interprets the guest physical mappings as user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U shares the same bit position as EPT_PG_EXECUTE all guest mappings become automatically executable. Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews as well as their support and encouragement. Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing object for pci passthru mmio regions. Special thanks to Peter Holm for testing the patch on short notice. Approved by: re Discussed with: grehan Reviewed by: alc, kib Tested by: pho	2013-10-05 21:22:35 +00:00
Neel Natu	6a52209f9c	Allow single byte reads of the emulated MSI-X tables. This is not required by the PCI specification but needed to dump MMIO space from "ddb" in the guest.	2013-08-27 16:50:48 +00:00
Peter Grehan	50dc0db3f0	Fix ordering of legacy IRQ reservations. Submitted by: Jeremiah Lott jlott at averesystems dot com	2013-08-16 00:35:20 +00:00
Peter Grehan	a38e2a64dc	Support an optional "mac=" parameter to virtio-net config, to allow users to set the MAC address for a device. Clean up some obsolete code in pci_virtio_net.c Allow an error return from a PCI device emulation's init routine to be propagated all the way back to the top-level and result in the process exiting. Submitted by: Dinakar Medavaram dinnu sun at gmail (original version)	2013-07-04 05:35:56 +00:00
Peter Grehan	34d244edb2	Fix up option parsing to allow a colon in the config section. Clean up some other unnecessary code. Submitted by: Dinakar Medavaram dinnu sun at gmail Reviewed by: neel	2013-07-01 23:53:22 +00:00
Peter Grehan	7554303627	Allow the PCI config address register to be read. The Linux kernel does this. Also remove an unused header file. Submitted by: tycho nightingale at pluribusnetworks com Reviewed by: neel	2013-06-28 05:01:25 +00:00
Neel Natu	b05c77ff84	Gripe if some <slot,function> tuple is specified more than once instead of silently overwriting the previous assignment. Gripe if the emulation is not recognized instead of silently ignoring the emulated device. If an error is detected by pci_parse_slot() then exit from the command line parsing loop in main(). Submitted by (initial version): Chris Torek (chris.torek@gmail.com)	2013-04-26 02:24:50 +00:00
Neel Natu	9f08548d20	Setup accesses to the memory hole below 4GB to return all 1's on read and consume all writes without any side effects. Obtained from: NetApp	2013-04-17 02:03:12 +00:00
Neel Natu	028d9311cd	Improve PCI BAR emulation: - Respect the MEMEN and PORTEN bits in the command register - Allow the guest to reprogram the address decoded by the BAR Submitted by: Gopakumar T Obtained from: NetApp	2013-04-10 02:12:39 +00:00
Neel Natu	b060ba5024	Simplify the assignment of memory to virtual machines by requiring a single command line option "-m <memsize in MB>" to specify the memory size. Prior to this change the user needed to explicitly specify the amount of memory allocated below 4G (-m <lowmem>) and the amount above 4G (-M <highmem>). The "-M" option is no longer supported by 'bhyveload' and 'bhyve'. The start of the PCI hole is fixed at 3GB and cannot be directly changed using command line options. However it is still possible to change this in special circumstances via the 'vm_set_lowmem_limit()' API provided by libvmmapi. Submitted by: Dinakar Medavaram (initial version) Reviewed by: grehan Obtained from: NetApp	2013-03-18 22:38:30 +00:00
Peter Grehan	0ab13648f5	Add the ability to have a 'fallback' search for memory ranges. These set of ranges will be looked at if a standard memory range isn't found, and won't be installed in the cache. Use this to implement the memory behaviour of the PCI hole on x86 systems, where writes are ignored and reads always return -1. This allows breakpoints to be set when issuing a 'boot -d', which has the side effect of accessing the PCI hole when changing the PTE protection on kernel code, since the pmap layer hasn't been initialized (a bug, but present in existing FreeBSD releases so has to be handled). Reviewed by: neel Obtained from: NetApp	2013-02-22 00:46:32 +00:00
Neel Natu	74f80b236d	Advertise PCI-E capability in the hostbridge device presented to the guest. FreeBSD wants to see this capability in at least one device in the PCI hierarchy before it allows use of MSI or MSI-X. Obtained from: NetApp	2013-02-15 18:41:36 +00:00
Neel Natu	aa12663f49	Fix a bug in the passthru implementation where it would assume that all devices are MSI-X capable. This in turn would lead it to treat bar 0 as the MSI-X table bar even if the underlying device did not support MSI-X. Fix this by providing an API to query the MSI-X table index of the emulated device. If the underlying device does not support MSI-X then this API will return -1. Obtained from: NetApp	2013-02-01 02:41:47 +00:00
Neel Natu	c9b4e98754	Add support for MSI-X interrupts in the virtio network device and make that the default. The current behavior of advertising a single MSI vector can be requested by setting the environment variable "BHYVE_USE_MSI" to "true". The use of MSI is not compliant with the virtio specification and will be eventually phased out. Submitted by: Gopakumar T Obtained from: NetApp	2013-01-30 04:30:36 +00:00
Peter Grehan	e285ef8d28	Rename fbsdrun.* -> bhyverun.* bhyve is intended to be a generic hypervisor, and not FreeBSD-specific. (renaming internal routines will come later) Reviewed by: neel Obtained from: NetApp	2012-12-13 01:58:11 +00:00
Neel Natu	90415e0b4e	Ignore PCI configuration accesses to all bus numbers other than PCI bus 0. Obtained from: NetApp	2012-10-27 02:39:08 +00:00
Peter Grehan	fbfc1c763c	Remove mptable generation code from libvmmapi and move it to bhyve. Firmware tables require too much knowledge of system configuration, and it's difficult to pass that information in general terms to a library. The upcoming ACPI work exposed this - it will also livein bhyve. Also, remove code specific to NetApp from the mptable name, and remove the -n option from bhyve. Reviewed by: neel Obtained from: NetApp	2012-10-26 13:40:12 +00:00
Peter Grehan	4d1e669cad	Rework how guest MMIO regions are dealt with. - New memory region interface. An RB tree holds the regions, with a last-found per-vCPU cache to deal with the common case of repeated guest accesses to MMIO registers in the same page. - Support memory-mapped BARs in PCI emulation. mem.c/h - memory region interface instruction_emul.c/h - remove old region interface. Use gpa from EPT exit to avoid a tablewalk to determine operand address. Determine operand size and use when calling through to region handler. fbsdrun.c - call into region interface on paging exit. Distinguish between instruction emul error and region not found pci_emul.c/h - implement new BAR callback api. Split BAR alloc routine into routines that require/don't require the BAR phys address. ioapic.c pci_passthru.c pci_virtio_block.c pci_virtio_net.c pci_uart.c - update to new BAR callback i/f Reviewed by: neel Obtained from: NetApp	2012-10-19 18:11:17 +00:00
Neel Natu	25d4944e06	Fix a bug in how a 64-bit bar in a pci passthru device would be presented to the guest. Prior to the fix it was possible for such a bar to appear as a 32-bit bar as long as it was allocated from the region below 4GB. This had the potential to confuse some drivers that were particular about the size of the bars. Obtained from: NetApp	2012-08-06 07:20:25 +00:00
Neel Natu	99d653892b	Add support for emulating PCI multi-function devices. These function number is specified by an optional [:<func>] after the slot number: -s 1:0,virtio-net,tap0 Ditto for the mptable naming: -n 1:0,e0a Obtained from: NetApp	2012-08-06 06:51:27 +00:00
Neel Natu	b0b53d3af9	Device model for ioapic emulation. With this change the uart emulation is entirely interrupt driven. Obtained from: NetApp	2012-08-05 00:00:52 +00:00
Neel Natu	308f907720	Use the correct variable to index into the 'lirq[]' array to check the legacy IRQ ownership.	2012-08-04 04:26:17 +00:00
Peter Grehan	0038ee9891	Add 16550 uart emulation as a PCI device. This allows it to be activated as part of the slot config options. The syntax is: -s <slotnum>,uart[,stdio] The stdio parameter instructs the code to perform i/o using stdin/stdout. It can only be used for one instance. To allow legacy i/o ports/irqs to be used, a new variant of the slot command, -S, is introduced. When used to specify a slot, the device will use legacy resources if it supports them; otherwise it will be treated the same as the '-s' option. Specifying the -S option with the uart will first use the 0x3f8/irq 4 config, and the second -S will use 0x2F8/irq 3. Interrupt delivery is awaiting the arrival of the i/o apic code, but this works fine in uart(4)'s polled mode. This code was written by Cynthia Lu @ MIT while an intern at NetApp, with further work from neel@ and grehan@. Obtained from: NetApp	2012-05-03 03:11:27 +00:00
Peter Grehan	cd942e0f25	MSI-x interrupt support for PCI pass-thru devices. Includes instruction emulation for memory r/w access. This opens the door for io-apic, local apic, hpet timer, and legacy device emulation. Submitted by: ryan dot berryhill at sandvine dot com Reviewed by: grehan Obtained from: Sandvine	2012-04-28 16:28:00 +00:00
John Baldwin	b67e81db43	First cut to port bhyve, vmmctl, and libvmmapi to HEAD.	2011-05-15 04:03:11 +00:00
Peter Grehan	366f60834f	Import of bhyve hypervisor and utilities, part 1. vmm.ko - kernel module for VT-x, VT-d and hypervisor control bhyve - user-space sequencer and i/o emulation vmmctl - dump of hypervisor register state libvmm - front-end to vmm.ko chardev interface bhyve was designed and implemented by Neel Natu. Thanks to the following folk from NetApp who helped to make this available: Joe CaraDonna Peter Snyder Jeff Heller Sandeep Mann Steve Miller Brian Pawlowski	2011-05-13 04:54:01 +00:00

38 Commits