freebsd-skq

Author	SHA1	Message	Date
pfg	9e7434b2c5	lib: minor spelling fixes in comments. No functional change.	2016-05-01 19:37:33 +00:00
gjb	8bfb527a82	MFH Sponsored by: The FreeBSD Foundation	2016-02-22 12:28:23 +00:00
skra	812447f90a	As <machine/param.h> is included from <sys/param.h>, there is no need to include it explicitly when <sys/param.h> is already included. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D5378	2016-02-22 09:04:36 +00:00
gjb	fef2698edf	First pass through library packaging. Sponsored by: The FreeBSD Foundation	2016-02-04 21:16:35 +00:00
bdrewery	e13d6f8b3f	META MODE: Prefer INSTALL=tools/install.sh to lessen the need for xinstall.host. This both avoids some dependencies on xinstall.host and allows bootstrapping on older releases to work due to lack of at least 'install -l' support. Sponsored by: EMC / Isilon Storage Division	2015-11-25 19:10:28 +00:00
neel	e9213dd046	Move the 'devmem' device nodes from /dev/vmm to /dev/vmm.io Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual machines on the host. These tools break if the devmem device nodes also appear in /dev/vmm. Requested by: grehan	2015-07-06 19:41:43 +00:00
sjg	4beac78e55	Updated depends	2015-07-03 06:11:54 +00:00
neel	e419539610	Fix a regression in "movs" emulation after r284539. The regression was caused due to a change in behavior of the 'vm_map_gpa()'. Prior to r284539 if 'vm_map_gpa()' was called to map an address range in the guest MMIO region then it would return NULL. This was used by the "movs" emulation to detect if the 'src' or 'dst' operand was in MMIO space. Post r284539 'vm_map_gpa()' started returning a non-NULL pointer even when mapping the guest MMIO region. Fix this by returning non-NULL only if [gaddr, gaddr+len) is entirely within the 'lowmem' or 'highmem' regions and NULL otherwise. Pointy hat to: neel Reviewed by: grehan Reported by: tychon, Ben Perrault (ben.perrault@gmail.com) MFC after: 1 week	2015-06-22 00:30:34 +00:00
neel	8c70d6c7af	Restructure memory allocation in bhyve to support "devmem". devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer where doing a trap-and-emulate for every access is impractical. devmem is a hybrid of system memory (sysmem) and emulated device models. devmem is mapped in the guest address space via nested page tables similar to sysmem. However the address range where devmem is mapped may be changed by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is usually mapped RO or RW as compared to RWX mappings for sysmem. Each devmem segment is named (e.g. "bootrom") and this name is used to create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom). The device node supports mmap(2) and this decouples the host mapping of devmem from its mapping in the guest address space (which can change). Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D2762 MFC after: 4 weeks	2015-06-18 06:00:17 +00:00
sjg	008d7c831f	Add META_MODE support. Off by default, build behaves normally. WITH_META_MODE we get auto objdir creation, the ability to start build from anywhere in the tree. Still need to add real targets under targets/ to build packages. Differential Revision: D2796 Reviewed by: brooks imp	2015-06-13 19:20:56 +00:00
sjg	65145fa4c8	Merge sync of head	2015-05-27 01:19:58 +00:00
neel	7776059e98	Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup(). Prior to this change both functions returned 0 for success, -1 for failure and +1 to indicate that an exception was injected into the guest. The numerical value of ERESTART also happens to be -1 so when these functions returned -1 it had to be translated to a positive errno value to prevent the VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce bugs when writing emulation code. Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if an exception was delivered to the guest. The return value is 0 or EFAULT so no additional translation is needed. Reviewed by: tychon MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D2428	2015-05-06 16:25:20 +00:00
bapt	2ef8ecc842	Fix overlinking in bhyve: it is libvmmapi that needs to be linked against libutil, not bhyve or bhyveload	2015-04-09 21:38:40 +00:00
tychon	b3c521e852	Fix "MOVS" instruction memory to MMIO emulation. Currently updates to %rdi, %rsi, etc are inadvertently bypassed along with the check to see if the instruction needs to be repeated per the 'rep' prefix. Add "MOVS" instruction support for the 'MMIO to MMIO' case. Reviewed by: neel	2015-04-01 00:15:31 +00:00
neel	9d0b530c89	Fix a bug in libvmmapi 'vm_copy_setup()' where it would return success even if the 'gpa' was in the guest MMIO region. This would manifest as a segmentation fault in 'vm_map_copyin()' or 'vm_map_copyout()' because 'vm_map_gpa()' would return NULL for this 'gpa'. Fix this by calling 'vm_map_gpa()' in 'vm_copy_setup' and returning a failure if the 'gpa' cannot be mapped. This matches the behavior of 'vm_copy_setup()' in vmm.ko. MFC after: 1 week	2015-01-19 06:51:04 +00:00
neel	d9f07f9841	Simplify instruction restart logic in bhyve. Keep track of the next instruction to be executed by the vcpu as 'nextrip'. As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should start execution. Also, instruction restart happens implicitly via 'vm_inject_exception()' or explicitly via 'vm_restart_instruction()'. The APIs behave identically in both kernel and userspace contexts. The main beneficiary is the instruction emulation code that executes in both contexts. bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length' as readonly: - Restarting an instruction is now done by calling 'vm_restart_instruction()' as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout()) - Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch()) Differential Revision: https://reviews.freebsd.org/D1526 Reviewed by: grehan MFC after: 2 weeks	2015-01-18 03:08:30 +00:00
neel	7aa6460c48	Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko. The new RTC emulation supports all interrupt modes: periodic, update ended and alarm. It is also capable of maintaining the date/time and NVRAM contents across virtual machine reset. Also, the date/time fields can now be modified by the guest. Since bhyve now emulates both the PIT and the RTC there is no need for "Legacy Replacement Routing" in the HPET so get rid of it. The RTC device state can be inspected via bhyvectl as follows: bhyvectl --vm=vm --get-rtc-time bhyvectl --vm=vm --set-rtc-time=<unix_time_secs> bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value> Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D1385 MFC after: 2 weeks	2014-12-30 22:19:34 +00:00
sjg	d7cd1d425c	Merge head from 7/28	2014-08-19 06:50:54 +00:00
neel	4535fa67c4	Fix fault injection in bhyve. The faulting instruction needs to be restarted when the exception handler is done handling the fault. bhyve now does this correctly by setting 'vmexit[vcpu].inst_length' to zero so the %rip is not advanced. A minor complication is that the fault injection APIs are used by instruction emulation code that is shared by vmm.ko and bhyve. Thus the argument that refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to be loosely typed as a 'void *'.	2014-07-24 01:38:11 +00:00
neel	e972917c13	Emulate instructions emitted by OpenBSD/i386 version 5.5: - CMP REG, r/m - MOV AX/EAX/RAX, moffset - MOV moffset, AX/EAX/RAX - PUSH r/m	2014-07-23 04:28:51 +00:00
neel	1f15eea2e0	Handle nested exceptions in bhyve. A nested exception condition arises when a second exception is triggered while delivering the first exception. Most nested exceptions can be handled serially but some are converted into a double fault. If an exception is generated during delivery of a double fault then the virtual machine shuts down as a result of a triple fault. vm_exit_intinfo() is used to record that a VM-exit happened while an event was being delivered through the IDT. If an exception is triggered while handling the VM-exit it will be treated like a nested exception. vm_entry_intinfo() is used by processor-specific code to get the event to be injected into the guest on the next VM-entry. This function is responsible for deciding the disposition of nested exceptions.	2014-07-19 20:59:08 +00:00
neel	921bfc7679	Provide APIs to get the 'lowmem' and 'highmem' sizes directly. Previously the sizes were inferred indirectly based on the size of the mappings at 0 and 4GB respectively. This works fine as long as the size of the allocation is identical to the size of the mapping in the guest's address space. However, if the mapping is disjoint then this assumption falls apart (e.g., due to the legacy BIOS hole between 640KB and 1MB).	2014-06-24 02:02:51 +00:00
neel	80a67d54c4	Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained by vmm.ko. This allows the virtual machine to be restarted without having to destroy it first. Reviewed by: grehan	2014-06-07 21:36:52 +00:00
neel	9c2a942387	Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing it implicitly in vmm.ko. Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and "--get-suspended-cpus" options. This is in preparation for being able to reset virtual machine state without having to destroy and recreate it.	2014-05-31 23:37:34 +00:00
neel	31a1b31ef2	Fix issue with restarting an "insb/insw/insl" instruction because of a page fault on the destination buffer. Prior to this change a page fault would be detected in vm_copyout(). This was done after the I/O port access was done. If the I/O port access had side-effects (e.g. reading the uart FIFO) then restarting the instruction would result in incorrect behavior. Fix this by validating the guest linear address before doing the I/O port emulation. If the validation results in a page fault exception being injected into the guest then the instruction can now be restarted without any side-effects.	2014-05-26 18:21:08 +00:00
neel	51a05acc08	Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out of the guest linear address space. These APIs in turn use a new ioctl 'VM_GLA2GPA' to convert the guest linear address to guest physical. Use the new copyin/copyout APIs when emulating ins/outs instruction in bhyve(8).	2014-05-24 23:12:30 +00:00
sjg	5860f0d106	Updated dependencies	2014-05-16 14:09:51 +00:00
jhb	f558af85b7	Implement a PCI interrupt router to route PCI legacy INTx interrupts to the legacy 8259A PICs. - Implement an ICH-compatible PCI interrupt router on the lpc device with 8 steerable pins configured via config space access to byte-wide registers at 0x60-63 and 0x68-6b. - For each configured PCI INTx interrupt, route it to both an I/O APIC pin and a PCI interrupt router pin. When a PCI INTx interrupt is asserted, ensure that both pins are asserted. - Provide an initial routing of PCI interrupt router (PIRQ) pins to 8259A pins (ISA IRQs) and initialize the interrupt line config register for the corresponding PCI function with the ISA IRQ as this matches existing hardware. - Add a global _PIC method for OSPM to select the desired interrupt routing configuration. - Update the _PRT methods for PCI bridges to provide both APIC and legacy PRT tables and return the appropriate table based on the configured routing configuration. Note that if the lpc device is not configured, no routing information is provided. - When the lpc device is enabled, provide ACPI PCI link devices corresponding to each PIRQ pin. - Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A pins via the ELCR. - Mark the power management SCI as level triggered. - Don't hardcode the number of elements in Packages in the source for the DSDT. iasl(8) will fill in the actual number of elements, and this makes it simpler to generate a Package with a variable number of elements. Reviewed by: tycho	2014-05-15 14:16:55 +00:00
neel	f14c076ec7	Don't include the guest memory segments in the bhyve(8) process core dump. This has not added a lot of value when debugging bhyve issues while greatly increasing the time and space required to store the core file. Passing the "-C" option to bhyve(8) will change the default and dump guest memory in the core dump. Requested by: grehan Reviewed by: grehan	2014-05-13 16:40:27 +00:00
sjg	1a7e48acf1	Updated dependencies	2014-05-10 05:16:28 +00:00
sjg	ed3fc70bf5	Merge from head	2014-05-08 23:54:15 +00:00
neel	b616a9a2e4	Allow a virtual machine to be forcibly reset or powered off. This is done by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF. The disposition of VM_SUSPEND is also made available to the exit handler via the 'u.suspended' member of 'struct vm_exit'. This capability is exposed via the '--force-reset' and '--force-poweroff' arguments to /usr/sbin/bhyvectl. Discussed with: grehan@	2014-04-28 22:06:40 +00:00
sjg	5e568154a0	Merge head	2014-04-28 07:50:45 +00:00
sjg	0c7e03a54c	Merge head	2014-04-27 08:13:43 +00:00
tychon	5906c6773b	Add support for emulating the slave PIC. Reviewed by: grehan, jhb Approved by: grehan (co-mentor)	2014-04-14 19:00:20 +00:00
neel	3e49998fdf	Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called from any context i.e., it is not required to be called from a vcpu thread. The ioctl simply sets a state variable 'vm->suspend' to '1' and returns. The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The suspend handler waits until all 'vm->active_cpus' have transitioned to 'vm->suspended_cpus' before returning to userspace. Discussed with: grehan	2014-03-26 23:34:27 +00:00
tychon	25c8b61cfd	Replace the userspace atpic stub with a more functional vmm.ko model. New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ can be used to manipulate the pic, and optionally the ioapic, pin state. Reviewed by: jhb, neel Approved by: neel (co-mentor)	2014-03-11 16:56:00 +00:00
neel	e01c440dae	Queue pending exceptions in the 'struct vcpu' instead of directly updating the processor-specific VMCS or VMCB. The pending exception will be delivered right before entering the guest. The order of event injection into the guest is: - hardware exception - NMI - maskable interrupt In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt window-exiting and inject it as soon as possible after the hardware exception is injected. Also since interrupts are inherently asynchronous, injecting them after the hardware exception should not affect correctness from the guest perspective. Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict it to only deliver x86 hardware exceptions. This new ioctl is now used to inject a protection fault when the guest accesses an unimplemented MSR. Discussed with: grehan, jhb Reviewed by: jhb	2014-02-26 00:52:05 +00:00
jhb	f4e46bef98	Add support for FreeBSD/i386 guests under bhyve. - Similar to the hack for bootinfo32.c in userboot, define _MACHINE_ELF_WANT_32BIT in the load_elf32 file handlers in userboot. This allows userboot to load 32-bit kernels and modules. - Copy the SMAP generation code out of bootinfo64.c and into its own file so it can be shared with bootinfo32.c to pass an SMAP to the i386 kernel. - Use uint32_t instead of u_long when aligning module metadata in bootinfo32.c in userboot, as otherwise the metadata used 64-bit alignment which corrupted the layout. - Populate the basemem and extmem members of the bootinfo struct passed to 32-bit kernels. - Fix the 32-bit stack in userboot to start at the top of the stack instead of the bottom so that there is room to grow before the kernel switches to its own stack. - Push a fake return address onto the 32-bit stack in addition to the arguments normally passed to exec() in the loader. This return address is needed to convince recover_bootinfo() in the 32-bit locore code that it is being invoked from a "new" boot block. - Add a routine to libvmmapi to setup a 32-bit flat mode register state including a GDT and TSS that is able to start the i386 kernel and update bhyveload to use it when booting an i386 kernel. - Use the guest register state to determine the CPU's current instruction mode (32-bit vs 64-bit) and paging mode (flat, 32-bit, PAE, or long mode) in the instruction emulation code. Update the gla2gpa() routine used when fetching instructions to handle flat mode, 32-bit paging, and PAE paging in addition to long mode paging. Don't look for a REX prefix when the CPU is in 32-bit mode, and use the detected mode to enable the existing 32-bit mode code when decoding the mod r/m byte. Reviewed by: grehan, neel MFC after: 1 month	2014-02-05 04:39:03 +00:00
jhb	3f6ca218c6	Enhance the support for PCI legacy INTx interrupts and enable them in the virtio backends. - Add a new ioctl to export the count of pins on the I/O APIC from vmm to the hypervisor. - Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for ISA interrupts. - Populate the MP Table with I/O interrupt entries for any PCI INTx interrupts. - Create a _PRT table under the PCI root bridge in ACPI to route any PCI INTx interrupts appropriately. - Track which INTx interrupts are in use per-slot so that functions that share a slot attempt to distribute their INTx interrupts across the four available pins. - Implicitly mask INTx interrupts if either MSI or MSI-X is enabled and when the INTx DIS bit is set in a function's PCI command register. Either assert or deassert the associated I/O APIC pin when the state of one of those conditions changes. - Add INTx support to the virtio backends. - Always advertise the MSI capability in the virtio backends. Submitted by: neel (7) Reviewed by: neel MFC after: 2 weeks	2014-01-29 14:56:48 +00:00
jhb	8ab82a5fe1	Extend the support for local interrupts on the local APIC: - Add a generic routine to trigger an LVT interrupt that supports both fixed and NMI delivery modes. - Add an ioctl and bhyvectl command to trigger local interrupts inside a guest. In particular, a global NMI similar to that raised by SERR# or PERR# can be simulated by asserting LINT1 on all vCPUs. - Extend the LVT table in the vCPU local APIC to support CMCI. - Flesh out the local APIC error reporting a bit to cache errors and report them via ESR when ESR is written to. Add support for asserting the error LVT when an error occurs. Raise illegal vector errors when attempting to signal an invalid vector for an interrupt or when sending an IPI. - Ignore writes to reserved bits in LVT entries. - Export table entries in the MADT and MP Table advertising the stock x86 config of LINT0 set to ExtInt and LINT1 wired to NMI. Reviewed by: neel (earlier version)	2013-12-23 19:29:07 +00:00
neel	e24d040187	Rename the ambiguously named 'vm_setup_msi()' and 'vm_setup_msix()' to 'vm_setup_pptdev_msi()' and 'vm_setup_pptdev_msix()' respectively. It should now be clear that these functions operate on passthru devices.	2013-12-18 03:58:51 +00:00
neel	e62c100b90	Add an API to deliver message signalled interrupts to vcpus. This allows callers to treat the MSI 'addr' and 'data' fields as opaque and also lets bhyve implement multiple destination modes: physical, flat and clustered. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com) Reviewed by: grehan@	2013-12-16 19:59:31 +00:00
neel	89dbc92028	Add HPET device emulation to bhyve. bhyve supports a single timer block with 8 timers. The timers are all 32-bit and capable of being operated in periodic mode. All timers support interrupt delivery using MSI. Timers 0 and 1 also support legacy interrupt routing. At the moment the timers are not connected to any ioapic pins but that will be addressed in a subsequent commit. This change is based on a patch from Tycho Nightingale (tycho.nightingale@pluribusnetworks.com).	2013-11-25 19:04:51 +00:00
neel	3b87354d1e	Add an ioctl to assert and deassert an ioapic pin atomically. This will be used to inject edge triggered legacy interrupts into the guest. Start using the new API in device models that use edge triggered interrupts: viz. the 8254 timer and the LPC/uart device emulation. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-11-23 03:56:03 +00:00
neel	384d86e888	Move the ioapic device model from userspace into vmm.ko. This is needed for upcoming in-kernel device emulations like the HPET. The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to manipulate the ioapic pin state. Discussed with: grehan@ Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-11-12 22:51:03 +00:00
sjg	ebb713fe41	New/updated dependencies	2013-10-17 19:59:51 +00:00
neel	75369cb181	Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose 'invpcid' instruction to the guest. Currently bhyve will try to enable this capability unconditionally if it is available. Consolidate code in bhyve to set the capabilities so it is no longer duplicated in BSP and AP bringup. Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid' instruction is available. Reviewed by: grehan MFC after: 3 days	2013-10-16 18:20:27 +00:00
neel	f9f9a7e617	Parse the memory size parameter using expand_number() to allow specifying the memory size more intuitively (e.g. 512M, 4G etc). Submitted by: rodrigc Reviewed by: grehan Approved by: re (blanket)	2013-10-09 03:56:07 +00:00
neel	aed205d5cd	Merge projects/bhyve_npt_pmap into head. Make the amd64/pmap code aware of nested page table mappings used by bhyve guests. This allows bhyve to associate each guest with its own vmspace and deal with nested page faults in the context of that vmspace. This also enables features like accessed/dirty bit tracking, swapping to disk and transparent superpage promotions of guest memory. Guest vmspace: Each bhyve guest has a unique vmspace to represent the physical memory allocated to the guest. Each memory segment allocated by the guest is mapped into the guest's address space via the 'vmspace->vm_map' and is backed by an object of type OBJT_DEFAULT. pmap types: The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT. The PT_X86 pmap type is used by the vmspace associated with the host kernel as well as user processes executing on the host. The PT_EPT pmap is used by the vmspace associated with a bhyve guest. Page Table Entries: The EPT page table entries are mostly similar in functionality to regular page table entries although there are some differences in terms of what bits are used to express that functionality. For example, the dirty bit is represented by bit 9 in the nested PTE as opposed to bit 6 in the regular x86 PTE. Therefore the bitmask representing the dirty bit is now computed at runtime based on the type of the pmap. Thus PG_M that was previously a macro now becomes a local variable that is initialized at runtime using 'pmap_modified_bit(pmap)'. An additional wrinkle associated with EPT mappings is that older Intel processors don't have hardware support for tracking accessed/dirty bits in the PTE. This means that the amd64/pmap code needs to emulate these bits to provide proper accounting to the VM subsystem. 
This is achieved by using the following mapping for EPT entries that need emulation of A/D bits (name, bit position, interpreted by): PG_V, bit 52, software (accessed bit emulation handler); PG_RW, bit 53, software (dirty bit emulation handler); PG_A, bit 0, hardware (aka EPT_PG_RD); PG_M, bit 1, hardware (aka EPT_PG_WR). The idea to use the mapping listed above for A/D bit emulation came from Alan Cox (alc@). The final difference with respect to x86 PTEs is that some EPT implementations do not support superpage mappings. This is recorded in the 'pm_flags' field of the pmap. TLB invalidation: The amd64/pmap code has a number of ways to do invalidation of mappings that may be cached in the TLB: single page, multiple pages in a range or the entire TLB. All of these funnel into a single EPT invalidation routine called 'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and sends an IPI to the host cpus that are executing the guest's vcpus. On a subsequent entry into the guest it will detect that the EPT has changed and invalidate the mappings from the TLB. Guest memory access: Since the guest memory is no longer wired we need to hold the host physical page that backs the guest physical page before we can access it. The helper functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose. PCI passthru: Guests with PCI passthru devices will wire the entire guest physical address space. The MMIO BAR associated with the passthru device is backed by a vm_object of type OBJT_SG. An IOMMU domain is created only for guests that have one or more PCI passthru devices attached to them. Limitations: There isn't a way to map a guest physical page without execute permissions. This is because the amd64/pmap code interprets the guest physical mappings as user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U shares the same bit position as EPT_PG_EXECUTE all guest mappings become automatically executable. 
Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews as well as their support and encouragement. Thanks to John Baldwin for reviewing the use of OBJT_SG as the backing object for pci passthru mmio regions. Special thanks to Peter Holm for testing the patch on short notice. Approved by: re Discussed with: grehan Reviewed by: alc, kib Tested by: pho	2013-10-05 21:22:35 +00:00
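
The last commit above (aed205d5cd) says the dirty-bit mask is no longer a fixed macro but is chosen at runtime from the pmap type. A minimal Python sketch of that selection follows, as a model only and not the kernel code. The bit positions (6, 9, and 1 under A/D emulation) come from the commit message; the two-argument signature and the `emulate_ad_bits` flag are hypothetical simplifications of the real `pmap_modified_bit(pmap)`, which takes a pmap.

```python
# Model of runtime dirty-bit selection described in commit aed205d5cd.
# Bit positions are from the commit message; everything else is illustrative.
PT_X86 = "PT_X86"   # pmap type for the host kernel and host user processes
PT_EPT = "PT_EPT"   # pmap type for a bhyve guest

X86_PG_M_BIT = 6        # dirty bit in a regular x86 PTE
EPT_PG_M_BIT = 9        # dirty bit in a nested PTE with hardware A/D support
EPT_EMUL_PG_M_BIT = 1   # PG_M under A/D emulation (aka EPT_PG_WR)


def pmap_modified_bit(pm_type, emulate_ad_bits=False):
    """Return the bitmask representing the dirty bit for this pmap type.

    emulate_ad_bits is a hypothetical flag standing in for whatever the
    pmap records about processors without hardware A/D tracking.
    """
    if pm_type == PT_X86:
        return 1 << X86_PG_M_BIT
    if pm_type == PT_EPT:
        bit = EPT_EMUL_PG_M_BIT if emulate_ad_bits else EPT_PG_M_BIT
        return 1 << bit
    raise ValueError("unknown pmap type: %r" % (pm_type,))


def is_dirty(pte, pm_type, emulate_ad_bits=False):
    """Test the dirty bit of a page table entry using the runtime mask."""
    return (pte & pmap_modified_bit(pm_type, emulate_ad_bits)) != 0
```

The same entry value can therefore read as dirty under one pmap type and clean under another, which is why the commit turns PG_M into a local variable initialized per pmap.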

65 Commits