freebsd-dev

Author	SHA1	Message	Date
Dmitry Chagin	2dedc1281a	Revert r266925 as it can lead to instant panic at fexecve(): To allow to run the interpreter itself add a new ELF branding type. Pointed out by: kib, mjg	2014-06-17 05:29:18 +00:00
Tycho Nightingale	a026dc3fcb	Bring an overly enthusiastic KASSERT in line with the Intel SDM. Reviewed by: neel	2014-06-16 22:59:18 +00:00
Attilio Rao	3ae10f7477	- Modify vm_page_unwire() and vm_page_enqueue() to directly accept the queue where to enqueue pages that are going to be unwired. - Add stronger checks to the enqueue/dequeue for the pagequeues when adding and removing pages to them. Of course, for unmanaged pages the queue parameter of vm_page_unwire() will be ignored, just as the active parameter today. This makes adding new pagequeues quicker. This change effectively modifies the KPI. __FreeBSD_version will be, however, bumped just when the full cache of free pages will be evicted. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2014-06-16 18:15:27 +00:00
Roger Pau Monné	ef409ede7b	amd64/i386: introduce APIC hooks for different APIC implementations. This is needed for Xen PV(H) guests, since there's no hardware lapic available on this kind of domains. This commit should not change functionality. Sponsored by: Citrix Systems R&D Reviewed by: jhb Approved by: gibbs amd64/include/cpu.h: amd64/amd64/mp_machdep.c: i386/include/cpu.h: i386/i386/mp_machdep.c: - Remove lapic_ipi_vectored hook from cpu_ops, since it's now implemented in the lapic hooks. amd64/amd64/mp_machdep.c: i386/i386/mp_machdep.c: - Use lapic_ipi_vectored directly, since it's now an inline function that will call the appropriate hook. x86/x86/local_apic.c: - Prefix bare metal public lapic functions with native_ and mark them as static. - Define default implementation of apic_ops. x86/include/apicvar.h: - Declare the apic_ops structure and create inline functions to access the hooks, so the change is transparent to existing users of the lapic_ functions. x86/xen/hvm.c: - Switch to use the new apic_ops.	2014-06-16 08:43:03 +00:00
Tycho Nightingale	5ebc578ba6	Replace enum forward declarations with complete definitions. Reviewed by: neel	2014-06-10 18:46:00 +00:00
Neel Natu	404874659f	Add helper functions to populate VM exit information for rendezvous and astpending exits. This is to reduce code duplication between VT-x and SVM implementations.	2014-06-10 16:45:58 +00:00
Neel Natu	0494cb1bcb	Turn on interrupt window exiting unconditionally when an ExtINT is being injected into the guest. This allows the hypervisor to inject another ExtINT or APIC vector as soon as the guest is able to process interrupts. This change is not to address any correctness issue but to guarantee that any pending APIC vector that was preempted by the ExtINT will be injected as soon as possible. Prior to this change such pending interrupts could be delayed until the next VM exit.	2014-06-10 01:38:02 +00:00
Neel Natu	051f2bd19d	Add reserved bit checking when doing %CR8 emulation and inject #GP if required. Pointed out by: grehan Reviewed by: tychon	2014-06-09 20:51:08 +00:00
Neel Natu	5fcf252f41	Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained by vmm.ko. This allows the virtual machine to be restarted without having to destroy it first. Reviewed by: grehan	2014-06-07 21:36:52 +00:00
Alan Cox	dd05fa1945	Add a page size field to struct vm_page. Increase the page size field when a partially populated reservation becomes fully populated, and decrease this field when a fully populated reservation becomes partially populated. Use this field to simplify the implementation of pmap_enter_object() on amd64, arm, and i386. On all architectures where we support superpages, the cost of creating a superpage mapping is roughly the same as creating a base page mapping. For example, both kinds of mappings entail the creation of a single PTE and PV entry. With this in mind, use the page size field to make the implementation of vm_map_pmap_enter(..., MAP_PREFAULT_PARTIAL) a little smarter. Previously, if MAP_PREFAULT_PARTIAL was specified to vm_map_pmap_enter(), that function would only map base pages. Now, it will create up to 96 base page or superpage mappings. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-06-07 17:12:26 +00:00
Tycho Nightingale	594db0024e	Support guest accesses to %cr8. Reviewed by: neel	2014-06-06 18:23:49 +00:00
Warner Losh	3f1afabf09	Restore comments accidentally removed. MFC after: 3 days	2014-06-06 04:08:55 +00:00
Neel Natu	95ebc360ef	Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing it implicitly in vmm.ko. Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and "--get-suspended-cpus" options. This is in preparation for being able to reset virtual machine state without having to destroy and recreate it.	2014-05-31 23:37:34 +00:00
Dmitry Chagin	5f56da1891	To allow to run the interpreter itself add a new ELF branding type. Allow Linux ABI to run ELF interpreter. MFC after: 3 days	2014-05-31 15:01:51 +00:00
Tycho Nightingale	11669a681c	If VMX isn't enabled, it still can be so long as the lock bit isn't yet set in MSR IA32_FEATURE_CONTROL. Approved by: grehan (co-mentor)	2014-05-30 23:37:31 +00:00
Neel Natu	92754b1199	Remove bogus check for kmem_malloc() failure even though M_WAITOK is set. Requested by: jkim	2014-05-30 20:58:32 +00:00
Neel Natu	8e351f8a3c	Allocate a zeroed LDT. Failing to do this might result in the LDT appearing to run out of free descriptors because of random junk in the descriptor's 'sd_type' field. http://lists.freebsd.org/pipermail/freebsd-amd64/2014-May/016088.html Reviewed by: kib MFC after: 2 weeks	2014-05-30 18:59:37 +00:00
Konstantin Belousov	64e9726555	When usermode loaded non-default segment selector into the %gs, correctly prepare KGSBASE msr to restore the user descriptor base on the last swapgs during return to usermode. Reported and tested by: peterj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-05-29 16:18:31 +00:00
Mark Johnston	f2789bd5c7	Commit the rest of the changes that were intended to be part of r266826. X-MFC-with: r266826	2014-05-29 01:42:22 +00:00
John Baldwin	44a68c4e40	- Rework the XSAVE/XRSTOR emulation to only expose XCR0 features to the guest for which the rules regarding xsetbv emulation are known. In particular future extensions like AVX-512 have interdependencies among feature bits that could allow a guest to trigger a #GP in the host with the current approach of allowing anything the host supports. - Add proper checking of Intel MPX and AVX-512 XSAVE features in the xsetbv emulation and allow these features to be exposed to the guest if they are enabled in the host. - Expose a subset of known-safe features from leaf 0 of the structured extended features to guests if they are supported on the host including RDFSBASE/RDGSBASE, BMI1/2, AVX2, AVX-512, HLE, ERMS, and RTM. Aside from AVX-512, these features are all new instructions available for use in ring 3 with no additional hypervisor changes needed. Reviewed by: neel	2014-05-27 19:04:38 +00:00
Neel Natu	65ffa035a7	Add segment protection and limits violation checks in vie_calculate_gla() for 32-bit x86 guests. Tested using ins/outs executed in a FreeBSD/i386 guest.	2014-05-27 04:26:22 +00:00
Neel Natu	ae0780bbf1	Remove restriction on insb/insw/insl emulation. These instructions are properly emulated.	2014-05-25 02:05:23 +00:00
Neel Natu	5382c19d81	Do the linear address calculation for the ins/outs emulation using a new API function 'vie_calculate_gla()'. While the current implementation is simplistic it forms the basis of doing segmentation checks if the guest is in 32-bit protected mode.	2014-05-25 00:57:24 +00:00
Neel Natu	da11f4aa1d	Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out of the guest linear address space. These APIs in turn use a new ioctl 'VM_GLA2GPA' to convert the guest linear address to guest physical. Use the new copyin/copyout APIs when emulating ins/outs instruction in bhyve(8).	2014-05-24 23:12:30 +00:00
Neel Natu	e813a87350	Consolidate all the information needed by the guest page table walker into 'struct vm_guest_paging'. Check for canonical addressing in vmm_gla2gpa() and inject a protection fault into the guest if a violation is detected. If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to point to the root of the page tables.	2014-05-24 20:26:57 +00:00
Neel Natu	37a723a5b3	When injecting a page fault into the guest also update the guest's %cr2 to indicate the faulting linear address. If the guest PML4 entry has the PG_PS bit set then inject a page fault into the guest with the PGEX_RSV bit set in the error_code. Get rid of redundant checks for the PG_RW violations when walking the page tables.	2014-05-24 19:13:25 +00:00
Neel Natu	a7424861fb	Check for alignment check violation when processing in/out string instructions.	2014-05-23 19:59:14 +00:00
Neel Natu	d17b5104a9	Add emulation of the "outsb" instruction. NetBSD guests use this to write to the UART FIFO. The emulation is constrained in a number of ways: 64-bit only, doesn't check for all exception conditions, limited to i/o ports emulated in userspace. Some of these constraints will be relaxed in followup commits. Requested by: grehan Reviewed by: tychon (partially and a much earlier version)	2014-05-23 05:15:17 +00:00
Neel Natu	c5e423dd2e	A CentOS 6.4 guest will write 0xff to the 8259 mask register before beginning the proper ICWx initialization sequence. It assumes, probably correctly, that the boot firmware has done the 8259 initialization. Since grub-bhyve does not initialize the 8259 this write to the mask register takes a code path in which 'error' remains uninitialized (ready=0,icw_num=0). Fix this by initializing 'error' at the start of the function.	2014-05-23 05:04:50 +00:00
John Baldwin	0eb7ae8d0a	Don't permit users to request a subset of the AVX512 or MPX xsave masks. These masks are documented in the Intel Architecture Instruction Set Extensions Programming Reference (March 2014). Reviewed by: kib MFC after: 1 month	2014-05-22 18:22:02 +00:00
Neel Natu	ba6f5e23cc	Allow vmx_getdesc() and vmx_setdesc() to be called for a vcpu that is in the VCPU_RUNNING state. This will let the VMX exit handler inspect the vcpu's segment descriptors without having to exit the critical section.	2014-05-22 17:22:37 +00:00
Justin Hibbits	81e3caaf77	imgact_binmisc builds for all supported architectures, so enable it for all. Any bugs in execution will be dealt with as they crop up. MFC after: 3 weeks Relnotes: Yes	2014-05-22 05:04:40 +00:00
Neel Natu	fd949af642	Inject page fault into the guest if the page table walker detects an invalid translation for the guest linear address.	2014-05-22 03:14:54 +00:00
Neel Natu	f888763dd8	Add PG_RW check when translating a guest linear to guest physical address. Set the accessed and dirty bits in the page table entry. If it fails then restart the page table walk from the beginning. This might happen if another vcpu modifies the page tables simultaneously. Reviewed by: alc, kib	2014-05-20 20:30:28 +00:00
John Baldwin	674b6d6e0d	Add support for decoding the AMD SVM instructions.	2014-05-19 18:07:37 +00:00
Neel Natu	e4c8a13d61	Add PG_U (user/supervisor) checks when translating a guest linear address to a guest physical address. PG_PS (page size) field is valid only in a PDE or a PDPTE so it is now checked only in non-terminal paging entries. Ignore the upper 32-bits of the CR3 for PAE paging.	2014-05-19 03:50:07 +00:00
Peter Grehan	897bb47e7b	Make the vmx asm code dtrace-fbt-friendly by - inserting frame enter/leave sequences - restructuring the vmx_enter_guest routine so that it subsumes the vm_exit_guest block, which was the #vmexit RIP and not a callable routine. Reviewed by: neel MFC after: 3 weeks	2014-05-18 03:50:17 +00:00
John Baldwin	8b3949c344	Add support for decoding rdrand and rdseed.	2014-05-17 21:10:03 +00:00
John Baldwin	355d8a2f91	Add definitions for more structured extended features as well as XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions). Obtained from: Intel's Instruction Set Extensions Programming Reference (March 2014)	2014-05-16 17:45:09 +00:00
John Baldwin	b3e9732a76	Implement a PCI interrupt router to route PCI legacy INTx interrupts to the legacy 8259A PICs. - Implement an ICH-compatible PCI interrupt router on the lpc device with 8 steerable pins configured via config space access to byte-wide registers at 0x60-63 and 0x68-6b. - For each configured PCI INTx interrupt, route it to both an I/O APIC pin and a PCI interrupt router pin. When a PCI INTx interrupt is asserted, ensure that both pins are asserted. - Provide an initial routing of PCI interrupt router (PIRQ) pins to 8259A pins (ISA IRQs) and initialize the interrupt line config register for the corresponding PCI function with the ISA IRQ as this matches existing hardware. - Add a global _PIC method for OSPM to select the desired interrupt routing configuration. - Update the _PRT methods for PCI bridges to provide both APIC and legacy PRT tables and return the appropriate table based on the configured routing configuration. Note that if the lpc device is not configured, no routing information is provided. - When the lpc device is enabled, provide ACPI PCI link devices corresponding to each PIRQ pin. - Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A pins via the ELCR. - Mark the power management SCI as level triggered. - Don't hardcode the number of elements in Packages in the source for the DSDT. iasl(8) will fill in the actual number of elements, and this makes it simpler to generate a Package with a variable number of elements. Reviewed by: tycho	2014-05-15 14:16:55 +00:00
Neel Natu	f3db4c53e6	Increase the TSS limit by one byte. The processor requires an additional byte with all bits set to 1 beyond the I/O permission bitmap. Prior to this change accessing I/O ports [0xFFF8-0xFFFF] would trigger a #GP fault even though the I/O bitmap allowed access to those ports. For more details see section "I/O Permission Bit Map" in the Intel SDM, Vol 1. Reviewed by: kib	2014-05-14 22:24:09 +00:00
Neel Natu	055fc2cb5e	Virtual machine halt detection is turned on by default. Allow it to be disabled via the tunable 'hw.vmm.halt_detection'.	2014-05-05 16:19:24 +00:00
Nathan Whitehorn	a9d0ed68b3	Disable ACPI and P4TCC throttling by default, following discussion on freebsd-current. These CPU speed control techniques are usually unhelpful at best. For now, continue building the relevant code into GENERIC so that it can trivially be re-enabled at runtime if anyone wants it. MFC after: 1 month	2014-05-04 16:38:21 +00:00
Kenneth D. Merry	991554f2c4	Bring in the mpr(4) driver for LSI's MPT3 12Gb SAS controllers. This is derived from the mps(4) driver, but it supports only the 12Gb IT and IR hardware including the SAS 3004, SAS 3008 and SAS 3108. Some notes about this driver: o The 12Gb hardware can do "FastPath" I/O, and that capability is included in this driver. o WarpDrive functionality has been removed, since it isn't supported in the 12Gb driver interface. o The Scatter/Gather list handling code is significantly different between the 6Gb and 12Gb hardware. The 12Gb boards support IEEE Scatter/Gather lists. Thanks to LSI for developing and testing this driver for FreeBSD. share/man/man4/mpr.4: mpr(4) man page. sys/dev/mpr/*: mpr(4) driver files. sys/modules/Makefile, sys/modules/mpr/Makefile: Add a module Makefile for the mpr(4) driver. sys/conf/files: Add the mpr(4) driver. sys/amd64/conf/GENERIC, sys/i386/conf/GENERIC, sys/mips/conf/OCTEON1, sys/sparc64/conf/GENERIC: Add the mpr(4) driver to all config files that currently have the mps(4) driver. sys/ia64/conf/GENERIC: Add the mps(4) and mpr(4) drivers to the ia64 GENERIC config file. sys/i386/conf/XEN: Exclude the mpr module from building here. Submitted by: Steve McConnell <Stephen.McConnell@lsi.com> MFC after: 3 days Tested by: Chris Reeves <chrisr@spectralogic.com> Sponsored by: LSI, Spectra Logic Relnotes: LSI 12Gb SAS driver mpr(4) added	2014-05-02 20:25:09 +00:00
Eitan Adler	804e017089	lindev(4): finish the partial commit in r265212 lindev(4) was only used to provide /dev/full which is now a standard feature of FreeBSD. /dev/full was never linux-specific and provides a generally useful feature. Document this in UPDATING and bump __FreeBSD_version. This will be documented in the PH shortly. Reported by: jkim	2014-05-02 07:14:22 +00:00
Neel Natu	e50ce2aa06	Add logic in the HLT exit handler to detect if the guest has put all vcpus to sleep permanently by executing a HLT with interrupts disabled. When this condition is detected the guest will be suspended with a reason of VM_SUSPEND_HALT and the bhyve(8) process will exit. Tested by executing "halt" inside a RHEL7-beta guest. Discussed with: grehan@ Reviewed by: jhb@, tychon@	2014-05-02 00:33:56 +00:00
Neel Natu	2cb97c9dd6	Ignore writes to microcode update MSR. This MSR is accessed by RHEL7 guest. Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.	2014-04-30 02:08:27 +00:00
Neel Natu	c6a0cc2e21	Some Linux guests will implement a 'halt' by disabling the APIC and executing the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and converted into the SPINDOWN_CPU exitcode. The bhyve(8) process would exit the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET). This functionality was broken in r263780 in a way that made it impossible to kill the bhyve(8) process because it would loop forever in vm_handle_suspend(). Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from a Linux guest will appear to be hung but this is consistent with the behavior on bare metal. The guest can be rebooted by using the bhyvectl options '--force-reset' or '--force-poweroff'. Reviewed by: grehan@	2014-04-29 18:42:56 +00:00
Neel Natu	f0fdcfe247	Allow a virtual machine to be forcibly reset or powered off. This is done by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF. The disposition of VM_SUSPEND is also made available to the exit handler via the 'u.suspended' member of 'struct vm_exit'. This capability is exposed via the '--force-reset' and '--force-poweroff' arguments to /usr/sbin/bhyvectl. Discussed with: grehan@	2014-04-28 22:06:40 +00:00
Ed Maste	b6a0a32b58	Report boot method (BIOS/UEFI) via sysctl machdep.bootmethod Sponsored by: The FreeBSD Foundation	2014-04-27 15:14:59 +00:00

6718 Commits