freebsd-skq

Author	SHA1	Message	Date
bryanv	4b08d4e97f	Add VirtIO console to the x86 NOTES and files Requested by: jhb	2014-11-03 22:37:10 +00:00
jhb	fdfced8ce8	MFamd64: Add support for extended FPU states on i386. This includes support for AVX on i386. - Similar to amd64, move the FPU save area out of the PCB and instead store saved FPU state in a variable-sized buffer after the PCB on the stack. - To support the variable PCB location, alter the locore code to only use the bottom-most page of proc0stack for init386(). init386() returns the correct stack pointer to locore which adjusts the stack for thread0 before calling mi_startup(). - Don't bother setting cr3 in thread0's pcb in locore before calling init386(). It wasn't used (init386() overwrote it at the end) and it doesn't work with the variable-sized FPU save area. - Remove the new-bus attachment from npx. This was only ever useful for external co-processors using IRQ13, but those have not been supported for several years. npxinit() is now called much earlier during boot (init386()) similar to amd64. - Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE. - npxsave() is now only called from context switch contexts so it can use XSAVEOPT. Differential Revision: https://reviews.freebsd.org/D1058 Reviewed by: kib Tested on: FreeBSD/i386 VM under bhyve on Intel i5-2520	2014-11-02 22:58:30 +00:00
jhb	d47eb7d2d4	Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel	2014-10-28 19:17:44 +00:00
kib	ad7bf17db7	Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:28:20 +00:00
kib	29a659ef8e	Add fueword(9) and casueword(9) functions. They are like fuword(9) and casuword(9), but do not mix value read and indication of fault. I know (or remember) enough assembly to handle x86 and powerpc. For arm, mips and sparc64, implement fueword() and casueword() as wrappers around fuword() and casuword(), which means that the functions cannot distinguish between -1 and fault. On architectures where fueword() and casueword() are native, implement fuword() and casuword() using fueword() and casuword(), to reduce assembly code duplication. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks (ia64 needs treating)	2014-10-28 15:22:13 +00:00
araujo	8a51fd3c94	Reported by: Coverity CID: 1249760 Reviewed by: neel Approved by: neel Sponsored by: QNAP Systems Inc.	2014-10-28 07:19:02 +00:00
grehan	4789422fd1	Remove bhyve SVM feature printf's now that they are available in the general CPU feature detection code. Reviewed by: neel	2014-10-27 22:20:51 +00:00
neel	4ee18dab17	Change the type of the first argument to the I/O emulation handlers to 'struct vm '. Previously it used to be a 'void ' but there is no reason to hide the actual type from the handler. Discussed with: tychon MFC after: 1 week	2014-10-26 19:03:06 +00:00
alc	dc84db712f	By the time that pmap_init() runs, vm_phys_segs[] has been initialized. Obtaining the end of memory address from vm_phys_segs[] is a little easier than obtaining it from phys_avail[]. Discussed with: Svatopluk Kraus	2014-10-26 17:56:47 +00:00
neel	eb58530f9b	Move the ACPI PM timer emulation into vmm.ko. This reduces variability during timer calibration by keeping the emulation "close" to the guest. Additionally having all timer emulations in the kernel will ease the transition to a per-VM clock source (as opposed to using the host's uptime keep track of time). Discussed with: grehan	2014-10-26 04:44:28 +00:00
neel	a76d8ee4e9	Don't pass the 'error' return from an I/O port handler directly to vm_run(). Most I/O port handlers return -1 to signal an error. If this value is returned without modification to vm_run() then it leads to incorrect behavior because '-1' is interpreted as ERESTART at the system call level. Fix this by always returning EIO to signal an error from an I/O port handler. MFC after: 1 week	2014-10-26 03:03:41 +00:00
jhb	fcc57fff95	Add COMPAT_FREEBSD9 and COMPAT_FREEBSD10 options to wrap code that provides compatability for FreeBSD 9.x and 10.x binaries. Enable these options in kernel configs that enable other COMPAT_FREEBSD<n> options.	2014-10-24 19:58:24 +00:00
royger	0e3d9b8126	amd64: make uiomove_fromphys functional for pages not mapped by the DMAP Place the code introduced in r268660 into a separate function that can be called from uiomove_fromphys. Instead of pre-allocating two KVA pages use vmem_alloc to allocate them on demand when needed. This prevents blocking if a page fault is taken while physical addresses from outside the DMAP are used, since the lock is now removed. Also introduce a safety catch in PHYS_TO_DMAP and DMAP_TO_PHYS. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D947 amd64/amd64/pmap.c: - Factor out the code to deal with non DMAP addresses from pmap_copy_pages and place it in pmap_map_io_transient. - Change the code to use vmem_alloc instead of a set of pre-allocated pages. - Use pmap_qenter and don't pin the thread if there can be page faults. amd64/amd64/uio_machdep.c: - Use pmap_map_io_transient in order to correctly deal with physical addresses not covered by the DMAP. amd64/include/pmap.h: - Add the prototypes for the new functions. amd64/include/vmparam.h: - Add safety catches to make sure PHYS_TO_DMAP and DMAP_TO_PHYS are only used with addresses covered by the DMAP.	2014-10-24 09:48:58 +00:00
royger	919b7d8b7c	xen: implement the privcmd user-space device This device is only attached to priviledged domains, and allows the toolstack to interact with Xen. The two functions of the privcmd interface is to allow the execution of hypercalls from user-space, and the mapping of foreign domain memory. Sponsored by: Citrix Systems R&D i386/include/xen/hypercall.h: amd64/include/xen/hypercall.h: - Introduce a function to make generic hypercalls into Xen. xen/interface/xen.h: xen/interface/memory.h: - Import the new hypercall XENMEM_add_to_physmap_range used by auto-translated guests to map memory from foreign domains. dev/xen/privcmd/privcmd.c: - This device has the following functions: - Allow user-space applications to make hypercalls into Xen. - Allow user-space applications to map memory from foreign domains, this is accomplished using the newly introduced hypercall (XENMEM_add_to_physmap_range). xen/privcmd.h: - Public ioctl interface for the privcmd device. x86/xen/hvm.c: - Remove declaration of hypercall_page, now it's declared in hypercall.h. conf/files: - Add the privcmd device to the build process.	2014-10-22 17:07:20 +00:00
hselasky	49c137f7be	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
neel	4797036028	Merge projects/bhyve_svm into HEAD. After this change bhyve supports AMD processors with the SVM/AMD-V hardware extensions. More details available here: https://lists.freebsd.org/pipermail/freebsd-virtualization/2014-October/002905.html Submitted by: Anish Gupta (akgupt3@gmail.com) Tested by: Benjamin Perrault (ben.perrault@gmail.com) Tested by: Willem Jan Withagen (wjw@digiware.nl)	2014-10-21 07:10:43 +00:00
neel	fb464d4f98	Fix a race in pmap_emulate_accessed_dirty() that could trigger a EPT misconfiguration VM-exit. An EPT misconfiguration is triggered when the processor encounters a PTE that is writable but not readable (WR=10). On processors that require A/D bit emulation PG_M and PG_A map to EPT_PG_WRITE and EPT_PG_READ respectively. If the PTE is updated as in the following code snippet: pte \|= PG_M; pte \|= PG_A; then it is possible for another processor to observe the PTE after the PG_M (aka EPT_PG_WRITE) bit is set but before PG_A (aka EPT_PG_READ) bit is set. This will trigger an EPT misconfiguration VM-exit on the other processor. Reported by: rodrigc Reviewed by: grehan MFC after: 3 days	2014-10-21 01:06:58 +00:00
neel	be8b3ca439	Merge from projects/bhyve_svm all the changes outside vmm.ko or bhyve utilities: Add support for AMD's nested page tables in pmap.c: - Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit) for a pmap of type PT_RVI. - Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of type PT_EPT or PT_RVI. Add CPU_SET_ATOMIC_ACQ(num, cpuset): This is used when activating a vcpu in the nested pmap. Using the 'acquire' variant guarantees that the load of the 'pm_eptgen' will happen only after the vcpu is activated in 'pm_active'. Add defines for various AMD-specific MSRs. Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-10-20 18:09:33 +00:00
neel	dd2febd6f2	IFC @r273214	2014-10-20 02:57:30 +00:00
neel	53c23ba9a2	IFC @r273206	2014-10-19 23:05:18 +00:00
neel	13e9198693	Don't advertise the "OS visible workarounds" feature in cpuid.80000001H:ECX. bhyve doesn't emulate the MSRs needed to support this feature at this time. Don't expose any model-specific RAS and performance monitoring features in cpuid leaf 80000007H. Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and BIOS signature and P-state related MSRs. This eliminates all the unimplemented MSRs accessed by Linux/x86_64 kernels 2.6.32, 3.10.0 and 3.17.0.	2014-10-19 21:38:58 +00:00
neel	d5ca76422a	Don't advertise support for the NodeID MSR since bhyve doesn't emulate it.	2014-10-18 05:39:32 +00:00
imp	ad9f7e3038	Fix build to not bogusly always rebuild vmm.ko. Rename vmx_assym.s to vmx_assym.h to reflect that file's actual use and update vmx_support.S's include to match. Add vmx_assym.h to the SRCS to that it gets properly added to the dependency list. Add vmx_support.S to SRCS as well, so it gets built and needs fewer special-case goo. Remove now-redundant special-case goo. Finally, vmx_genassym.o doesn't need to depend on a hand expanded ${_ILINKS} explicitly, that's all taken care of by beforedepend. With these items fixed, we no longer build vmm.ko every single time through the modules on a KERNFAST build. Sponsored by: Netflix	2014-10-17 13:20:49 +00:00
neel	170f49cd34	Don't advertise the Instruction Based Sampling feature because it requires emulating a large number of MSRs. Ignore writes to a couple more AMD-specific MSRs and return 0 on read. This further reduces the unimplemented MSRs accessed by a Linux guest on boot.	2014-10-17 06:23:04 +00:00
neel	b194fc5a2b	Hide extended PerfCtr MSRs on AMD processors by clearing bits 23, 24 and 28 in CPUID.80000001H:ECX. Handle accesses to PerfCtrX and PerfEvtSelX MSRs by ignoring writes and returning 0 on reads. This further reduces the number of unimplemented MSRs hit by a Linux guest during boot.	2014-10-17 03:04:38 +00:00
neel	8bc5fc92e5	Use the correct fault type (VM_PROT_EXECUTE) for an instruction fetch.	2014-10-16 18:16:31 +00:00
neel	09fbd3f7d9	Fix topology enumeration issues exposed by AMD Bulldozer Family 15h processor. Initialize CPUID.80000008H:ECX[7:0] with the number of logical processors in the package. This fixes a panic during early boot in NetBSD 7.0 BETA. Clear the Topology Extension feature bit from CPUID.80000001H:ECX since we don't emulate leaves 0x8000001D and 0x8000001E. This fixes a divide by zero panic in early boot in Centos 6.4. Tested on an "AMD Opteron 6320" courtesy of Ben Perrault. Reviewed by: grehan	2014-10-16 18:13:10 +00:00
davide	e88bd26b3f	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
neel	c4ba9b78fd	Actually hide the SVM capability by clearing CPUID.80000001H:ECX[bit 3] after it has been initialized by cpuid_count(). Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-10-15 04:29:03 +00:00
neel	268d74c73a	Emulate "POP r/m". This is needed to boot OpenBSD/i386 MP kernel in bhyve. Reported by: grehan MFC after: 1 week	2014-10-14 21:02:33 +00:00
neel	651ecc1e21	Remove extraneous comments.	2014-10-11 04:57:17 +00:00
neel	0d5ab4a163	Get rid of unused headers. Restrict scope of malloc types M_SVM and M_SVM_VLAPIC by making them static. Replace ERR() with KASSERT(). style(9) cleanup.	2014-10-11 04:41:21 +00:00
neel	bbecfc3c42	Get rid of unused forward declaration of 'struct svm_softc'.	2014-10-11 03:21:33 +00:00
neel	3e5970c8ec	style(9) fixes. Get rid of unused headers.	2014-10-11 03:19:26 +00:00
neel	c09e1bd7e2	Use a consistent style for messages emitted when the module is loaded.	2014-10-11 03:09:34 +00:00
neel	bfa9f55e70	IFC @r272887	2014-10-10 23:52:56 +00:00
neel	ae1ff84ece	Fix bhyvectl so it works correctly on AMD/SVM hosts. Also, add command line options to display some key VMCB fields. The set of valid options that can be passed to bhyvectl now depends on the processor type. AMD-specific options are identified by a "--vmcb" or "--avic" in the option name. Intel-specific options are identified by a "--vmcs" in the option name. Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-10-10 21:48:59 +00:00
neel	f443215307	Support Intel-specific MSRs that are accessed when booting up a linux in bhyve: - MSR_PLATFORM_INFO - MSR_TURBO_RATIO_LIMITx - MSR_RAPL_POWER_UNIT Reviewed by: grehan MFC after: 1 week	2014-10-09 19:13:33 +00:00
markj	0ebf86e1b1	Pass up the error status of minidumpsys() to its callers. PR: 193761 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Sponsored by: EMC / Isilon Storage Division	2014-10-08 20:25:21 +00:00
kib	30a51a18f4	Add an argument to the x86 pmap_invalidate_cache_range() to request forced invalidation of the cache range regardless of the presence of self-snoop feature. Some recent Intel GPUs in some modes are not coherent, and dirty lines in CPU cache must be flushed before the pages are transferred to GPU domain. Reviewed by: alc (previous version) Tested by: pho (amd64) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-10-08 16:48:03 +00:00
neel	ce319c48f4	Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'. The hypervisor hides the MONITOR/MWAIT capability by unconditionally setting CPUID.01H:ECX[3] to 0 so the guest should not expect these instructions to be present anyways. Discussed with: grehan	2014-10-06 20:48:01 +00:00
neel	16cbb8793a	IFC @r272481	2014-10-05 01:28:21 +00:00
neel	f6b1385c0e	Get rid of code that dealt with the hardware not being able to save/restore the PAT MSR on guest exit/entry. This workaround was done for a beta release of VMware Fusion 5 but is no longer needed in later versions. All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT in the VM exit and entry controls. Discussed with: grehan	2014-10-02 05:32:29 +00:00
royger	c5a5f5947f	msi: add Xen MSI implementation This patch adds support for MSI interrupts when running on Xen. Apart from adding the Xen related code needed in order to register MSI interrupts this patch also makes the msi_init function a hook in init_ops, so different MSI implementations can have different initialization functions. Sponsored by: Citrix Systems R&D xen/interface/physdev.h: - Add the MAP_PIRQ_TYPE_MULTI_MSI to map multi-vector MSI to the Xen public interface. x86/include/init.h: - Add a hook for setting custom msi_init methods. amd64/amd64/machdep.c: i386/i386/machdep.c: - Set the default msi_init hook to point to the native MSI initialization method. x86/xen/pv.c: - Set the Xen MSI init hook when running as a Xen guest. x86/x86/local_apic.c: - Call the msi_init hook instead of directly calling msi_init. xen/xen_intr.h: x86/xen/xen_intr.c: - Introduce support for registering/releasing MSI interrupts with Xen. - The MSI interrupts will use the same PIC as the IO APIC interrupts. xen/xen_msi.h: x86/xen/xen_msi.c: - Introduce a Xen MSI implementation. x86/xen/xen_nexus.c: - Overwrite the default MSI hooks in the Xen Nexus to use the Xen MSI implementation. x86/xen/xen_pci.c: - Introduce a Xen specific PCI bus that inherits from the ACPI PCI bus and overwrites the native MSI methods. - This is needed because when running under Xen the MSI messages used to configure MSI interrupts on PCI devices are written by Xen itself. dev/acpica/acpi_pci.c: - Lower the quality of the ACPI PCI bus so the newly introduced Xen PCI bus can take over when needed. conf/files.i386: conf/files.amd64: - Add the newly created files to the build process.	2014-09-30 16:46:45 +00:00
neel	09ab9e2937	IFC @r272185	2014-09-27 22:15:50 +00:00
neel	11f176f814	Simplify register state save and restore across a VMRUN: - Host registers are now stored on the stack instead of a per-cpu host context. - Host %FS and %GS selectors are not saved and restored across VMRUN. - Restoring the %FS/%GS selectors was futile anyways since that only updates the low 32 bits of base address in the hidden descriptor state. - GS.base is properly updated via the MSR_GSBASE on return from svm_launch(). - FS.base is not used while inside the kernel so it can be safely ignored. - Add function prologue/epilogue so svm_launch() can be traced with Dtrace's FBT entry/exit probes. They also serve to save/restore the host %rbp across VMRUN. Reviewed by: grehan Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-27 02:04:58 +00:00
grehan	8be950fc2c	Allow the PIC's IMR register to be read before ICW initialisation. As of git submit e179f6914152eca9, the Linux kernel does a simple probe of the PIC by writing a pattern to the IMR and then reading it back, prior to the init sequence of ICW words. The bhyve PIC emulation wasn't allowing the IMR to be read until the ICW sequence was complete. This limitation isn't required so relax the test. With this change, Linux kernels 3.15-rc2 and later won't hang on boot when calibrating the local APIC. Reviewed by: tychon MFC after: 3 days	2014-09-27 01:15:24 +00:00
royger	494dc32ba6	ddb: allow specifying the exact address of the symtab and strtab When the FreeBSD kernel is loaded from Xen the symtab and strtab are not loaded the same way as the native boot loader. This patch adds three new global variables to ddb that can be used to specify the exact position and size of those tables, so they can be directly used as parameters to db_add_symbol_table. A new helper is introduced, so callers that used to set ksym_start and ksym_end can use this helper to set the new variables. It also adds support for loading them from the Xen PVH port, that was previously missing those tables. Sponsored by: Citrix Systems R&D Reviewed by: kib ddb/db_main.c: - Add three new global variables: ksymtab, kstrtab, ksymtab_size that can be used to specify the position and size of the symtab and strtab. - Use those new variables in db_init in order to call db_add_symbol_table. - Move the logic in db_init to db_fetch_symtab in order to set ksymtab, kstrtab, ksymtab_size from ksym_start and ksym_end. ddb/ddb.h: - Add prototype for db_fetch_ksymtab. - Declate the extern variables ksymtab, kstrtab and ksymtab_size. x86/xen/pv.c: - Add support for finding the symtab and strtab when booted as a Xen PVH guest. Since Xen loads the symtab and strtab as NetBSD expects to find them we have to adapt and use the same method. amd64/amd64/machdep.c: arm/arm/machdep.c: i386/i386/machdep.c: mips/mips/machdep.c: pc98/pc98/machdep.c: powerpc/aim/machdep.c: powerpc/booke/machdep.c: sparc64/sparc64/machdep.c: - Use the newly introduced db_fetch_ksymtab in order to set ksymtab, kstrtab and ksymtab_size.	2014-09-25 08:28:10 +00:00
bz	7df5be5e6e	As per [1] Intel only supports this driver on 64bit platforms. For now restrict it to amd64. Other architectures might be re-added later once tested. Remove the drivers from the global NOTES and files files and move them to the amd64 specifics. Remove the drivers from the i386 modules build and only leave the amd64 version. Rather than depending on "inet" depend on "pci" and make sure that ixl(4) and ixlv(4) can be compiled independently [2]. This also allows the drivers to build properly on IPv4-only or IPv6-only kernels. PR: 193824 [2] Reviewed by: eric.joyner intel.com MFC after: 3 days References: [1] http://lists.freebsd.org/pipermail/svn-src-all/2014-August/090470.html	2014-09-23 08:33:03 +00:00
neel	35aaa3ac11	Allow more VMCB fields to be cached: - CR2 - CR0, CR3, CR4 and EFER - GDT/IDT base/limit fields - CS/DS/ES/SS selector/base/limit/attrib fields The caching can be further restricted via the tunable 'hw.vmm.svm.vmcb_clean'. Restructure the code such that the fields above are only modified in a single place. This makes it easy to invalidate the VMCB cache when any of these fields is modified.	2014-09-21 23:42:54 +00:00
neel	5834276187	Get rid of unused stat VMM_HLT_IGNORED.	2014-09-21 18:52:56 +00:00
kib	f2d3c32e1b	Update and clarify comments. Remove the useless counter for impossible, but seen in wild situation (on buggy hypervisors). In collaboration with: bde MFC after: 1 week	2014-09-21 09:06:50 +00:00
neel	6a5c27c9b1	The memory type bits (PAT, PCD, PWT) associated with a nested PTE or PDE are identical to the traditional x86 page tables.	2014-09-21 06:36:17 +00:00
neel	ef294abb97	IFC r271888. Restructure MSR emulation so it is all done in processor-specific code.	2014-09-20 21:46:31 +00:00
neel	77d107b3b9	IFC @r271887	2014-09-20 06:27:37 +00:00
neel	0350d078cb	Add some more KTR events to help debugging.	2014-09-20 05:13:03 +00:00
neel	f6a11e15ea	MSR_KGSBASE is no longer saved and restored from the guest MSR save area. This behavior was changed in r271888 so update the comment block to reflect this. MSR_KGSBASE is accessible from the guest without triggering a VM-exit. The permission bitmap for MSR_KGSBASE is modified by vmx_msr_guest_init() so get rid of redundant code in vmx_vminit().	2014-09-20 05:12:34 +00:00
neel	46721cc2c7	Restructure the MSR handling so it is entirely handled by processor-specific code. There are only a handful of MSRs common between the two so there isn't too much duplicate functionality. The VT-x code has the following types of MSRs: - MSRs that are unconditionally saved/restored on every guest/host context switch (e.g., MSR_GSBASE). - MSRs that are restored to guest values on entry to vmx_run() and saved before returning. This is an optimization for MSRs that are not used in host kernel context (e.g., MSR_KGSBASE). - MSRs that are emulated and every access by the guest causes a trap into the hypervisor (e.g., MSR_IA32_MISC_ENABLE). Reviewed by: grehan	2014-09-20 02:35:21 +00:00
kib	433c60fe45	- Use NULL instead of 0 for fpcurthread. - Note the quirk with the interrupt enabled state of the dna handler. - Use just panic() instead of printf() and panic(). Print tid instead of pid, the fpu state is per-thread. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-09-18 09:13:20 +00:00
bz	a7854053f1	Re-gen after r271743 implementing most of timer_{create,settime,gettime,getoverrun,delete}. MFC after: 3 days Sponsored by: DARPA, AFRL	2014-09-18 08:40:00 +00:00
bz	122003e2ff	Implement most of timer_{create,settime,gettime,getoverrun,delete} for amd64/linux32. Fix the entirely bogus (untested) version from r161310 for i386/linux using the same shared code in compat/linux. It is unclear to me if we could support more clock mappings but the current set allows me to successfully run commercial 32bit linux software under linuxolator on amd64. Reviewed by: jhb Differential Revision: D784 MFC after: 3 days Sponsored by: DARPA, AFRL	2014-09-18 08:36:45 +00:00
kib	cd6a1fef18	Presence of any VM_PROT bits in the permission argument on x86 implies that the entry is readable and valid. Reported by: markj Submitted by: alc Tested by: pho (previous version), markj MFC after: 3 days	2014-09-17 18:49:57 +00:00
neel	c9b7ad126a	IFC @r271694	2014-09-17 18:46:51 +00:00
neel	eefb10a184	Rework vNMI injection. Keep track of NMI blocking by enabling the IRET intercept on a successful vNMI injection. The NMI blocking condition is cleared when the handler executes an IRET and traps back into the hypervisor. Don't inject NMI if the processor is in an interrupt shadow to preserve the atomic nature of "STI;HLT". Take advantage of this and artificially set the interrupt shadow to prevent NMI injection when restarting the "iret". Reviewed by: Anish Gupta (akgupt3@gmail.com), grehan	2014-09-17 00:30:25 +00:00
neel	16169f746d	Minor cleanup. Get rid of unused 'svm_feature' from the softc. Get rid of the redundant 'vcpu_cnt' checks in svm.c. There is a similar check in vmm.c against 'vm->active_cpus' before the AMD-specific code is called. Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-09-16 04:01:55 +00:00
neel	97bffd44f6	Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes care of injecting this vector when the guest is able to receive it. Legacy PIC interrupts are still delivered via the event injection mechanism. This is because the vector injected by the PIC must reflect the state of its pins at the time the CPU is ready to accept the interrupt. Accesses to the TPR via %CR8 are handled entirely in hardware. This requires that the emulated TPR must be synced to V_TPR after a #VMEXIT. The guest can also modify the TPR via the memory mapped APIC. This requires that the V_TPR must be synced with the emulated TPR before a VMRUN. Reviewed by: Anish Gupta (akgupt3@gmail.com)	2014-09-16 03:31:40 +00:00
neel	cbc92dc709	Set the 'vmexit->inst_length' field properly depending on the type of the VM-exit and ultimately on whether nRIP is valid. This allows us to update the %rip after the emulation is finished so any exceptions triggered during the emulation will point to the right instruction. Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability is available. The effective segment field in EXITINFO1 is not valid without this capability. Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging. Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2 fields in bhyve(8). Reviewed by: Anish Gupta (akgupt3@gmail.com) Reviewed by: grehan	2014-09-14 04:39:04 +00:00
neel	3e1af2f123	Bug fixes. - Don't enable the HLT intercept by default. It will be enabled by bhyve(8) if required. Prior to this change HLT exiting was always enabled making the "-H" option to bhyve(8) meaningless. - Recognize a VM exit triggered by a non-maskable interrupt. Prior to this change the exit would be punted to userspace and the virtual machine would terminate.	2014-09-13 23:48:43 +00:00
neel	a38d07e455	style(9): insert an empty line if the function has no local variables Pointed out by: grehan	2014-09-13 22:45:04 +00:00
neel	32e0378b35	AMD processors that have the SVM decode assist capability will store the instruction bytes in the VMCB on a nested page fault. This is useful because it saves having to walk the guest page tables to fetch the instruction. vie_init() now takes two additional parameters 'inst_bytes' and 'inst_len' that map directly to 'vie->inst[]' and 'vie->num_valid'. The instruction emulation handler skips calling 'vmm_fetch_instruction()' if 'vie->num_valid' is non-zero. The use of this capability can be turned off by setting the sysctl/tunable 'hw.vmm.svm.disable_npf_assist' to '1'. Reviewed by: Anish Gupta (akgupt3@gmail.com) Discussed with: grehan	2014-09-13 22:16:40 +00:00
jhb	9741565697	Add a sysctl to export the EFI memory map along with a handler in the sysctl(8) binary to format it. Reviewed by: emaste MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D771	2014-09-13 03:10:02 +00:00
neel	b2ca87a5d0	Optimize the common case of injecting an interrupt into a vcpu after a HLT by explicitly moving it out of the interrupt shadow. The hypervisor is done "executing" the HLT and by definition this moves the vcpu out of the 1-instruction interrupt shadow. Prior to this change the interrupt would be held pending because the VMCS guest-interruptibility-state would indicate that "blocking by STI" was in effect. This resulted in an unnecessary round trip into the guest before the pending interrupt could be injected. Reviewed by: grehan	2014-09-12 06:15:20 +00:00
neel	45bb1086d6	style(9): indent the switch, don't indent the case, indent case body one tab.	2014-09-11 06:17:56 +00:00
neel	e108397c4a	Repurpose the V_IRQ interrupt injection to implement VMX-style interrupt window exiting. This simply involves setting V_IRQ and enabling the VINTR intercept. This instructs the CPU to trap back into the hypervisor as soon as an interrupt can be injected into the guest. The pending interrupt is then injected via the traditional event injection mechanism. Rework vcpu interrupt injection so that Linux guests now idle with host cpu utilization close to 0%. Reviewed by: Anish Gupta (earlier version) Discussed with: grehan	2014-09-11 02:37:02 +00:00
jhb	fa53fae9e4	MFamd64: Use initializecpu() to set various model-specific registers on AP startup and AP resume (it was already used for BSP startup and BSP resume). - Split code to do one-time probing of cache properties out of initializecpu() and into initializecpucache(). This is called once on the BSP during boot. - Move enable_sse() into initializecpu(). - Call initializecpu() for AP startup instead of enable_sse() and manually frobbing MSR_EFER to enable PG_NX. - Call initializecpu() when an AP resumes. In theory this will now properly re-enable PG_NX in MSR_EFER when resuming a PAE kernel on APs.	2014-09-10 21:37:47 +00:00
neel	907c0aec17	Allow intercepts and irq fields to be cached by the VMCB. Provide APIs svm_enable_intercept()/svm_disable_intercept() to add/delete VMCB intercepts. These APIs ensure that the VMCB state cache is invalidated when intercepts are modified. Each intercept is identified as a (index,bitmask) tuple. For e.g., the VINTR intercept is identified as (VMCB_CTRL1_INTCPT,VMCB_INTCPT_VINTR). The first 20 bytes in control area that are used to enable intercepts are represented as 'uint32_t intercept[5]' in 'struct vmcb_ctrl'. Modify svm_setcap() and svm_getcap() to use the new APIs. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 03:13:40 +00:00
neel	3c61bfaac6	Move the VMCB initialization into svm.c in preparation for changes to the interrupt injection logic. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 02:35:19 +00:00
neel	bbc4ff544f	Move the event injection function into svm.c and add KTR logging for every event injection. This in in preparation for changes to SVM guest interrupt injection. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 02:20:32 +00:00
neel	5b8dd1ad82	Remove a bogus check that flagged an error if the guest %rip was zero. An AP begins execution with %rip set to 0 after a startup IPI. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 01:46:22 +00:00
neel	77b2918286	Make the KTR tracepoints uniform and ensure that every VM-exit is logged. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 01:37:32 +00:00
neel	d07afdc371	Allow guest read access to MSR_EFER without hypervisor intervention. Dirty the VMCB_CACHE_CR state cache when MSR_EFER is modified.	2014-09-10 01:10:53 +00:00
neel	d6f50ad39f	Remove gratuitous forward declarations. Remove tabs on empty lines.	2014-09-09 23:39:43 +00:00
neel	5824b2ab21	Do proper ASID management for guest vcpus. Prior to this change an ASID was hard allocated to a guest and shared by all its vcpus. The meant that the number of VMs that could be created was limited to the number of ASIDs supported by the CPU. It was also inefficient because it forced a TLB flush on every VMRUN. With this change the number of guests that can be created is independent of the number of available ASIDs. Also, the TLB is flushed only when a new ASID is allocated. Discussed with: grehan Reviewed by: Anish Gupta (akgupt3@gmail.com)	2014-09-06 19:02:52 +00:00
jhb	3a8cf1a38b	Create a separate structure for per-CPU state saved across suspend and resume that is a superset of a pcb. Move the FPU state out of the pcb and into this new structure. As part of this, move the FPU resume code on amd64 into a C function. This allows resumectx() to still operate only on a pcb and more closely mirrors the i386 code. Reviewed by: kib (earlier version)	2014-09-06 15:23:28 +00:00
neel	8decf07cf3	Merge svm_set_vmcb() and svm_init_vmcb() into a single function that is called just once when a vcpu is initialized. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-05 03:33:16 +00:00
pfg	a4f8957a94	Apply known workarounds for modern MacBooks. The legacy USB circuit tends to give trouble on MacBook. While the original report covered MacBook, extend the fix preemptively for the newer MacBookPro too. PR: 191693 Reviewed by: emaste MFC after: 5 days	2014-09-05 01:06:45 +00:00
markj	a3123049c0	Add mrsas(4) to GENERIC for i386 and amd64. Approved by: ambrisko, kadesai MFC after: 3 days	2014-09-04 21:06:33 +00:00
jhb	f7c94cc497	Merge the amd64 and i386 identcpu.c into a single x86 implementation. This brings the structured extended features mask and VT-x reporting to i386 and Intel cache and TLB info (under bootverbose) to amd64.	2014-09-04 14:26:25 +00:00
neel	9c9ff7e250	Remove unused header file. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-04 06:07:32 +00:00
neel	4038b1d8f9	Consolidate the code to restore the host TSS after a #VMEXIT into a single function restore_host_tss(). Don't bother to restore MSR_KGSBASE after a #VMEXIT since it is not used in the kernel. It will be restored on return to userspace. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-04 06:00:18 +00:00
jhb	c102e50e93	Remove trailing whitespace.	2014-09-04 01:56:15 +00:00
jhb	1e7d9a1324	- Move prototypes for various functions into out of C files and into <machine/md_var.h>. - Move some CPU-related variables out of i386/i386/identcpu.c to initcpu.c to match amd64. - Move the declaration of has_f00f_hack out of identcpu.c to machdep.c. - Remove a misleading comment from i386/i386/initcpu.c (locore zeros the BSS before it calls identify_cpu()) and remove explicit zero assignments to reduce the diff with amd64.	2014-09-04 01:46:06 +00:00
neel	e5d2f8730c	IFC @r269962 Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-09-02 04:22:42 +00:00
alc	8b014a347b	Update a comment to reflect the changes in r213408. MFC after: 5 days	2014-09-02 04:11:20 +00:00
neel	15d3c1ec17	The "SUB" instruction used in getcc() actually does 'x -= y' so use the proper constraint for 'x'. The "+r" constraint indicates that 'x' is an input and output register operand. While here generate code for different variants of getcc() using a macro GETCC(sz) where 'sz' indicates the operand size. Update the status bits in %rflags when emulating AND and OR opcodes. Reviewed by: grehan	2014-08-30 19:59:42 +00:00
pfg	27db27f321	Minor space/tab cleanups. Most of them were ripped from the GSoC 2104 SMAP + kpatch project. This is only a cosmetic change. Taken from: Oliver Pinter (op@) MFC after: 5 days	2014-08-30 15:41:07 +00:00
jhb	9d531dc3c1	- Add a new structure type for the ACPI 3.0 SMAP entry that includes the optional attributes field. - Add a 'machdep.smap' sysctl that exports the SMAP table of the running system as an array of the ACPI 3.0 structure. (On older systems, the attributes are given a value of zero.) Note that the sysctl only exports the SMAP table if it is available via the metadata passed from the loader to the kernel. If an SMAP is not available, an empty array is returned. - Add a format handler for the ACPI 3.0 SMAP structure to the sysctl(8) binary to format the SMAP structures in a readable format similar to the format found in boot messages. MFC after: 2 weeks	2014-08-29 21:25:47 +00:00
grehan	4900dd0aa5	Implement the 0x2B SUB instruction, and the OR variant of 0x81. Found with local APIC accesses from bitrig/amd64 bsd.rd, 07/15-snap. Reviewed by: neel MFC after: 3 days	2014-08-27 00:53:56 +00:00
neel	37c34582e9	An exception is allowed to be injected even if the vcpu is in an interrupt shadow, so move the check for pending exception before bailing out due to an interrupt shadow. Change return type of 'vmcb_eventinject()' to a void and convert all error returns into KASSERTs. Fix VMCB_EXITINTINFO_EC(x) and VMCB_EXITINTINFO_TYPE(x) to do the shift before masking the result. Reviewed by: Anish Gupta (akgupt3@gmail.com)	2014-08-25 00:58:20 +00:00
grehan	79d2bf4036	Change __inline style to be consistent with FreeBSD usage, and also fix gcc build (on STABLE, when MFCd). PR: 192880 Reviewed by: neel Reported by: ngie MFC after: 1 day	2014-08-24 02:07:34 +00:00
neel	ca77633f85	Add "hw.vmm.topology.threads_per_core" and "hw.vmm.topology.cores_per_package" tunables to modify the default cpu topology advertised by bhyve. Also add a tunable "hw.vmm.topology.cpuid_leaf_b" to disable the CPUID leaf 0xb. This is intended for testing guest behavior when it falls back on using CPUID leaf 0x4 to deduce CPU topology. The default behavior is to advertise each vcpu as a core in a separate soket.	2014-08-24 01:10:06 +00:00
neel	33474e62d3	Fix a bug in the emulation of CPUID leaf 0x4 where bhyve was claiming that the vcpu had no caches at all. This causes problems when executing applications in the guest compiled with the Intel compiler. Submitted by: Mark Hill (mark.hill@tidalscale.com)	2014-08-23 22:44:31 +00:00
neel	ea3ac18997	Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot find any unmasked pin with an interrupt asserted. Reviewed by: tychon CR: https://reviews.freebsd.org/D669 MFC after: 1 week	2014-08-23 21:16:26 +00:00
jhb	711e51996a	Fix build of si(4) and enable it in LINT on amd64 and i386.	2014-08-20 16:07:17 +00:00
jhb	4b82aff648	Bump MAXCPU on amd64 from 64 to 256. In practice APIC only permits 255 CPUs (IDs 0 through 254). Getting above that limit requires x2APIC. MFC after: 1 month	2014-08-20 16:06:24 +00:00
kib	d9b5edee7d	Increase max number of physical segments on amd64 to 63. Eventually, the vmd_segs of the struct vm_domain should become bitset instead of long, to allow arbitrary compile-time selected maximum. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-20 08:07:08 +00:00
alc	21285cbfcd	There exists a possible sequence of page table page allocation failures starting with a superpage demotion by pmap_enter() that could result in a PV list lock being held when pmap_enter() is just about to return KERN_RESOURCE_SHORTAGE. Consequently, the KASSERT that no PV list locks are held needs to be replaced with a conditional unlock. Discussed with: kib X-MFC with: r269728 Sponsored by: EMC / Isilon Storage Division	2014-08-18 20:28:08 +00:00
gavin	0c913b9f59	Update i386/NOTES and amd64/NOTES files to contain the complete list of firmwares for iwn(4) and sort them. MFC after: 1 week	2014-08-14 18:29:55 +00:00
neel	69905f3734	Reword comment to match the interrupt mode names from the MPtable spec. Reviewed by: tychon	2014-08-14 18:03:38 +00:00
neel	53369652fd	Use the max guest memory address when creating its iommu domain. Also, assert that the GPA being mapped in the domain is less than its maxaddr. Reviewed by: grehan Pointed out by: Anish Gupta (akgupt3@gmail.com)	2014-08-14 05:00:45 +00:00
alc	6ec80ed21e	Update the text of a KASSERT() to reflect the changes in r269728.	2014-08-09 17:13:02 +00:00
kib	094158b3f2	Change pmap_enter(9) interface to take flags parameter and superpage mapping size (currently unused). The flags includes the fault access bits, wired flag as PMAP_ENTER_WIRED, and a new flag PMAP_ENTER_NOSLEEP to indicate that pmap should not sleep. For powerpc aim both 32 and 64 bit, fix implementation to ensure that the requested mapping is created when PMAP_ENTER_NOSLEEP is not specified, in particular, wait for the available memory required to proceed. In collaboration with: alc Tested by: nwhitehorn (ppc aim32 and booke) Sponsored by: The FreeBSD Foundation and EMC / Isilon Storage Division MFC after: 2 weeks	2014-08-08 17:12:03 +00:00
neel	6e79e6c83b	Support PCI extended config space in bhyve. Add the ACPI MCFG table to advertise the extended config memory window. Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted or moved in the guest's address space. The PCI extended config space is an example of an immutable memory range. Add emulation for the "movzw" instruction. This instruction is used by FreeBSD to read a 16-bit extended config space register. CR: https://phabric.freebsd.org/D505 Reviewed by: jhb, grehan Requested by: tychon	2014-08-08 03:49:01 +00:00
glebius	991a1f338d	Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c The MD allocators were very common, however there were some minor differencies. These differencies were all consolidated in the MI allocator, under ifdefs. The defines from machine/vmparam.h turn on features required for a particular machine. For details look in the comment in sys/sf_buf.h. As result no MD code left in sys///vm_machdep.c. Some arches still have machine/sf_buf.h, which is usually quite small. Tested by: glebius (i386), tuexen (arm32), kevlo (arm32) Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-08-05 09:44:10 +00:00
alc	38b6c535da	Retire pmap_change_wiring(). We have never used it to wire virtual pages. We continue to use pmap_enter() for that. For unwiring virtual pages, we now use pmap_unwire(), which unwires a range of virtual addresses instead of a single virtual page. Sponsored by: EMC / Isilon Storage Division	2014-08-03 20:40:51 +00:00
jhb	05d21354c3	- Output a summary of optional VT-x features in dmesg similar to CPU features. If bootverbose is enabled, a detailed list is provided; otherwise, a single-line summary is displayed. - Add read-only sysctls for optional VT-x capabilities used by bhyve under a new hw.vmm.vmx.cap node. Move a few exiting sysctls that indicate the presence of optional capabilities under this node. CR: https://phabric.freebsd.org/D498 Reviewed by: grehan, neel MFC after: 1 week	2014-07-30 00:00:12 +00:00
neel	20e3e8762f	If a vcpu has issued a HLT instruction with interrupts disabled then it sleeps forever in vm_handle_hlt(). This is usually not an issue as long as one of the other vcpus properly resets or powers off the virtual machine. However, if the bhyve(8) process is killed with a signal the halted vcpu cannot be woken up because it's sleep cannot be interrupted. Fix this by waking up periodically and returning from vm_handle_hlt() if TDF_ASTPENDING is set. Reported by: Leon Dang Sponsored by: Nahanni Systems	2014-07-26 02:53:51 +00:00
neel	62d591cec9	Don't return -1 from the push emulation handler. Negative return values are interpreted specially on return from sys_ioctl() and may cause undesirable side-effects like restarting the system call.	2014-07-26 02:51:46 +00:00
neel	563faffd70	Fix a couple of issues in the PUSH emulation: It is not possible to PUSH a 32-bit operand on the stack in 64-bit mode. The default operand size for PUSH is 64-bits and the operand size override prefix changes that to 16-bits. vm_copy_setup() can return '1' if it encounters a fault when walking the guest page tables. This is a guest issue and is now handled properly by resuming the guest to handle the fault.	2014-07-24 23:01:53 +00:00
marius	3179fa4baa	Copying pages via temporary mappings in the !DMAP case of pmap_copy_pages() involves updating the corresponding page tables followed by accesses to the pages in question. This sequence is subject to the situation exactly described in the "AMD64 Architecture Programmer's Manual Volume 2: System Programming" rev. 3.23, "7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore, issuing the INVLPG right after modifying the PTE bits is crucial (see also r269050). For the amd64 PMAP code, the order of instructions was already correct. The above fact still is worth documenting, though. 1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf Reviewed by: alc Sponsored by: Bally Wulff Games & Entertainment GmbH	2014-07-24 10:12:22 +00:00
neel	4535fa67c4	Fix fault injection in bhyve. The faulting instruction needs to be restarted when the exception handler is done handling the fault. bhyve now does this correctly by setting 'vmexit[vcpu].inst_length' to zero so the %rip is not advanced. A minor complication is that the fault injection APIs are used by instruction emulation code that is shared by vmm.ko and bhyve. Thus the argument that refers to 'struct vm ' in kernel or 'struct vmctx ' in userspace needs to be loosely typed as a 'void *'.	2014-07-24 01:38:11 +00:00
royger	0ee303bbe3	don't set CR4 PSE bit on amd64 Setting PSE together with PAE or in long mode just makes the PSE bit completely ignored, so don't set it. Sponsored by: Citrix Systems R&D Reviewed by: kib	2014-07-23 15:53:29 +00:00
neel	e972917c13	Emulate instructions emitted by OpenBSD/i386 version 5.5: - CMP REG, r/m - MOV AX/EAX/RAX, moffset - MOV moffset, AX/EAX/RAX - PUSH r/m	2014-07-23 04:28:51 +00:00
emaste	56b12303b3	Don't pass null kmdp to preload_search_info On Xen PVH guests kmdp == NULL. Submitted by: royger MFC after: 3 days Sponsored by: The FreeBSD Foundation	2014-07-22 13:58:33 +00:00
markj	07a0bd68a5	Fix the build when DTrace isn't enabled. Reported by: stefanf X-MFC-With: r268600	2014-07-20 18:44:56 +00:00
neel	7b1204cc02	Fix build without INVARIANTS defined by getting rid of unused variable 'exc'. Reported by: adrian, stefanf	2014-07-20 16:34:35 +00:00
neel	1f15eea2e0	Handle nested exceptions in bhyve. A nested exception condition arises when a second exception is triggered while delivering the first exception. Most nested exceptions can be handled serially but some are converted into a double fault. If an exception is generated during delivery of a double fault then the virtual machine shuts down as a result of a triple fault. vm_exit_intinfo() is used to record that a VM-exit happened while an event was being delivered through the IDT. If an exception is triggered while handling the VM-exit it will be treated like a nested exception. vm_entry_intinfo() is used by processor-specific code to get the event to be injected into the guest on the next VM-entry. This function is responsible for deciding the disposition of nested exceptions.	2014-07-19 20:59:08 +00:00
markj	e18e12eeda	Use a C wrapper for trap() instead of checking and calling the DTrace trap hook in assembly. Suggested by: kib Reviewed by: kib (original version) X-MFC-With: r268600	2014-07-19 02:27:31 +00:00
neel	5046d9cb8a	Add emulation for legacy x86 task switching mechanism. FreeBSD/i386 uses task switching to handle double fault exceptions and this change enables that to work. Reported by: glebius	2014-07-16 21:26:26 +00:00
neel	eb07e4ed55	Add support for operand size and address size override prefixes in bhyve's instruction emulation [1]. Fix bug in emulation of opcode 0x8A where the destination is a legacy high byte register and the guest vcpu is in 32-bit mode. Prior to this change instead of modifying %ah, %bh, %ch or %dh the emulation would end up modifying %spl, %bpl, %sil or %dil instead. Add support for moffsets by treating it as a 2, 4 or 8 byte immediate value during instruction decoding. Fix bug in verify_gla() where the linear address computed after decoding the instruction was not being truncated to the effective address size [2]. Tested by: Leon Dang [1] Reported by: Peter Grehan [2] Sponsored by: Nahanni Systems	2014-07-15 17:37:17 +00:00
kib	1ecfe30151	Make amd64 pmap_copy_pages() functional for pages not mapped by DMAP. Requested and reviewed by: royger Tested by: pho, royger Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-15 09:30:43 +00:00
markj	880dd1a983	Invoke the DTrace trap handler before calling trap() on amd64. This matches the upstream implementation and helps ensure that a trap induced by tracing fbt::trap:entry is handled without recursively generating another trap. This makes it possible to run most (but not all) of the DTrace tests under common/safety/ without triggering a kernel panic. Submitted by: Anton Rang <anton.rang@isilon.com> (original version) Phabric: D95	2014-07-14 04:38:17 +00:00
neel	307c44649f	Use the correct offset when converting a logical address (segment:offset) to a linear address.	2014-07-11 01:23:38 +00:00
kib	729061be23	For safety, ensure that any consumer of the set_regs() and ptrace_set_pc() use the correct return to userspace using iret. The signal return, PT_CONTINUE (which in fact uses signal return path) set the pcb flag already. The setcontext(2) enforces iret return when %rip is incorrect. Due to this, the change is redundand, but is made to ensure that no path which modifies context, forgets to set PCB_FULL_IRET. Inspired by: CVE-2014-4699 Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-09 21:39:40 +00:00
neel	845f7be2e3	Accurately identify the vcpu's operating mode as 64-bit, compatibility, protected or real.	2014-07-08 21:48:57 +00:00
neel	d5633f89da	Invalidate guest TLB mappings as a side-effect of its CR3 being updated. This is a pre-requisite for task switch emulation since the CR3 is loaded from the new TSS.	2014-07-08 20:51:03 +00:00
kib	ae88c29379	Correct si_code for the SIGBUS signal generated by the alignment trap. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-08 08:05:42 +00:00
alc	d74e85dbb9	Introduce pmap_unwire(). It will replace pmap_change_wiring(). There are several reasons for this change: pmap_change_wiring() has never (in my memory) been used to set the wired attribute on a virtual page. We have always used pmap_enter() to do that. Moreover, it is not really safe to use pmap_change_wiring() to set the wired attribute on a virtual page. The description of pmap_change_wiring() says that it assumes the existence of a mapping in the pmap. However, non-wired mappings may be reclaimed by the pmap at any time. (See pmap_collect().) Many implementations of pmap_change_wiring() will crash if the mapping does not exist. pmap_unwire() accepts a range of virtual addresses, whereas pmap_change_wiring() acts upon a single virtual page. Since we are typically unwiring a range of virtual addresses, pmap_unwire() will be more efficient. Moreover, pmap_unwire() allows us to unwire superpage mappings. Previously, we were forced to demote the superpage mapping, because pmap_change_wiring() only allowed us to express the unwiring of a single base page mapping at a time. This added to the overhead of unwiring for large ranges of addresses, including the implicit unwiring that occurs at process termination. Implementations for arm and powerpc will follow. Discussed with: jeff, marcel Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-07-06 17:42:38 +00:00
emaste	9825a4c806	Prefer vt(4) for UEFI boot The UEFI framebuffer driver vt_efifb requires vt(4), so add a mechanism for the startup routine to set the preferred console. This change is ugly because console init happens very early in the boot, making a cleaner interface difficult. This change is intended only to facilitate the sc(4) / vt(4) transition, and can be reverted once vt(4) is the default.	2014-07-02 13:24:21 +00:00
emaste	10d8b7a43b	Add vt(4) devices and options to NOTES Reviewed by: marius (earlier version)	2014-07-01 00:22:54 +00:00
emaste	e6dbbf35ca	Add vt(4) to GENERIC and retire the separate VT config vt(4) and sc(4) can now coexist in the same kernel. To choose the vt driver, set the loader tunable kern.vty=vt .	2014-06-30 16:18:38 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
tychon	816d8c3faa	Add support for emulating the move instruction: "mov r/m8, imm8". Reviewed by: neel	2014-06-26 17:15:41 +00:00
grehan	54db9f3822	Expose the amount of resident and wired memory from the guest's vmspace. This is different than the amount shown for the process e.g. by /usr/bin/top - that is the mappings faulted in by the mmap'd region of guest memory. The values can be fetched with bhyvectl # bhyvectl --get-stats --vm=myvm ... Resident memory 413749248 Wired memory 0 ... vmm_stat.[ch] - Modify the counter code in bhyve to allow direct setting of a counter as opposed to incrementing, and providing a callback to fetch a counter's value. Reviewed by: neel	2014-06-25 22:13:35 +00:00
kib	fe547198b1	Add FPU_KERN_KTHR flag to fpu_kern_enter(9), which avoids saving FPU context into memory for the kernel threads which called fpu_kern_thread(9). This allows the fpu_kern_enter() callers to not check for is_fpu_kern_thread() to get the optimization. Apply the flag to padlock(4) and aesni(4). In aesni_cipher_process(), do not leak FPU context state on error. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-23 07:37:54 +00:00
dchagin	dd6bed9dd2	Revert r266925 as it can lead to instant panic at fexecve(): To allow to run the interpreter itself add a new ELF branding type. Pointed out by: kib, mjg	2014-06-17 05:29:18 +00:00
tychon	bb415f07f0	Bring an overly enthusiastic KASSERT inline with the Intel SDM. Reviewed by: neel	2014-06-16 22:59:18 +00:00
attilio	2802c525ad	- Modify vm_page_unwire() and vm_page_enqueue() to directly accept the queue where to enqueue pages that are going to be unwired. - Add stronger checks to the enqueue/dequeue for the pagequeues when adding and removing pages to them. Of course, for unmanaged pages the queue parameter of vm_page_unwire() will be ignored, just as the active parameter today. This makes adding new pagequeues quicker. This change effectively modifies the KPI. __FreeBSD_version will be, however, bumped just when the full cache of free pages will be evicted. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2014-06-16 18:15:27 +00:00

1 2 3 4 5 ...

7242 Commits