freebsd-skq

Author	SHA1	Message	Date
kib	1ecfe30151	Make amd64 pmap_copy_pages() functional for pages not mapped by DMAP. Requested and reviewed by: royger Tested by: pho, royger Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-15 09:30:43 +00:00
markj	880dd1a983	Invoke the DTrace trap handler before calling trap() on amd64. This matches the upstream implementation and helps ensure that a trap induced by tracing fbt::trap:entry is handled without recursively generating another trap. This makes it possible to run most (but not all) of the DTrace tests under common/safety/ without triggering a kernel panic. Submitted by: Anton Rang <anton.rang@isilon.com> (original version) Phabric: D95	2014-07-14 04:38:17 +00:00
neel	307c44649f	Use the correct offset when converting a logical address (segment:offset) to a linear address.	2014-07-11 01:23:38 +00:00
kib	729061be23	For safety, ensure that any consumer of set_regs() and ptrace_set_pc() uses the correct return to userspace via iret. The signal return and PT_CONTINUE (which in fact uses the signal return path) already set the pcb flag. setcontext(2) enforces the iret return when %rip is incorrect. Due to this, the change is redundant, but it is made to ensure that no path which modifies the context forgets to set PCB_FULL_IRET. Inspired by: CVE-2014-4699 Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-09 21:39:40 +00:00
neel	845f7be2e3	Accurately identify the vcpu's operating mode as 64-bit, compatibility, protected or real.	2014-07-08 21:48:57 +00:00
neel	d5633f89da	Invalidate guest TLB mappings as a side-effect of its CR3 being updated. This is a pre-requisite for task switch emulation since the CR3 is loaded from the new TSS.	2014-07-08 20:51:03 +00:00
kib	ae88c29379	Correct si_code for the SIGBUS signal generated by the alignment trap. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-08 08:05:42 +00:00
alc	d74e85dbb9	Introduce pmap_unwire(). It will replace pmap_change_wiring(). There are several reasons for this change: pmap_change_wiring() has never (in my memory) been used to set the wired attribute on a virtual page. We have always used pmap_enter() to do that. Moreover, it is not really safe to use pmap_change_wiring() to set the wired attribute on a virtual page. The description of pmap_change_wiring() says that it assumes the existence of a mapping in the pmap. However, non-wired mappings may be reclaimed by the pmap at any time. (See pmap_collect().) Many implementations of pmap_change_wiring() will crash if the mapping does not exist. pmap_unwire() accepts a range of virtual addresses, whereas pmap_change_wiring() acts upon a single virtual page. Since we are typically unwiring a range of virtual addresses, pmap_unwire() will be more efficient. Moreover, pmap_unwire() allows us to unwire superpage mappings. Previously, we were forced to demote the superpage mapping, because pmap_change_wiring() only allowed us to express the unwiring of a single base page mapping at a time. This added to the overhead of unwiring for large ranges of addresses, including the implicit unwiring that occurs at process termination. Implementations for arm and powerpc will follow. Discussed with: jeff, marcel Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-07-06 17:42:38 +00:00
emaste	9825a4c806	Prefer vt(4) for UEFI boot. The UEFI framebuffer driver vt_efifb requires vt(4), so add a mechanism for the startup routine to set the preferred console. This change is ugly because console init happens very early in the boot, making a cleaner interface difficult. This change is intended only to facilitate the sc(4) / vt(4) transition, and can be reverted once vt(4) is the default.	2014-07-02 13:24:21 +00:00
emaste	10d8b7a43b	Add vt(4) devices and options to NOTES Reviewed by: marius (earlier version)	2014-07-01 00:22:54 +00:00
emaste	e6dbbf35ca	Add vt(4) to GENERIC and retire the separate VT config. vt(4) and sc(4) can now coexist in the same kernel. To choose the vt driver, set the loader tunable kern.vty=vt.	2014-06-30 16:18:38 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, since some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, since malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
tychon	816d8c3faa	Add support for emulating the move instruction: "mov r/m8, imm8". Reviewed by: neel	2014-06-26 17:15:41 +00:00
grehan	54db9f3822	Expose the amount of resident and wired memory from the guest's vmspace. This is different than the amount shown for the process e.g. by /usr/bin/top - that is the mappings faulted in by the mmap'd region of guest memory. The values can be fetched with bhyvectl # bhyvectl --get-stats --vm=myvm ... Resident memory 413749248 Wired memory 0 ... vmm_stat.[ch] - Modify the counter code in bhyve to allow direct setting of a counter as opposed to incrementing, and providing a callback to fetch a counter's value. Reviewed by: neel	2014-06-25 22:13:35 +00:00
kib	fe547198b1	Add FPU_KERN_KTHR flag to fpu_kern_enter(9), which avoids saving FPU context into memory for the kernel threads which called fpu_kern_thread(9). This allows the fpu_kern_enter() callers to not check for is_fpu_kern_thread() to get the optimization. Apply the flag to padlock(4) and aesni(4). In aesni_cipher_process(), do not leak FPU context state on error. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-23 07:37:54 +00:00
dchagin	dd6bed9dd2	Revert r266925 as it can lead to instant panic at fexecve(): To allow to run the interpreter itself add a new ELF branding type. Pointed out by: kib, mjg	2014-06-17 05:29:18 +00:00
tychon	bb415f07f0	Bring an overly enthusiastic KASSERT inline with the Intel SDM. Reviewed by: neel	2014-06-16 22:59:18 +00:00
attilio	2802c525ad	- Modify vm_page_unwire() and vm_page_enqueue() to directly accept the queue where to enqueue pages that are going to be unwired. - Add stronger checks to the enqueue/dequeue for the pagequeues when adding and removing pages to them. Of course, for unmanaged pages the queue parameter of vm_page_unwire() will be ignored, just as the active parameter today. This makes adding new pagequeues quicker. This change effectively modifies the KPI. __FreeBSD_version will be, however, bumped just when the full cache of free pages will be evicted. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2014-06-16 18:15:27 +00:00
royger	7c7f3fb2d0	amd64/i386: introduce APIC hooks for different APIC implementations. This is needed for Xen PV(H) guests, since there's no hardware lapic available on this kind of domain. This commit should not change functionality. Sponsored by: Citrix Systems R&D Reviewed by: jhb Approved by: gibbs amd64/include/cpu.h: amd64/amd64/mp_machdep.c: i386/include/cpu.h: i386/i386/mp_machdep.c: - Remove lapic_ipi_vectored hook from cpu_ops, since it's now implemented in the lapic hooks. amd64/amd64/mp_machdep.c: i386/i386/mp_machdep.c: - Use lapic_ipi_vectored directly, since it's now an inline function that will call the appropriate hook. x86/x86/local_apic.c: - Prefix bare metal public lapic functions with native_ and mark them as static. - Define default implementation of apic_ops. x86/include/apicvar.h: - Declare the apic_ops structure and create inline functions to access the hooks, so the change is transparent to existing users of the lapic_ functions. x86/xen/hvm.c: - Switch to use the new apic_ops.	2014-06-16 08:43:03 +00:00
neel	8c7e29c295	Disable global interrupts early so all the software state maintained by bhyve is sampled "atomically". Any interrupts after this point will be held pending by the CPU until the guest starts executing and will immediately trigger a #VMEXIT. Reviewed by: Anish Gupta (akgupt3@gmail.com)	2014-06-11 17:48:07 +00:00
tychon	e250a91c1d	Replace enum forward declarations with complete definitions. Reviewed by: neel	2014-06-10 18:46:00 +00:00
neel	e48c89801a	Add helper functions to populate VM exit information for rendezvous and astpending exits. This is to reduce code duplication between VT-x and SVM implementations.	2014-06-10 16:45:58 +00:00
neel	32f7809be1	Turn on interrupt window exiting unconditionally when an ExtINT is being injected into the guest. This allows the hypervisor to inject another ExtINT or APIC vector as soon as the guest is able to process interrupts. This change is not to address any correctness issue but to guarantee that any pending APIC vector that was preempted by the ExtINT will be injected as soon as possible. Prior to this change such pending interrupts could be delayed until the next VM exit.	2014-06-10 01:38:02 +00:00
grehan	adbd7deabf	Temporary fix for guest idle detection. Handle ExtINT injection for SVM. The HPET emulation will inject a legacy interrupt at startup, and if this isn't handled, will result in the HLT-exit code assuming there are outstanding ExtINTs and return without sleeping. svm_inj_interrupts() needs more changes to bring it up to date with the VT-x version: these are forthcoming. Reviewed by: neel	2014-06-09 21:02:48 +00:00
neel	d4bb0b204a	Add reserved bit checking when doing %CR8 emulation and inject #GP if required. Pointed out by: grehan Reviewed by: tychon	2014-06-09 20:51:08 +00:00
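The reserved-bit check described in the %CR8 commit above can be illustrated with a short sketch. It relies on the architectural definition of CR8 (bits 3:0 hold the task priority class, bits 63:4 are reserved and must be zero, and the value maps to bits 7:4 of the local APIC TPR). The helper names `cr8_write_is_valid` and `cr8_to_tpr` are hypothetical and are not taken from the bhyve sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR8[3:0] is the task priority class; CR8[63:4] is reserved. */
#define CR8_PRIORITY_MASK	0xfULL

/*
 * Hypothetical helper: returns false when a guest write to %cr8 sets a
 * reserved bit, in which case an emulator would inject #GP.
 */
static bool
cr8_write_is_valid(uint64_t val)
{
	return ((val & ~CR8_PRIORITY_MASK) == 0);
}

/* Hypothetical helper: CR8[3:0] corresponds to bits 7:4 of the APIC TPR. */
static uint8_t
cr8_to_tpr(uint64_t cr8)
{
	return ((uint8_t)((cr8 & CR8_PRIORITY_MASK) << 4));
}
```

A caller would validate the value first and only then update the virtual TPR.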
grehan	fe997346e0	Allow the TSC MSR to be accessed directly from the guest.	2014-06-07 23:08:06 +00:00
grehan	afc0a3433a	Set the guest PAT MSR in the VMCB to power-on defaults. Linux guests accept the values in this register, while *BSD guests reprogram it. Default values of zero correspond to PAT_UNCACHEABLE, resulting in glacial performance. Thanks to Willem Jan Withagen for first reporting this and helping out with the investigation.	2014-06-07 23:05:12 +00:00
neel	80a67d54c4	Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained by vmm.ko. This allows the virtual machine to be restarted without having to destroy it first. Reviewed by: grehan	2014-06-07 21:36:52 +00:00
alc	39548e640f	Add a page size field to struct vm_page. Increase the page size field when a partially populated reservation becomes fully populated, and decrease this field when a fully populated reservation becomes partially populated. Use this field to simplify the implementation of pmap_enter_object() on amd64, arm, and i386. On all architectures where we support superpages, the cost of creating a superpage mapping is roughly the same as creating a base page mapping. For example, both kinds of mappings entail the creation of a single PTE and PV entry. With this in mind, use the page size field to make the implementation of vm_map_pmap_enter(..., MAP_PREFAULT_PARTIAL) a little smarter. Previously, if MAP_PREFAULT_PARTIAL was specified to vm_map_pmap_enter(), that function would only map base pages. Now, it will create up to 96 base page or superpage mappings. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-06-07 17:12:26 +00:00
tychon	c04c953593	Support guest accesses to %cr8. Reviewed by: neel	2014-06-06 18:23:49 +00:00
imp	7694525189	Restore comments accidentally removed. MFC after: 3 days	2014-06-06 04:08:55 +00:00
grehan	f1ed4b50ae	ins/outs support for SVM. Modelled on the Intel VT-x code. Remove CR2 save/restore - the guest restore/save is done in hardware, and there is no need to save/restore the host version (same as VT-x). Submitted by: neel (SVM segment descriptor 'P' bit code) Reviewed by: neel	2014-06-06 02:55:18 +00:00
grehan	39adc03910	Allow the guest's CR2 value to be read/written. This is required for page-fault injection.	2014-06-05 06:29:18 +00:00
grehan	2374fa6276	Use API call when VM is detected as suspended. This fixes the (harmless) error message on exit: vmexit_suspend: invalid reason 217645057 Reviewed by: neel, Anish Gupta (akgupt3@gmail.com)	2014-06-03 22:26:46 +00:00
grehan	5e6423ee3b	Bring (almost) up-to-date with HEAD. - use the new virtual APIC page - update to current bhyve APIs Tested by Anish with multiple FreeBSD SMP VMs on a Phenom, and verified by myself with light FreeBSD VM testing on a Sempron 3850 APU. The issues reported with Linux guests are very likely to still be here, but this sync eliminates the skew between the project branch and CURRENT, and should help to determine the causes. Some follow-on commits will fix minor cosmetic issues. Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-06-03 06:56:54 +00:00
grehan	95f7c2f56c	MFC @ r266724 An SVM update will follow this.	2014-06-03 02:34:21 +00:00
neel	9c2a942387	Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing it implicitly in vmm.ko. Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and "--get-suspended-cpus" options. This is in preparation for being able to reset virtual machine state without having to destroy and recreate it.	2014-05-31 23:37:34 +00:00
dchagin	538f396887	To allow to run the interpreter itself add a new ELF branding type. Allow Linux ABI to run ELF interpreter. MFC after: 3 days	2014-05-31 15:01:51 +00:00
tychon	61025dc75e	If VMX isn't enabled, it can still be enabled as long as the lock bit isn't yet set in MSR IA32_FEATURE_CONTROL. Approved by: grehan (co-mentor)	2014-05-30 23:37:31 +00:00
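A minimal sketch of the IA32_FEATURE_CONTROL condition from the commit above. The bit positions follow the Intel SDM (bit 0 is the lock bit, bit 2 enables VMXON outside SMX operation). The function name `vmx_available` is an assumption for illustration and is not the name used in vmm.ko.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_FEATURE_CONTROL (MSR 0x3A) bits, per the Intel SDM. */
#define FEATURE_CONTROL_LOCK	(1ULL << 0)	/* MSR is read-only once set */
#define FEATURE_CONTROL_VMX_EN	(1ULL << 2)	/* VMXON allowed outside SMX */

/*
 * Hypothetical helper: VMX is available if it is already enabled, or if
 * the MSR is still unlocked so that software can enable it itself.
 * It is unavailable only when the MSR is locked with VMX disabled.
 */
static bool
vmx_available(uint64_t feature_control)
{
	if ((feature_control & FEATURE_CONTROL_VMX_EN) != 0)
		return (true);
	return ((feature_control & FEATURE_CONTROL_LOCK) == 0);
}
```

When the MSR is unlocked, the caller would still have to write it with both the enable and lock bits set before executing VMXON.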
neel	0a0e9fcd5a	Remove bogus check for kmem_malloc() failure even though M_WAITOK is set. Requested by: jkim	2014-05-30 20:58:32 +00:00
neel	aefe217075	Allocate a zeroed LDT. Failing to do this might result in the LDT appearing to run out of free descriptors because of random junk in the descriptor's 'sd_type' field. http://lists.freebsd.org/pipermail/freebsd-amd64/2014-May/016088.html Reviewed by: kib MFC after: 2 weeks	2014-05-30 18:59:37 +00:00
kib	7c98ae3376	When usermode loaded non-default segment selector into the %gs, correctly prepare KGSBASE msr to restore the user descriptor base on the last swapgs during return to usermode. Reported and tested by: peterj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-05-29 16:18:31 +00:00
markj	4c818572b7	Commit the rest of the changes that were intended to be part of r266826. X-MFC-with: r266826	2014-05-29 01:42:22 +00:00
jhb	e3d386d9fe	- Rework the XSAVE/XRSTOR emulation to only expose XCR0 features to the guest for which the rules regarding xsetbv emulation are known. In particular future extensions like AVX-512 have interdependencies among feature bits that could allow a guest to trigger a #GP in the host with the current approach of allowing anything the host supports. - Add proper checking of Intel MPX and AVX-512 XSAVE features in the xsetbv emulation and allow these features to be exposed to the guest if they are enabled in the host. - Expose a subset of known-safe features from leaf 0 of the structured extended features to guests if they are supported on the host including RDFSBASE/RDGSBASE, BMI1/2, AVX2, AVX-512, HLE, ERMS, and RTM. Aside from AVX-512, these features are all new instructions available for use in ring 3 with no additional hypervisor changes needed. Reviewed by: neel	2014-05-27 19:04:38 +00:00
neel	4b40e47cf8	Add segment protection and limits violation checks in vie_calculate_gla() for 32-bit x86 guests. Tested using ins/outs executed in a FreeBSD/i386 guest.	2014-05-27 04:26:22 +00:00
neel	07a8a1c99a	Remove restriction on insb/insw/insl emulation. These instructions are properly emulated.	2014-05-25 02:05:23 +00:00
neel	ffc6a38259	Do the linear address calculation for the ins/outs emulation using a new API function 'vie_calculate_gla()'. While the current implementation is simplistic it forms the basis of doing segmentation checks if the guest is in 32-bit protected mode.	2014-05-25 00:57:24 +00:00
neel	51a05acc08	Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out of the guest linear address space. These APIs in turn use a new ioctl 'VM_GLA2GPA' to convert the guest linear address to guest physical. Use the new copyin/copyout APIs when emulating ins/outs instruction in bhyve(8).	2014-05-24 23:12:30 +00:00
neel	6a6e13c407	Consolidate all the information needed by the guest page table walker into 'struct vm_guest_paging'. Check for canonical addressing in vmm_gla2gpa() and inject a protection fault into the guest if a violation is detected. If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to point to the root of the page tables.	2014-05-24 20:26:57 +00:00
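The canonical-address check mentioned in the commit above can be sketched as follows. This assumes 48-bit virtual addresses (4-level paging), where bits 63:47 of a linear address must all be equal. The name `gla_is_canonical` is hypothetical, and the actual check in vmm_gla2gpa() may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, assuming 48-bit virtual addresses: a linear
 * address is canonical when bits 63:47 are all zeros or all ones,
 * i.e. the address is the sign extension of its low 48 bits.
 */
static bool
gla_is_canonical(uint64_t gla)
{
	uint64_t upper = gla >> 47;	/* the 17 bits 63:47 */

	return (upper == 0 || upper == 0x1ffff);
}
```

A page table walker would run this check before the walk and inject a fault into the guest when it fails.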
neel	52a4f11861	When injecting a page fault into the guest also update the guest's %cr2 to indicate the faulting linear address. If the guest PML4 entry has the PG_PS bit set then inject a page fault into the guest with the PGEX_RSV bit set in the error_code. Get rid of redundant checks for the PG_RW violations when walking the page tables.	2014-05-24 19:13:25 +00:00
neel	2ccda87aca	Check for alignment check violation when processing in/out string instructions.	2014-05-23 19:59:14 +00:00
neel	8f99933d82	Add emulation of the "outsb" instruction. NetBSD guests use this to write to the UART FIFO. The emulation is constrained in a number of ways: 64-bit only, doesn't check for all exception conditions, limited to i/o ports emulated in userspace. Some of these constraints will be relaxed in followup commits. Requested by: grehan Reviewed by: tychon (partially and a much earlier version)	2014-05-23 05:15:17 +00:00
neel	062bfb5ea3	A CentOS 6.4 guest will write 0xff to the 8259 mask register before beginning the proper ICWx initialization sequence. It assumes, probably correctly, that the boot firmware has done the 8259 initialization. Since grub-bhyve does not initialize the 8259 this write to the mask register takes a code path in which 'error' remains uninitialized (ready=0,icw_num=0). Fix this by initializing 'error' at the start of the function.	2014-05-23 05:04:50 +00:00
jhb	9578fc0e8e	Don't permit users to request a subset of the AVX512 or MPX xsave masks. These masks are documented in the Intel Architecture Instruction Set Extensions Programming Reference (March 2014). Reviewed by: kib MFC after: 1 month	2014-05-22 18:22:02 +00:00
neel	f33a3d02f0	Allow vmx_getdesc() and vmx_setdesc() to be called for a vcpu that is in the VCPU_RUNNING state. This will let the VMX exit handler inspect the vcpu's segment descriptors without having to exit the critical section.	2014-05-22 17:22:37 +00:00
jhibbits	445bd25136	imgact_binmisc builds for all supported architectures, so enable it for all. Any bugs in execution will be dealt with as they crop up. MFC after: 3 weeks Relnotes: Yes	2014-05-22 05:04:40 +00:00
neel	645d479a58	Inject page fault into the guest if the page table walker detects an invalid translation for the guest linear address.	2014-05-22 03:14:54 +00:00
neel	6071e2741b	Add PG_RW check when translating a guest linear to guest physical address. Set the accessed and dirty bits in the page table entry. If it fails then restart the page table walk from the beginning. This might happen if another vcpu modifies the page tables simultaneously. Reviewed by: alc, kib	2014-05-20 20:30:28 +00:00
jhb	59c78787cb	Add support for decoding the AMD SVM instructions.	2014-05-19 18:07:37 +00:00
neel	b0752c3683	Add PG_U (user/supervisor) checks when translating a guest linear address to a guest physical address. PG_PS (page size) field is valid only in a PDE or a PDPTE so it is now checked only in non-terminal paging entries. Ignore the upper 32-bits of the CR3 for PAE paging.	2014-05-19 03:50:07 +00:00
grehan	9fa48763c0	Make the vmx asm code dtrace-fbt-friendly by - inserting frame enter/leave sequences - restructuring the vmx_enter_guest routine so that it subsumes the vm_exit_guest block, which was the #vmexit RIP and not a callable routine. Reviewed by: neel MFC after: 3 weeks	2014-05-18 03:50:17 +00:00
jhb	1c24706a80	Add support for decoding rdrand and rdseed.	2014-05-17 21:10:03 +00:00
jhb	db4e203198	Add definitions for more structured extended features as well as XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions). Obtained from: Intel's Instruction Set Extensions Programming Reference (March 2014)	2014-05-16 17:45:09 +00:00
jhb	f558af85b7	Implement a PCI interrupt router to route PCI legacy INTx interrupts to the legacy 8259A PICs. - Implement an ICH-compatible PCI interrupt router on the lpc device with 8 steerable pins configured via config space access to byte-wide registers at 0x60-63 and 0x68-6b. - For each configured PCI INTx interrupt, route it to both an I/O APIC pin and a PCI interrupt router pin. When a PCI INTx interrupt is asserted, ensure that both pins are asserted. - Provide an initial routing of PCI interrupt router (PIRQ) pins to 8259A pins (ISA IRQs) and initialize the interrupt line config register for the corresponding PCI function with the ISA IRQ as this matches existing hardware. - Add a global _PIC method for OSPM to select the desired interrupt routing configuration. - Update the _PRT methods for PCI bridges to provide both APIC and legacy PRT tables and return the appropriate table based on the configured routing configuration. Note that if the lpc device is not configured, no routing information is provided. - When the lpc device is enabled, provide ACPI PCI link devices corresponding to each PIRQ pin. - Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A pins via the ELCR. - Mark the power management SCI as level triggered. - Don't hardcode the number of elements in Packages in the source for the DSDT. iasl(8) will fill in the actual number of elements, and this makes it simpler to generate a Package with a variable number of elements. Reviewed by: tycho	2014-05-15 14:16:55 +00:00
neel	5df866f4b1	Increase the TSS limit by one byte. The processor requires an additional byte with all bits set to 1 beyond the I/O permission bitmap. Prior to this change accessing I/O ports [0xFFF8-0xFFFF] would trigger a #GP fault even though the I/O bitmap allowed access to those ports. For more details see section "I/O Permission Bit Map" in the Intel SDM, Vol 1. Reviewed by: kib	2014-05-14 22:24:09 +00:00
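Why the extra byte matters can be shown with a simplified model of the I/O permission check. Per the Intel SDM, the processor reads two bytes from the bitmap on every port access, so the byte following the one that covers the port must also lie within the TSS limit. The function below is an illustrative sketch only: `io_access_allowed` and its parameters are hypothetical, and the model ignores IOPL and other conditions the hardware also evaluates.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the I/O permission bitmap check.
 *   tss        - bytes of the TSS, including the bitmap
 *   tss_limit  - offset of the last valid byte in the TSS
 *   iomap_base - offset of the bitmap within the TSS
 *   port, size - first port and access width in bytes (1, 2 or 4)
 * Returns false where the hardware would raise #GP.
 */
static bool
io_access_allowed(const uint8_t *tss, uint32_t tss_limit,
    uint16_t iomap_base, uint16_t port, unsigned size)
{
	uint32_t off = (uint32_t)iomap_base + port / 8;
	uint16_t bits, mask;

	/* Two bytes are always fetched; both must be inside the limit. */
	if (off + 1 > tss_limit)
		return (false);

	bits = (uint16_t)(tss[off] | (tss[off + 1] << 8));
	mask = (uint16_t)(((1u << size) - 1) << (port % 8));

	/* Every bit covering the access must be clear. */
	return ((bits & mask) == 0);
}
```

With an 8192-byte bitmap and a limit that stops at its last byte, a one-byte access to port 0xFFF8 fails the limit check in this model; extending the limit by one byte to cover an all-ones terminator lets it pass.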
neel	5fd692c3b5	Virtual machine halt detection is turned on by default. Allow it to be disabled via the tunable 'hw.vmm.halt_detection'.	2014-05-05 16:19:24 +00:00
nwhitehorn	34465d9bbe	Disable ACPI and P4TCC throttling by default, following discussion on freebsd-current. These CPU speed control techniques are usually unhelpful at best. For now, continue building the relevant code into GENERIC so that it can trivially be re-enabled at runtime if anyone wants it. MFC after: 1 month	2014-05-04 16:38:21 +00:00
ken	8f3f80c382	Bring in the mpr(4) driver for LSI's MPT3 12Gb SAS controllers. This is derived from the mps(4) driver, but it supports only the 12Gb IT and IR hardware including the SAS 3004, SAS 3008 and SAS 3108. Some notes about this driver: o The 12Gb hardware can do "FastPath" I/O, and that capability is included in this driver. o WarpDrive functionality has been removed, since it isn't supported in the 12Gb driver interface. o The Scatter/Gather list handling code is significantly different between the 6Gb and 12Gb hardware. The 12Gb boards support IEEE Scatter/Gather lists. Thanks to LSI for developing and testing this driver for FreeBSD. share/man/man4/mpr.4: mpr(4) man page. sys/dev/mpr/*: mpr(4) driver files. sys/modules/Makefile, sys/modules/mpr/Makefile: Add a module Makefile for the mpr(4) driver. sys/conf/files: Add the mpr(4) driver. sys/amd64/conf/GENERIC, sys/i386/conf/GENERIC, sys/mips/conf/OCTEON1, sys/sparc64/conf/GENERIC: Add the mpr(4) driver to all config files that currently have the mps(4) driver. sys/ia64/conf/GENERIC: Add the mps(4) and mpr(4) drivers to the ia64 GENERIC config file. sys/i386/conf/XEN: Exclude the mpr module from building here. Submitted by: Steve McConnell <Stephen.McConnell@lsi.com> MFC after: 3 days Tested by: Chris Reeves <chrisr@spectralogic.com> Sponsored by: LSI, Spectra Logic Relnotes: LSI 12Gb SAS driver mpr(4) added	2014-05-02 20:25:09 +00:00
eadler	382c3dae47	lindev(4): finish the partial commit in r265212 lindev(4) was only used to provide /dev/full which is now a standard feature of FreeBSD. /dev/full was never linux-specific and provides a generally useful feature. Document this in UPDATING and bump __FreeBSD_version. This will be documented in the PH shortly. Reported by: jkim	2014-05-02 07:14:22 +00:00
neel	b735ae5b9a	Add logic in the HLT exit handler to detect if the guest has put all vcpus to sleep permanently by executing a HLT with interrupts disabled. When this condition is detected the guest will be suspended with a reason of VM_SUSPEND_HALT and the bhyve(8) process will exit. Tested by executing "halt" inside a RHEL7-beta guest. Discussed with: grehan@ Reviewed by: jhb@, tychon@	2014-05-02 00:33:56 +00:00
neel	0601994645	Ignore writes to microcode update MSR. This MSR is accessed by RHEL7 guest. Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.	2014-04-30 02:08:27 +00:00
neel	9c85092013	Some Linux guests will implement a 'halt' by disabling the APIC and executing the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and converted into the SPINDOWN_CPU exitcode. The bhyve(8) process would exit the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET). This functionality was broken in r263780 in a way that made it impossible to kill the bhyve(8) process because it would loop forever in vm_handle_suspend(). Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from a Linux guest will appear to be hung but this is consistent with the behavior on bare metal. The guest can be rebooted by using the bhyvectl options '--force-reset' or '--force-poweroff'. Reviewed by: grehan@	2014-04-29 18:42:56 +00:00
neel	b616a9a2e4	Allow a virtual machine to be forcibly reset or powered off. This is done by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF. The disposition of VM_SUSPEND is also made available to the exit handler via the 'u.suspended' member of 'struct vm_exit'. This capability is exposed via the '--force-reset' and '--force-poweroff' arguments to /usr/sbin/bhyvectl. Discussed with: grehan@	2014-04-28 22:06:40 +00:00
emaste	a864055bad	Report boot method (BIOS/UEFI) via sysctl machdep.bootmethod Sponsored by: The FreeBSD Foundation	2014-04-27 15:14:59 +00:00
kib	d581b6a9ab	As was done in r263878 for invlrng_handler(), fix the order of checks for special pcid values in invlpg_pcid_handler(). First check for special values, and only then do PCID-specific page invalidation. Minor fix to the style compliance, declare local variable at the function start. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-04-27 05:37:01 +00:00
nwhitehorn	e4853bbc44	Don't need this now. VT does the same thing, but better. Submitted by: gjb	2014-04-27 02:28:32 +00:00
nwhitehorn	d89a83edd0	Add vt_efifb to VT kernel configuration now that that actually works. This kernel will now boot on both BIOS and EFI systems without modification. Equivalent functionality in GENERIC requires making vt(9) the default console driver, which is probably appropriate at this point.	2014-04-27 02:22:21 +00:00
neel	ebf51bf73a	A VMCS is always inactive when it exits the vmx_run() loop. Remove redundant code and the misleading comment that suggests otherwise. Reviewed by: grehan@	2014-04-26 22:37:56 +00:00
scottl	62a64f0d2b	Retire smp_active. It was racy and caused demonstrated problems with the cpufreq code. Replace its use with smp_started. There's at least one userland tool that still looks at the kern.smp.active sysctl, so preserve it but point it to smp_started as well. Discussed with: peter, jhb MFC after: 3 days Obtained from: Netflix	2014-04-26 20:27:54 +00:00
gjb	69c3e6933b	Add a UEFI kernel configuration to include the VT kernel, and replace the vt_vga driver with vt_efifb. This is intended to help with snapshot builds only. There is no intention to MFC this commit. Sponsored by: The FreeBSD Foundation	2014-04-25 21:47:24 +00:00
royger	f6f9cb7a0f	xen: fix copyright header Some of the code in xen-locore.S was picked from Cherry G. Mathew amd64 Xen PV branch, but I've failed to set the proper copyright, so do it now. Approved by: gibbs	2014-04-24 14:44:42 +00:00
grehan	4f6dd265e1	Allow the guest to read the TSC via MSR 0x10. NetBSD/amd64 does this, as does Linux on AMD CPUs. Reviewed by: neel MFC after: 3 weeks	2014-04-24 00:27:34 +00:00
neel	360d54aa50	Change the vlapic timer frequency to be in the ballpark of contemporary hardware. This also decouples the vlapic emulation from the host's TSC frequency. Requested by: grehan@	2014-04-23 16:50:40 +00:00
tychon	f44a06b5a1	Factor out common ioport handler code for better hygiene -- pointed out by neel@. Approved by: neel (co-mentor)	2014-04-22 16:13:56 +00:00
tychon	d8c307b493	Add support for the PIT 'readback' command -- based on a patch by grehan@. Approved by: grehan (co-mentor)	2014-04-18 16:05:12 +00:00
tychon	2c52df9a16	Respect the destination operand size of the 'Input from Port' instruction. Approved by: grehan (co-mentor)	2014-04-18 15:22:56 +00:00
tychon	4d0de44f39	Add support for reading the PIT Counter 2 output signal via the NMI Status and Control register at port 0x61. Be more conservative about "catching up" callouts that were supposed to fire in the past by skipping an interrupt if it was scheduled too far in the past. Restore the PIT ACPI DSDT entries and add an entry for NMISC too. Approved by: neel (co-mentor)	2014-04-18 00:02:06 +00:00
jhb	03a8cfa7d9	Don't spindown the BSP if it executes hlt with the APIC disabled. A guest that doesn't use the APIC at all can trigger this, plus the BSP always needs to execute as it should trigger a reset, etc. Reviewed by: tychon	2014-04-15 20:53:53 +00:00
tychon	bbe78c2d72	Local APIC access via 32-bit naturally-aligned loads is merely suggested in the SDM. Since some OSes have implemented otherwise don't be too rigorous in enforcing it. Approved by: grehan (co-mentor)	2014-04-15 17:06:26 +00:00
tychon	04f26f5235	Add support for emulating the byte move and sign extend instructions: "movsx r/m8, r32" and "movsx r/m8, r64". Approved by: grehan (co-mentor)	2014-04-15 15:11:10 +00:00
tychon	5906c6773b	Add support for emulating the slave PIC. Reviewed by: grehan, jhb Approved by: grehan (co-mentor)	2014-04-14 19:00:20 +00:00
neel	335d93f16d	There is no need to save and restore the host's return address in the 'struct vmxctx'. It is preserved on the host stack across a guest entry and exit and just restoring the host's '%rsp' is sufficient. Pointed out by: grehan@	2014-04-11 20:15:53 +00:00
tychon	45ec65c336	Account for the "plus 1" encoding of the CPUID Function 4 reported core per package and cache sharing values. Approved by: grehan (co-mentor)	2014-04-11 18:19:21 +00:00
grehan	2ac5c08506	Rework r264179. - remove redundant code - remove erroneous setting of the error return in vmmdev_ioctl() - use style(9) initialization - in vmx_inject_pir(), document the race condition that the final conditional statement was detecting, Tested with both gcc and clang builds. Reviewed by: neel	2014-04-10 19:15:58 +00:00
sbruno	c5f634c8c7	Really, really, really only allow this option for amd64/i386 builds. Submitted by: imp@ and tinderbox	2014-04-09 18:44:54 +00:00
imp	e8da9ba992	Make the vmm code compile with gcc too. Not entirely sure things are correct for the pirbase test (since I'd have thought we'd need to do something even when the offset is 0 and that test looks like a misguided attempt to not use an uninitialized variable), but it is at least the same as today.	2014-04-05 22:43:23 +00:00
rstone	35a855d80f	Re-write bhyve's I/O MMU handling in terms of PCI RID. Reviewed by: neel MFC after: 2 months Sponsored by: Sandvine Inc.	2014-04-01 15:54:03 +00:00
rstone	120bf54d08	Revert PCI RID changes. My PCI RID changes somehow got intermixed with my PCI ARI patch when I committed it. I may have accidentally applied a patch to a non-clean working tree. Revert everything while I figure out what went wrong. Pointy hat to: rstone	2014-04-01 15:06:03 +00:00
rstone	4df7085933	Re-write bhyve's I/O MMU handling in terms of PCI RIDs Reviewed by: neel Sponsored by: Sandvine Inc	2014-04-01 14:54:43 +00:00
kib	120b1857f9	Clear the kernel grab of the FPU state on fork. The pcb_save pointer is already correctly reset to the FPU user save area, only PCB_KERNFPU flag might leak from old thread state into the new state. For creation of the user-mode thread, the change is nop since corresponding syscall code does not use FPU. On the other hand, creation of a kernel thread forks from a thread selected arbitrary from proc0, which might use FPU. Reported and tested by: Chris Torek <torek@torek.net> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-29 11:56:33 +00:00
kib	a6e5d8b248	Several fixes for the PCID implementation: - When clearing a bit for a cpuid in pmap->pm_save, ensure that the cpuid is not set in pm_active. The pm_save indicates which CPUs may have cached translations for given PCID, which implies that a CPU executing with the given pmap active have the translations cached. [1] - In smp_masked_invltlb(), pass pmap to smp_targeted_tlb_shootdown(). [1] - In invlrng_handler(), check for the special values of pcid (0 and -1) and do corresponding global or total invalidations before checking for performing PCID-specific range invalidation with INVPCID_ADDR. [2] - In invltlb_pcid_handler(), do not read %cr3 unless needed. [2] - Do minor style tweaks. [2] Submitted by: Henrik Gulbrandsen <henrik@gulbra.net> [1] Other parts sponsored by: The FreeBSD Foundation [2] Tested by: Henrik Gulbrandsen, pho MFC after: 1 week	2014-03-28 16:07:27 +00:00
emaste	d2c99117cd	Update EFI framebuffer handoff from loader Sponsored by: The FreeBSD Foundation	2014-03-27 19:43:38 +00:00
emaste	4a841fdff4	amd64: Parse the EFI memory map if present With this change (and loader.efi from the projects/uefi branch) we can now boot under qemu using the OVMF UEFI firmware image with the limitation that a serial console is required. (This is largely r246337 from the projects/uefi branch.) Sponsored by: The FreeBSD Foundation	2014-03-27 18:23:02 +00:00
neel	3e49998fdf	Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called from any context i.e., it is not required to be called from a vcpu thread. The ioctl simply sets a state variable 'vm->suspend' to '1' and returns. The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The suspend handler waits until all 'vm->active_cpus' have transitioned to 'vm->suspended_cpus' before returning to userspace. Discussed with: grehan	2014-03-26 23:34:27 +00:00
imp	bd031ca10c	Rather than require a makeoptions DEBUG to get debug correct, add it in kern.mk, but only if we're using clang. While this option is supported by both clang and gcc, in the future there may be changes to clang which change the defaults that require a tweak to build our kernel such that other tools in our tree will work. Set a good example by forcing -gdwarf-2 only for clang builds, and only if the user hasn't specified another dwarf level already. Update UPDATING to reflect the changed state of affairs. This also keeps us from having to update all the ARM kernels to add this, and also keeps us from in the future having to update all the MIPS kernels and is one less place the user will have to know to do something special for clang and one less thing developers will need to do when moving an architecture to clang. Reviewed by: ian@ MFC after: 1 week	2014-03-25 22:08:31 +00:00
tychon	58699bc5fc	Move the atpit device model from userspace into vmm.ko for better precision and lower latency. Approved by: grehan (co-mentor)	2014-03-25 19:20:34 +00:00
bdrewery	6fcf6199a4	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
kib	7390415c58	Add change forgotten in r263475. Make dmaplimit accessible outside amd64/pmap.c. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-21 17:17:19 +00:00
kib	24c4e4a548	Fix two issues with /dev/mem access on amd64, both causing kernel page faults. First, for accesses to direct map region should check for the limit by which direct map is instantiated. Second, for accesses to the kernel map, success returned from the kernacc(9) does not guarantee that consequent attempt to read or write to the checked address succeed, since other thread might invalidate the address meantime. Add a new thread private flag TDP_DEVMEMIO, which instructs vm_fault() to return error when fault happens on the MAP_ENTRY_NOFAULT entry, instead of panicing. The trap handler would then see a page fault from access, and recover in normal way, making /dev/mem access safer. Remove GIANT_REQUIRED from the amd64 memrw(), since it is not needed and having Giant locked does not solve issues for amd64. Note that at least the second issue exists on other architectures, and requires similar patching for md code. Reported and tested by: clusteradm (gjb, sbruno) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-21 14:25:09 +00:00
imp	9f008568e7	Remove vestiges of knowing the ISA bus, which we gave up on around 20 years ago. Remove redundant copy of isaregs.h.	2014-03-19 21:03:04 +00:00
markj	b41aca8e4d	Only invoke fasttrap hooks for traps from user mode, and ensure that they're called with interrupts enabled. Calling fasttrap_pid_probe() with interrupts disabled can lead to deadlock if fasttrap writes to the process' address space. Reviewed by: rpaulo MFC after: 3 weeks	2014-03-19 01:27:56 +00:00
imp	ea27b8b541	In kernel config files, it is supposed to be 'options<space><tab>' not 'options<tab><tab>', per long standing (but recently not so strictly enforced) convention.	2014-03-18 14:41:18 +00:00
neel	3818d66305	When a vcpu is deactivated it must also unblock any rendezvous that may be blocked on it. This is done by issuing a wakeup after clearing the 'vcpuid' from 'active_cpus'. Also, use CPU_CLR_ATOMIC() to guarantee visibility of the updated 'active_cpus' across all host cpus.	2014-03-18 02:49:28 +00:00
neel	9e498dc116	Notify vcpus participating in the rendezvous of the pending event to ensure that they execute the rendezvous function as soon as possible.	2014-03-17 23:30:38 +00:00
imp	47e104941c	Align all comments in config files on same column. This consistency helps when bits and pieces of GENERIC from i386 or amd64 are cut and pasted into other architecture's config files (which in the case of ARM had gotten rather akimbo).	2014-03-16 15:22:52 +00:00
rwatson	33fdc14c0c	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
tychon	5460439295	Fix a race wherein the source of an interrupt vector is wrongly attributed if an ExtINT arrives during interrupt injection. Also, fix a spurious interrupt if the PIC tries to raise an interrupt before the outstanding one is accepted. Finally, improve the PIC interrupt latency when another interrupt is raised immediately after the outstanding one is accepted by creating a vmexit rather than waiting for one to occur by happenstance. Approved by: neel (co-mentor)	2014-03-15 23:09:34 +00:00
rwatson	e78b9db504	Revert a small portion of r263198 left over from local testing: don't enable PCB groups and RSS by default [yet].	2014-03-15 00:59:23 +00:00
rwatson	f411704afc	Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets. (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs. (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid()to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed. (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context). (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. 
If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead. Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement. No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers. Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge)	2014-03-15 00:57:50 +00:00
glebius	80e85e32a5	Remove AppleTalk support. AppleTalk was a network transport protocol for Apple Macintosh devices in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was a legacy protocol and primary networking protocol is TCP/IP. The last Mac OS X release to support AppleTalk happened in 2009. The same year routing equipment vendors (namely Cisco) end their support. Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 06:29:43 +00:00
glebius	d494babace	Remove IPX support. IPX was a network transport protocol in Novell's NetWare network operating system from late 80s and then 90s. The NetWare itself switched to TCP/IP as default transport in 1998. Later, in this century the Novell Open Enterprise Server became successor of Novell NetWare. The last release that claimed to still support IPX was OES 2 in 2007. Routing equipment vendors (e.g. Cisco) discontinued support for IPX in 2011. Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 02:58:48 +00:00
imp	bf13b5b908	Delete stray clause 3 (Advertising clause) and renumber while i'm here. Approved by: alc@	2014-03-11 23:41:35 +00:00
tychon	9affb68b8d	Don't try to return a vector to a caller that only cares if a vector is pending or not. Approved by: neel (co-mentor)	2014-03-11 22:12:12 +00:00
imp	c4c8568cd0	Remove clause 3 (the advertising clause), per the regent's letter.	2014-03-11 17:20:50 +00:00
tychon	25c8b61cfd	Replace the userspace atpic stub with a more functional vmm.ko model. New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ can be used to manipulate the pic, and optionally the ioapic, pin state. Reviewed by: jhb, neel Approved by: neel (co-mentor)	2014-03-11 16:56:00 +00:00
royger	446e208ee2	xen: add a hook to perform AP startup AP startup on PVH follows the PV method, so we need to add a hook in order to diverge from bare metal. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/machdep.c: - Add hook for start_all_aps on native (using native_start_all_aps defined in mp_machdep). amd64/amd64/mp_machdep.c: - Make some variables global because they will also be used by the Xen PVH AP startup code. - Use the start_all_aps hook to start APs. - Rename start_all_aps to native_start_all_aps. amd64/include/smp.h: - Add declaration for native_start_all_aps. x86/include/init.h: - Declare start_all_aps hook in init_ops. x86/xen/pv.c: - Pick external declarations from mp_machdep. - Introduce Xen PV code to start APs on PVH. - Set start_all_aps init hook to use the Xen PVH implementation.	2014-03-11 10:27:57 +00:00
royger	419270d8a7	xen: add hook for AP bootstrap memory reservation This hook will only be implemented for bare metal, Xen doesn't require any bootstrap code since APs are started in long mode with paging enabled. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/machdep.c: - Set mp_bootaddress hook for bare metal. x86/include/init.h: - Define mp_bootaddress in init_ops.	2014-03-11 10:26:16 +00:00
royger	6b1be12234	xen: use the same hypercall mechanism for XEN and XENHVM Currently XEN (PV) and XENHVM (PVHVM) ports use different ways to issue hypercalls, unify this by filling the hypercall_page under HVM also. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/include/xen/hypercall.h: - Unify Xen hypercall code by always using the PV way. i386/i386/locore.s: - Define hypercall_page on i386 XENHVM. x86/xen/hvm.c: - Fill hypercall_page on XENHVM kernels using the HVM method (only when running as an HVM guest).	2014-03-11 10:24:13 +00:00
royger	891131cb52	xen: implement hook to fetch and parse e820 memory map e820 memory map is fetched using a hypercall under Xen PVH, so add a hook to init_ops in order to diverge from bare metal and implement a Xen variant. Approved by: gibbs Sponsored by: Citrix Systems R&D x86/include/init.h: - Add a parse_memmap hook to init_ops, that will be called to fetch and parse the memory map. amd64/amd64/machdep.c: - Decouple the fetch and the parse of the memmap, so the parse function can be shared with Xen code. - Move code around in order to implement the parse_memmap hook. amd64/include/pc/bios.h: - Declare bios_add_smap_entries (implemented in machdep.c). x86/xen/pv.c: - Implement fetching of e820 memmap when running as a PVH guest by using the XENMEM_memory_map hypercall.	2014-03-11 10:23:03 +00:00
royger	467e743960	xen: implement an early timer for Xen PVH When running as a PVH guest, there's no emulated i8254, so we need to use the Xen PV timer as the early source for DELAY. This change allows for different implementations of the early DELAY function and implements a Xen variant for it. Approved by: gibbs Sponsored by: Citrix Systems R&D dev/xen/timer/timer.c: dev/xen/timer/timer.h: - Implement Xen early delay functions using the PV timer and declare them. x86/include/init.h: - Add hooks for early clock source initialization and early delay functions. i386/i386/machdep.c: pc98/pc98/machdep.c: amd64/amd64/machdep.c: - Set early delay hooks to use the i8254 on bare metal. - Use clock_init (that will in turn make use of init_ops) to initialize the early clock source. amd64/include/clock.h: i386/include/clock.h: - Declare i8254_delay and clock_init. i386/xen/clock.c: - Rename DELAY to i8254_delay. x86/isa/clock.c: - Introduce clock_init that will take care of initializing the early clock by making use of the init_ops hooks. - Move non ISA related delay functions to the newly introduced delay file. x86/x86/delay.c: - Add moved delay related functions. - Implement generic DELAY function that will use the init_ops hooks. x86/xen/pv.c: - Set PVH hooks for the early delay related functions in init_ops. conf/files.amd64: conf/files.i386: conf/files.pc98: - Add delay.c to the kernel build.	2014-03-11 10:20:42 +00:00
royger	4df602a6bf	amd64: introduce hook for custom preload metadata parsers Add hooks to amd64 in order to have diverging implementations, since on Xen PV the metadata is passed to the kernel in a different form. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/machdep.c: - Define init_ops for native. - Put native code inside of native_parse_preload_data hook. - Call the parse_preload_data in order to fill the metadata info. x86/include/init.h: - Declare the init_ops struct. x86/xen/pv.c: - Declare xen_init_ops that contains the Xen PV implementation of init_ops. - Implement the parse_preload_data for Xen PVH, the info is fetched from HYPERVISOR_start_info->cmd_line as provided by Xen.	2014-03-11 10:15:25 +00:00
royger	5dd05db7ff	xen: add PV/PVH kernel entry point Add the PV/PVH entry point and the low level functions for PVH early initialization. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/genassym.c: - Add __FreeBSD_version define to assym.s so it can be used for the Xen notes. amd64/amd64/locore.S: - Make bootstack global so it can be used from Xen kernel entry point. amd64/amd64/xen-locore.S: - Add Xen notes to the kernel. - Add the Xen PV entry point, that is going to call hammer_time_xen. amd64/include/asmacros.h: - Add ELFNOTE macros. i386/xen/xen_machdep.c: - Define HYPERVISOR_start_info for the XEN i386 PV port, which is going to be used in some shared code between PV and PVH. x86/xen/hvm.c: - Define HYPERVISOR_start_info for the PVH port. x86/xen/pv.c: - Introduce hammer_time_xen which is going to perform early setup for Xen PVH: - Setup shared Xen variables start_info, shared_info and xen_store. - Set guest type. - Create initial page tables as FreeBSD expects to find them. - Call into native init function (hammer_time). xen/xen-os.h: - Declare HYPERVISOR_start_info. conf/files.amd64: - Add amd64/amd64/locore.S and x86/xen/pv.c to the list of files.	2014-03-11 10:07:01 +00:00
royger	27026f4f2a	amd64/i386: switch IPI handlers to C code. Move asm IPIs handlers to C code, so both Xen and native IPI handlers share the same code. Reviewed by: jhb Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/apic_vector.S: i386/i386/apic_vector.s: - Remove asm coded IPI handlers and instead call the newly introduced C variants. amd64/amd64/mp_machdep.c: i386/i386/mp_machdep.c: - Add C coded clones to the asm IPI handlers (moved from x86/xen/hvm.c). i386/include/smp.h: amd64/include/smp.h: - Add prototypes for the C IPI handlers. x86/xen/hvm.c: - Move the C IPI handlers to mp_machdep and call those in the Xen IPI handlers. i386/xen/mp_machdep.c: - Add dummy IPI handlers to the i386 Xen PV port (this port doesn't support SMP).	2014-03-11 10:03:29 +00:00
emaste	2ab45c505b	Disable amd64 TLB Context ID (pcid) by default for now There are a number of reports of userspace application crashes that are "solved" by setting vm.pmap.pcid_enabled=0, including Java and the x11/mate-terminal port (PR ports/184362). I originally planned to disable this only in stable/10 (in r262753), but it has been pointed out that additional crash reports on HEAD are not likely to provide new insight into the problem. The feature can easily be enabled for testing.	2014-03-05 01:34:10 +00:00
jkim	9b4d3b43ca	Move fpusave() wrapper for suspend handler to sys/amd64/amd64/fpu.c. Inspired by: jhb	2014-03-04 21:35:57 +00:00
jkim	e6f1aee9e4	Revert accidentally committed changes in 262748.	2014-03-04 20:16:00 +00:00
jkim	74dcbf5843	Properly save and restore CR0. MFC after: 3 days	2014-03-04 20:07:36 +00:00
jkim	373cea9476	Remove dead code since r230426, fix a comment, and tidy up. Reported by: jhb MFC after: 3 days	2014-03-04 19:41:16 +00:00
neel	4e6374765e	Fix a race between VMRUN() and vcpu_notify_event() due to 'vcpu->hostcpu' being updated outside of the vcpu_lock(). The race is benign and could potentially result in a missed notification about a pending interrupt to a vcpu. The interrupt would not be lost but rather delayed until the next VM exit. The vcpu's hostcpu is now updated concurrently with the vcpu state change. When the vcpu transitions to the RUNNING state the hostcpu is set to 'curcpu'. It is set to 'NOCPU' in all other cases. Reviewed by: grehan	2014-03-01 03:17:58 +00:00
jhb	4905c0f870	Correct VMware capitalization. Submitted by: joeld	2014-02-28 21:33:40 +00:00
jhb	4ce54b93eb	Workaround an apparent bug in VMWare Fusion's nested VT support where it triggers a VM exit with the exit reason of an external interrupt but without a valid interrupt set in the exit interrupt information. Tested by: Michael Dexter Reviewed by: neel MFC after: 1 week	2014-02-28 19:07:55 +00:00
neel	e01c440dae	Queue pending exceptions in the 'struct vcpu' instead of directly updating the processor-specific VMCS or VMCB. The pending exception will be delivered right before entering the guest. The order of event injection into the guest is: - hardware exception - NMI - maskable interrupt In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt window-exiting and inject it as soon as possible after the hardware exception is injected. Also since interrupts are inherently asynchronous, injecting them after the hardware exception should not affect correctness from the guest perspective. Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict it to only deliver x86 hardware exceptions. This new ioctl is now used to inject a protection fault when the guest accesses an unimplemented MSR. Discussed with: grehan, jhb Reviewed by: jhb	2014-02-26 00:52:05 +00:00
grehan	e0e0829e5e	MFC @ r259635 This brings in the "-w" option from bhyve to ignore unknown MSRs. It will make debugging Linux guests a bit easier. Suggested by: Willem Jan Withagen (wjw at digiware nl)	2014-02-25 06:29:56 +00:00
alc	ebe945ff9f	When the kernel is running in a virtual machine, it cannot rely upon the processor family to determine if the workaround for AMD Family 10h Erratum 383 should be enabled. To enable virtual machine migration among a heterogeneous collection of physical machines, the hypervisor may have been configured to report an older processor family with a reduced feature set. Effectively, the reported processor family and its features are like a "least common denominator" for the collection of machines. Therefore, when the kernel is running in a virtual machine, instead of relying upon the processor family, we now test for features that prove that the underlying processor is not affected by the erratum. (The features that we test for are unlikely to ever be emulated in software on an affected physical processor.) PR: 186061 Tested by: Simon Matter Discussed with: jhb, neel MFC after: 2 weeks	2014-02-22 18:53:42 +00:00
neel	3e0732cf3e	Add support for x2APIC virtualization assist in Intel VT-x. The vlapic.ops handler 'enable_x2apic_mode' is called when the vlapic mode is switched to x2APIC. The VT-x implementation of this handler turns off the APIC-access virtualization and enables the x2APIC virtualization in the VMCS. The x2APIC virtualization is done by allowing guest read access to a subset of MSRs in the x2APIC range. In non-root operation the processor will satisfy an 'rdmsr' access to these MSRs by reading from the virtual APIC page instead. The guest is also given write access to TPR, EOI and SELF_IPI MSRs which get special treatment in non-root operation. This is documented in the Intel SDM section titled "Virtualizing MSR-Based APIC Accesses". Enforce that APIC-write and APIC-access VM-exits are handled only if APIC-access virtualization is enabled. The one exception to this is SELF_IPI virtualization which may result in an APIC-write VM-exit.	2014-02-21 06:03:54 +00:00
neel	4626d164b8	Simplify APIC mode switching from MMIO to x2APIC. In part this is done to simplify the implementation of the x2APIC virtualization assist in VT-x. Prior to this change the vlapic allowed the guest to change its mode from xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked when the virtual machine is created. This is not very constraining because operating systems already have to deal with BIOS setting up the APIC in x2APIC mode at boot. Fix a bug in the CPUID emulation where the x2APIC capability was leaking from the host to the guest. Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore MSR accesses to the vlapic when it is in xAPIC mode. The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8) can be used to change the mode to x2APIC instead. Discussed with: grehan@	2014-02-20 01:48:25 +00:00
jhb	521737384d	A first pass at adding support for injecting hardware exceptions for emulated instructions. - Add helper routines to inject interrupt information for a hardware exception from the VM exit callback routines. - Use the new routines to inject GP and UD exceptions for invalid operations when emulating the xsetbv instruction. - Don't directly manipulate the entry interrupt info when a user event is injected. Instead, store the event info in the vmx state and only apply it during a VM entry if a hardware exception or NMI is not already pending. - While here, use HANDLED/UNHANDLED instead of 1/0 in a couple of routines. Reviewed by: neel	2014-02-18 03:07:36 +00:00
neel	f9781635be	Handle writes to the SELF_IPI MSR by the guest when the vlapic is configured in x2apic mode. Reads to this MSR are currently ignored but should cause a general protection exception to be injected into the vcpu. All accesses to the corresponding offset in xAPIC mode are ignored. Also, do not panic the host if there is a mismatch between the trigger mode programmed in the TMR and the actual interrupt being delivered. Instead the anomaly is logged to aid debugging and to prevent a misbehaving guest from panicking the host.	2014-02-17 23:07:16 +00:00

7112 Commits