This patch adds support for MSI interrupts when running on Xen. Apart
from adding the Xen related code needed in order to register MSI
interrupts this patch also makes the msi_init function a hook in
init_ops, so different MSI implementations can have different
initialization functions.
Sponsored by: Citrix Systems R&D
xen/interface/physdev.h:
- Add the MAP_PIRQ_TYPE_MULTI_MSI to map multi-vector MSI to the Xen
public interface.
x86/include/init.h:
- Add a hook for setting custom msi_init methods.
amd64/amd64/machdep.c:
i386/i386/machdep.c:
- Set the default msi_init hook to point to the native MSI
initialization method.
x86/xen/pv.c:
- Set the Xen MSI init hook when running as a Xen guest.
x86/x86/local_apic.c:
- Call the msi_init hook instead of directly calling msi_init.
xen/xen_intr.h:
x86/xen/xen_intr.c:
- Introduce support for registering/releasing MSI interrupts with
Xen.
- The MSI interrupts will use the same PIC as the IO APIC interrupts.
xen/xen_msi.h:
x86/xen/xen_msi.c:
- Introduce a Xen MSI implementation.
x86/xen/xen_nexus.c:
- Overwrite the default MSI hooks in the Xen Nexus to use the Xen MSI
implementation.
x86/xen/xen_pci.c:
- Introduce a Xen specific PCI bus that inherits from the ACPI PCI
bus and overwrites the native MSI methods.
- This is needed because when running under Xen the MSI messages used
to configure MSI interrupts on PCI devices are written by Xen
itself.
dev/acpica/acpi_pci.c:
- Lower the quality of the ACPI PCI bus so the newly introduced Xen
PCI bus can take over when needed.
conf/files.i386:
conf/files.amd64:
- Add the newly created files to the build process.
As of git submit e179f6914152eca9, the Linux kernel does a simple
probe of the PIC by writing a pattern to the IMR and then reading it
back, prior to the init sequence of ICW words.
The bhyve PIC emulation wasn't allowing the IMR to be read until
the ICW sequence was complete. This limitation isn't required so
relax the test.
With this change, Linux kernels 3.15-rc2 and later won't hang
on boot when calibrating the local APIC.
Reviewed by: tychon
MFC after: 3 days
When the FreeBSD kernel is loaded from Xen the symtab and strtab are
not loaded the same way as the native boot loader. This patch adds
three new global variables to ddb that can be used to specify the
exact position and size of those tables, so they can be directly used
as parameters to db_add_symbol_table. A new helper is introduced, so callers
that used to set ksym_start and ksym_end can use this helper to set the new
variables.
It also adds support for loading them from the Xen PVH port, that was
previously missing those tables.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
ddb/db_main.c:
- Add three new global variables: ksymtab, kstrtab, ksymtab_size that
can be used to specify the position and size of the symtab and
strtab.
- Use those new variables in db_init in order to call db_add_symbol_table.
- Move the logic in db_init to db_fetch_symtab in order to set ksymtab,
kstrtab, ksymtab_size from ksym_start and ksym_end.
ddb/ddb.h:
- Add prototype for db_fetch_ksymtab.
- Declate the extern variables ksymtab, kstrtab and ksymtab_size.
x86/xen/pv.c:
- Add support for finding the symtab and strtab when booted as a Xen
PVH guest. Since Xen loads the symtab and strtab as NetBSD expects
to find them we have to adapt and use the same method.
amd64/amd64/machdep.c:
arm/arm/machdep.c:
i386/i386/machdep.c:
mips/mips/machdep.c:
pc98/pc98/machdep.c:
powerpc/aim/machdep.c:
powerpc/booke/machdep.c:
sparc64/sparc64/machdep.c:
- Use the newly introduced db_fetch_ksymtab in order to set ksymtab,
kstrtab and ksymtab_size.
For now restrict it to amd64. Other architectures might be
re-added later once tested.
Remove the drivers from the global NOTES and files files and move
them to the amd64 specifics.
Remove the drivers from the i386 modules build and only leave the
amd64 version.
Rather than depending on "inet" depend on "pci" and make sure that
ixl(4) and ixlv(4) can be compiled independently [2]. This also
allows the drivers to build properly on IPv4-only or IPv6-only
kernels.
PR: 193824 [2]
Reviewed by: eric.joyner intel.com
MFC after: 3 days
References:
[1] http://lists.freebsd.org/pipermail/svn-src-all/2014-August/090470.html
behavior was changed in r271888 so update the comment block to reflect this.
MSR_KGSBASE is accessible from the guest without triggering a VM-exit. The
permission bitmap for MSR_KGSBASE is modified by vmx_msr_guest_init() so get
rid of redundant code in vmx_vminit().
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.
The VT-x code has the following types of MSRs:
- MSRs that are unconditionally saved/restored on every guest/host context
switch (e.g., MSR_GSBASE).
- MSRs that are restored to guest values on entry to vmx_run() and saved
before returning. This is an optimization for MSRs that are not used in
host kernel context (e.g., MSR_KGSBASE).
- MSRs that are emulated and every access by the guest causes a trap into
the hypervisor (e.g., MSR_IA32_MISC_ENABLE).
Reviewed by: grehan
- Note the quirk with the interrupt enabled state of the dna handler.
- Use just panic() instead of printf() and panic(). Print tid instead
of pid, the fpu state is per-thread.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
for amd64/linux32. Fix the entirely bogus (untested) version from
r161310 for i386/linux using the same shared code in compat/linux.
It is unclear to me if we could support more clock mappings but
the current set allows me to successfully run commercial
32bit linux software under linuxolator on amd64.
Reviewed by: jhb
Differential Revision: D784
MFC after: 3 days
Sponsored by: DARPA, AFRL
by explicitly moving it out of the interrupt shadow. The hypervisor is done
"executing" the HLT and by definition this moves the vcpu out of the
1-instruction interrupt shadow.
Prior to this change the interrupt would be held pending because the VMCS
guest-interruptibility-state would indicate that "blocking by STI" was in
effect. This resulted in an unnecessary round trip into the guest before
the pending interrupt could be injected.
Reviewed by: grehan
AP startup and AP resume (it was already used for BSP startup and BSP
resume).
- Split code to do one-time probing of cache properties out of
initializecpu() and into initializecpucache(). This is called once on
the BSP during boot.
- Move enable_sse() into initializecpu().
- Call initializecpu() for AP startup instead of enable_sse() and
manually frobbing MSR_EFER to enable PG_NX.
- Call initializecpu() when an AP resumes. In theory this will now
properly re-enable PG_NX in MSR_EFER when resuming a PAE kernel on
APs.
resume that is a superset of a pcb. Move the FPU state out of the pcb and
into this new structure. As part of this, move the FPU resume code on
amd64 into a C function. This allows resumectx() to still operate only on
a pcb and more closely mirrors the i386 code.
Reviewed by: kib (earlier version)
The legacy USB circuit tends to give trouble on MacBook.
While the original report covered MacBook, extend the fix
preemptively for the newer MacBookPro too.
PR: 191693
Reviewed by: emaste
MFC after: 5 days
<machine/md_var.h>.
- Move some CPU-related variables out of i386/i386/identcpu.c to
initcpu.c to match amd64.
- Move the declaration of has_f00f_hack out of identcpu.c to machdep.c.
- Remove a misleading comment from i386/i386/initcpu.c (locore zeros
the BSS before it calls identify_cpu()) and remove explicit zero
assignments to reduce the diff with amd64.
proper constraint for 'x'. The "+r" constraint indicates that 'x' is an
input and output register operand.
While here generate code for different variants of getcc() using a macro
GETCC(sz) where 'sz' indicates the operand size.
Update the status bits in %rflags when emulating AND and OR opcodes.
Reviewed by: grehan
optional attributes field.
- Add a 'machdep.smap' sysctl that exports the SMAP table of the running
system as an array of the ACPI 3.0 structure. (On older systems, the
attributes are given a value of zero.) Note that the sysctl only
exports the SMAP table if it is available via the metadata passed from
the loader to the kernel. If an SMAP is not available, an empty array
is returned.
- Add a format handler for the ACPI 3.0 SMAP structure to the sysctl(8)
binary to format the SMAP structures in a readable format similar to
the format found in boot messages.
MFC after: 2 weeks
tunables to modify the default cpu topology advertised by bhyve.
Also add a tunable "hw.vmm.topology.cpuid_leaf_b" to disable the CPUID
leaf 0xb. This is intended for testing guest behavior when it falls back
on using CPUID leaf 0x4 to deduce CPU topology.
The default behavior is to advertise each vcpu as a core in a separate soket.
the vcpu had no caches at all. This causes problems when executing applications
in the guest compiled with the Intel compiler.
Submitted by: Mark Hill (mark.hill@tidalscale.com)
Eventually, the vmd_segs of the struct vm_domain should become bitset
instead of long, to allow arbitrary compile-time selected maximum.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
starting with a superpage demotion by pmap_enter() that could result in
a PV list lock being held when pmap_enter() is just about to return
KERN_RESOURCE_SHORTAGE. Consequently, the KASSERT that no PV list locks
are held needs to be replaced with a conditional unlock.
Discussed with: kib
X-MFC with: r269728
Sponsored by: EMC / Isilon Storage Division
mapping size (currently unused). The flags includes the fault access
bits, wired flag as PMAP_ENTER_WIRED, and a new flag
PMAP_ENTER_NOSLEEP to indicate that pmap should not sleep.
For powerpc aim both 32 and 64 bit, fix implementation to ensure that
the requested mapping is created when PMAP_ENTER_NOSLEEP is not
specified, in particular, wait for the available memory required to
proceed.
In collaboration with: alc
Tested by: nwhitehorn (ppc aim32 and booke)
Sponsored by: The FreeBSD Foundation and EMC / Isilon Storage Division
MFC after: 2 weeks
Add the ACPI MCFG table to advertise the extended config memory window.
Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted
or moved in the guest's address space. The PCI extended config space is an
example of an immutable memory range.
Add emulation for the "movzw" instruction. This instruction is used by FreeBSD
to read a 16-bit extended config space register.
CR: https://phabric.freebsd.org/D505
Reviewed by: jhb, grehan
Requested by: tychon
The MD allocators were very common, however there were some minor
differencies. These differencies were all consolidated in the MI allocator,
under ifdefs. The defines from machine/vmparam.h turn on features required
for a particular machine. For details look in the comment in sys/sf_buf.h.
As result no MD code left in sys/*/*/vm_machdep.c. Some arches still have
machine/sf_buf.h, which is usually quite small.
Tested by: glebius (i386), tuexen (arm32), kevlo (arm32)
Reviewed by: kib
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
We continue to use pmap_enter() for that. For unwiring virtual pages, we
now use pmap_unwire(), which unwires a range of virtual addresses instead
of a single virtual page.
Sponsored by: EMC / Isilon Storage Division
features. If bootverbose is enabled, a detailed list is provided;
otherwise, a single-line summary is displayed.
- Add read-only sysctls for optional VT-x capabilities used by bhyve
under a new hw.vmm.vmx.cap node. Move a few exiting sysctls that
indicate the presence of optional capabilities under this node.
CR: https://phabric.freebsd.org/D498
Reviewed by: grehan, neel
MFC after: 1 week
forever in vm_handle_hlt().
This is usually not an issue as long as one of the other vcpus properly resets
or powers off the virtual machine. However, if the bhyve(8) process is killed
with a signal the halted vcpu cannot be woken up because it's sleep cannot be
interrupted.
Fix this by waking up periodically and returning from vm_handle_hlt() if
TDF_ASTPENDING is set.
Reported by: Leon Dang
Sponsored by: Nahanni Systems
It is not possible to PUSH a 32-bit operand on the stack in 64-bit mode. The
default operand size for PUSH is 64-bits and the operand size override prefix
changes that to 16-bits.
vm_copy_setup() can return '1' if it encounters a fault when walking the
guest page tables. This is a guest issue and is now handled properly by
resuming the guest to handle the fault.
involves updating the corresponding page tables followed by accesses to the
pages in question. This sequence is subject to the situation exactly described
in the "AMD64 Architecture Programmer's Manual Volume 2: System Programming"
rev. 3.23, "7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore,
issuing the INVLPG right after modifying the PTE bits is crucial (see also
r269050).
For the amd64 PMAP code, the order of instructions was already correct. The
above fact still is worth documenting, though.
1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
Reviewed by: alc
Sponsored by: Bally Wulff Games & Entertainment GmbH
The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.
A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.
Setting PSE together with PAE or in long mode just makes the PSE bit
completely ignored, so don't set it.
Sponsored by: Citrix Systems R&D
Reviewed by: kib