Commit Graph

1793 Commits

Author SHA1 Message Date
Mark Johnston
bdb9ab0dd9 Factor out duplicated code from dumpsys() on each architecture into generic
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.

PR:		193873
Differential Revision:	https://reviews.freebsd.org/D904
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by:	jhibbits (earlier version)
Sponsored by:	EMC / Isilon Storage Division
2015-01-07 01:01:39 +00:00
Konstantin Belousov
a40e51e355 For /dev/mem and /dev/kmem accesses, avoid asserting that addresses
are within direct map.  We want to return error instead of panicing.

PR:	194995
Sponsored by:	The FreeBSD Foundation
2015-01-03 01:28:58 +00:00
Alan Cox
d866a563d4 The physical memory allocator supports the use of distinct free lists for
managing pages from different address ranges.  Generally speaking, this
feature is used to increase the likelihood that physical pages are
available that can meet special DMA requirements or can be accessed through
a limited-coverage direct mapping (e.g., MIPS).  However, prior to this
change, the configuration of the free lists was static, i.e., it was
determined at compile time.  Consequentally, free lists could be created
for address ranges that held no actual pages, for example, on 32-bit MIPS-
based systems with 512 MB or less of physical memory.  This change makes
the creation of the free lists dynamic, i.e., it is based on the available
physical memory at boot time.

On 64-bit x86-based systems with 64 GB or more of physical memory, create
free lists for managing pages with physical addresses below 4 GB.  This
change is to address reported problems with initializing devices that
require the allocation of physical pages below 4 GB on some systems with
128 GB or more of physical memory.

PR:		185727
Differential Revision:	https://reviews.freebsd.org/D1274
Reviewed by:	jhb, kib
MFC after:	3 weeks
Sponsored by:	EMC / Isilon Storage Division
2014-12-31 00:54:38 +00:00
Neel Natu
0dafa5cd4b Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.

Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.

The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>

Reviewed by:	tychon
Discussed with:	grehan
Differential Revision:	https://reviews.freebsd.org/D1385
MFC after:	2 weeks
2014-12-30 22:19:34 +00:00
Neel Natu
b053814333 Allow ktr(4) tracing of all guest exceptions via the tunable
"hw.vmm.trace_guest_exceptions".  To enable this feature set the tunable
to "1" before loading vmm.ko.

Tracing the guest exceptions can be useful when debugging guest triple faults.

Note that there is a performance impact when exception tracing is enabled
since every exception will now trigger a VM-exit.

Also, handle machine check exceptions that happen during guest execution
by vectoring to the host's machine check handler via "int $18".

Discussed with:	grehan
MFC after:	2 weeks
2014-12-23 02:14:49 +00:00
Ed Maste
294246bb7d Revert r274772: it is not valid on MIPS
Reported by:	sbruno
2014-11-25 03:50:31 +00:00
Ed Maste
688fd61ae8 Use canonical __PIC__ flag
It is automatically set when -fPIC is passed to the compiler.

Reviewed by:	dim, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D1179
2014-11-21 02:05:48 +00:00
Scott Long
18e3d9f521 Extend earlier addition of stack frames to most of support.S. This makes
stack traces in KDB, HWPMC, and DTrace much more reliable and useful.

Reviewed by:	kan, kib
Obtained from:	Netflix, Inc.
MFC after:	5 days
2014-11-13 22:11:44 +00:00
Ed Maste
96699e86a3 Add workaround for vt efifb's early use of PHYS_TO_DMAP
In vt_efifb_init the framebuffer's physaddr is passed to PHYS_TO_DMAP
before the DMAP is setup. The result is not actually accessed until
after the mapping is setup, though. Loosen the assertion in PHYS_TO_DMAP
for now, to allow use when dmaplimit == 0.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D1142
2014-11-11 14:59:46 +00:00
John Baldwin
01e1933dcc Rework virtual machine hypervisor detection.
- Move the existing code to x86/x86/identcpu.c since it is x86-specific.
- If the CPUID2_HV flag is set, assume a hypervisor is present and query
  the 0x40000000 leaf to determine the hypervisor vendor ID.  Export the
  vendor ID and the highest supported hypervisor CPUID leaf via
  hv_vendor[] and hv_high variables, respectively.  The hv_vendor[]
  array is also exported via the hw.hv_vendor sysctl.
- Merge the VMWare detection code from tsc.c into the new probe in
  identcpu.c.  Add a VM_GUEST_VMWARE to identify vmware and use that in
  the TSC code to identify VMWare.

Differential Revision:	https://reviews.freebsd.org/D1010
Reviewed by:	delphij, jkim, neel
2014-10-28 19:17:44 +00:00
Neel Natu
160ef77abf Move the ACPI PM timer emulation into vmm.ko.
This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).

Discussed with:	grehan
2014-10-26 04:44:28 +00:00
Roger Pau Monné
927dc0e02a amd64: make uiomove_fromphys functional for pages not mapped by the DMAP
Place the code introduced in r268660 into a separate function that can be
called from uiomove_fromphys. Instead of pre-allocating two KVA pages use
vmem_alloc to allocate them on demand when needed. This prevents blocking if
a page fault is taken while physical addresses from outside the DMAP are
used, since the lock is now removed.

Also introduce a safety catch in PHYS_TO_DMAP and DMAP_TO_PHYS.

Sponsored by: Citrix Systems R&D
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D947

amd64/amd64/pmap.c:
 - Factor out the code to deal with non DMAP addresses from pmap_copy_pages
   and place it in pmap_map_io_transient.
 - Change the code to use vmem_alloc instead of a set of pre-allocated
   pages.
 - Use pmap_qenter and don't pin the thread if there can be page faults.

amd64/amd64/uio_machdep.c:
 - Use pmap_map_io_transient in order to correctly deal with physical
   addresses not covered by the DMAP.

amd64/include/pmap.h:
 - Add the prototypes for the new functions.

amd64/include/vmparam.h:
 - Add safety catches to make sure PHYS_TO_DMAP and DMAP_TO_PHYS are only
   used with addresses covered by the DMAP.
2014-10-24 09:48:58 +00:00
Roger Pau Monné
bf7313e3b7 xen: implement the privcmd user-space device
This device is only attached to priviledged domains, and allows the
toolstack to interact with Xen. The two functions of the privcmd
interface is to allow the execution of hypercalls from user-space, and
the mapping of foreign domain memory.

Sponsored by: Citrix Systems R&D

i386/include/xen/hypercall.h:
amd64/include/xen/hypercall.h:
 - Introduce a function to make generic hypercalls into Xen.

xen/interface/xen.h:
xen/interface/memory.h:
 - Import the new hypercall XENMEM_add_to_physmap_range used by
   auto-translated guests to map memory from foreign domains.

dev/xen/privcmd/privcmd.c:
 - This device has the following functions:
   - Allow user-space applications to make hypercalls into Xen.
   - Allow user-space applications to map memory from foreign domains,
     this is accomplished using the newly introduced hypercall
     (XENMEM_add_to_physmap_range).

xen/privcmd.h:
 - Public ioctl interface for the privcmd device.

x86/xen/hvm.c:
 - Remove declaration of hypercall_page, now it's declared in
   hypercall.h.

conf/files:
 - Add the privcmd device to the build process.
2014-10-22 17:07:20 +00:00
Neel Natu
ed6aacb51f IFC @r272887 2014-10-10 23:52:56 +00:00
Mark Johnston
5eaae1411f Pass up the error status of minidumpsys() to its callers.
PR:		193761
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by:	EMC / Isilon Storage Division
2014-10-08 20:25:21 +00:00
Konstantin Belousov
07a92f34d6 Add an argument to the x86 pmap_invalidate_cache_range() to request
forced invalidation of the cache range regardless of the presence of
self-snoop feature.  Some recent Intel GPUs in some modes are not
coherent, and dirty lines in CPU cache must be flushed before the
pages are transferred to GPU domain.

Reviewed by:	alc (previous version)
Tested by:	pho (amd64)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-10-08 16:48:03 +00:00
Neel Natu
65145c7f50 Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.
The hypervisor hides the MONITOR/MWAIT capability by unconditionally setting
CPUID.01H:ECX[3] to 0 so the guest should not expect these instructions to
be present anyways.

Discussed with:	grehan
2014-10-06 20:48:01 +00:00
Neel Natu
8f02c5e456 IFC r271888.
Restructure MSR emulation so it is all done in processor-specific code.
2014-09-20 21:46:31 +00:00
Neel Natu
c3498942a5 Restructure the MSR handling so it is entirely handled by processor-specific
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.

The VT-x code has the following types of MSRs:

- MSRs that are unconditionally saved/restored on every guest/host context
  switch (e.g., MSR_GSBASE).

- MSRs that are restored to guest values on entry to vmx_run() and saved
  before returning. This is an optimization for MSRs that are not used in
  host kernel context (e.g., MSR_KGSBASE).

- MSRs that are emulated and every access by the guest causes a trap into
  the hypervisor (e.g., MSR_IA32_MISC_ENABLE).

Reviewed by:	grehan
2014-09-20 02:35:21 +00:00
Neel Natu
4e27d36d38 IFC @r271694 2014-09-17 18:46:51 +00:00
Neel Natu
bbadcde418 Set the 'vmexit->inst_length' field properly depending on the type of the
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.

Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.

Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.

Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).

Reviewed by:	Anish Gupta (akgupt3@gmail.com)
Reviewed by:	grehan
2014-09-14 04:39:04 +00:00
Neel Natu
c2a875f970 AMD processors that have the SVM decode assist capability will store the
instruction bytes in the VMCB on a nested page fault. This is useful because
it saves having to walk the guest page tables to fetch the instruction.

vie_init() now takes two additional parameters 'inst_bytes' and 'inst_len'
that map directly to 'vie->inst[]' and 'vie->num_valid'.

The instruction emulation handler skips calling 'vmm_fetch_instruction()'
if 'vie->num_valid' is non-zero.

The use of this capability can be turned off by setting the sysctl/tunable
'hw.vmm.svm.disable_npf_assist' to '1'.

Reviewed by:	Anish Gupta (akgupt3@gmail.com)
Discussed with:	grehan
2014-09-13 22:16:40 +00:00
Neel Natu
d181963296 Optimize the common case of injecting an interrupt into a vcpu after a HLT
by explicitly moving it out of the interrupt shadow. The hypervisor is done
"executing" the HLT and by definition this moves the vcpu out of the
1-instruction interrupt shadow.

Prior to this change the interrupt would be held pending because the VMCS
guest-interruptibility-state would indicate that "blocking by STI" was in
effect. This resulted in an unnecessary round trip into the guest before
the pending interrupt could be injected.

Reviewed by:	grehan
2014-09-12 06:15:20 +00:00
John Baldwin
b1d735ba4c Create a separate structure for per-CPU state saved across suspend and
resume that is a superset of a pcb.  Move the FPU state out of the pcb and
into this new structure.  As part of this, move the FPU resume code on
amd64 into a C function.  This allows resumectx() to still operate only on
a pcb and more closely mirrors the i386 code.

Reviewed by:	kib (earlier version)
2014-09-06 15:23:28 +00:00
John Baldwin
7fb40488d6 - Move prototypes for various functions into out of C files and into
<machine/md_var.h>.
- Move some CPU-related variables out of i386/i386/identcpu.c to
  initcpu.c to match amd64.
- Move the declaration of has_f00f_hack out of identcpu.c to machdep.c.
- Remove a misleading comment from i386/i386/initcpu.c (locore zeros
  the BSS before it calls identify_cpu()) and remove explicit zero
  assignments to reduce the diff with amd64.
2014-09-04 01:46:06 +00:00
John Baldwin
89871cdeb6 - Add a new structure type for the ACPI 3.0 SMAP entry that includes the
optional attributes field.
- Add a 'machdep.smap' sysctl that exports the SMAP table of the running
  system as an array of the ACPI 3.0 structure.  (On older systems, the
  attributes are given a value of zero.)  Note that the sysctl only
  exports the SMAP table if it is available via the metadata passed from
  the loader to the kernel.  If an SMAP is not available, an empty array
  is returned.
- Add a format handler for the ACPI 3.0 SMAP structure to the sysctl(8)
  binary to format the SMAP structures in a readable format similar to
  the format found in boot messages.

MFC after:	2 weeks
2014-08-29 21:25:47 +00:00
Peter Grehan
7f21538b6e Change __inline style to be consistent with FreeBSD usage,
and also fix gcc build (on STABLE, when MFCd).

PR:	192880
Reviewed by:	neel
Reported by:	ngie
MFC after:	1 day
2014-08-24 02:07:34 +00:00
John Baldwin
64d6de263b Bump MAXCPU on amd64 from 64 to 256. In practice APIC only permits 255
CPUs (IDs 0 through 254).  Getting above that limit requires x2APIC.

MFC after:	1 month
2014-08-20 16:06:24 +00:00
Konstantin Belousov
3165194c6b Increase max number of physical segments on amd64 to 63.
Eventually, the vmd_segs of the struct vm_domain should become bitset
instead of long, to allow arbitrary compile-time selected maximum.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-20 08:07:08 +00:00
Gleb Smirnoff
c8d2ffd6a7 Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c
The MD allocators were very common, however there were some minor
differencies. These differencies were all consolidated in the MI allocator,
under ifdefs. The defines from machine/vmparam.h turn on features required
for a particular machine. For details look in the comment in sys/sf_buf.h.

As result no MD code left in sys/*/*/vm_machdep.c. Some arches still have
machine/sf_buf.h, which is usually quite small.

Tested by:	glebius (i386), tuexen (arm32), kevlo (arm32)
Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-05 09:44:10 +00:00
Neel Natu
f008d1571d If a vcpu has issued a HLT instruction with interrupts disabled then it sleeps
forever in vm_handle_hlt().

This is usually not an issue as long as one of the other vcpus properly resets
or powers off the virtual machine. However, if the bhyve(8) process is killed
with a signal the halted vcpu cannot be woken up because it's sleep cannot be
interrupted.

Fix this by waking up periodically and returning from vm_handle_hlt() if
TDF_ASTPENDING is set.

Reported by:	Leon Dang
Sponsored by:	Nahanni Systems
2014-07-26 02:53:51 +00:00
Neel Natu
d37f2adb38 Fix fault injection in bhyve.
The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.

A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.
2014-07-24 01:38:11 +00:00
Neel Natu
d665d229ce Emulate instructions emitted by OpenBSD/i386 version 5.5:
- CMP REG, r/m
- MOV AX/EAX/RAX, moffset
- MOV moffset, AX/EAX/RAX
- PUSH r/m
2014-07-23 04:28:51 +00:00
Neel Natu
091d453222 Handle nested exceptions in bhyve.
A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.

vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.

vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.
2014-07-19 20:59:08 +00:00
Neel Natu
3d5444c864 Add emulation for legacy x86 task switching mechanism.
FreeBSD/i386 uses task switching to handle double fault exceptions and this
change enables that to work.

Reported by:	glebius
2014-07-16 21:26:26 +00:00
Neel Natu
f7a9f1784f Add support for operand size and address size override prefixes in bhyve's
instruction emulation [1].

Fix bug in emulation of opcode 0x8A where the destination is a legacy high
byte register and the guest vcpu is in 32-bit mode. Prior to this change
instead of modifying %ah, %bh, %ch or %dh the emulation would end up
modifying %spl, %bpl, %sil or %dil instead.

Add support for moffsets by treating it as a 2, 4 or 8 byte immediate value
during instruction decoding.

Fix bug in verify_gla() where the linear address computed after decoding
the instruction was not being truncated to the effective address size [2].

Tested by:	Leon Dang [1]
Reported by:	Peter Grehan [2]
Sponsored by:	Nahanni Systems
2014-07-15 17:37:17 +00:00
Neel Natu
b301b9e28f Accurately identify the vcpu's operating mode as 64-bit, compatibility,
protected or real.
2014-07-08 21:48:57 +00:00
Konstantin Belousov
633034fe0e Add FPU_KERN_KTHR flag to fpu_kern_enter(9), which avoids saving FPU
context into memory for the kernel threads which called
fpu_kern_thread(9).  This allows the fpu_kern_enter() callers to not
check for is_fpu_kern_thread() to get the optimization.

Apply the flag to padlock(4) and aesni(4).  In aesni_cipher_process(),
do not leak FPU context state on error.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-23 07:37:54 +00:00
Roger Pau Monné
ef409ede7b amd64/i386: introduce APIC hooks for different APIC implementations.
This is needed for Xen PV(H) guests, since there's no hardware lapic
available on this kind of domains. This commit should not change
functionality.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Approved by: gibbs

amd64/include/cpu.h:
amd64/amd64/mp_machdep.c:
i386/include/cpu.h:
i386/i386/mp_machdep.c:
 - Remove lapic_ipi_vectored hook from cpu_ops, since it's now
   implemented in the lapic hooks.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
 - Use lapic_ipi_vectored directly, since it's now an inline function
   that will call the appropiate hook.

x86/x86/local_apic.c:
 - Prefix bare metal public lapic functions with native_ and mark them
   as static.
 - Define default implementation of apic_ops.

x86/include/apicvar.h:
 - Declare the apic_ops structure and create inline functions to
   access the hooks, so the change is transparent to existing users of
   the lapic_ functions.

x86/xen/hvm.c:
 - Switch to use the new apic_ops.
2014-06-16 08:43:03 +00:00
Tycho Nightingale
5ebc578ba6 Replace enum forward declarations with complete definitions.
Reviewed by:	neel
2014-06-10 18:46:00 +00:00
Neel Natu
404874659f Add helper functions to populate VM exit information for rendezvous and
astpending exits. This is to reduce code duplication between VT-x and
SVM implementations.
2014-06-10 16:45:58 +00:00
Neel Natu
5fcf252f41 Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained
by vmm.ko. This allows the virtual machine to be restarted without having
to destroy it first.

Reviewed by:	grehan
2014-06-07 21:36:52 +00:00
Neel Natu
95ebc360ef Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.

Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.

This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.
2014-05-31 23:37:34 +00:00
Neel Natu
65ffa035a7 Add segment protection and limits violation checks in vie_calculate_gla()
for 32-bit x86 guests.

Tested using ins/outs executed in a FreeBSD/i386 guest.
2014-05-27 04:26:22 +00:00
Neel Natu
5382c19d81 Do the linear address calculation for the ins/outs emulation using a new
API function 'vie_calculate_gla()'.

While the current implementation is simplistic it forms the basis of doing
segmentation checks if the guest is in 32-bit protected mode.
2014-05-25 00:57:24 +00:00
Neel Natu
da11f4aa1d Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out
of the guest linear address space. These APIs in turn use a new ioctl
'VM_GLA2GPA' to convert the guest linear address to guest physical.

Use the new copyin/copyout APIs when emulating ins/outs instruction in
bhyve(8).
2014-05-24 23:12:30 +00:00
Neel Natu
e813a87350 Consolidate all the information needed by the guest page table walker into
'struct vm_guest_paging'.

Check for canonical addressing in vmm_gla2gpa() and inject a protection
fault into the guest if a violation is detected.

If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to
point to the root of the page tables.
2014-05-24 20:26:57 +00:00
Neel Natu
37a723a5b3 When injecting a page fault into the guest also update the guest's %cr2 to
indicate the faulting linear address.

If the guest PML4 entry has the PG_PS bit set then inject a page fault into
the guest with the PGEX_RSV bit set in the error_code.

Get rid of redundant checks for the PG_RW violations when walking the page
tables.
2014-05-24 19:13:25 +00:00
Neel Natu
a7424861fb Check for alignment check violation when processing in/out string instructions. 2014-05-23 19:59:14 +00:00
Neel Natu
d17b5104a9 Add emulation of the "outsb" instruction. NetBSD guests use this to write to
the UART FIFO.

The emulation is constrained in a number of ways: 64-bit only, doesn't check
for all exception conditions, limited to i/o ports emulated in userspace.

Some of these constraints will be relaxed in followup commits.

Requested by:	grehan
Reviewed by:	tychon (partially and a much earlier version)
2014-05-23 05:15:17 +00:00