6631 Commits

Author SHA1 Message Date
neel
77ab4804ac Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

Add check to vm_gpa2hpa() that the range indicated by [gpa,gpa+len) is all
contained within a single 4KB page.
2012-10-03 01:18:51 +00:00
neel
3e50e0220b Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

Rewrite vm_gpa2hpa() to get the GPA to HPA mapping by querying the nested
page tables.
2012-10-03 00:46:30 +00:00
neel
bc87f08e98 Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

In this case vm_malloc() was using vm_gpa2hpa() to indirectly infer whether
or not the address range had already been allocated.

Replace this instead with an explicit API 'vm_gpa_available()' that returns
TRUE if a page is available for allocation in guest physical address space.
2012-09-29 01:15:45 +00:00
jhb
f643d4c50a - Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific
bits under #ifdef _KERNEL but leave definitions for various structures
  defined by standards ($PIR table, SMAP entries, etc.) available to
  userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
  and smbios(4) in <machine/pc/bios.h> and make them available to
  userland.

MFC after:	2 weeks
2012-09-28 11:59:32 +00:00
alc
55f6ff40ed Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.
2012-09-28 05:30:59 +00:00
neel
b65259b285 Intel VT-x provides the length of the instruction at the time of the nested
page table fault. Use this when fetching the instruction bytes from the guest
memory.

Also modify the lapic_mmio() API so that a decoded instruction is fed into it
instead of having it fetch the instruction bytes from the guest. This is
useful for hardware assists like SVM that provide the faulting instruction
as part of the vmexit.
2012-09-27 00:27:58 +00:00
neel
5dbc1ca26a Add an option "-a" to present the local apic in the XAPIC mode instead of the
default X2APIC mode to the guest.
2012-09-26 00:06:17 +00:00
neel
bc269b51af Add support for trapping MMIO writes to local apic registers and emulating them.
The default behavior is still to present the local apic to the guest in the
x2apic mode.
2012-09-25 22:31:35 +00:00
neel
ebdd69568d Add ioctls to control the X2APIC capability exposed by the virtual machine to
the guest.

At the moment this simply sets the state in the 'vcpu' instance but there is
no code that acts upon these settings.
2012-09-25 19:08:51 +00:00
neel
c34be7b811 Add an explicit exit code 'SPINUP_AP' to tell the controlling process that an
AP needs to be activated by spinning up an execution context for it.

The local apic emulation is now completely done in the hypervisor and it will
detect writes to the ICR_LO register that try to bring up the AP. In response
to such writes it will return to userspace with an exit code of SPINUP_AP.

Reviewed by: grehan
2012-09-25 02:33:25 +00:00
neel
34b672cc8a Stash the 'vm_exit' information in each 'struct vcpu'.
There is no functional change at this time but this paves the way for vm exit
handler functions to easily modify the exit reason going forward.
2012-09-24 19:32:24 +00:00
dim
1e04d43259 After r205013, amd64 and i386 CPU family and model IDs were printed out
in hexadecimal, but without any 0x prefix, which can be very misleading.

MFC after:	3 days
2012-09-21 10:31:19 +00:00
neel
c0caea8c2f Restructure the x2apic access code in preparation for supporting memory mapped
access to the local apic.

The vlapic code is now aware of the mode that the guest is using to access the
local apic.

Reviewed by: grehan@
2012-09-21 03:09:23 +00:00
attilio
be930066f1 MFC 2012-09-21 03:07:34 +00:00
jimharris
802d10fdbc Integrate nvme(4) and nvd(4) into the amd64 and i386 builds.
Sponsored by:	Intel
2012-09-17 19:26:33 +00:00
kib
7b37f0ff96 Rename the IVY_RNG option to RDRAND_RNG.
Based on submission by:	Arthur Mesh <arthurmesh@gmail.com>
MFC after:	2 weeks
2012-09-13 10:12:16 +00:00
alc
2bc702613a Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(),
pmap_unmapdev()'s own direct efforts to destroy the page table entries are
redundant, so eliminate them.

Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.
Setting PTE_W on MIPS is inconsistent with the implementation of this
function on other architectures.  Moreover, PTE_W should not be set, unless
the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}()
doesn't do.

MFC after:	10 days
2012-09-10 16:11:29 +00:00
attilio
8dece93b14 userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:27:11 +00:00
kib
dac91f5998 Add support for new Intel on-CPU Bull Mountain random number
generator, found on IvyBridge and supposedly later CPUs, accessible
with RDRAND instruction.

From the Intel whitepapers and articles about Bull Mountain, it seems
that we do not need to perform post-processing of RDRAND results, like
AES-encryption of the data with random IV and keys, which was done for
Padlock. Intel claims that sanitization is performed in hardware.

Make both Padlock and Bull Mountain random generators support code
covered by kernel config options, for the benefit of people who prefer
minimal kernels. Also add the tunables to disable hardware generator
even if detected.

Reviewed by:	markm, secteam (simon)
Tested by:	bapt, Michael Moll <kvedulv@kvedulv.de>
MFC after:	3 weeks
2012-09-05 13:18:51 +00:00
alc
b1adba8ac6 Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the
comment describing them.  Both the function names and the comment had grown
stale.  Quite some time has passed since these pmap implementations last
used the page's hold count to track the number of valid mapping within a
page table page.  Also, returning TRUE from pmap_unwire_ptp() rather than
_pmap_unwire_ptp() eliminates a few instructions from callers like
pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used
directly by a conditional statement.
2012-09-05 06:02:54 +00:00
delphij
afd0bbdf85 Add hpt27xx to GENERIC kernel for amd64 and i386 systems.
MFC after:	2 weeks
2012-09-04 21:02:57 +00:00
jhb
599115bdcb Fix duplicate entries for mwl(4):
- Move mwlfw from {amd64,i386}/conf/NOTES to sys/conf/NOTES (mwl(4) is
  already present in sys/conf/NOTES).
- Remove duplicate mwl(4) entries from {amd64,i386}/conf/NOTES.
- While here, add a description to the sfxge line in amd64/conf/NOTES.
2012-09-04 19:19:36 +00:00
jhb
dc45fbdfb7 Fix misspelled "Infiniband".
Submitted by:	gcooper
MFC after:	3 days
2012-08-28 11:34:09 +00:00
attilio
d3c5a80b69 MFC 2012-08-27 11:59:04 +00:00
grehan
6c5ad005be Add sysctls to display the total and free amount of hard-wired mem for VMs
# sysctl hw.vmm
   hw.vmm.mem_free: 2145386496
   hw.vmm.mem_total: 2145386496

Submitted by:	Takeshi HASEGAWA hasegaw at gmail com
2012-08-26 01:41:41 +00:00
gjb
3f013cdf9f Grammar fix: s/NIC's/NICs/
MFC after:	3 days
2012-08-26 01:21:02 +00:00
des
0c96728586 As discussed on -current, remove the hardcoded default maxswzone.
MFC after:	3 weeks
2012-08-14 17:01:21 +00:00
kib
92b79b92fb Add a hackish debugging facility to provide a bit of information about
reason for generated trap. The dump of basic signal information and 8
bytes of the faulting instruction are printed on the controlling
terminal of the process, if the machdep.uprintf_signal syscal is
enabled.

The print is the only practical way to debug traps from a.out
processes I am aware of. Because I have to reimplement it each time I
debug an issue with a.out support on amd64, commit the hack to main
tree.

MFC after:	1 week
2012-08-14 12:15:01 +00:00
kib
11621fbf7f Real hardware, as opposed to QEMU, does not allow to have a call gate
in long mode which transfers control to 32bit code segment. Unbreak
the lcall $7,$0 implementation on amd64 by putting the 64bit user code
segment' selector into call gate, and execute the 64bit trampoline
which converts the return frame into 32bit format and switches back to
32bit mode for executing int $0x80 trampoline.

Note that all jumps over the hoops are performed in the user mode.

MFC after:	1 week
2012-08-14 12:13:27 +00:00
jhb
7d55435a89 Remove the deassert INIT IPI from the IPI startup sequence for APs.
It is not listed in the boot sequence in the MP specification (1.4),
and it is explicitly ignored on modern CPUs.  It was only ever required
when bootstrapping systems with external APICs (that is, SMP machines
with 486s), which FreeBSD has never supported (and never will).

While here, tidy some comments and remove some banal ones.
2012-08-13 18:52:51 +00:00
jhb
6c62ea1c51 Add a 10 millisecond delay after sending the initial INIT IPI. This
matches the algorithm in the MP specification (1.4).  Previously we
were sending out the deassert INIT IPI immediately after the initial
INIT IPI was sent.
2012-08-13 16:33:22 +00:00
cperciva
fa7b327d11 Build modules along with the XENHVM kernels.
No objections from:	freebsd-xen mailing list
MFC after:	1 week
2012-08-13 07:36:57 +00:00
alc
54cb95d638 The assertion that I added in r238889 could legitimately fail when a
debugger creates a breakpoint.  Replace that assertion with a narrower
one that still achieves my objective.

Reported and tested by:	kib
2012-08-08 05:28:30 +00:00
kib
1187e5b624 Do not apply errata 721 workaround when under hypervisor, since
typical hypervisor does not implement access to the required MSR,
causing #GP on boot.

Reported and tested by:	olgeni
PR:	amd64/170388
MFC after:	3 days
2012-08-07 08:36:10 +00:00
pluknet
a35d69cfc4 Remove duplicate header inclusion of <sys/sysent.h>
Discussed with:	bz
2012-08-07 05:46:36 +00:00
alc
bc11d86648 Shave off a few more cycles from the average execution time of pmap_enter()
by simplifying the control flow and reducing the live range of "om".
2012-08-05 16:59:02 +00:00
neel
66c8120152 Include 'device uart' in the guest kernel. 2012-08-04 04:30:26 +00:00
neel
d40b98f60b Force certain bits in %cr4 to be hard-wired to '1' or '0' from a guest's
perspective. If we don't do this some guest OSes (e.g. Linux) will reset
the CR4_VMXE bit in %cr4 with disastrous consequences.

Reported by: grehan
2012-08-04 02:06:55 +00:00
attilio
c52a057b19 MFC 2012-08-03 15:58:05 +00:00
kib
36babd37ca Add lfence().
MFC after:	1 week
2012-08-01 17:24:53 +00:00
alc
9c4b62fad8 Revise pmap_enter()'s handling of mapping updates that change the
PTE's PG_M and PG_RW bits but not the physical page frame.  First,
only perform vm_page_dirty() on a managed vm_page when the PG_M bit is
being cleared.  If the updated PTE continues to have PG_M set, then
there is no requirement to perform vm_page_dirty().  Second, flush the
mapping from the TLB when PG_M alone is cleared, not just when PG_M
and PG_RW are cleared.  Otherwise, a stale TLB entry may stop PG_M
from being set again on the next store to the virtual page.  However,
since the vm_page's dirty field already shows the physical page as
being dirty, no actual harm comes from the PG_M bit not being set.
Nonetheless, it is potentially confusing to someone expecting to see
the PTE change after a store to the virtual page.
2012-08-01 16:04:13 +00:00
kib
c6e5735bc4 Change (unused) prototype for stmxcsr() to match reality.
Noted by:	jhb
MFC after:	1 week
2012-07-30 19:26:02 +00:00
alc
dcd4c15b16 Shave off a few more cycles from pmap_enter()'s critical section. In
particular, do a little less work with the PV list lock held.
2012-07-29 18:20:49 +00:00
neel
add4e182f6 Verify that VMX operation has been enabled by BIOS before executing the
VMXON instruction.

Reported by "s vas" on freebsd-virtualization@
2012-07-25 00:21:16 +00:00
kib
b9c519314f Forcibly shut up clang warning about NULL pointer dereference.
MFC after:	3 weeks
2012-07-23 19:16:31 +00:00
kib
bff2a9c29d Constently use 2-space sentence breaks.
Submitted by:	 bde
MFC after:	 1 week
2012-07-21 13:53:00 +00:00
kib
8175cd6cd3 Stop caching curpcb in the local variable.
Requested by:	    bde
MFC after:	    1 week
2012-07-21 13:47:37 +00:00
kib
125bff5612 The PT_I386_{GET,SET}XMMREGS and PT_{GET,SET}XSTATE operate on the
stopped threads. Implementation assumes that the thread's FPU context
is spilled into the PCB due to stop. This is mostly true, except when
FPU state for the thread is not initialized. Then the requests operate
on the garbage state which is currently left in the PCB, causing
confusion.

The situation is indeed observed after a signal delivery and before
#NM fault on execution of any FPU instruction in the signal handler,
since sendsig(9) drops FPU state for current thread, clearing
PCB_FPUINITDONE. When inspecting context state for the signal handler,
debugger sees the FPU state of the main program context instead of the
clear state supposed to be provided to handler.

Fix this by forcing clean FPU state in PCB user FPU save area by
performing getfpuregs(9) before accessing user FPU save area in
ptrace_machdep.c.

Note: this change will be merged to i386 kernel as well, where it is
much more important, since e.g. gdb on i386 uses PT_I386_GETXMMREGS to
inspect FPU context on CPUs that support SSE. Amd64 version of gdb
uses PT_GETFPREGS to inspect both 64 and 32 bit processes, which does
not exhibit the bug.

Reported by:	bde
MFC after:	1 week
2012-07-21 13:06:37 +00:00
kib
3584ef8115 Stop clearing x87 exceptions in the #MF handler on amd64. If user code
understands FPU hardware enough to catch SIGFPE and unmask exceptions
in control word, then it may as well properly handle return from
SIGFPE without causing an infinite loop of #MF exceptions due to
faulting instruction restart, when needed.

Clearing exceptions causes information loss for handlers which do
understand FPU hardware, and struct siginfo si_code member cannot be
considered adequate replacement for en_sw content due to translation.

Supposed reason for clearing the exceptions, which is IRQ13 handling
oddities, were never applicable to amd64.

Note: this change will be merged to i386 kernel as well, since we do
not support IRQ13 delivery of #MF notifications for some time.

Requested by:	bde
MFC after:	1 week
2012-07-21 13:05:34 +00:00
kib
ebf0cf4fd1 Introduce curpcb magic variable, similar to curthread, which is MD
amd64.  It is implemented as __pure2 inline with non-volatile asm read
from pcpu, which allows a compiler to cache its results.

Convert most PCPU_GET(pcb) and curthread->td_pcb accesses into curpcb.

Note that __curthread() uses magic value 0 as an offsetof(struct pcpu,
pc_curthread). It seems to be done this way due to machine/pcpu.h
needs to be processed before sys/pcpu.h, because machine/pcpu.h
contributes machine-depended fields to the struct pcpu definition. As
result, machine/pcpu.h cannot use struct pcpu yet.

The __curpcb() also uses a magic constant instead of offsetof(struct
pcpu, pc_curpcb) for the same reason. The constants are now defined as
symbols and CTASSERTs are added to ensure that future KBI changes do
not break the code.

Requested and reviewed by: bde
MFC after:    3 weeks
2012-07-19 19:09:12 +00:00