Commit Graph

6963 Commits

Author SHA1 Message Date
kib
d02ef38799 MFC r267767:
Add FPU_KERN_KTHR flag to fpu_kern_enter(9).
Apply the flag to padlock(4) and aesni(4).
In aesni_cipher_process(), do not leak FPU context state on error.
2014-06-30 09:48:44 +00:00
jhb
bdcad19a9a MFC 261781:
Don't waste a page of KVA for the boot-time memory test on x86.  For amd64,
reuse the first page of the crashdumpmap as CMAP1/CADDR1.  For i386,
remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the memory test.
2014-06-27 17:22:18 +00:00
neel
13424f9f51 MFC r266901
Allocate a zeroed LDT.
Failing to do this might result in the LDT appearing to run out of free
descriptors because of random junk in the descriptor's 'sd_type' field.
2014-06-17 21:49:03 +00:00
dchagin
4fc6560286 Revert MFC r266925 because it can lead to instant panic at fexecve():
To allow to run interpreter itself add a new ELF branding type.

Pointed out by:	kib, mjg
2014-06-17 05:21:48 +00:00
jhb
f6a797dc57 MFC 262139,262140,262236,262281,262532:
Various x2APIC fixes and enhancements:
- Use spinlocks for the vioapic.
- Handle the SELF_IPI MSR.
- Simplify the APIC mode switching between MMIO and x2APIC.  The guest is
  no longer allowed to switch modes at runtime.  Instead, the desired mode
  is set when the virtual machine is created.
- Disallow MMIO access in x2APIC mode and MSR access in xAPIC mode.
- Add support for x2APIC virtualization assist in Intel VT-x.
2014-06-13 19:10:40 +00:00
jhb
d33f5b1253 MFC 262615,262624:
Workaround an apparent bug in VMWare Fusion's nested VT support where it
triggers a VM exit with the exit reason of an external interrupt but
without a valid interrupt set in the exit interrupt information.
2014-06-12 21:36:17 +00:00
jhb
3e1f2ae835 MFC 261638,262144,262506,266765:
Add virtualized XSAVE support to bhyve which permits guests to use XSAVE and
XSAVE-enabled features like AVX.
- Store a per-cpu guest xcr0 register and handle xsetbv VM exits by emulating
  the instruction.
- Only expose XSAVE to guests if XSAVE is enabled in the host.  Only expose
  a subset of XSAVE features currently supported by the guest and for which
  the proper emulation of xsetbv is known.  Currently this includes X87, SSE,
  AVX, AVX-512, and Intel MPX.
- Add support for injecting hardware exceptions into the guest and use this
  to trigger exceptions in the guest for invalid xsetbv operations instead
  of potentially faulting in the host.
- Queue pending exceptions in the 'struct vcpu' instead of directly updating
  the processor-specific VMCS or VMCB. The pending exception will be delivered
  right before entering the guest.
- Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
  it to only deliver x86 hardware exceptions. This new ioctl is now used to
  inject a protection fault when the guest accesses an unimplemented MSR.
- Expose a subset of known-safe features from leaf 0 of the structured
  extended features to guests if they are supported on the host including
  RDFSBASE/RDGSBASE, BMI1/2, AVX2, AVX-512, HLE, ERMS, and RTM.  Aside
  from AVX-512, these features are all new instructions available for use
  in ring 3 with no additional hypervisor changes needed.
2014-06-12 19:58:12 +00:00
jhb
8119d8278f MFC 266263,266551,266552:
- Add definitions for more structured extended features as well as
  XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions).
- Don't permit users to request a subset of the AVX512 or MPX xsave masks.
2014-06-12 17:15:56 +00:00
jhb
fa121e2a05 MFC 261504:
Add support for FreeBSD/i386 guests under bhyve.
2014-06-12 15:20:59 +00:00
jhb
a9824f54c0 MFC 261503,264501:
Emulate the byte move and zero/sign extend instructions.
2014-06-12 13:48:52 +00:00
jhb
835cb387cc MFC 260239,261268,265058:
Expand the support for PCI INTx interrupts including providing interrupt
routing information for INTx interrupts to I/O APIC pins and enabling
INTx interrupts in the virtio and AHCI backends.
2014-06-12 13:13:15 +00:00
kib
e6adf4e57f MFC r266846:
When usermode loaded non-default segment selector into the %gs,
correctly prepare KGSBASE msr to restore the user descriptor base on
the last swapgs during return to usermode.
2014-06-05 00:40:48 +00:00
jhb
79dc832193 MFC 260972:
There is no need to initialize the IOMMU if no passthru devices have been
configured for bhyve to use.
2014-06-04 17:57:48 +00:00
dchagin
e871acf5e4 MFC r266925:
To allow to run the interpreter itself add a new ELF branding type.
Allow Linux ABI to run ELF interpreter.
2014-06-03 04:31:42 +00:00
jhb
a8560098ab MFC 260802,260836,260863,261001,261074,261617:
Various fixes for NMI and interrupt injection.
- If a VM-exit happens during an NMI injection then clear the "NMI Blocking"
  bit in the Guest Interruptibility-state VMCS field.
- If the guest exits due to a fault while it is executing IRET then restore
  the state of "Virtual NMI blocking" in the guest's interruptibility-state
  field before resuming the guest.
- Inject a pending NMI only if NMI_BLOCKING, MOVSS_BLOCKING, STI_BLOCKING
  are all clear. If any of these bits are set then enable "NMI window
  exiting" and inject the NMI in the VM-exit handler.
- Handle a VM-exit due to a NMI properly by vectoring to the host's NMI
  handler via a software interrupt.
- Set "Interrupt Window Exiting" in the case where there is a vector to be
  injected into the vcpu but the VM-entry interruption information field
  already has the valid bit set.
- For VM-exits due to an NMI, handle the NMI with interrupts disabled in
  addition to "blocking by NMI" already established by the VM-exit.
2014-05-23 19:39:58 +00:00
jhb
eb31f23e07 MFC 260237:
Fix a bug in the HPET emulation where a timer interrupt could be lost when the
guest disables the HPET.

The HPET timer interrupt is triggered from the callout handler associated with
the timer. It is possible for the callout handler to be delayed before it gets
a chance to execute. If the guest disables the HPET during this window then the
handler never gets a chance to execute and the timer interrupt is lost.

This is now fixed by injecting a timer interrupt into the guest if the callout
time is detected to be in the past when the HPET is disabled.
2014-05-20 21:05:36 +00:00
jhb
5e2f766c6c MFC 259737, 262646:
Fix a couple of issues with vcpu state:
- Add a parameter to 'vcpu_set_state()' to enforce that the vcpu is in the
  IDLE state before the requested state transition. This guarantees that
  there is exactly one ioctl() operating on a vcpu at any point in time and
  prevents unintended state transitions.
- Fix a race between VMRUN() and vcpu_notify_event() due to 'vcpu->hostcpu'
  being updated outside of the vcpu_lock().
2014-05-18 04:33:24 +00:00
jhb
bbf655f9b4 MFC 259641,259863,259924,259937,259961,259978,260380,260383,260410,260466,
260531,260532,260550,260619,261170,261453,261621,263280,263290,264516:
Add support for local APIC hardware-assist.
- Restructure vlapic access and register handling to support hardware-assist
  for the local APIC.
- Use the 'Virtual Interrupt Delivery' and 'Posted Interrupt Processing'
  feature of Intel VT-x if supported by hardware.
- Add an API to rendezvous all active vcpus in a virtual machine and use
  it to support level triggered interrupts with VT-x 'Virtual Interrupt
  Delivery'.
- Use a cheaper IPI handler than IPI_AST for nested page table shootdowns
  and avoid doing unnecessary nested TLB invalidations.

Reviewed by:	neel
2014-05-17 19:11:08 +00:00
ian
4c5f4bdec5 MFC 263301
In kernel config files, it is supposed to be 'options<space><tab>' not
  'options<tab><tab>', per long standing (but recently not so strictly
  enforced) convention.
2014-05-17 17:34:37 +00:00
ian
d7afc2b3b0 MFC 263246: Align all comments in config files on same column. 2014-05-17 16:57:17 +00:00
ian
3724a25461 MFC 263036, 263059: delete advertising clause in licenses, renumber. 2014-05-17 13:59:11 +00:00
ian
3fad78482e MFC 264304: Really only allow IMGACT_BINMISC for amd64/i386 builds. 2014-05-16 22:55:01 +00:00
ian
8e2dd9b5e3 MFC r257854 (discussed with alc@)
As of r257209, all architectures have defined
  VM_KMEM_SIZE_SCALE.  In other words, every architecture is now
  auto-sizing the kmem arena.  This revision changes kmeminit() so
  that the definition of VM_KMEM_SIZE_SCALE becomes mandatory and
  the definition of VM_KMEM_SIZE becomes optional.

  Replace or eliminate all existing definitions of VM_KMEM_SIZE.
  With auto-sizing enabled, VM_KMEM_SIZE effectively became an
  alternate spelling for VM_KMEM_SIZE_MIN on most architectures.
  Use VM_KMEM_SIZE_MIN for clarity.
2014-05-16 01:30:30 +00:00
scottl
96be897ce1 Merge r264984
Retire smp_active.  It was racey and caused demonstrated problems with
the cpufreq code.  Replace its use with smp_started.  There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.

Obtained from:	Netflix, Inc.
2014-05-07 20:28:27 +00:00
alc
8be11d4db2 MFC r262338
When the kernel is running in a virtual machine, it cannot rely upon the
  processor family to determine if the workaround for AMD Family 10h Erratum
  383 should be enabled.  To enable virtual machine migration among a
  heterogeneous collection of physical machines, the hypervisor may have
  been configured to report an older processor family with a reduced feature
  set.  Effectively, the reported processor family and its features are like
  a "least common denominator" for the collection of machines.

  Therefore, when the kernel is running in a virtual machine, instead of
  relying upon the processor family, we now test for features that prove
  that the underlying processor is not affected by the erratum.  (The
  features that we test for are unlikely to ever be emulated in software
  on an affected physical processor.)

PR:		186061
2014-05-07 00:32:49 +00:00
ken
a354f057f0 MFC the mpr(4) driver for LSI's 12Gb SAS cards.
This includes r265236, r265237, r265241 and r265261:

  ------------------------------------------------------------------------
  r265236 | ken | 2014-05-02 14:25:09 -0600 (Fri, 02 May 2014) | 51 lines

  Bring in the mpr(4) driver for LSI's MPT3 12Gb SAS controllers.

  This is derived from the mps(4) driver, but it supports only the 12Gb
  IT and IR hardware including the SAS 3004, SAS 3008 and SAS 3108.

  Some notes about this driver:
   o The 12Gb hardware can do "FastPath" I/O, and that capability is included in
     this driver.

   o WarpDrive functionality has been removed, since it isn't supported in
     the 12Gb driver interface.

   o The Scatter/Gather list handling code is significantly different between
     the 6Gb and 12Gb hardware.  The 12Gb boards support IEEE Scatter/Gather
     lists.

  Thanks to LSI for developing and testing this driver for FreeBSD.

  share/man/man4/mpr.4:
  	mpr(4) man page.

  sys/dev/mpr/*:
  	mpr(4) driver files.

  sys/modules/Makefile,
  sys/modules/mpr/Makefile:
  	Add a module Makefile for the mpr(4) driver.

  sys/conf/files:
  	Add the mpr(4) driver.

  sys/amd64/conf/GENERIC,
  sys/i386/conf/GENERIC,
  sys/mips/conf/OCTEON1,
  sys/sparc64/conf/GENERIC:
  	Add the mpr(4) driver to all config files that currently
  	have the mps(4) driver.

  sys/ia64/conf/GENERIC:
  	Add the mps(4) and mpr(4) drivers to the ia64 GENERIC
  	config file.

  sys/i386/conf/XEN:
  	Exclude the mpr module from building here.

  Submitted by:	Steve McConnell <Stephen.McConnell@lsi.com>
  Tested by:	Chris Reeves <chrisr@spectralogic.com>
  Sponsored by:	LSI, Spectra Logic
  Relnotes:	LSI 12Gb SAS driver mpr(4) added

  ------------------------------------------------------------------------
  ------------------------------------------------------------------------
  r265237 | ken | 2014-05-02 14:36:20 -0600 (Fri, 02 May 2014) | 8 lines

  Add the mpr(4) man page to the man4 Makefile.

  This should have been included in r265236.

  Submitted by:	Steve McConnell <Stephen.McConnell@lsi.com>
  MFC after:	3 days
  Sponsored by:	LSI, Spectra Logic

  ------------------------------------------------------------------------
  ------------------------------------------------------------------------
  r265241 | brueffer | 2014-05-02 15:14:28 -0600 (Fri, 02 May 2014) | 2 lines

  Use our standard SYNOPSIS wording; perform some cleanup while here.

  ------------------------------------------------------------------------
  ------------------------------------------------------------------------
  r265261 | brueffer | 2014-05-03 05:15:28 -0600 (Sat, 03 May 2014) | 2 lines

  Add a missing colon.

  ------------------------------------------------------------------------

Submitted by:	Steve McConnell <Stephen.McConnell@lsi.com>
Tested by:	Chris Reeves <chrisr@spectralogic.com>
Sponsored by:	LSI, Spectra Logic
Relnotes:	LSI 12Gb SAS driver mpr(4) added
2014-05-05 20:35:35 +00:00
kib
2dee5cfb76 MFC r265004:
Same as it was done in r263878 for invlrng_handler(), fix order of
checks for special pcid values in invlpg_pcid_handler().
2014-05-04 07:22:51 +00:00
jhb
2e8b45c43c MFC 258860,260167,260238,260397:
- Restructure the VMX code to enter and exit the guest. In large part this
  change hides the setjmp/longjmp semantics of VM enter/exit.
  vmx_enter_guest() is used to enter guest context and vmx_exit_guest() is
  used to transition back into host context.

  Fix a longstanding race where a vcpu interrupt notification might be
  ignored if it happens after vmx_inject_interrupts() but before host
  interrupts are disabled in vmx_resume/vmx_launch. We now call
  vmx_inject_interrupts() with host interrupts disabled to prevent this.
- The 'protection' field in the VM exit collateral for the PAGING exit is
  not used - get rid of it.

Reviewed by:	grehan
2014-04-17 18:00:07 +00:00
kib
660fbf80db MFC r263912:
Clear the kernel grab of the FPU state on fork.
2014-04-05 14:24:29 +00:00
kib
5e4b8bc0fd MFC r263878:
Several fixes for the PCID implementation.
2014-04-04 19:17:33 +00:00
royger
382b727d12 MFC r263001
Move asm IPIs handlers to C code, so both Xen and native IPI handlers
share the same code.

Approved by: gibbs
Sponsored by: Citrix Systems R&D
2014-04-04 14:54:54 +00:00
kib
ef58943ab3 MFC r263475:
Fix two issues with /dev/mem access on amd64, both causing kernel page
faults.

First, for accesses to direct map region should check for the limit by
which direct map is instantiated.

Second, for accesses to the kernel map, use a new thread private flag
TDP_DEVMEMIO, which instructs vm_fault() to return error when fault
happens on the MAP_ENTRY_NOFAULT entry, instead of panicing.

MFC r263498:
Add change forgotten in r263475.  Make dmaplimit accessible outside
amd64/pmap.c.
2014-03-28 15:38:38 +00:00
dim
9cedb8bb69 MFC 261991:
Upgrade our copy of llvm/clang to 3.4 release.  This version supports
all of the features in the current working draft of the upcoming C++
standard, provisionally named C++1y.

The code generator's performance is greatly increased, and the loop
auto-vectorizer is now enabled at -Os and -O2 in addition to -O3.  The
PowerPC backend has made several major improvements to code generation
quality and compile time, and the X86, SPARC, ARM32, Aarch64 and SystemZ
backends have all seen major feature work.

Release notes for llvm and clang can be found here:
<http://llvm.org/releases/3.4/docs/ReleaseNotes.html>
<http://llvm.org/releases/3.4/tools/clang/docs/ReleaseNotes.html>

MFC 262121 (by emaste):

Update lldb for clang/llvm 3.4 import

This commit largely restores the lldb source to the upstream r196259
snapshot with the addition of threaded inferior support and a few bug
fixes.

Specific upstream lldb revisions restored include:
   SVN      git
  181387  779e6ac
  181703  7bef4e2
  182099  b31044e
  182650  f2dcf35
  182683  0d91b80
  183862  15c1774
  183929  99447a6
  184177  0b2934b
  184948  4dc3761
  184954  007e7bc
  186990  eebd175

Sponsored by:	DARPA, AFRL

MFC 262186 (by emaste):

Fix mismerge in r262121

A break statement was lost in the merge.  The error had no functional
impact, but restore it to reduce the diff against upstream.

MFC 262303:

Pull in r197521 from upstream clang trunk (by rdivacky):

  Use the integrated assembler by default on FreeBSD/ppc and ppc64.

Requested by:	jhibbits

MFC 262611:

Pull in r196874 from upstream llvm trunk:

  Fix a crash that occurs when PWD is invalid.

  MCJIT needs to be able to run in hostile environments, even when PWD
  is invalid. There's no need to crash MCJIT in this case.

  The obvious fix is to simply leave MCContext's CompilationDir empty
  when PWD can't be determined. This way, MCJIT clients,
  and other clients that link with LLVM don't need a valid working directory.

  If we do want to guarantee valid CompilationDir, that should be done
  only for clients of getCompilationDir(). This is as simple as checking
  for an empty string.

  The only current use of getCompilationDir is EmitGenDwarfInfo, which
  won't conceivably run with an invalid working dir. However, in the
  purely hypothetically and untestable case that this happens, the
  AT_comp_dir will be omitted from the compilation_unit DIE.

This should help fix assertions occurring with ports-mgmt/tinderbox,
when it is using jails, and sometimes invalidates clang's current
working directory.

Reported by:	decke

MFC 262809:

Pull in r203007 from upstream clang trunk:

  Don't produce an alias between destructors with different calling conventions.

  Fixes pr19007.

(Please note that is an LLVM PR identifier, not a FreeBSD one.)

This should fix Firefox and/or libxul crashes (due to problems with
regparm/stdcall calling conventions) on i386.

Reported by:	multiple users on freebsd-current
PR:		bin/187103

MFC 263048:

Repair recognition of "CC" as an alias for the C++ compiler, since it
was silently broken by upstream for a Windows-specific use-case.

Apparently some versions of CMake still rely on this archaic feature...

Reported by:	rakuco

MFC 263049:

Garbage collect the old way of adding the libstdc++ include directories
in clang's InitHeaderSearch.cpp.  This has been superseded by David
Chisnall's commit in r255321.

Moreover, if libc++ is used, the libstdc++ include directories should
not be in the search path at all.  These directories are now only used
if you pass -stdlib=libstdc++.
2014-03-21 17:53:59 +00:00
jkim
b04308525e MFC: r262746, r262748, r262750, r262752
Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c.
2014-03-10 20:47:24 +00:00
jhb
59b6242e90 MFC 259016,259019,259049,259071,259102,259110,259129,259130,259178,259179,
259203,259221,259261,259532,259615,259650,259651,259667,259680,259727,
259761,259772,259776,259777,259830,259882,259915,260160,260449,260450,
260688,260888,260953,261269,261547,261551,261552,261553,261585:
Merge the vt(4) driver (newcons) to stable/10.

Approved by:	ray
2014-03-06 18:30:56 +00:00
emaste
33dce93124 Disable amd64 TLB Context ID (pcid) by default for now
There are a number of reports of userspace application crashes that
are "solved" by setting vm.pmap.pcid_enabled=0, including Java and the
x11/mate-terminal port (PR ports/184362).

For now apply a band-aid of disabling pcid by default in stable/10.

Sponsored by:	The FreeBSD Foundation
2014-03-04 21:51:09 +00:00
jhb
92ff5e5bfb MFC 259542:
Use vmcs_read() and vmcs_write() in preference to vmread() and vmwrite()
respectively. The vmcs_xxx() functions provide inline error checking of
all accesses to the VMCS.
2014-02-23 01:34:40 +00:00
jhb
69d17427ca MFC 258859,259081,259085,259205,259213,259275,259482,259537,259702,259779:
Several changes to the local APIC support in bhyve:
- Rename 'vm_interrupt_hostcpu()' to 'vcpu_notify_event()'.
- If a vcpu disables its local apic and then executes a 'HLT' then spin
  down the vcpu and destroy its thread context. Also modify the 'HLT'
  processing to ignore pending interrupts in the IRR if interrupts have
  been disabled by the guest.  The interrupt cannot be injected into the
  guest in any case so resuming it is futile.
- Use callout(9) to drive the vlapic timer instead of clocking it on each
  VM exit.
- When the guest is bringing up the APs in the x2APIC mode a write to the
  ICR register will now trigger a return to userspace with an exitcode of
  VM_EXITCODE_SPINUP_AP.
- Change the vlapic timer lock to be a spinlock because the vlapic can be
  accessed from within a critical section (vm run loop) when guest is using
  x2apic mode.
- Fix the vlapic version register.
- Add a command to bhyvectl to inject an NMI on a specific vcpu.
- Add an API to deliver message signalled interrupts to vcpus. This allows
  callers to treat the MSI 'addr' and 'data' fields as opaque and also lets
  bhyve implement multiple destination modes: physical, flat and clustered.
- Rename the ambiguously named 'vm_setup_msi()' and 'vm_setup_msix()' to
  'vm_setup_pptdev_msi()' and 'vm_setup_pptdev_msix()' respectively.
- Consolidate the virtual apic initialization in a single function:
  vlapic_reset()
- Add a generic routine to trigger an LVT interrupt that supports both
  fixed and NMI delivery modes.
- Add an ioctl and bhyvectl command to trigger local interrupts inside a
  guest.  In particular, a global NMI similar to that raised by SERR# or
  PERR# can be simulated by asserting LINT1 on all vCPUs.
- Extend the LVT table in the vCPU local APIC to support CMCI.
- Flesh out the local APIC error reporting a bit to cache errors and
  report them via ESR when ESR is written to.  Add support for asserting
  the error LVT when an error occurs.  Raise illegal vector errors when
  attempting to signal an invalid vector for an interrupt or when sending
  an IPI.
- Export table entries in the MADT and MP Table advertising the stock x86
  config of LINT0 set to ExtInt and LINT1 wired to NMI.
2014-02-23 00:46:05 +00:00
jhb
04e37d68ee MFC 257297:
Remove unnecessary includes of <machine/pmap.h>
2014-02-22 23:34:39 +00:00
jhb
5b0bab6d24 MFC 261517,261520:
Convert the license on files where I am the sole copyright holder to
2 clause BSD licenses.
2014-02-18 20:27:17 +00:00
jhb
8e8bf4982f MFC 259140:
Move constants for indices in the local APIC's local vector table from
apicvar.h to apicreg.h.
2014-02-18 01:15:32 +00:00
avg
b46715eb45 MFC r257417: Remove references to an unused fasttrap probe hook 2014-02-17 12:57:13 +00:00
eadler
ec294fd7f5 MFC r258779,r258780,r258787,r258822:
Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this
shifts into the sign bit.  Instead use (1U << 31) which gets the
expected result.

Similar to the (1 << 31) case it is not defined to do (2 << 30).

This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.

A similar change was made in OpenBSD.
2014-02-04 03:36:42 +00:00
jhb
080d62f93d MFC 259782:
Add a resume hook for bhyve that runs a function on all CPUs during
resume.  For Intel CPUs, invoke vmxon for CPUs that were in VMX mode
at the time of suspend.
2014-01-29 21:23:37 +00:00
jhb
e1016866c7 MFC 257422,257661,258075,258476,258494,258579,258609,258699:
Several enhancements to the I/O APIC support in bhyve including:
- Move the I/O APIC device model from userspace into vmm.ko and add
  ioctls to assert and deassert I/O APIC pins.
- Add HPET device emulation including a single timer block with 8 timers.
- Remove the 'vdev' abstraction.

Approved by:	neel
2014-01-23 20:21:39 +00:00
kib
dd0e0d7345 MFC r260205:
Update the description for pmap_remove_pages() to match the modern
times.  Assert that the pmap passed to pmap_remove_pages() is only
active on current CPU.
2014-01-09 03:32:03 +00:00
kib
a3d28139be MFC r260204:
Assert that accounting for the pmap resident pages does not underflow.
2014-01-09 03:24:36 +00:00
dim
3a06a5ceae MFC r260103:
In sys/amd64/amd64/pmap.c, remove static function pmap_is_current(),
which has been unused since r189415.

Reviewed by:	alc
2014-01-04 21:45:52 +00:00
glebius
a70f2adf57 Merge r256868,257276-257277,257515,257913 from head. These are fixes
required to make Xen buтldable w/o INET.

Sponsored by:	Nginx, Inc.
2013-12-18 05:20:53 +00:00
kib
2c4d831850 MFC DMAR busdma implementation.
MFC r257251:
Import the driver for VT-d DMAR hardware.  Implement the busdma(9) using DMARs.

MFC r257512:
Add support for queued invalidation.

MFC miscellaneous follow-ups to r257251.

MFC r257266:
Remove redundand assignment to error variable and check for its value.

MFC r257308:
Remove redundand declaration.

MFC r257511:
Return BUS_PROBE_NOWILDCARD from the DMAR probe method.

MFC r257860,r257896,r257900,r257902,r257903 (by dim):
Fixes for gcc compilation.
2013-12-17 13:49:35 +00:00
royger
00083cffc4 MFC 258176:
Fix accounting for hw.realmem on the i386 and amd64 platforms.

sys/i386/i386/machdep.c:
sys/amd64/amd64/machdep.c:
	The value reported by FreeBSD as "real memory" when booting
	doesn't match what is later reported by sysctl as hw.realmem.
	This is due to the fact that the value printed during the
	boot process is fetched from smbios data (when possible),
	and accounts for holes in physical memory. On the other
	hand, the value of hw.realmem is unconditionally set to be
	one larger than the highest page of the physical address
	space.

	Fix this by setting hw.realmem to the same value printed
	during boot, this makes hw.realmem honour it's name and
	account properly for physical memory present in the system.

Submitted by:	Roger Pau Monné
Reviewed by:	gibbs
Approved by:	gibbs (mentor)
Approved by:	re (gjb)
2013-12-05 18:08:05 +00:00
kib
9934b5683d MFC r258660:
Fix sys/sysctl.h use for cc -m32 on amd64.

Approved by:	re (gjb)
2013-12-03 19:41:48 +00:00
emaste
b0519089ed MFC r258135: x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)
Debuggers may need to change PSL_RF. Note that tf_eflags is already stored
  in the signal context during signal handling and PSL_RF previously could
  be modified via sigreturn, so this change should not provide any new
  ability to userspace.

  For background see the thread at:
  http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

  Reviewed by:	jhb, kib

Sponsored by:	DARPA, AFRL
Approved by:	re (gjb)
2013-11-25 15:58:48 +00:00
kib
5b26108b2e MFC r257856:
Add bits for the AMD features from CPUID function 0x80000001 ECX,
described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf.

Approved by:	re (gjb)
2013-11-15 07:10:42 +00:00
kib
a2c021b32c MFC r257216:
Several small fixes for the amd64 minidump code.

Approved by:	re (gjb)
2013-11-03 16:03:19 +00:00
neel
b31f0060b5 MFC r256645.
Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.

Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.

Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.

Approved by:	re (hrs)
2013-10-22 00:58:51 +00:00
neel
3e1e4bc86d MFC r256570:
Fix the witness warning that warned against calling uiomove() while holding
the 'vmmdev_mtx' in vmmdev_rw().

Rely on the 'si_threadcount' accounting to ensure that we never destroy the
VM device node while it has operations in progress (e.g. ioctl, mmap etc).

Approved by:	re (rodrigc)
2013-10-16 21:52:54 +00:00
gjb
3595c37915 MFC r256328:
Document XENHVM and xenpci are mutually inclusive.

Approved by:    re (delphij)
Sponsored by:   The FreeBSD Foundation
2013-10-11 19:43:37 +00:00
gjb
cfff9c715e - Remove debugging from GENERIC* kernel configurations
- Enable MALLOC_PRODUCTION
- Default dumpdev=NO
- Remove UPDATING entry regarding debugging features
- Bump __FreeBSD_version to 1000500

Approved by:	re (implicit)
Sponsored by:	The FreeBSD Foundation
2013-10-10 17:59:44 +00:00
dim
f72bec9791 In sys/amd64/amd64/pmap.c, fix several gcc warnings about uninitialized
variables in reclaim_pv_chunk().

Approved by:	re (marius)
Reviewed by:	neel, kib
X-MFC-With:	r256072
2013-10-08 20:04:35 +00:00
gibbs
9c8c76f921 Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id
field.  Perform vcpu enumeration for Xen PV and HVM environments
and convert all Xen drivers to use vcpu_id instead of a hard coded
assumption of the mapping algorithm (acpi or apic ID) in use.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)

amd64/include/pcpu.h:
i386/include/pcpu.h:
	Add vcpu_id to the amd64 and i386 pcpu structures.

dev/xen/timer/timer.c
x86/xen/xen_intr.c
	Use new vcpu_id instead of assuming acpi_id == vcpu_id.

i386/xen/mp_machdep.c:
i386/xen/mptable.c
x86/xen/hvm.c:
	Perform Xen HVM and Xen full PV vcpu_id mapping.

x86/xen/hvm.c:
x86/acpica/madt.c
	Change SYSINIT ordering of acpi CPU enumeration so that it
	is guaranteed to be available at the time of Xen HVM vcpu
	id mapping.
2013-10-05 23:11:01 +00:00
neel
aed205d5cd Merge projects/bhyve_npt_pmap into head.
Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
               Bit Position           Interpreted By
PG_V               52                 software (accessed bit emulation handler)
PG_RW              53                 software (dirty bit emulation handler)
PG_A               0                  hardware (aka EPT_PG_RD)
PG_M               1                  hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by:	re
Discussed with:	grehan
Reviewed by:	alc, kib
Tested by:	pho
2013-10-05 21:22:35 +00:00
jmg
cb6017acba add aesni module to i386 and amd64 NOTES...
Approved by:	re (gjb)
2013-10-04 17:21:01 +00:00
grehan
b2189384ce Return 0 for a rdmsr of MSR_IA32_PLATFORM_ID. This
is enough to get Ubuntu 12.0.4/13.0.4 to boot.

Approved by:	re@ (blanket)
2013-09-27 14:55:59 +00:00
kib
8e50b328e4 In pmap_clear_modify(), initialize pvh even for fictitious managed
page, otherwise the small mappings loop would use uninitialized value.
Note that currently pmap_clear_modify() is not called for fictitious
pages.

Sponsored by:	The FreeBSD Foundation
Approved by:	re (glebius)
2013-09-24 13:52:47 +00:00
kib
28393634c5 Use the pv lists generation count to read-lock the pvh_global_lock in
pmap_clear_modify().

Noted and reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (marius)
2013-09-24 12:26:43 +00:00
kib
2346155b1e Ensure that the ERESTART return from the syscall reloads the
registers, to make the restarted syscall instruction pass the correct
arguments.

PR:	kern/182161
Reported by:	Russ Cox <rsc@swtch.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Approved by:	re (marius)
2013-09-24 12:24:48 +00:00
kib
776b37339a Free both KVA and backing pages when freeing TSS memory.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (marius)
2013-09-23 20:14:15 +00:00
gjb
1944aacf15 Put 'device hyperv' back in amd64/GENERIC, incorrectly removed with
r255736.

Pointed out by:	gibbs
Approved by:	re (delphij)
Sponsored by:	The FreeBSD Foundation
2013-09-21 01:07:27 +00:00
grehan
65054fa1ce Reorder/regroup the vmm ioctl api definitions to allow some
semblance of API stability and growth during the 10.* timeframe.

Userland/kernel bhyve will have to be recompiled after this.

Reviewed by:	neel
Approved by:	re@ (blanket)
2013-09-21 00:27:53 +00:00
gibbs
7ed30adae7 Merge Xen PVHVM support into the GENERIC kernel config for both
amd64 and i386.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
MFC after:	2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
	- Introduce two new CPU hooks for initialization and resume
	  purposes. This allows us to get rid of the XENHVM ifdefs in
	  mp_machdep, and also sets some hooks into common code that can be
	  used by other hypervisor implementations.

sys/amd64/conf/XENHVM:
sys/i386/conf/XENHVM:
	- Remove these configs now that GENERIC has builtin support for Xen
	  HVM.

sys/kern/subr_smp.c:
	- Make sure there are no pending IPIs when suspending a system.

sys/x86/xen/hvm.c:
	- Add cpu init and resume vectors that are called from mp_machdep
	  using the new hooks.
	- Only clear the vcpu_info mapping data on resume.  It is already
	  clear for the BSP on a cold boot and is set correctly as APs
	  are started.
	- Gate xen_hvm_init_cpu only to systems running under Xen.

sys/x86/xen/xen_intr.c:
	 - Gate the setup of event channels only to systems running under Xen.
2013-09-20 22:59:22 +00:00
davidch
e226cdd9b8 Substantial rewrite of bxe(4) to add support for the BCM57712 and
BCM578XX controllers.

Approved by:	re
MFC after:	4 weeks
2013-09-20 20:18:49 +00:00
neel
44c4dbefdb Merge the following changes from projects/bhyve_npt_pmap:
- add fields to 'struct pmap' that are required to manage nested page tables.
- add a parameter to 'vmspace_alloc()' that can be used to override the
  default pmap initialization routine 'pmap_pinit()'.

These changes are pushed ahead of the remaining changes in 'bhyve_npt_pmap'
in anticipation of the upcoming KBI freeze for 10.0.

Reviewed by:	kib@, alc@
Approved by:	re (glebius)
2013-09-20 17:06:49 +00:00
gibbs
a9c07a6f67 Add support for suspend/resume/migration operations when running as a
Xen PVHVM guest.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
MFC after:	2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	- Make sure that are no MMU related IPIs pending on migration.
	- Reset pending IPI_BITMAP on resume.
	- Init vcpu_info on resume.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
	- Add a "suspend_cancelled" parameter to pic_resume().  For the
	  Xen PIC, restoration of interrupt services differs between
	  the aborted suspend and normal resume cases, so we must provide
	  this information.

sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
	- Don't swap out "suspend safe" timers across a suspend/resume
	  cycle.  This includes the Xen PV and ACPI timers.

sys/dev/xen/control/control.c:
	- Perform proper suspend/resume process for PVHVM:
		- Suspend all APs before going into suspension, this allows us
		  to reset the vcpu_info on resume for each AP.
		- Reset shared info page and callback on resume.

sys/dev/xen/timer/timer.c:
	- Implement suspend/resume support for the PV timer. Since FreeBSD
	  doesn't perform a per-cpu resume of the timer, we need to call
	  smp_rendezvous in order to correctly resume the timer on each CPU.

sys/dev/xen/xenpci/xenpci.c:
	- Don't reset the PCI interrupt on each suspend/resume.

sys/kern/subr_smp.c:
	- When suspending a PVHVM domain make sure there are no MMU IPIs
	  in-flight, or we will get a lockup on resume due to the fact that
	  pending event channels are not carried over on migration.
	- Implement a generic version of restart_cpus that can be used by
	  suspended and stopped cpus.

sys/x86/xen/hvm.c:
	- Implement resume support for the hypercall page and shared info.
	- Clear vcpu_info so it can be reset by APs when resuming from
	  suspension.

sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
	- Support UP kernel configurations.

sys/x86/xen/xen_intr.c:
	- Properly rebind per-cpus VIRQs and IPIs on resume.
2013-09-20 05:06:03 +00:00
alc
88a4d0f31a The pmap function pmap_clear_reference() is no longer used. Remove it.
pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8.  Now, that call no
longer exists.

Approved by:	re (kib)
Sponsored by:	EMC / Isilon Storage Division
2013-09-20 04:30:18 +00:00
grehan
073f54353f Reconnect the hyperv drivers back into GENERIC now that the
disengage driver issue has been resolved.

Approved by:	re@ (gjb)
2013-09-19 05:07:51 +00:00
pjd
667d7255be Fix panic in ktrcapfail() when no capability rights are passed.
While here, correct all consumers to pass NULL instead of 0 as we pass
capability rights as pointers now, not uint64_t.

Reported by:	Daniel Peyrolon
Tested by:	Daniel Peyrolon
Approved by:	re (marius)
2013-09-18 19:26:08 +00:00
rdivacky
9e8e4eba85 Regen.
Approved by:	re (delphij)
2013-09-18 18:49:26 +00:00
rdivacky
fae003a069 Revert r255672, it has some serious flaws, leaking file references etc.
Approved by:	re (delphij)
2013-09-18 18:48:33 +00:00
rdivacky
b320aa22a6 Regen.
Approved by:    re (delphij)
2013-09-18 17:58:03 +00:00
rdivacky
d57db3eead Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by:    re (delphij)
Sponsored by:   Google Summer of Code
Submitted by:   Yuri Victorovich <yuri at rawbw dot com>
Tested by:      Yuri Victorovich <yuri at rawbw dot com>
2013-09-18 17:56:04 +00:00
grehan
a83c14de5c Hide TSC-deadline APIC timer support from guests. This mode
isn't yet implemented in bhyve's APIC emulation.

Reviewed by:	neel
Approved by:	re@ (blanket)
2013-09-17 17:56:53 +00:00
neel
16db26f84b Fix a bug in decoding an instruction that has an SIB byte as well as an
immediate operand. The presence of an SIB byte in decoding the ModR/M field
would cause 'imm_bytes' to not be set to the correct value.

Fix this by initializing 'imm_bytes' independent of the ModR/M decoding.

Reported by: grehan@
Approved by: re@
2013-09-17 16:06:07 +00:00
bryanv
5dc84522d4 Add vmx(4) to i386 and amd64 GENERIC
Approved by:	re (gjb)
2013-09-17 01:54:13 +00:00
kib
9867f4e99b In pmap_copy(), when the copied region is mapped with superpage but does
not cover entire superpage, avoid copying.  Doing partial copy would
require demotion, which is incompatible with the already held locks.

Reported by:    cperciva
Reviewed by:    alc
Sponsored by:	The FreeBSD Foundation
MFC after:      1 week
Approved by:	re (delphij)
2013-09-16 06:15:15 +00:00
grehan
dab286d879 Pull the hyperv drivers from GENERIC until the fix to the disengage
driver to make it only probe when running on hyperv is reviewed and
tested.

Approved by:	re (rodrigc)
2013-09-14 20:38:22 +00:00
grehan
0d2d002359 Import Hyper-V paravirtualized drivers from projects/hyperv
branch into head.

Approved by:	re@ (hrs)
Obtained from:	Microsoft, NetApp, and Citrix.
2013-09-13 18:47:58 +00:00
neel
348fe8d4ce Fix a limitation in bhyve that would limit the number of virtual machines to
the maximum number of VT-d domains (256 on a Sandybridge). We now allocate a
VT-d domain for a guest only if the administrator has explicitly configured
one or more PCI passthru device(s).

If there are no PCI passthru devices configured (the common case) then the
number of virtual machines is no longer limited by the maximum number of
VT-d domains.

Reviewed by: grehan@
Approved by: re@
2013-09-11 07:11:14 +00:00
grehan
d5d1038c7e IFC @ r255459 2013-09-11 00:19:16 +00:00
grehan
3d2b366a36 Go way past 11 and bump bhyve's max vCPUs to 16.
This should be sufficient for 10.0 and will do
until forthcoming work to avoid limitations
in this area is complete.

Thanks to Bela Lubkin at tidalscale for the
headsup on the apic/cpu id/io apic ASL parameters
that are actually hex values and broke when
written as decimal when 11 vCPUs were configured.

Approved by:	re@
2013-09-10 03:48:18 +00:00
alc
4aecbb077c Prior to r254304, we only began scanning the active page queue when the
amount of free memory was close to the point at which we would begin
reclaiming pages.  Now, we continuously scan the active page queue,
regardless of the amount of free memory.  Consequently, we are continuously
calling pmap_ts_referenced() on active pages.

Prior to this change, pmap_ts_referenced() would always demote superpage
mappings in order to obtain finer-grained reference information.  This made
sense because we were coming under memory pressure and would soon have to
begin reclaiming pages.  Now, however, with continuous scanning of the
active page queue, these demotions are taking a toll on performance.  For
example, on one of my test machines, the running time for the HPCC Random
Access benchmark (also known as GUPS) has increased by 54%.  To address this
problem, I have replaced the demotion with a heuristic for periodically
clearing the reference flag on superpage mappings.

Reviewed by:	kib
Approved by:	re (glebius)
Sponsored by:	EMC / Isilon Storage Division
2013-09-08 21:30:53 +00:00
neel
1306e6ef77 Allocate VPIDs by using the unit number allocator to keep do the bookkeeping.
Also deal with VPID exhaustion by allocating out of a reserved range as the
last resort.
2013-09-07 05:30:34 +00:00
grehan
0fa52a8a31 Mask off the vector from the MSI-x data word.
Some o/s's set the trigger-mode level bit which
results in an invalid vector and pass-thru interrupts
not being delivered.
2013-09-07 03:33:36 +00:00
gibbs
437790b349 Implement PV IPIs for PVHVM guests and further converge PV and HVM
IPI implmementations.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Submitted by: gibbs (misc cleanup, table driven config)
Reviewed by:  gibbs
MFC after: 2 weeks

sys/amd64/include/cpufunc.h:
sys/amd64/amd64/pmap.c:
	Move invltlb_globpcid() into cpufunc.h so that it can be
	used by the Xen HVM version of tlb shootdown IPI handlers.

sys/x86/xen/xen_intr.c:
sys/xen/xen_intr.h:
	Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(),
	and remove the ipi vector parameter.  This api allocates
	an event channel port that can be used for ipi services,
	but knows nothing of the actual ipi for which that port
	will be used.  Removing the unused argument and cleaning
	up the comments surrounding its declaration helps clarify
	its actual role.

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
	Implement a generic framework for amd64 and i386 that allows
	the implementation of certain CPU management functions to
	be selected at runtime.  Currently this is only used for
	the ipi send function, which we optimize for Xen when running
	on a Xen hypervisor, but can easily be expanded to support
	more operations.

sys/x86/xen/hvm.c:
	Implement Xen PV IPI handlers and operations, replacing native
	send IPI.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
sys/i386/include/smp.h:
	Remove NR_VIRQS and NR_IPIS from FreeBSD headers.  NR_VIRQS
	is defined already for us in the xen interface files.
	NR_IPIS is only needed in one file per Xen platform and is
	easily inferred by the IPI vector table that is defined in
	those files.

sys/i386/xen/mp_machdep.c:
	Restructure to more closely match the HVM implementation by
	performing table driven IPI setup.
2013-09-06 22:17:02 +00:00
bryanv
f46bf10bc5 Add vmx device to the i386 and amd64 NOTES files 2013-09-06 20:24:21 +00:00
kib
8439d9918f Only lock pvh_global_lock read-only for pmap_page_wired_mappings(),
pmap_is_modified() and pmap_is_referenced(), same as it was done for
pmap_ts_referenced().

Consolidate identical code for pmap_is_modified() and
pmap_is_referenced() into helper pmap_page_test_mappings().

Reviewed by:	alc
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
2013-09-06 16:53:48 +00:00
kib
b10a9f2e78 In pmap_ts_referenced(), when restarting the loop due to pv list
generation changed, do not drop and immediately relock the pv list.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-09-06 16:48:34 +00:00
glebius
358d3d145a On those machines, where sf_bufs do not represent any real object, make
sf_buf_alloc()/sf_buf_free() inlines, to save two calls to an absolutely
empty functions.

Reviewed by:	alc, kib, scottl
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2013-09-06 05:37:49 +00:00
grehan
5110b054b2 Emulate reading of the IA32_MISC_ENABLE MSR, by returning
the host MSR and masking off features that aren't supported.
Linux reads this MSR to detect if NX has been disabled via
BIOS.
2013-09-06 05:20:11 +00:00
grehan
37b6e3223f Allow CPUID leaf 0xD to be read as zeroes.
Linux reads this even though extended features
aren't exposed.

Support for 0xD will be expanded once AVX[2]
is exposed to the guest in upcoming work.
2013-09-06 05:16:10 +00:00
pjd
029a6f5d92 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
kib
23ba68da73 Tidy up some loose ends in the PCID code:
- Restore the pre-PCID TLB shootdown handlers for whole address space
  and single page invalidation asm code, and assign the IPI handler to
  them when PCID is not supported or disabled.  Old handlers have
  linear control flow.  But, still use the common return sequence.

- Stop using pcpu for INVPCID descriptors in the invlrg handler.  It
  is enough to allocate descriptors on the stack.  As result, two
  SWAPGS instructions are shaved off from the code for Haswell+.

- Fix the reverted condition in invlrng for checking of the PCID
  support [1], also in invlrng check that pmap is kernel pmap before
  performing other tests.  For the kernel pmap, which provides global
  mappings, the INVLPG must be used for invalidation always.

- Save the pre-computed pmap' %CR3 register in the struct pmap.  This
  allows to remove several checks for pm_pcid validity when %CR3 is
  reloaded [2].

Noted by:   gibbs [1]
Discussed with:	alc [2]
Tested by:	pho, flo
Sponsored by:	The FreeBSD Foundation
2013-09-04 23:31:29 +00:00
grehan
9394ef331e IFC @ r255209 2013-09-04 20:55:56 +00:00
jhb
8f38cafe69 Add support for the 'invpcid' instruction to binutils and DDB's
disassembler on amd64.

MFC after:	1 month
2013-09-03 21:21:47 +00:00
kib
2dccc06e8e Fix two build failures for non-tb configurations, UP [2] and when using gas [1].
Reported by:	andreast [1], bf [2]
Sponsored by:	The FreeBSD Foundation
2013-08-31 19:13:21 +00:00
kib
6eb9415a25 The pm_save should be cleared on the pmap initialization, and not on
the activation.

Noted by:	alc
2013-08-30 20:10:01 +00:00
kib
a2b5da0090 Implement support for the process-context identifiers ('PCID') on
Intel CPUs.  The feature tags TLB entries with the Id of the address
space and allows to avoid TLB invalidation on the context switch, it
is available only in the long mode.  In the microbenchmarks, using the
PCID decreased latency of the context switches by ~30% on SandyBridge
class desktop CPUs, measured with the lat_ctx program from lmbench.

If available, use INVPCID instruction when a TLB entry in non-current
address space needs to be invalidated.  The instruction is typically
available on the Haswell.

If needed, the use of PCID can be turned off with the
vm.pmap.pcid_enabled loader tunable set to 0.  The state of the
feature is reported by the vm.pmap.pcid_enabled sysctl.  The sysctl
vm.pmap.pcid_save_cnt reports the number of context switches which
avoided invalidating the TLB; compare with the total number of context
switches, available as sysctl vm.stats.sys.v_swtch.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
Tested by:	pho, bf
2013-08-30 07:59:49 +00:00
kib
f8c0849eff Provide a wrapper for the INVPCID instruction, definition of the
descriptor and symbolic names for the operation types.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
Tested by:	pho, bf
2013-08-30 07:42:38 +00:00
gibbs
fcdbf70fd9 Implement vector callback for PVHVM and unify event channel implementations
Re-structure Xen HVM support so that:
	- Xen is detected and hypercalls can be performed very
	  early in system startup.
	- Xen interrupt services are implemented using FreeBSD's native
	  interrupt delivery infrastructure.
	- the Xen interrupt service implementation is shared between PV
	  and HVM guests.
	- Xen interrupt handlers can optionally use a filter handler
	  in order to avoid the overhead of dispatch to an interrupt
	  thread.
	- interrupt load can be distributed among all available CPUs.
	- the overhead of accessing the emulated local and I/O apics
	  on HVM is removed for event channel port events.
	- a similar optimization can eventually, and fairly easily,
	  be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
	Reserve IDT vector 0x93 for the Xen event channel upcall
	interrupt handler.  On Hypervisors that support the direct
	vector callback feature, we can request that this vector be
	called directly by an injected HVM interrupt event, instead
	of a simulated PCI interrupt on the Xen platform PCI device.
	This avoids all of the overhead of dealing with the emulated
	I/O APIC and local APIC.  It also means that the Hypervisor
	can inject these events on any CPU, allowing upcalls for
	different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
	Increase the FreeBSD IRQ vector table to include space
	for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
	Remove Xen HVM per-cpu variable data.  These fields are now
	allocated via the dynamic per-cpu scheme.  See xen_intr.c
	for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
	Prefer FreeBSD primatives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
	Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
	Remove constants, macros, and functions unused in FreeBSD's Xen
	support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
	Introduce new functions xen_domain(), xen_pv_domain(), and
	xen_hvm_domain().  These are used in favor of #ifdefs so that
	FreeBSD can dynamically detect and adapt to the presence of
	a hypervisor.  The goal is to have an HVM optimized GENERIC,
	but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
	Refactor magic ioport, Hypercall table and Hypervisor shared
	information page setup, and move it to a dedicated HVM support
	module.

	HVM mode initialization is now triggered during the
	SI_SUB_HYPERVISOR phase of system startup.  This currently
	occurs just after the kernel VM is fully setup which is
	just enough infrastructure to allow the hypercall table
	and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
	Add definitions and a method for configuring Hypervisor event
	delievery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
	Adjust kernel build to reflect the refactoring of early
	Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
	Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
	Since blkback defers all event handling to a taskqueue,
	convert this task queue to a "fast" taskqueue, and schedule
	it via an interrupt filter.  This avoids an unnecessary
	ithread context switch.

sys/xen/xenstore/xenstore.c:
	The xenstore driver is MPSAFE.  Indicate as much when
	registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
	Remove unused event channel APIs.

sys/xen/evtchn.h:
	Remove all kernel Xen interrupt service API definitions
	from this file.  It is now only used for structure and
	ioctl definitions related to the event channel userland
	device driver.

	Update the definitions in this file to match those from
	NetBSD.  Implementing this interface will be necessary for
	Dom0 support.

sys/xen/evtchn/evtchnvar.h:
	Add a header file for implemenation internal APIs related
	to managing event channels event delivery.  This is used
	to allow, for example, the event channel userland device
	driver to access low-level routines that typical kernel
	consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
	Standardize on the evtchn_port_t type for referring to
	an event channel port id.  In order to prevent low-level
	event channel APIs from leaking to kernel consumers who
	should not have access to this data, the type is defined
	twice: Once in the Xen provided event_channel.h, and again
	in xen/xen_intr.h.  The double declaration is protected by
	__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
	twice within a given compilation unit.

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
	New implementation of Xen interrupt services.  This is
	similar in many respects to the i386 PV implementation with
	the exception that events for bound to event channel ports
	(i.e. not IPI, virtual IRQ, or physical IRQ) are further
	optimized to avoid mask/unmask operations that aren't
	necessary for these edge triggered events.

	Stubs exist for supporting physical IRQ binding, but will
	need additional work before this implementation can be
	fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
	Add support for placing vcpu_info into an arbritary memory
	page instead of using HYPERVISOR_shared_info->vcpu_info.
	This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
	Add support for new event channle implementation.
2013-08-29 19:52:18 +00:00
alc
aa9a7bb9e6 Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE).  Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE.  Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference().  Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2).  A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact.  For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures.  I will commit implementations for the
remaining architectures after further testing.  For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by:      pho (i386), marcel (ia64)
Sponsored by:   EMC / Isilon Storage Division
2013-08-29 15:49:05 +00:00
neel
99ab2bf08e Add support for emulating the byte move instruction "mov r/m8, r8".
This emulation is required when dumping MMIO space via the ddb "examine"
command.
2013-08-27 16:49:20 +00:00
kib
05a9dff802 Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone.  Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by:	alc (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-22 18:12:24 +00:00
kib
076969f54a Use the generation count of the pv list to work around LOR between
pmap lock and pv list lock, and use the shared locking on
pvh_global_lock in pmap_remove_write(), same as it was done for
pmap_ts_referenced().

Noted and reviewed by:	alc (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-22 18:05:31 +00:00
obrien
b450ec770a The PADLOCK_RNG and RDRAND_RNG kernel options are now devices.
Thus "device padlock_rng" and "device rdrand_rng" should be
used instead of "options PADLOCK_RNG" & "options RDRAND_RNG".

Requested by:	so@ (des)
Submitted by:	obrien, arthurmesh@gmail.com
Obtained from:	Juniper Networks
2013-08-21 22:43:29 +00:00
jkim
f4e86b0a9f Reimplement atomic operations on PDEs and PTEs in pmap.h. This change
significantly reduces duplicate code and make it easier to read.

Reviewed by:	alc, bde
2013-08-21 22:40:29 +00:00
jkim
18751f3bf8 Remove empty lines before return statements for style consistency. 2013-08-21 22:05:58 +00:00
jkim
f354dbfd86 Implement atomic_swap() and atomic_testandset().
Reviewed by:	arch, bde, jilles, kib
2013-08-21 22:03:06 +00:00
jkim
a0c0cc666e - Remove the "a" constraint from main output operand for atomic_cmpset().
- Use "+" modifier for the "expect" because it is also an output (unused).
2013-08-21 21:30:06 +00:00
jkim
8c10ae99f8 Use '+' modifier for a memory operand that is both an input and an output.
It was actually done in r86301 but reverted in r150182 because GCC 3.x was
not able to handle it for a memory operand.  Apparently, this problem was
fixed in GCC 4.1+ and several contrib sources already rely on this feature.
2013-08-21 21:14:16 +00:00
jkim
dd0ac928f4 Remove bogus labels. No functional change. 2013-08-21 20:49:46 +00:00
jkim
2deff14808 Use consistent style. No functional change. 2013-08-21 20:43:50 +00:00
neel
44099f4092 Do not create superpage mappings in the iommu.
This is a workaround to hide the fact that we do not have any code to
demote a superpage mapping before we unmap a single page that is part
of the superpage.
2013-08-20 06:46:40 +00:00
neel
15659a9ddf Extract the location of the remapping hardware units from the ACPI DMAR table.
Submitted by:	Gopakumar T (gopakumar_thekkedath@yahoo.co.in)
2013-08-20 06:20:05 +00:00
neel
bea80a701c Fix breakage caused by r254466 in minidumpsys().
r254466 increased the KVA from 512GB to 2TB which requires 4 PDP pages as
opposed to a single one before the change. This broke minidumpsys() since
it assumed that the entire KVA could be addressed via a single PDP page.

Fix this by obtaining the address of the PDP page from the PML4 entry
associated with the KVA being dumped.

Reported by:	pho
Submitted by:	kib
Pointy hat to:	neel
2013-08-20 02:09:26 +00:00
kib
17c7e12452 When code from r254064 in pmap_ts_referenced() drops pv lock and
blocks on a pmap lock, pmap_release() might proceed in parallel and
destroy the pmap mutex, since unlocked pv lock allows to remove pv
entry owned by the pmap.

For now, gate the pmap_release() on write-locked pvh_global_lock.
Since pmap_ts_release() does not unlock the global lock,
pmap_release() would not destroy pmap mutex until the
pmap_ts_referenced() finished.  We cannot enter pmap_ts_referenced()
and encounter a pv entry for the destroyed pmap if pmap_release()
passed the global lock gate, since pmap_remove_pages() would finish
earlier.

Reported by:	jeff, pho
Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-18 21:36:22 +00:00
pjd
ae1f4cb9ce Add process descriptors support to the GENERIC kernel. It is already being
used by the tools in base systems and with sandboxing more and more tools
the usage should only increase.

Submitted by:	Mariusz Zaborski <oshogbo@FreeBSD.org>
Sponsored by:	Google Summer of Code 2013
MFC after:	1 month
2013-08-18 10:21:29 +00:00
neel
1038bfa7cf Bump up the maximum addressable memory on amd64 systems from 1TB to 4TB.
Bump up the KVA size proportionally from 512GB to 2TB.

The number of page table pages used by the direct map is now calculated at
run time based on 'Maxmem'. This means the small memory systems will not
see any additional tax in terms of page table pages for the direct map.

However all amd64 systems, regardless of the memory size, will use 3 more
pages to accomodate the bump in the KVA size.

More details available here:
http://lists.freebsd.org/pipermail/freebsd-hackers/2013-June/043015.html
http://lists.freebsd.org/pipermail/freebsd-current/2013-July/043143.html

Tested with the following configurations:
- Sandybridge server with 64GB of memory.
- bhyve VM with 64MB of memory.
- bhyve VM with a 8GB of memory with the memory segment above 4GB cuddling
  right up against the 4TB maximum memory limit.

Discussed on:	hackers@, current@
Submitted by:	Chris Torek (torek@torek.net)
2013-08-17 19:49:08 +00:00
jilles
fd29e78a68 libc: Access _logname_valid more efficiently.
The variable _logname_valid is not exported via the version script;
therefore, change C and i386/amd64 assembler code to remove indirection
(which allowed interposition). This makes the code slightly smaller and
faster.

Also, remove #define PIC_GOT from i386/amd64 in !PIC mode. Without PIC,
there is no place containing the address of each variable, so there is no
possible definition for PIC_GOT.
2013-08-17 19:24:58 +00:00
brooks
2e54cfb79a Use an ANSI C definition of initializecpucache() to match the declaration
and the rest of the file.
2013-08-15 17:44:44 +00:00
jkim
20df47e877 Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these
two files were functionally identical.
2013-08-13 22:05:10 +00:00
jkim
c55a3ec3ad Tidy up global locks for ACPICA. There is no functional change. 2013-08-13 21:34:03 +00:00
kib
4675fcfce0 Different consumers of the struct vm_page abuse pageq member to keep
additional information, when the page is guaranteed to not belong to a
paging queue.  Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked.  See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members.  Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().

Requested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-10 17:36:42 +00:00
attilio
e9f37cac74 On all the architectures, avoid to preallocate the physical memory
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid to pre-allocate
the KVA for such nodes.

In order to do so make the operations derived from vm_radix_insert()
to fail and handle all the deriving failure of those.

vm_radix-wise introduce a new function called vm_radix_replace(),
which can replace a leaf node, already present, with a new one,
and take into account the possibility, during vm_radix_insert()
allocation, that the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed
vm_radix_insert() will start from scratch again.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc (older version)
Reviewed by:	jeff
Tested by:	pho, scottl
2013-08-09 11:28:55 +00:00
attilio
16c7563cf4 The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
  and vm_page_grab are being executed.  This will be very helpful
  once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff, kib
Tested by:	gavin, bapt (older version)
Tested by:	pho, scottl
2013-08-09 11:11:11 +00:00
avg
6f05d223ad follow up to r254051
- update powerpc/GENERIC64 as well, suggested by mdf
- update comments so that they make sense after the change, suggested by
  jhb

X-MFC after:	never (change specific to head)
2013-08-09 08:11:09 +00:00
neel
541895143d Use local variables with the appropriate types and eliminate a bunch of casts.
This is a cosmetic change but it does help with a proposed change to increase
the maximum size of physical memory supported on amd64 platforms.

Submitted by:	Chris Torek (torek@torek.net)
2013-08-08 03:17:39 +00:00
kib
8de1718b60 Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain.  The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass.  This is not optimal, since it
could cause excessive page deactivation and freeing.  The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues.  Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
kib
a3142db9ac Change the pmap_ts_referenced() method of amd64 pmap to use shared
pvh_global_lock.  This allows the method to be executed in parallel,
avoiding undue contention on the pvh_global_lock for the multithreaded
pagedaemon.

The pmap_ts_referenced() function has to inspect the page mappings for
several pmaps, which need to be locked while pv list lock is owned.
This contradicts to the lock order, where pmap lock is before pv list
lock.  Introduce the generation count for the pv list of the page or
superpage, which indicate any change in the pv list, and, as usual,
perform restart of the iteration if generation changed while pv lock
was dropped for blocking acquire of a pmap lock.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:33:15 +00:00
avg
0255b98224 enable KDB_TRACE in GENERICs
KDB_TRACE is not an alternative to DDB/etc, they are complementary.
So I do not see any reason to not enable KDB_TRACE by default.

X-MFC after:	never (change specific to head)
2013-08-07 08:03:50 +00:00
jeff
de4ecca213 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
jeff
baea70e2d2 - Introduce a specific function, pmap_remove_kernel_pde, for removing
huge pages in the kernel's address space.  This works around several
   asserts from pmap_demote_pde_locked that did not apply and gave false
   warnings.

Discovered by:	pho
Reviewed by:	alc
Sponsored by:	EMC / Isilon Storage Division
2013-08-05 00:28:03 +00:00
grehan
be25b04bb5 Follow-up commit to fix CR0 issues. Maintain
architectural state on CR vmexits by guaranteeing
that EFER, CR0 and the VMCS entry controls are
all in sync when transitioning to IA-32e mode.

Submitted by:	Tycho Nightingale (tycho.nightingale <at> plurisbusnetworks.com)
2013-08-03 03:16:42 +00:00
grehan
fd121a9c7b IFC @ r253862
- change the SI_SUB_RUN_SCHEDULER sysinits in hv_utilc and
hv_netvsc_drv_freebsd.c to SI_SUB_KTHREAD_IDLE, since the
former is no longer in FreeBSD.
  The use of these SYSINITs can probably be removed.
2013-08-01 22:09:57 +00:00
grehan
41f621778a Moved clearing of vmm_initialized to avoid the case
of unloading the module while VMs existed. This would
result in EBUSY, but would prevent further operations
on VMs resulting in the module being impossible to
unload.

Submitted by:   Tycho Nightingale (tycho.nightingale <at> plurisbusnetworks.com)
Reviewed by:	grehan, neel
2013-08-01 05:59:28 +00:00
grehan
045ef38328 Correctly maintain the CR0/CR4 shadow registers.
This was exposed with AP spinup of Linux, and
booting OpenBSD, where the CR0 register is unconditionally
written to prior to the longjump to enter protected
mode. The CR-vmexit handling was not updating CPU state which
resulted in a vmentry failure with invalid guest state.

A follow-on submit will fix the CPU state issue, but this
fix prevents the CR-vmexit prior to entering protected
mode by properly initializing and maintaining CR* state.

Reviewed by:	neel
Reported by:	Gopakumar.T @ netapp
2013-08-01 01:18:51 +00:00
obrien
7999076e3e Back out r253779 & r253786. 2013-07-31 17:21:18 +00:00
obrien
721ce839c7 Decouple yarrow from random(4) device.
* Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option.
  The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow.

* random(4) device doesn't really depend on rijndael-*.  Yarrow, however, does.

* Add random_adaptors.[ch] which is basically a store of random_adaptor's.
  random_adaptor is basically an adapter that plugs in to random(4).
  random_adaptor can only be plugged in to random(4) very early in bootup.
  Unplugging random_adaptor from random(4) is not supported, and is probably a
  bad idea anyway, due to potential loss of entropy pools.
  We currently have 3 random_adaptors:
  + yarrow
  + rdrand (ivy.c)
  + nehemeiah

* Remove platform dependent logic from probe.c, and move it into
  corresponding registration routines of each random_adaptor provider.
  probe.c doesn't do anything other than picking a specific random_adaptor
  from a list of registered ones.

* If the kernel doesn't have any random_adaptor adapters present then the
  creation of /dev/random is postponed until next random_adaptor is kldload'ed.

* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
  system wide one.

Submitted by: arthurmesh@gmail.com, obrien
Obtained from: Juniper Networks
Reviewed by: obrien
2013-07-29 20:26:27 +00:00
avg
4e6c4b2a36 Revert r253748,253749
This WIP should not have been committed yet.

Pointyhat to:	avg
2013-07-28 18:44:17 +00:00
avg
ea04b46439 put contents of cpu.h under _KERNEL
no userland-serviceable parts inside

MFC after:	20 days
2013-07-28 18:32:27 +00:00
avg
425573ccba x86: detect mwait capabilities and extensions, when present
Reviewed by:	kib (earlier amd64-only version)
MFC after:	2 weeks
2013-07-28 17:54:42 +00:00
jeff
c8d343ae8b - Use kmem_malloc rather than kmem_alloc() for GDT/LDT/tss allocations etc.
This eliminates some unusual uses of that API in favor of more typical
   uses of kmem_malloc().

Discussed with:	kib/alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-07-26 19:06:14 +00:00
neel
8a3e57621d Add support for emulation of the "or r/m, imm8" instruction.
Submitted by:	Zhixiang Yu (zxyu.core@gmail.com)
Obtained from:	GSoC 2013 (AHCI device emulation for bhyve)
2013-07-23 23:43:00 +00:00
neel
50750f842b Fix a bug introduced in r252646 that causes a page with the PG_PTE_PAT bit set
to be interpreted as a superpage. This is because PG_PTE_PAT is at the same
bit position in PTE as PG_PS is in a PDE.

This caused a number of regressions on amd64 systems: panic when starting
X applications, freeze during shutdown etc.

Pointy hat to:	me
Tested by: gperez@entel.upc.edu, joel, dumbbell
Reviewed by: kib
2013-07-23 22:17:00 +00:00
grehan
e5517eb7cf First cut at adding the hyperv drivers to GENERIC.
The files inventory should probably have the modules split
out into net/storage/common etc as the modules build is,
but this will do for now.
2013-07-19 05:32:58 +00:00
kib
c833aa741f MFi386: add ddb "show sysregs" command.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-15 06:30:57 +00:00
kib
336012d28c Clear m->object for the page taken from the delayed free list for
reuse as the pv chink page in reclaim_pv_chunk().  Having non-NULL
m->object is wrong for page not owned by an object and confuses both
vm_page_free_toq() and vm_page_remove() when the page is freed later.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2013-07-10 09:24:03 +00:00
delphij
15305ee17a Import HighPoint DC Series Data Center HBA (DC7280 and R750) driver.
This driver works for FreeBSD/i386 and FreeBSD/amd64 platforms.

Many thanks to HighPoint for providing this driver.

MFC after:	1 day
2013-07-06 07:49:41 +00:00
neel
bfcf8b6dd6 If a superpage mapping is being removed then we need to ignore the PG_PDE_PAT
bit when looking up the vm_page associated with the superpage's physical
address.

If the caching attribute for the mapping is write combining or write protected
then the PG_PDE_PAT bit will be set and thus cause an 'off-by-one' error
when looking up the vm_page.

Fix this by using the PG_PS_FRAME mask to compute the physical address for
a superpage mapping instead of PG_FRAME.

This is a theoretical issue at this point since non-writeback attributes are
currently used only for fictitious mappings and fictitious mappings are not
subject to promotion.

Discussed with:	alc, kib
MFC after:	2 weeks
2013-07-03 23:21:25 +00:00
neel
468b664f74 Verify that all bytes in the instruction buffer are consumed during decoding.
Suggested by:	grehan
2013-07-03 23:05:17 +00:00
grehan
cba9c47f36 Ignore guest PAT settings by default in EPT mappings.
From experimentation, other hypervisors also do this.

Diagnosed by:	tycho nightingale at pluribusnetworks com
Reviewed by:	neel
2013-07-01 20:05:43 +00:00
kib
8a0279994d Fix issues with zeroing and fetching the counters, on x86 and ppc64.
Issues were noted by Bruce Evans and are present on all architectures.

On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32.  If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.

On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot.  The effect would be a counter going off by
arbitrary value after zeroing.  Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive.  On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.

PowerPC64 is treated the same as amd64.

For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching.  It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm).  If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.

Noted by:  bde
Sponsored by:	The FreeBSD Foundation
2013-07-01 02:48:27 +00:00
grehan
abdbf599d1 Make sure all CPUID values are handled, instead of exiting the
bhyve process when an unhandled one is encountered.

Hide some additional capabilities from the guest (e.g. debug store).

This fixes the issue with FreeBSD 9.1 MP guests exiting the VM on
AP spinup (where CPUID is used when sync'ing the TSCs) and the
issue with the Java build where CPUIDs are issued from a guest
userspace.

Submitted by:	tycho nightingale at pluribusnetworks com
Reviewed by:	neel
Reported by:	many
2013-06-28 06:05:33 +00:00
jkim
70506a4dd6 Move definitions required by userland applications out of acpica_machdep.h. 2013-06-27 00:22:40 +00:00
kib
06771a9c87 Allow immediate operand.
Sponsored by:	The FreeBSD Foundation
2013-06-20 14:30:04 +00:00
kib
03abf664bd Some clarifications and updates for the comments, mostly retrieved
from Bruce Evans.  Trim the trailing spaces.

MFC after:	1 week
2013-06-19 05:05:16 +00:00
pluknet
6509325017 Fix a gcc warning uncovered after r251745.
Reported by:	Sergey V. Dyatko
Reviewed by:	neel
2013-06-18 23:31:09 +00:00
gibbs
d4000dfe1d Upgrade Xen interface headers to Xen 4.2.1.
Move FreeBSD from interface version 0x00030204 to 0x00030208.
Updates are required to our grant table implementation before we
can bump this further.

sys/xen/hvm.h:
	Replace the implementation of hvm_get_parameter(), formerly located
	in sys/xen/interface/hvm/params.h.  Linux has a similar file which
	primarily stores this function.

sys/xen/xenstore/xenstore.c:
	Include new xen/hvm.h header file to get hvm_get_parameter().

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
	Correctly protect function definition and variables from being
	included into assembly files in xen-os.h

	Xen memory barriers are now prefixed with "xen_" to avoid conflicts
	with OS native primatives.  Define Xen memory barriers in terms of
	the native FreeBSD primatives.

Sponsored by:	Spectra Logic Corporation
Reviewed by:	Roger Pau Monné
Tested by:	Roger Pau Monné
Obtained from:	Roger Pau Monné (bug fixes)
2013-06-14 23:43:44 +00:00
pluknet
672cd692ff Replace cpusetffs_obj with CPU_FFS, missed in r251703.
Reported by:	bdrewery, O. Hartmann
2013-06-14 10:26:38 +00:00
neel
10fa07d0e3 Remove unused macros PTESHIFT, PDESHIFT, PDPESHIFT and PML4ESHIFT.
Reviewed by:	alc
2013-06-14 00:03:43 +00:00
jeff
7ee88fb112 - Add a BIT_FFS() macro and use it to replace cpusetffs_obj()
Discussed with:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-06-13 20:46:03 +00:00
kib
17884c1f07 Assert that interrupts are enabled in the trap handlers on x86 before
calling generic code to deliver signals.

Discussed with:	bde
Tested by:	pho
MFC after:	1 week
2013-06-03 17:40:05 +00:00
kib
eaa828d3a5 Use slightly more idiomatic expression to get the address of array.
Tested by:	dim, pgj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:39:39 +00:00
kib
4f651f991c The _MC_HASFPXSTATE and _MC_IA32_HASFPXSTATE flags have the same bit
value on purpose, but the ia32 context handling code is logically more
correct to use the _MC_IA32_HASFPXSTATE name for the flag.

Tested by:	dim, pgj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:36:46 +00:00
kib
1ab68ccb64 The ia32_get_mcontext() does not need to set PCB_FULL_IRET. The
usermode context state is not changed by the get operation, and
get_mcontext() does not require full iret as well.

Tested by:	dim, pgj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:31:15 +00:00
kib
b48b0e8ec0 When reporting the fault details, also print %rsp.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:29:20 +00:00
kib
da32b7c314 When handling an exception from the attempt from loading the faulting
context on return from the trap handler, re-enable the interrupts on
i386 and amd64.  The trap return path have to disable interrupts since
the sequence of loading the machine state is not atomic.  The trap()
function which transfers the control to the special handler would
enable the interrupt, but an iret loads the previous eflags with PSL_I
clear.  Then, the special handler calls trap() on its own, which now
sees the original eflags with PSL_I set and does not enable
interrupts.

The end result is that signal delivery and process exiting code could
be executed with interrupts disabled, which is generally wrong and
triggers several assertions.

For amd64, the interrupts are enabled conditionally based on PSL_I in
the eflags of the outer frame, as it is already done for
doreti_iret_fault.  For i386, the interrupts are enabled
unconditionally, the ast loop could have opened a window with
interrupts enabled just before the iret anyway.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:26:08 +00:00
achim
10ab667100 Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver.
Approved by:	scottl (mentor)
2013-05-24 09:22:43 +00:00
attilio
fdf82ef9cf o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
  to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
  operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
  mostl of the times only with readlocks.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-05-21 20:38:19 +00:00
kib
b2b2f0b782 Fix the hardware watchpoints on SMP amd64. Load the updated %dr
registers also on other CPUs, besides the CPU which happens to execute
the ddb.  The debugging registers are stored in the pcpu area,
together with the command which is executed by the IPI stop handler
upon resume.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-21 11:24:32 +00:00
kib
5ee265a48b Add amd64-specific ddb command 'show phys2dmap', which calculates the
address in the direct map corresponding to the given physical address.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-21 11:07:12 +00:00
marcel
65b2bbd1ff Add basic support for FDT to i386 & amd64. This change includes:
1.  Common headers for fdt.h and ofw_machdep.h under x86/include
    with indirections under i386/include and amd64/include.
2.  New modinfo for loader provided FDT blob.
3.  Common x86_init_fdt() called from hammer_time() on amd64 and
    init386() on i386.
4.  Split-off FDT specific low-level console functions from FDT
    bus methods for the uart(4) driver. The low-level console
    logic has been moved to uart_cpu_fdt.c and is used for arm,
    mips & powerpc only. The FDT bus methods are shared across
    all architectures.
5.  Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
    fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from:	Juniper Networks, Inc.
2013-05-21 03:05:49 +00:00
ed
f489ad2fec Improve readability of static assertions for OFFSET_* macros.
Instead of doing all sorts of weird casting of constants to
pointer-pointers, simply use the standard C offsetof() macro to obtain
the offset of the respective fields in the structures.
2013-05-13 21:47:17 +00:00
peter
cec511ab65 Tidy up some CVS workarounds. 2013-05-12 01:53:47 +00:00
rpaulo
73e04ffd11 Fix several standard extended feature bits.
Submitted by:	Oliver Pinter <oliver.pntr at gmail.com>
2013-05-11 01:31:51 +00:00
neel
c5e619b651 Support array-type of stats in bhyve.
An array-type stat in vmm.ko is defined as follows:
VMM_STAT_ARRAY(IPIS_SENT, VM_MAXCPU, "ipis sent to vcpu");

It is incremented as follows:
vmm_stat_array_incr(vm, vcpuid, IPIS_SENT, array_index, 1);

And output of 'bhyvectl --get-stats' looks like:
ipis sent to vcpu[0]     3114
ipis sent to vcpu[1]     0

Reviewed by:	grehan
Obtained from:	NetApp
2013-05-10 02:59:49 +00:00
dchagin
b6dc2308e1 Retire write-only PCB_GS32BIT pcb flag on amd64. 2013-05-09 21:42:43 +00:00
kib
e51caee784 Correct the type for the literal used on the left side of the shift up
to 63 bit positions.

Do not fill the save area and do not set the saved bit in the xstate
bit vector for the state which is not marked as enabled in xsave_mask.

Reported and tested by:	Jim Ohlstein <jim@ohlste.in>
MFC after:	3 days
2013-05-09 17:25:29 +00:00
attilio
b24a52ec9e Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
emaste
607cfd414a Switch to standard copyright license text
The initial version of this came from Sandvine but had "PROVIDED BY NETAPP,
INC" in the copyright text, presuambly because the license block was copied
from another file.  Replace it with standard "AUTHOR AND CONTRIBUTORS" form.

Approvided by: grehan@
2013-05-02 12:35:15 +00:00
kib
19e5409088 Partially saved extended state must be handled always, i.e. for both
fpu-owned context, and for pcb-saved one.  More, the XSAVE could do
partial save, same as XSAVEOPT, so qualifier for the handler should be
use_xsave and not use_xsaveopt.

Since xsave_area_desc is now needed regardless of the XSAVEOPT use,
remove the write-only use_xsaveopt variable.

In collaboration with:	jhb
MFC after:	1 week
2013-05-01 20:08:33 +00:00
kib
f2ccbf32fb The check to ensure that xstate_bv always has XFEATURE_ENABLED_X87 and
XFEATURE_ENABLED_SSE bits set is not needed.  CPU correctly handles
any bitmask which is subset of the enabled bits in %XCR0.

More, CPU instructions XSAVE and XSAVEOPT could write the mask without
e.g. XFEATURE_ENABLED_SSE, after the VZEROALL.  The check prevents the
restoration of the otherwise valid FPU save area.

In collaboration with:	jhb
MFC after:	1 week
2013-05-01 20:03:50 +00:00
carl
542feb6d72 Add a new driver to support the Intel Non-Transparent Bridge(NTB).
The NTB allows you to connect two systems with this device using a PCI-e
link. The driver is made of two modules:
 - ntb_hw which is a basic hardware abstraction layer for the device.
 - if_ntb which implements the ntb network device and the communication
   protocol.

The driver is limited at the moment to CPU memcpy instead of using DMA, and
only Back-to-Back mode is supported. Also the network device isn't full
featured yet. These changes will be coming soon. The DMA change will also
bring in the ioat driver from the project branch it is on now.

This is an initial port of the GPL/BSD Linux driver contributed by Jon Mason
from Intel. Any bugs are my contributions.

Sponsored by: Intel
Reviewed by: jimharris, joel (man page only)
Approved by: jimharris (mentor)
2013-04-29 22:48:53 +00:00
grehan
d46f135643 Add RIP-relative addressing to the instruction decoder.
Rework the guest register fetch code to allow the RIP to
be extracted from the VMCS while the kernel decoder is
functioning.

Hit by the OpenBSD local-apic code.

Submitted by:	neel
Reviewed by:	grehan
Obtained from:	NetApp
2013-04-25 04:56:43 +00:00
rpaulo
ba95edba6f Print RDSEED, ADX, and SMAP.
Pointed out by:	kib
2013-04-18 01:21:44 +00:00
gabor
5635c6550b - Correct spelling in comments
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:56:11 +00:00
rpaulo
1f4363857d Print more bits from the standard extended features CPUID which will be
available in the Haswell architecture (c.f. Intel Document #319433-012A).
2013-04-17 06:51:17 +00:00
neel
df29079506 Create sysctl node 'hw.vmm.vmx' and populate it with oids that expose the VMX
hardware capabilities.

Obtained from:	NetApp
2013-04-13 21:41:51 +00:00
kib
7117978a81 Fix the name of the pcb member in the comments.
Submitted by:	Oliver Pinter <oliver.pntr@gmail.com>
MFC after:	3 days
2013-04-13 15:20:33 +00:00
neel
f80ac5cabf Use the MAKEDEV_CHECKNAME flag to check for an invalid device name and return
an error instead of panicking.

Obtained from:	NetApp
2013-04-13 05:11:21 +00:00
trasz
80b8b2f779 Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE'
and kern.cam.ctl.disable tunable; those were introduced as a workaround
to make it possible to boot GENERIC on low memory machines.

With ctl(4) being built as a module and automatically loaded by ctladm(8),
this makes CTL work out of the box.

Reviewed by:	ken
Sponsored by:	FreeBSD Foundation
2013-04-12 16:25:03 +00:00