Commit Graph

551 Commits

Author SHA1 Message Date
kib
60f7add83d Registers definitions for the new capabilities from the version 2.4 of
VT-d specification.  Also add definitions for the interrupt remapping
table and IEC.

Print new capabilities on boot. although there is no hardware which
support it.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-02-11 23:30:46 +00:00
kib
e2cc9f9eca vm_page_lookup() accepts read-locked object.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-02-11 23:28:28 +00:00
kib
0754f0eac9 Add x2APIC support. Enable it by default if CPU is capable. The
hw.x2apic_enable tunable allows disabling it from the loader prompt.

To closely repeat effects of the uncached memory ops when accessing
registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded
by mfence, except for the EOI notifications.  This is probably too
strict, only ICR writes to send IPI require serialization to ensure
that other CPUs see the previous actions when IPI is delivered.  This
may be changed later.

In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
inline, to handle corner conditions.

Note that the patch only switches LAPICs into x2APIC mode. It does not
enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC
MADT entries and doing interrupts remapping, but is the required step
on the way.

Reviewed by:	neel
Tested by:	pho (real hardware), neel (on bhyve)
Discussed with:	jhb, grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-09 21:00:56 +00:00
jhb
9dfc36d985 Revert the IPI startup sequence to match what is described in the
Intel Multiprocessor Specification v1.4.  The Intel SDM claims that
the INIT IPIs here are invalid, but other systems follow the MP
spec instead.

While here, fix the IPI wait routine to accept a timeout in microseconds
instead of a raw spin count, and don't spin forever during AP startup.
Instead, panic if a STARTUP IPI is not delivered after 20 us.

PR:		196542
Differential Revision:	https://reviews.freebsd.org/D1719
MFC after:	2 weeks
2015-02-06 18:19:59 +00:00
bryanv
7a68cf4e59 Add interface to derive a TSC frequency from the pvclock
This can later use this to determine the TSC frequency like is done with
VMware, instead of using a DELAY loop that is not always accurate in an VM.

MFC after:	1 month
2015-02-04 08:33:04 +00:00
bryanv
25ce9181cf Generalized parts of the XEN timer code into a generic pvclock
KVM clock shares the same data structures between the guest and the host
as Xen so it makes sense to just have a single copy of this code.

Differential Revision: https://reviews.freebsd.org/D1429
Reviewed by:	royger (eariler version)
MFC after:	1 month
2015-02-04 08:26:43 +00:00
jhb
86833ce103 Opt for performance over power-saving on Intel CPUs that have a
P-state but not C-state invariant TSC by changing the default behavior
to leaving the TSC enabled as the timecounter and disabling C2+ instead
of disabling the TSC by default.

Discussed with:		jkim
Tested by:		Jan Kokemuller <jan.kokemueller@gmail.com>
2015-01-29 20:41:42 +00:00
royger
313e96ad33 loader: fix the size of MODINFOMD_MODULEP
The data in MODINFOMD_MODULEP is packed by the loader as a 4 byte type, but
the amd64 kernel expects a vm_paddr_t, which is of size 8 bytes. Fix this by
saving it as 8 bytes in the loader and retrieving it using the proper type
in the kernel.

Sponsored by: Citrix Systems R&D
2015-01-20 12:28:24 +00:00
neel
1fe4c6403b Update the vdso timehands only via tc_windup().
Prior to this change CLOCK_MONOTONIC could go backwards when the timecounter
hardware was changed via 'sysctl kern.timecounter.hardware'. This happened
because the vdso timehands update was missing the special treatment in
tc_windup() when changing timecounters.

Reviewed by:	kib
2015-01-20 03:54:30 +00:00
imp
5dd3f4fde2 Include mca_machdep.h. 2015-01-18 03:43:47 +00:00
imp
503404cbf7 Need to include opt_mca.h to test for DEV_MCA. 2015-01-17 02:17:59 +00:00
royger
0c5b62d3d2 loader: implement multiboot support for Xen Dom0
Implement a subset of the multiboot specification in order to boot Xen
and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot
implementation is tailored to boot Xen and FreeBSD Dom0, and it will
most surely fail to boot any other multiboot compilant kernel.

In order to detect and boot the Xen microkernel, two new file formats
are added to the bootloader, multiboot and multiboot_obj. Multiboot
support must be tested before regular ELF support, since Xen is a
multiboot kernel that also uses ELF. After a multiboot kernel is
detected, all the other loaded kernels/modules are parsed by the
multiboot_obj format.

The layout of the loaded objects in memory is the following; first the
Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to
long mode by itself), after that the FreeBSD kernel is loaded as a RAW
file (Xen will parse and load it using it's internal ELF loader), and
finally the metadata and the modules are loaded using the native
FreeBSD way. After everything is loaded we jump into Xen's entry point
using a small trampoline. The order of the multiboot modules passed to
Xen is the following, the first module is the RAW FreeBSD kernel, and
the second module is the metadata and the FreeBSD modules.

Since Xen will relocate the memory position of the second
multiboot module (the one that contains the metadata and native
FreeBSD modules), we need to stash the original modulep address inside
of the metadata itself in order to recalculate its position once
booted. This also means the metadata must come before the loaded
modules, so after loading the FreeBSD kernel a portion of memory is
reserved in order to place the metadata before booting.

In order to tell the loader to boot Xen and then the FreeBSD kernel the
following has to be added to the /boot/loader.conf file:

xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga"
xen_kernel="/boot/xen"

The first argument contains the command line that will be passed to the Xen
kernel, while the second argument is the path to the Xen kernel itself. This
can also be done manually from the loader command line, by for example
typing the following set of commands:

OK unload
OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga
OK load kernel
OK load zfs
OK load if_tap
OK load ...
OK boot

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D517

For the Forth bits:
Submitted by: Julien Grall <julien.grall AT citrix.com>
2015-01-15 16:27:20 +00:00
kib
11969484c8 For x86, read MAXPHYADDR, defined in SDM vol 3 4.1.4 Enumeration of Paging
Features by CPUID as CPUID.80000008H:EAX[7:0], into variable cpu_maxphyaddr.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-12 07:36:25 +00:00
kib
5757b9b9b2 Right now, for non-coherent DMARs, page table update code flushes the
cache for whole page containing modified pte, and more, only last page
in the series of the consequtive pages is flushed (i.e. the affected
mappings should be larger than 2MB).

Avoid excessive flushing and do missed neccessary flushing, by
splitting invalidation and unmapping.  For now, flush exactly the
range of the changed pte.  This is still somewhat bigger than
neccessary, since pte is 8 bytes, while cache flush line is at least
32 bytes.

The originator of the issue reports that after the change,
'dmar_bus_dmamap_unload went from 13,288 cycles down to
3,257. dmar_bus_dmamap_load_buffer went from 9,686 cycles down to
3,517.  and I am now able to get line 1GbE speed with Netperf TCP
(even with 1K message size).'

Diagnosed and tested by:	Nadav Amit <nadav.amit@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-11 20:27:15 +00:00
kib
4fe2e4b2c6 Fix calculation of requester for PCI device behind PCIe/PCI bridge.
In my case on the test machine, I have hierarchy of
pcib2 (PCIe port on host bridge with PCIe capability) -> pci2 ->
    pcib3 (ITE PCIe/PCI bridge) -> pci3 -> em1

The device to check PCIe capability is pcib2 and not pcib3, as it is
currently done in the code.  Also, in case of the bridge, we shall
step to pcib2 for the loop iteration, since pcib3 does not carry PCIe
capability info and would force wrong recalculation of rid.

Also change the returned requester to the PCIe bus which provides port
for the bridge.  This only results in changing
hw.busdma.pciX.X.X.X.bounce tunable to force identity-mapped context
for the device.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-10 23:12:49 +00:00
kib
b1c8acc0bc Print rid when announcing DMAR context creation. Print sid when fault
occurs.  This allows to connect dots in case the requester is
calculated erronously.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-10 22:57:08 +00:00
kib
4a8a592617 Fix DMAR context allocations for the devices behind PCIe->PCI bridges
after dmar driver was converted to use rids.  The bus component to
calculate context page must be taken from the requestor rid, which is
a bridge, and not from the device bus number.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-09 02:10:44 +00:00
sbruno
8d4db71d65 Update Features2 to display SDBG capability of processor. This is
showing up on Haswell-class CPUs

From the Intel SDM, "Table 3-20. Feature Information Returned in the
ECX Register"

11 | SDBG | A value of 1 indicates the processor supports
IA32_DEBUG_INTERFACE MSR for silicon debug.

Submitted by:	jiashiun@gmail.com
Reviewed by:	jhb neel
MFC after:	2 weeks
2015-01-08 16:50:35 +00:00
jhb
06e75f0dba Create a cpuset mask for each NUMA domain that is available in the
kernel via the global cpuset_domain[] array. To export these to userland,
add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a
specific domain. Add a -d flag to cpuset(1) that can be used to fetch
the mask for a given domain.

Differential Revision:	https://reviews.freebsd.org/D1232
Submitted by:	jeff (kernel bits)
Reviewed by:	adrian, jeff
2015-01-08 15:53:13 +00:00
markj
7e7e145818 Factor out duplicated code from dumpsys() on each architecture into generic
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.

PR:		193873
Differential Revision:	https://reviews.freebsd.org/D904
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by:	jhibbits (earlier version)
Sponsored by:	EMC / Isilon Storage Division
2015-01-07 01:01:39 +00:00
jhb
55d0376a65 On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST).  Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.

PR:		192316
Differential Revision:	https://reviews.freebsd.org/D1441
No objection from:	jkim
MFC after:	1 week
2015-01-05 20:44:44 +00:00
hselasky
a96e5946f7 Fix warning about possible use of uninitialized variable. 2015-01-02 08:42:44 +00:00
royger
9d0e5e2b5e xen/intr: balance dynamic interrupts across available vCPUs
By default Xen binds all event channels to vCPU#0, and FreeBSD only shuffles
the interrupt sources once, at the end of the boot process. Since new event
channels might be created after this point (because new devices or backends
are added), try to automatically shuffle them at creation time.

This does not affect VIRQ or IPI event channels, that are already bound to a
specific vCPU as requested by the caller.

Sponsored by: Citrix Systems R&D
2014-12-10 13:25:21 +00:00
royger
f5723debac xen: mask event channels while binding them to a vCPU
Mask the event channel source before trying to bind it to a CPU, this
prevents stray interrupts from firing while assigning them and hitting the
KASSERT in xen_intr_handle_upcall.

Sponsored by: Citrix Systems R&D
2014-12-10 11:42:02 +00:00
royger
e09f127692 xen: convert the Grant-table code to a NewBus device
This allows the Grant-table code to attach directly to the xenpv bus,
allowing us to remove the grant-table initialization done in xenpv.

Sponsored by: Citrix Systems R&D
2014-12-10 11:35:41 +00:00
royger
1f62e17066 xen: create a new PCI bus override
When running as a Xen PVH Dom0 we need to add custom buses that override
some of the functionality present in the ACPI PCI Bus and the PCI Bus. We
currently override the ACPI PCI Bus, but not the PCI Bus, so add a new
override for the PCI Bus and share the generic functions between them.

Reported by: David P. Discher <dpd@dpdtech.com>
Sponsored by: Citrix Systems R&D

conf/files.amd64:
 - Add the new files.

x86/xen/xen_pci_bus.c:
 - Generic file that contains the PCI overrides so they can be used by the
   several PCI specific buses.

xen/xen_pci.h:
 - Prototypes for the generic overried functions.

dev/xen/pci/xen_pci.c:
 - Xen specific override for the PCI bus.

dev/xen/pci/xen_acpi_pci.c:
 - Xen specific override for the ACPI PCI bus.
2014-12-09 18:03:25 +00:00
royger
ba7194c81a xen: notify ACPI about SCI override
If the SCI is remapped to a non-ISA global interrupt notify the ACPI
subsystem about the override.

Reported by: David P. Discher <dpd@dpdtech.com>
Sponsored by: Citrix Systems R&D
2014-12-09 11:12:24 +00:00
jhb
1671ac9155 Improve support for XSAVE with debuggers.
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
  to match what Linux does in that 1) it dumps the entire XSAVE area
  including the fxsave state, and 2) it stashes a copy of the current
  xsave mask in the unused padding between the fxsave state and the
  xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
  only the extra portion. This avoids having to always make two
  ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
  XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
  and the current XSAVE mask (%xcr0).

Differential Revision:	https://reviews.freebsd.org/D1193
Reviewed by:	kib
MFC after:	2 weeks
2014-11-21 20:53:17 +00:00
jhb
fdfced8ce8 MFamd64: Add support for extended FPU states on i386. This includes
support for AVX on i386.
- Similar to amd64, move the FPU save area out of the PCB and instead
  store saved FPU state in a variable-sized buffer after the PCB on the
  stack.
- To support the variable PCB location, alter the locore code to only use
  the bottom-most page of proc0stack for init386().  init386() returns
  the correct stack pointer to locore which adjusts the stack for thread0
  before calling mi_startup().
- Don't bother setting cr3 in thread0's pcb in locore before calling
  init386().  It wasn't used (init386() overwrote it at the end) and
  it doesn't work with the variable-sized FPU save area.
- Remove the new-bus attachment from npx.  This was only ever useful for
  external co-processors using IRQ13, but those have not been supported
  for several years.  npxinit() is now called much earlier during boot
  (init386()) similar to amd64.
- Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE.
- npxsave() is now only called from context switch contexts so it can
  use XSAVEOPT.

Differential Revision:	https://reviews.freebsd.org/D1058
Reviewed by:	kib
Tested on:	FreeBSD/i386 VM under bhyve on Intel i5-2520
2014-11-02 22:58:30 +00:00
jhb
d47eb7d2d4 Rework virtual machine hypervisor detection.
- Move the existing code to x86/x86/identcpu.c since it is x86-specific.
- If the CPUID2_HV flag is set, assume a hypervisor is present and query
  the 0x40000000 leaf to determine the hypervisor vendor ID.  Export the
  vendor ID and the highest supported hypervisor CPUID leaf via
  hv_vendor[] and hv_high variables, respectively.  The hv_vendor[]
  array is also exported via the hw.hv_vendor sysctl.
- Merge the VMWare detection code from tsc.c into the new probe in
  identcpu.c.  Add a VM_GUEST_VMWARE to identify vmware and use that in
  the TSC code to identify VMWare.

Differential Revision:	https://reviews.freebsd.org/D1010
Reviewed by:	delphij, jkim, neel
2014-10-28 19:17:44 +00:00
grehan
1d26d798b2 Output a summary of optional SVM features in dmesg similar to CPU features.
If bootverbose is enabled, a detailed list is provided; otherwise, a
single-line summary is displayed.

Differential Revision:	https://reviews.freebsd.org/D1008
Reviewed by:	jhb, neel
MFC after:	1 week
2014-10-27 22:02:35 +00:00
royger
919b7d8b7c xen: implement the privcmd user-space device
This device is only attached to priviledged domains, and allows the
toolstack to interact with Xen. The two functions of the privcmd
interface is to allow the execution of hypercalls from user-space, and
the mapping of foreign domain memory.

Sponsored by: Citrix Systems R&D

i386/include/xen/hypercall.h:
amd64/include/xen/hypercall.h:
 - Introduce a function to make generic hypercalls into Xen.

xen/interface/xen.h:
xen/interface/memory.h:
 - Import the new hypercall XENMEM_add_to_physmap_range used by
   auto-translated guests to map memory from foreign domains.

dev/xen/privcmd/privcmd.c:
 - This device has the following functions:
   - Allow user-space applications to make hypercalls into Xen.
   - Allow user-space applications to map memory from foreign domains,
     this is accomplished using the newly introduced hypercall
     (XENMEM_add_to_physmap_range).

xen/privcmd.h:
 - Public ioctl interface for the privcmd device.

x86/xen/hvm.c:
 - Remove declaration of hypercall_page, now it's declared in
   hypercall.h.

conf/files:
 - Add the privcmd device to the build process.
2014-10-22 17:07:20 +00:00
royger
bca2091349 xen: allow to register event channels without handlers
This is needed by the event channel user-space device, that requires
registering event channels without unmasking them. intr_add_handler
will unconditionally unmask the event channel, so we avoid calling it
if no filter/handler is provided, and then the user will be in charge
of calling it when ready.

In order to do this, we need to change the opaque type
xen_intr_handle_t to contain the event channel port instead of the
opaque cookie returned by intr_add_handler, since now registration of
event channels without handlers are allowed. The cookie will now be
stored inside of the private xenisrc struct. Also, introduce a new
function called xen_intr_add_handler that allows adding a
filter/handler after the event channel has been registered.

Sponsored by: Citrix Systems R&D

x86/xen/xen_intr.c:
 - Leave the event channel without a handler if no filter/handler is
   provided to xen_intr_bind_isrc.
 - Don't perform an evtchn_mask_port, intr_add_handler will already do
   it.
 - Change the opaque type xen_intr_handle_t to contain a pointer to
   the event channel port number, and make the necessary changes to
   related functions.
 - Introduce a new function called xen_intr_add_handler that can be
   used to add filter/handlers to an event channel after registration.

xen/xen_intr.h:
 - Add prototype of xen_intr_add_handler.
2014-10-22 16:51:52 +00:00
royger
040f3ac494 xen: fix usage of kern_getenv in PVH code
The value returned by kern_getenv should be freed using freeenv.

Reported by:	Coverity
CID:		1248852
Sponsored by: Citrix Systems R&D
2014-10-22 16:49:00 +00:00
marcel
48e5a4e056 Virtual machines can easily have more than 16 option ROMs and
when that happens, we happily access our resource array out of
bounds. Make sure we stay within the MAX_ROMS limit.
While here, bump MAX_ROMS from 16 to 32 to minimize the chance
of leaving option ROMs unaccounted for.

Obtained from:	Juniper Networks, Inc.
2014-10-22 01:37:32 +00:00
hselasky
49c137f7be Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
neel
be8b3ca439 Merge from projects/bhyve_svm all the changes outside vmm.ko or bhyve utilities:
Add support for AMD's nested page tables in pmap.c:
- Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit)
  for a pmap of type PT_RVI.
- Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of
  type PT_EPT or PT_RVI.

Add CPU_SET_ATOMIC_ACQ(num, cpuset):
This is used when activating a vcpu in the nested pmap. Using the 'acquire'
variant guarantees that the load of the 'pm_eptgen' will happen only after
the vcpu is activated in 'pm_active'.

Add defines for various AMD-specific MSRs.

Submitted by:	Anish Gupta (akgupt3@gmail.com)
2014-10-20 18:09:33 +00:00
davide
e88bd26b3f Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
neel
f443215307 Support Intel-specific MSRs that are accessed when booting up a linux in bhyve:
- MSR_PLATFORM_INFO
- MSR_TURBO_RATIO_LIMITx
- MSR_RAPL_POWER_UNIT

Reviewed by:	grehan
MFC after:	1 week
2014-10-09 19:13:33 +00:00
adrian
678ef3c9f9 Missing from previous commit - keep the VM domain -> PXM mapping
array and use it to map PXM -> VM domain when needed.

Differential Revision:	D906
Reviewed by:	jhb
2014-10-09 05:34:28 +00:00
markj
0ebf86e1b1 Pass up the error status of minidumpsys() to its callers.
PR:		193761
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by:	EMC / Isilon Storage Division
2014-10-08 20:25:21 +00:00
jhb
ec52dc2e32 Fix build for i386 kernels with out 'I686_CPU'.
PR:		193660
Submitted by:	holger@freyther.de
2014-10-06 18:11:05 +00:00
royger
0d6c943749 xen: add the Xen implementation of pci_child_added method
Add the Xen specific implementation of pci_child_added to the Xen PCI
bus. This is needed so FreeBSD can register the devices it finds with
the hypervisor.

Sponsored by: Citrix Systems R&D

x86/xen/xen_pci.c:
 - Add the Xen pci_child_added method.
2014-09-30 16:49:17 +00:00
royger
c5a5f5947f msi: add Xen MSI implementation
This patch adds support for MSI interrupts when running on Xen. Apart
from adding the Xen related code needed in order to register MSI
interrupts this patch also makes the msi_init function a hook in
init_ops, so different MSI implementations can have different
initialization functions.

Sponsored by: Citrix Systems R&D

xen/interface/physdev.h:
 - Add the MAP_PIRQ_TYPE_MULTI_MSI to map multi-vector MSI to the Xen
   public interface.

x86/include/init.h:
 - Add a hook for setting custom msi_init methods.

amd64/amd64/machdep.c:
i386/i386/machdep.c:
 - Set the default msi_init hook to point to the native MSI
   initialization method.

x86/xen/pv.c:
 - Set the Xen MSI init hook when running as a Xen guest.

x86/x86/local_apic.c:
 - Call the msi_init hook instead of directly calling msi_init.

xen/xen_intr.h:
x86/xen/xen_intr.c:
 - Introduce support for registering/releasing MSI interrupts with
   Xen.
 - The MSI interrupts will use the same PIC as the IO APIC interrupts.

xen/xen_msi.h:
x86/xen/xen_msi.c:
 - Introduce a Xen MSI implementation.

x86/xen/xen_nexus.c:
 - Overwrite the default MSI hooks in the Xen Nexus to use the Xen MSI
   implementation.

x86/xen/xen_pci.c:
 - Introduce a Xen specific PCI bus that inherits from the ACPI PCI
   bus and overwrites the native MSI methods.
 - This is needed because when running under Xen the MSI messages used
   to configure MSI interrupts on PCI devices are written by Xen
   itself.

dev/acpica/acpi_pci.c:
 - Lower the quality of the ACPI PCI bus so the newly introduced Xen
   PCI bus can take over when needed.

conf/files.i386:
conf/files.amd64:
 - Add the newly created files to the build process.
2014-09-30 16:46:45 +00:00
royger
890b160ee5 xen: add proper copyright attribution
Noted by:	jmallett
2014-09-26 09:05:55 +00:00
royger
494dc32ba6 ddb: allow specifying the exact address of the symtab and strtab
When the FreeBSD kernel is loaded from Xen the symtab and strtab are
not loaded the same way as the native boot loader. This patch adds
three new global variables to ddb that can be used to specify the
exact position and size of those tables, so they can be directly used
as parameters to db_add_symbol_table. A new helper is introduced, so callers
that used to set ksym_start and ksym_end can use this helper to set the new
variables.

It also adds support for loading them from the Xen PVH port, that was
previously missing those tables.

Sponsored by: Citrix Systems R&D
Reviewed by:	kib

ddb/db_main.c:
 - Add three new global variables: ksymtab, kstrtab, ksymtab_size that
   can be used to specify the position and size of the symtab and
   strtab.
 - Use those new variables in db_init in order to call db_add_symbol_table.
 - Move the logic in db_init to db_fetch_symtab in order to set ksymtab,
   kstrtab, ksymtab_size from ksym_start and ksym_end.

ddb/ddb.h:
 - Add prototype for db_fetch_ksymtab.
 - Declate the extern variables ksymtab, kstrtab and ksymtab_size.

x86/xen/pv.c:
 - Add support for finding the symtab and strtab when booted as a Xen
   PVH guest. Since Xen loads the symtab and strtab as NetBSD expects
   to find them we have to adapt and use the same method.

amd64/amd64/machdep.c:
arm/arm/machdep.c:
i386/i386/machdep.c:
mips/mips/machdep.c:
pc98/pc98/machdep.c:
powerpc/aim/machdep.c:
powerpc/booke/machdep.c:
sparc64/sparc64/machdep.c:
 - Use the newly introduced db_fetch_ksymtab in order to set ksymtab,
   kstrtab and ksymtab_size.
2014-09-25 08:28:10 +00:00
neel
46721cc2c7 Restructure the MSR handling so it is entirely handled by processor-specific
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.

The VT-x code has the following types of MSRs:

- MSRs that are unconditionally saved/restored on every guest/host context
  switch (e.g., MSR_GSBASE).

- MSRs that are restored to guest values on entry to vmx_run() and saved
  before returning. This is an optimization for MSRs that are not used in
  host kernel context (e.g., MSR_KGSBASE).

- MSRs that are emulated and every access by the guest causes a trap into
  the hypervisor (e.g., MSR_IA32_MISC_ENABLE).

Reviewed by:	grehan
2014-09-20 02:35:21 +00:00
adrian
e4c630d701 Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather
than u_char.

Migrate post_filter to use an int for a CPU rather than u_char.

Change intr_event_bind() to use an int for CPU rather than u_char.

It touches the ppc, sparc64, arm and mips machdep code but it should
(hah!) be a no-op.

Tested:

* i386, AMD64 laptops

Reviewed by:	jhb
2014-09-17 17:33:22 +00:00
royger
522c50de15 xen: don't set suspend/resume methods for the PIRQ PIC
The suspend/resume of event channels is already handled by the xen_intr_pic.
If those methods are set on the PIRQ PIC they are just called twice, which
breaks proper resume. This fix restores migration of FreeBSD guests to a
working state.

Sponsored by: Citrix Systems R&D
2014-09-15 15:15:52 +00:00
jhb
6f8d6cd57b To workaround an errata on certain Pentium Pro CPUs, i386 disables
the local APIC in initializecpu() and re-enables it if the APIC code
decides to use the local APIC after all.  Rework this workaround
slightly so that initializecpu() won't re-disable the local APIC if
it is called after the APIC code re-enables the local APIC.
2014-09-10 21:25:54 +00:00
jhb
2a48d5d52c Move code to set various MSRs on AMD cpus out of printcpuinfo() and
into initalizecpu() instead.
2014-09-10 21:04:44 +00:00
kib
409097f5b7 Add a define for index of IA32_XSS MSR, which is, per SDM rev. 50, an
analog of XCR0 for ring 0 FPU state, used by XSAVES and XRSTORS.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-06 19:47:37 +00:00
kib
a52df371a9 SDM rev. 50 defines the use of the next 8 bytes in the xstate header.
It is the compaction bitmask, with the highest bit defining if compact
format of the xsave area is used at all.

Adjust the definition of struct xstate_hdr, provide define for bit 63.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-06 19:39:12 +00:00
kib
51e3f3be5c Add more bits for the XSAVE features from CPUID 0xd, sub-function 1
%eax report.

Print the XSAVE features 0xd/1 in the boot banner.  The printcpuinfo()
is executed late enough so that XSAVE is already enabled.

There is no known to me off the shelf hardware that implements any
feature bits except XSAVEOPT, the list is taken from SDM rev. 50.  The
banner printing will allow us to note the hardware arrival.

Sponsored by:	    The FreeBSD Foundation
MFC after:	    1 week
2014-09-06 15:45:45 +00:00
jhb
3a8cf1a38b Create a separate structure for per-CPU state saved across suspend and
resume that is a superset of a pcb.  Move the FPU state out of the pcb and
into this new structure.  As part of this, move the FPU resume code on
amd64 into a C function.  This allows resumectx() to still operate only on
a pcb and more closely mirrors the i386 code.

Reviewed by:	kib (earlier version)
2014-09-06 15:23:28 +00:00
jhb
f7c94cc497 Merge the amd64 and i386 identcpu.c into a single x86 implementation.
This brings the structured extended features mask and VT-x reporting to
i386 and Intel cache and TLB info (under bootverbose) to amd64.
2014-09-04 14:26:25 +00:00
jhb
f937116d20 - Move blacklists of broken TSCs out of the printcpuinfo() function
and into the TSC probe routine.
- Initialize cpu_exthigh once in finishidentcpu() which is called
  before printcpuinfo() (and matches the behavior on amd64).
2014-09-04 02:25:59 +00:00
jhb
59920f0385 Save and restore FPU state across suspend and resume. In earlier revisions
of this patch, resumectx() called npxresume() directly, but that doesn't
work because resumectx() runs with a non-standard %cs selector.  Instead,
all of the FPU suspend/resume handling is done in C.

MFC after:	1 week
2014-08-30 17:48:38 +00:00
royger
c934f5fd28 atpic: make sure atpic_init is called after IO APIC initialization
After r269510 the IO APIC and ATPIC initialization is done at the same
order, which means atpic_init can be called before the IO APIC has
been initalized. In that case the ATPIC will take over the interrupt
sources, preventing the IO APIC from registering them.

Reported by: David Wolfskill <david@catwhisker.org>
Tested by: David Wolfskill <david@catwhisker.org>,
           Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
Sponsored by: Citrix Systems R&D
2014-08-07 17:00:50 +00:00
royger
8daf97263e xen: add ACPI bus to xen_nexus when running as Dom0
Also disable a couple of ACPI devices that are not usable under Dom0.
To this end a couple of booleans are added that allow disabling ACPI
specific devices.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb

x86/xen/xen_nexus.c:
 - Return BUS_PROBE_SPECIFIC in the Xen Nexus attachement routine to
   force the usage of the Xen Nexus.
 - Attach the ACPI bus when running as Dom0.

dev/acpica/acpi_cpu.c:
dev/acpica/acpi_hpet.c:
dev/acpica/acpi_timer.c
 - Add a variable that gates the addition of the devices.

x86/include/init.h:
 - Declare variables that control the attachment of ACPI cpu, hpet and
   timer devices.
2014-08-04 09:05:28 +00:00
royger
937cdd0a36 xen: implement support for mapping IO APIC interrupts on Xen
Allow a privileged Xen guest (Dom0) to parse the MADT ACPI interrupt
overrides and register them with the interrupt subsystem.

Also add a Xen specific implementation for bus_config_intr that
registers interrupts on demand for all the vectors less than
FIRST_MSI_INT.

Sponsored by: Citrix Systems R&D

x86/xen/pvcpu_enum.c:
 - Use helper functions from x86/acpica/madt.c in order to parse
   interrupt overrides from the MADT.
 - Walk the MADT and register any interrupt override with the
   interrupt subsystem.

x86/xen/xen_nexus.c:
 - Add a custom bus_config_intr method for Xen that intercepts calls
   to configure unset interrupts and registers them on the fly (if the
   vector is < FIRST_MSI_INT).
2014-08-04 09:01:21 +00:00
royger
afa7324d1a x86/madt: make the interrupt override parser a public function
Split a portion of the code in madt_parse_interrupt_override to a
separate function, that is public and can be used from other code.
This will be needed by the Xen port, since FreeBSD needs to parse the
interrupt overrides and notify Xen about them.

This commit should not introduce any functional change.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb, gibbs

x86/acpica/madt.c:
 - Introduce madt_parse_interrupt_values() that parses the intr
   information from ACPI and returns the triggering and the polarity.
   This is a subset of the functionality that used to be part of
   madt_parse_interrupt_override().
 - Make madt_found_sci_override a global variable that can be used
   from other files.

x86/include/acpica_machdep.h:
 - Prototype of madt_parse_interrupt_values.
 - Extern declaration of madt_found_sci_override.
2014-08-04 08:58:50 +00:00
royger
668dd4b0cb xen: change quality of the MADT ACPI enumerator
Lower the quality of the MADT ACPI enumerator, so on Xen Dom0 we can
force the usage of the Xen mptable enumerator even when ACPI is
detected.

This is needed because Xen might restrict the number of vCPUs
available to Dom0, but the MADT ACPI table parsed in FreeBSD is the
native one (which enumerates all the CPUs available in the system).

Sponsored by: Citrix Systems R&D
Reviewed by: gibbs

x86/acpica/madt.c:
 - Lower MADT enumerator quality to -50.

x86/xen/pvcpu_enum.c:
 - Rise Xen PV enumerator to 0.
2014-08-04 08:56:20 +00:00
royger
1bfb01ea6b xen: change order of Xen intr init and IO APIC registration
This change inserts the Xen interrupt subsystem (event channels)
initialization between the system interrupt initialization and the IO
APIC source registration.

This is needed when running on Dom0, that routes physical interrupts
on top of event channels, so that the interrupt sources found during
IO APIC initialization can be registered using the Xen interrupt
subsystem.

The resulting order in the SI_SUB_INTR stage is the following:

- System intr initialization
- Xen intr initalization
- IO APIC source registration

Sponsored by: Citrix Systems R&D

x86/x86/local_apic.c:
 - Change order of apic_setup_io to be called after xen interrupt
   subsystem is setup.

x86/xen/xen_intr.c:
 - Init Xen event channels before apic_setup_io.
2014-08-04 08:54:34 +00:00
royger
3c669f2b53 xen: add a DDB command to print event channel information
Add a new DDB command to dump all registered event channels.

Sponsored by: Citrix Systems R&D

x86/xen/xen_intr.c:
 - Add a new xen_evtchn command to DDB in order to dump all
   information related to event channels.
2014-08-04 08:52:10 +00:00
royger
8d27fa514f xen: mask all event channels on init
Mask all event channels during initialization. This is done so that we
don't receive spurious interrupts while dynamically registering new
event channels. There's a small window during registration where an
event channel can fire before we have attached a handler to it.

Sponsored by: Citrix Systems R&D

x86/xen/xen_intr.c:
 - Mask all event channels on init.
2014-08-04 08:43:27 +00:00
royger
eb7b09e785 xen: implement event channel PIRQ support
This allows Dom0 to manage physical hardware, redirecting the
physical interrupts to event channels.

Sponsored by: Citrix Systems R&D

x86/xen/xen_intr.c:
 - Expand struct xenisrc to hold the level and triggering of PIRQ
   event channels.
 - Implement missing methods in xen_intr_pirq_pic.
 - Allow xen_intr_alloc_isrc to take a vector parameter that globally
   identifies the interrupt. This is only used for PIRQs that are
   bound to a specific hardware IRQ.
 - Introduce xen_register_pirq used to register IO APIC legacy PIRQ
   interrupts.
 - Add support for the dynamic PIRQ EOI map, this shared memory is
   modified by Xen (if it suppoorts that feature), and notifies the
   guest if an EOI is needed or not. If it's not available fall back
   to the old implementation using PHYSDEVOP_irq_status_query.
 - Rename xen_intr_isrc_count to xen_intr_auto_vector_count and
   replace it's usages.
 - Align static variables by name.

xen/xen_intr.h:
 - Add prototype for xen_register_pirq.
2014-08-04 08:42:29 +00:00
jhb
05d21354c3 - Output a summary of optional VT-x features in dmesg similar to CPU
features.  If bootverbose is enabled, a detailed list is provided;
  otherwise, a single-line summary is displayed.
- Add read-only sysctls for optional VT-x capabilities used by bhyve
  under a new hw.vmm.vmx.cap node. Move a few exiting sysctls that
  indicate the presence of optional capabilities under this node.

CR:		https://phabric.freebsd.org/D498
Reviewed by:	grehan, neel
MFC after:	1 week
2014-07-30 00:00:12 +00:00
marius
0af32d8345 Fix yet another comment typo in r269052. 2014-07-29 14:54:23 +00:00
marius
2ff80e0967 Fix comment typo in r269052.
Submitted by:	Daniel O'Connor
2014-07-29 13:26:24 +00:00
akiyama
966a1bc8d6 Add missing newline to output dmesg properly. 2014-07-28 13:47:02 +00:00
gavin
b01aa1f6d0 Add error return to dumpsys(), and use it in doadump().
This commit does not add error returns to minidumpsys() or
textdump_dumpsys(); those can also be added later.

Submitted by:	Conrad Meyer (EMC / Isilon storage division)
2014-07-25 23:52:53 +00:00
marius
34c6aed3e2 Intel desktop Haswell CPUs may report benign corrected parity errors (see
HSD131 erratum in [1]) at a considerable rate. So filter these (default),
unless logging is enabled. Unfortunately, there really is no better way to
reasonably implement suppressing these errors than to just skipping them
in mca_log(). Given that they are reported for bank 0, they'd need to be
masked in MSR_MC0_CTL. However, P6 family processors require that register
to be set to either all 0s or all 1s, disabling way more than the one error
in question when using all 0s there. Alternatively, it could be masked for
the corresponding CMCI, but that still wouldn't keep the periodic scanner
from detecting these spurious errors. Apart from that, register contents of
MSR_MC0_CTL{,2} don't seem to be publicly documented, neither in the Intel
Architectures Developer's Manual nor in the Haswell datasheets.

Note that while HSD131 actually is only about C0-stepping as of revision
014 of the Intel desktop 4th generation processor family specification
update, these corrected errors also have been observed with D0-stepping
aka "Haswell Refresh".

1: http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf

Reviewed by:	jhb
MFC after:	3 days
Sponsored by:	Bally Wulff Games & Entertainment GmbH
2014-07-24 10:14:51 +00:00
jhb
17d78db27b Fix build with SMP disabled.
CR:		https://phabric.freebsd.org/D407
Reviewed by:	royger
2014-07-15 15:40:33 +00:00
marcel
9f28abd980 Remove ia64.
This includes:
o   All directories named *ia64*
o   All files named *ia64*
o   All ia64-specific code guarded by __ia64__
o   All ia64-specific makefile logic
o   Mention of ia64 in comments and documentation

This excludes:
o   Everything under contrib/
o   Everything under crypto/
o   sys/xen/interface
o   sys/sys/elf_common.h

Discussed at: BSDcan
2014-07-07 00:27:09 +00:00
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
hselasky
bd1ed65f0f Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
hselasky
e34bd25949 Fix compile warning: Remove duplicate external declaration. 2014-06-19 05:06:24 +00:00
royger
4f6426a64f xen: fix out-of-bounds access to ipi_handle
Fix the gate in xen_pv_lapic_ipi_vectored to prevent access to element
at position nitems(xen_ipis).

Sponsored by: Citrix Systems R&D
Coverity ID: 1223203
Approved by: gibbs
2014-06-18 13:41:20 +00:00
kib
0b024c9158 Do not reference native_lapic_ipi_*() functions in the UP build.
The functions' definitions are protected by #ifdef SMP.
Keeping apic_ops.ipi_*() methods NULL would allow to catch the use
on UP machines.

Reviewed by:	royger
Sponsored by:	The FreeBSD Foundation
2014-06-17 09:33:22 +00:00
royger
ac5689b414 xen: add missing files
Commit missing files that actually belong to previous commits.

Sponsored by: Citrix Systems R&D
Approved by: gibbs
2014-06-16 08:54:04 +00:00
royger
152e3229be isa: allow ISA bus to attach to xenpv bus
This is needed because syscons depends on ISA.

Sponsored by: Citrix Systems R&D
Approved by: gibbs

x86/isa/isa.c:
 - Allow the ISA bus to attach to xenpv.
2014-06-16 08:49:16 +00:00
royger
a2b989a585 xen: add hooks for Xen PV APIC
Create the necessary hooks in order to provide a Xen PV APIC
implementation that can be used on PVH. Most of the lapic ops
shouldn't be called on Xen, since we trap those operations at a higher
layer.

Sponsored by: Citrix Systems R&D
Approved by: gibbs

x86/xen/hvm.c:
x86/xen/xen_apic.c:
 - Move IPI related code to xen_apic.c

x86/xen/xen_apic.c:
 - Introduce Xen PV APIC implementation, most of the functions of the
   lapic interface should never be called when running as PV(H) guest,
   so make sure FreeBSD panics when trying to use one of those.
 - Define the Xen APIC implementation in xen_apic_ops.

xen/xen_pv.h:
 - Extern declaration of the xen_apic struct.

x86/xen/pv.c:
 - Use xen_apic_ops as apic_ops when running as PVH guest.

conf/files.amd64:
conf/files.i386:
 - Include the xen_apic.c file in the build of i386/amd64 kernels
   using XENHVM.
2014-06-16 08:43:45 +00:00
royger
7c7f3fb2d0 amd64/i386: introduce APIC hooks for different APIC implementations.
This is needed for Xen PV(H) guests, since there's no hardware lapic
available on this kind of domains. This commit should not change
functionality.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Approved by: gibbs

amd64/include/cpu.h:
amd64/amd64/mp_machdep.c:
i386/include/cpu.h:
i386/i386/mp_machdep.c:
 - Remove lapic_ipi_vectored hook from cpu_ops, since it's now
   implemented in the lapic hooks.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
 - Use lapic_ipi_vectored directly, since it's now an inline function
   that will call the appropiate hook.

x86/x86/local_apic.c:
 - Prefix bare metal public lapic functions with native_ and mark them
   as static.
 - Define default implementation of apic_ops.

x86/include/apicvar.h:
 - Declare the apic_ops structure and create inline functions to
   access the hooks, so the change is transparent to existing users of
   the lapic_ functions.

x86/xen/hvm.c:
 - Switch to use the new apic_ops.
2014-06-16 08:43:03 +00:00
royger
6a8d0be395 xen: fix style in pv.c
Fix the lenght of some comments, and also add proper indentation to
xen_init_ops

Sponsored by: Citrix Systems R&D
Approved by: gibbs
2014-06-16 08:41:57 +00:00
scottl
38f58a4cec Eliminate the fake contig_dmamap and replace it with a new flag,
BUS_DMA_KMEM_ALLOC.  They serve the same purpose, but using the flag
means that the map can be NULL again, which in turn enables significant
optimizations for the common case of no bouncing.

Obtained from:	Netflix, Inc.
MFC after:	3 days
2014-05-27 21:31:11 +00:00
scottl
21c10cffc1 Now that there are separate back-end implementations of busdma, the bounce
implementation shouldn't steal flags from the common front-end.
Move those flags to the back-end.

Obtained from:  Netflix, Inc.
MFC after:      3 days
2014-05-27 14:18:57 +00:00
scottl
7cb2cf17ef Revert r266481. It was based on faulty analysis of the problem. A correct
fix is forthcoming.

Obtained from:	Netflix, Inc.
2014-05-27 14:06:23 +00:00
jhb
5662e763a6 Whitespace fix.
Submitted by:	kib
2014-05-22 18:13:17 +00:00
scottl
a1e98423bb Old PCIe implementations cannot allow a DMA transfer to cross a 4GB
boundary.  This was addressed several years ago by creating a parent
tag hierarchy for the root buses that set the boundary restriction
for appropriate buses and allowed child deviced to inherit it.
Somewhere along the way, this restriction was turned into a case for
marking the tag as a candidate for needing bounce buffers, instead
of just splitting the segment along the boundary line.  This flag
also causes all maps associated with this tag to be non-NULL, which
in turn causes bus_dmamap_sync() to take the slow path of function
pointer indirection to discover that there's no bouncing work to
do.  The end result is a lot of pages set aside in bounce pools
that will never be used, and a slow path for data buffers in nearly
every DMA-capable PCIe device.  For example, our workload at Netflix
was spending nearly 1% of all CPU time going through this slow path.

Fix this problem by being more selective about when to set the
COULD_BOUNCE flag.  Only set it when the boundary restriction
exists and the consumer cannot do more than a single DMA segment
at once.  This fixes the case of dynamic buffers (mbufs, bio's)
but doesn't address static buffers allocated from bus_dmamem_alloc().
That case will be addressed in the future.

For those interested, this was discovered thanks to Dtrace Flame
Graphs.

Discussed with: jhb, kib
Obtained from:	Netflix, Inc.
MFC after:	3 days
2014-05-20 22:43:17 +00:00
jhb
db4e203198 Add definitions for more structured extended features as well as
XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions).

Obtained from:	Intel's Instruction Set Extensions Programming Reference
                (March 2014)
2014-05-16 17:45:09 +00:00
imp
7081aab77a Make this compile with gcc.
Submitted by: royger@
2014-04-05 22:43:18 +00:00
rstone
254af40a92 Re-implement the DMAR I/O MMU code in terms of PCI RIDs
Under the hood the VT-d spec is really implemented in terms of
PCI RIDs instead of bus/slot/function, even though the spec makes
pains to convert back to bus/slot/function in examples.  However
working with bus/slot/function is not correct when PCI ARI is
in use, so convert to using RIDs in most cases.  bus/slot/function
will only be used when reporting errors to a user.

Reviewed by:	kib
MFC after:	2 months
Sponsored by:	Sandvine Inc.
2014-04-01 15:48:46 +00:00
rstone
120bf54d08 Revert PCI RID changes.
My PCI RID changes somehow got intermixed with my PCI ARI patch when I
committed it.  I may have accidentally applied a patch to a non-clean
working tree.  Revert everything while I figure out what went wrong.

Pointy hat to: rstone
2014-04-01 15:06:03 +00:00
rstone
eabfe8df7a Re-implement the DMAR I/O MMU code in terms of PCI RIDs
Under the hood the VT-d spec is really implemented in terms of
PCI RIDs instead of bus/slot/function, even though the spec makes
pains to convert back to bus/slot/function in examples.  However
working with bus/slot/function is not correct when PCI ARI is
in use, so convert to using RIDs in most cases.  bus/slot/function
will only be used when reporting errors to a user.

Reviewed by:	kib
Sponsored by:	Sandvine Inc.
2014-04-01 14:51:45 +00:00
tijl
606babe108 Rename __wchar_t so it no longer conflicts with __wchar_t from clang 3.4
-fms-extensions.

MFC after:	2 weeks
2014-04-01 14:46:11 +00:00
takawata
d968f19902 Change default logic to CONFORM because this routine is shared
with SCI polarity setting.

Reviewed by: jhb
2014-03-28 02:38:14 +00:00
takawata
a40940f6b2 Strict value checking will cause problem.
Bay trail DN2820FYKH is supported on Linux but does not work on FreeBSD.
This behaviour is bug-compatible with Linux-3.13.5.

References:
http://d.hatena.ne.jp/syuu1228/20140326
http://lxr.linux.no/linux+v3.13.5/arch/x86/kernel/acpi/boot.c#L1094

Submitted by: syuu
2014-03-27 06:36:38 +00:00
takawata
35538e1725 To check polarity, check ACPI_MADT_POLARITY_CONFORMS, instead of ACPI_MADT_TRIGGER_CONFORMS.
PR:amd64/188010
Submitted by: syuu
2014-03-27 06:08:07 +00:00
jhb
db52b17caa Fix build without SMP.
PR:		kern/187854
MFC after:	1 week
2014-03-26 17:40:13 +00:00
imp
9f008568e7 Remove vestiges of knowing the ISA bus, which we gave up on around 20
years ago. Remove redunant copy of isaregs.h.
2014-03-19 21:03:04 +00:00
kib
97d7557738 Add support for the PCI(e)-PCI bridges to the Intel VT-d driver. The
bridge takes ownership of the transaction, so bsf of the requester is
the bridge and not a device behind it.  As result, code needs to walk
the hierarchy up to use correct context.

Note that PCIe->PCI-X bridges are not handled quite correctly since
such bridges are allowed to only take ownership of some transactions.
Also, weird but unrealistic cases of PCIe behind PCI bus are also not
handled.

Still, the patch provides significant step forward for the bridge
handling.

Submitted by:	Jason Harmening <jason.harmening@gmail.com>
MFC after:	1 week
2014-03-18 16:41:32 +00:00
kib
f8f145010f It is not uncommon for BIOSes to report wrong RMRR entries in DMAR
table.  Among them, some (old AMI ?) BIOSes report entries with range
like (bf7ec000, bf7ebfff).  Attempts to ignore the bogus entries
result in faults, so the range must be covered somehow.

Provide a workaround by identity mapping the 32 pages after the bogus
entry start, which seems to be enough for the reported BIOS.

Reported and tested by:	Jason Harmening <jason.harmening@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-03-18 16:20:33 +00:00
kib
f53199b05b Trim at EOL.
MFC after:	3 days
2014-03-18 15:59:06 +00:00
emaste
dfd2dcdc01 Update NetBSD Foundation copyrights to 2-clause BSD
The NetBSD Foundation states "Third parties are encouraged to change the
license on any files which have a 4-clause license contributed to the
NetBSD Foundation to a 2-clause license."

This change removes clauses 3 and 4 from copyright / license blocks that
list The NetBSD Foundation as the only copyright holder.

Sponsored by:	The FreeBSD Foundation
2014-03-18 01:40:25 +00:00
jhb
bf0690b15c Correct type for malloc().
Submitted by:	"Conrad Meyer" <conrad.meyer@isilon.com>
2014-03-13 18:11:42 +00:00
royger
446e208ee2 xen: add a hook to perform AP startup
AP startup on PVH follows the PV method, so we need to add a hook in
order to diverge from bare metal.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Add hook for start_all_aps on native (using native_start_all_aps
   defined in mp_machdep).

amd64/amd64/mp_machdep.c:
 - Make some variables global because they will also be used by the
   Xen PVH AP startup code.
 - Use the start_all_aps hook to start APs.
 - Rename start_all_aps to native_start_all_aps.

amd64/include/smp.h:
 - Add declaration for native_start_all_aps.

x86/include/init.h:
 - Declare start_all_aps hook in init_ops.

x86/xen/pv.c:
 - Pick external declarations from mp_machdep.
 - Introduce Xen PV code to start APs on PVH.
 - Set start_all_aps init hook to use the Xen PVH implementation.
2014-03-11 10:27:57 +00:00
royger
b9559720d5 xen: changes to hvm code in order to support PVH guests
On PVH we don't need to init the shared info page, or disable emulated
devices. Also, make sure PV IPIs are set before starting the APs.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

x86/xen/hvm.c:
 - Return early from functions that are no-ops on Xen PVH guests.
 - In order to make sure PV IPIs are setup before AP startup,
   initialize them in SI_SUB_SMP-1.
2014-03-11 10:26:53 +00:00
royger
419270d8a7 xen: add hook for AP bootstrap memory reservation
This hook will only be implemented for bare metal, Xen doesn't require
any bootstrap code since APs are started in long mode with paging
enabled.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Set mp_bootaddress hook for bare metal.

x86/include/init.h:
 - Define mp_bootaddress in init_ops.
2014-03-11 10:26:16 +00:00
royger
66d3470d38 xen: add an apic_enumerator for PVH
On PVH there's no ACPI, so the CPU enumeration must be implemented
using Xen hypercalls.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

x86/xen/pvcpu_enum.c:
 - Enumerate avaiable vCPUs on PVH by using the VCPUOP_is_up
   hypercall.
 - Set vcpu_id for PVH guests.

conf/files.amd64:
 - Include the PV CPU enumerator in the XENHVM build.
2014-03-11 10:25:08 +00:00
royger
6b1be12234 xen: use the same hypercall mechanism for XEN and XENHVM
Currently XEN (PV) and XENHVM (PVHVM) ports use different ways to
issue hypercalls, unify this by filling the hypercall_page under HVM
also.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/include/xen/hypercall.h:
 - Unify Xen hypercall code by always using the PV way.

i386/i386/locore.s:
 - Define hypercall_page on i386 XENHVM.

x86/xen/hvm.c:
 - Fill hypercall_page on XENHVM kernels using the HVM method (only
   when running as an HVM guest).
2014-03-11 10:24:13 +00:00
royger
891131cb52 xen: implement hook to fetch and parse e820 memory map
e820 memory map is fetched using a hypercall under Xen PVH, so add a
hook to init_ops in oder to diverge from bare metal and implement a
Xen variant.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

x86/include/init.h:
 - Add a parse_memmap hook to init_ops, that will be called to fetch
   and parse the memory map.

amd64/amd64/machdep.c:
 - Decouple the fetch and the parse of the memmap, so the parse
   function can be shared with Xen code.
 - Move code around in order to implement the parse_memmap hook.

amd64/include/pc/bios.h:
 - Declare bios_add_smap_entries (implemented in machdep.c).

x86/xen/pv.c:
 - Implement fetching of e820 memmap when running as a PVH guest by
   using the XENMEM_memory_map hypercall.
2014-03-11 10:23:03 +00:00
royger
467e743960 xen: implement an early timer for Xen PVH
When running as a PVH guest, there's no emulated i8254, so we need to
use the Xen PV timer as the early source for DELAY. This change allows
for different implementations of the early DELAY function and
implements a Xen variant for it.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

dev/xen/timer/timer.c:
dev/xen/timer/timer.h:
 - Implement Xen early delay functions using the PV timer and declare
   them.

x86/include/init.h:
 - Add hooks for early clock source initialization and early delay
   functions.

i386/i386/machdep.c:
pc98/pc98/machdep.c:
amd64/amd64/machdep.c:
 - Set early delay hooks to use the i8254 on bare metal.
 - Use clock_init (that will in turn make use of init_ops) to
   initialize the early clock source.

amd64/include/clock.h:
i386/include/clock.h:
 - Declare i8254_delay and clock_init.

i386/xen/clock.c:
 - Rename DELAY to i8254_delay.

x86/isa/clock.c:
 - Introduce clock_init that will take care of initializing the early
   clock by making use of the init_ops hooks.
 - Move non ISA related delay functions to the newly introduced delay
   file.

x86/x86/delay.c:
 - Add moved delay related functions.
 - Implement generic DELAY function that will use the init_ops hooks.

x86/xen/pv.c:
 - Set PVH hooks for the early delay related functions in init_ops.

conf/files.amd64:
conf/files.i386:
conf/files.pc98:
 - Add delay.c to the kernel build.
2014-03-11 10:20:42 +00:00
royger
4df602a6bf amd64: introduce hook for custom preload metadata parsers
Add hooks to amd64 in order to have diverging implementations, since
on Xen PV the metadata is passed to the kernel in a different form.

Approbed by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Define init_ops for native.
 - Put native code inside of native_parse_preload_data hook.
 - Call the parse_preload_data in order to fill the metadata info.

x86/include/init.h:
 - Declare the init_ops struct.

x86/xen/pv.c:
 - Declare xen_init_ops that contains the Xen PV implementation of
   init_ops.
 - Implement the parse_preload_data for Xen PVH, the info is fetched
   from HYPERVISOR_start_info->cmd_line as provided by Xen.
2014-03-11 10:15:25 +00:00
royger
b13d7383ff howto_names: unify declaration
Approved by: gibbs
Sponsored by: Citrix Systems R&D

boot/i386/efi/bootinfo.c:
boot/i386/libi386/bootinfo.c:
boot/ia64/common/bootinfo.c:
boot/powerpc/ofw/metadata.c:
boot/powerpc/ps3/metadata.c:
boot/sparc64/loader/metadata.c:
boot/uboot/common/metadata.c:
boot/userboot/userboot/bootinfo.c:
i386/xen/xen_machdep.c:
 - Include sys/boot.h
 - Remove custom definition of howto_names.

sys/boot.h:
 - Define howto_names.

x86/xen/pv.c:
 - Include sys/boot.h
2014-03-11 10:13:06 +00:00
royger
3c7c289c46 xen: add and enable Xen console for PVH guests
This adds and enables the PV console used on XEN kernels to
GENERIC/XENHVM kernels in order for it to be used on PVH.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

dev/xen/console/console.c:
 - Define console_page.
 - Move xc_printf debug function from i386 XEN code to generic console
   code.
 - Rework xc_printf.
 - Use xen_initial_domain instead of open-coded checks for Dom0.
 - Gate the attach of the PV console to PV(H) guests.

dev/xen/console/xencons_ring.c:
 - Allow the PV Xen console to output earlier by directly signaling
   the event channel in start_info if the event channel is not yet
   initialized.
 - Use HYPERVISOR_start_info instead of xen_start_info.

i386/include/xen/xen-os.h:
 - Remove prototype for xc_printf since it's now declared in global
   xen-os.h

i386/xen/xen_machdep.c:
 - Remove previous version of xc_printf.
 - Remove definition of console_page (now it's defined in the console
   itself).
 - Fix some printf formatting errors.

x86/xen/pv.c:
 - Add some early boot debug messages using xc_printf.
 - Set console_page based on the value passed in start_info.

xen/xen-os.h:
 - Declare console_page and add prototype for xc_printf.
2014-03-11 10:09:23 +00:00
royger
5dd05db7ff xen: add PV/PVH kernel entry point
Add the PV/PVH entry point and the low level functions for PVH
early initialization.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/genassym.c:
 - Add __FreeBSD_version define to assym.s so it can be used for the
   Xen notes.

amd64/amd64/locore.S:
 - Make bootstack global so it can be used from Xen kernel entry
   point.

amd64/amd64/xen-locore.S:
 - Add Xen notes to the kernel.
 - Add the Xen PV entry point, that is going to call hammer_time_xen.

amd64/include/asmacros.h:
 - Add ELFNOTE macros.

i386/xen/xen_machdep.c:
 - Define HYPERVISOR_start_info for the XEN i386 PV port, which is
   going to be used in some shared code between PV and PVH.

x86/xen/hvm.c:
 - Define HYPERVISOR_start_info for the PVH port.

x86/xen/pv.c:
 - Introduce hammer_time_xen which is going to perform early setup for
   Xen PVH:
    - Setup shared Xen variables start_info, shared_info and
      xen_store.
    - Set guest type.
    - Create initial page tables as FreeBSD expects to find them.
    - Call into native init function (hammer_time).

xen/xen-os.h:
 - Declare HYPERVISOR_start_info.

conf/files.amd64:
 - Add amd64/amd64/locore.S and x86/xen/pv.c to the list of files.
2014-03-11 10:07:01 +00:00
royger
27026f4f2a amd64/i386: switch IPI handlers to C code.
Move asm IPIs handlers to C code, so both Xen and native IPI handlers
share the same code.

Reviewed by: jhb
Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/apic_vector.S:
i386/i386/apic_vector.s:
 - Remove asm coded IPI handlers and instead call the newly introduced
   C variants.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
 - Add C coded clones to the asm IPI handlers (moved from
   x86/xen/hvm.c).

i386/include/smp.h:
amd64/include/smp.h:
 - Add prototypes for the C IPI handlers.

x86/xen/hvm.c:
 - Move the C IPI handlers to mp_machdep and call those in the Xen IPI
   handlers.

i386/xen/mp_machdep.c:
 - Add dummy IPI handlers to the i386 Xen PV port (this port doesn't
   support SMP).
2014-03-11 10:03:29 +00:00
jkim
9b4d3b43ca Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c.
Inspired by:	jhb
2014-03-04 21:35:57 +00:00
jhb
6e6e271c34 Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
  PCI domain/segment.  Host-PCI bridge drivers use this API to allocate
  bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
  their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
  full range of bus numbers from secbus to subbus from their parent bridge.
  The drivers also always program their primary bus register.  The bridge
  drivers also support growing their bus range by extending the bus resource
  and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
  used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.

Reviewed by:	imp
MFC after:	1 month
2014-02-12 04:30:37 +00:00
jhb
94d685456e Drop the 3rd clause from all 3 clause BSD licenses where I am the sole
holder to convert them to 2 clause BSD licenses.

MFC after:	1 week
2014-02-05 18:13:27 +00:00
jhb
531c22988c Move a warning about LINT pins configured with a level trigger under
bootverbose.
2014-02-05 18:11:46 +00:00
tijl
3f99ec6091 Rename the AMD MSR_PERFCTR[0-3] so the Pentium Pro MSR_PERFCTR[0-1]
aren't redefined.

Reported by:	"Trivedi, Nishank" <Nishank.Trivedi@netapp.com>
Discussed with:	kib
2014-01-31 14:29:34 +00:00
jhb
b2533ec507 Move <machine/apicvar.h> to <x86/apicvar.h>. 2014-01-23 20:10:22 +00:00
jhb
6b3a4c086d - Reuse legacy_pcib_(read|write)_config() methods in the QPI pcib driver.
- Reuse legacy_pcib_alloc_msi{,x}() methods in the QPI and mptable pcib
  drivers.
2014-01-21 03:14:19 +00:00
jhb
094f2691ca - Only check the ivars for direct descendants.
- A couple of whitespace fixes.
2014-01-20 17:55:22 +00:00
jhb
35bc581adc The changes in r233781 attempted to make logging during a machine check
exception more readable.  In practice they prevented all logging during
a machine check exception on at least some systems.  Specifically, when
an uncorrected ECC error is detected in a DIMM on a Nehalem/Westmere
class machine, all CPUs receive a machine check exception, but only
CPUs on the same package as the memory controller for the erroring DIMM
log an error.  The CPUs on the other package would complete the scan of
their machine check banks and panic before the first set of CPUs could
log an error.  The end result was a clearer display during the panic
(no interleaved messages), but a crashdump without any useful info about
the error that occurred.

To handle this case, make all CPUs spin in the machine check handler
once they have completed their scan of their machine check banks until
at least one machine check error is logged.  I tried using a DELAY()
instead so that the CPUs would not potentially hang forever, but that
was not reliable in testing.

While here, don't clear MCIP from MSR_MCG_STATUS before invoking panic.
Only clear it if the machine check handler does not panic and returns
to the interrupted thread.
2014-01-08 21:04:12 +00:00
nwhitehorn
f06ffda243 Retire machine/fdt.h as a header used by MI code, as its function is now
obsolete. This involves the following pieces:
- Remove it entirely on PowerPC, where it is not used by MD code either
- Remove all references to machine/fdt.h in non-architecture-specific code
  (aside from uart_cpu_fdt.c, shared by ARM and MIPS, and so is somewhat
  non-arch-specific).
- Fix code relying on header pollution from machine/fdt.h includes
- Legacy fdtbus.c (still used on x86 FDT systems) now passes resource
  requests to its parent (nexus). This allows x86 FDT devices to allocate
  both memory and IO requests and removes the last notionally MI use of
  fdtbus_bs_tag.
- On those architectures that retain a machine/fdt.h, unused bits like
  FDT_MAP_IRQ and FDT_INTR_MAX have been removed.
2014-01-05 18:46:58 +00:00
jhb
0dede48f9c Fix i386 build.
Pointy hat to:	jhb
2013-12-24 14:48:52 +00:00
jhb
63c019063a Add a resume hook for bhyve that runs a function on all CPUs during
resume.  For Intel CPUs, invoke vmxon for CPUs that were in VMX mode
at the time of suspend.

Reviewed by:	neel
2013-12-23 19:48:22 +00:00
jhb
337d8cb0ee Use fixed-width types for all fields in MP Table structures and pack
all the structures.  While here, move a helper struct only used in
the kernel parser out of this header since it is not part of the MP
specification itself.
2013-12-11 21:19:04 +00:00
mav
bbaf4bdeea Do not DELAY() for P-state transition unless we want to see the result.
Intel manual says: "If a transition is already in progress, transition to
a new value will subsequently take effect. Reads of IA32_PERF_CTL determine
the last targeted operating point."  So seems it should be fine to just
trigger wanted transition and go.  Linux does the same.

MFC after:	1 month
2013-12-10 20:25:43 +00:00
jhb
8f255ea165 Move constants for indices in the local APIC's local vector table from
apicvar.h to apicreg.h.
2013-12-09 21:08:52 +00:00
jhb
77c171ac39 Fix the processor table entry structure to use a fixed-width type for
32-bit fields so it is the correct size on amd64.  Remove a workaround
for the broken structure from bhyve(8).

MFC after:	1 week
2013-12-05 21:51:54 +00:00
eadler
44c01df173 Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this
shifts into the sign bit.  Instead use (1U << 31) which gets the
expected result.

This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.

A similar change was made in OpenBSD.

Discussed with:	-arch, rdivacky
Reviewed by:	cperciva
2013-11-30 22:17:27 +00:00
attilio
7ee4e910ce - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
emaste
9dcbb8e88d x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)
Debuggers may need to change PSL_RF.  Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could be
modified via sigreturn, so this change should not provide any new ability
to userspace.

For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

Reviewed by:	jhb, kib
Sponsored by:	DARPA, AFRL
2013-11-14 15:37:20 +00:00
dim
65aecc9afe Fix gcc warning about an uninitialized bool in sys/x86/iommu/intel_drv.c.
Reviewed by:	kib
2013-11-09 22:05:29 +00:00
dim
ae77d250d3 Fix gcc warning about an empty device_printf() format string in
sys/x86/iommu/intel_fault.c.

Reviewed by:	kib
2013-11-09 22:00:44 +00:00
dim
0ab3bf57ca Fix (erroneous) gcc warnings about usage of uninitialized variables in
sys/x86/iommu/intel_idpgtbl.c.

Reviewed by:	kib
2013-11-09 20:36:52 +00:00
dim
2fd21fb1fe Fix gcc warnings about casting away const in sys/x86/iommu/intel_drv.c.
Reviewed by:	kib
2013-11-09 20:09:02 +00:00
dim
cd1f38856a Initialize variable in sys/x86/iommu/busdma_dmar.c, to avoid possible
uninitialized use.

Reviewed by:	kib
2013-11-08 17:27:22 +00:00
kib
3583963461 Add bits for the AMD features from CPUID function 0x80000001 ECX,
described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf.

Partially based on the patch submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-11-08 16:32:30 +00:00
sbruno
dce4ab5a66 Fix powerd/states on AMD cpus. Resolves issues with system reporting:
hwpstate0: set freq failed, err 6

Tested on FX-8150 and others.

PR:		167018
Submitted by:	avg
MFC after:	2 weeks
2013-11-06 23:29:25 +00:00
kib
f1a1b2ea9b Add support for queued invalidation.
Right now, the semaphore write is scheduled after each batch, which is
not optimal and must be tuned.

Discussed with:	alc
Tested by:	pho
MFC after:	1 month
2013-11-01 17:38:52 +00:00
kib
41ccfbfc30 Return BUS_PROBE_NOWILDCARD from the DMAR probe method.
Confirmed by:	nwhitehorn
MFC after:	1 month
2013-11-01 17:16:44 +00:00
markj
a5fb1fbfd8 Remove references to an unused fasttrap probe hook, and remove the
corresponding x86 trap type. Userland DTrace probes are currently handled
by the other fasttrap hooks (dtrace_pid_probe_ptr and
dtrace_return_probe_ptr).

Discussed with:	rpaulo
2013-10-31 02:35:00 +00:00
kib
adedddebaf Remove redundand declaration, fixing the build with gcc.
Reported and tested by:	Michael Butler <imb@protected-networks.net>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-29 07:25:54 +00:00
kib
04c60e5a21 Remove redundand assignment to error variable and check for its value [1].
Do CTR logging in the case of error as well.

Noted by:	rdivacky [1]
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-28 19:30:09 +00:00
kib
74b8996ebe Import the driver for VT-d DMAR hardware, as specified in the revision
1.3 of Intelб╝ Virtualization Technology for Directed I/O Architecture
Specification.  The Extended Context and PASIDs from the rev. 2.2 are
not supported, but I am not aware of any released hardware which
implements them.  Code does not use queued invalidation, see comments
for the reason, and does not provide interrupt remapping services.

Code implements the management of the guest address space per domain
and allows to establish and tear down arbitrary mappings, but not
partial unmapping.  The superpages are created as needed, but not
promoted.  Faults are recorded, fault records could be obtained
programmatically, and printed on the console.

Implement the busdma(9) using DMARs.  This busdma backend avoids
bouncing and provides security against misbehaving hardware and driver
bad programming, preventing leaks and corruption of the memory by wild
DMA accesses.

By default, the implementation is compiled into amd64 GENERIC kernel
but disabled; to enable, set hw.dmar.enable=1 loader tunable.  Code is
written to work on i386, but testing there was low priority, and
driver is not enabled in GENERIC.  Even with the DMAR turned on,
individual devices could be directed to use the bounce busdma with the
hw.busdma.pci<domain>:<bus>:<device>:<function>.bounce=1 tunable.  If
DMARs are capable of the pass-through translations, it is used,
otherwise, an identity-mapping page table is constructed.

The driver was tested on Xeon 5400/5500 chipset legacy machine,
Haswell desktop and E5 SandyBridge dual-socket boxes, with ahci(4),
ata(4), bce(4), ehci(4), mfi(4), uhci(4), xhci(4) devices.  It also
works with em(4) and igb(4), but there some fixes are needed for
drivers, which are not committed yet.  Intel GPUs do not work with
DMAR (yet).

Many thanks to John Baldwin, who explained me the newbus integration;
Peter Holm, who did all testing and helped me to discover and
understand several incredible bugs; and to Jim Harris for the access
to the EDS and BWG and for listening when I have to explain my
findings to somebody.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-28 13:33:29 +00:00
kib
bbb99e220d Add a virtual table for the busdma methods on x86, to allow different
busdma implementations to coexist.  Copy busdma_machdep.c to
busdma_bounce.c, which is still a single implementation of the busdma
interface on x86 for now.  The busdma_machdep.c only contains common
and dispatch code.

Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-27 22:05:10 +00:00
kib
79afbd5fdd Add bus_dmamap_load_ma() function to load map with the array of
vm_pages.  Provide trivial implementation which forwards the load to
_bus_dmamap_load_phys() page by page.  Right now all architectures use
bus_dmamap_load_ma_triv().

Tested by:	pho (as part of the functional patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-27 21:39:16 +00:00
kib
b4593acbf9 Add ddb 'show ioapic' and 'show all ioapics' commands.
Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-10-24 20:13:40 +00:00
phk
ce42421e8d Add a va_copy() to our fall-back stdarg implementation for use with lint(1)
Approved by:	re@ (glebius@)
2013-10-07 10:01:23 +00:00
gibbs
9c8c76f921 Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id
field.  Perform vcpu enumeration for Xen PV and HVM environments
and convert all Xen drivers to use vcpu_id instead of a hard coded
assumption of the mapping algorithm (acpi or apic ID) in use.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)

amd64/include/pcpu.h:
i386/include/pcpu.h:
	Add vcpu_id to the amd64 and i386 pcpu structures.

dev/xen/timer/timer.c
x86/xen/xen_intr.c
	Use new vcpu_id instead of assuming acpi_id == vcpu_id.

i386/xen/mp_machdep.c:
i386/xen/mptable.c
x86/xen/hvm.c:
	Perform Xen HVM and Xen full PV vcpu_id mapping.

x86/xen/hvm.c:
x86/acpica/madt.c
	Change SYSINIT ordering of acpi CPU enumeration so that it
	is guaranteed to be available at the time of Xen HVM vcpu
	id mapping.
2013-10-05 23:11:01 +00:00
gibbs
716c2031c7 Correct panic caused by attaching both Xen PV and HyperV virtualization
aware drivers on Xen hypervisors that advertise support for some
HyperV features.

x86/xen/hvm.c:
	When running in HVM mode on a Xen hypervisor, set vm_guest
	to VM_GUEST_XEN so other virtualization aware components in
	the FreeBSD kernel can detect this mode is active.

dev/hyperv/vmbus/hv_hv.c:
	Use vm_guest to ignore Xen's HyperV emulation when Xen is
	detected and Xen PV drivers are active.

Reported by:	Shanker Balan
Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (Xen blanket)
2013-10-05 19:51:09 +00:00
gibbs
7355b035d6 sys/x86/xen/hvm.c:
Set cpu_ops correctly for Xen hypervisors lacking the
	vector callback feature.

	Set preliminary Xen cpu_ops settings during early HVM
	initialization.  The old location raced with the startup
	of APs.

Submitted by:	Roger Pau Monné
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
2013-09-27 15:17:28 +00:00
gibbs
7ed30adae7 Merge Xen PVHVM support into the GENERIC kernel config for both
amd64 and i386.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
MFC after:	2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
	- Introduce two new CPU hooks for initialization and resume
	  purposes. This allows us to get rid of the XENHVM ifdefs in
	  mp_machdep, and also sets some hooks into common code that can be
	  used by other hypervisor implementations.

sys/amd64/conf/XENHVM:
sys/i386/conf/XENHVM:
	- Remove these configs now that GENERIC has builtin support for Xen
	  HVM.

sys/kern/subr_smp.c:
	- Make sure there are no pending IPIs when suspending a system.

sys/x86/xen/hvm.c:
	- Add cpu init and resume vectors that are called from mp_machdep
	  using the new hooks.
	- Only clear the vcpu_info mapping data on resume.  It is already
	  clear for the BSP on a cold boot and is set correctly as APs
	  are started.
	- Gate xen_hvm_init_cpu only to systems running under Xen.

sys/x86/xen/xen_intr.c:
	 - Gate the setup of event channels only to systems running under Xen.
2013-09-20 22:59:22 +00:00
gibbs
a9c07a6f67 Add support for suspend/resume/migration operations when running as a
Xen PVHVM guest.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
MFC after:	2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	- Make sure that are no MMU related IPIs pending on migration.
	- Reset pending IPI_BITMAP on resume.
	- Init vcpu_info on resume.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
	- Add a "suspend_cancelled" parameter to pic_resume().  For the
	  Xen PIC, restoration of interrupt services differs between
	  the aborted suspend and normal resume cases, so we must provide
	  this information.

sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
	- Don't swap out "suspend safe" timers across a suspend/resume
	  cycle.  This includes the Xen PV and ACPI timers.

sys/dev/xen/control/control.c:
	- Perform proper suspend/resume process for PVHVM:
		- Suspend all APs before going into suspension, this allows us
		  to reset the vcpu_info on resume for each AP.
		- Reset shared info page and callback on resume.

sys/dev/xen/timer/timer.c:
	- Implement suspend/resume support for the PV timer. Since FreeBSD
	  doesn't perform a per-cpu resume of the timer, we need to call
	  smp_rendezvous in order to correctly resume the timer on each CPU.

sys/dev/xen/xenpci/xenpci.c:
	- Don't reset the PCI interrupt on each suspend/resume.

sys/kern/subr_smp.c:
	- When suspending a PVHVM domain make sure there are no MMU IPIs
	  in-flight, or we will get a lockup on resume due to the fact that
	  pending event channels are not carried over on migration.
	- Implement a generic version of restart_cpus that can be used by
	  suspended and stopped cpus.

sys/x86/xen/hvm.c:
	- Implement resume support for the hypercall page and shared info.
	- Clear vcpu_info so it can be reset by APs when resuming from
	  suspension.

sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
	- Support UP kernel configurations.

sys/x86/xen/xen_intr.c:
	- Properly rebind per-cpus VIRQs and IPIs on resume.
2013-09-20 05:06:03 +00:00
gibbs
437790b349 Implement PV IPIs for PVHVM guests and further converge PV and HVM
IPI implmementations.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Submitted by: gibbs (misc cleanup, table driven config)
Reviewed by:  gibbs
MFC after: 2 weeks

sys/amd64/include/cpufunc.h:
sys/amd64/amd64/pmap.c:
	Move invltlb_globpcid() into cpufunc.h so that it can be
	used by the Xen HVM version of tlb shootdown IPI handlers.

sys/x86/xen/xen_intr.c:
sys/xen/xen_intr.h:
	Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(),
	and remove the ipi vector parameter.  This api allocates
	an event channel port that can be used for ipi services,
	but knows nothing of the actual ipi for which that port
	will be used.  Removing the unused argument and cleaning
	up the comments surrounding its declaration helps clarify
	its actual role.

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
	Implement a generic framework for amd64 and i386 that allows
	the implementation of certain CPU management functions to
	be selected at runtime.  Currently this is only used for
	the ipi send function, which we optimize for Xen when running
	on a Xen hypervisor, but can easily be expanded to support
	more operations.

sys/x86/xen/hvm.c:
	Implement Xen PV IPI handlers and operations, replacing native
	send IPI.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
sys/i386/include/smp.h:
	Remove NR_VIRQS and NR_IPIS from FreeBSD headers.  NR_VIRQS
	is defined already for us in the xen interface files.
	NR_IPIS is only needed in one file per Xen platform and is
	easily inferred by the IPI vector table that is defined in
	those files.

sys/i386/xen/mp_machdep.c:
	Restructure to more closely match the HVM implementation by
	performing table driven IPI setup.
2013-09-06 22:17:02 +00:00
gibbs
55177c016f Conform to style(9). No functional changes.
sys/x86/xen/hvm.c:
	Do not rely on implicit conversion to boolean in expressions
	(e.g. use "if (rc != 0)" instead of "if (rc)".

	Line continuations for functions are indented an additional
	4 spaces.

	Insert an empty line if the function has no local variables.

	Prefer separate initializtion statements to initialzing
	local variables in their declaration.

	Braces that are not necessary may be left out.

MFC after:	2 weeks
2013-09-01 23:49:36 +00:00
gibbs
fcdbf70fd9 Implement vector callback for PVHVM and unify event channel implementations
Re-structure Xen HVM support so that:
	- Xen is detected and hypercalls can be performed very
	  early in system startup.
	- Xen interrupt services are implemented using FreeBSD's native
	  interrupt delivery infrastructure.
	- the Xen interrupt service implementation is shared between PV
	  and HVM guests.
	- Xen interrupt handlers can optionally use a filter handler
	  in order to avoid the overhead of dispatch to an interrupt
	  thread.
	- interrupt load can be distributed among all available CPUs.
	- the overhead of accessing the emulated local and I/O apics
	  on HVM is removed for event channel port events.
	- a similar optimization can eventually, and fairly easily,
	  be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
	Reserve IDT vector 0x93 for the Xen event channel upcall
	interrupt handler.  On Hypervisors that support the direct
	vector callback feature, we can request that this vector be
	called directly by an injected HVM interrupt event, instead
	of a simulated PCI interrupt on the Xen platform PCI device.
	This avoids all of the overhead of dealing with the emulated
	I/O APIC and local APIC.  It also means that the Hypervisor
	can inject these events on any CPU, allowing upcalls for
	different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
	Increase the FreeBSD IRQ vector table to include space
	for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
	Remove Xen HVM per-cpu variable data.  These fields are now
	allocated via the dynamic per-cpu scheme.  See xen_intr.c
	for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
	Prefer FreeBSD primatives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
	Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
	Remove constants, macros, and functions unused in FreeBSD's Xen
	support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
	Introduce new functions xen_domain(), xen_pv_domain(), and
	xen_hvm_domain().  These are used in favor of #ifdefs so that
	FreeBSD can dynamically detect and adapt to the presence of
	a hypervisor.  The goal is to have an HVM optimized GENERIC,
	but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
	Refactor magic ioport, Hypercall table and Hypervisor shared
	information page setup, and move it to a dedicated HVM support
	module.

	HVM mode initialization is now triggered during the
	SI_SUB_HYPERVISOR phase of system startup.  This currently
	occurs just after the kernel VM is fully setup which is
	just enough infrastructure to allow the hypercall table
	and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
	Add definitions and a method for configuring Hypervisor event
	delievery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
	Adjust kernel build to reflect the refactoring of early
	Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
	Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
	Since blkback defers all event handling to a taskqueue,
	convert this task queue to a "fast" taskqueue, and schedule
	it via an interrupt filter.  This avoids an unnecessary
	ithread context switch.

sys/xen/xenstore/xenstore.c:
	The xenstore driver is MPSAFE.  Indicate as much when
	registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
	Remove unused event channel APIs.

sys/xen/evtchn.h:
	Remove all kernel Xen interrupt service API definitions
	from this file.  It is now only used for structure and
	ioctl definitions related to the event channel userland
	device driver.

	Update the definitions in this file to match those from
	NetBSD.  Implementing this interface will be necessary for
	Dom0 support.

sys/xen/evtchn/evtchnvar.h:
	Add a header file for implemenation internal APIs related
	to managing event channels event delivery.  This is used
	to allow, for example, the event channel userland device
	driver to access low-level routines that typical kernel
	consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
	Standardize on the evtchn_port_t type for referring to
	an event channel port id.  In order to prevent low-level
	event channel APIs from leaking to kernel consumers who
	should not have access to this data, the type is defined
	twice: Once in the Xen provided event_channel.h, and again
	in xen/xen_intr.h.  The double declaration is protected by
	__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
	twice within a given compilation unit.

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
	New implementation of Xen interrupt services.  This is
	similar in many respects to the i386 PV implementation with
	the exception that events for bound to event channel ports
	(i.e. not IPI, virtual IRQ, or physical IRQ) are further
	optimized to avoid mask/unmask operations that aren't
	necessary for these edge triggered events.

	Stubs exist for supporting physical IRQ binding, but will
	need additional work before this implementation can be
	fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
	Add support for placing vcpu_info into an arbritary memory
	page instead of using HYPERVISOR_shared_info->vcpu_info.
	This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
	Add support for new event channle implementation.
2013-08-29 19:52:18 +00:00
brooks
861668a16b Call set_i8254_freq with MODE_STOP (0) rather than a magic number of 0. 2013-08-15 17:21:06 +00:00
jkim
20df47e877 Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these
two files were functionally identical.
2013-08-13 22:05:10 +00:00
kib
8de1718b60 Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain.  The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass.  This is not optimal, since it
could cause excessive page deactivation and freeing.  The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues.  Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
jeff
de4ecca213 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
avg
425573ccba x86: detect mwait capabilities and extensions, when present
Reviewed by:	kib (earlier amd64-only version)
MFC after:	2 weeks
2013-07-28 17:54:42 +00:00
rpaulo
4d601c587e Fix a KTR_BUSDMA format string. 2013-06-18 06:55:58 +00:00
marcel
65b2bbd1ff Add basic support for FDT to i386 & amd64. This change includes:
1.  Common headers for fdt.h and ofw_machdep.h under x86/include
    with indirections under i386/include and amd64/include.
2.  New modinfo for loader provided FDT blob.
3.  Common x86_init_fdt() called from hammer_time() on amd64 and
    init386() on i386.
4.  Split-off FDT specific low-level console functions from FDT
    bus methods for the uart(4) driver. The low-level console
    logic has been moved to uart_cpu_fdt.c and is used for arm,
    mips & powerpc only. The FDT bus methods are shared across
    all architectures.
5.  Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
    fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from:	Juniper Networks, Inc.
2013-05-21 03:05:49 +00:00
attilio
291f413ed8 o Add accessor functions to add and remove pages from a specific
freelist.
o Split the pool of free pages queues really by domain and not rely on
  definition of VM_RAW_NFREELIST.
o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific
  function that is called when calculating the allocation domain.
  The RR counter is kept, currently, per-thread.
  In the future it is expected that such function evolves in a real
  policy decision referee, based on specific informations retrieved by
  per-thread and per-vm_object attributes.
o Add the concept of "probed domains" under the form of vm_ndomains.
  It is responsibility for every architecture willing to support multiple
  memory domains to correctly probe vm_ndomains along with mem_affinity
  segments attributes.  Those two values are supposed to remain always
  consistent.
  Please also note that vm_ndomains and td_dom_rr_idx are both int
  because segments already store domains as int.  Ideally u_int would
  have much more sense. Probabilly this should be cleaned up in the
  future.
o Apply RR domain selection also to vm_phys_zero_pages_idle().

Sponsored by:	EMC / Isilon storage division
Partly obtained from:	jeff
Reviewed by:	alc
Tested by:	jeff
2013-05-13 15:40:51 +00:00
eadler
6907881cb8 Fix several typos
PR:		kern/176054
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
MFC after:	3 days
2013-05-12 16:43:26 +00:00
hiren
cd6fbb1d3e Adding a detach method to p4tcc driver.
PR:	118739
Submitted by:	Dan Lukes <dan@obluda.cz> (earlier version)
Reviewed by:	jhb
Approved by:	sbruno (mentor)
MFC after:	1 week
2013-05-10 22:43:27 +00:00
attilio
68174eb552 Revert r250339 as apparently it is more clutter than help.
Sponsored by:	EMC / Isilon storage division
Requested by:	jhb
2013-05-08 21:06:47 +00:00
attilio
e5932eeddb Add functions to do ACPI System Locality Information Table parsing
and printing at boot.
For reference on table informations and purposes please review ACPI specs.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	jhb (earlier version)
2013-05-07 22:49:56 +00:00
attilio
b24a52ec9e Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
mav
5055fe41fa Introduce kern.timecounter.smp_tsc_adjust tunable (disabled by default) and
respective functionality, allowing to synchronize TSC on APs to match BSP's
during boot.  It may be unsafe in general case due to theoretical chance of
later drift if CPUs are using different clock rate or source, but it allows
to use TSC in some cases when difference caused by some initialization bug,
while TSCs are known to increment synchronously.

Reviewed by:	jimharris, kib
MFC after:	1 month
2013-04-18 17:07:04 +00:00
rpaulo
c04b376f02 Move the previously added CPUID7 macros to CPUID_STDEXT. 2013-04-18 07:09:27 +00:00
rpaulo
7b5712a5b5 Add the most current CPUID7_* definitions. 2013-04-18 01:30:08 +00:00
neel
ea145a8678 Make the code to check if VMX is enabled more readable by using macros
instead of magic numbers.

Discussed with:	Chris Torek
2013-04-11 04:29:45 +00:00
neel
3bab173b64 Unsynchronized TSCs on the host require special handling in bhyve:
- use clock_gettime(2) as the time base for the emulated ACPI timer instead
  of directly using rdtsc().

- don't advertise the invariant TSC capability to the guest to discourage it
  from using the TSC as its time base.

Discussed with:	jhb@ (about making 'smp_tsc' a global)
Reported by:	Dan Mack on freebsd-virtualization@
Obtained from:	NetApp
2013-04-10 05:59:07 +00:00
kib
14e47b8577 Record the correct error in the trace.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2013-04-01 09:57:46 +00:00
mav
6cf7cc6e4d MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
2013-02-28 13:46:03 +00:00
imp
0e93b8cbe6 Use critical_enter/critical_exit around the time sensitive part of
this code to depessimize the worst case we've lived with silently and
uneventfully for the past 12 years. Add a comment about a refinement
for those needing more assurance of accuracy.

Fix ddb's show rtc command deadlock potential when debugging rtc code
by not taking the lock if we're in the debugger. If you need a thumb
to count the number of people that have encountered this, I'd be
surprised.

Submitted by:	bde
2013-02-21 15:35:48 +00:00
imp
e7a3528a80 Correct comment about use of pmtimer, and the real reason it isn't
used or desirable for amd64.
2013-02-21 06:38:24 +00:00
imp
c21cb04a9d Fix broken usage of splhigh() by removing it. 2013-02-21 00:40:08 +00:00
kib
cda69c9622 Convert machine/elf.h, machine/frame.h, machine/sigframe.h,
machine/signal.h and machine/ucontext.h into common x86 includes,
copying from amd64 and merging with i386.

Kernel-only compat definitions are kept in the i386/include/sigframe.h
and i386/include/signal.h, to reduce amd64 kernel namespace pollution.
The amd64 compat uses its own definitions so far.

The _MACHINE_ELF_WANT_32BIT definition is to allow the
sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions
on the amd64 compile host.  The same hack could be usefully abused by
other code too.
2013-02-20 17:39:52 +00:00
davide
da217eece1 Fixup r246916 in case gcc is used to build.
Reported by:	attilio, simon
2013-02-19 16:43:48 +00:00
mav
a8b029c7cf MFcalloutng:
Microoptimize i8254 one-shot operation mode (disabled by default to allow
timecounter functionality) by not writing to mode and MSB registers when
it is not required.  This saves several microseconds of CPU time per call,
reducing minimal measured interrupts interval to 19.5us.
2013-02-17 18:42:30 +00:00
jhb
ca6ecf3ea5 Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel
config file.

Requested by:	phk (ages ago)
MFC after:	1 month
2013-02-14 19:38:04 +00:00
kib
bd7f0fa0bb Reform the busdma API so that new types may be added without modifying
every architecture's busdma_machdep.c.  It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code.  The MD busdma is then given a chance to do any final processing
in the complete() callback.

The cam changes unify the bus_dmamap_load* handling in cam drivers.

The arm and mips implementations are updated to track virtual
addresses for sync().  Previously this was done in a type specific
way.  Now it is done in a generic way by recording the list of
virtuals in the map.

Submitted by:	jeff (sponsored by EMC/Isilon)
Reviewed by:	kan (previous version), scottl,
	mjacob (isp(4), no objections for target mode changes)
Discussed with:	     ian (arm changes)
Tested by:	marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
	amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
2013-02-12 16:57:20 +00:00
avg
09a43450b8 x86 suspend/resume: suspend pics and pseudo-pics in reverse order
- change 'pics' from STAILQ to TAILQ
- ensure that Local APIC is always first in 'pics'

Reviewed by:	jhb
Tested by:	Sergey V. Dyatko <sergey.dyatko@gmail.com>,
		KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
MFC after:	12 days
2013-02-02 12:02:42 +00:00
kib
5c708a87b1 The change to reduce default smp_tsc_shift caused tsc shift to become
zero on slower machines, which make the fenced get_timecount methods
not used despite needed.  Remove the (shift > 0) condition when
selecting the get_timecount() implementation.

Rename smp_tsc_shift to tsc_shift, and apply it for the UP case too.
Allow shift to reach value of 31 instead of 30, as it was previously
(should be nop).

Reorganize the tc quality calculation to remove the conditionally
compiled block.  Rename test_smp_tsc() to test_tsc() and provide
separate versions for SMP and UP builds.  The check for virtialized
hardware is more natural to perform in the smp version of the
test_tsc(), since it is only done for smp case.

Noted and reviewed by:	bde (previous version)
MFC after:	12 days
2013-02-01 16:48:55 +00:00
kib
b19d7b3a7d Reduce default shift used to calculate the max frequency for the TSC
timecounter to 1, and correspondingly increase the precision of the
gettimeofday(2) and related functions in the default configuration.

The motivation for the TSC-low timecounter, as described in the
r222866, seems to provide a workaround for the non-serializing
behaviour of the RDTSC on some Intel hardware.  Tests demonstrate that
even with the pre-shift of 8, the cross-core non-monotonicity of the
RDTSC is still observed reliably, e.g. on the Nehalems.  The r238755
and r238973 implemented the proper fix for the issue.

The pre-shift of 1 is applied to keep TSC not overflowing for the
frequency of hardclock down to 2 sec/intr.  The pre-shift is made a
tunable to allow the easy debugging of the issues users could see with
the shift being too low.

Reviewed by:	bde
MFC after:	2 weeks
2013-01-30 12:43:10 +00:00
jhb
4a3c4478d3 Don't attempt to use clflush on the local APIC register window. Various
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7).  The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.

MFC after:	2 weeks
2013-01-17 21:32:25 +00:00
neel
3566a3da10 Add macros required to enable VMX operation on Intel processors.
Obtained from:	NetApp
2013-01-05 04:20:14 +00:00
jimharris
49ec139f25 Add bus_space_read_8 and bus_space_write_8 for amd64.
Rather than trying to KASSERT for callers that invoke this on
IO tags, either do nothing (for write_8) or return ~0 (for read_8).
Using KASSERT here just makes bus.h too messy from both
polluting bus.h with systm.h (for any number of drivers that include
bus.h without first including systm.h) or ports that use bus.h
directly (i.e. libpciaccess) as reported by zeising@.

Also don't try to implement all of the other bus_space functions for
8 byte access since realistically only these two are needed for some
devices that expose 64-bit memory-mapped registers.

Put the amd64-specific functions here rather than sys/amd64/include/bus.h
so that we can keep this header unified for x86, as requested by mdf@
and tijl@.

Submitted by:	Carl Delsey <carl.r.delsey@intel.com>
MFC after:	3 days
2012-12-13 21:40:11 +00:00
jimharris
5e7d94235a Revert r243960 based on feedback regarding keeping x86 headers unified
(mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@).

Alternate implementation will be made in a separate commit.
2012-12-13 21:27:20 +00:00
jimharris
f20d18d37b Add amd64 implementations for 8-byte bus_space routines.
Submitted by:	Carl Delsey <carl.r.delsey@intel.com>
Discussed with:	jhb, rwatson
Reviewed by:	jimharris
MFC after:	1 week
2012-12-06 22:33:31 +00:00
avg
39d9709f4f ioapic_program_intpin: program high bits before low bits
Programming the low bits has a side-effect if unmasking the pin if it is
not disabled.  So if an interrupt was pending then it would be delivered
with the correct new vector but to the incorrect old LAPIC.

This fix could be made clearer by preserving the mask bit while
programming the low bits and then explicitly resetting the mask bit
after all the programming is done.

Probability to trip over the fixed bug could be increased by bootverbose
because printing of the interrupt information in ioapic_assign_cpu
lengthened the time window during which an interrupt could arrive while
a pin is masked.

Reported by:	Andreas Longwitz <longwitz@incore.de>
Tested by:	Andreas Longwitz <longwitz@incore.de>
MFC after:	12 days
2012-12-01 18:16:14 +00:00