Commit Graph

413 Commits

Author SHA1 Message Date
neel
3ae7b75a64 Free up the IPI slot used by IPI_STOP_HARD.
Change the numeric value of IPI_STOP_HARD so it doesn't occupy a valid IPI
slot. This can be done because IPI_STOP_HARD is actually delivered via NMI.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D1983
2015-03-01 02:31:27 +00:00
kib
f38c6a3d8e Since all generations of Intel CPUs have errata which causes hang on
the cache line flush in the LAPIC page, keep direct map page covering
LAPIC mapped uncached.

To have the (incomplete) check for the LAPIC range in
pmap_invalidate_cache_range() working, lapic_paddr must be initialized
in x2APIC mode too.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-27 11:13:46 +00:00
royger
4be3c87640 xen/intr: fix fallout from r278854
r278854 introduced a race in the event channel handling code. We must make
sure that the pending bit is cleared before executing the filter, or else we
might miss other events that would be injected after the filter has ran but
before the pending bit is cleared.

While there also mask event channels while FreeBSD executes the ithread
bound to that event channel. This refrains Xen from injecting more
interrupts while the ithread has not finished it's work.

Sponsored by: Citrix Systems R&D
Reported by: sbruno, robak
Tested by: robak
2015-02-26 16:05:09 +00:00
kib
46a27978f5 Implements EOI suppression mode, where LAPIC on EOI command for
level-triggered interrupt does not broadcast the EOI message to all
APICs in the system.  Instead, interrupt handler must follow LAPIC EOI
with IOAPIC EOI.  For modern IOAPICs, the later is done by writing to
EOIR register.  Otherwise, Intel provided Linux with a trick of
temporary switching the pin config to edge and then back to level.

Detect presence of EOIR register by reading IO-APIC version.  The
summary table in the comments was taken from the Linux kernel.  For
Intel, newer IO-APICs are only briefly documented as part of the
ICH/PCH datasheet.  According to the BKDG and chipset documentation,
AMD LAPICs do not provide EOI suppression, althought IO-APICs do
declare version 0x21 and implement EOIR.

The trick to temporary switch pin to edge mode to clear IRR was tested
on modern chipset, by pretending that EOIR is not present, i.e. by
forcing io_haseoi to zero.

Tunable hw.lapic_eoi_suppression disables the optimization.

Reviewed by:	neel
Tested by:	pho
Review:	https://reviews.freebsd.org/D1943
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-26 11:02:40 +00:00
kib
df1d9177e5 For now, disable x2APIC mode when Xen is detected, even if CPU
declares support for it.  Newer versions of Xen works fine with x2APIC
code, but e.g. Xen 4.2 delivers GPF on the LAPIC MSR write, despite
x2APIC mode being known to hypervisor.

Discussed with:	royger
Sponsored by:	The FreeBSD Foundation
2015-02-25 16:44:07 +00:00
kib
32241f2de4 Revert r276949 and redo the fix for PCIe/PCI bridges, which do not
follow specification and do not provide PCIe capability.

Verify if the port above such bridge is downstream PCIe (or root port)
and treat the bridge as PCIe/PCI then.  This allows to avoid
maintaining the table of device ids for bridges without capability,
while still calculate correct request originator for devices behind
the bridge.

Submitted by:	Jason Harmening <jason.harmening@gmail.com>
MFC after:	1 week
2015-02-21 22:38:32 +00:00
tijl
1d2f909a1d Fix build on i386 without "device apic"
Reviewed by:	kib
2015-02-20 19:42:26 +00:00
kib
ce948ff36c Fix UP build.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-18 10:51:48 +00:00
kib
7a89c3df60 Initialize x2APIC mode on the resume path before accessing LAPIC.
Remove unneeded disable of LAPIC in the native_lapic_xapic_mode().  We
attempt to send wakeup IPI on the resume path right after BSP wakeup,
so disabling is wrong.

Reported and tested by:	glebius, "Ranjan1018 ." <214748mv@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-16 21:56:19 +00:00
royger
1a911b7d48 xen/intr: improve handling of legacy IRQs
Devices that use ISA IRQs expect them to be already configured, and don't
call bus_config_intr, which prevents those IRQs from working on Xen. In
order to solve it pre-register all the legacy IRQs with the default values
(edge triggered, low polarity) if no override is found.

While there add a panic if the registration of an interrupt override fails.

Sponsored by: Citrix Systems R&D
2015-02-16 16:37:59 +00:00
royger
3b68dd1b95 xen/intr: improve PIRQ handling
Improve and cleanup the Xen PIRQ event channel code:

 - Remove the xi_shared field as it is unused.
 - Clean the "pending" bit in the EOI handler, this is more similar to how
   native interrupts are handled.
 - Don't mask edge triggered PIRQs, edge trigger interrupts cannot be
   masked.
 - Panic if PHYSDEVOP_eoi fails.
 - Remove the usage of the PHYSDEVOP_alloc_irq_vector hypercall because
   it's just a no-op in the Xen versions that are supported by FreeBSD Dom0.

Sponsored by: Citrix Systems R&D
2015-02-16 16:30:42 +00:00
kib
45b91c251b Detect whether x2APIC on VMWare is usable without interrupt
redirection support.  Older versions of the hypervisor mis-interpret
the cpuid format in ioapic registers when x2APIC is turned on, but IR
is not used by the guest OS.

Based on:	Linux commit 4cca6ea04d31c22a7d0436949c072b27bde41f86
Tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-14 09:00:12 +00:00
kib
60f7add83d Registers definitions for the new capabilities from the version 2.4 of
VT-d specification.  Also add definitions for the interrupt remapping
table and IEC.

Print new capabilities on boot. although there is no hardware which
support it.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-02-11 23:30:46 +00:00
kib
e2cc9f9eca vm_page_lookup() accepts read-locked object.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-02-11 23:28:28 +00:00
kib
0754f0eac9 Add x2APIC support. Enable it by default if CPU is capable. The
hw.x2apic_enable tunable allows disabling it from the loader prompt.

To closely repeat effects of the uncached memory ops when accessing
registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded
by mfence, except for the EOI notifications.  This is probably too
strict, only ICR writes to send IPI require serialization to ensure
that other CPUs see the previous actions when IPI is delivered.  This
may be changed later.

In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
inline, to handle corner conditions.

Note that the patch only switches LAPICs into x2APIC mode. It does not
enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC
MADT entries and doing interrupts remapping, but is the required step
on the way.

Reviewed by:	neel
Tested by:	pho (real hardware), neel (on bhyve)
Discussed with:	jhb, grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-09 21:00:56 +00:00
jhb
9dfc36d985 Revert the IPI startup sequence to match what is described in the
Intel Multiprocessor Specification v1.4.  The Intel SDM claims that
the INIT IPIs here are invalid, but other systems follow the MP
spec instead.

While here, fix the IPI wait routine to accept a timeout in microseconds
instead of a raw spin count, and don't spin forever during AP startup.
Instead, panic if a STARTUP IPI is not delivered after 20 us.

PR:		196542
Differential Revision:	https://reviews.freebsd.org/D1719
MFC after:	2 weeks
2015-02-06 18:19:59 +00:00
bryanv
7a68cf4e59 Add interface to derive a TSC frequency from the pvclock
This can later use this to determine the TSC frequency like is done with
VMware, instead of using a DELAY loop that is not always accurate in an VM.

MFC after:	1 month
2015-02-04 08:33:04 +00:00
bryanv
25ce9181cf Generalized parts of the XEN timer code into a generic pvclock
KVM clock shares the same data structures between the guest and the host
as Xen so it makes sense to just have a single copy of this code.

Differential Revision: https://reviews.freebsd.org/D1429
Reviewed by:	royger (eariler version)
MFC after:	1 month
2015-02-04 08:26:43 +00:00
jhb
86833ce103 Opt for performance over power-saving on Intel CPUs that have a
P-state but not C-state invariant TSC by changing the default behavior
to leaving the TSC enabled as the timecounter and disabling C2+ instead
of disabling the TSC by default.

Discussed with:		jkim
Tested by:		Jan Kokemuller <jan.kokemueller@gmail.com>
2015-01-29 20:41:42 +00:00
royger
313e96ad33 loader: fix the size of MODINFOMD_MODULEP
The data in MODINFOMD_MODULEP is packed by the loader as a 4 byte type, but
the amd64 kernel expects a vm_paddr_t, which is of size 8 bytes. Fix this by
saving it as 8 bytes in the loader and retrieving it using the proper type
in the kernel.

Sponsored by: Citrix Systems R&D
2015-01-20 12:28:24 +00:00
neel
1fe4c6403b Update the vdso timehands only via tc_windup().
Prior to this change CLOCK_MONOTONIC could go backwards when the timecounter
hardware was changed via 'sysctl kern.timecounter.hardware'. This happened
because the vdso timehands update was missing the special treatment in
tc_windup() when changing timecounters.

Reviewed by:	kib
2015-01-20 03:54:30 +00:00
imp
5dd3f4fde2 Include mca_machdep.h. 2015-01-18 03:43:47 +00:00
imp
503404cbf7 Need to include opt_mca.h to test for DEV_MCA. 2015-01-17 02:17:59 +00:00
royger
0c5b62d3d2 loader: implement multiboot support for Xen Dom0
Implement a subset of the multiboot specification in order to boot Xen
and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot
implementation is tailored to boot Xen and FreeBSD Dom0, and it will
most surely fail to boot any other multiboot compilant kernel.

In order to detect and boot the Xen microkernel, two new file formats
are added to the bootloader, multiboot and multiboot_obj. Multiboot
support must be tested before regular ELF support, since Xen is a
multiboot kernel that also uses ELF. After a multiboot kernel is
detected, all the other loaded kernels/modules are parsed by the
multiboot_obj format.

The layout of the loaded objects in memory is the following; first the
Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to
long mode by itself), after that the FreeBSD kernel is loaded as a RAW
file (Xen will parse and load it using it's internal ELF loader), and
finally the metadata and the modules are loaded using the native
FreeBSD way. After everything is loaded we jump into Xen's entry point
using a small trampoline. The order of the multiboot modules passed to
Xen is the following, the first module is the RAW FreeBSD kernel, and
the second module is the metadata and the FreeBSD modules.

Since Xen will relocate the memory position of the second
multiboot module (the one that contains the metadata and native
FreeBSD modules), we need to stash the original modulep address inside
of the metadata itself in order to recalculate its position once
booted. This also means the metadata must come before the loaded
modules, so after loading the FreeBSD kernel a portion of memory is
reserved in order to place the metadata before booting.

In order to tell the loader to boot Xen and then the FreeBSD kernel the
following has to be added to the /boot/loader.conf file:

xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga"
xen_kernel="/boot/xen"

The first argument contains the command line that will be passed to the Xen
kernel, while the second argument is the path to the Xen kernel itself. This
can also be done manually from the loader command line, by for example
typing the following set of commands:

OK unload
OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga
OK load kernel
OK load zfs
OK load if_tap
OK load ...
OK boot

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D517

For the Forth bits:
Submitted by: Julien Grall <julien.grall AT citrix.com>
2015-01-15 16:27:20 +00:00
kib
11969484c8 For x86, read MAXPHYADDR, defined in SDM vol 3 4.1.4 Enumeration of Paging
Features by CPUID as CPUID.80000008H:EAX[7:0], into variable cpu_maxphyaddr.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-12 07:36:25 +00:00
kib
5757b9b9b2 Right now, for non-coherent DMARs, page table update code flushes the
cache for whole page containing modified pte, and more, only last page
in the series of the consequtive pages is flushed (i.e. the affected
mappings should be larger than 2MB).

Avoid excessive flushing and do missed neccessary flushing, by
splitting invalidation and unmapping.  For now, flush exactly the
range of the changed pte.  This is still somewhat bigger than
neccessary, since pte is 8 bytes, while cache flush line is at least
32 bytes.

The originator of the issue reports that after the change,
'dmar_bus_dmamap_unload went from 13,288 cycles down to
3,257. dmar_bus_dmamap_load_buffer went from 9,686 cycles down to
3,517.  and I am now able to get line 1GbE speed with Netperf TCP
(even with 1K message size).'

Diagnosed and tested by:	Nadav Amit <nadav.amit@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-11 20:27:15 +00:00
kib
4fe2e4b2c6 Fix calculation of requester for PCI device behind PCIe/PCI bridge.
In my case on the test machine, I have hierarchy of
pcib2 (PCIe port on host bridge with PCIe capability) -> pci2 ->
    pcib3 (ITE PCIe/PCI bridge) -> pci3 -> em1

The device to check PCIe capability is pcib2 and not pcib3, as it is
currently done in the code.  Also, in case of the bridge, we shall
step to pcib2 for the loop iteration, since pcib3 does not carry PCIe
capability info and would force wrong recalculation of rid.

Also change the returned requester to the PCIe bus which provides port
for the bridge.  This only results in changing
hw.busdma.pciX.X.X.X.bounce tunable to force identity-mapped context
for the device.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-10 23:12:49 +00:00
kib
b1c8acc0bc Print rid when announcing DMAR context creation. Print sid when fault
occurs.  This allows to connect dots in case the requester is
calculated erronously.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-10 22:57:08 +00:00
kib
4a8a592617 Fix DMAR context allocations for the devices behind PCIe->PCI bridges
after dmar driver was converted to use rids.  The bus component to
calculate context page must be taken from the requestor rid, which is
a bridge, and not from the device bus number.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-09 02:10:44 +00:00
sbruno
8d4db71d65 Update Features2 to display SDBG capability of processor. This is
showing up on Haswell-class CPUs

From the Intel SDM, "Table 3-20. Feature Information Returned in the
ECX Register"

11 | SDBG | A value of 1 indicates the processor supports
IA32_DEBUG_INTERFACE MSR for silicon debug.

Submitted by:	jiashiun@gmail.com
Reviewed by:	jhb neel
MFC after:	2 weeks
2015-01-08 16:50:35 +00:00
jhb
06e75f0dba Create a cpuset mask for each NUMA domain that is available in the
kernel via the global cpuset_domain[] array. To export these to userland,
add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a
specific domain. Add a -d flag to cpuset(1) that can be used to fetch
the mask for a given domain.

Differential Revision:	https://reviews.freebsd.org/D1232
Submitted by:	jeff (kernel bits)
Reviewed by:	adrian, jeff
2015-01-08 15:53:13 +00:00
markj
7e7e145818 Factor out duplicated code from dumpsys() on each architecture into generic
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.

PR:		193873
Differential Revision:	https://reviews.freebsd.org/D904
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by:	jhibbits (earlier version)
Sponsored by:	EMC / Isilon Storage Division
2015-01-07 01:01:39 +00:00
jhb
55d0376a65 On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST).  Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.

PR:		192316
Differential Revision:	https://reviews.freebsd.org/D1441
No objection from:	jkim
MFC after:	1 week
2015-01-05 20:44:44 +00:00
hselasky
a96e5946f7 Fix warning about possible use of uninitialized variable. 2015-01-02 08:42:44 +00:00
royger
9d0e5e2b5e xen/intr: balance dynamic interrupts across available vCPUs
By default Xen binds all event channels to vCPU#0, and FreeBSD only shuffles
the interrupt sources once, at the end of the boot process. Since new event
channels might be created after this point (because new devices or backends
are added), try to automatically shuffle them at creation time.

This does not affect VIRQ or IPI event channels, that are already bound to a
specific vCPU as requested by the caller.

Sponsored by: Citrix Systems R&D
2014-12-10 13:25:21 +00:00
royger
f5723debac xen: mask event channels while binding them to a vCPU
Mask the event channel source before trying to bind it to a CPU, this
prevents stray interrupts from firing while assigning them and hitting the
KASSERT in xen_intr_handle_upcall.

Sponsored by: Citrix Systems R&D
2014-12-10 11:42:02 +00:00
royger
e09f127692 xen: convert the Grant-table code to a NewBus device
This allows the Grant-table code to attach directly to the xenpv bus,
allowing us to remove the grant-table initialization done in xenpv.

Sponsored by: Citrix Systems R&D
2014-12-10 11:35:41 +00:00
royger
1f62e17066 xen: create a new PCI bus override
When running as a Xen PVH Dom0 we need to add custom buses that override
some of the functionality present in the ACPI PCI Bus and the PCI Bus. We
currently override the ACPI PCI Bus, but not the PCI Bus, so add a new
override for the PCI Bus and share the generic functions between them.

Reported by: David P. Discher <dpd@dpdtech.com>
Sponsored by: Citrix Systems R&D

conf/files.amd64:
 - Add the new files.

x86/xen/xen_pci_bus.c:
 - Generic file that contains the PCI overrides so they can be used by the
   several PCI specific buses.

xen/xen_pci.h:
 - Prototypes for the generic overried functions.

dev/xen/pci/xen_pci.c:
 - Xen specific override for the PCI bus.

dev/xen/pci/xen_acpi_pci.c:
 - Xen specific override for the ACPI PCI bus.
2014-12-09 18:03:25 +00:00
royger
ba7194c81a xen: notify ACPI about SCI override
If the SCI is remapped to a non-ISA global interrupt notify the ACPI
subsystem about the override.

Reported by: David P. Discher <dpd@dpdtech.com>
Sponsored by: Citrix Systems R&D
2014-12-09 11:12:24 +00:00
jhb
1671ac9155 Improve support for XSAVE with debuggers.
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
  to match what Linux does in that 1) it dumps the entire XSAVE area
  including the fxsave state, and 2) it stashes a copy of the current
  xsave mask in the unused padding between the fxsave state and the
  xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
  only the extra portion. This avoids having to always make two
  ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
  XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
  and the current XSAVE mask (%xcr0).

Differential Revision:	https://reviews.freebsd.org/D1193
Reviewed by:	kib
MFC after:	2 weeks
2014-11-21 20:53:17 +00:00
jhb
fdfced8ce8 MFamd64: Add support for extended FPU states on i386. This includes
support for AVX on i386.
- Similar to amd64, move the FPU save area out of the PCB and instead
  store saved FPU state in a variable-sized buffer after the PCB on the
  stack.
- To support the variable PCB location, alter the locore code to only use
  the bottom-most page of proc0stack for init386().  init386() returns
  the correct stack pointer to locore which adjusts the stack for thread0
  before calling mi_startup().
- Don't bother setting cr3 in thread0's pcb in locore before calling
  init386().  It wasn't used (init386() overwrote it at the end) and
  it doesn't work with the variable-sized FPU save area.
- Remove the new-bus attachment from npx.  This was only ever useful for
  external co-processors using IRQ13, but those have not been supported
  for several years.  npxinit() is now called much earlier during boot
  (init386()) similar to amd64.
- Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE.
- npxsave() is now only called from context switch contexts so it can
  use XSAVEOPT.

Differential Revision:	https://reviews.freebsd.org/D1058
Reviewed by:	kib
Tested on:	FreeBSD/i386 VM under bhyve on Intel i5-2520
2014-11-02 22:58:30 +00:00
jhb
d47eb7d2d4 Rework virtual machine hypervisor detection.
- Move the existing code to x86/x86/identcpu.c since it is x86-specific.
- If the CPUID2_HV flag is set, assume a hypervisor is present and query
  the 0x40000000 leaf to determine the hypervisor vendor ID.  Export the
  vendor ID and the highest supported hypervisor CPUID leaf via
  hv_vendor[] and hv_high variables, respectively.  The hv_vendor[]
  array is also exported via the hw.hv_vendor sysctl.
- Merge the VMWare detection code from tsc.c into the new probe in
  identcpu.c.  Add a VM_GUEST_VMWARE to identify vmware and use that in
  the TSC code to identify VMWare.

Differential Revision:	https://reviews.freebsd.org/D1010
Reviewed by:	delphij, jkim, neel
2014-10-28 19:17:44 +00:00
grehan
1d26d798b2 Output a summary of optional SVM features in dmesg similar to CPU features.
If bootverbose is enabled, a detailed list is provided; otherwise, a
single-line summary is displayed.

Differential Revision:	https://reviews.freebsd.org/D1008
Reviewed by:	jhb, neel
MFC after:	1 week
2014-10-27 22:02:35 +00:00
royger
919b7d8b7c xen: implement the privcmd user-space device
This device is only attached to priviledged domains, and allows the
toolstack to interact with Xen. The two functions of the privcmd
interface is to allow the execution of hypercalls from user-space, and
the mapping of foreign domain memory.

Sponsored by: Citrix Systems R&D

i386/include/xen/hypercall.h:
amd64/include/xen/hypercall.h:
 - Introduce a function to make generic hypercalls into Xen.

xen/interface/xen.h:
xen/interface/memory.h:
 - Import the new hypercall XENMEM_add_to_physmap_range used by
   auto-translated guests to map memory from foreign domains.

dev/xen/privcmd/privcmd.c:
 - This device has the following functions:
   - Allow user-space applications to make hypercalls into Xen.
   - Allow user-space applications to map memory from foreign domains,
     this is accomplished using the newly introduced hypercall
     (XENMEM_add_to_physmap_range).

xen/privcmd.h:
 - Public ioctl interface for the privcmd device.

x86/xen/hvm.c:
 - Remove declaration of hypercall_page, now it's declared in
   hypercall.h.

conf/files:
 - Add the privcmd device to the build process.
2014-10-22 17:07:20 +00:00
royger
bca2091349 xen: allow to register event channels without handlers
This is needed by the event channel user-space device, that requires
registering event channels without unmasking them. intr_add_handler
will unconditionally unmask the event channel, so we avoid calling it
if no filter/handler is provided, and then the user will be in charge
of calling it when ready.

In order to do this, we need to change the opaque type
xen_intr_handle_t to contain the event channel port instead of the
opaque cookie returned by intr_add_handler, since now registration of
event channels without handlers are allowed. The cookie will now be
stored inside of the private xenisrc struct. Also, introduce a new
function called xen_intr_add_handler that allows adding a
filter/handler after the event channel has been registered.

Sponsored by: Citrix Systems R&D

x86/xen/xen_intr.c:
 - Leave the event channel without a handler if no filter/handler is
   provided to xen_intr_bind_isrc.
 - Don't perform an evtchn_mask_port, intr_add_handler will already do
   it.
 - Change the opaque type xen_intr_handle_t to contain a pointer to
   the event channel port number, and make the necessary changes to
   related functions.
 - Introduce a new function called xen_intr_add_handler that can be
   used to add filter/handlers to an event channel after registration.

xen/xen_intr.h:
 - Add prototype of xen_intr_add_handler.
2014-10-22 16:51:52 +00:00
royger
040f3ac494 xen: fix usage of kern_getenv in PVH code
The value returned by kern_getenv should be freed using freeenv.

Reported by:	Coverity
CID:		1248852
Sponsored by: Citrix Systems R&D
2014-10-22 16:49:00 +00:00
marcel
48e5a4e056 Virtual machines can easily have more than 16 option ROMs and
when that happens, we happily access our resource array out of
bounds. Make sure we stay within the MAX_ROMS limit.
While here, bump MAX_ROMS from 16 to 32 to minimize the chance
of leaving option ROMs unaccounted for.

Obtained from:	Juniper Networks, Inc.
2014-10-22 01:37:32 +00:00
hselasky
49c137f7be Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
neel
be8b3ca439 Merge from projects/bhyve_svm all the changes outside vmm.ko or bhyve utilities:
Add support for AMD's nested page tables in pmap.c:
- Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit)
  for a pmap of type PT_RVI.
- Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of
  type PT_EPT or PT_RVI.

Add CPU_SET_ATOMIC_ACQ(num, cpuset):
This is used when activating a vcpu in the nested pmap. Using the 'acquire'
variant guarantees that the load of the 'pm_eptgen' will happen only after
the vcpu is activated in 'pm_active'.

Add defines for various AMD-specific MSRs.

Submitted by:	Anish Gupta (akgupt3@gmail.com)
2014-10-20 18:09:33 +00:00
davide
e88bd26b3f Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00