Commit Graph

446 Commits

Author SHA1 Message Date
Roger Pau Monné
e68c8d7f2b atpic: make sure atpic_init is called after IO APIC initialization
After r269510 the IO APIC and ATPIC initialization is done at the same
order, which means atpic_init can be called before the IO APIC has
been initalized. In that case the ATPIC will take over the interrupt
sources, preventing the IO APIC from registering them.

Reported by: David Wolfskill <david@catwhisker.org>
Tested by: David Wolfskill <david@catwhisker.org>,
           Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
Sponsored by: Citrix Systems R&D
2014-08-07 17:00:50 +00:00
Roger Pau Monné
c2641d23e1 xen: add ACPI bus to xen_nexus when running as Dom0
Also disable a couple of ACPI devices that are not usable under Dom0.
To this end a couple of booleans are added that allow disabling ACPI
specific devices.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb

x86/xen/xen_nexus.c:
 - Return BUS_PROBE_SPECIFIC in the Xen Nexus attachement routine to
   force the usage of the Xen Nexus.
 - Attach the ACPI bus when running as Dom0.

dev/acpica/acpi_cpu.c:
dev/acpica/acpi_hpet.c:
dev/acpica/acpi_timer.c
 - Add a variable that gates the addition of the devices.

x86/include/init.h:
 - Declare variables that control the attachment of ACPI cpu, hpet and
   timer devices.
2014-08-04 09:05:28 +00:00
Roger Pau Monné
c0c19cce9e xen: implement support for mapping IO APIC interrupts on Xen
Allow a privileged Xen guest (Dom0) to parse the MADT ACPI interrupt
overrides and register them with the interrupt subsystem.

Also add a Xen specific implementation for bus_config_intr that
registers interrupts on demand for all the vectors less than
FIRST_MSI_INT.

Sponsored by: Citrix Systems R&D

x86/xen/pvcpu_enum.c:
 - Use helper functions from x86/acpica/madt.c in order to parse
   interrupt overrides from the MADT.
 - Walk the MADT and register any interrupt override with the
   interrupt subsystem.

x86/xen/xen_nexus.c:
 - Add a custom bus_config_intr method for Xen that intercepts calls
   to configure unset interrupts and registers them on the fly (if the
   vector is < FIRST_MSI_INT).
2014-08-04 09:01:21 +00:00
Roger Pau Monné
f48223fad2 x86/madt: make the interrupt override parser a public function
Split a portion of the code in madt_parse_interrupt_override to a
separate function, that is public and can be used from other code.
This will be needed by the Xen port, since FreeBSD needs to parse the
interrupt overrides and notify Xen about them.

This commit should not introduce any functional change.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb, gibbs

x86/acpica/madt.c:
 - Introduce madt_parse_interrupt_values() that parses the intr
   information from ACPI and returns the triggering and the polarity.
   This is a subset of the functionality that used to be part of
   madt_parse_interrupt_override().
 - Make madt_found_sci_override a global variable that can be used
   from other files.

x86/include/acpica_machdep.h:
 - Prototype of madt_parse_interrupt_values.
 - Extern declaration of madt_found_sci_override.
2014-08-04 08:58:50 +00:00
Roger Pau Monné
d9aa19f1f4 xen: change quality of the MADT ACPI enumerator
Lower the quality of the MADT ACPI enumerator, so on Xen Dom0 we can
force the usage of the Xen mptable enumerator even when ACPI is
detected.

This is needed because Xen might restrict the number of vCPUs
available to Dom0, but the MADT ACPI table parsed in FreeBSD is the
native one (which enumerates all the CPUs available in the system).

Sponsored by: Citrix Systems R&D
Reviewed by: gibbs

x86/acpica/madt.c:
 - Lower MADT enumerator quality to -50.

x86/xen/pvcpu_enum.c:
 - Rise Xen PV enumerator to 0.
2014-08-04 08:56:20 +00:00
Roger Pau Monné
a36a5425c7 xen: change order of Xen intr init and IO APIC registration
This change inserts the Xen interrupt subsystem (event channels)
initialization between the system interrupt initialization and the IO
APIC source registration.

This is needed when running on Dom0, that routes physical interrupts
on top of event channels, so that the interrupt sources found during
IO APIC initialization can be registered using the Xen interrupt
subsystem.

The resulting order in the SI_SUB_INTR stage is the following:

- System intr initialization
- Xen intr initalization
- IO APIC source registration

Sponsored by: Citrix Systems R&D

x86/x86/local_apic.c:
 - Change order of apic_setup_io to be called after xen interrupt
   subsystem is setup.

x86/xen/xen_intr.c:
 - Init Xen event channels before apic_setup_io.
2014-08-04 08:54:34 +00:00
Roger Pau Monné
a33ea97e26 xen: add a DDB command to print event channel information
Add a new DDB command to dump all registered event channels.

Sponsored by: Citrix Systems R&D

x86/xen/xen_intr.c:
 - Add a new xen_evtchn command to DDB in order to dump all
   information related to event channels.
2014-08-04 08:52:10 +00:00
Roger Pau Monné
c9f3ec7fd2 xen: mask all event channels on init
Mask all event channels during initialization. This is done so that we
don't receive spurious interrupts while dynamically registering new
event channels. There's a small window during registration where an
event channel can fire before we have attached a handler to it.

Sponsored by: Citrix Systems R&D

x86/xen/xen_intr.c:
 - Mask all event channels on init.
2014-08-04 08:43:27 +00:00
Roger Pau Monné
25e34dd327 xen: implement event channel PIRQ support
This allows Dom0 to manage physical hardware, redirecting the
physical interrupts to event channels.

Sponsored by: Citrix Systems R&D

x86/xen/xen_intr.c:
 - Expand struct xenisrc to hold the level and triggering of PIRQ
   event channels.
 - Implement missing methods in xen_intr_pirq_pic.
 - Allow xen_intr_alloc_isrc to take a vector parameter that globally
   identifies the interrupt. This is only used for PIRQs that are
   bound to a specific hardware IRQ.
 - Introduce xen_register_pirq used to register IO APIC legacy PIRQ
   interrupts.
 - Add support for the dynamic PIRQ EOI map, this shared memory is
   modified by Xen (if it suppoorts that feature), and notifies the
   guest if an EOI is needed or not. If it's not available fall back
   to the old implementation using PHYSDEVOP_irq_status_query.
 - Rename xen_intr_isrc_count to xen_intr_auto_vector_count and
   replace it's usages.
 - Align static variables by name.

xen/xen_intr.h:
 - Add prototype for xen_register_pirq.
2014-08-04 08:42:29 +00:00
John Baldwin
06fc6db948 - Output a summary of optional VT-x features in dmesg similar to CPU
features.  If bootverbose is enabled, a detailed list is provided;
  otherwise, a single-line summary is displayed.
- Add read-only sysctls for optional VT-x capabilities used by bhyve
  under a new hw.vmm.vmx.cap node. Move a few exiting sysctls that
  indicate the presence of optional capabilities under this node.

CR:		https://phabric.freebsd.org/D498
Reviewed by:	grehan, neel
MFC after:	1 week
2014-07-30 00:00:12 +00:00
Marius Strobl
dc0ca75105 Fix yet another comment typo in r269052. 2014-07-29 14:54:23 +00:00
Marius Strobl
fe88aba370 Fix comment typo in r269052.
Submitted by:	Daniel O'Connor
2014-07-29 13:26:24 +00:00
Shunsuke Akiyama
0bd50b2210 Add missing newline to output dmesg properly. 2014-07-28 13:47:02 +00:00
Gavin Atkinson
f6b4f5ca21 Add error return to dumpsys(), and use it in doadump().
This commit does not add error returns to minidumpsys() or
textdump_dumpsys(); those can also be added later.

Submitted by:	Conrad Meyer (EMC / Isilon storage division)
2014-07-25 23:52:53 +00:00
Marius Strobl
a0d9385faa Intel desktop Haswell CPUs may report benign corrected parity errors (see
HSD131 erratum in [1]) at a considerable rate. So filter these (default),
unless logging is enabled. Unfortunately, there really is no better way to
reasonably implement suppressing these errors than to just skipping them
in mca_log(). Given that they are reported for bank 0, they'd need to be
masked in MSR_MC0_CTL. However, P6 family processors require that register
to be set to either all 0s or all 1s, disabling way more than the one error
in question when using all 0s there. Alternatively, it could be masked for
the corresponding CMCI, but that still wouldn't keep the periodic scanner
from detecting these spurious errors. Apart from that, register contents of
MSR_MC0_CTL{,2} don't seem to be publicly documented, neither in the Intel
Architectures Developer's Manual nor in the Haswell datasheets.

Note that while HSD131 actually is only about C0-stepping as of revision
014 of the Intel desktop 4th generation processor family specification
update, these corrected errors also have been observed with D0-stepping
aka "Haswell Refresh".

1: http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf

Reviewed by:	jhb
MFC after:	3 days
Sponsored by:	Bally Wulff Games & Entertainment GmbH
2014-07-24 10:14:51 +00:00
John Baldwin
fae9277339 Fix build with SMP disabled.
CR:		https://phabric.freebsd.org/D407
Reviewed by:	royger
2014-07-15 15:40:33 +00:00
Marcel Moolenaar
e7d939bda2 Remove ia64.
This includes:
o   All directories named *ia64*
o   All files named *ia64*
o   All ia64-specific code guarded by __ia64__
o   All ia64-specific makefile logic
o   Mention of ia64 in comments and documentation

This excludes:
o   Everything under contrib/
o   Everything under crypto/
o   sys/xen/interface
o   sys/sys/elf_common.h

Discussed at: BSDcan
2014-07-07 00:27:09 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Hans Petter Selasky
eff808e10a Fix compile warning: Remove duplicate external declaration. 2014-06-19 05:06:24 +00:00
Roger Pau Monné
8114c8e190 xen: fix out-of-bounds access to ipi_handle
Fix the gate in xen_pv_lapic_ipi_vectored to prevent access to element
at position nitems(xen_ipis).

Sponsored by: Citrix Systems R&D
Coverity ID: 1223203
Approved by: gibbs
2014-06-18 13:41:20 +00:00
Konstantin Belousov
d09b3c60d0 Do not reference native_lapic_ipi_*() functions in the UP build.
The functions' definitions are protected by #ifdef SMP.
Keeping apic_ops.ipi_*() methods NULL would allow to catch the use
on UP machines.

Reviewed by:	royger
Sponsored by:	The FreeBSD Foundation
2014-06-17 09:33:22 +00:00
Roger Pau Monné
24f7e474cc xen: add missing files
Commit missing files that actually belong to previous commits.

Sponsored by: Citrix Systems R&D
Approved by: gibbs
2014-06-16 08:54:04 +00:00
Roger Pau Monné
79cd455edb isa: allow ISA bus to attach to xenpv bus
This is needed because syscons depends on ISA.

Sponsored by: Citrix Systems R&D
Approved by: gibbs

x86/isa/isa.c:
 - Allow the ISA bus to attach to xenpv.
2014-06-16 08:49:16 +00:00
Roger Pau Monné
842471b331 xen: add hooks for Xen PV APIC
Create the necessary hooks in order to provide a Xen PV APIC
implementation that can be used on PVH. Most of the lapic ops
shouldn't be called on Xen, since we trap those operations at a higher
layer.

Sponsored by: Citrix Systems R&D
Approved by: gibbs

x86/xen/hvm.c:
x86/xen/xen_apic.c:
 - Move IPI related code to xen_apic.c

x86/xen/xen_apic.c:
 - Introduce Xen PV APIC implementation, most of the functions of the
   lapic interface should never be called when running as PV(H) guest,
   so make sure FreeBSD panics when trying to use one of those.
 - Define the Xen APIC implementation in xen_apic_ops.

xen/xen_pv.h:
 - Extern declaration of the xen_apic struct.

x86/xen/pv.c:
 - Use xen_apic_ops as apic_ops when running as PVH guest.

conf/files.amd64:
conf/files.i386:
 - Include the xen_apic.c file in the build of i386/amd64 kernels
   using XENHVM.
2014-06-16 08:43:45 +00:00
Roger Pau Monné
ef409ede7b amd64/i386: introduce APIC hooks for different APIC implementations.
This is needed for Xen PV(H) guests, since there's no hardware lapic
available on this kind of domains. This commit should not change
functionality.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Approved by: gibbs

amd64/include/cpu.h:
amd64/amd64/mp_machdep.c:
i386/include/cpu.h:
i386/i386/mp_machdep.c:
 - Remove lapic_ipi_vectored hook from cpu_ops, since it's now
   implemented in the lapic hooks.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
 - Use lapic_ipi_vectored directly, since it's now an inline function
   that will call the appropiate hook.

x86/x86/local_apic.c:
 - Prefix bare metal public lapic functions with native_ and mark them
   as static.
 - Define default implementation of apic_ops.

x86/include/apicvar.h:
 - Declare the apic_ops structure and create inline functions to
   access the hooks, so the change is transparent to existing users of
   the lapic_ functions.

x86/xen/hvm.c:
 - Switch to use the new apic_ops.
2014-06-16 08:43:03 +00:00
Roger Pau Monné
5f35f84fa0 xen: fix style in pv.c
Fix the lenght of some comments, and also add proper indentation to
xen_init_ops

Sponsored by: Citrix Systems R&D
Approved by: gibbs
2014-06-16 08:41:57 +00:00
Scott Long
de569d9181 Eliminate the fake contig_dmamap and replace it with a new flag,
BUS_DMA_KMEM_ALLOC.  They serve the same purpose, but using the flag
means that the map can be NULL again, which in turn enables significant
optimizations for the common case of no bouncing.

Obtained from:	Netflix, Inc.
MFC after:	3 days
2014-05-27 21:31:11 +00:00
Scott Long
ed4910768a Now that there are separate back-end implementations of busdma, the bounce
implementation shouldn't steal flags from the common front-end.
Move those flags to the back-end.

Obtained from:  Netflix, Inc.
MFC after:      3 days
2014-05-27 14:18:57 +00:00
Scott Long
9359d2ac62 Revert r266481. It was based on faulty analysis of the problem. A correct
fix is forthcoming.

Obtained from:	Netflix, Inc.
2014-05-27 14:06:23 +00:00
John Baldwin
d8d025897e Whitespace fix.
Submitted by:	kib
2014-05-22 18:13:17 +00:00
Scott Long
96d67b46df Old PCIe implementations cannot allow a DMA transfer to cross a 4GB
boundary.  This was addressed several years ago by creating a parent
tag hierarchy for the root buses that set the boundary restriction
for appropriate buses and allowed child deviced to inherit it.
Somewhere along the way, this restriction was turned into a case for
marking the tag as a candidate for needing bounce buffers, instead
of just splitting the segment along the boundary line.  This flag
also causes all maps associated with this tag to be non-NULL, which
in turn causes bus_dmamap_sync() to take the slow path of function
pointer indirection to discover that there's no bouncing work to
do.  The end result is a lot of pages set aside in bounce pools
that will never be used, and a slow path for data buffers in nearly
every DMA-capable PCIe device.  For example, our workload at Netflix
was spending nearly 1% of all CPU time going through this slow path.

Fix this problem by being more selective about when to set the
COULD_BOUNCE flag.  Only set it when the boundary restriction
exists and the consumer cannot do more than a single DMA segment
at once.  This fixes the case of dynamic buffers (mbufs, bio's)
but doesn't address static buffers allocated from bus_dmamem_alloc().
That case will be addressed in the future.

For those interested, this was discovered thanks to Dtrace Flame
Graphs.

Discussed with: jhb, kib
Obtained from:	Netflix, Inc.
MFC after:	3 days
2014-05-20 22:43:17 +00:00
John Baldwin
355d8a2f91 Add definitions for more structured extended features as well as
XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions).

Obtained from:	Intel's Instruction Set Extensions Programming Reference
                (March 2014)
2014-05-16 17:45:09 +00:00
Warner Losh
b7df74ee5a Make this compile with gcc.
Submitted by: royger@
2014-04-05 22:43:18 +00:00
Ryan Stone
6749935455 Re-implement the DMAR I/O MMU code in terms of PCI RIDs
Under the hood the VT-d spec is really implemented in terms of
PCI RIDs instead of bus/slot/function, even though the spec makes
pains to convert back to bus/slot/function in examples.  However
working with bus/slot/function is not correct when PCI ARI is
in use, so convert to using RIDs in most cases.  bus/slot/function
will only be used when reporting errors to a user.

Reviewed by:	kib
MFC after:	2 months
Sponsored by:	Sandvine Inc.
2014-04-01 15:48:46 +00:00
Ryan Stone
7036ae46bf Revert PCI RID changes.
My PCI RID changes somehow got intermixed with my PCI ARI patch when I
committed it.  I may have accidentally applied a patch to a non-clean
working tree.  Revert everything while I figure out what went wrong.

Pointy hat to: rstone
2014-04-01 15:06:03 +00:00
Ryan Stone
b5eb8abe3e Re-implement the DMAR I/O MMU code in terms of PCI RIDs
Under the hood the VT-d spec is really implemented in terms of
PCI RIDs instead of bus/slot/function, even though the spec makes
pains to convert back to bus/slot/function in examples.  However
working with bus/slot/function is not correct when PCI ARI is
in use, so convert to using RIDs in most cases.  bus/slot/function
will only be used when reporting errors to a user.

Reviewed by:	kib
Sponsored by:	Sandvine Inc.
2014-04-01 14:51:45 +00:00
Tijl Coosemans
0a4c54d606 Rename __wchar_t so it no longer conflicts with __wchar_t from clang 3.4
-fms-extensions.

MFC after:	2 weeks
2014-04-01 14:46:11 +00:00
Takanori Watanabe
95c52f142d Change default logic to CONFORM because this routine is shared
with SCI polarity setting.

Reviewed by: jhb
2014-03-28 02:38:14 +00:00
Takanori Watanabe
337877ca21 Strict value checking will cause problem.
Bay trail DN2820FYKH is supported on Linux but does not work on FreeBSD.
This behaviour is bug-compatible with Linux-3.13.5.

References:
http://d.hatena.ne.jp/syuu1228/20140326
http://lxr.linux.no/linux+v3.13.5/arch/x86/kernel/acpi/boot.c#L1094

Submitted by: syuu
2014-03-27 06:36:38 +00:00
Takanori Watanabe
7dcf10fe67 To check polarity, check ACPI_MADT_POLARITY_CONFORMS, instead of ACPI_MADT_TRIGGER_CONFORMS.
PR:amd64/188010
Submitted by: syuu
2014-03-27 06:08:07 +00:00
John Baldwin
6b7797339b Fix build without SMP.
PR:		kern/187854
MFC after:	1 week
2014-03-26 17:40:13 +00:00
Warner Losh
f79309d29c Remove vestiges of knowing the ISA bus, which we gave up on around 20
years ago. Remove redunant copy of isaregs.h.
2014-03-19 21:03:04 +00:00
Konstantin Belousov
9d0bc6d88f Add support for the PCI(e)-PCI bridges to the Intel VT-d driver. The
bridge takes ownership of the transaction, so bsf of the requester is
the bridge and not a device behind it.  As result, code needs to walk
the hierarchy up to use correct context.

Note that PCIe->PCI-X bridges are not handled quite correctly since
such bridges are allowed to only take ownership of some transactions.
Also, weird but unrealistic cases of PCIe behind PCI bus are also not
handled.

Still, the patch provides significant step forward for the bridge
handling.

Submitted by:	Jason Harmening <jason.harmening@gmail.com>
MFC after:	1 week
2014-03-18 16:41:32 +00:00
Konstantin Belousov
e02b05b39e It is not uncommon for BIOSes to report wrong RMRR entries in DMAR
table.  Among them, some (old AMI ?) BIOSes report entries with range
like (bf7ec000, bf7ebfff).  Attempts to ignore the bogus entries
result in faults, so the range must be covered somehow.

Provide a workaround by identity mapping the 32 pages after the bogus
entry start, which seems to be enough for the reported BIOS.

Reported and tested by:	Jason Harmening <jason.harmening@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-03-18 16:20:33 +00:00
Konstantin Belousov
e8c5884213 Trim at EOL.
MFC after:	3 days
2014-03-18 15:59:06 +00:00
Ed Maste
0fcefb433d Update NetBSD Foundation copyrights to 2-clause BSD
The NetBSD Foundation states "Third parties are encouraged to change the
license on any files which have a 4-clause license contributed to the
NetBSD Foundation to a 2-clause license."

This change removes clauses 3 and 4 from copyright / license blocks that
list The NetBSD Foundation as the only copyright holder.

Sponsored by:	The FreeBSD Foundation
2014-03-18 01:40:25 +00:00
John Baldwin
053cbfe3bd Correct type for malloc().
Submitted by:	"Conrad Meyer" <conrad.meyer@isilon.com>
2014-03-13 18:11:42 +00:00
Roger Pau Monné
079f7ef839 xen: add a hook to perform AP startup
AP startup on PVH follows the PV method, so we need to add a hook in
order to diverge from bare metal.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Add hook for start_all_aps on native (using native_start_all_aps
   defined in mp_machdep).

amd64/amd64/mp_machdep.c:
 - Make some variables global because they will also be used by the
   Xen PVH AP startup code.
 - Use the start_all_aps hook to start APs.
 - Rename start_all_aps to native_start_all_aps.

amd64/include/smp.h:
 - Add declaration for native_start_all_aps.

x86/include/init.h:
 - Declare start_all_aps hook in init_ops.

x86/xen/pv.c:
 - Pick external declarations from mp_machdep.
 - Introduce Xen PV code to start APs on PVH.
 - Set start_all_aps init hook to use the Xen PVH implementation.
2014-03-11 10:27:57 +00:00
Roger Pau Monné
2afbc44b38 xen: changes to hvm code in order to support PVH guests
On PVH we don't need to init the shared info page, or disable emulated
devices. Also, make sure PV IPIs are set before starting the APs.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

x86/xen/hvm.c:
 - Return early from functions that are no-ops on Xen PVH guests.
 - In order to make sure PV IPIs are setup before AP startup,
   initialize them in SI_SUB_SMP-1.
2014-03-11 10:26:53 +00:00
Roger Pau Monné
5a036d7e02 xen: add hook for AP bootstrap memory reservation
This hook will only be implemented for bare metal, Xen doesn't require
any bootstrap code since APs are started in long mode with paging
enabled.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Set mp_bootaddress hook for bare metal.

x86/include/init.h:
 - Define mp_bootaddress in init_ops.
2014-03-11 10:26:16 +00:00
Roger Pau Monné
3d80242f23 xen: add an apic_enumerator for PVH
On PVH there's no ACPI, so the CPU enumeration must be implemented
using Xen hypercalls.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

x86/xen/pvcpu_enum.c:
 - Enumerate avaiable vCPUs on PVH by using the VCPUOP_is_up
   hypercall.
 - Set vcpu_id for PVH guests.

conf/files.amd64:
 - Include the PV CPU enumerator in the XENHVM build.
2014-03-11 10:25:08 +00:00
Roger Pau Monné
4d30a3fb95 xen: use the same hypercall mechanism for XEN and XENHVM
Currently XEN (PV) and XENHVM (PVHVM) ports use different ways to
issue hypercalls, unify this by filling the hypercall_page under HVM
also.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/include/xen/hypercall.h:
 - Unify Xen hypercall code by always using the PV way.

i386/i386/locore.s:
 - Define hypercall_page on i386 XENHVM.

x86/xen/hvm.c:
 - Fill hypercall_page on XENHVM kernels using the HVM method (only
   when running as an HVM guest).
2014-03-11 10:24:13 +00:00
Roger Pau Monné
1e69553ed1 xen: implement hook to fetch and parse e820 memory map
e820 memory map is fetched using a hypercall under Xen PVH, so add a
hook to init_ops in oder to diverge from bare metal and implement a
Xen variant.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

x86/include/init.h:
 - Add a parse_memmap hook to init_ops, that will be called to fetch
   and parse the memory map.

amd64/amd64/machdep.c:
 - Decouple the fetch and the parse of the memmap, so the parse
   function can be shared with Xen code.
 - Move code around in order to implement the parse_memmap hook.

amd64/include/pc/bios.h:
 - Declare bios_add_smap_entries (implemented in machdep.c).

x86/xen/pv.c:
 - Implement fetching of e820 memmap when running as a PVH guest by
   using the XENMEM_memory_map hypercall.
2014-03-11 10:23:03 +00:00
Roger Pau Monné
5f05c79450 xen: implement an early timer for Xen PVH
When running as a PVH guest, there's no emulated i8254, so we need to
use the Xen PV timer as the early source for DELAY. This change allows
for different implementations of the early DELAY function and
implements a Xen variant for it.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

dev/xen/timer/timer.c:
dev/xen/timer/timer.h:
 - Implement Xen early delay functions using the PV timer and declare
   them.

x86/include/init.h:
 - Add hooks for early clock source initialization and early delay
   functions.

i386/i386/machdep.c:
pc98/pc98/machdep.c:
amd64/amd64/machdep.c:
 - Set early delay hooks to use the i8254 on bare metal.
 - Use clock_init (that will in turn make use of init_ops) to
   initialize the early clock source.

amd64/include/clock.h:
i386/include/clock.h:
 - Declare i8254_delay and clock_init.

i386/xen/clock.c:
 - Rename DELAY to i8254_delay.

x86/isa/clock.c:
 - Introduce clock_init that will take care of initializing the early
   clock by making use of the init_ops hooks.
 - Move non ISA related delay functions to the newly introduced delay
   file.

x86/x86/delay.c:
 - Add moved delay related functions.
 - Implement generic DELAY function that will use the init_ops hooks.

x86/xen/pv.c:
 - Set PVH hooks for the early delay related functions in init_ops.

conf/files.amd64:
conf/files.i386:
conf/files.pc98:
 - Add delay.c to the kernel build.
2014-03-11 10:20:42 +00:00
Roger Pau Monné
97baeefd5b amd64: introduce hook for custom preload metadata parsers
Add hooks to amd64 in order to have diverging implementations, since
on Xen PV the metadata is passed to the kernel in a different form.

Approbed by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Define init_ops for native.
 - Put native code inside of native_parse_preload_data hook.
 - Call the parse_preload_data in order to fill the metadata info.

x86/include/init.h:
 - Declare the init_ops struct.

x86/xen/pv.c:
 - Declare xen_init_ops that contains the Xen PV implementation of
   init_ops.
 - Implement the parse_preload_data for Xen PVH, the info is fetched
   from HYPERVISOR_start_info->cmd_line as provided by Xen.
2014-03-11 10:15:25 +00:00
Roger Pau Monné
aa389b4f8c howto_names: unify declaration
Approved by: gibbs
Sponsored by: Citrix Systems R&D

boot/i386/efi/bootinfo.c:
boot/i386/libi386/bootinfo.c:
boot/ia64/common/bootinfo.c:
boot/powerpc/ofw/metadata.c:
boot/powerpc/ps3/metadata.c:
boot/sparc64/loader/metadata.c:
boot/uboot/common/metadata.c:
boot/userboot/userboot/bootinfo.c:
i386/xen/xen_machdep.c:
 - Include sys/boot.h
 - Remove custom definition of howto_names.

sys/boot.h:
 - Define howto_names.

x86/xen/pv.c:
 - Include sys/boot.h
2014-03-11 10:13:06 +00:00
Roger Pau Monné
c203fa6940 xen: add and enable Xen console for PVH guests
This adds and enables the PV console used on XEN kernels to
GENERIC/XENHVM kernels in order for it to be used on PVH.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

dev/xen/console/console.c:
 - Define console_page.
 - Move xc_printf debug function from i386 XEN code to generic console
   code.
 - Rework xc_printf.
 - Use xen_initial_domain instead of open-coded checks for Dom0.
 - Gate the attach of the PV console to PV(H) guests.

dev/xen/console/xencons_ring.c:
 - Allow the PV Xen console to output earlier by directly signaling
   the event channel in start_info if the event channel is not yet
   initialized.
 - Use HYPERVISOR_start_info instead of xen_start_info.

i386/include/xen/xen-os.h:
 - Remove prototype for xc_printf since it's now declared in global
   xen-os.h

i386/xen/xen_machdep.c:
 - Remove previous version of xc_printf.
 - Remove definition of console_page (now it's defined in the console
   itself).
 - Fix some printf formatting errors.

x86/xen/pv.c:
 - Add some early boot debug messages using xc_printf.
 - Set console_page based on the value passed in start_info.

xen/xen-os.h:
 - Declare console_page and add prototype for xc_printf.
2014-03-11 10:09:23 +00:00
Roger Pau Monné
1a9cdd373a xen: add PV/PVH kernel entry point
Add the PV/PVH entry point and the low level functions for PVH
early initialization.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/genassym.c:
 - Add __FreeBSD_version define to assym.s so it can be used for the
   Xen notes.

amd64/amd64/locore.S:
 - Make bootstack global so it can be used from Xen kernel entry
   point.

amd64/amd64/xen-locore.S:
 - Add Xen notes to the kernel.
 - Add the Xen PV entry point, that is going to call hammer_time_xen.

amd64/include/asmacros.h:
 - Add ELFNOTE macros.

i386/xen/xen_machdep.c:
 - Define HYPERVISOR_start_info for the XEN i386 PV port, which is
   going to be used in some shared code between PV and PVH.

x86/xen/hvm.c:
 - Define HYPERVISOR_start_info for the PVH port.

x86/xen/pv.c:
 - Introduce hammer_time_xen which is going to perform early setup for
   Xen PVH:
    - Setup shared Xen variables start_info, shared_info and
      xen_store.
    - Set guest type.
    - Create initial page tables as FreeBSD expects to find them.
    - Call into native init function (hammer_time).

xen/xen-os.h:
 - Declare HYPERVISOR_start_info.

conf/files.amd64:
 - Add amd64/amd64/locore.S and x86/xen/pv.c to the list of files.
2014-03-11 10:07:01 +00:00
Roger Pau Monné
e8da1c4877 amd64/i386: switch IPI handlers to C code.
Move asm IPIs handlers to C code, so both Xen and native IPI handlers
share the same code.

Reviewed by: jhb
Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/apic_vector.S:
i386/i386/apic_vector.s:
 - Remove asm coded IPI handlers and instead call the newly introduced
   C variants.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
 - Add C coded clones to the asm IPI handlers (moved from
   x86/xen/hvm.c).

i386/include/smp.h:
amd64/include/smp.h:
 - Add prototypes for the C IPI handlers.

x86/xen/hvm.c:
 - Move the C IPI handlers to mp_machdep and call those in the Xen IPI
   handlers.

i386/xen/mp_machdep.c:
 - Add dummy IPI handlers to the i386 Xen PV port (this port doesn't
   support SMP).
2014-03-11 10:03:29 +00:00
Jung-uk Kim
1d22d877b8 Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c.
Inspired by:	jhb
2014-03-04 21:35:57 +00:00
John Baldwin
4edef187b8 Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
  PCI domain/segment.  Host-PCI bridge drivers use this API to allocate
  bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
  their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
  full range of bus numbers from secbus to subbus from their parent bridge.
  The drivers also always program their primary bus register.  The bridge
  drivers also support growing their bus range by extending the bus resource
  and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
  used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.

Reviewed by:	imp
MFC after:	1 month
2014-02-12 04:30:37 +00:00
John Baldwin
e432d5f6a7 Drop the 3rd clause from all 3 clause BSD licenses where I am the sole
holder to convert them to 2 clause BSD licenses.

MFC after:	1 week
2014-02-05 18:13:27 +00:00
John Baldwin
5c039412a2 Move a warning about LINT pins configured with a level trigger under
bootverbose.
2014-02-05 18:11:46 +00:00
Tijl Coosemans
b35ac06804 Rename the AMD MSR_PERFCTR[0-3] so the Pentium Pro MSR_PERFCTR[0-1]
aren't redefined.

Reported by:	"Trivedi, Nishank" <Nishank.Trivedi@netapp.com>
Discussed with:	kib
2014-01-31 14:29:34 +00:00
John Baldwin
e07ef9b0f6 Move <machine/apicvar.h> to <x86/apicvar.h>. 2014-01-23 20:10:22 +00:00
John Baldwin
84ca9aad53 - Reuse legacy_pcib_(read|write)_config() methods in the QPI pcib driver.
- Reuse legacy_pcib_alloc_msi{,x}() methods in the QPI and mptable pcib
  drivers.
2014-01-21 03:14:19 +00:00
John Baldwin
2f0df38779 - Only check the ivars for direct descendants.
- A couple of whitespace fixes.
2014-01-20 17:55:22 +00:00
John Baldwin
6d40361585 The changes in r233781 attempted to make logging during a machine check
exception more readable.  In practice they prevented all logging during
a machine check exception on at least some systems.  Specifically, when
an uncorrected ECC error is detected in a DIMM on a Nehalem/Westmere
class machine, all CPUs receive a machine check exception, but only
CPUs on the same package as the memory controller for the erroring DIMM
log an error.  The CPUs on the other package would complete the scan of
their machine check banks and panic before the first set of CPUs could
log an error.  The end result was a clearer display during the panic
(no interleaved messages), but a crashdump without any useful info about
the error that occurred.

To handle this case, make all CPUs spin in the machine check handler
once they have completed their scan of their machine check banks until
at least one machine check error is logged.  I tried using a DELAY()
instead so that the CPUs would not potentially hang forever, but that
was not reliable in testing.

While here, don't clear MCIP from MSR_MCG_STATUS before invoking panic.
Only clear it if the machine check handler does not panic and returns
to the interrupted thread.
2014-01-08 21:04:12 +00:00
Nathan Whitehorn
dcd08302e5 Retire machine/fdt.h as a header used by MI code, as its function is now
obsolete. This involves the following pieces:
- Remove it entirely on PowerPC, where it is not used by MD code either
- Remove all references to machine/fdt.h in non-architecture-specific code
  (aside from uart_cpu_fdt.c, shared by ARM and MIPS, and so is somewhat
  non-arch-specific).
- Fix code relying on header pollution from machine/fdt.h includes
- Legacy fdtbus.c (still used on x86 FDT systems) now passes resource
  requests to its parent (nexus). This allows x86 FDT devices to allocate
  both memory and IO requests and removes the last notionally MI use of
  fdtbus_bs_tag.
- On those architectures that retain a machine/fdt.h, unused bits like
  FDT_MAP_IRQ and FDT_INTR_MAX have been removed.
2014-01-05 18:46:58 +00:00
John Baldwin
4c9518f884 Fix i386 build.
Pointy hat to:	jhb
2013-12-24 14:48:52 +00:00
John Baldwin
63e62d390d Add a resume hook for bhyve that runs a function on all CPUs during
resume.  For Intel CPUs, invoke vmxon for CPUs that were in VMX mode
at the time of suspend.

Reviewed by:	neel
2013-12-23 19:48:22 +00:00
John Baldwin
b2b76a45bf Use fixed-width types for all fields in MP Table structures and pack
all the structures.  While here, move a helper struct only used in
the kernel parser out of this header since it is not part of the MP
specification itself.
2013-12-11 21:19:04 +00:00
Alexander Motin
9d75ca28f0 Do not DELAY() for P-state transition unless we want to see the result.
Intel manual says: "If a transition is already in progress, transition to
a new value will subsequently take effect. Reads of IA32_PERF_CTL determine
the last targeted operating point."  So seems it should be fine to just
trigger wanted transition and go.  Linux does the same.

MFC after:	1 month
2013-12-10 20:25:43 +00:00
John Baldwin
316032ad20 Move constants for indices in the local APIC's local vector table from
apicvar.h to apicreg.h.
2013-12-09 21:08:52 +00:00
John Baldwin
c71f0d951a Fix the processor table entry structure to use a fixed-width type for
32-bit fields so it is the correct size on amd64.  Remove a workaround
for the broken structure from bhyve(8).

MFC after:	1 week
2013-12-05 21:51:54 +00:00
Eitan Adler
7a22215c53 Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this
shifts into the sign bit.  Instead use (1U << 31) which gets the
expected result.

This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.

A similar change was made in OpenBSD.

Discussed with:	-arch, rdivacky
Reviewed by:	cperciva
2013-11-30 22:17:27 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Ed Maste
3d271aaab0 x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)
Debuggers may need to change PSL_RF.  Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could be
modified via sigreturn, so this change should not provide any new ability
to userspace.

For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

Reviewed by:	jhb, kib
Sponsored by:	DARPA, AFRL
2013-11-14 15:37:20 +00:00
Dimitry Andric
f7f5706f28 Fix gcc warning about an uninitialized bool in sys/x86/iommu/intel_drv.c.
Reviewed by:	kib
2013-11-09 22:05:29 +00:00
Dimitry Andric
d291234c33 Fix gcc warning about an empty device_printf() format string in
sys/x86/iommu/intel_fault.c.

Reviewed by:	kib
2013-11-09 22:00:44 +00:00
Dimitry Andric
d4e70c8074 Fix (erroneous) gcc warnings about usage of uninitialized variables in
sys/x86/iommu/intel_idpgtbl.c.

Reviewed by:	kib
2013-11-09 20:36:52 +00:00
Dimitry Andric
335521936f Fix gcc warnings about casting away const in sys/x86/iommu/intel_drv.c.
Reviewed by:	kib
2013-11-09 20:09:02 +00:00
Dimitry Andric
e7d8b7e43f Initialize variable in sys/x86/iommu/busdma_dmar.c, to avoid possible
uninitialized use.

Reviewed by:	kib
2013-11-08 17:27:22 +00:00
Konstantin Belousov
6f8a44a5dd Add bits for the AMD features from CPUID function 0x80000001 ECX,
described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf.

Partially based on the patch submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-11-08 16:32:30 +00:00
Sean Bruno
ded71d6788 Fix powerd/states on AMD cpus. Resolves issues with system reporting:
hwpstate0: set freq failed, err 6

Tested on FX-8150 and others.

PR:		167018
Submitted by:	avg
MFC after:	2 weeks
2013-11-06 23:29:25 +00:00
Konstantin Belousov
68eeb96ab5 Add support for queued invalidation.
Right now, the semaphore write is scheduled after each batch, which is
not optimal and must be tuned.

Discussed with:	alc
Tested by:	pho
MFC after:	1 month
2013-11-01 17:38:52 +00:00
Konstantin Belousov
3100f7dfb7 Return BUS_PROBE_NOWILDCARD from the DMAR probe method.
Confirmed by:	nwhitehorn
MFC after:	1 month
2013-11-01 17:16:44 +00:00
Mark Johnston
57170f49f2 Remove references to an unused fasttrap probe hook, and remove the
corresponding x86 trap type. Userland DTrace probes are currently handled
by the other fasttrap hooks (dtrace_pid_probe_ptr and
dtrace_return_probe_ptr).

Discussed with:	rpaulo
2013-10-31 02:35:00 +00:00
Konstantin Belousov
4ad0991f6a Remove redundand declaration, fixing the build with gcc.
Reported and tested by:	Michael Butler <imb@protected-networks.net>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-29 07:25:54 +00:00
Konstantin Belousov
06d513424a Remove redundand assignment to error variable and check for its value [1].
Do CTR logging in the case of error as well.

Noted by:	rdivacky [1]
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-28 19:30:09 +00:00
Konstantin Belousov
86be9f0dd5 Import the driver for VT-d DMAR hardware, as specified in the revision
1.3 of Intelб╝ Virtualization Technology for Directed I/O Architecture
Specification.  The Extended Context and PASIDs from the rev. 2.2 are
not supported, but I am not aware of any released hardware which
implements them.  Code does not use queued invalidation, see comments
for the reason, and does not provide interrupt remapping services.

Code implements the management of the guest address space per domain
and allows to establish and tear down arbitrary mappings, but not
partial unmapping.  The superpages are created as needed, but not
promoted.  Faults are recorded, fault records could be obtained
programmatically, and printed on the console.

Implement the busdma(9) using DMARs.  This busdma backend avoids
bouncing and provides security against misbehaving hardware and driver
bad programming, preventing leaks and corruption of the memory by wild
DMA accesses.

By default, the implementation is compiled into amd64 GENERIC kernel
but disabled; to enable, set hw.dmar.enable=1 loader tunable.  Code is
written to work on i386, but testing there was low priority, and
driver is not enabled in GENERIC.  Even with the DMAR turned on,
individual devices could be directed to use the bounce busdma with the
hw.busdma.pci<domain>:<bus>:<device>:<function>.bounce=1 tunable.  If
DMARs are capable of the pass-through translations, it is used,
otherwise, an identity-mapping page table is constructed.

The driver was tested on Xeon 5400/5500 chipset legacy machine,
Haswell desktop and E5 SandyBridge dual-socket boxes, with ahci(4),
ata(4), bce(4), ehci(4), mfi(4), uhci(4), xhci(4) devices.  It also
works with em(4) and igb(4), but there some fixes are needed for
drivers, which are not committed yet.  Intel GPUs do not work with
DMAR (yet).

Many thanks to John Baldwin, who explained me the newbus integration;
Peter Holm, who did all testing and helped me to discover and
understand several incredible bugs; and to Jim Harris for the access
to the EDS and BWG and for listening when I have to explain my
findings to somebody.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-28 13:33:29 +00:00
Konstantin Belousov
3f9d41ed10 Add a virtual table for the busdma methods on x86, to allow different
busdma implementations to coexist.  Copy busdma_machdep.c to
busdma_bounce.c, which is still a single implementation of the busdma
interface on x86 for now.  The busdma_machdep.c only contains common
and dispatch code.

Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-27 22:05:10 +00:00
Konstantin Belousov
80938e75f0 Add bus_dmamap_load_ma() function to load map with the array of
vm_pages.  Provide trivial implementation which forwards the load to
_bus_dmamap_load_phys() page by page.  Right now all architectures use
bus_dmamap_load_ma_triv().

Tested by:	pho (as part of the functional patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-27 21:39:16 +00:00
Konstantin Belousov
5596528930 Add ddb 'show ioapic' and 'show all ioapics' commands.
Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-10-24 20:13:40 +00:00
Poul-Henning Kamp
eafc73a8a9 Add a va_copy() to our fall-back stdarg implementation for use with lint(1)
Approved by:	re@ (glebius@)
2013-10-07 10:01:23 +00:00
Justin T. Gibbs
5fdd34ee20 Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id
field.  Perform vcpu enumeration for Xen PV and HVM environments
and convert all Xen drivers to use vcpu_id instead of a hard coded
assumption of the mapping algorithm (acpi or apic ID) in use.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)

amd64/include/pcpu.h:
i386/include/pcpu.h:
	Add vcpu_id to the amd64 and i386 pcpu structures.

dev/xen/timer/timer.c
x86/xen/xen_intr.c
	Use new vcpu_id instead of assuming acpi_id == vcpu_id.

i386/xen/mp_machdep.c:
i386/xen/mptable.c
x86/xen/hvm.c:
	Perform Xen HVM and Xen full PV vcpu_id mapping.

x86/xen/hvm.c:
x86/acpica/madt.c
	Change SYSINIT ordering of acpi CPU enumeration so that it
	is guaranteed to be available at the time of Xen HVM vcpu
	id mapping.
2013-10-05 23:11:01 +00:00
Justin T. Gibbs
bf57e9793a Correct panic caused by attaching both Xen PV and HyperV virtualization
aware drivers on Xen hypervisors that advertise support for some
HyperV features.

x86/xen/hvm.c:
	When running in HVM mode on a Xen hypervisor, set vm_guest
	to VM_GUEST_XEN so other virtualization aware components in
	the FreeBSD kernel can detect this mode is active.

dev/hyperv/vmbus/hv_hv.c:
	Use vm_guest to ignore Xen's HyperV emulation when Xen is
	detected and Xen PV drivers are active.

Reported by:	Shanker Balan
Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (Xen blanket)
2013-10-05 19:51:09 +00:00
Justin T. Gibbs
940837549b sys/x86/xen/hvm.c:
Set cpu_ops correctly for Xen hypervisors lacking the
	vector callback feature.

	Set preliminary Xen cpu_ops settings during early HVM
	initialization.  The old location raced with the startup
	of APs.

Submitted by:	Roger Pau Monné
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
2013-09-27 15:17:28 +00:00
Justin T. Gibbs
566a5f5020 Merge Xen PVHVM support into the GENERIC kernel config for both
amd64 and i386.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
MFC after:	2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
	- Introduce two new CPU hooks for initialization and resume
	  purposes. This allows us to get rid of the XENHVM ifdefs in
	  mp_machdep, and also sets some hooks into common code that can be
	  used by other hypervisor implementations.

sys/amd64/conf/XENHVM:
sys/i386/conf/XENHVM:
	- Remove these configs now that GENERIC has builtin support for Xen
	  HVM.

sys/kern/subr_smp.c:
	- Make sure there are no pending IPIs when suspending a system.

sys/x86/xen/hvm.c:
	- Add cpu init and resume vectors that are called from mp_machdep
	  using the new hooks.
	- Only clear the vcpu_info mapping data on resume.  It is already
	  clear for the BSP on a cold boot and is set correctly as APs
	  are started.
	- Gate xen_hvm_init_cpu only to systems running under Xen.

sys/x86/xen/xen_intr.c:
	 - Gate the setup of event channels only to systems running under Xen.
2013-09-20 22:59:22 +00:00
Justin T. Gibbs
428b7ca290 Add support for suspend/resume/migration operations when running as a
Xen PVHVM guest.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
MFC after:	2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	- Make sure that are no MMU related IPIs pending on migration.
	- Reset pending IPI_BITMAP on resume.
	- Init vcpu_info on resume.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
	- Add a "suspend_cancelled" parameter to pic_resume().  For the
	  Xen PIC, restoration of interrupt services differs between
	  the aborted suspend and normal resume cases, so we must provide
	  this information.

sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
	- Don't swap out "suspend safe" timers across a suspend/resume
	  cycle.  This includes the Xen PV and ACPI timers.

sys/dev/xen/control/control.c:
	- Perform proper suspend/resume process for PVHVM:
		- Suspend all APs before going into suspension, this allows us
		  to reset the vcpu_info on resume for each AP.
		- Reset shared info page and callback on resume.

sys/dev/xen/timer/timer.c:
	- Implement suspend/resume support for the PV timer. Since FreeBSD
	  doesn't perform a per-cpu resume of the timer, we need to call
	  smp_rendezvous in order to correctly resume the timer on each CPU.

sys/dev/xen/xenpci/xenpci.c:
	- Don't reset the PCI interrupt on each suspend/resume.

sys/kern/subr_smp.c:
	- When suspending a PVHVM domain make sure there are no MMU IPIs
	  in-flight, or we will get a lockup on resume due to the fact that
	  pending event channels are not carried over on migration.
	- Implement a generic version of restart_cpus that can be used by
	  suspended and stopped cpus.

sys/x86/xen/hvm.c:
	- Implement resume support for the hypercall page and shared info.
	- Clear vcpu_info so it can be reset by APs when resuming from
	  suspension.

sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
	- Support UP kernel configurations.

sys/x86/xen/xen_intr.c:
	- Properly rebind per-cpus VIRQs and IPIs on resume.
2013-09-20 05:06:03 +00:00
Justin T. Gibbs
e44af46e4c Implement PV IPIs for PVHVM guests and further converge PV and HVM
IPI implmementations.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Submitted by: gibbs (misc cleanup, table driven config)
Reviewed by:  gibbs
MFC after: 2 weeks

sys/amd64/include/cpufunc.h:
sys/amd64/amd64/pmap.c:
	Move invltlb_globpcid() into cpufunc.h so that it can be
	used by the Xen HVM version of tlb shootdown IPI handlers.

sys/x86/xen/xen_intr.c:
sys/xen/xen_intr.h:
	Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(),
	and remove the ipi vector parameter.  This api allocates
	an event channel port that can be used for ipi services,
	but knows nothing of the actual ipi for which that port
	will be used.  Removing the unused argument and cleaning
	up the comments surrounding its declaration helps clarify
	its actual role.

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
	Implement a generic framework for amd64 and i386 that allows
	the implementation of certain CPU management functions to
	be selected at runtime.  Currently this is only used for
	the ipi send function, which we optimize for Xen when running
	on a Xen hypervisor, but can easily be expanded to support
	more operations.

sys/x86/xen/hvm.c:
	Implement Xen PV IPI handlers and operations, replacing native
	send IPI.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
sys/i386/include/smp.h:
	Remove NR_VIRQS and NR_IPIS from FreeBSD headers.  NR_VIRQS
	is defined already for us in the xen interface files.
	NR_IPIS is only needed in one file per Xen platform and is
	easily inferred by the IPI vector table that is defined in
	those files.

sys/i386/xen/mp_machdep.c:
	Restructure to more closely match the HVM implementation by
	performing table driven IPI setup.
2013-09-06 22:17:02 +00:00
Justin T. Gibbs
f5f4f7f201 Conform to style(9). No functional changes.
sys/x86/xen/hvm.c:
	Do not rely on implicit conversion to boolean in expressions
	(e.g. use "if (rc != 0)" instead of "if (rc)".

	Line continuations for functions are indented an additional
	4 spaces.

	Insert an empty line if the function has no local variables.

	Prefer separate initializtion statements to initialzing
	local variables in their declaration.

	Braces that are not necessary may be left out.

MFC after:	2 weeks
2013-09-01 23:49:36 +00:00
Justin T. Gibbs
76acc41fb7 Implement vector callback for PVHVM and unify event channel implementations
Re-structure Xen HVM support so that:
	- Xen is detected and hypercalls can be performed very
	  early in system startup.
	- Xen interrupt services are implemented using FreeBSD's native
	  interrupt delivery infrastructure.
	- the Xen interrupt service implementation is shared between PV
	  and HVM guests.
	- Xen interrupt handlers can optionally use a filter handler
	  in order to avoid the overhead of dispatch to an interrupt
	  thread.
	- interrupt load can be distributed among all available CPUs.
	- the overhead of accessing the emulated local and I/O apics
	  on HVM is removed for event channel port events.
	- a similar optimization can eventually, and fairly easily,
	  be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
	Reserve IDT vector 0x93 for the Xen event channel upcall
	interrupt handler.  On Hypervisors that support the direct
	vector callback feature, we can request that this vector be
	called directly by an injected HVM interrupt event, instead
	of a simulated PCI interrupt on the Xen platform PCI device.
	This avoids all of the overhead of dealing with the emulated
	I/O APIC and local APIC.  It also means that the Hypervisor
	can inject these events on any CPU, allowing upcalls for
	different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
	Increase the FreeBSD IRQ vector table to include space
	for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
	Remove Xen HVM per-cpu variable data.  These fields are now
	allocated via the dynamic per-cpu scheme.  See xen_intr.c
	for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
	Prefer FreeBSD primatives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
	Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
	Remove constants, macros, and functions unused in FreeBSD's Xen
	support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
	Introduce new functions xen_domain(), xen_pv_domain(), and
	xen_hvm_domain().  These are used in favor of #ifdefs so that
	FreeBSD can dynamically detect and adapt to the presence of
	a hypervisor.  The goal is to have an HVM optimized GENERIC,
	but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
	Refactor magic ioport, Hypercall table and Hypervisor shared
	information page setup, and move it to a dedicated HVM support
	module.

	HVM mode initialization is now triggered during the
	SI_SUB_HYPERVISOR phase of system startup.  This currently
	occurs just after the kernel VM is fully setup which is
	just enough infrastructure to allow the hypercall table
	and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
	Add definitions and a method for configuring Hypervisor event
	delievery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
	Adjust kernel build to reflect the refactoring of early
	Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
	Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
	Since blkback defers all event handling to a taskqueue,
	convert this task queue to a "fast" taskqueue, and schedule
	it via an interrupt filter.  This avoids an unnecessary
	ithread context switch.

sys/xen/xenstore/xenstore.c:
	The xenstore driver is MPSAFE.  Indicate as much when
	registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
	Remove unused event channel APIs.

sys/xen/evtchn.h:
	Remove all kernel Xen interrupt service API definitions
	from this file.  It is now only used for structure and
	ioctl definitions related to the event channel userland
	device driver.

	Update the definitions in this file to match those from
	NetBSD.  Implementing this interface will be necessary for
	Dom0 support.

sys/xen/evtchn/evtchnvar.h:
	Add a header file for implemenation internal APIs related
	to managing event channels event delivery.  This is used
	to allow, for example, the event channel userland device
	driver to access low-level routines that typical kernel
	consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
	Standardize on the evtchn_port_t type for referring to
	an event channel port id.  In order to prevent low-level
	event channel APIs from leaking to kernel consumers who
	should not have access to this data, the type is defined
	twice: Once in the Xen provided event_channel.h, and again
	in xen/xen_intr.h.  The double declaration is protected by
	__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
	twice within a given compilation unit.

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
	New implementation of Xen interrupt services.  This is
	similar in many respects to the i386 PV implementation with
	the exception that events for bound to event channel ports
	(i.e. not IPI, virtual IRQ, or physical IRQ) are further
	optimized to avoid mask/unmask operations that aren't
	necessary for these edge triggered events.

	Stubs exist for supporting physical IRQ binding, but will
	need additional work before this implementation can be
	fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
	Add support for placing vcpu_info into an arbritary memory
	page instead of using HYPERVISOR_shared_info->vcpu_info.
	This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
	Add support for new event channle implementation.
2013-08-29 19:52:18 +00:00
Brooks Davis
cb261f4315 Call set_i8254_freq with MODE_STOP (0) rather than a magic number of 0. 2013-08-15 17:21:06 +00:00
Jung-uk Kim
38da30b419 Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these
two files were functionally identical.
2013-08-13 22:05:10 +00:00
Konstantin Belousov
449c2e92c9 Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain.  The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass.  This is not optimal, since it
could cause excessive page deactivation and freeing.  The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues.  Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
Jeff Roberson
5df87b21d3 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
Andriy Gapon
a69e8d609e x86: detect mwait capabilities and extensions, when present
Reviewed by:	kib (earlier amd64-only version)
MFC after:	2 weeks
2013-07-28 17:54:42 +00:00
Rui Paulo
51091a0763 Fix a KTR_BUSDMA format string. 2013-06-18 06:55:58 +00:00
Marcel Moolenaar
cb34ed4434 Add basic support for FDT to i386 & amd64. This change includes:
1.  Common headers for fdt.h and ofw_machdep.h under x86/include
    with indirections under i386/include and amd64/include.
2.  New modinfo for loader provided FDT blob.
3.  Common x86_init_fdt() called from hammer_time() on amd64 and
    init386() on i386.
4.  Split-off FDT specific low-level console functions from FDT
    bus methods for the uart(4) driver. The low-level console
    logic has been moved to uart_cpu_fdt.c and is used for arm,
    mips & powerpc only. The FDT bus methods are shared across
    all architectures.
5.  Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
    fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from:	Juniper Networks, Inc.
2013-05-21 03:05:49 +00:00
Attilio Rao
7e226537c7 o Add accessor functions to add and remove pages from a specific
freelist.
o Split the pool of free pages queues really by domain and not rely on
  definition of VM_RAW_NFREELIST.
o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific
  function that is called when calculating the allocation domain.
  The RR counter is kept, currently, per-thread.
  In the future it is expected that such function evolves in a real
  policy decision referee, based on specific informations retrieved by
  per-thread and per-vm_object attributes.
o Add the concept of "probed domains" under the form of vm_ndomains.
  It is responsibility for every architecture willing to support multiple
  memory domains to correctly probe vm_ndomains along with mem_affinity
  segments attributes.  Those two values are supposed to remain always
  consistent.
  Please also note that vm_ndomains and td_dom_rr_idx are both int
  because segments already store domains as int.  Ideally u_int would
  have much more sense. Probabilly this should be cleaned up in the
  future.
o Apply RR domain selection also to vm_phys_zero_pages_idle().

Sponsored by:	EMC / Isilon storage division
Partly obtained from:	jeff
Reviewed by:	alc
Tested by:	jeff
2013-05-13 15:40:51 +00:00
Eitan Adler
a164074fc4 Fix several typos
PR:		kern/176054
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
MFC after:	3 days
2013-05-12 16:43:26 +00:00
Hiren Panchasara
46b29ff94a Adding a detach method to p4tcc driver.
PR:	118739
Submitted by:	Dan Lukes <dan@obluda.cz> (earlier version)
Reviewed by:	jhb
Approved by:	sbruno (mentor)
MFC after:	1 week
2013-05-10 22:43:27 +00:00
Attilio Rao
ab13ed1e45 Revert r250339 as apparently it is more clutter than help.
Sponsored by:	EMC / Isilon storage division
Requested by:	jhb
2013-05-08 21:06:47 +00:00
Attilio Rao
16e073e57a Add functions to do ACPI System Locality Information Table parsing
and printing at boot.
For reference on table informations and purposes please review ACPI specs.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	jhb (earlier version)
2013-05-07 22:49:56 +00:00
Attilio Rao
941646f5ec Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
Alexander Motin
b2c63698d4 Introduce kern.timecounter.smp_tsc_adjust tunable (disabled by default) and
respective functionality, allowing to synchronize TSC on APs to match BSP's
during boot.  It may be unsafe in general case due to theoretical chance of
later drift if CPUs are using different clock rate or source, but it allows
to use TSC in some cases when difference caused by some initialization bug,
while TSCs are known to increment synchronously.

Reviewed by:	jimharris, kib
MFC after:	1 month
2013-04-18 17:07:04 +00:00
Rui Paulo
5dfae12246 Move the previously added CPUID7 macros to CPUID_STDEXT. 2013-04-18 07:09:27 +00:00
Rui Paulo
ba5f77bf16 Add the most current CPUID7_* definitions. 2013-04-18 01:30:08 +00:00
Neel Natu
150369ab7c Make the code to check if VMX is enabled more readable by using macros
instead of magic numbers.

Discussed with:	Chris Torek
2013-04-11 04:29:45 +00:00
Neel Natu
1472b87f2f Unsynchronized TSCs on the host require special handling in bhyve:
- use clock_gettime(2) as the time base for the emulated ACPI timer instead
  of directly using rdtsc().

- don't advertise the invariant TSC capability to the guest to discourage it
  from using the TSC as its time base.

Discussed with:	jhb@ (about making 'smp_tsc' a global)
Reported by:	Dan Mack on freebsd-virtualization@
Obtained from:	NetApp
2013-04-10 05:59:07 +00:00
Konstantin Belousov
6460981c3a Record the correct error in the trace.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2013-04-01 09:57:46 +00:00
Alexander Motin
fdc5dd2d2f MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
2013-02-28 13:46:03 +00:00
Warner Losh
22e61bc2c1 Use critical_enter/critical_exit around the time sensitive part of
this code to depessimize the worst case we've lived with silently and
uneventfully for the past 12 years. Add a comment about a refinement
for those needing more assurance of accuracy.

Fix ddb's show rtc command deadlock potential when debugging rtc code
by not taking the lock if we're in the debugger. If you need a thumb
to count the number of people that have encountered this, I'd be
surprised.

Submitted by:	bde
2013-02-21 15:35:48 +00:00
Warner Losh
85e51e4918 Correct comment about use of pmtimer, and the real reason it isn't
used or desirable for amd64.
2013-02-21 06:38:24 +00:00
Warner Losh
7fe826349c Fix broken usage of splhigh() by removing it. 2013-02-21 00:40:08 +00:00
Konstantin Belousov
31a53cd036 Convert machine/elf.h, machine/frame.h, machine/sigframe.h,
machine/signal.h and machine/ucontext.h into common x86 includes,
copying from amd64 and merging with i386.

Kernel-only compat definitions are kept in the i386/include/sigframe.h
and i386/include/signal.h, to reduce amd64 kernel namespace pollution.
The amd64 compat uses its own definitions so far.

The _MACHINE_ELF_WANT_32BIT definition is to allow the
sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions
on the amd64 compile host.  The same hack could be usefully abused by
other code too.
2013-02-20 17:39:52 +00:00
Davide Italiano
ce4642ecd5 Fixup r246916 in case gcc is used to build.
Reported by:	attilio, simon
2013-02-19 16:43:48 +00:00
Alexander Motin
a937c5078f MFcalloutng:
Microoptimize i8254 one-shot operation mode (disabled by default to allow
timecounter functionality) by not writing to mode and MSB registers when
it is not required.  This saves several microseconds of CPU time per call,
reducing minimal measured interrupts interval to 19.5us.
2013-02-17 18:42:30 +00:00
John Baldwin
174b5f3850 Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel
config file.

Requested by:	phk (ages ago)
MFC after:	1 month
2013-02-14 19:38:04 +00:00
Konstantin Belousov
dd0b4fb6d5 Reform the busdma API so that new types may be added without modifying
every architecture's busdma_machdep.c.  It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code.  The MD busdma is then given a chance to do any final processing
in the complete() callback.

The cam changes unify the bus_dmamap_load* handling in cam drivers.

The arm and mips implementations are updated to track virtual
addresses for sync().  Previously this was done in a type specific
way.  Now it is done in a generic way by recording the list of
virtuals in the map.

Submitted by:	jeff (sponsored by EMC/Isilon)
Reviewed by:	kan (previous version), scottl,
	mjacob (isp(4), no objections for target mode changes)
Discussed with:	     ian (arm changes)
Tested by:	marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
	amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
2013-02-12 16:57:20 +00:00
Andriy Gapon
548b201607 x86 suspend/resume: suspend pics and pseudo-pics in reverse order
- change 'pics' from STAILQ to TAILQ
- ensure that Local APIC is always first in 'pics'

Reviewed by:	jhb
Tested by:	Sergey V. Dyatko <sergey.dyatko@gmail.com>,
		KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
MFC after:	12 days
2013-02-02 12:02:42 +00:00
Konstantin Belousov
e7f1427dd2 The change to reduce default smp_tsc_shift caused tsc shift to become
zero on slower machines, which make the fenced get_timecount methods
not used despite needed.  Remove the (shift > 0) condition when
selecting the get_timecount() implementation.

Rename smp_tsc_shift to tsc_shift, and apply it for the UP case too.
Allow shift to reach value of 31 instead of 30, as it was previously
(should be nop).

Reorganize the tc quality calculation to remove the conditionally
compiled block.  Rename test_smp_tsc() to test_tsc() and provide
separate versions for SMP and UP builds.  The check for virtialized
hardware is more natural to perform in the smp version of the
test_tsc(), since it is only done for smp case.

Noted and reviewed by:	bde (previous version)
MFC after:	12 days
2013-02-01 16:48:55 +00:00
Konstantin Belousov
82c3d173cc Reduce default shift used to calculate the max frequency for the TSC
timecounter to 1, and correspondingly increase the precision of the
gettimeofday(2) and related functions in the default configuration.

The motivation for the TSC-low timecounter, as described in the
r222866, seems to provide a workaround for the non-serializing
behaviour of the RDTSC on some Intel hardware.  Tests demonstrate that
even with the pre-shift of 8, the cross-core non-monotonicity of the
RDTSC is still observed reliably, e.g. on the Nehalems.  The r238755
and r238973 implemented the proper fix for the issue.

The pre-shift of 1 is applied to keep TSC not overflowing for the
frequency of hardclock down to 2 sec/intr.  The pre-shift is made a
tunable to allow the easy debugging of the issues users could see with
the shift being too low.

Reviewed by:	bde
MFC after:	2 weeks
2013-01-30 12:43:10 +00:00
John Baldwin
f876ffeae3 Don't attempt to use clflush on the local APIC register window. Various
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7).  The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.

MFC after:	2 weeks
2013-01-17 21:32:25 +00:00
Neel Natu
bf70b87555 Add macros required to enable VMX operation on Intel processors.
Obtained from:	NetApp
2013-01-05 04:20:14 +00:00
Jim Harris
7b332f2020 Add bus_space_read_8 and bus_space_write_8 for amd64.
Rather than trying to KASSERT for callers that invoke this on
IO tags, either do nothing (for write_8) or return ~0 (for read_8).
Using KASSERT here just makes bus.h too messy from both
polluting bus.h with systm.h (for any number of drivers that include
bus.h without first including systm.h) or ports that use bus.h
directly (i.e. libpciaccess) as reported by zeising@.

Also don't try to implement all of the other bus_space functions for
8 byte access since realistically only these two are needed for some
devices that expose 64-bit memory-mapped registers.

Put the amd64-specific functions here rather than sys/amd64/include/bus.h
so that we can keep this header unified for x86, as requested by mdf@
and tijl@.

Submitted by:	Carl Delsey <carl.r.delsey@intel.com>
MFC after:	3 days
2012-12-13 21:40:11 +00:00
Jim Harris
f2fcc434ee Revert r243960 based on feedback regarding keeping x86 headers unified
(mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@).

Alternate implementation will be made in a separate commit.
2012-12-13 21:27:20 +00:00
Jim Harris
71a30c4436 Add amd64 implementations for 8-byte bus_space routines.
Submitted by:	Carl Delsey <carl.r.delsey@intel.com>
Discussed with:	jhb, rwatson
Reviewed by:	jimharris
MFC after:	1 week
2012-12-06 22:33:31 +00:00
Andriy Gapon
ff08349df5 ioapic_program_intpin: program high bits before low bits
Programming the low bits has a side-effect if unmasking the pin if it is
not disabled.  So if an interrupt was pending then it would be delivered
with the correct new vector but to the incorrect old LAPIC.

This fix could be made clearer by preserving the mask bit while
programming the low bits and then explicitly resetting the mask bit
after all the programming is done.

Probability to trip over the fixed bug could be increased by bootverbose
because printing of the interrupt information in ioapic_assign_cpu
lengthened the time window during which an interrupt could arrive while
a pin is masked.

Reported by:	Andreas Longwitz <longwitz@incore.de>
Tested by:	Andreas Longwitz <longwitz@incore.de>
MFC after:	12 days
2012-12-01 18:16:14 +00:00
Konstantin Belousov
2773649d2f Provide the reading and display of the Standard Extended Features,
introduced with the IvyBridge CPUs.  Provide the definitions for new
bits in CR3 and CR4 registers.

Tested by:	avg, Michael Moll <kvedulv@kvedulv.de>
MFC after:	2 weeks
2012-11-01 15:14:37 +00:00
Eitan Adler
a8de37b024 This isn't functionally identical. In some cases a hint to disable
unit 0 would in fact disable all units.

This reverts r241856

Approved by: cperciva (implicit)
2012-10-22 13:06:09 +00:00
Eitan Adler
76b7512247 Now that device disabling is generic, remove extraneous code from the
device drivers that used to provide this feature.

Reviewed by:	des
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:41:14 +00:00
Attilio Rao
3a4730256a Add an unified macro to deny ability from the compiler to reorder
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.

Reviewed by:	bde, kib
Discussed with:	dim, theraven
MFC after:	2 weeks
2012-10-09 14:32:30 +00:00
Attilio Rao
af2bdacafb Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static intializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by:	marius
2012-10-09 12:22:43 +00:00
Kevin Lo
954c5baed9 Add missing header needed by free(9).
Spotted by:	David Wolfskill <david at catwhisker dot org>
2012-09-30 15:42:20 +00:00
Kevin Lo
b5db12bfb5 Free result of device_get_children(9). 2012-09-30 09:21:10 +00:00
John Baldwin
960b5a7080 - Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific
bits under #ifdef _KERNEL but leave definitions for various structures
  defined by standards ($PIR table, SMAP entries, etc.) available to
  userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
  and smbios(4) in <machine/pc/bios.h> and make them available to
  userland.

MFC after:	2 weeks
2012-09-28 11:59:32 +00:00