XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions).
Obtained from: Intel's Instruction Set Extensions Programming Reference
(March 2014)
The NetBSD Foundation states "Third parties are encouraged to change the
license on any files which have a 4-clause license contributed to the
NetBSD Foundation to a 2-clause license."
This change removes clauses 3 and 4 from copyright / license blocks that
list The NetBSD Foundation as the only copyright holder.
Sponsored by: The FreeBSD Foundation
AP startup on PVH follows the PV method, so we need to add a hook in
order to diverge from bare metal.
Approved by: gibbs
Sponsored by: Citrix Systems R&D
amd64/amd64/machdep.c:
- Add hook for start_all_aps on native (using native_start_all_aps
defined in mp_machdep).
amd64/amd64/mp_machdep.c:
- Make some variables global because they will also be used by the
Xen PVH AP startup code.
- Use the start_all_aps hook to start APs.
- Rename start_all_aps to native_start_all_aps.
amd64/include/smp.h:
- Add declaration for native_start_all_aps.
x86/include/init.h:
- Declare start_all_aps hook in init_ops.
x86/xen/pv.c:
- Pick external declarations from mp_machdep.
- Introduce Xen PV code to start APs on PVH.
- Set start_all_aps init hook to use the Xen PVH implementation.
This hook will only be implemented for bare metal, Xen doesn't require
any bootstrap code since APs are started in long mode with paging
enabled.
Approved by: gibbs
Sponsored by: Citrix Systems R&D
amd64/amd64/machdep.c:
- Set mp_bootaddress hook for bare metal.
x86/include/init.h:
- Define mp_bootaddress in init_ops.
e820 memory map is fetched using a hypercall under Xen PVH, so add a
hook to init_ops in oder to diverge from bare metal and implement a
Xen variant.
Approved by: gibbs
Sponsored by: Citrix Systems R&D
x86/include/init.h:
- Add a parse_memmap hook to init_ops, that will be called to fetch
and parse the memory map.
amd64/amd64/machdep.c:
- Decouple the fetch and the parse of the memmap, so the parse
function can be shared with Xen code.
- Move code around in order to implement the parse_memmap hook.
amd64/include/pc/bios.h:
- Declare bios_add_smap_entries (implemented in machdep.c).
x86/xen/pv.c:
- Implement fetching of e820 memmap when running as a PVH guest by
using the XENMEM_memory_map hypercall.
When running as a PVH guest, there's no emulated i8254, so we need to
use the Xen PV timer as the early source for DELAY. This change allows
for different implementations of the early DELAY function and
implements a Xen variant for it.
Approved by: gibbs
Sponsored by: Citrix Systems R&D
dev/xen/timer/timer.c:
dev/xen/timer/timer.h:
- Implement Xen early delay functions using the PV timer and declare
them.
x86/include/init.h:
- Add hooks for early clock source initialization and early delay
functions.
i386/i386/machdep.c:
pc98/pc98/machdep.c:
amd64/amd64/machdep.c:
- Set early delay hooks to use the i8254 on bare metal.
- Use clock_init (that will in turn make use of init_ops) to
initialize the early clock source.
amd64/include/clock.h:
i386/include/clock.h:
- Declare i8254_delay and clock_init.
i386/xen/clock.c:
- Rename DELAY to i8254_delay.
x86/isa/clock.c:
- Introduce clock_init that will take care of initializing the early
clock by making use of the init_ops hooks.
- Move non ISA related delay functions to the newly introduced delay
file.
x86/x86/delay.c:
- Add moved delay related functions.
- Implement generic DELAY function that will use the init_ops hooks.
x86/xen/pv.c:
- Set PVH hooks for the early delay related functions in init_ops.
conf/files.amd64:
conf/files.i386:
conf/files.pc98:
- Add delay.c to the kernel build.
Add hooks to amd64 in order to have diverging implementations, since
on Xen PV the metadata is passed to the kernel in a different form.
Approbed by: gibbs
Sponsored by: Citrix Systems R&D
amd64/amd64/machdep.c:
- Define init_ops for native.
- Put native code inside of native_parse_preload_data hook.
- Call the parse_preload_data in order to fill the metadata info.
x86/include/init.h:
- Declare the init_ops struct.
x86/xen/pv.c:
- Declare xen_init_ops that contains the Xen PV implementation of
init_ops.
- Implement the parse_preload_data for Xen PVH, the info is fetched
from HYPERVISOR_start_info->cmd_line as provided by Xen.
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
PCI domain/segment. Host-PCI bridge drivers use this API to allocate
bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
full range of bus numbers from secbus to subbus from their parent bridge.
The drivers also always program their primary bus register. The bridge
drivers also support growing their bus range by extending the bus resource
and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.
Reviewed by: imp
MFC after: 1 month
obsolete. This involves the following pieces:
- Remove it entirely on PowerPC, where it is not used by MD code either
- Remove all references to machine/fdt.h in non-architecture-specific code
(aside from uart_cpu_fdt.c, shared by ARM and MIPS, and so is somewhat
non-arch-specific).
- Fix code relying on header pollution from machine/fdt.h includes
- Legacy fdtbus.c (still used on x86 FDT systems) now passes resource
requests to its parent (nexus). This allows x86 FDT devices to allocate
both memory and IO requests and removes the last notionally MI use of
fdtbus_bs_tag.
- On those architectures that retain a machine/fdt.h, unused bits like
FDT_MAP_IRQ and FDT_INTR_MAX have been removed.
all the structures. While here, move a helper struct only used in
the kernel parser out of this header since it is not part of the MP
specification itself.
Debuggers may need to change PSL_RF. Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could be
modified via sigreturn, so this change should not provide any new ability
to userspace.
For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html
Reviewed by: jhb, kib
Sponsored by: DARPA, AFRL
described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf.
Partially based on the patch submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
corresponding x86 trap type. Userland DTrace probes are currently handled
by the other fasttrap hooks (dtrace_pid_probe_ptr and
dtrace_return_probe_ptr).
Discussed with: rpaulo
busdma implementations to coexist. Copy busdma_machdep.c to
busdma_bounce.c, which is still a single implementation of the busdma
interface on x86 for now. The busdma_machdep.c only contains common
and dispatch code.
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Re-structure Xen HVM support so that:
- Xen is detected and hypercalls can be performed very
early in system startup.
- Xen interrupt services are implemented using FreeBSD's native
interrupt delivery infrastructure.
- the Xen interrupt service implementation is shared between PV
and HVM guests.
- Xen interrupt handlers can optionally use a filter handler
in order to avoid the overhead of dispatch to an interrupt
thread.
- interrupt load can be distributed among all available CPUs.
- the overhead of accessing the emulated local and I/O apics
on HVM is removed for event channel port events.
- a similar optimization can eventually, and fairly easily,
be used to optimize MSI.
Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:
Sponsored by: Spectra Logic Corporation
Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:
Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
Reserve IDT vector 0x93 for the Xen event channel upcall
interrupt handler. On Hypervisors that support the direct
vector callback feature, we can request that this vector be
called directly by an injected HVM interrupt event, instead
of a simulated PCI interrupt on the Xen platform PCI device.
This avoids all of the overhead of dealing with the emulated
I/O APIC and local APIC. It also means that the Hypervisor
can inject these events on any CPU, allowing upcalls for
different ports to be handled in parallel.
sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
Map Xen per-vcpu area during AP startup.
sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
Increase the FreeBSD IRQ vector table to include space
for event channel interrupt sources.
sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
Remove Xen HVM per-cpu variable data. These fields are now
allocated via the dynamic per-cpu scheme. See xen_intr.c
for details.
sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
Prefer FreeBSD primatives to Linux ones in Xen support code.
sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
Pull common Xen OS support functions/settings into xen/xen-os.h.
sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
Remove constants, macros, and functions unused in FreeBSD's Xen
support.
sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
Introduce new functions xen_domain(), xen_pv_domain(), and
xen_hvm_domain(). These are used in favor of #ifdefs so that
FreeBSD can dynamically detect and adapt to the presence of
a hypervisor. The goal is to have an HVM optimized GENERIC,
but more is necessary before this is possible.
sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
Refactor magic ioport, Hypercall table and Hypervisor shared
information page setup, and move it to a dedicated HVM support
module.
HVM mode initialization is now triggered during the
SI_SUB_HYPERVISOR phase of system startup. This currently
occurs just after the kernel VM is fully setup which is
just enough infrastructure to allow the hypercall table
and shared info page to be properly mapped.
sys/xen/hvm.h:
sys/x86/xen/hvm.c:
Add definitions and a method for configuring Hypervisor event
delievery via a direct vector callback.
sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:
sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
Adjust kernel build to reflect the refactoring of early
Xen startup code and Xen interrupt services.
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
Adjust drivers to use new xen_intr_*() API.
sys/dev/xen/blkback/blkback.c:
Since blkback defers all event handling to a taskqueue,
convert this task queue to a "fast" taskqueue, and schedule
it via an interrupt filter. This avoids an unnecessary
ithread context switch.
sys/xen/xenstore/xenstore.c:
The xenstore driver is MPSAFE. Indicate as much when
registering its interrupt handler.
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
Remove unused event channel APIs.
sys/xen/evtchn.h:
Remove all kernel Xen interrupt service API definitions
from this file. It is now only used for structure and
ioctl definitions related to the event channel userland
device driver.
Update the definitions in this file to match those from
NetBSD. Implementing this interface will be necessary for
Dom0 support.
sys/xen/evtchn/evtchnvar.h:
Add a header file for implemenation internal APIs related
to managing event channels event delivery. This is used
to allow, for example, the event channel userland device
driver to access low-level routines that typical kernel
consumers of event channel services should never access.
sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
Standardize on the evtchn_port_t type for referring to
an event channel port id. In order to prevent low-level
event channel APIs from leaking to kernel consumers who
should not have access to this data, the type is defined
twice: Once in the Xen provided event_channel.h, and again
in xen/xen_intr.h. The double declaration is protected by
__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
twice within a given compilation unit.
sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
New implementation of Xen interrupt services. This is
similar in many respects to the i386 PV implementation with
the exception that events for bound to event channel ports
(i.e. not IPI, virtual IRQ, or physical IRQ) are further
optimized to avoid mask/unmask operations that aren't
necessary for these edge triggered events.
Stubs exist for supporting physical IRQ binding, but will
need additional work before this implementation can be
fully shared between PV and HVM.
sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
Add support for placing vcpu_info into an arbritary memory
page instead of using HYPERVISOR_shared_info->vcpu_info.
This allows the creation of domains with more than 32 vcpus.
sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
Add support for new event channle implementation.
1. Common headers for fdt.h and ofw_machdep.h under x86/include
with indirections under i386/include and amd64/include.
2. New modinfo for loader provided FDT blob.
3. Common x86_init_fdt() called from hammer_time() on amd64 and
init386() on i386.
4. Split-off FDT specific low-level console functions from FDT
bus methods for the uart(4) driver. The low-level console
logic has been moved to uart_cpu_fdt.c and is used for arm,
mips & powerpc only. The FDT bus methods are shared across
all architectures.
5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
fdt_pic_table[] arrays. Both are empty right now.
FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.
Obtained from: Juniper Networks, Inc.
machine/signal.h and machine/ucontext.h into common x86 includes,
copying from amd64 and merging with i386.
Kernel-only compat definitions are kept in the i386/include/sigframe.h
and i386/include/signal.h, to reduce amd64 kernel namespace pollution.
The amd64 compat uses its own definitions so far.
The _MACHINE_ELF_WANT_32BIT definition is to allow the
sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions
on the amd64 compile host. The same hack could be usefully abused by
other code too.
Rather than trying to KASSERT for callers that invoke this on
IO tags, either do nothing (for write_8) or return ~0 (for read_8).
Using KASSERT here just makes bus.h too messy from both
polluting bus.h with systm.h (for any number of drivers that include
bus.h without first including systm.h) or ports that use bus.h
directly (i.e. libpciaccess) as reported by zeising@.
Also don't try to implement all of the other bus_space functions for
8 byte access since realistically only these two are needed for some
devices that expose 64-bit memory-mapped registers.
Put the amd64-specific functions here rather than sys/amd64/include/bus.h
so that we can keep this header unified for x86, as requested by mdf@
and tijl@.
Submitted by: Carl Delsey <carl.r.delsey@intel.com>
MFC after: 3 days
introduced with the IvyBridge CPUs. Provide the definitions for new
bits in CR3 and CR4 registers.
Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
MFC after: 2 weeks
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.
Reviewed by: bde, kib
Discussed with: dim, theraven
MFC after: 2 weeks
mostly meets the guidelines set by the Intel SDM:
1. We use XRSTOR and XSAVE from the same CPL using the same linear
address for the store area
2. Contrary to the recommendations, we cannot zero the FPU save area
for a new thread, since fork semantic requires the copy of the
previous state. This advice seemingly contradicts to the advice
from the item 6.
3. We do use XSAVEOPT in the context switch code only, and the area
for XSAVEOPT already always contains the data saved by XSAVE.
4. We do not modify the save area between XRSTOR, when the area is
loaded into FPU context, and XSAVE. We always spit the fpu context
into save area and start emulation when directly writing into FPU
context.
5. We do not use segmented addressing to access save area, or rather,
always address it using %ds basing.
6. XSAVEOPT can be only executed in the area which was previously
loaded with XRSTOR, since context switch code checks for FPU use by
outgoing thread before saving, and thread which stopped emulation
forcibly get context loaded with XRSTOR.
7. The PCB cannot be paged out while FPU emulation is turned off, since
stack of the executing thread is never swapped out.
The context switch code is patched to issue XSAVEOPT instead of XSAVE
if supported. This approach eliminates one conditional in the context
switch code, which would be needed otherwise.
For user-visible machine context to have proper data, fpugetregs()
checks for unsaved extension blocks and manually copies pristine FPU
state into them, according to the description provided by CPUID leaf
0xd.
MFC after: 1 month
This is required for ARM EABI. Section 7.1.1 of the Procedure Call for the
ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an
unsigned short with the former preferred.
Because of this requirement we need to move the definition of __wchar_t to
a machine dependent header. It also cleans up the macros defining the limits
of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine
dependent header then using them to define WCHAR_MIN and WCHAR_MAX
respectively.
Discussed with: bde
usermode, using shared page. The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.
The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.
The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows to turn off the mechanism.
Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.
Minimal stubs neccessary for non-x86 architectures to still compile
are provided.
Discussed with: bde
Reviewed by: jhb
Tested by: flo
MFC after: 1 month
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible. Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encoutered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).
MFC after: 2 weeks
that revision, the bswapXX_const() macros were renamed to bswapXX_gen().
Also, bswap64_gen() was implemented as two calls to bswap32(), and
similarly, bswap32_gen() as two calls to bswap16(). This mainly helps
our base gcc to produce more efficient assembly.
However, the arguments are not properly masked, which results in the
wrong value being calculated in some instances. For example,
bswap32(0x12345678) returns 0x7c563412, and bswap64(0x123456789abcdef0)
returns 0xfcdefc9a7c563412.
Fix this by appropriately masking the arguments to bswap16() in
bswap32_gen(), and to bswap32() in bswap64_gen(). This should also
silence warnings from clang.
Submitted by: jh
revision has two problems:
- It can produce worse code with both clang and gcc.
- It doesn't fix the actual issue introduced in r232721, which will be
fixed in the next commit.
Submitted by: bde, tijl and jh
Pointy hat to: dim