This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.
Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029
The file as is is the maze of #ifdef passages, all slightly different.
Divorcing i386 and amd64 version actually makes changing the code
easier, also no changes for i386 are planned.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31931
This is required since kernel text might be physically located
anywhere below 4G.
PR: 258432
Reported by: Taku YAMAMOTO <taku@tackymt.homeip.net>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31916
Do not use tab between type and variable name in local declarations.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31916
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
Clang 13 produces the following warning for hammer_time_xen():
sys/x86/xen/pv.c:183:19: error: the pointer incremented by -2147483648 refers past the last possible element for an array in 64-bit address space containing 256-bit (32-byte) elements (max possible 576460752303423488 elements) [-Werror,-Warray-bounds]
(vm_paddr_t)start_info->modlist_paddr + KERNBASE;
^ ~~~~~~~~
sys/xen/interface/arch-x86/hvm/start_info.h:131:5: note: array 'modlist_paddr' declared here
uint64_t modlist_paddr; /* Physical address of an array of */
^
This is because the expression first casts start_info->modlist_paddr to
struct hvm_modlist_entry * (via vmpaddr_t), and *then* adds KERNBASE,
which is then interpreted as KERNBASE * sizeof(struct
hvm_modlist_entry).
Instead, parenthesize the addition to get the intended result, and cast
it to struct hvm_modlist_entry * afterwards. Also remove the cast to
vmpaddr_t since it is not necessary.
Reviewed by: royger
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D31711
Add vDSO support for timekeeping devices that support the KVM/XEN
paravirtual clock API.
Also, expose, in the userspace-accessible '<machine/pvclock.h>',
definitions that will be needed by 'libc' to support
'VDSO_TH_ALGO_X86_PVCLK'.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
Add support for the KVM paravirtual clock device.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29733
Consolidate more hypervisor-agnostic functionality behind a new 'struct
pvclock' API.
This should also make it easier to subsequently add hypervisor-agnostic
vDSO timekeeping support.
Also, perform some clean-up:
- Remove 'pvclock_get_last_cycles()'; do not allow external access
to 'pvclock_last_systime' since this is not necessary.
- Consolidate/simplify wall and system time reading codepaths.
- Ensure correct ordering within wall and system time reading
codepaths via 'atomic(9)' and 'rdtsc_ordered()' rather than via
'rmb()'.
- Remove some extra newlines.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc'
appropriate for the invoking x86 variant.
Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed
by 'rdtsc_ordered()' for general use.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31416
At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().
Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D31472
MFC after: 2 weeks
Sanitizer instrumentation of course cannot automatically update shadow
state when devices write to host memory. KMSAN thus hooks into busdma,
both to update shadow state after a device write, and to verify that the
kernel does not publish uninitalized bytes to devices.
To implement this, when KMSAN is configured, each dmamap embeds a memory
descriptor describing the region currently loaded into the map.
bus_dmamap_sync() uses the operation flags to determine whether to
validate the loaded region or to mark it as initialized in the shadow
map.
Note that in cases where the amount of data written is less than the
buffer size, the entire buffer is marked initialized even when it is
not. For example, if a NIC writes a 128B packet into a 2KB buffer, the
entire buffer will be marked initialized, but subsequent accesses past
the first 128 bytes are likely caused by bugs.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31338
Use this flag to indicate that busdma should allocate a map structure
even no bouncing is required to satisfy the tag's constraints. This
will be used for KMSAN.
Also fix a memory leak that can occur if the kernel fails to allocate
bounce pages in bounce_bus_dmamap_create().
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31338
Interrupt and exception handlers must call kmsan_intr_enter() prior to
calling any C code. This is because the KMSAN runtime maintains some
TLS in order to track initialization state of function parameters and
return values across function calls. Then, to ensure that this state is
kept consistent in the face of asynchronous kernel-mode excpeptions, the
runtime uses a stack of TLS blocks, and kmsan_intr_enter() and
kmsan_intr_leave() push and pop that stack, respectively.
Use these functions in amd64 interrupt and exception handlers. Note
that handlers for user->kernel transitions need not be annotated.
Also ensure that trap frames pushed by the CPU and by handlers are
marked as initialized before they are used.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31467
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
KASAN and KCSAN implement interceptors for various primitive operations
that are not instrumented by the compiler. KMSAN requires them as well.
Rather than adding new cases for each sanitizer which requires
interceptors, implement the following protocol:
- When interceptor definitions are required, define
SAN_NEEDS_INTERCEPTORS and SANITIZER_INTERCEPTOR_PREFIX.
- In headers that declare functions which need to be intercepted by a
sanitizer runtime, use SANITIZER_INTERCEPTOR_PREFIX to provide
declarations.
- When SAN_RUNTIME is defined, do not redefine the names of intercepted
functions. This is typically the case in files which implement
sanitizer runtimes but is also needed in, for example, files which
define ifunc selectors for intercepted operations.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
There is no reason now why do we need to allocate trampoline page very
early in the boot process. The only requirement for the page is that
it is below 1M to be usable by the real mode during init. This can be
handled by vm_alloc_contig() when we do the startup.
Also assert that startup trampoline fits into single page. In principle
we can do multi-page allocation if needed, but it is not.
Move the alloc_ap_trampoline() function and the boot_address variable to
i386/mp_machdep.c. Keep existing mechanism of early alloc on i386.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D31343
Before this change my dual-Xeon(R) Gold 6242R always reported 3 levels
or topology (root, package/L3 and core/L2). But with SMT disabled
core/L2 matches thread, so additional topology level only causes more
traversal work. With this change SMT case is reported same as before,
while non-SMT is reported with only 2 much more simple levels.
MFC after: 2 weeks
xen_vector_callback_enabled is x86 specific and availability of
per-cpu event channel delivery differs on other architectures.
Introduce a new helper to check if there's support for per-cpu event
channel injection.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29402
While x86 only register PV shutdown handler for PV guests. ARM guests
are always using HVM and requires the PV shutdown handler.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29406
ARM guest is considered as HVM in Freebsd but they only support PV disk
(no emulation available).
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29403
ARM guest is considered as HVM but it only supports PV nics (no
emulation available).
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29405
Several of x86 enable/disable functions depend upon the xen*domain()
functions. As such the xen*domain() functions need to be declared
before machine/xen-os.h.
Officially declare direct inclusion of machine/xen/xen-os.h verboten as
such will break these functions/macros. Remove one such soon to be
broken inclusion.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29811
Functions tend to get renamed and unless the developer is careful
often debugging messages are missed. As such using func is far
superior. Replace several instances of hard-coded function names.
Reviewed by: royger
Differential revision: https://reviews.freebsd.org/D29499
Since xen_intr_handle_t is meant to be an opaque handle and the only
use is retrieving the associated struct xenisrc *, directly use it as
the opaque handler.
Also add a wrapper function for converting the other direction. If some
other value becomes appropriate in the future, these two functions will
be the only spots needing modification.
Reviewed by: mhorne, royger
Differential Revision: https://reviews.freebsd.org/D29500
The requirements for pages shared with Xen/other VMs may vary from
architecture to architecture. As such create a macro which various
architectures can use.
Remove a use of PAT_WRITE_BACK in xenstore.c. This is a x86-ism which
shouldn't have been present in a common area.
Original idea: Julien Grall <julien@xen.org>, 2014-01-14 06:44:08
Approach suggested by: royger
Reviewed by: royger, mhorne
Differential Revision: https://reviews.freebsd.org/D29351
Minor changes are necessary to make this processor-independent, but
moving the file out of x86 and into common is the first step (so
others don't add /more/ x86-isms).
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29042
Stop using temporal page table with 1:1 mapping of low 1G populated over
the whole VA. Use 1:1 mapping of low 4G temporarily installed in the
normal kernel page table.
The features are:
- now there is one less step for startup asm to perform
- the startup code still needs to be at lower 1G because CPU starts in
real mode. But everything else can be located anywhere in low 4G
because it is accessed by non-paged 32bit protected mode. Note that
kernel page table root page is at low 4G, as well as the kernel itself.
- the page table pages can be allocated by normal allocator, there is
no need to carve them from the phys_avail segments at very early time.
The allocation of the page for startup code still requires some magic.
Pages are freed after APs are ignited.
- la57 startup for APs is less tricky, we directly load the final page
table and do not need to tweak the paging mode.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.
The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
implemented in the vDSO: for now gettimeofday, clock_gettime.
The first two have been implemented, so add the implementation of system
calls.
System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).
In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.
Tested by: trasz (arm64)
Differential revision: https://reviews.freebsd.org/D30900
MFC after: 2 weeks
Many of these typedefs are the same across all architectures or can
be set based on an architecture-independent compiler-provided macro
(e.g. __SIZEOF_SIZE_T__). These macros have been available since GCC 4.6
and Clang sometime before 3.0 (godbolt.org does not have any older clang
versions installed).
I originally considered using the compiler-provided `__FOO_TYPE__` directly.
However, in order to do so we have to check that those match the previous
typedef exactly (not just that they have the same size) since any change
would be an ABI break. For example, changing `long` to `long long` results
in different C++ name mangling. Additionally, Clang and GCC disagree on
the underlying type for some of (u)int*_fast_t types, so this change
only moves the definitions that are identical across all architectures
and does not touch those types.
This de-deduplication will allow us to have a smaller diff downstream in
CheriBSD: we only have to only change the (u)intptr_t definition in
sys/_types.h in CheriBSD instead of having to change machine/_types.h for
all CHERI-enabled architectures (currently RISC-V, AArch64 and MIPS).
Reviewed By: imp, kib
Differential Revision: https://reviews.freebsd.org/D29895
From the PR:
Almost always, my Samsung RF511 laptop could not boot with
x2APIC enabled in the kernel. It froze during SMP initialization,
shortly after "ACPI APIC Table: <SECCSD LH43STAR>" was printed
to the console. When the kernel is instructed not to use x2APIC,
the system boots correctly.
PR: 256389
Submitted by: David Sebek <dasebek@gmail.com>
Reviewed by: markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30624
to prepare for one more addition
Reviewed by: markj
Tested by: David Sebek <dasebek@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30624
setidt_disp is the offset of the ISR trampoline relative to the address
of the routines in exception.s, so uintptr_t is not quite right.
Also remove a bogus declaration I added in commit 18f55c67f7, it is not
required after all.
Reported by: jrtc27
Reviewed by: jrtc27, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30590
The loop which checks to see if "dynamic" IDT entries are allocated
needs to compare with the trampoline address of the reserved ISR.
Otherwise it will never succeed.
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30576
The AP startup extern variable declarations are not longer needed,
since PVHv2 uses the native AP startup path using the lapic. Remove
the declaration and make the variables static to mp_machdep.c
Sponsored by: Citrix Systems R&D
PVHv1 was officially removed from Xen in 4.9, so just axe the related
code from FreeBSD.
Note FreeBSD supports PVHv2, which is the replacement for PVHv1.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, Elliott Mitchell
Differential Revision: https://reviews.freebsd.org/D30228