Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN. Lay a bit of groundwork for KASAN and KMSAN.
When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h. These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.
No functional change intended.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
This is a prerequisite to using these functions outside of ddb, but also
provides some cleanup and minor refactoring. This code is almost
entirely duplicated between the two implementations, the only
significant difference being the lack of dbreg synchronization on i386.
Cleanups are:
- demote some internal functions to static
- use the constant NDBREGS instead of a '4' literal
- remove K&R definitions
- some added comments
Reviewed by: kib, jhb
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29153
This is x86-only and so should not be in the common area.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential revision: https://reviews.freebsd.org/D29040
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.
No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29103
Allow setting the bootmethod variable from the Xen PVH entry point, in
order to be able to correctly set the underlying firmware mode when
booted as a dom0.
Move the bootmethod variable to be defined in x86/cpu_machdep.c
instead so it can be shared by both i386 and amd64.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D28619
There's a currently ad-hoc protocol to hand off the FreeBSD kernel
payload between the loader and the kernel itself when Xen is in the
middle of the picture. Such protocol wasn't very resilient to changes
to the loader itself, because it relied on moving metadata around to
package it using a certain layout. This has proven to be fragile, so
replace it with a more robust version.
The new protocol requires using a xen_header structure that will be
used to pass data between the FreeBSD loader and the FreeBSD kernel
when booting in dom0 mode. At the moment the only data conveyed is the
offset of the start of the module metadata relative to the start of the
module itself.
This is a slightly disruptive change since it also requires a change
to the kernel which is contained in this patch. In order to update
with this change the kernel must be updated before updating the
loader, as described in the handbook. Note this is only required when
booting a FreeBSD/Xen dom0. This change doesn't affect the normal
FreeBSD boot protocol.
This fixes booting FreeBSD/Xen in dom0 mode after
3630506b9d.
Sponsored by: Citrix Systems R&D
MFC after: 3 days
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D28411
Implement vt_vbefb to support Vesa Bios Extensions (VBE) framebuffer with VT.
vt_vbefb is built based on vt_efifb and is assuming similar data for
initialization, use MODINFOMD_VBE_FB to identify the structure vbe_fb
in kernel metadata.
struct vbe_fb, is populated by boot loader, and is passed to kernel via
metadata payload.
Differential Revision: https://reviews.freebsd.org/D27373
From Linux sources and several datasheets I looked at, it seems that
the workaround is only needed on families 0xf and 0x10. For instance,
Ryzens do not implement the accessed MSR at all, it is documented as
reserved. Also, hypervisors should not allow guest to put CPU into
idle state, so activate workaround only when on bare hardware.
While there, style the code:
move MSR defines to specialreg.h
move identification to initcpu.c
Reported by: whu
Reviewed by: avg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26470
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.
Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.
This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25371
so we don't ifdef for every arch in busdma_iommu.c;
o No need to include specialreg.h for x86, remove it.
Requested by: andrew
Reviewed by: kib
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D25957
APEI allows platform to report different kinds of errors to OS in several
ways. We've found that Supermicro X10/X11 motherboards report PCIe errors
appearing on hot-unplug via this interface using NMI. Without respective
driver it ended up in kernel panic without any additional information.
This driver introduces support for the APEI Generic Hardware Error Source
reporting via NMI, SCI or polling. It decodes the reported errors and
either pass them to pci(4) for processing or just logs otherwise. Errors
marked as fatal still end up in kernel panic, but some more informative.
When somebody get to native PCIe AER support implementation both of the
reporting mechanisms should get common error recovery code. Since in our
case errors happen when the device is already gone, there is nothing to
recover, so the code just clears the error statuses, practically ignoring
the otherwise destructive NMIs in nicer way.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.
To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs. On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY. To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25754
It allows safe IPI sending to current CPU from NMI context.
Unlike other ipi_*() functions this waits for delivery to leave LAPIC in
a state safe for interrupted code.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
pmc_cpuid was uninitialized for most AMD processor families. We can still
populate this string for unimplemented families.
Also added a CPUID_TO_STEPPING macro and converted existing code to use it.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25673
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu. Now each CPU can initiate
shootdown IPI independently from other CPUs. Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu. After that IPI is sent to all targets
which scan for zeroed scoreboard generation words. Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set. Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.
Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.
The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.
Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.
Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
this is fine because we in fact do not send IPIs from interrupt
handlers. More for !x2APIC mode where ICR access for write requires
two registers write, we disable interrupts around it. If considered
incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
cache line, to reduce ping-pong.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25510
Right now code first flushes all local TLB entries that needs to be
flushed, then signals IPI to remote cores, and then waits for
acknowledgements while spinning idle. In the VMWare article 'Don’t
shoot down TLB shootdowns!' it was noted that the time spent spinning
is lost, and can be more usefully used doing local TLB invalidation.
We could use the same invalidation handler for local TLB as for
remote, but typically for pmap == curpmap we can use INVLPG for locals
instead of INVPCID on remotes, since we cannot control context
switches on them. Due to that, keep the local code and provide the
callbacks to be called from smp_targeted_tlb_shootdown() after IPIs
are fired but before spin wait starts.
Reviewed by: alc, cem, markj, Anton Rang <rang at acm.org>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D25188
The flush is needed to prevent cross-process ret2spec, which is not handled
on kernel entry if IBPB is enabled but SMEP is present.
While there, add i386 RSB flush.
Reported by: Anthony Steinhauser <asteinhauser@google.com>
Reviewed by: markj, Anthony Steinhauser
Discussed with: philip
admbugs: 961
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
When turning IBRS mitigation using sysctl, as opposed to loader tunable,
send IPI to tweak MSR on all cores. Right now code only performed MSR write
onr the CPU where sysctl was run.
Properly report hw.ibrs_active for IBRS_ALL. Split hw_ibrs_ibpb_active out
from ibrs_active, to keep the current semantic of guiding kernel entry and
exit handlers.
Reported and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Per Intel SDM (Vol 3b Part 2), if HWP indicates EPP (energy-performance
preference) is not supported, the hardware instead uses the ENERGY_PERF_BIAS
MSR. In the epp sysctl handler, fall back to that MSR if HWP does not
support EPP and CPUID indicates the ENERGY_PERF_BIAS MSR is supported.
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delievered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.
Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.
Simplify the KPIs. Remove stack_save_td_running() and add a return
value to stack_save_td(). On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP. If the
target thread is running in user mode, stack_save_td() returns EBUSY.
Reviewed by: kib
Reported by: mjg, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23355
As a new x86 CPU vendor, Chengdu Haiguang IC Design Co., Ltd (Hygon)
is a joint venture between AMD and Haiguang Information Technology Co.,
Ltd., aims at providing x86 processors for China server market.
The first generation Hygon processor(Dhyana) shares most architecture
with AMD's family 17h, but with different CPU vendor ID("HygonGenuine")
and PCI vendor ID(0x1d94) and family series number 18h(Hygon negotiated
with AMD to confirm that only Hygon use family 18h).
To enable Hygon Dhyana support in FreeBSD, add new definitions
HYGON_VENDOR_ID("HygonGenuine") and X86_VENDOR_HYGON(0x1d94) to identify
Hygon Dhyana CPU.
Initialize the CPU features(topology, local APIC ext, MSI, TSC, hwpstate,
MCA, DEBUG_CTL, etc) for amd64 and i386 mode by sharing the code path of
AMD family 17h.
The changes have been applied on FreeBSD 13.0-CURRENT and tested
successfully on Hygon Dhyana processor.
References:
[1] Linux kernel patches for Hygon Dhyana, merged in 4.20:
https://git.kernel.org/tip/c9661c1e80b609cd038db7c908e061f0535804ef
[2] MSR and CPUID definition:
https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
Submitted by: Pu Wen <puwen@hygon.cn>
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23163
The header is abused for inclusion into userspace, and on stable
branches neither device_t nor bool types are not defined when used
from userspace.
Sponsored by: The FreeBSD Foundation
X-MFC after: now
Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in
the FreeBSD kernel. It is a useful tool for finding data races between
threads executing on different CPUs.
This can be enabled by enabling KCSAN in the kernel config, or by using the
GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the later
needs a compiler change to allow -fsanitize=thread that KCSAN uses.
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22315
context should share page tables.
Practically it means that dma requests from any device on the bus are
translated according to the entries loaded for the bus:0:0 device.
KPI requires that the slot and function of the device be 0:0, and that
no tags for other devices on the bus were used.
The intended use are NTBs which pass TLPs from the downstream to the
host with slot:func of the downstream originator.
Reviewed and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22434
Use the KPI to tweak MSRs in mitigation code.
Reviewed by: markj, scottl
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22431
This CVE has already been announced in FreeBSD SA-19:26.mcu.
Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.
Control knobs are:
machdep.mitigations.taa.enable:
0 - no software mitigation is enabled
1 - attempt to disable TSX
2 - use the VERW mitigation
3 - automatically select the mitigation based on processor
features.
machdep.mitigations.taa.state:
inactive - no mitigation is active/enabled
TSX disable - TSX is disabled in the bare metal CPU as well as
- any virtualized CPUs
VERW - VERW instruction clears CPU buffers
not vulnerable - The CPU has identified itself as not being
vulnerable
Nothing in the base FreeBSD system uses TSX. However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.
Reviewed by: emaste, imp, kib, scottph
Sponsored by: Intel
Differential Revision: 22374
Disable the use of executable 2M page mappings in EPT-format page
tables on affected CPUs. For bhyve virtual machines, this effectively
disables all use of superpage mappings on affected CPUs. The
vm.pmap.allow_2m_x_ept sysctl can be set to override the default and
enable mappings on affected CPUs.
Alternate approaches have been suggested, but at present we do not
believe the complexity is warranted for typical bhyve's use cases.
Reviewed by: alc, emaste, markj, scottl
Security: CVE-2018-12207
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21884
Rather than a few scattered places in the tree. Organize flag names in a
contiguous region of specialreg.h.
While here, delete deprecated PCOMMIT from leaf 7.
No functional change.
The former spelling probably confused MOVDIR64B with MOVDIRI64.
MOVDIR_64B is the 64-*byte* direct store instruction; MOVDIR_I64 is the
64-*bit* direct store instruction (underscores added here for clarity; they are
not part of the canonical instruction name).
No functional change.
Sponsored by: Dell EMC Isilon