The i386 codegen insists on preloading tc_priv into a register, and
this register cannot be %eax because the RDTSCP instruction clobbers it
before it is used.
Reported and tested by: dim
MFC after: 6 days
Sponsored by: The FreeBSD Foundation
Use it in preference to LFENCE- or MFENCE-fenced RDTSC if RDTSCP is
supported. It is recommended by both Intel and AMD. However, on AMD Zen
and newer use LFENCE, as recommended by AMD [*]. In particular, this
means that AMD CPUs now use a more appropriate fence instead of the
overly harsh MFENCE.
Add a comment explaining the intent of the selection logic.
Reported by: gallatin [*]
Reviewed by: gallatin, markj
Tested by: gallatin, pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
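
The selection order described in the RDTSCP change above could be
sketched roughly as follows. This is only one reading of the
description, not the committed code: the enum names, the function
choose_tsc_read(), and the use of family 0x17 as the Zen cutoff are
hypothetical, and the MFENCE fallback for older AMD parts is an
assumption based on the remark that AMD previously used MFENCE.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

enum tsc_read_method {
	TSC_READ_PLAIN,		/* RDTSC with no fence */
	TSC_READ_LFENCE,	/* LFENCE; RDTSC */
	TSC_READ_MFENCE,	/* MFENCE; RDTSC */
	TSC_READ_RDTSCP		/* RDTSCP */
};

#define	AMD_FAMILY_ZEN	0x17	/* assumption: first Zen family */

enum tsc_read_method
choose_tsc_read(enum cpu_vendor vendor, unsigned family, bool has_rdtscp,
    bool has_sse2)
{
	/* AMD Zen and newer: use the LFENCE-fenced read. */
	if (vendor == VENDOR_AMD && family >= AMD_FAMILY_ZEN && has_sse2)
		return (TSC_READ_LFENCE);
	/* Otherwise prefer RDTSCP whenever the CPU supports it. */
	if (has_rdtscp)
		return (TSC_READ_RDTSCP);
	/* LFENCE and MFENCE both require SSE2. */
	if (!has_sse2)
		return (TSC_READ_PLAIN);
	/* Assumed fallback: MFENCE on older AMD, LFENCE elsewhere. */
	return (vendor == VENDOR_AMD ? TSC_READ_MFENCE : TSC_READ_LFENCE);
}
```

With these assumptions, a Zen CPU selects the LFENCE variant even when
RDTSCP is present, while an Intel CPU with RDTSCP selects RDTSCP.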
I suspect that virtualization techniques have improved since the time
when we had to effectively disable TSC use in VMs. For instance, it was
reported (as a complaint) in
https://github.com/JuliaLang/julia/issues/38877 that FreeBSD is
needlessly slow on AWS with some loads.
Remove the check and start watching for complaints.
Reviewed by: emaste, grehan
Discussed with: cperciva
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27629
PCI memory address space is shared between memory-mapped devices (MMIO)
and host memory (which may be remapped by an IOMMU). Device accesses to
an address within a memory aperture in a PCIe root port will be treated
as peer-to-peer and not forwarded to an IOMMU. To avoid this, reserve
the address space of the root port's memory apertures in the address
space used by the IOMMU for remapping.
Reviewed by: kib, tychon
Discussed with: Anton Rang <rang@acm.org>
Tested by: tychon
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D27503
Add an explicit thread fence release before returning from
bus_dmamap_sync. This should be a no-op in practice, but makes
explicit that all ordinary stores will be completed before subsequent
reads/writes to ordinary device memory. On x86, normal memory ordering
is strong enough to generally guarantee this. The fence keeps the
optimizer (likely LTO) from reordering other calls around this.
The other architectures already have calls, as appropriate, that
are equivalent.
Note: On x86, there is one exception to this rule. If you've mapped
memory as write combining, then you will need to add an sfence or
similar. Normally, though, busdma doesn't operate on such memory, and
drivers that do already cope appropriately.
Reviewed by: kib@, gallatin@, chuck@, mav@
Differential Revision: https://reviews.freebsd.org/D27448
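
As a rough userspace illustration of the release fence described in the
bus_dmamap_sync change above, the sketch below uses the C11
atomic_thread_fence() as a stand-in for the kernel's fence primitive.
The descriptor layout, the doorbell, and both function names are
hypothetical. In practice the fence acts as a compiler barrier here, so
the descriptor stores are not moved below the doorbell write.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor layout, for illustration only. */
struct fake_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

/* Stand-in for the tail of a sync routine: issue a release fence. */
static inline void
sync_before_device_access(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Fill in a descriptor in ordinary memory, then ring a doorbell. */
void
post_descriptor(struct fake_desc *d, volatile uint32_t *doorbell,
    uint64_t addr, uint32_t len)
{
	d->addr = addr;
	d->len = len;
	d->flags = 1;
	/* Earlier ordinary stores are ordered before what follows. */
	sync_before_device_access();
	*doorbell = 1;
}
```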
This is useful for stack unwinders which need to avoid out-of-bounds
reads of a kernel stack which can trigger kernel faults.
Reviewed by: kib, markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27356
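
The kind of bounds check a stack unwinder needs, as described above,
might look like the following sketch. The helper name range_in_stack()
and its parameters are hypothetical and are not taken from the change
itself. The comparison is written so that it cannot overflow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true if [va, va + len) lies entirely
 * within the stack region [base, base + size).
 */
bool
range_in_stack(uintptr_t base, size_t size, uintptr_t va, size_t len)
{
	uintptr_t off;

	if (va < base)
		return (false);
	off = va - base;
	/* Compare against the remaining space instead of adding. */
	return (off <= size && len <= size - off);
}
```

An unwinder would call such a check before dereferencing each saved
frame pointer and stop walking as soon as it fails.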
Implement vt_vbefb to support VESA BIOS Extensions (VBE) framebuffer with VT.
vt_vbefb is based on vt_efifb and assumes similar data for
initialization; it uses MODINFOMD_VBE_FB to identify the struct vbe_fb
in kernel metadata.
struct vbe_fb is populated by the boot loader and is passed to the
kernel via the metadata payload.
Differential Revision: https://reviews.freebsd.org/D27373
Actually check the wrmsr_safe() return value when setting autonomous
HWP for package.
PR: 245582
Differential Revision: https://reviews.freebsd.org/D24744
This is needed on arm64 for the interface between iommu framework
and iommu controller drivers.
Reviewed by: kib
Sponsored by: Innovate DSbD
Differential Revision: https://reviews.freebsd.org/D27229
platforms.
This allows the AHCI driver to not depend on the IOMMU macro.
Requested by: kib
Suggested by: andrew
Reviewed by: kib
Sponsored by: Innovate DSbD
Differential Revision: https://reviews.freebsd.org/D26887
From Linux sources and several datasheets I looked at, it seems that
the workaround is only needed on families 0xf and 0x10. For instance,
Ryzens do not implement the accessed MSR at all; it is documented as
reserved. Also, hypervisors should not allow guests to put the CPU into
an idle state, so activate the workaround only when on bare hardware.
While there, style the code:
- move MSR defines to specialreg.h
- move identification to initcpu.c
Reported by: whu
Reviewed by: avg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26470
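
The narrowed condition for the workaround described above can be
expressed as a small predicate. This is a sketch only: the function name
and parameters are hypothetical, and the AMD vendor check is an
assumption inferred from the mention of Ryzen.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: apply the idle workaround only on bare
 * hardware and only on families 0xf and 0x10.
 */
bool
need_idle_workaround(bool is_amd, unsigned family, bool in_hypervisor)
{
	/* Guests should not be putting the CPU into an idle state. */
	if (in_hypervisor)
		return (false);
	return (is_amd && (family == 0xf || family == 0x10));
}
```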
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h. This fixes default gcc build for mips.
Reviewed by: alc, scottph
Tested by: kevans (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26741
The prototype was added with the creation of kern_shutdown.c in r17658,
but it appears to have never been implemented. Remove it now.
Reviewed by: cem, kib
Differential Revision: https://reviews.freebsd.org/D26702
Dynamically created OIDs automatically get this flag set.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26561
This is mostly needed for common arm64/amd64 iommu code.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26587
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.
Suggested by: kevans
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitions in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with addresses of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129
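
The 8 TiB figure above can be checked with a little arithmetic. The
sketch below assumes 4 KiB pages and a previous signed 32-bit bit index,
which is consistent with the 8 TiB limit but is not stated in the
message. The type and function names are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Stand-in for a 64-bit page index type. */
typedef uint64_t sketch_pindex_t;

/* Convert a physical address to the index of its page. */
sketch_pindex_t
pa_to_pindex(uint64_t pa)
{
	return (pa >> SKETCH_PAGE_SHIFT);
}

/* Would this page index still fit in a signed 32-bit integer? */
bool
pindex_fits_int32(sketch_pindex_t idx)
{
	return (idx <= (sketch_pindex_t)INT32_MAX);
}
```

Under these assumptions the page at exactly 8 TiB has index 2^31, the
first value that no longer fits in a signed 32-bit index, while the page
just below it still does.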
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.
A typical example is access to PCI(e) configuration space in the bus
enumeration function on badly implemented PCI(e) root complexes (RK3399
or Neoverse N1 N1SDP) and/or access to a PCI(e) register when the
device is in a deep sleep state.
This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25371
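
The emulated peek/poke fallback described above, built on plain reads
and writes without exception handling, might take the following shape.
Everything here is a hypothetical userspace stand-in: the fake_space
type and the fake_* functions are not the kernel bus_space(9) API, and
the convention of returning 0 on success is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for a mapped register window. */
struct fake_space {
	volatile uint32_t regs[64];
};

/* Plain accessors: no protection if nothing responds. */
static uint32_t
fake_read_4(struct fake_space *s, unsigned off)
{
	return (s->regs[off / 4]);
}

static void
fake_write_4(struct fake_space *s, unsigned off, uint32_t val)
{
	s->regs[off / 4] = val;
}

/* Emulated peek: a plain read that always reports success. */
int
fake_peek_4(struct fake_space *s, unsigned off, uint32_t *valp)
{
	*valp = fake_read_4(s, off);
	return (0);
}

/* Emulated poke: a plain write that always reports success. */
int
fake_poke_4(struct fake_space *s, unsigned off, uint32_t val)
{
	fake_write_4(s, off, val);
	return (0);
}
```

A real implementation would instead catch the bus error and return a
nonzero value, so callers written against this shape can check the
result on every architecture.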
that can be extended, but also ensure compile-time type checking. Refactor
common code out of arch-specific implementations. Move the mpr and mps
drivers to this new API. The template type remains visible to the consumer
so that it can be allocated on the stack, but should be considered opaque.
For configurations without x2APIC support (guests, older hardware), the global
LAPIC MMIO mapping will trigger false-positive KCSan reports as it will appear
that multiple CPUs are concurrently reading and writing the same address.
This isn't actually true, as the underlying physical access will be performed
on the local CPU's APIC. Additionally, because LAPIC access can happen during
event timer configuration, the resulting KCSan printf can produce a panic due
to attempted recursion on event timer resources.
Add a __nosanitizethread preprocessor define to prevent the compiler from
inserting TSan hooks, and apply it to the x86 LAPIC accessors.
PR: 249149
Reported by: gbe
Reviewed by: andrew, kib
Tested by: gbe
Differential Revision: https://reviews.freebsd.org/D26354
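
A define of the kind described above could be sketched as follows. The
macro name NOSANITIZE_THREAD, the register array, and the accessors are
hypothetical userspace stand-ins rather than the committed code. The
sketch assumes a compiler that provides __has_attribute and the
no_sanitize attribute, and falls back to an empty define otherwise.

```c
#include <stdint.h>

/* Expand to the attribute only where the compiler supports it. */
#if defined(__has_attribute)
#if __has_attribute(no_sanitize)
#define	NOSANITIZE_THREAD	__attribute__((no_sanitize("thread")))
#endif
#endif
#ifndef NOSANITIZE_THREAD
#define	NOSANITIZE_THREAD
#endif

/* Hypothetical stand-in for a shared register mapping. */
static volatile uint32_t fake_lapic_regs[256];

/* Accessors the thread sanitizer should not instrument. */
NOSANITIZE_THREAD static uint32_t
fake_lapic_read32(unsigned reg)
{
	return (fake_lapic_regs[reg]);
}

NOSANITIZE_THREAD static void
fake_lapic_write32(unsigned reg, uint32_t val)
{
	fake_lapic_regs[reg] = val;
}
```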
It could be used in various IOMMU platforms, not only DMAR.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26373
We can switch into long mode directly with LA57 enabled.
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25273
so we don't ifdef for every arch in busdma_iommu.c;
o No need to include specialreg.h for x86, remove it.
Requested by: andrew
Reviewed by: kib
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D25957
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches. They no longer serve any
purpose since UMA now provides their functionality by default. Remove
them to simplify the kernel memory allocator interfaces a bit.
Reviewed by: cem, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25937
so x86 can support Intel DMAR and AMD IOMMU simultaneously.
Reviewed by: kib
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D25894
APEI allows the platform to report different kinds of errors to the OS
in several ways. We've found that Supermicro X10/X11 motherboards report
PCIe errors appearing on hot-unplug via this interface using NMI.
Without a respective driver this ended in a kernel panic without any
additional information.
This driver introduces support for the APEI Generic Hardware Error Source
reporting via NMI, SCI or polling. It decodes the reported errors and
either passes them to pci(4) for processing or just logs them otherwise.
Errors marked as fatal still end in a kernel panic, but a somewhat more
informative one.
When somebody gets to implementing native PCIe AER support, both of the
reporting mechanisms should get common error recovery code. Since in our
case errors happen when the device is already gone, there is nothing to
recover, so the code just clears the error statuses, practically ignoring
the otherwise destructive NMIs in a nicer way.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.