- Add constants for fields in DR6 and the reserved fields in DR7. Use
these constants instead of magic numbers in most places that use DR6
and DR7.
- Refer to T_TRCTRAP as "debug exception" rather than a "trace trap"
as it is not just for trace exceptions.
- Always read DR6 for debug exceptions and only clear TF in the flags
register for user exceptions where DR6.BS is set.
- Clear DR6 before returning from a debug exception handler as
recommended by the SDM dating all the way back to the 386. This
allows debuggers to determine the cause of each exception. For
kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value
to other parts of the handler (namely, user_dbreg_trap()). For user
traps, wait until after trapsignal to clear DR6 so that userland
debuggers can read DR6 via PT_GETDBREGS while the thread is stopped
in trapsignal().
Reviewed by: kib, rgrimes
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D15189
Speculative Store Bypass (SSB) is a speculative execution side channel
vulnerability identified by Jann Horn of Google Project Zero (GPZ) and
Ken Johnson of the Microsoft Security Response Center (MSRC)
https://bugs.chromium.org/p/project-zero/issues/detail?id=1528.
Updated Intel microcode introduces a MSR bit to disable SSB as a
mitigation for the vulnerability.
Introduce a sysctl hw.spec_store_bypass_disable to provide global
control over the SSBD bit, akin to the existing sysctl that controls
IBRS. The sysctl can be set to one of three values:
0: off
1: on
2: auto
Future work will enable applications to control SSBD on a per-process
basis (when it is not enabled globally).
SSBD bit detection and control was verified with prerelease microcode.
Security: CVE-2018-3639
Tested by: emaste (previous version, without updated microcode)
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Install appropriate pti-aware shootdown IPI handlers, otherwise user
page tables do not get enough invalidations. The non-pti handlers
were used so far.
Reported and tested by: cperciva
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
The intent was to disable IBPB and IBRS around MWAIT, and re-enable on
the sleep end.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
This change reverts a "while here" part of r333321 that moved clearing
of suspended_cpus to an earlier place.
Apparently, there can be a problem when modifying (shared) memory before
restoring proper cache attributes. So, to be safe, move the clearing to
the old place.
Many thanks to Johannes Lundberg for bisecting the changes to that
particular commit and then bisecting the commit to the particular
change.
Reported by: many
Debugged by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
X-MFC with: r333321
The idea is to calibrate the LAPIC timer just once and only on boot,
given that [at present] the timer constants are global and shared
between all processors.
My primary motivation is to fix a panic that can happen when dynamically
switching to lapic timer. The panic is caused by a recursion on
et_hw_mtx when printing the calibration results to console. See the
review for the details of the panic.
Also, the code should become slightly simpler and easier to read. The
previous code was racy too. Multiple processors could start calibrating
the global constants concurrently, although that seems to have been
benign.
Reviewed by: kib, mav, jhb
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D15422
Without a subsequent wbinvd the changes to suspended_cpus (and
resuming_cpus) can be lost at least on AMD systems that use MOESI cache
coherency protocol. That can happen because one of APs ends up as an
Owner of the corresponding cache line(s) and the changes may never reach
the main memory before the AP is reset.
While here, move clearing of suspended_cpus a little bit earlier as the
fact of returning from savectx (with zero return value) means that the
CPU has fully restored it execution context.
Also, rework the comment that describes the need for resuming_cpus.
This change fixed suspend to RAM a previously broken AMD-based system.
Reviewed by: kib
Discussed with: bde
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D15295
ifuncs on x86.
Also keep helpers to define 'pseudo-ifuncs' which are emulated by the
indirect jmp.
Reviewed by: jhb (previous version, as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D13838
Resume starts CPU from the init state, which clears any loaded
microcode updates. As result, IBRS MSRs are no longer available,
until the microcode is reloaded.
I have to forcibly clear cpu_stdext_feature3, which assumes that CPUID
leaf 7 reg %ebx does not report anything except Meltdown/Spectre bugs
bits. If future CPUs add new bits there, hw_ibrs_recalculate() and
identify_cpu1()/identify_cpu2() need to be adjusted for that.
Submitted and tested by: Michael Danilov <mike.d.ft402@gmail.com>
PR: 227866
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15236
The APL31 NDA errata is APL30 public errata. Add the reference and
provide the description [2].
Noted by: emaste [2], rpokala [1]
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
If the workaround is activated, always send IPI for wake up, not rely
on the write to the monitor line. This fixes Appolo Lake machines
early hang in sched_bind(), without requiring user to manually select
idle method.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Use designated initializers for the idlt_tlb elements.
Remove strstr() use, add flag field to detect supported MWAIT.
Use nitems() instead of the terminating NULL entry for idle_tlb.
Move several functions into cpu_idle_* namespace.
Based on the discussion with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
disabled.
Intel finally added this information, which allows us to not parse CPU
identification string looking for the nominal frequency. The leaf is
present e.g. on Appolo Lake Atom CPUs. It is only used if the TSC
calibration is disabled by user.
Also, report the TSC frequency in bootverbose mode always, regardless
of the way it was obtained.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
It is applied before it is possible for idle threads to execute on any
CPU, allowing to work around against some bugs.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Otherwise, under bootverbose, the lapic_enable_cmc() banner 'lapicX:
CMCI unmasked' is printed by several CPUs in parallel, causing garbled
output for the LAPIC dumps.
Reported by: royger
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15157
machine check banks must be only monitored by single CPU.
Noted and reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15157
We must ensure that accesses occur, they do not have any other
compiler-visible effects. Bruce found some situations where
optimization could remove an access, and provided a patch to use
volatile qualifier for the state variables. Since volatile behaviour
there is the compiler-specific interpretation of the keyword, use
relaxed atomics instead, which gives exactly the desired semantic.
Noted by and discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This sysctl allows a deeper dive into the sleep abyss comparing to
debug.acpi.suspend_bounce. When the new sysctl is set the system will
execute the suspend sequence up to the call to AcpiEnterSleepState().
That includes saving processor contexts and parking APs. Then, instead
of actually entering the sleep state, the BSP will call resumectx() to
emulate the wakeup. The APs should get restarted by the sequence of
Init and Startup IPIs that BSP sends to them.
MFC after: 8 days
x86 enforces an (arbitray) limit on the number of available MSI and
MSI-X interrupts to simplify code (in particular, interrupt_source[]
is statically sized). This means that an attempt to allocate an MSI
vector needs to fail if it would go beyond the limit, but the checks
for exceeding the limit had an off-by-one error. In the case of MSI-X
which allocates interrupts one at a time this meant that IRQ 768 kept
getting handed out multiple times for msix_alloc() instead of failing
because all MSI IRQs were in use.
Tested by: lidl
MFC after: 1 week
The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.
By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap. The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.
There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.
copyout(9) was rewritten to use vm_fault_quick_hold(). An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls. The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline. If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.
The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done. The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging. I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.
Tested by: pho
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D14633
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
Add the missing breaks in the for loops, in order to exit the loop
when a suitable entry is found.
Also switch amd64 native_start_all_aps to use PHYS_TO_DMAP in order to
find the virtual address of the boot_trampoline and the initial page
tables.
Reported and tested by: pho
Sponsored by: Citrix Systems R&D
So that it doesn't rely on physmap[1] containing an address below
1MiB. Instead scan the full physmap and search for a suitable address
to place the trampoline code (below 1MiB) and the initial memory pages
(below 4GiB).
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D14878
x86/cpu_machdep.c now needs to include elan_mmcr.h when CPU_ELAN is set.
While here, also remove the now unneeded inclusion of isareg.h in i386
and amd64 vm_machdep.c.
Reported by: lwhsu
MFC after: 14 days
X-MFC with: r331878
When I moved these functions from i386 and amd64 to x86 I dropped their
prototype declarations (that were correct) and left only their definitions
that became incorrect.
Reported by: bde
MFC after: 15 days
X-MFC with: r331878
Because I didn't see any reason not too.
I've been making some changes to the code and couldn't help but notice
that the i386 and am64 code was nearly identical.
MFC after: 17 days
platforms. Original commit message as follows:
Only use CPUs in the domain the device is attached to for default
assignment. Device drivers are able to override the default assignment
if they bind directly. There are severe performance penalties for
handling interrupts on remote CPUs and this should only be done in
very controlled circumstances.
Reviewed by: jhb, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14838
These have been supplanted by the MI signal information codes in
<sys/signal.h> since 7.0. The FPE_*_TRAP ones were deprecated even
earlier in 1999.
PR: 226579 (exp-run)
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D14637
assignment. Device drivers are able to override the default assignment
if they bind directly. There are severe performance penalties for
handling interrupts on remote CPUs and this should only be done in
very controlled circumstances.
Reviewed by: jhb, kib
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14838
Originally KVM set %eax to 0 in the cpuid leaf 0x4000000 rather than
to the highest supported leaf in the hypervisor "branch". Detect this
case and fixup the %eax value so that the hypervisor is still
detected.
Reported by: jpaetzel
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14810
Or else disable the device. Note that the detection can be bypassed by
setting the hw.atrtc.enable option in the loader configuration file.
More information can be found on atrtc(4).
Sponsored by: Citrix Systems R&D
Reviewed by: ian
Differential revision: https://reviews.freebsd.org/D14399
from the i8254 driver when I created separate mutexes for each. The i8254
driver could be the active timecounter, leading to recursion during mutex
profiling, but the atrtc driver cannot be a timecounter, so it isn't needed.