For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.
To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs. On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY. To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25754
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu. Now each CPU can initiate
shootdown IPI independently from other CPUs. Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu. After that IPI is sent to all targets
which scan for zeroed scoreboard generation words. Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set. Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.
Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.
The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.
Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.
Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
this is fine because we in fact do not send IPIs from interrupt
handlers. More for !x2APIC mode where ICR access for write requires
two registers write, we disable interrupts around it. If considered
incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
cache line, to reduce ping-pong.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25510
Currently NMIs are sent over event channels, but that defeats the
purpose of NMIs since event channels can be masked. Fix this by
issuing NMIs using a hypercall, which injects a NMI (vector #2) to the
desired vCPU.
Note that NMIs could also be triggered using the emulated local APIC,
but using a hypercall is better from a performance point of view
since it doesn't involve instruction decoding when not using x2APIC
mode.
Reported and Tested by: avg
Sponsored by: Citrix Systems R&D
So that it's done when the vcpu_id has been set. For the BSP the
vcpu_id is set at SUB_INTR, while for the APs it's done in
init_secondary_tail that's called at SUB_SMP order FIRST.
Reported and tested by: cperciva
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D17013
Install appropriate pti-aware shootdown IPI handlers, otherwise user
page tables do not get enough invalidations. The non-pti handlers
were used so far.
Reported and tested by: cperciva
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
The change introduced a dependency between genassym.c and header files
generated from .m files, but that dependency is not specified in the
make files.
Also, the change could be not as useful as I thought it was.
Reported by: dchagin, Manfred Antar <null@pozo.com>, and many others
The change is more intrusive than I would like because the feature
requires that a vector number is written to a special register.
Thus, now the vector number has to be provided to lapic_eoi().
It was readily available in the IO-APIC and MSI cases, but the IPI
handlers required more work.
Also, we now store the VMM IPI number in a global variable, so that it
is available to the justreturn handler for the same reason.
Reviewed by: kib
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D9880
Current Xen IPI setup functions require that the caller provide a device in
order to obtain the name of the interrupt from it. With early AP startup this
device is no longer available at the point where IPIs are bound, and a KASSERT
would trigger:
panic: NULL pcpu device_t
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82233a20
vpanic() at vpanic+0x186/frame 0xffffffff82233aa0
kassert_panic() at kassert_panic+0x126/frame 0xffffffff82233b10
xen_setup_cpus() at xen_setup_cpus+0x5b/frame 0xffffffff82233b50
mi_startup() at mi_startup+0x118/frame 0xffffffff82233b70
btext() at btext+0x2c
Fix this by no longer requiring the presence of a device in order to bind IPIs,
and simply use the "cpuX" format where X is the CPU identifier in order to
describe the interrupt.
Reported by: sbruno, cperciva
Tested by: sbruno
X-MFC-With: r310177
Sponsored by: Citrix Systems R&D
If BIOS performed hand-off to OS with BSP LAPIC in the x2APIC mode,
system usually consumes such configuration without a notice, since
x2APIC is turned on by OS if possible (nop). But if BIOS
simultaneously requested OS to not use x2APIC, code assumption that
that xAPIC is active breaks.
In my opinion, we cannot safely turn off x2APIC if control is passed
in this mode. Make madt.c ignore user or BIOS requests to turn x2APIC
off, and do not check the x2APIC black list. Just trust the config
and try to continue, giving a warning in dmesg.
Reported and tested by: Slawa Olhovchenkov <slw@zxy.spb.ru> (previous version)
Diagnosed by and discussed with: avg
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency
Algorithms". The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.
The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated. If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.
Each cpu has PCID' algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated. On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid. The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.
The pm_save mask is no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs. Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.
Besides reducing the amount of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintanence of pm_save and selection of the right
address space in the shootdown IPI handler.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
shows no difference with the code removed.
On both amd64 and i386, assert that a released pmap is not active.
Proposed and reviewed by: alc
Discussed with: Svatopluk Kraus <onwahe@gmail.com>, peter
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
vectors to be dynamically allocated. This allows kernel modules like vmm.ko
to allocate unique IPI slots when loaded (as opposed to hard allocating one
or more vectors).
Also, reorganize the fixed IPI vectors to create a contiguous space for
dynamic IPI allocation.
Reviewed by: kib, jhb
Differential Revision: https://reviews.freebsd.org/D2042
hw.x2apic_enable tunable allows disabling it from the loader prompt.
To closely repeat effects of the uncached memory ops when accessing
registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded
by mfence, except for the EOI notifications. This is probably too
strict, only ICR writes to send IPI require serialization to ensure
that other CPUs see the previous actions when IPI is delivered. This
may be changed later.
In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
inline, to handle corner conditions.
Note that the patch only switches LAPICs into x2APIC mode. It does not
enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC
MADT entries and doing interrupts remapping, but is the required step
on the way.
Reviewed by: neel
Tested by: pho (real hardware), neel (on bhyve)
Discussed with: jhb, grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
Fix the gate in xen_pv_lapic_ipi_vectored to prevent access to element
at position nitems(xen_ipis).
Sponsored by: Citrix Systems R&D
Coverity ID: 1223203
Approved by: gibbs