These definitions will be used by a driver to implement Hardware
P-States (autonomous control of HWP, via Intel Speed Shift technology).
Reviewed by: kib
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D18050
SDM rev. 068 was released yesterday and it contains the description of
the MSR 0x10a IA32_ARCH_CAP. This change adds symbolic definitions for
all bits present in the document, and decode them in the CPU
identification lines printed on boot.
But also, the document defines SSB_NO as bit 4, while FreeBSD used but
2 to detect the need to work-around Speculative Store Bypass
issue. Change code to use the bit from SDM.
Similarly, the document describes bit 3 as an indicator that L1TF
issue is not present, in particular, no L1D flush is needed on
VMENTRY. We used RDCL_NO to avoid flushing, and again I changed the
code to follow new spec from SDM.
In fact my Apollo Lake machine with latest ucode shows this:
IA32_ARCH_CAPS=0x19<RDCL_NO,SKIP_L1DFL_VME,SSB_NO>
Reviewed by: bwidawsk
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D18006
Do not use C constant suffixes. Bit values are small enough to not
require typing, despite they are used for 64bit MSR writes. The added
cast in hw_ibrs_recalculate() is redundand but I prefer to add it for
clarity.
Reported by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
It is coded according to the Intel document 336996-001, reading of the
patches posted on lkml, and some additional consultations with Intel.
For existing processors, you need a microcode update which adds IBRS
CPU features, and to manually enable it by setting the tunable/sysctl
hw.ibrs_disable to 0. Current status can be checked in sysctl
hw.ibrs_active. The mitigation might be inactive if the CPU feature
is not patched in, or if CPU reports that IBRS use is not required, by
IA32_ARCH_CAP_IBRS_ALL bit.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14029
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
17h supports MCA thresholding in the same way as 16h and earlier.
Supposedly a ScalableMca feature bit in CPUID 8000_0007:EBX must be set, but
that was not true for earlier models, so be careful about relying on it.
While here, document a missing bit in LS MCA MISC0.
Reviewed by: truckman
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12237
Currently the feature is implemented only for a subset of errors
reported via Bank 4. The subset includes only DRAM-related errors.
The new code builds upon and reuses the Intel CMC (Correctable MCE
Counters) support code. However, the AMD feature is quite different
and, unfortunately, much less regular.
For references please see AMD BKDGs for models 10h - 16h.
Specifically, see MSR0000_0413 NB Machine Check Misc (Thresholding)
Register (MC4_MISC0).
http://developer.amd.com/resources/developer-guides-manuals/
Reviewed by: jhb
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D9613
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors. We re-enable the extension, so that we can properly discover
core and cache topology. Linux seems to do the same.
Reported by: Johannes Dieterich <dieterich.joh@gmail.com>
Reviewed by: jhb, kib
Tested by: Johannes Dieterich <dieterich.joh@gmail.com>
(earlier version)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D5883
the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency
Algorithms". The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.
The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated. If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.
Each cpu has PCID' algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated. On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid. The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.
The pm_save mask is no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs. Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.
Besides reducing the amount of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintanence of pm_save and selection of the right
address space in the shootdown IPI handler.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
hw.x2apic_enable tunable allows disabling it from the loader prompt.
To closely repeat effects of the uncached memory ops when accessing
registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded
by mfence, except for the EOI notifications. This is probably too
strict, only ICR writes to send IPI require serialization to ensure
that other CPUs see the previous actions when IPI is delivered. This
may be changed later.
In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
inline, to handle corner conditions.
Note that the patch only switches LAPICs into x2APIC mode. It does not
enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC
MADT entries and doing interrupts remapping, but is the required step
on the way.
Reviewed by: neel
Tested by: pho (real hardware), neel (on bhyve)
Discussed with: jhb, grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
showing up on Haswell-class CPUs
From the Intel SDM, "Table 3-20. Feature Information Returned in the
ECX Register"
11 | SDBG | A value of 1 indicates the processor supports
IA32_DEBUG_INTERFACE MSR for silicon debug.
Submitted by: jiashiun@gmail.com
Reviewed by: jhb neel
MFC after: 2 weeks
Add support for AMD's nested page tables in pmap.c:
- Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit)
for a pmap of type PT_RVI.
- Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of
type PT_EPT or PT_RVI.
Add CPU_SET_ATOMIC_ACQ(num, cpuset):
This is used when activating a vcpu in the nested pmap. Using the 'acquire'
variant guarantees that the load of the 'pm_eptgen' will happen only after
the vcpu is activated in 'pm_active'.
Add defines for various AMD-specific MSRs.
Submitted by: Anish Gupta (akgupt3@gmail.com)
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.
The VT-x code has the following types of MSRs:
- MSRs that are unconditionally saved/restored on every guest/host context
switch (e.g., MSR_GSBASE).
- MSRs that are restored to guest values on entry to vmx_run() and saved
before returning. This is an optimization for MSRs that are not used in
host kernel context (e.g., MSR_KGSBASE).
- MSRs that are emulated and every access by the guest causes a trap into
the hypervisor (e.g., MSR_IA32_MISC_ENABLE).
Reviewed by: grehan
%eax report.
Print the XSAVE features 0xd/1 in the boot banner. The printcpuinfo()
is executed late enough so that XSAVE is already enabled.
There is no known to me off the shelf hardware that implements any
feature bits except XSAVEOPT, the list is taken from SDM rev. 50. The
banner printing will allow us to note the hardware arrival.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
features. If bootverbose is enabled, a detailed list is provided;
otherwise, a single-line summary is displayed.
- Add read-only sysctls for optional VT-x capabilities used by bhyve
under a new hw.vmm.vmx.cap node. Move a few exiting sysctls that
indicate the presence of optional capabilities under this node.
CR: https://phabric.freebsd.org/D498
Reviewed by: grehan, neel
MFC after: 1 week
XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions).
Obtained from: Intel's Instruction Set Extensions Programming Reference
(March 2014)
described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf.
Partially based on the patch submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
introduced with the IvyBridge CPUs. Provide the definitions for new
bits in CR3 and CR4 registers.
Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
MFC after: 2 weeks
mostly meets the guidelines set by the Intel SDM:
1. We use XRSTOR and XSAVE from the same CPL using the same linear
address for the store area
2. Contrary to the recommendations, we cannot zero the FPU save area
for a new thread, since fork semantic requires the copy of the
previous state. This advice seemingly contradicts to the advice
from the item 6.
3. We do use XSAVEOPT in the context switch code only, and the area
for XSAVEOPT already always contains the data saved by XSAVE.
4. We do not modify the save area between XRSTOR, when the area is
loaded into FPU context, and XSAVE. We always spit the fpu context
into save area and start emulation when directly writing into FPU
context.
5. We do not use segmented addressing to access save area, or rather,
always address it using %ds basing.
6. XSAVEOPT can be only executed in the area which was previously
loaded with XRSTOR, since context switch code checks for FPU use by
outgoing thread before saving, and thread which stopped emulation
forcibly get context loaded with XRSTOR.
7. The PCB cannot be paged out while FPU emulation is turned off, since
stack of the executing thread is never swapped out.
The context switch code is patched to issue XSAVEOPT instead of XSAVE
if supported. This approach eliminates one conditional in the context
switch code, which would be needed otherwise.
For user-visible machine context to have proper data, fpugetregs()
checks for unsaved extension blocks and manually copies pristine FPU
state into them, according to the description provided by CPUID leaf
0xd.
MFC after: 1 month