pmc_cpuid was uninitialized for most AMD processor families. We can still
populate this string for unimplemented families.
Also added a CPUID_TO_STEPPING macro and converted existing code to use it.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25673
It follows the equivalent Linux change to be able to differentiate
skylakex and cascadelakex, sharing the same model but not stepping.
This fixes skylakex handling broken by r363144.
MFC after: 6 days
To make the PMC tool pmcstat working properly on Hygon platform, add
support for Hygon Dhyana family 18h by using the PMC initialization
code path of AMD family 17h.
Submitted by: Pu Wen <puwen@hygon.cn>
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23562
- amd_intr() does not account for the offset (0x200) in the counter
MSR address and ends up accessing invalid regions while reading
counter value after the 4th counter (0xC001000[8,9,..]) and
erroneously updates the counter values for counters [1-4].
- amd_intr() should only check core pmcs for interrupts since
other types of pmcs (L3,DF) cannot generate interrupts.
- fix pmc NMI's being ignored due to NMI latency on newer AMD processors
Note that this fixes a kernel panic due to GPFs accessing MSRs on
higher core count AMD cpus (seen on both Rome 7502P, and
Threadripper 2990WX 32-core CPUs)
Discussed with: markj
Submitted by: Shreyank Amartya
Differential Revision: https://reviews.freebsd.org/D21553
pmc_process_interrupt takes 5 arguments when only 3 are needed.
cpu is always available in curcpu and inuserspace can always be
derived from the passed trapframe.
While facially a reasonable cleanup this change was motivated
by the need to workaround a compiler bug.
core2_intr(cpu, tf) ->
pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
pmc_add_sample(cpu, ring, pm, tf, inuserspace)
In the process of optimizing the tail call the tf pointer was getting
clobbered:
(kgdb) up
at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
4709 pmc_save_kernel_callchain(ps->ps_pc,
(kgdb) up
1205 error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
resulting in a crash in pmc_save_kernel_callchain.
On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.
Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &
Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total
After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total
For more realistic overhead measurement set the sample rate for ~2khz
on a 2.1Ghz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &
Collecting 10 samples of `make -j96 buildkernel` from each:
x before
+ after
real time:
N Min Max Median Avg Stddev
x 10 76.4 127.62 84.845 88.577 15.100031
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
-28.398 +/- 10.0344
-32.0602% +/- 7.69825%
(Student's t, pooled s = 10.6794)
system time:
N Min Max Median Avg Stddev
x 10 2277.96 6948.53 2949.47 3341.492 1385.2677
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
-2277.47 +/- 920.425
-68.1574% +/- 8.77623%
(Student's t, pooled s = 979.596)
x no pmc
+ pmc running
real time:
HEAD:
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 76.4 127.62 84.845 88.577 15.100031
Difference at 95.0% confidence
29.73 +/- 10.0335
50.5208% +/- 17.0525%
(Student's t, pooled s = 10.6785)
patched:
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
1.332 +/- 0.248939
2.2635% +/- 0.426506%
(Student's t, pooled s = 0.264942)
system time:
HEAD:
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 2277.96 6948.53 2949.47 3341.492 1385.2677
Difference at 95.0% confidence
2309.97 +/- 920.443
223.937% +/- 89.3039%
(Student's t, pooled s = 979.616)
patched:
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
32.493 +/- 16.0042
3.15% +/- 1.5794%
(Student's t, pooled s = 17.0331)
Reviewed by: jeff@
Approved by: sbruno@
Differential Revision: https://reviews.freebsd.org/D15155
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
It is possible that wrmsr in amd_stop_pmc() causes an overflow in a counter
that it disables. In that case a non-maskable interrupt is generated. The
interrupt handler code was written in such a way that it would re-enable the
counter. That would lead to an unexpected interrupt later on.
This problem was easy to reproduce with
$ pmcstat -T -P instructions -t $pid
if the target process is sufficiently busy and there are context switches from
time to time. There would be a lot of interrupts to "race" with amd_stop_pmc()
called during the context switches. The problem affected only AMD processors.
While there, trace whether amd_intr() claimed an interrupt.
Reviewed by: jhb
MFC after: 2 weeks
New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).
Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.
Sponsored by: NETASQ
MFC after: 1 month
hardware for PMCs that have been configured for sampling.
- Bug fix: acknowledge PMC hardware overflows irrespective of the
the (software) PMC's state.
dependencies. A 'struct pmc_classdep' structure describes operations
on PMCs; 'struct pmc_mdep' contains one or more 'struct pmc_classdep'
structures depending on the CPU in question.
Inside PMC class dependent code, row indices are relative to the
PMCs supported by the PMC class; MI code in "hwpmc_mod.c" translates
global row indices before invoking class dependent operations.
- Augment the OP_GETCPUINFO request with the number of PMCs present
in a PMC class.
- Move code common to Intel CPUs to file "hwpmc_intel.c".
- Move TSC handling to file "hwpmc_tsc.c".
2's compliment.
The 2's compliment transform is done so a "count down" sampling interval
can be converted into a "count up" PMC value. a 2's complimented 'count down'
value is written to the PMC counter; then the read-back counter is reverted
via another 2's compliment.
PR: kern/121660
Reviewed by: jkoshy
Approved by: jkoshy
MFC after: 1 week
'buffers' pending NMIs from multiple interrupting PMCs and delivers
them serially.
Reported by: Olivier Crameri <olivier.crameri@epfl.ch>
MFC after: 3 days
- Update driver interrupt statistics correctly.
sys/sys/pmc.h, sys/dev/hwpmc/hwpmc_mod.c:
- Fix a bug affecting debug printfs.
- Move the 'stalled' flag from being in a bit in the
'pm_flags' field of a 'struct pmc' to a field of its own in the
same structure. This flag is updated from the NMI handler and
keeping it separate makes it easier to avoid races with other
parts of the code.
sys/dev/hwpmc/hwpmc_logging.c:
- Do arithmetic with 'uintptr_t' types rather that casting
to and from 'char *'.
Approved by: re (scottl)
- Implement sampling modes and logging support in hwpmc(4).
- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.
- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).
- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.
- bug fixes & documentation.
Have pmcstat(8) and pmccontrol(8) use these APIs.
Return PMC class-related constants (PMC widths and capabilities)
with the OP GETCPUINFO call leaving OP PMCINFO to return only the
dynamic information associated with a PMC (i.e., whether enabled,
owner pid, reload count etc.).
Allow pmc_read() (i.e., OPS PMCRW) on active self-attached PMCs to
get upto-date values from hardware since we can guarantee that the
hardware is running the correct PMC at the time of the call.
Bug fixes:
- (x86 class processors) Fix a bug that prevented an RDPMC
instruction from being recognized as permitted till after the
attached process had context switched out and back in again after
a pmc_start() call.
Tighten the rules for using RDPMC class instructions: a GETMSR
OP is now allowed only after an OP ATTACH has been done by the
PMC's owner to itself. OP GETMSR is not allowed for PMCs that
track descendants, for PMCs attached to processes other than
their owner processes.
- (P4/HTT processors only) Fix a bug that caused the MI and MD
layers to get out of sync. Add a new MD operation 'get_config()'
as part of this fix.
- Allow multiple system-mode PMCs at the same row-index but on
different CPUs to be allocated.
- Reject allocation of an administratively disabled PMC.
Misc. code cleanups and refactoring. Improve a few comments.
Only allow a process to use the x86 RDPMC instruction if it has
allocated and attached a PMC to itself.
Inform the MD layer of the "pseudo context switch out" that needs
to be done when the last thread of a process is exiting.
includes the MD header for us. Do not include <machine/specialreg.h>
as it is not a header file that can be included from MI files. It
is included from <machine/pmc_mdep.h> if so needed and possible.
Ok'd: jkoshy@