pmc_process_interrupt takes 5 arguments when only 3 are needed.
cpu is always available in curcpu and inuserspace can always be
derived from the passed trapframe.
While facially a reasonable cleanup this change was motivated
by the need to workaround a compiler bug.
core2_intr(cpu, tf) ->
pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
pmc_add_sample(cpu, ring, pm, tf, inuserspace)
In the process of optimizing the tail call the tf pointer was getting
clobbered:
(kgdb) up
at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
4709 pmc_save_kernel_callchain(ps->ps_pc,
(kgdb) up
1205 error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
resulting in a crash in pmc_save_kernel_callchain.
- add '-j' options to filter to enable converting native pmc
log format to json lines format to enable the use of scripts
and external tooling
% pmc filter -j pmc.log pmc.jsonl
- Record the tsc value in sampling interrupts as opposed to
recording nanotime when the sample is copied to a global log
in hardclock - potentially many milliseconds later.
- At initialize record the tsc_freq and the time of day to give
us an offset for translating the tsc values in callchain records
By logging all threads and processes 'pmc filter'
can now filter on process or thread name, relieving
the user of the burden of determining which tid or
pid was which when the sample was taken.
% pmc filter -T if_io_tqg -P nginx pmc.log pmc-iflib.log
% pmc filter -x -T idle pmc.log pmc-noidle.log
- increase pmc cpuid field from 8 to 12 bits
- add cpuid version string to initialize entry in the log
so that filter can identify which counter index an
event name maps to
- GC unused config flags
- make fixed counter assignment more robust as well as the
changes needed to be properly identified for filter
This adds the -U options to pmcstat which will attribute in-kernel samples
back to the user stack that invoked the system call. It is not the default,
because when looking at kernel profiles it is generally more desirable to
merge all instances of a given system call together.
Although heavily revised, this change is directly derived from D7350 by
Jonathan T. Looney.
Obtained from: jtl
Sponsored by: Juniper Networks, Limelight Networks
Intel now provides comprehensive tables for all performance counters
and the various valid configuration permutations as text .json files.
Libpmc has been converted to use these and hwpmc_core has been greatly
simplified by moving to passthrough of the table values.
The one gotcha is that said tables don't support pentium pro and and pentium
IV. There's very few users of hwpmc on _amd64_ kernels on new hardware. It is
unlikely that anyone is doing low level optimization on 15 year old Intel
hardware. Nonetheless, if someone feels strongly enough to populate the
corresponding tables for p4 and ppro I will reinstate the files in to the
build.
Code for the K8 counters and !x86 architectures remains unchanged.
vendor provided pmu-events tables and sundry cleanups.
The vendor pmu-events tables provide counter descriptions, default
sample rates, event, umask, and flag values for all the counter
configuration permutations. Using this gives us:
- much simpler kernel code for the MD component
- helpful long and short event descriptions
- simpler user code
- sample rates that won't overload the system
Update man page with newer sample types and remove unused sample type.
vendor provided pmu-events tables and sundry cleanups.
The vendor pmu-events tables provide counter descriptions, default
sample rates, event, umask, and flag values for all the counter
configuration permutations. Using this gives us:
- much simpler kernel code for the MD component
- helpful long and short event descriptions
- simpler user code
- sample rates that won't overload the system
Update man page with newer sample types and remove unused sample type.
Squashed commit of the following:
commit 4459d43eff815bec08ccc5533dbe5de846f03128
Author: Matt Macy <mmacy@mattmacy.io>
Date: Sat May 26 00:06:31 2018 -0700
libpmc: fix pmu function signatures for non amd64
commit a2cb8bbc586c65d41f9b291430a2261ec67b59fe
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:38:11 2018 -0700
pmcstat: fix indentation of usage
commit f686954b15ff56a833ac80404898977cb80a265b
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:19:49 2018 -0700
pmclog(3): add callchain and pmcallocatedyn, remove pcsample
commit 73e13a0d2e9498c81c150d14d022050cee7511bb
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:19:00 2018 -0700
pmclog.h: GC pcsample field
commit 3e93ffd65da641fa657539dad3c48e281f8b5798
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:05:57 2018 -0700
hwpmc: make Intel core CPUs use external event tables
commit 634f5fae1e1644ac324003136c66cd9c619d1c93
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:00:06 2018 -0700
pmclog: update log record types, bump PMC_MAJOR
- explicitly make log record types a multiple of 8 bytes
- hook in pmu event types for pmc_allocate records
- remove references to no longer PCSAMPLE record
commit 83d84fcd2d65bdf6ddcb2e155a22f0cfa2a9c225
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 21:52:10 2018 -0700
libpmc: add support for having vendor table driven pmc_allocate
commit 9e6ad63c40c2fce8404847ace5078ca6cb33a736
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 19:11:33 2018 -0700
hwpmc_core: add accessors for EVSEL & UMASK, make IAP_UMASK useful to user
commit 859dceb93daa6419a48c794db99b6758e5b041c9
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 19:09:45 2018 -0700
pmcstat: update usage and man page as well as make -L consistent with pmccontrol
commit 79c7d8597e28c2eb13f5f9113e65ec2792ca57b1
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 18:07:03 2018 -0700
pmu_util: add support for all current intel event keywords
commit d8089c7f6a6c8527f38324252b1ffb47004694c6
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 17:45:00 2018 -0700
add description for new arguments
commit 058336740bab53c62ec88a3a026ea848cf3878c6
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 17:38:15 2018 -0700
libpmc: move pmu_events table and pmu_utils out of libpmcstat so that they can be used by pmc_allocate
commit 049b66b382e2f833c3f47bc8df9e750cb265709f
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 16:12:41 2018 -0700
pmcstat: hook pmu_events counter description utility routines in
commit f5e01e7b37a691dc045e1aa16b3ebdd162515de8
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 16:11:59 2018 -0700
pmu_events: add utility routines for listing counters and their descriptions
commit cba4d4f8907f772279f86f18f915e0d74d33ac56
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 16:09:50 2018 -0700
pmu-events: expand out skylake regex to simplify string matches
There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.
Reported by: markj
This implements per-thread counters for PMC sampling. The thread
descriptors are stored in a list attached to the process descriptor.
These thread descriptors can store any per-thread information necessary
for current or future features. For the moment, they just store the counters
for sampling.
The thread descriptors are created when the process descriptor is created.
Additionally, thread descriptors are created or freed when threads
are started or stopped. Because the thread exit function is called in a
critical section, we can't directly free the thread descriptors. Hence,
they are freed to a cache, which is also used as a source of allocations
when needed for new threads.
Approved by: sbruno
Obtained from: jtl
Sponsored by: Juniper Networks, Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15335
Once a pmc owner is added to the pmc_ss_owners list it is
visible for all to see. We don't want this to happen until
setup is complete.
Reported by: mjg
Approved by: sbruno
- fix load/unload race by allocating the per-domain list structure at boot
- fix long extant vm map LOR by replacing pmc_sx sx_slock with global_epoch
to protect the liveness of elements of the pmc_ss_owners list
Reported by: pho
Approved by: sbruno
It appears that domain information is set correctly independent
of whether or not NUMA is defined. However, there is no memory
backing secondary domains leading to allocation failure.
Reported by: pho@, np@
Approved by: sbruno@
On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.
Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &
Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total
After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total
For more realistic overhead measurement set the sample rate for ~2khz
on a 2.1Ghz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &
Collecting 10 samples of `make -j96 buildkernel` from each:
x before
+ after
real time:
N Min Max Median Avg Stddev
x 10 76.4 127.62 84.845 88.577 15.100031
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
-28.398 +/- 10.0344
-32.0602% +/- 7.69825%
(Student's t, pooled s = 10.6794)
system time:
N Min Max Median Avg Stddev
x 10 2277.96 6948.53 2949.47 3341.492 1385.2677
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
-2277.47 +/- 920.425
-68.1574% +/- 8.77623%
(Student's t, pooled s = 979.596)
x no pmc
+ pmc running
real time:
HEAD:
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 76.4 127.62 84.845 88.577 15.100031
Difference at 95.0% confidence
29.73 +/- 10.0335
50.5208% +/- 17.0525%
(Student's t, pooled s = 10.6785)
patched:
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
1.332 +/- 0.248939
2.2635% +/- 0.426506%
(Student's t, pooled s = 0.264942)
system time:
HEAD:
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 2277.96 6948.53 2949.47 3341.492 1385.2677
Difference at 95.0% confidence
2309.97 +/- 920.443
223.937% +/- 89.3039%
(Student's t, pooled s = 979.616)
patched:
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
32.493 +/- 16.0042
3.15% +/- 1.5794%
(Student's t, pooled s = 17.0331)
Reviewed by: jeff@
Approved by: sbruno@
Differential Revision: https://reviews.freebsd.org/D15155
- Add macros to allow preinitialization of cap_rights_t.
- Convert most commonly used code paths to use preinitialized cap_rights_t.
A 3.6% speedup in fstat was measured with this change.
Reported by: mjg
Reviewed by: oshogbo
Approved by: sbruno
MFC after: 1 month
pmcstat request for close will generate a close event.
This event will be in turn received by pmcstat to close the file.
Reviewed by: kib
Tested by: pho
MFC after: 1 week
Sponsored by: Stormshield
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Note that PMCLOG_RESERVE_WITH_ERROR() macro contains goto error;
statement and executed after the flag is set.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
instead of asserting the condition.
The row index is directly supplied by userspace, the kernel must
handle invalid values.
Submitted by: pho
MFC after: 3 days
The r195005 unlocked pmc_sx before calling into pmclog_configure_log()
to avoid the LOR, but it allows flush or closelog to run in parallel
with the configuration, causing many failure modes.
Revert r195005. Pre-create the logging process, allowing it to run
after the set up succeeded, otherwise the process terminates itself.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D12882
hwpmc(4) must not voluntarily call fo_close(), doing this causes
double-close of the file. It seems to almost avoid bad consequences
for pipes, but other types of files demonstrate random memory access.
To fix, remove fo_close() calls, which also do not provide the
declared wake-up of waiters consistently. Instead, send a signal to
the logger and configure the logger process to not block it. Since
logger never returns to userspace, the signal only causes termination
of the interruptible sleeps in fo_write().
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
X-Differential revision: https://reviews.freebsd.org/D12882