Commit Graph

319 Commits

Author SHA1 Message Date
Matt Macy
ebfaf69cc0 hwpmc: log name->pid, name->tid mappings
By logging all threads and processes 'pmc filter'
can now filter on process or thread name, relieving
the user of the burden of determining which tid or
pid was which when the sample was taken.

% pmc filter -T if_io_tqg -P nginx pmc.log pmc-iflib.log

% pmc filter -x -T idle pmc.log pmc-noidle.log
2018-06-05 04:26:40 +00:00
Matt Macy
ac7012d284 hwpmc: don't defer user callchain capture completion to ast 2018-06-04 21:17:30 +00:00
Matt Macy
cf823003a7 hwpmc: remove gratuitous curthread checks 2018-06-04 17:49:34 +00:00
Matt Macy
9645bcabdf hwpmc: fix fixed counters checks 2018-06-04 04:49:06 +00:00
Matt Macy
07d80fd8dc hwpmc: ABI fixes
- increase pmc cpuid field from 8 to 12 bits
- add cpuid version string to initialize entry in the log
  so that filter can identify which counter index an
  event name maps to
- GC unused config flags
- make fixed counter assignment more robust as well as the
  changes needed to be properly identified for filter
2018-06-04 02:05:48 +00:00
Matt Macy
5de96e33d6 hwpmc: support sampling both kernel and user stacks when interrupted in kernel
This adds the -U options to pmcstat which will attribute in-kernel samples
back to the user stack that invoked the system call. It is not the default,
because when looking at kernel profiles it is generally more desirable to
merge all instances of a given system call together.

Although heavily revised, this change is directly derived from D7350 by
Jonathan T. Looney.

Obtained from: jtl
Sponsored by: Juniper Networks, Limelight Networks
2018-06-04 01:10:23 +00:00
Matt Macy
2ce69a4d04 hwpmc: ensure that mapin updates are synchronous 2018-06-03 19:37:17 +00:00
Matt Macy
e92a1350b5 hwpmc: remove unused pre-table driven bits for intel
Intel now provides comprehensive tables for all performance counters
and the various valid configuration permutations as text .json files.
Libpmc has been converted to use these and hwpmc_core has been greatly
simplified by moving to passthrough of the table values.

The one gotcha is that said tables don't support pentium pro and and pentium
IV. There's very few users of hwpmc on _amd64_ kernels on new hardware. It is
unlikely that anyone is doing low level optimization on 15 year old Intel
hardware. Nonetheless, if someone feels strongly enough to populate the
corresponding tables for p4 and ppro I will reinstate the files in to the
build.

Code for the K8 counters and !x86 architectures remains unchanged.
2018-05-31 22:41:07 +00:00
Matt Macy
02ce8216df hwpmc: remove stale assert
Reported by:	eadler
2018-05-30 06:29:22 +00:00
Matt Macy
b99aa0fbb2 hwpmc: don't enter epoch section across mmap hook 2018-05-29 18:03:48 +00:00
Matt Macy
23c01e5b57 hwpmc: don't incrorrectly strip the ANY flag 2018-05-29 04:04:06 +00:00
Matt Macy
ba32b20330 hwpmc: make pmc class specification work to enable fixed function counters 2018-05-28 23:17:57 +00:00
Matt Macy
3de228499a hwmpc: fix brain-damaged handling of thread descriptor freeing 2018-05-28 23:16:39 +00:00
Matt Macy
39446ce5b3 hwpmc_logging.c: don't call wakeup_one with thread lock held, don't
malloc(M_WAITOK) in an epoch section
2018-05-28 23:12:26 +00:00
Matt Macy
959826ca1b pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the
vendor provided pmu-events tables and sundry cleanups.

The vendor pmu-events tables provide counter descriptions, default
sample rates, event, umask, and flag values for all the counter
configuration permutations. Using this gives us:

- much simpler kernel code for the MD component
- helpful long and short event descriptions
- simpler user code
- sample rates that won't overload the system

Update man page with newer sample types and remove unused sample type.
2018-05-26 19:29:19 +00:00
Matt Macy
5506ceb87f Revert r334242 "pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the"
because of squash commit messages
2018-05-26 19:26:19 +00:00
Matt Macy
4928135658 pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the
vendor provided pmu-events tables and sundry cleanups.

The vendor pmu-events tables provide counter descriptions, default
sample rates, event, umask, and flag values for all the counter
configuration permutations. Using this gives us:

- much simpler kernel code for the MD component
- helpful long and short event descriptions
- simpler user code
- sample rates that won't overload the system

Update man page with newer sample types and remove unused sample type.

Squashed commit of the following:

commit 4459d43eff815bec08ccc5533dbe5de846f03128
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Sat May 26 00:06:31 2018 -0700

    libpmc: fix pmu function signatures for non amd64

commit a2cb8bbc586c65d41f9b291430a2261ec67b59fe
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:38:11 2018 -0700

    pmcstat: fix indentation of usage

commit f686954b15ff56a833ac80404898977cb80a265b
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:19:49 2018 -0700

    pmclog(3): add callchain and pmcallocatedyn, remove pcsample

commit 73e13a0d2e9498c81c150d14d022050cee7511bb
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:19:00 2018 -0700

    pmclog.h: GC pcsample field

commit 3e93ffd65da641fa657539dad3c48e281f8b5798
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:05:57 2018 -0700

    hwpmc: make Intel core CPUs use external event tables

commit 634f5fae1e1644ac324003136c66cd9c619d1c93
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:00:06 2018 -0700

    pmclog: update log record types, bump PMC_MAJOR
    - explicitly make log record types a multiple of 8 bytes
    - hook in pmu event types for pmc_allocate records
    - remove references to no longer PCSAMPLE record

commit 83d84fcd2d65bdf6ddcb2e155a22f0cfa2a9c225
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 21:52:10 2018 -0700

    libpmc: add support for having vendor table driven pmc_allocate

commit 9e6ad63c40c2fce8404847ace5078ca6cb33a736
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 19:11:33 2018 -0700

    hwpmc_core: add accessors for EVSEL & UMASK, make IAP_UMASK useful to user

commit 859dceb93daa6419a48c794db99b6758e5b041c9
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 19:09:45 2018 -0700

    pmcstat: update usage and man page as well as make -L consistent with pmccontrol

commit 79c7d8597e28c2eb13f5f9113e65ec2792ca57b1
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 18:07:03 2018 -0700

    pmu_util: add support for all current intel event keywords

commit d8089c7f6a6c8527f38324252b1ffb47004694c6
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 17:45:00 2018 -0700

    add description for new arguments

commit 058336740bab53c62ec88a3a026ea848cf3878c6
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 17:38:15 2018 -0700

    libpmc: move pmu_events table and pmu_utils out of libpmcstat so that they can be used by pmc_allocate

commit 049b66b382e2f833c3f47bc8df9e750cb265709f
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 16:12:41 2018 -0700

    pmcstat: hook pmu_events counter description utility routines in

commit f5e01e7b37a691dc045e1aa16b3ebdd162515de8
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 16:11:59 2018 -0700

    pmu_events: add utility routines for listing counters and their descriptions

commit cba4d4f8907f772279f86f18f915e0d74d33ac56
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 16:09:50 2018 -0700

    pmu-events: expand out skylake regex to simplify string matches
2018-05-26 18:12:50 +00:00
Matt Macy
0f8d79d977 CK: update consumers to use CK macros across the board
r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors
2018-05-24 23:21:23 +00:00
Matt Macy
a85289cf9b hwppmc: set threadid in callchain records - second part of r334108 2018-05-23 17:44:29 +00:00
Matt Macy
ae573a91cf pmc: detach free_gtask on unload
Reported by:	pho
2018-05-20 20:34:15 +00:00
Matt Macy
f2daab2c8f pmc: avoid potential race on shutdown
Clear shutdown flag first, conservatively allow 5ms for all hardclock consumers to
see flag before drainining
2018-05-20 19:35:24 +00:00
Matt Macy
70398c2f86 epoch(9): Make epochs non-preemptible by default
There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.

Reported by:	markj
2018-05-18 17:29:43 +00:00
Matt Macy
6161b98c99 hwpmc: Implement per-thread counters for PMC sampling
This implements per-thread counters for PMC sampling. The thread
descriptors are stored in a list attached to the process descriptor.
These thread descriptors can store any per-thread information necessary
for current or future features. For the moment, they just store the counters
for sampling.

The thread descriptors are created when the process descriptor is created.
Additionally, thread descriptors are created or freed when threads
are started or stopped. Because the thread exit function is called in a
critical section, we can't directly free the thread descriptors. Hence,
they are freed to a cache, which is also used as a source of allocations
when needed for new threads.

Approved by:	sbruno
Obtained from:	jtl
Sponsored by:	Juniper Networks, Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15335
2018-05-16 22:29:20 +00:00
Matt Macy
102ccac21c hwpmc: don't reference domain index with no memory backing it
On multi-socket the domain will be correctly set for a given CPU
regardless of whether or not NUMA is enabled.

Approved by:	sbruno
2018-05-14 06:11:25 +00:00
Matt Macy
8fa7df3668 pmc: don't add pmc owner to list until setup is complete
Once a pmc owner is added to the pmc_ss_owners list it is
visible for all to see. We don't want this to happen until
setup is complete.

Reported by:	mjg
Approved by:	sbruno
2018-05-14 01:08:47 +00:00
Matt Macy
0f00315cb3 hwpmc: fix load/unload race and vm map LOR
- fix load/unload race by allocating the per-domain list structure at boot

- fix long extant vm map LOR by replacing pmc_sx sx_slock with global_epoch
  to protect the liveness of elements of the pmc_ss_owners list

Reported by:	pho
Approved by:	sbruno
2018-05-14 00:21:04 +00:00
Matt Macy
f1401123c5 hwpmc/epoch - don't reference domain if NUMA is not set
It appears that domain information is set correctly independent
of whether or not NUMA is defined. However, there is no memory
backing secondary domains leading to allocation failure.

Reported by:	pho@, np@
Approved by:	sbruno@
2018-05-12 20:00:29 +00:00
Matt Macy
d626a614b9 hwpmc(9): clear remaining sample work for hardclock
- fix last minute change in 333509 where by runcount references
  to a pmc would remaining causing us to pause loop forever

Approved by:	sbruno
2018-05-12 03:45:30 +00:00
Matt Macy
e6b475e0af hwpmc(9): Make pmclog buffer pcpu and update constants
On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.

Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &

Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total

After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total

For more realistic overhead measurement set the sample rate for ~2khz
on a 2.1Ghz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &

Collecting 10 samples of `make -j96 buildkernel` from each:

x before
+ after

real time:
    N           Min           Max        Median           Avg        Stddev
x  10          76.4        127.62        84.845        88.577     15.100031
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        -28.398 +/- 10.0344
        -32.0602% +/- 7.69825%
        (Student's t, pooled s = 10.6794)

system time:
    N           Min           Max        Median           Avg        Stddev
x  10       2277.96       6948.53       2949.47      3341.492     1385.2677
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        -2277.47 +/- 920.425
        -68.1574% +/- 8.77623%
        (Student's t, pooled s = 979.596)

x no pmc
+ pmc running
real time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10          76.4        127.62        84.845        88.577     15.100031
Difference at 95.0% confidence
        29.73 +/- 10.0335
        50.5208% +/- 17.0525%
        (Student's t, pooled s = 10.6785)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        1.332 +/- 0.248939
        2.2635% +/- 0.426506%
        (Student's t, pooled s = 0.264942)

system time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10       2277.96       6948.53       2949.47      3341.492     1385.2677
Difference at 95.0% confidence
        2309.97 +/- 920.443
        223.937% +/- 89.3039%
        (Student's t, pooled s = 979.616)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        32.493 +/- 16.0042
        3.15% +/- 1.5794%
        (Student's t, pooled s = 17.0331)

Reviewed by:	jeff@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15155
2018-05-12 01:26:34 +00:00
Matt Macy
cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Fabien Thomas
9f4f1d4d1f Fix pmcstat exit from kernel introduced by r325275.
pmcstat request for close will generate a close event.
This event will be in turn received by pmcstat to close the file.

Reviewed by:	kib
Tested by:	pho
MFC after:	1 week
Sponsored by: Stormshield
2018-01-17 16:41:22 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Konstantin Belousov
f4dd123e15 Do not leak PMC_PO_OWNS_LOGFILE on error.
Note that PMCLOG_RESERVE_WITH_ERROR() macro contains goto error;
statement and executed after the flag is set.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-11-13 10:45:31 +00:00
Konstantin Belousov
c9da263712 Style bug.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-11-13 10:43:31 +00:00
Konstantin Belousov
d2cd638852 Check that the pmc index is less than the number of hardware PMCs,
instead of asserting the condition.

The row index is directly supplied by userspace, the kernel must
handle invalid values.

Submitted by:	pho
MFC after:	3 days
2017-11-10 19:10:14 +00:00
Konstantin Belousov
20b555e146 Do not run pmclog_configure_log() without pmc_sx protection.
The r195005 unlocked pmc_sx before calling into pmclog_configure_log()
to avoid the LOR, but it allows flush or closelog to run in parallel
with the configuration, causing many failure modes.

Revert r195005.  Pre-create the logging process, allowing it to run
after the set up succeeded, otherwise the process terminates itself.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:43:39 +00:00
Konstantin Belousov
1121a37474 Be protective and check the po_file validity before dropping the ref.
Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:37:45 +00:00
Konstantin Belousov
ea4d25f90b In hwpmc, do not double-close the logging file.
hwpmc(4) must not voluntarily call fo_close(), doing this causes
double-close of the file.  It seems to almost avoid bad consequences
for pipes, but other types of files demonstrate random memory access.

To fix, remove fo_close() calls, which also do not provide the
declared wake-up of waiters consistently.  Instead, send a signal to
the logger and configure the logger process to not block it.  Since
logger never returns to userspace, the signal only causes termination
of the interruptible sleeps in fo_write().

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:32:52 +00:00
Konstantin Belousov
bd63e82975 There is no use for dropping Giant in the pmc syscall.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:16:18 +00:00
Konstantin Belousov
cf9ef80607 Minor style tweaks.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:05:47 +00:00
Konstantin Belousov
1cfbc451b9 Use designated initializers for pmc sysent and module data.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 10:49:41 +00:00
Ruslan Bukin
07ff05c2ae o Support for Kabylake CPU PMCs (fall down to PMC_CPU_INTEL_SKYLAKE).
o Fix bugs in events descriptions for Skylake, Skylake Xeon and Haswell.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12654
2017-10-13 15:02:29 +00:00
Conrad Meyer
1356a2e6fa hwpmc(4): Actually use a sufficiently wide type
jhibbits@ points out that left shifting bits 8-11 24 bits won't fit in a 32-bit
integer either.

Corrects r324533.

Submitted by:	jhibbits
Sponsored by:	Dell EMC Isilon
2017-10-11 15:13:40 +00:00
Conrad Meyer
a7b8be82f0 hwpmc(4): Force sufficiently wide type for left shift
Ordinary input to this macro comes from pe_code, which is uint16_t.  Coverity
points out that shifting such a value discards the result of a 24 bit shift,
which is not what we want.

A follow-up to r324291.

CID:		1381676
Sponsored by:	Dell EMC Isilon
2017-10-11 14:59:04 +00:00
Conrad Meyer
1d3aa3624d hwpmc(4): Add support for extended AMD events
Sponsored by:	Dell EMC Isilon
2017-10-04 23:35:10 +00:00
Konstantin Belousov
b99b705d9c Skylake server core PMC support for hwpmc(4).
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
Hardware provided by:	Intel
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12221
2017-09-06 17:19:48 +00:00
Konstantin Belousov
16997138d3 Fix logic error in the the assert, causing the condition to be always true.
Also improve the formatting of the corresponding KASSERT message.

Based on the submission by:	Svyatoslav <razmyslov@viva64.com>
Found by:	PVS-Studio
PR:	217741
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2017-08-08 15:46:29 +00:00
Zbigniew Bodek
6cb40391c4 Fix INVARIANTS debug code in HWPMC
When HWPMC stops sampling, ps_pmc may be freed before samples
are processed. In such situation treat PMC as stopped.
Add "ifdef" to fix build without INVARIANTS code.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10912
2017-06-13 18:53:56 +00:00
Zbigniew Bodek
95ca4f5a0e Fix event table for Cortex A9.
Removed events 0x8 (INSTR_EXECUTED), 0xE (PC_PROC_RETURN) and
0x13-0x1d not supported on Cortex A9.
Add events 0x68 and 0x6E which replaced 0x8 and 0xE.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10911
2017-06-13 18:52:39 +00:00
Zbigniew Bodek
ab632d9651 Fix HWPMC interrupt handling in Counting Mode
Additionally:
 - Fix support for Cycle Counter (evsel == 0xFF)
 - Stop and mask interrupts from all counters on init and finish

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10910
2017-06-13 18:51:23 +00:00