.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd October 4, 2008
.Dt PMC 3
.Os
.Sh NAME
.Nm pmc
.Nd library for accessing hardware performance monitoring counters
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
Intel Pentium PMCs are present in Intel
.Tn Pentium
and
.Tn "Pentium MMX"
processors.
These PMCs are documented in the
.Rs
.%B "Intel 64 and IA-32 Intel(R) Architectures Software Developer's Manual"
.%T "Volume 3B: System Programming Guide, Part 2"
.%N "Order Number 253669-024US"
.%D "August 2007"
.%Q "Intel Corporation"
.Re
.Ss PMC Features
These CPUs contain two PMCs, each 40 bits wide.
These PMCs support the following capabilities:
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta \&No
.It PMC_CAP_EDGE Ta \&No
.It PMC_CAP_INTERRUPT Ta \&No
.It PMC_CAP_INVERT Ta \&No
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta \&No
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta \&No
.It PMC_CAP_THRESHOLD Ta \&No
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
.El
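.Pp
Because each counter is 40 bits wide, a raw hardware value wraps around
after 2^40 events.
The following is a minimal sketch, not part of the library, of computing
the number of events between two raw readings across at most one wrap.
The names
.Dv P5_PMC_WIDTH ,
.Dv P5_PMC_MASK
and
.Fn p5_counter_delta
are hypothetical, and the sketch assumes that the caller handles raw,
unwidened 40-bit values.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Hypothetical names; the 40-bit width is taken from the text above. */
#define P5_PMC_WIDTH	40
#define P5_PMC_MASK	((UINT64_C(1) << P5_PMC_WIDTH) - 1)

/*
 * Events counted between two raw readings, assuming the counter
 * wrapped around at most once in between.
 */
static uint64_t
p5_counter_delta(uint64_t before, uint64_t after)
{
	return ((after - before) & P5_PMC_MASK);
}
```
.Ed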
.Ss Event Qualifiers
Event specifiers for Intel Pentium PMCs can have the following common
qualifiers:
.Bl -tag -width indent
.It Li duration
Count duration (in clocks) of events.
The default is to count events.
.It Li os
Measure events at privilege levels 0, 1 and 2.
.It Li overflow
Assert the external processor pin associated with a counter on counter
overflow.
.It Li usr
Measure events at privilege level 3.
.El
.Pp
If neither of the
.Dq Li os
or
.Dq Li usr
qualifiers is specified, the default is to enable both.
.Pp
Some events may only be used on specific counters and some events
are defined only on processors supporting the MMX instruction set.
Note that these PMCs do not have the ability to interrupt the CPU.
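.Pp
The qualifiers above are appended to an event name to form the event
specifier string that is handed to functions such as
.Xr pmc_allocate 3 .
The following is a minimal sketch of assembling such a string, assuming
that qualifiers follow the event name as a comma-separated list.
The flag names and
.Fn p5_build_spec
are hypothetical helpers for illustration and are not provided by
.Lb libpmc .
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits for this sketch; they are not libpmc names. */
enum {
	Q_DURATION = 1 << 0,
	Q_OS       = 1 << 1,
	Q_OVERFLOW = 1 << 2,
	Q_USR      = 1 << 3
};

/*
 * Write "event[,qualifier]..." into buf for every flag set in quals.
 * Returns 0 on success and -1 if buf is too small.
 */
static int
p5_build_spec(char *buf, size_t len, const char *event, unsigned quals)
{
	static const struct {
		unsigned bit;
		const char *name;
	} tab[] = {
		{ Q_DURATION, "duration" },
		{ Q_OS, "os" },
		{ Q_OVERFLOW, "overflow" },
		{ Q_USR, "usr" }
	};
	size_t i, used;

	used = strlen(event);
	if (used >= len)
		return (-1);
	strcpy(buf, event);
	for (i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
		if ((quals & tab[i].bit) == 0)
			continue;
		/* Room for the comma, the name and the terminator. */
		if (used + 1 + strlen(tab[i].name) >= len)
			return (-1);
		buf[used++] = ',';
		strcpy(buf + used, tab[i].name);
		used += strlen(tab[i].name);
	}
	return (0);
}
```
.Ed
Passing no privilege flag leaves both
.Dq Li os
and
.Dq Li usr
out of the string, which selects the default of measuring at all
privilege levels.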
.Ss Intel Pentium Event Specifiers
The event specifiers supported by Intel Pentium PMCs are:
.Bl -tag -width indent
.It Li p5-any-segment-register-loaded
.Pq Event 0FH
The number of writes to any segment register, including the LDTR,
GDTR, TR and IDTR.
Far control transfers and task switches that involve privilege
level changes will count this event twice.
.It Li p5-bank-conflicts
.Pq Event 0AH
The number of actual bank conflicts.
.It Li p5-branches
.Pq Event 12H
The number of taken and not taken branches including branches, jumps, calls,
software interrupts and interrupt returns.
.It Li p5-breakpoint-match-on-dr0-register
.Pq Event 23H
The number of matches on the DR0 breakpoint register.
.It Li p5-breakpoint-match-on-dr1-register
.Pq Event 24H
The number of matches on the DR1 breakpoint register.
.It Li p5-breakpoint-match-on-dr2-register
.Pq Event 25H
The number of matches on the DR2 breakpoint register.
.It Li p5-breakpoint-match-on-dr3-register
.Pq Event 26H
The number of matches on the DR3 breakpoint register.
.It Li p5-btb-false-entries
.Pq Event 3AH , Tn Pentium MMX
The number of false entries in the BTB.
This event is only allocated on counter 0.
.It Li p5-btb-hits
.Pq Event 13H
The number of branches executed that hit in the branch target buffer (BTB).
.It Li p5-btb-miss-prediction-on-not-taken-branch
.Pq Event 3AH , Tn Pentium MMX
The number of times the BTB predicted a not-taken branch as taken.
This event is only allocated on counter 1.
.It Li p5-bus-cycle-duration
.Pq Event 18H
The number of cycles while a bus cycle was in progress.
.It Li p5-bus-ownership-latency
.Pq Event 2AH , Tn Pentium MMX
The time from bus ownership being requested to ownership being granted.
This event is only allocated on counter 0.
.It Li p5-bus-ownership-transfers
.Pq Event 2AH , Tn Pentium MMX
The number of bus ownership transfers.
This event is only allocated on counter 1.
.It Li p5-bus-utilization-due-to-processor-activity
.Pq Event 2EH , Tn Pentium MMX
The number of clocks the bus is busy due to the processor's own
activity.
This event is only allocated on counter 0.
.It Li p5-cache-line-sharing
.Pq Event 2CH , Tn Pentium MMX
The number of shared data lines in L1 cache.
This event is only allocated on counter 1.
.It Li p5-cache-m-state-line-sharing
.Pq Event 2CH , Tn Pentium MMX
The number of hits to an M-state line due to a memory access by
another processor.
This event is only allocated on counter 0.
.It Li p5-code-cache-miss
.Pq Event 0EH
The number of instruction reads that miss the internal code cache.
Both cacheable and un-cacheable misses are counted.
.It Li p5-code-read
.Pq Event 0CH
The number of instruction reads to both cacheable and un-cacheable regions.
.It Li p5-code-tlb-miss
.Pq Event 0DH
The number of instruction reads that miss the instruction TLB.
Both cacheable and un-cacheable reads are counted.
.It Li p5-d1-starvation-and-fifo-is-empty
.Pq Event 33H , Tn Pentium MMX
The number of times the D1 stage cannot issue any instructions because
the FIFO was empty.
This event is only allocated on counter 0.
.It Li p5-d1-starvation-and-only-one-instruction-in-fifo
.Pq Event 33H , Tn Pentium MMX
The number of times the D1 stage could issue only one instruction
because the FIFO had one instruction ready.
This event is only allocated on counter 1.
.It Li p5-data-cache-lines-written-back
.Pq Event 06H
The number of data cache lines that are written back, including
those caused by internal and external snoops.
.It Li p5-data-cache-tlb-miss-stall-duration
.Pq Event 30H , Tn Pentium MMX
The number of clocks the pipeline is stalled due to a data cache
TLB miss.
This event is only allocated on counter 1.
.It Li p5-data-read
.Pq Event 00H
The number of memory data reads, counting internal data cache hits and
misses.
I/O and data memory accesses due to TLB miss processing are
not included.
Split cycle reads are counted individually.
.It Li p5-data-read-miss
.Pq Event 03H
The number of memory read accesses that miss the data cache, counting
both cacheable and un-cacheable accesses.
Data accesses that are part of TLB miss processing are not included.
I/O accesses are not included.
.It Li p5-data-read-miss-or-write-miss
.Pq Event 29H
The number of data reads and writes that miss the internal data cache,
counting un-cacheable accesses.
Data accesses due to TLB miss processing are not counted.
.It Li p5-data-read-or-write
.Pq Event 28H
The number of data reads and writes including internal data cache hits
and misses.
Data reads due to TLB miss processing are not counted.
.It Li p5-data-tlb-miss
.Pq Event 02H
The number of misses to the data cache translation lookaside buffer.
.It Li p5-data-write
.Pq Event 01H
The number of memory data writes, counting internal data cache hits
and misses.
I/O is not included and split cycle writes are counted individually.
.It Li p5-data-write-miss
.Pq Event 04H
The number of memory write accesses that miss the data cache, counting
both cacheable and un-cacheable accesses.
I/O accesses are not counted.
.It Li p5-emms-instructions-executed
.Pq Event 2DH , Tn Pentium MMX
The number of EMMS instructions executed.
This event is only allocated on counter 0.
.It Li p5-external-data-cache-snoop-hits
.Pq Event 08H
The number of external snoops to the data cache that hit a valid line,
or the data line fill buffer, or one of the write back buffers.
.It Li p5-external-snoops
.Pq Event 07H
The number of external snoop requests accepted, including snoops that
hit in the code cache, the data cache and that hit in neither.
.It Li p5-floating-point-stalls-duration
.Pq Event 32H , Tn Pentium MMX
The number of cycles the pipeline is stalled due to a floating point
freeze.
This event is only allocated on counter 0.
.It Li p5-flops
.Pq Event 22H
The number of floating point adds, subtracts, multiplies, divides and
square roots.
Transcendental instructions trigger this event multiple times.
Instructions generating divide-by-zero, negative square root, special
operand and stack exceptions are not counted.
Integer multiply instructions that use the x87 FPU are counted.
.It Li p5-full-write-buffer-stall-duration-while-executing-mmx-instructions
.Pq Event 3BH , Tn Pentium MMX
The number of clocks the pipeline has stalled due to full write
buffers when executing MMX instructions.
This event is only allocated on counter 0.
.It Li p5-hardware-interrupts
.Pq Event 27H
The number of taken INTR and NMI interrupts.
.It Li p5-instructions-executed
.Pq Event 16H
The number of instructions executed.
Repeat prefixed instructions are counted only once.
The HLT instruction is counted only once, irrespective of the number
of cycles spent in the halted state.
All hardware and software exceptions are counted as instructions, and
fault handler invocations are also counted as instructions.
.It Li p5-instructions-executed-v-pipe
.Pq Event 17H
The number of instructions that executed in the V pipe.
.It Li p5-io-read-or-write-cycle
.Pq Event 1DH
The number of bus cycles directed to I/O space.
.It Li p5-locked-bus-cycle
.Pq Event 1CH
The number of locked bus cycles that occur on account of the lock
prefixes, LOCK instructions, page table updates and descriptor table
updates.
.It Li p5-memory-accesses-in-both-pipes
.Pq Event 09H
The number of data memory reads or writes that are paired in both pipes.
.It Li p5-misaligned-data-memory-or-io-references
.Pq Event 0BH
The number of memory or I/O reads or writes that are not aligned on
natural boundaries.
2- and 4-byte accesses are counted as misaligned if they cross a
4-byte boundary.
.It Li p5-misaligned-data-memory-reference-on-mmx-instructions
.Pq Event 36H , Tn Pentium MMX
The number of misaligned data memory references when executing MMX
instructions.
This event is only allocated on counter 0.
.It Li p5-mispredicted-or-unpredicted-returns
.Pq Event 37H , Tn Pentium MMX
The number of returns predicted incorrectly or not at all, only
counting RET instructions.
This event is only allocated on counter 0.
.It Li p5-mmx-instruction-data-read-misses
.Pq Event 31H , Tn Pentium MMX
The number of MMX instruction data read misses.
This event is only allocated on counter 1.
.It Li p5-mmx-instruction-data-reads
.Pq Event 31H , Tn Pentium MMX
The number of MMX instruction data reads.
This event is only allocated on counter 0.
.It Li p5-mmx-instruction-data-write-misses
.Pq Event 34H , Tn Pentium MMX
The number of data write misses caused by MMX instructions.
This event is only allocated on counter 1.
.It Li p5-mmx-instruction-data-writes
.Pq Event 34H , Tn Pentium MMX
The number of data writes caused by MMX instructions.
This event is only allocated on counter 0.
.It Li p5-mmx-instructions-executed-u-pipe
.Pq Event 2BH , Tn Pentium MMX
The number of MMX instructions executed in the U pipe.
This event is only allocated on counter 0.
.It Li p5-mmx-instructions-executed-v-pipe
.Pq Event 2BH , Tn Pentium MMX
The number of MMX instructions executed in the V pipe.
This event is only allocated on counter 1.
.It Li p5-mmx-multiply-unit-interlock
.Pq Event 38H , Tn Pentium MMX
The number of clocks the pipeline is stalled because the destination
of a prior MMX multiply is not ready.
This event is only allocated on counter 0.
.It Li p5-movd-movq-store-stall-due-to-previous-mmx-operation
.Pq Event 38H , Tn Pentium MMX
The number of clocks a MOVD/MOVQ instruction stalled in the D2 stage
of the pipeline due to a previous MMX instruction.
This event is only allocated on counter 1.
.It Li p5-noncacheable-memory-reads
.Pq Event 1EH
The number of bus cycles for non-cacheable instruction or data reads,
including cycles caused by TLB misses.
.It Li p5-number-of-cycles-not-in-halt-state
.Pq Event 30H , Tn Pentium MMX
The number of cycles in which the processor is not halted by the HLT
instruction.
This event is only allocated on counter 0.
.It Li p5-pipeline-agi-stalls
.Pq Event 1FH
The number of address generation interlock stalls.
An AGI that occurs in both the U and V pipelines in the same clock
signals the event twice.
.It Li p5-pipeline-flushes
.Pq Event 15H
The number of pipeline flushes that occur.
Pipeline flushes are caused by branch mispredicts, exceptions,
interrupts, some segment register loads, and BTB misses.
Prefetch queue flushes due to serializing instructions are not
counted.
.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions
.Pq Event 35H , Tn Pentium MMX
The number of pipeline flushes due to wrong branch predictions
resolved in either the E- or WB-stage of the pipeline.
This event is only allocated on counter 0.
.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions-resolved-in-wb-stage
.Pq Event 35H , Tn Pentium MMX
The number of pipeline flushes due to wrong branch predictions
resolved in the WB-stage of the pipeline.
This event is only allocated on counter 1.
.It Li p5-pipeline-stall-for-mmx-instruction-data-memory-reads
.Pq Event 36H , Tn Pentium MMX
The number of clocks the pipeline is stalled waiting for MMX instruction
data memory reads.
This event is only allocated on counter 1.
.It Li p5-predicted-returns
.Pq Event 37H , Tn Pentium MMX
The number of predicted returns, whether correct or incorrect.
This counter only counts RET instructions.
This event is only allocated on counter 1.
.It Li p5-returns
.Pq Event 39H , Tn Pentium MMX
The number of RET instructions executed.
This event is only allocated on counter 0.
.It Li p5-saturating-mmx-instructions-executed
.Pq Event 2FH , Tn Pentium MMX
The number of saturating MMX instructions executed.
This event is only allocated on counter 0.
.It Li p5-saturations-performed
.Pq Event 2FH , Tn Pentium MMX
The number of saturating MMX instructions executed in which at least one
result was actually saturated.
This event is only allocated on counter 1.
.It Li p5-stall-on-mmx-instruction-write-to-e-o-m-state-line
.Pq Event 3BH , Tn Pentium MMX
The number of clocks during stalls on MMX instructions writing to
E- or M-state cache lines.
This event is only allocated on counter 1.
.It Li p5-stall-on-write-to-an-e-or-m-state-line
.Pq Event 1BH
The number of stalls on a write to an exclusive or modified data cache
line.
.It Li p5-taken-branch-or-btb-hit
.Pq Event 14H
The number of events that may cause a hit in the BTB, namely either
taken branches or BTB hits.
.It Li p5-taken-branches
.Pq Event 32H , Tn Pentium MMX
The number of taken branches.
This event is only allocated on counter 1.
.It Li p5-transitions-between-mmx-and-fp-instructions
.Pq Event 2DH , Tn Pentium MMX
The number of transitions between MMX and floating-point instructions,
in either direction.
This event is only allocated on counter 1.
.It Li p5-waiting-for-data-memory-read-stall-duration
.Pq Event 1AH
The number of clocks the pipeline was stalled waiting for data
memory reads.
Data TLB miss processing is included in this count.
.It Li p5-write-buffer-full-stall-duration
.Pq Event 19H
The number of clocks while the pipeline was stalled due to write
buffers being full.
.It Li p5-write-hit-to-m-or-e-state-lines
.Pq Event 05H
The number of writes that hit exclusive or modified lines in the data
cache.
.It Li p5-writes-to-noncacheable-memory
.Pq Event 2EH , Tn Pentium MMX
The number of writes to non-cacheable memory, including write cycles
caused by TLB misses and I/O writes.
This event is only allocated on counter 1.
.El
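.Pp
Since there are only two counters, two events that are both restricted
to the same counter cannot be measured at the same time.
The following is a minimal sketch of that check over a small excerpt of
the list above.
The structure, the table and the functions
.Fn p5_lookup
and
.Fn p5_can_pair
are hypothetical and are not part of
.Lb libpmc .
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record; counter is 0, 1, or -1 for no listed restriction. */
struct p5_event {
	const char *name;
	int code;		/* event number from the list above */
	int counter;
};

/* A small excerpt of the event list, not the complete table. */
static const struct p5_event p5_events[] = {
	{ "p5-data-read", 0x00, -1 },
	{ "p5-instructions-executed", 0x16, -1 },
	{ "p5-emms-instructions-executed", 0x2d, 0 },
	{ "p5-mmx-instruction-data-reads", 0x31, 0 },
	{ "p5-mmx-instruction-data-read-misses", 0x31, 1 },
	{ "p5-taken-branches", 0x32, 1 },
	{ "p5-returns", 0x39, 0 }
};

static const struct p5_event *
p5_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(p5_events) / sizeof(p5_events[0]); i++)
		if (strcmp(p5_events[i].name, name) == 0)
			return (&p5_events[i]);
	return (NULL);
}

/*
 * Returns 1 if the two events can occupy the two counters at once,
 * 0 if both are restricted to the same counter, and -1 if either
 * name is not in the excerpt.
 */
static int
p5_can_pair(const char *a, const char *b)
{
	const struct p5_event *ea = p5_lookup(a);
	const struct p5_event *eb = p5_lookup(b);

	if (ea == NULL || eb == NULL)
		return (-1);
	if (ea->counter >= 0 && ea->counter == eb->counter)
		return (0);
	return (1);
}
```
.Ed
Events that share an event number, such as the two with event 31H,
are split across counter 0 and counter 1 and can therefore be paired.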
.Ss Event Name Aliases
The following table shows the mapping between the PMC-independent
aliases supported by
.Lb libpmc
and the underlying hardware events used.
.Bl -column "branch-mispredicts" "Description"
.It Em Alias Ta Em Event
.It Li branches Ta Li p5-taken-branches
.It Li branch-mispredicts Ta Li (unsupported)
.It Li dc-misses Ta Li p5-data-read-miss-or-write-miss
.It Li ic-misses Ta Li p5-code-cache-miss
.It Li instructions Ta Li p5-instructions-executed
.It Li interrupts Ta Li p5-hardware-interrupts
.It Li unhalted-cycles Ta Li p5-number-of-cycles-not-in-halt-state
.El
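.Pp
The alias table above can be read as a simple lookup.
The following is a minimal sketch of that mapping; the library resolves
aliases internally, and
.Fn p5_resolve_alias
is a hypothetical function used here only to illustrate the table.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

/* Alias table copied from above; NULL marks an unsupported alias. */
static const struct {
	const char *alias;
	const char *event;
} p5_aliases[] = {
	{ "branches", "p5-taken-branches" },
	{ "branch-mispredicts", NULL },
	{ "dc-misses", "p5-data-read-miss-or-write-miss" },
	{ "ic-misses", "p5-code-cache-miss" },
	{ "instructions", "p5-instructions-executed" },
	{ "interrupts", "p5-hardware-interrupts" },
	{ "unhalted-cycles", "p5-number-of-cycles-not-in-halt-state" }
};

/*
 * Returns the hardware event for an alias, or NULL if the alias is
 * unknown or has no supported event on these CPUs.
 */
static const char *
p5_resolve_alias(const char *alias)
{
	size_t i;

	for (i = 0; i < sizeof(p5_aliases) / sizeof(p5_aliases[0]); i++)
		if (strcmp(p5_aliases[i].alias, alias) == 0)
			return (p5_aliases[i].event);
	return (NULL);
}
```
.Ed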
.Sh SEE ALSO
.Xr pmc 3 ,
.Xr pmc.atom 3 ,
.Xr pmc.core 3 ,
.Xr pmc.core2 3 ,
.Xr pmc.iaf 3 ,
.Xr pmc.k7 3 ,
.Xr pmc.k8 3 ,
.Xr pmc.p4 3 ,
.Xr pmc.p6 3 ,
.Xr pmc.soft 3 ,
.Xr pmc.tsc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
library was written by
.An "Joseph Koshy"
.Aq jkoshy@FreeBSD.org .