freebsd-skq/lib/libpmc/pmc.k8.3
2008-09-16 16:53:25 +00:00

704 lines
21 KiB
Groff

.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd September 17, 2008
.Os
.Dt PMC.K8 3
.Sh NAME
.Nm pmc.k8
.Nd measurement events for
.Tn AMD
.Tn Athlon 64
(K8 family) CPUs
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
AMD K8 PMCs are present in the
.Tn "AMD Athlon64"
and
.Tn "AMD Opteron"
series of CPUs.
They are documented in the
.Rs
.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
.%N "Publication No. 26094"
.%D "April 2004"
.%Q "Advanced Micro Devices, Inc."
.Re
.Ss PMC Features
AMD K8 PMCs are 48 bits wide.
Each CPU contains 4 PMCs with the following capabilities:
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta \&No
.It PMC_CAP_EDGE Ta Yes
.It PMC_CAP_INTERRUPT Ta Yes
.It PMC_CAP_INVERT Ta Yes
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta \&No
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta \&No
.It PMC_CAP_THRESHOLD Ta Yes
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
.El
.Ss Event Qualifiers
.Pp
Event specifiers for AMD K8 PMCs can have the following optional
qualifiers:
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other fields.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparision when the
.Dq Li count
qualifier is present, making the counter to increment when the
number of events per cycle is less than the value specified by
the
.Dq Li count
qualifier.
.It Li mask= Ns Ar qualifier
Many event specifiers for AMD K8 PMCs need to be additionally
qualified using a mask qualifier.
These additional qualifiers are event-specific and are documented
along with their associated event specifiers below.
.It Li os
Configure the PMC to count events happening at privilege level 0.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither of the
.Dq Li os
or
.Dq Li usr
qualifiers were specified, the default is to enable both.
.Ss AMD K8 Event Specifiers
The event specifiers supported on AMD K8 PMCs are:
.Bl -tag -width indent
.It Li k8-bu-cpu-clk-unhalted
Count the number of clock cycles when the CPU is not in the HLT or
STPCLK states.
.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
Count fill requests that missed in the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
Count internally generated requests to the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li cancelled
Count cancelled requests.
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tag-snoop
Count tag snoop requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-dc-access
Count data cache accesses including microcode scratchpad accesses.
.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
Count data cache copyback operations.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
Count data cache accesses by lock instructions.
This event is only available on processors of revision C or later
vintage.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li accesses
Count data cache accesses by lock instructions.
.It Li misses
Count data cache misses by lock instructions.
.El
.Pp
The default is to count all accesses.
.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
Count the number of dispatched prefetch instructions.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li load
Count load operations.
.It Li nta
Count non-temporal operations.
.It Li store
Count store operations.
.El
.Pp
The default is to count all operations.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
Count L1 DTLB misses that are L2 DTLB hits.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
Count L1 DTLB misses that are also misses in the L2 DTLB.
.It Li k8-dc-microarchitectural-early-cancel-of-an-access
Count microarchitectural early cancels of data cache accesses.
.It Li k8-dc-microarchitectural-late-cancel-of-an-access
Count microarchitectural late cancels of data cache accesses.
.It Li k8-dc-misaligned-data-reference
Count misaligned data references.
.It Li k8-dc-miss
Count data cache misses.
.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
Count one bit ECC errors found by the scrubber.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li scrubber
Count scrubber detected errors.
.It Li piggyback
Count piggyback scrubber errors.
.El
.Pp
The default is to count both kinds of errors.
.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
Count data cache refills from L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
Count data cache refills from system memory.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
Count the number of dispatched FPU ops.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li add-pipe-excluding-junk-ops
Count add pipe ops excluding junk ops.
.It Li add-pipe-junk-ops
Count junk ops in the add pipe.
.It Li multiply-pipe-excluding-junk-ops
Count multiply pipe ops excluding junk ops.
.It Li multiply-pipe-junk-ops
Count junk ops in the multiply pipe.
.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops
.It Li store-pipe-junk-ops
Count junk ops in the store pipe.
.El
.Pp
The default is to count all types of ops.
.It Li k8-fp-cycles-with-no-fpu-ops-retired
Count cycles when no FPU ops were retired.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-fast-flag-ops
Count dispatched FPU ops that use the fast flag interface.
This event is supported in revision B and later CPUs.
.It Li k8-fr-decoder-empty
Count cycles when there was nothing to dispatch (i.e., the decoder
was empty).
.It Li k8-fr-dispatch-stalls
Count all dispatch stalls.
.It Li k8-fr-dispatch-stall-for-segment-load
Count dispatch stalls for segment loads.
.It Li k8-fr-dispatch-stall-for-serialization
Count dispatch stalls for serialization.
.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
Count dispatch stalls from branch abort to retiral.
.It Li k8-fr-dispatch-stall-when-fpu-is-full
Count dispatch stalls when the FPU is full.
.It Li k8-fr-dispatch-stall-when-ls-is-full
Count dispatch stalls when the load/store unit is full.
.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
Count dispatch stalls when the reorder buffer is full.
.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
Count dispatch stalls when reservation stations are full.
.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
Count dispatch stalls when waiting for all to be quiet.
.\" XXX What does "waiting for all to be quiet" mean?
.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
Count dispatch stalls when a far control transfer or a resync branch
is pending.
.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
Count FPU exceptions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li sse-and-x87-microtraps
Count SSE and x87 microtraps.
.It Li sse-reclass-microfaults
Count SSE reclass microfaults
.It Li sse-retype-microfaults
Count SSE retype microfaults
.It Li x87-reclass-microfaults
Count x87 reclass microfaults.
.El
.Pp
The default is to count all types of exceptions.
.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (by CPU RFLAGS field IF was zero).
.It Li k8-fr-interrupts-masked-while-pending-cycles
Count cycles while interrupts were masked while pending (i.e., cycles
when INTR was asserted while CPU RFLAGS field IF was zero).
.It Li k8-fr-number-of-breakpoints-for-dr0
Count the number of breakpoints for DR0.
.It Li k8-fr-number-of-breakpoints-for-dr1
Count the number of breakpoints for DR1.
.It Li k8-fr-number-of-breakpoints-for-dr2
Count the number of breakpoints for DR2.
.It Li k8-fr-number-of-breakpoints-for-dr3
Count the number of breakpoints for DR3.
.It Li k8-fr-retired-branches
Count retired branches including exceptions and interrupts.
.It Li k8-fr-retired-branches-mispredicted
Count mispredicted retired branches.
.It Li k8-fr-retired-far-control-transfers
Count retired far control transfers (which are always mispredicted).
.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
Count retired fastpath double op instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li low-op-pos-0
Count instructions with the low op in position 0.
.It Li low-op-pos-1
Count instructions with the low op in position 1.
.It Li low-op-pos-2
Count instructions with the low op in position 2.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
Count retired FPU instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li mmx-3dnow
Count MMX and 3DNow!\& instructions.
.It Li packed-sse-sse2
Count packed SSE and SSE2 instructions.
.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions
.It Li x87
Count x87 instructions.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-near-returns
Count retired near returns.
.It Li k8-fr-retired-near-returns-mispredicted
Count mispredicted near returns.
.It Li k8-fr-retired-resyncs
Count retired resyncs (non-control transfer branches).
.It Li k8-fr-retired-taken-hardware-interrupts
Count retired taken hardware interrupts.
.It Li k8-fr-retired-taken-branches
Count retired taken branches.
.It Li k8-fr-retired-taken-branches-mispredicted
Count retired taken branches that were mispredicted.
.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
Count retired taken branches that were mispredicted only due to an
address miscompare.
.It Li k8-fr-retired-uops
Count retired uops.
.It Li k8-fr-retired-x86-instructions
Count retired x86 instructions including exceptions and interrupts.
.It Li k8-ic-fetch
Count instruction cache fetches.
.It Li k8-ic-instruction-fetch-stall
Count cycles in stalls due to instruction fetch.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
Count ITLB misses that miss in both L1 and L2 ITLBs.
.It Li k8-ic-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ic-miss
Count instruction cache misses.
.It Li k8-ic-refill-from-l2
Count instruction cache refills from L2 cache.
.It Li k8-ic-refill-from-system
Count instruction cache refills from system memory.
.It Li k8-ic-return-stack-hits
Count hits to the return stack.
.It Li k8-ic-return-stack-overflow
Count overflows of the return stack.
.It Li k8-ls-buffer2-full
Count load/store buffer2 full events.
.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
Count locked operations.
For revision C and later CPUs, the following qualifiers are supported:
.Pp
.Bl -tag -width indent -compact
.It Li cycles-in-request
Count the number of cycles in the lock request/grant stage.
.It Li cycles-to-complete
Count the number of cycles a lock takes to complete once it is
non-speculative and is the older load/store operation.
.It Li locked-instructions
Count the number of lock instructions executed.
.El
.Pp
The default is to count the number of lock instructions executed.
.It Li k8-ls-microarchitectural-late-cancel
Count microarchitectural late cancels of operations in the load/store
unit.
.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
Count microarchitectural resyncs caused by self-modifying code.
.It Li k8-ls-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ls-retired-cflush-instructions
Count retired CFLUSH instructions.
.It Li k8-ls-retired-cpuid-instructions
Count retired CPUID instructions.
.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
Count segment register loads.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li cs
Count CS register loads.
.It Li ds
Count DS register loads.
.It Li es
Count ES register loads.
.It Li fs
Count FS register loads.
.It Li gs
Count GS register loads.
.\" .It Li hs
.\" Count HS register loads.
.\" XXX "HS" register?
.It Li ss
Count SS register loads.
.El
.Pp
The default is to count all types of loads.
.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
Count memory controller bypass counter saturation events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dram-controller-interface-bypass
Count DRAM controller interface bypass.
.It Li dram-controller-queue-bypass
Count DRAM controller queue bypass.
.It Li memory-controller-hi-pri-bypass
Count memory controller high priority bypasses.
.It Li memory-controller-lo-pri-bypass
Count memory controller low priority bypasses.
.El
.Pp
.It Li k8-nb-memory-controller-dram-slots-missed
Count memory controller DRAM command slots missed (in MemClks).
.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
Count memory controller page access events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li page-conflict
Count page conflicts.
.It Li page-hit
Count page hits.
.It Li page-miss
Count page misses.
.El
.Pp
The default is to count all types of events.
.It Li k8-nb-memory-controller-page-table-overflow
Count memory control page table overflow events.
.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
Count probe events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li probe-hit
Count all probe hits.
.It Li probe-hit-dirty-no-memory-cancel
Count probe hits without memory cancels.
.It Li probe-hit-dirty-with-memory-cancel
Count probe hits with memory cancels.
.It Li probe-miss
Count probe misses.
.El
.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
Count sized commands issued.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li nonpostwrszbyte
.It Li nonpostwrszdword
.It Li postwrszbyte
.It Li postwrszdword
.It Li rdszbyte
.It Li rdszdword
.It Li rdmodwr
.El
.Pp
The default is to count all types of commands.
.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
Count memory control turnaround events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.\" XXX doc is unclear whether these are cycle counts or event counts
.It Li dimm-turnaround
Count DIMM turnarounds.
.It Li read-to-write-turnaround
Count read to write turnarounds.
.It Li write-to-read-turnaround
Count write to read turnarounds.
.El
.Pp
The default is to count all types of events.
.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
Count events on the HyperTransport(tm) buses.
These events may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li buffer-release
Count buffer release messages sent.
.It Li command
Count command messages sent.
.It Li data
Count data messages sent.
.It Li nop
Count nop messages sent.
.El
.Pp
The default is to count all types of messages.
.El
.Ss Event Name Aliases
The following table shows the mapping between the PMC-independent
aliases supported by
.Lb libpmc
and the underlying hardware events used.
.Bl -column "branch-mispredicts" "Description"
.It Em Alias Ta Em Event
.It Li branches Ta Li k8-fr-retired-taken-branches
.It Li branch-mispredicts Ta Li k8-fr-retired-taken-branches-mispredicted
.It Li dc-misses Ta Li k8-dc-miss
.It Li ic-misses Ta Li k8-ic-miss
.It Li instructions Ta Li k8-fr-retired-x86-instructions
.It Li interrupts Ta Li k8-fr-taken-hardware-interrupts
.It Li unhalted-cycles Ta Li k8-bu-cpu-clk-unhalted
.El
.Sh SEE ALSO
.Xr pmc 3 ,
.Xr pmc.k7 3 ,
.Xr pmc.p4 3 ,
.Xr pmc.p5 3 ,
.Xr pmc.p6 3 ,
.Xr pmc.tsc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
library was written by
.An "Joseph Koshy"
.Aq jkoshy@FreeBSD.org .