PR: 191174 Submitted by: Franco Fichtner <franco at lastsummer.de>
.\" Copyright (c) 2010 Fabien Thomas. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 25, 2012
|
|
.Dt PMC.WESTMERE 3
|
|
.Os
|
|
.Sh NAME
|
|
.Nm pmc.westmere
|
|
.Nd measurement events for
|
|
.Tn Intel
|
|
.Tn Westmere
|
|
family CPUs
|
|
.Sh LIBRARY
|
|
.Lb libpmc
|
|
.Sh SYNOPSIS
|
|
.In pmc.h
|
|
.Sh DESCRIPTION
|
|
.Tn Intel
|
|
.Tn "Westmere"
|
|
CPUs contain PMCs conforming to version 2 of the
|
|
.Tn Intel
|
|
performance measurement architecture.
|
|
These CPUs may contain up to three classes of PMCs:
|
|
.Bl -tag -width "Li PMC_CLASS_IAP"
|
|
.It Li PMC_CLASS_IAF
|
|
Fixed-function counters that count only one hardware event per counter.
|
|
.It Li PMC_CLASS_IAP
|
|
Programmable counters that may be configured to count one of a defined
|
|
set of hardware events.
|
|
.El
|
|
.Pp
|
|
The number of PMCs available in each class and their widths need to be
|
|
determined at run time by calling
|
|
.Xr pmc_cpuinfo 3 .
|
|
.Pp
|
|
Intel Westmere PMCs are documented in
|
|
.Rs
|
|
.%B "Intel(R) 64 and IA-32 Architectures Software Developer's Manual"
|
|
.%T "Volume 3B: System Programming Guide, Part 2"
|
|
.%N "Order Number: 253669-033US"
|
|
.%D December 2009
|
|
.%Q "Intel Corporation"
|
|
.Re
|
|
.Ss WESTMERE FIXED FUNCTION PMCS
|
|
These PMCs and their supported events are documented in
|
|
.Xr pmc.iaf 3 .
|
|
.Ss WESTMERE PROGRAMMABLE PMCS
|
|
The programmable PMCs support the following capabilities:
|
|
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta \&No
.It PMC_CAP_EDGE Ta Yes
.It PMC_CAP_INTERRUPT Ta Yes
.It PMC_CAP_INVERT Ta Yes
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta \&No
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta \&No
.It PMC_CAP_THRESHOLD Ta Yes
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
.El
.Ss Event Qualifiers
|
|
Event specifiers for these PMCs support the following common
|
|
qualifiers:
|
|
.Bl -tag -width indent
|
|
.It Li rsp= Ns Ar value
|
|
Configure the Off-core Response bits.
|
|
.Bl -tag -width indent
|
|
.It Li DMND_DATA_RD
|
|
Counts the number of demand and DCU prefetch data reads of full
|
|
and partial cachelines as well as demand data page table entry
|
|
cacheline reads.
|
|
Does not count L2 data read prefetches or
|
|
instruction fetches.
|
|
.It Li DMND_RFO
|
|
Counts the number of demand and DCU prefetch reads for ownership
|
|
(RFO) requests generated by a write to data cacheline.
|
|
Does not count L2 RFO.
|
|
.It Li DMND_IFETCH
|
|
Counts the number of demand and DCU prefetch instruction cacheline
|
|
reads.
|
|
Does not count L2 code read prefetches.
|
|
.It Li WB
|
|
Counts the number of writeback (modified to exclusive) transactions.
|
|
.It Li PF_DATA_RD
|
|
Counts the number of data cacheline reads generated by L2 prefetchers.
|
|
.It Li PF_RFO
|
|
Counts the number of RFO requests generated by L2 prefetchers.
|
|
.It Li PF_IFETCH
|
|
Counts the number of code reads generated by L2 prefetchers.
|
|
.It Li OTHER
|
|
Counts one of the following transaction types, including L3 invalidate,
|
|
I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences,
|
|
lock, unlock, split lock.
|
|
.It Li UNCORE_HIT
|
|
L3 Hit: local or remote home requests that hit L3 cache in the uncore
|
|
with no coherency actions required (snooping).
|
|
.It Li OTHER_CORE_HIT_SNP
|
|
L3 Hit: local or remote home requests that hit L3 cache in the uncore
|
|
and were serviced by another core with a cross core snoop where no modified
|
|
copies were found (clean).
|
|
.It Li OTHER_CORE_HITM
|
|
L3 Hit: local or remote home requests that hit L3 cache in the uncore
|
|
and were serviced by another core with a cross core snoop where modified
|
|
copies were found (HITM).
|
|
.It Li REMOTE_CACHE_FWD
|
|
L3 Miss: local homed requests that missed the L3 cache and were serviced
by forwarded data following a cross package snoop where no modified
copies were found (remote home requests are not counted).
|
|
.It Li REMOTE_DRAM
|
|
L3 Miss: remote home requests that missed the L3 cache and were serviced
|
|
by remote DRAM.
|
|
.It Li LOCAL_DRAM
|
|
L3 Miss: local home requests that missed the L3 cache and were serviced
|
|
by local DRAM.
|
|
.It Li NON_DRAM
|
|
Non-DRAM requests that were serviced by IOH.
|
|
.El
|
|
.It Li cmask= Ns Ar value
|
|
Configure the PMC to increment only if the number of configured
|
|
events measured in a cycle is greater than or equal to
|
|
.Ar value .
|
|
.It Li edge
|
|
Configure the PMC to count the number of de-asserted to asserted
|
|
transitions of the conditions expressed by the other qualifiers.
|
|
If specified, the counter will increment only once whenever a
|
|
condition becomes true, irrespective of the number of clocks during
|
|
which the condition remains true.
|
|
.It Li inv
|
|
Invert the sense of comparison when the
|
|
.Dq Li cmask
|
|
qualifier is present, making the counter increment when the number of
|
|
events per cycle is less than the value specified by the
|
|
.Dq Li cmask
|
|
qualifier.
|
|
.It Li os
|
|
Configure the PMC to count events happening at processor privilege
|
|
level 0.
|
|
.It Li usr
|
|
Configure the PMC to count events occurring at privilege levels 1, 2
|
|
or 3.
|
|
.El
|
|
.Pp
|
|
If neither of the
|
|
.Dq Li os
|
|
or
|
|
.Dq Li usr
|
|
qualifiers is specified, the default is to enable both.
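.Pp
The qualifiers above correspond to fields of the hardware event-select
register.
The following C sketch packs an event code, a unit mask and the
qualifiers into a value laid out like IA32_PERFEVTSELx.
It is illustrative only: the structure
evtsel_quals
and the function
westmere_evtsel
are hypothetical names that are not part of libpmc, the bit positions
are taken from the register layout in the Intel manual cited above, and
the counter enable bit (bit 22) is deliberately left clear.

```c
#include <stdint.h>

/* Hypothetical container for the qualifiers described above. */
struct evtsel_quals {
	uint8_t event;	/* event select code, e.g. 0x0E */
	uint8_t umask;	/* unit mask, e.g. 0x01 */
	uint8_t cmask;	/* "cmask=" value; 0 means no threshold */
	int edge;	/* nonzero if "edge" was given */
	int inv;	/* nonzero if "inv" was given */
	int os;		/* nonzero if "os" was given */
	int usr;	/* nonzero if "usr" was given */
};

/*
 * Pack the qualifiers into an IA32_PERFEVTSELx-style value.
 * Assumed layout: event 0-7, umask 8-15, USR 16, OS 17,
 * edge 18, INV 23, cmask 24-31.
 */
static inline uint64_t
westmere_evtsel(struct evtsel_quals q)
{
	uint64_t v;

	/* If neither "os" nor "usr" is given, count in both modes. */
	if (!q.os && !q.usr)
		q.os = q.usr = 1;

	v = (uint64_t)q.event;
	v |= (uint64_t)q.umask << 8;
	if (q.usr)
		v |= 1ULL << 16;
	if (q.os)
		v |= 1ULL << 17;
	if (q.edge)
		v |= 1ULL << 18;
	if (q.inv)
		v |= 1ULL << 23;
	v |= (uint64_t)q.cmask << 24;
	return (v);
}
```

Under these assumptions, UOPS_ISSUED.STALLED_CYCLES (event 0EH, umask
01H, inv, cmask=1) with neither os nor usr given packs to 0x0183010E.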
|
|
.Ss Event Specifiers (Programmable PMCs)
|
|
Westmere programmable PMCs support the following events:
|
|
.Bl -tag -width indent
|
|
.It Li LOAD_BLOCK.OVERLAP_STORE
|
|
.Pq Event 03H , Umask 02H
|
|
Loads that partially overlap an earlier store
|
|
.It Li SB_DRAIN.ANY
|
|
.Pq Event 04H , Umask 07H
|
|
All Store buffer stall cycles
|
|
.It Li MISALIGN_MEMORY.STORE
|
|
.Pq Event 05H , Umask 02H
|
|
All stores referenced with a misaligned address
|
|
.It Li STORE_BLOCKS.AT_RET
|
|
.Pq Event 06H , Umask 04H
|
|
Counts number of loads delayed with at-Retirement block code.
|
|
The following
|
|
loads need to be executed at retirement and wait for all senior stores on
|
|
the same thread to be drained: load splitting across 4K boundary (page
|
|
split), load accessing uncacheable (UC or USWC) memory, load lock, and load
|
|
with page table in UC or USWC memory region.
|
|
.It Li STORE_BLOCKS.L1D_BLOCK
|
|
.Pq Event 06H , Umask 08H
|
|
Cacheable loads delayed with L1D block code
|
|
.It Li PARTIAL_ADDRESS_ALIAS
|
|
.Pq Event 07H , Umask 01H
|
|
Counts false dependency due to partial address aliasing
|
|
.It Li DTLB_LOAD_MISSES.ANY
|
|
.Pq Event 08H , Umask 01H
|
|
Counts all load misses that cause a page walk
|
|
.It Li DTLB_LOAD_MISSES.WALK_COMPLETED
|
|
.Pq Event 08H , Umask 02H
|
|
Counts number of completed page walks due to load miss in the STLB.
|
|
.It Li DTLB_LOAD_MISSES.WALK_CYCLES
|
|
.Pq Event 08H , Umask 04H
|
|
Cycles PMH is busy with a page walk due to a load miss in the STLB.
|
|
.It Li DTLB_LOAD_MISSES.STLB_HIT
|
|
.Pq Event 08H , Umask 10H
|
|
Number of cache load STLB hits
|
|
.It Li DTLB_LOAD_MISSES.PDE_MISS
|
|
.Pq Event 08H , Umask 20H
|
|
Number of DTLB cache load misses where the low part of the linear to
|
|
physical address translation was missed.
|
|
.It Li MEM_INST_RETIRED.LOADS
|
|
.Pq Event 0BH , Umask 01H
|
|
Counts the number of instructions with an architecturally-visible load
|
|
retired on the architected path.
|
|
In conjunction with ld_lat facility
|
|
.It Li MEM_INST_RETIRED.STORES
|
|
.Pq Event 0BH , Umask 02H
|
|
Counts the number of instructions with an architecturally-visible store
|
|
retired on the architected path.
|
|
In conjunction with ld_lat facility
|
|
.It Li MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD
|
|
.Pq Event 0BH , Umask 10H
|
|
Counts the number of instructions exceeding the latency specified with
|
|
ld_lat facility.
|
|
In conjunction with ld_lat facility
|
|
.It Li MEM_STORE_RETIRED.DTLB_MISS
|
|
.Pq Event 0CH , Umask 01H
|
|
The event counts the number of retired stores that missed the DTLB.
|
|
The DTLB miss is not counted if the store operation causes a fault.
|
|
Does not count prefetches.
Counts both primary and secondary misses to the TLB.
|
|
.It Li UOPS_ISSUED.ANY
|
|
.Pq Event 0EH , Umask 01H
|
|
Counts the number of Uops issued by the Register Allocation Table to the
|
|
Reservation Station, i.e. the UOPs issued from the front end to the back
|
|
end.
|
|
.It Li UOPS_ISSUED.STALLED_CYCLES
|
|
.Pq Event 0EH , Umask 01H
|
|
Counts the number of cycles in which no uops are issued by the Register
Allocation Table to the Reservation Station, i.e. from the front end to
the back end.
Set invert=1, cmask=1.
|
|
.It Li UOPS_ISSUED.FUSED
|
|
.Pq Event 0EH , Umask 02H
|
|
Counts the number of fused Uops that were issued from the Register
|
|
Allocation Table to the Reservation Station.
|
|
.It Li MEM_UNCORE_RETIRED.LOCAL_HITM
|
|
.Pq Event 0FH , Umask 02H
|
|
Load instructions retired that HIT modified data in sibling core (Precise
|
|
Event)
|
|
.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM_AND_REMOTE_CACHE_HIT
|
|
.Pq Event 0FH , Umask 08H
|
|
Load instructions retired local dram and remote cache HIT data sources
|
|
(Precise Event)
|
|
.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM
|
|
.Pq Event 0FH , Umask 10H
|
|
Load instructions retired with a data source of local DRAM or locally homed
|
|
remote cache HITM (Precise Event)
|
|
.It Li MEM_UNCORE_RETIRED.REMOTE_DRAM
|
|
.Pq Event 0FH , Umask 20H
|
|
Load instructions retired remote DRAM and remote home-remote cache HITM
|
|
(Precise Event)
|
|
.It Li MEM_UNCORE_RETIRED.UNCACHEABLE
|
|
.Pq Event 0FH , Umask 80H
|
|
Load instructions retired I/O (Precise Event)
|
|
.It Li FP_COMP_OPS_EXE.X87
|
|
.Pq Event 10H , Umask 01H
|
|
Counts the number of FP Computational Uops Executed.
|
|
The number of FADD,
|
|
FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer
|
|
DIVs, and IDIVs.
|
|
This event does not distinguish an FADD used in the middle
|
|
of a transcendental flow from a separate FADD instruction.
|
|
.It Li FP_COMP_OPS_EXE.MMX
|
|
.Pq Event 10H , Umask 02H
|
|
Counts number of MMX Uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_FP
|
|
.Pq Event 10H , Umask 04H
|
|
Counts number of SSE and SSE2 FP uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE2_INTEGER
|
|
.Pq Event 10H , Umask 08H
|
|
Counts number of SSE2 integer uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_FP_PACKED
|
|
.Pq Event 10H , Umask 10H
|
|
Counts number of SSE FP packed uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_FP_SCALAR
|
|
.Pq Event 10H , Umask 20H
|
|
Counts number of SSE FP scalar uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION
|
|
.Pq Event 10H , Umask 40H
|
|
Counts number of SSE* FP single precision uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION
|
|
.Pq Event 10H , Umask 80H
|
|
Counts number of SSE* FP double precision uops executed.
|
|
.It Li SIMD_INT_128.PACKED_MPY
|
|
.Pq Event 12H , Umask 01H
|
|
Counts number of 128 bit SIMD integer multiply operations.
|
|
.It Li SIMD_INT_128.PACKED_SHIFT
|
|
.Pq Event 12H , Umask 02H
|
|
Counts number of 128 bit SIMD integer shift operations.
|
|
.It Li SIMD_INT_128.PACK
|
|
.Pq Event 12H , Umask 04H
|
|
Counts number of 128 bit SIMD integer pack operations.
|
|
.It Li SIMD_INT_128.UNPACK
|
|
.Pq Event 12H , Umask 08H
|
|
Counts number of 128 bit SIMD integer unpack operations.
|
|
.It Li SIMD_INT_128.PACKED_LOGICAL
|
|
.Pq Event 12H , Umask 10H
|
|
Counts number of 128 bit SIMD integer logical operations.
|
|
.It Li SIMD_INT_128.PACKED_ARITH
|
|
.Pq Event 12H , Umask 20H
|
|
Counts number of 128 bit SIMD integer arithmetic operations.
|
|
.It Li SIMD_INT_128.SHUFFLE_MOVE
|
|
.Pq Event 12H , Umask 40H
|
|
Counts number of 128 bit SIMD integer shuffle and move operations.
|
|
.It Li LOAD_DISPATCH.RS
|
|
.Pq Event 13H , Umask 01H
|
|
Counts number of loads dispatched from the Reservation Station that bypass
|
|
the Memory Order Buffer.
|
|
.It Li LOAD_DISPATCH.RS_DELAYED
|
|
.Pq Event 13H , Umask 02H
|
|
Counts the number of delayed RS dispatches at the stage latch.
|
|
If an RS dispatch cannot bypass to LB, it has another chance to dispatch
|
|
from the one-cycle delayed staging latch before it is written into the LB.
|
|
.It Li LOAD_DISPATCH.MOB
|
|
.Pq Event 13H , Umask 04H
|
|
Counts the number of loads dispatched from the Reservation Station to the
|
|
Memory Order Buffer.
|
|
.It Li LOAD_DISPATCH.ANY
|
|
.Pq Event 13H , Umask 07H
|
|
Counts all loads dispatched from the Reservation Station.
|
|
.It Li ARITH.CYCLES_DIV_BUSY
|
|
.Pq Event 14H , Umask 01H
|
|
Counts the number of cycles the divider is busy executing divide or square
|
|
root operations.
|
|
The divide can be integer, X87 or Streaming SIMD Extensions (SSE).
|
|
The square root operation can be either X87 or SSE.
|
|
Set edge=1, invert=1, cmask=1 to count the number of divides.
Count may be incorrect when SMT is on.
|
|
.It Li ARITH.MUL
|
|
.Pq Event 14H , Umask 02H
|
|
Counts the number of multiply operations executed.
|
|
This includes integer as
|
|
well as floating point multiply operations but excludes DPPS mul and MPSAD.
|
|
Count may be incorrect when SMT is on.
|
|
.It Li INST_QUEUE_WRITES
|
|
.Pq Event 17H , Umask 01H
|
|
Counts the number of instructions written into the instruction queue every
|
|
cycle.
|
|
.It Li INST_DECODED.DEC0
|
|
.Pq Event 18H , Umask 01H
|
|
Counts number of instructions that require decoder 0 to be decoded.
|
|
Usually, this means that the instruction maps to more than 1 uop
|
|
.It Li TWO_UOP_INSTS_DECODED
|
|
.Pq Event 19H , Umask 01H
|
|
An instruction that generates two uops was decoded
|
|
.It Li INST_QUEUE_WRITE_CYCLES
|
|
.Pq Event 1EH , Umask 01H
|
|
This event counts the number of cycles during which instructions are written
|
|
to the instruction queue.
|
|
Dividing the number of instructions written to the instruction queue
(INST_QUEUE_WRITES) by this counter yields the average number of
instructions decoded each cycle.
|
|
If this number is less
|
|
than four and the pipe stalls, this indicates that the decoder is failing to
|
|
decode enough instructions per cycle to sustain the 4-wide pipeline.
|
|
If SSE* instructions that are 6 bytes or longer arrive one after another,
|
|
then front end throughput may limit execution speed.
|
|
In such case,
|
|
.It Li LSD_OVERFLOW
|
|
.Pq Event 20H , Umask 01H
|
|
Number of loops that cannot stream from the instruction queue.
|
|
.It Li L2_RQSTS.LD_HIT
|
|
.Pq Event 24H , Umask 01H
|
|
Counts number of loads that hit the L2 cache.
|
|
L2 loads include both L1D demand misses as well as L1D prefetches.
|
|
L2 loads can be rejected for various reasons.
|
|
Only non rejected loads are counted.
|
|
.It Li L2_RQSTS.LD_MISS
|
|
.Pq Event 24H , Umask 02H
|
|
Counts the number of loads that miss the L2 cache.
|
|
L2 loads include both L1D demand misses as well as L1D prefetches.
|
|
.It Li L2_RQSTS.LOADS
|
|
.Pq Event 24H , Umask 03H
|
|
Counts all L2 load requests.
|
|
L2 loads include both L1D demand misses as well as L1D prefetches.
|
|
.It Li L2_RQSTS.RFO_HIT
|
|
.Pq Event 24H , Umask 04H
|
|
Counts the number of store RFO requests that hit the L2 cache.
|
|
L2 RFO requests include both L1D demand RFO misses as well as L1D RFO
|
|
prefetches.
|
|
Count includes WC memory requests, where the data is not fetched but the
|
|
permission to write the line is required.
|
|
.It Li L2_RQSTS.RFO_MISS
|
|
.Pq Event 24H , Umask 08H
|
|
Counts the number of store RFO requests that miss the L2 cache.
|
|
L2 RFO requests include both L1D demand RFO misses as well as L1D RFO
|
|
prefetches.
|
|
.It Li L2_RQSTS.RFOS
|
|
.Pq Event 24H , Umask 0CH
|
|
Counts all L2 store RFO requests.
|
|
L2 RFO requests include both L1D demand
|
|
RFO misses as well as L1D RFO prefetches.
|
|
.It Li L2_RQSTS.IFETCH_HIT
|
|
.Pq Event 24H , Umask 10H
|
|
Counts number of instruction fetches that hit the L2 cache.
|
|
L2 instruction fetches include both L1I demand misses as well as L1I
|
|
instruction prefetches.
|
|
.It Li L2_RQSTS.IFETCH_MISS
|
|
.Pq Event 24H , Umask 20H
|
|
Counts number of instruction fetches that miss the L2 cache.
|
|
L2 instruction fetches include both L1I demand misses as well as L1I
|
|
instruction prefetches.
|
|
.It Li L2_RQSTS.IFETCHES
|
|
.Pq Event 24H , Umask 30H
|
|
Counts all instruction fetches.
|
|
L2 instruction fetches include both L1I
|
|
demand misses as well as L1I instruction prefetches.
|
|
.It Li L2_RQSTS.PREFETCH_HIT
|
|
.Pq Event 24H , Umask 40H
|
|
Counts L2 prefetch hits for both code and data.
|
|
.It Li L2_RQSTS.PREFETCH_MISS
|
|
.Pq Event 24H , Umask 80H
|
|
Counts L2 prefetch misses for both code and data.
|
|
.It Li L2_RQSTS.PREFETCHES
|
|
.Pq Event 24H , Umask C0H
|
|
Counts all L2 prefetches for both code and data.
|
|
.It Li L2_RQSTS.MISS
|
|
.Pq Event 24H , Umask AAH
|
|
Counts all L2 misses for both code and data.
|
|
.It Li L2_RQSTS.REFERENCES
|
|
.Pq Event 24H , Umask FFH
|
|
Counts all L2 requests for both code and data.
|
|
.It Li L2_DATA_RQSTS.DEMAND.I_STATE
|
|
.Pq Event 26H , Umask 01H
|
|
Counts number of L2 data demand loads where the cache line to be loaded is
|
|
in the I (invalid) state, i.e. a cache miss.
|
|
L2 demand loads are both L1D demand misses and L1D prefetches.
|
|
.It Li L2_DATA_RQSTS.DEMAND.S_STATE
|
|
.Pq Event 26H , Umask 02H
|
|
Counts number of L2 data demand loads where the cache line to be loaded is
|
|
in the S (shared) state.
|
|
L2 demand loads are both L1D demand misses and L1D
|
|
prefetches.
|
|
.It Li L2_DATA_RQSTS.DEMAND.E_STATE
|
|
.Pq Event 26H , Umask 04H
|
|
Counts number of L2 data demand loads where the cache line to be loaded is
|
|
in the E (exclusive) state.
|
|
L2 demand loads are both L1D demand misses and
|
|
L1D prefetches.
|
|
.It Li L2_DATA_RQSTS.DEMAND.M_STATE
|
|
.Pq Event 26H , Umask 08H
|
|
Counts number of L2 data demand loads where the cache line to be loaded is
|
|
in the M (modified) state.
|
|
L2 demand loads are both L1D demand misses and
|
|
L1D prefetches.
|
|
.It Li L2_DATA_RQSTS.DEMAND.MESI
|
|
.Pq Event 26H , Umask 0FH
|
|
Counts all L2 data demand requests.
|
|
L2 demand loads are both L1D demand
|
|
misses and L1D prefetches.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.I_STATE
|
|
.Pq Event 26H , Umask 10H
|
|
Counts number of L2 prefetch data loads where the cache line to be loaded is
|
|
in the I (invalid) state, i.e. a cache miss.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.S_STATE
|
|
.Pq Event 26H , Umask 20H
|
|
Counts number of L2 prefetch data loads where the cache line to be loaded is
|
|
in the S (shared) state.
|
|
A prefetch RFO will miss on an S state line, while
|
|
a prefetch read will hit on an S state line.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.E_STATE
|
|
.Pq Event 26H , Umask 40H
|
|
Counts number of L2 prefetch data loads where the cache line to be loaded is
|
|
in the E (exclusive) state.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.M_STATE
|
|
.Pq Event 26H , Umask 80H
|
|
Counts number of L2 prefetch data loads where the cache line to be loaded is
|
|
in the M (modified) state.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.MESI
|
|
.Pq Event 26H , Umask F0H
|
|
Counts all L2 prefetch requests.
|
|
.It Li L2_DATA_RQSTS.ANY
|
|
.Pq Event 26H , Umask FFH
|
|
Counts all L2 data requests.
|
|
.It Li L2_WRITE.RFO.I_STATE
|
|
.Pq Event 27H , Umask 01H
|
|
Counts number of L2 demand store RFO requests where the cache line to be
|
|
loaded is in the I (invalid) state, i.e. a cache miss.
|
|
The L1D prefetcher
|
|
does not issue a RFO prefetch.
|
|
This is a demand RFO request.
|
|
.It Li L2_WRITE.RFO.S_STATE
|
|
.Pq Event 27H , Umask 02H
|
|
Counts number of L2 store RFO requests where the cache line to be loaded is
|
|
in the S (shared) state.
|
|
The L1D prefetcher does not issue a RFO prefetch.
|
|
This is a demand RFO request.
|
|
.It Li L2_WRITE.RFO.M_STATE
|
|
.Pq Event 27H , Umask 08H
|
|
Counts number of L2 store RFO requests where the cache line to be loaded is
|
|
in the M (modified) state.
|
|
The L1D prefetcher does not issue a RFO prefetch.
|
|
This is a demand RFO request.
|
|
.It Li L2_WRITE.RFO.HIT
|
|
.Pq Event 27H , Umask 0EH
|
|
Counts number of L2 store RFO requests where the cache line to be loaded is
|
|
in either the S, E or M states.
|
|
The L1D prefetcher does not issue a RFO
|
|
prefetch.
|
|
This is a demand RFO request.
|
|
.It Li L2_WRITE.RFO.MESI
|
|
.Pq Event 27H , Umask 0FH
|
|
Counts all L2 store RFO requests.
The L1D prefetcher does not issue a RFO prefetch.
|
|
This is a demand RFO request.
|
|
.It Li L2_WRITE.LOCK.I_STATE
|
|
.Pq Event 27H , Umask 10H
|
|
Counts number of L2 demand lock RFO requests where the cache line to be
|
|
loaded is in the I (invalid) state, i.e. a cache miss.
|
|
.It Li L2_WRITE.LOCK.S_STATE
|
|
.Pq Event 27H , Umask 20H
|
|
Counts number of L2 lock RFO requests where the cache line to be loaded is
|
|
in the S (shared) state.
|
|
.It Li L2_WRITE.LOCK.E_STATE
|
|
.Pq Event 27H , Umask 40H
|
|
Counts number of L2 demand lock RFO requests where the cache line to be
|
|
loaded is in the E (exclusive) state.
|
|
.It Li L2_WRITE.LOCK.M_STATE
|
|
.Pq Event 27H , Umask 80H
|
|
Counts number of L2 demand lock RFO requests where the cache line to be
|
|
loaded is in the M (modified) state.
|
|
.It Li L2_WRITE.LOCK.HIT
|
|
.Pq Event 27H , Umask E0H
|
|
Counts number of L2 demand lock RFO requests where the cache line to be
|
|
loaded is in either the S, E, or M state.
|
|
.It Li L2_WRITE.LOCK.MESI
|
|
.Pq Event 27H , Umask F0H
|
|
Counts all L2 demand lock RFO requests.
|
|
.It Li L1D_WB_L2.I_STATE
|
|
.Pq Event 28H , Umask 01H
|
|
Counts number of L1 writebacks to the L2 where the cache line to be written
|
|
is in the I (invalid) state, i.e. a cache miss.
|
|
.It Li L1D_WB_L2.S_STATE
|
|
.Pq Event 28H , Umask 02H
|
|
Counts number of L1 writebacks to the L2 where the cache line to be written
|
|
is in the S state.
|
|
.It Li L1D_WB_L2.E_STATE
|
|
.Pq Event 28H , Umask 04H
|
|
Counts number of L1 writebacks to the L2 where the cache line to be written
|
|
is in the E (exclusive) state.
|
|
.It Li L1D_WB_L2.M_STATE
|
|
.Pq Event 28H , Umask 08H
|
|
Counts number of L1 writebacks to the L2 where the cache line to be written
|
|
is in the M (modified) state.
|
|
.It Li L1D_WB_L2.MESI
|
|
.Pq Event 28H , Umask 0FH
|
|
Counts all L1 writebacks to the L2.
|
|
.It Li L3_LAT_CACHE.REFERENCE
|
|
.Pq Event 2EH , Umask 02H
|
|
Counts uncore Last Level Cache references.
|
|
Because of differences in cache hierarchy, cache sizes and other
implementation-specific characteristics, value comparison to estimate
performance differences is not recommended.
|
|
See Table A-1.
|
|
.It Li L3_LAT_CACHE.MISS
|
|
.Pq Event 2EH , Umask 01H
|
|
Counts uncore Last Level Cache misses.
|
|
Because of differences in cache hierarchy, cache sizes and other
implementation-specific characteristics, value comparison to estimate
performance differences is not recommended.
|
|
See Table A-1.
|
|
.It Li CPU_CLK_UNHALTED.THREAD_P
|
|
.Pq Event 3CH , Umask 00H
|
|
Counts the number of thread cycles while the thread is not in a halt state.
|
|
The thread enters the halt state when it is running the HLT instruction.
|
|
The core frequency may change from time to time due to power or thermal
|
|
throttling.
|
|
See Table A-1.
|
|
.It Li CPU_CLK_UNHALTED.REF_P
|
|
.Pq Event 3CH , Umask 01H
|
|
Increments at the frequency of TSC when not halted.
|
|
See Table A-1.
|
|
.It Li DTLB_MISSES.ANY
|
|
.Pq Event 49H , Umask 01H
|
|
Counts the number of misses in the STLB which cause a page walk.
|
|
.It Li DTLB_MISSES.WALK_COMPLETED
|
|
.Pq Event 49H , Umask 02H
|
|
Counts number of misses in the STLB which resulted in a completed page walk.
|
|
.It Li DTLB_MISSES.WALK_CYCLES
|
|
.Pq Event 49H , Umask 04H
|
|
Counts cycles of page walk due to misses in the STLB.
|
|
.It Li DTLB_MISSES.STLB_HIT
|
|
.Pq Event 49H , Umask 10H
|
|
Counts the number of DTLB first level misses that hit in the second level
|
|
TLB.
|
|
This event is only relevant if the core contains multiple DTLB levels.
|
|
.It Li DTLB_MISSES.LARGE_WALK_COMPLETED
|
|
.Pq Event 49H , Umask 80H
|
|
Counts number of completed large page walks due to misses in the STLB.
|
|
.It Li LOAD_HIT_PRE
|
|
.Pq Event 4CH , Umask 01H
|
|
Counts load operations sent to the L1 data cache while a previous SSE
|
|
prefetch instruction to the same cache line has started prefetching but has
|
|
not yet finished.
|
|
.It Li L1D_PREFETCH.REQUESTS
|
|
.Pq Event 4EH , Umask 01H
|
|
Counts number of hardware prefetch requests dispatched out of the prefetch
|
|
FIFO.
|
|
.It Li L1D_PREFETCH.MISS
|
|
.Pq Event 4EH , Umask 02H
|
|
Counts number of hardware prefetch requests that miss the L1D.
|
|
There are two prefetchers in the L1D: a streamer, which predicts that
lines sequentially after this one should be fetched, and the IP
prefetcher, which remembers access patterns for the current instruction.
|
|
The streamer prefetcher stops on an
|
|
L1D hit, while the IP prefetcher does not.
|
|
.It Li L1D_PREFETCH.TRIGGERS
|
|
.Pq Event 4EH , Umask 04H
|
|
Counts number of prefetch requests triggered by the Finite State Machine and
|
|
pushed into the prefetch FIFO.
|
|
Some of the prefetch requests are dropped due
|
|
to overwrites or competition between the IP index prefetcher and streamer
|
|
prefetcher.
|
|
The prefetch FIFO contains 4 entries.
|
|
.It Li EPT.WALK_CYCLES
|
|
.Pq Event 4FH , Umask 10H
|
|
Counts Extended Page walk cycles.
|
|
.It Li L1D.REPL
|
|
.Pq Event 51H , Umask 01H
|
|
Counts the number of lines brought into the L1 data cache.
|
|
Counter 0, 1 only.
|
|
.It Li L1D.M_REPL
|
|
.Pq Event 51H , Umask 02H
|
|
Counts the number of modified lines brought into the L1 data cache.
|
|
Counter 0, 1 only.
|
|
.It Li L1D.M_EVICT
|
|
.Pq Event 51H , Umask 04H
|
|
Counts the number of modified lines evicted from the L1 data cache due to
|
|
replacement.
|
|
Counter 0, 1 only.
|
|
.It Li L1D.M_SNOOP_EVICT
|
|
.Pq Event 51H , Umask 08H
|
|
Counts the number of modified lines evicted from the L1 data cache due to
|
|
snoop HITM intervention.
|
|
Counter 0, 1 only.
|
|
.It Li L1D_CACHE_PREFETCH_LOCK_FB_HIT
|
|
.Pq Event 52H , Umask 01H
|
|
Counts the number of cacheable load lock speculated instructions accepted
|
|
into the fill buffer.
|
|
.It Li L1D_CACHE_LOCK_FB_HIT
|
|
.Pq Event 53H , Umask 01H
|
|
Counts the number of cacheable load lock speculated or retired instructions
|
|
accepted into the fill buffer.
|
|
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA
|
|
.Pq Event 60H , Umask 01H
|
|
Counts weighted cycles of offcore demand data read requests.
|
|
Does not include L2 prefetch requests.
|
|
Counter 0.
|
|
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE
|
|
.Pq Event 60H , Umask 02H
|
|
Counts weighted cycles of offcore demand code read requests.
|
|
Does not include L2 prefetch requests.
|
|
Counter 0.
|
|
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.RFO
|
|
.Pq Event 60H , Umask 04H
|
|
Counts weighted cycles of offcore demand RFO requests.
|
|
Does not include L2 prefetch requests.
|
|
Counter 0.
|
|
.It Li OFFCORE_REQUESTS_OUTSTANDING.ANY.READ
|
|
.Pq Event 60H , Umask 08H
|
|
Counts weighted cycles of offcore read requests of any kind.
|
|
Includes L2 prefetch requests.
|
|
Counter 0.
|
|
.It Li CACHE_LOCK_CYCLES.L1D_L2
|
|
.Pq Event 63H , Umask 01H
|
|
Cycle count during which the L1D and L2 are locked.
|
|
A lock is asserted when
|
|
there is a locked memory access, due to uncacheable memory, a locked
|
|
operation that spans two cache lines, or a page walk from an uncacheable
|
|
page table.
|
|
Counter 0, 1 only.
|
|
L1D and L2 locks have a very high performance penalty and
|
|
it is highly recommended to avoid such accesses.
|
|
.It Li CACHE_LOCK_CYCLES.L1D
|
|
.Pq Event 63H , Umask 02H
|
|
Counts the number of cycles during which a cacheline in the L1 data cache
unit is locked.
|
|
Counter 0, 1 only.
|
|
.It Li IO_TRANSACTIONS
|
|
.Pq Event 6CH , Umask 01H
|
|
Counts the number of completed I/O transactions.
|
|
.It Li L1I.HITS
|
|
.Pq Event 80H , Umask 01H
|
|
Counts all instruction fetches that hit the L1 instruction cache.
|
|
.It Li L1I.MISSES
|
|
.Pq Event 80H , Umask 02H
|
|
Counts all instruction fetches that miss the L1I cache.
|
|
This includes
|
|
instruction cache misses, streaming buffer misses, victim cache misses and
|
|
uncacheable fetches.
|
|
An instruction fetch miss is counted only once and not
|
|
once for every cycle it is outstanding.
|
|
.It Li L1I.READS
|
|
.Pq Event 80H , Umask 03H
|
|
Counts all instruction fetches, including uncacheable fetches that bypass
|
|
the L1I.
|
|
.It Li L1I.CYCLES_STALLED
|
|
.Pq Event 80H , Umask 04H
|
|
Cycle counts for which an instruction fetch stalls due to a L1I cache miss,
|
|
ITLB miss or ITLB fault.
|
|
.It Li LARGE_ITLB.HIT
|
|
.Pq Event 82H , Umask 01H
|
|
Counts number of large ITLB hits.
|
|
.It Li ITLB_MISSES.ANY
|
|
.Pq Event 85H , Umask 01H
|
|
Counts the number of misses in all levels of the ITLB which cause a page
|
|
walk.
|
|
.It Li ITLB_MISSES.WALK_COMPLETED
|
|
.Pq Event 85H , Umask 02H
|
|
Counts number of misses in all levels of the ITLB which resulted in a
|
|
completed page walk.
|
|
.It Li ITLB_MISSES.WALK_CYCLES
|
|
.Pq Event 85H , Umask 04H
|
|
Counts ITLB miss page walk cycles.
|
|
.It Li ITLB_MISSES.LARGE_WALK_COMPLETED
|
|
.Pq Event 85H , Umask 80H
|
|
Counts number of completed large page walks due to misses in the STLB.
|
|
.It Li ILD_STALL.LCP
|
|
.Pq Event 87H , Umask 01H
|
|
Cycles Instruction Length Decoder stalls due to length changing prefixes:
|
|
66, 67 or REX.W (for EM64T) instructions which change the length of the
|
|
decoded instruction.
|
|
.It Li ILD_STALL.MRU
|
|
.Pq Event 87H , Umask 02H
|
|
Instruction Length Decoder stall cycles due to Branch Prediction Unit (BPU)
|
|
Most Recently Used (MRU) bypass.
|
|
.It Li ILD_STALL.IQ_FULL
|
|
.Pq Event 87H , Umask 04H
|
|
Stall cycles due to a full instruction queue.
|
|
.It Li ILD_STALL.REGEN
|
|
.Pq Event 87H , Umask 08H
|
|
Counts the number of regen stalls.
|
|
.It Li ILD_STALL.ANY
|
|
.Pq Event 87H , Umask 0FH
|
|
Counts any cycles the Instruction Length Decoder is stalled.
|
|
.It Li BR_INST_EXEC.COND
|
|
.Pq Event 88H , Umask 01H
|
|
Counts the number of conditional near branch instructions executed, but not
|
|
necessarily retired.
|
|
.It Li BR_INST_EXEC.DIRECT
|
|
.Pq Event 88H , Umask 02H
|
|
Counts all unconditional near branch instructions excluding calls and
|
|
indirect branches.
|
|
.It Li BR_INST_EXEC.INDIRECT_NON_CALL
|
|
.Pq Event 88H , Umask 04H
|
|
Counts the number of executed indirect near branch instructions that are not
|
|
calls.
|
|
.It Li BR_INST_EXEC.NON_CALLS
|
|
.Pq Event 88H , Umask 07H
|
|
Counts all non call near branch instructions executed, but not necessarily
|
|
retired.
|
|
.It Li BR_INST_EXEC.RETURN_NEAR
|
|
.Pq Event 88H , Umask 08H
|
|
Counts indirect near branches that have a return mnemonic.
|
|
.It Li BR_INST_EXEC.DIRECT_NEAR_CALL
|
|
.Pq Event 88H , Umask 10H
|
|
Counts executed unconditional near call branch instructions, excluding
non-call branches.
|
|
.It Li BR_INST_EXEC.INDIRECT_NEAR_CALL
|
|
.Pq Event 88H , Umask 20H
|
|
Counts indirect near calls, including both register and memory indirect,
|
|
executed.
|
|
.It Li BR_INST_EXEC.NEAR_CALLS
|
|
.Pq Event 88H , Umask 30H
|
|
Counts all near call branches executed, but not necessarily retired.
|
|
.It Li BR_INST_EXEC.TAKEN
|
|
.Pq Event 88H , Umask 40H
|
|
Counts taken near branches executed, but not necessarily retired.
|
|
.It Li BR_INST_EXEC.ANY
|
|
.Pq Event 88H , Umask 7FH
|
|
Counts all near branches executed (not necessarily retired).
|
|
This includes only instructions and not micro-op branches.
|
|
Frequent branching is not necessarily a major performance issue.
|
|
However, frequent branch mispredictions may be a problem.
|
|
.It Li BR_MISP_EXEC.COND
|
|
.Pq Event 89H , Umask 01H
|
|
Counts the number of mispredicted conditional near branch instructions
|
|
executed, but not necessarily retired.
|
|
.It Li BR_MISP_EXEC.DIRECT
|
|
.Pq Event 89H , Umask 02H
|
|
Counts mispredicted macro unconditional near branch instructions, excluding
|
|
calls and indirect branches (should always be 0).
|
|
.It Li BR_MISP_EXEC.INDIRECT_NON_CALL
|
|
.Pq Event 89H , Umask 04H
|
|
Counts the number of executed mispredicted indirect near branch instructions
|
|
that are not calls.
|
|
.It Li BR_MISP_EXEC.NON_CALLS
|
|
.Pq Event 89H , Umask 07H
|
|
Counts mispredicted non call near branches executed, but not necessarily
|
|
retired.
|
|
.It Li BR_MISP_EXEC.RETURN_NEAR
|
|
.Pq Event 89H , Umask 08H
|
|
Counts mispredicted indirect branches that have a near return mnemonic.
|
|
.It Li BR_MISP_EXEC.DIRECT_NEAR_CALL
|
|
.Pq Event 89H , Umask 10H
|
|
Counts mispredicted non-indirect near calls executed (should always be 0).
|
|
.It Li BR_MISP_EXEC.INDIRECT_NEAR_CALL
|
|
.Pq Event 89H , Umask 20H
|
|
Counts mispredicted indirect near calls executed, including both register
|
|
and memory indirect.
|
|
.It Li BR_MISP_EXEC.NEAR_CALLS
|
|
.Pq Event 89H , Umask 30H
|
|
Counts all mispredicted near call branches executed, but not necessarily
|
|
retired.
|
|
.It Li BR_MISP_EXEC.TAKEN
|
|
.Pq Event 89H , Umask 40H
|
|
Counts executed mispredicted near branches that are taken, but not
|
|
necessarily retired.
|
|
.It Li BR_MISP_EXEC.ANY
|
|
.Pq Event 89H , Umask 7FH
|
|
Counts the number of mispredicted near branch instructions that were
|
|
executed, but not necessarily retired.
|
|
.It Li RESOURCE_STALLS.ANY
|
|
.Pq Event A2H , Umask 01H
|
|
Counts the number of Allocator resource related stalls.
|
|
Includes register renaming buffer entries, memory buffer entries.
|
|
In addition to resource related stalls, this event counts some other events.
|
|
Includes stalls arising
|
|
during branch misprediction recovery, such as if retirement of the
|
|
mispredicted branch is delayed and stalls arising while store buffer is
|
|
draining from synchronizing operations.
|
|
Does not include stalls due to SuperQ (off core) queue full, too many cache
|
|
misses, etc.
|
|
.It Li RESOURCE_STALLS.LOAD
|
|
.Pq Event A2H , Umask 02H
|
|
Counts the cycles of stall due to lack of load buffer for load operation.
|
|
.It Li RESOURCE_STALLS.RS_FULL
|
|
.Pq Event A2H , Umask 04H
|
|
This event counts the number of cycles when the number of instructions in
|
|
the pipeline waiting for execution reaches the limit the processor can
|
|
handle.
|
|
A high count of this event indicates that there are long latency
|
|
operations in the pipe (possibly load and store operations that miss the L2
|
|
cache, or instructions dependent upon instructions further down the pipeline
|
|
that have yet to retire).
|
|
When RS is full, new instructions can not enter the reservation station and
|
|
start execution.
|
|
.It Li RESOURCE_STALLS.STORE
|
|
.Pq Event A2H , Umask 08H
|
|
This event counts the number of cycles that a resource related stall will
|
|
occur due to the number of store instructions reaching the limit of the
|
|
pipeline (i.e., all store buffers are used).
|
|
The stall ends when a store
|
|
instruction commits its data to the cache or memory.
|
|
.It Li RESOURCE_STALLS.ROB_FULL
|
|
.Pq Event A2H , Umask 10H
|
|
Counts the cycles of stall due to the reorder buffer being full.
|
|
.It Li RESOURCE_STALLS.FPCW
|
|
.Pq Event A2H , Umask 20H
|
|
Counts the number of cycles while execution was stalled due to writing the
|
|
floating-point unit (FPU) control word.
|
|
.It Li RESOURCE_STALLS.MXCSR
|
|
.Pq Event A2H , Umask 40H
|
|
Stalls due to the MXCSR register rename occurring too close to a previous
|
|
MXCSR rename.
|
|
The MXCSR provides control and status for SSE floating-point operations.
|
|
.It Li RESOURCE_STALLS.OTHER
|
|
.Pq Event A2H , Umask 80H
|
|
Counts the number of cycles while execution was stalled due to other
|
|
resource issues.
|
|
.It Li MACRO_INSTS.FUSIONS_DECODED
|
|
.Pq Event A6H , Umask 01H
|
|
Counts the number of instructions decoded that are macro-fused but not
|
|
necessarily executed or retired.
|
|
.It Li BACLEAR_FORCE_IQ
|
|
.Pq Event A7H , Umask 01H
|
|
Counts number of times a BACLEAR was forced by the Instruction Queue.
|
|
The IQ is also responsible for providing conditional branch prediction
|
|
direction based on a static scheme and dynamic data provided by the L2
|
|
Branch Prediction Unit.
|
|
If the conditional branch target is not found in the Target
|
|
Array and the IQ predicts that the branch is taken, then the IQ will force
|
|
the Branch Address Calculator to issue a BACLEAR.
|
|
Each BACLEAR asserted by
|
|
the BAC generates approximately an 8 cycle bubble in the instruction fetch
|
|
pipeline.
|
|
.It Li LSD.UOPS
|
|
.Pq Event A8H , Umask 01H
|
|
Counts the number of micro-ops delivered by the loop stream detector.
Use cmask=1 and invert to count cycles.
|
|
.It Li ITLB_FLUSH
|
|
.Pq Event AEH , Umask 01H
|
|
Counts the number of ITLB flushes.
|
|
.It Li OFFCORE_REQUESTS.DEMAND.READ_DATA
|
|
.Pq Event B0H , Umask 01H
|
|
Counts number of offcore demand data read requests.
|
|
Does not count L2 prefetch requests.
|
|
.It Li OFFCORE_REQUESTS.DEMAND.READ_CODE
|
|
.Pq Event B0H , Umask 02H
|
|
Counts number of offcore demand code read requests.
|
|
Does not count L2 prefetch requests.
|
|
.It Li OFFCORE_REQUESTS.DEMAND.RFO
|
|
.Pq Event B0H , Umask 04H
|
|
Counts number of offcore demand RFO requests.
|
|
Does not count L2 prefetch requests.
|
|
.It Li OFFCORE_REQUESTS.ANY.READ
|
|
.Pq Event B0H , Umask 08H
|
|
Counts number of offcore read requests.
|
|
Includes L2 prefetch requests.
|
|
.It Li OFFCORE_REQUESTS.ANY.RFO
|
|
.Pq Event B0H , Umask 10H
|
|
Counts number of offcore RFO requests.
|
|
Includes L2 prefetch requests.
|
|
.It Li OFFCORE_REQUESTS.L1D_WRITEBACK
|
|
.Pq Event B0H , Umask 40H
|
|
Counts number of L1D writebacks to the uncore.
|
|
.It Li OFFCORE_REQUESTS.ANY
|
|
.Pq Event B0H , Umask 80H
|
|
Counts all offcore requests.
|
|
.It Li UOPS_EXECUTED.PORT0
|
|
.Pq Event B1H , Umask 01H
|
|
Counts number of Uops executed that were issued on port 0.
|
|
Port 0 handles integer arithmetic, SIMD and FP add Uops.
|
|
.It Li UOPS_EXECUTED.PORT1
|
|
.Pq Event B1H , Umask 02H
|
|
Counts number of Uops executed that were issued on port 1.
|
|
Port 1 handles integer arithmetic, SIMD, integer shift, FP multiply and
|
|
FP divide Uops.
|
|
.It Li UOPS_EXECUTED.PORT2_CORE
|
|
.Pq Event B1H , Umask 04H
|
|
Counts number of Uops executed that were issued on port 2.
|
|
Port 2 handles the load Uops.
|
|
This is a core count only and can not be collected per
|
|
thread.
|
|
.It Li UOPS_EXECUTED.PORT3_CORE
|
|
.Pq Event B1H , Umask 08H
|
|
Counts number of Uops executed that were issued on port 3.
|
|
Port 3 handles store Uops.
|
|
This is a core count only and can not be collected per thread.
|
|
.It Li UOPS_EXECUTED.PORT4_CORE
|
|
.Pq Event B1H , Umask 10H
|
|
Counts number of Uops executed that were issued on port 4.
|
|
Port 4 handles the value to be stored for the store Uops issued on port 3.
|
|
This is a core count only and can not be collected per thread.
|
|
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5
|
|
.Pq Event B1H , Umask 1FH
|
|
Counts the number of cycles in which one or more uops issued on ports 0-4
are being executed.
|
|
This is a core count only and can not be collected per thread.
|
|
.It Li UOPS_EXECUTED.PORT5
|
|
.Pq Event B1H , Umask 20H
|
|
Counts number of Uops executed that were issued on port 5.
|
|
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES
|
|
.Pq Event B1H , Umask 3FH
|
|
Counts the number of cycles in which one or more uops are being executed on
any port.
|
|
This is a core count only and can not be collected per thread.
|
|
.It Li UOPS_EXECUTED.PORT015
|
|
.Pq Event B1H , Umask 40H
|
|
Counts number of Uops executed that were issued on port 0, 1, or 5.
|
|
Use cmask=1, invert=1 to count stall cycles.
|
|
.It Li UOPS_EXECUTED.PORT234
|
|
.Pq Event B1H , Umask 80H
|
|
Counts number of Uops executed that were issued on port 2, 3, or 4.
|
|
.It Li OFFCORE_REQUESTS_SQ_FULL
|
|
.Pq Event B2H , Umask 01H
|
|
Counts number of cycles the SQ is full to handle off-core requests.
|
|
.It Li SNOOPQ_REQUESTS_OUTSTANDING.DATA
|
|
.Pq Event B3H , Umask 01H
|
|
Counts weighted cycles of snoopq requests for data.
|
|
Counter 0 only.
|
|
Use cmask=1 to count cycles not empty.
|
|
.It Li SNOOPQ_REQUESTS_OUTSTANDING.INVALIDATE
|
|
.Pq Event B3H , Umask 02H
|
|
Counts weighted cycles of snoopq invalidate requests.
|
|
Counter 0 only.
|
|
Use cmask=1 to count cycles not empty.
|
|
.It Li SNOOPQ_REQUESTS_OUTSTANDING.CODE
|
|
.Pq Event B3H , Umask 04H
|
|
Counts weighted cycles of snoopq requests for code.
|
|
Counter 0 only.
|
|
Use cmask=1 to count cycles not empty.
|
|
.It Li SNOOPQ_REQUESTS.CODE
|
|
.Pq Event B4H , Umask 01H
|
|
Counts the number of snoop code requests.
|
|
.It Li SNOOPQ_REQUESTS.DATA
|
|
.Pq Event B4H , Umask 02H
|
|
Counts the number of snoop data requests.
|
|
.It Li SNOOPQ_REQUESTS.INVALIDATE
|
|
.Pq Event B4H , Umask 04H
|
|
Counts the number of snoop invalidate requests.
|
|
.It Li OFF_CORE_RESPONSE_0
|
|
.Pq Event B7H , Umask 01H
|
|
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
|
|
Processor Core.
|
|
Requires programming MSR 01A6H.
|
|
.It Li SNOOP_RESPONSE.HIT
|
|
.Pq Event B8H , Umask 01H
|
|
Counts HIT snoop response sent by this thread in response to a snoop
|
|
request.
|
|
.It Li SNOOP_RESPONSE.HITE
|
|
.Pq Event B8H , Umask 02H
|
|
Counts HIT E snoop response sent by this thread in response to a snoop
|
|
request.
|
|
.It Li SNOOP_RESPONSE.HITM
|
|
.Pq Event B8H , Umask 04H
|
|
Counts HIT M snoop response sent by this thread in response to a snoop
|
|
request.
|
|
.It Li OFF_CORE_RESPONSE_1
|
|
.Pq Event BBH , Umask 01H
|
|
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
|
|
Processor Core.
|
|
Use MSR 01A7H.
|
|
.It Li INST_RETIRED.ANY_P
|
|
.Pq Event C0H , Umask 01H
|
|
See Table A-1.
|
|
Notes: INST_RETIRED.ANY is counted by a designated fixed counter.
|
|
INST_RETIRED.ANY_P is counted by a programmable counter and is an
|
|
architectural performance event.
|
|
Event is supported if CPUID.A.EBX[1] = 0.
|
|
Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not
|
|
count as retired instructions.
|
|
.It Li INST_RETIRED.X87
|
|
.Pq Event C0H , Umask 02H
|
|
Counts the number of floating point computational operations retired, including
|
|
floating point computational operations executed by the assist handler and
|
|
sub-operations of complex floating point instructions like transcendental
|
|
instructions.
|
|
.It Li INST_RETIRED.MMX
|
|
.Pq Event C0H , Umask 04H
|
|
Counts the number of retired MMX instructions.
|
|
.It Li UOPS_RETIRED.ANY
|
|
.Pq Event C2H , Umask 01H
|
|
Counts the number of micro-ops retired (macro-fused=1, micro-fused=2,
|
|
others=1; maximum count of 8 per cycle).
|
|
Most instructions are composed of one or two micro-ops.
|
|
Some instructions are decoded into longer sequences
|
|
such as repeat instructions, floating point transcendental instructions, and
|
|
assists.
|
|
Use cmask=1 and invert to count active cycles or stalled cycles.
|
|
.It Li UOPS_RETIRED.RETIRE_SLOTS
|
|
.Pq Event C2H , Umask 02H
|
|
Counts the number of retirement slots used each cycle.
|
|
.It Li UOPS_RETIRED.MACRO_FUSED
|
|
.Pq Event C2H , Umask 04H
|
|
Counts number of macro-fused uops retired.
|
|
.It Li MACHINE_CLEARS.CYCLES
|
|
.Pq Event C3H , Umask 01H
|
|
Counts the cycles machine clear is asserted.
|
|
.It Li MACHINE_CLEARS.MEM_ORDER
|
|
.Pq Event C3H , Umask 02H
|
|
Counts the number of machine clears due to memory order conflicts.
|
|
.It Li MACHINE_CLEARS.SMC
|
|
.Pq Event C3H , Umask 04H
|
|
Counts the number of times that a program writes to a code section.
|
|
Self-modifying code causes a severe penalty in all Intel 64 and IA-32
|
|
processors.
|
|
The modified cache line is written back to the L2 and L3 caches.
|
|
.It Li BR_INST_RETIRED.ANY_P
|
|
.Pq Event C4H , Umask 00H
|
|
See Table A-1.
|
|
.It Li BR_INST_RETIRED.CONDITIONAL
|
|
.Pq Event C4H , Umask 01H
|
|
Counts the number of conditional branch instructions retired.
|
|
.It Li BR_INST_RETIRED.NEAR_CALL
|
|
.Pq Event C4H , Umask 02H
|
|
Counts the number of direct & indirect near unconditional calls retired.
|
|
.It Li BR_INST_RETIRED.ALL_BRANCHES
|
|
.Pq Event C4H , Umask 04H
|
|
Counts the number of branch instructions retired.
|
|
.It Li BR_MISP_RETIRED.ANY_P
|
|
.Pq Event C5H , Umask 00H
|
|
See Table A-1.
|
|
.It Li BR_MISP_RETIRED.CONDITIONAL
|
|
.Pq Event C5H , Umask 01H
|
|
Counts mispredicted conditional branch instructions retired.
|
|
.It Li BR_MISP_RETIRED.NEAR_CALL
|
|
.Pq Event C5H , Umask 02H
|
|
Counts mispredicted direct & indirect near unconditional retired calls.
|
|
.It Li BR_MISP_RETIRED.ALL_BRANCHES
|
|
.Pq Event C5H , Umask 04H
|
|
Counts all mispredicted branch instructions retired.
|
|
.It Li SSEX_UOPS_RETIRED.PACKED_SINGLE
|
|
.Pq Event C7H , Umask 01H
|
|
Counts SIMD packed single-precision floating point Uops retired.
|
|
.It Li SSEX_UOPS_RETIRED.SCALAR_SINGLE
|
|
.Pq Event C7H , Umask 02H
|
|
Counts SIMD scalar single-precision floating point Uops retired.
|
|
.It Li SSEX_UOPS_RETIRED.PACKED_DOUBLE
|
|
.Pq Event C7H , Umask 04H
|
|
Counts SIMD packed double-precision floating point Uops retired.
|
|
.It Li SSEX_UOPS_RETIRED.SCALAR_DOUBLE
|
|
.Pq Event C7H , Umask 08H
|
|
Counts SIMD scalar double-precision floating point Uops retired.
|
|
.It Li SSEX_UOPS_RETIRED.VECTOR_INTEGER
|
|
.Pq Event C7H , Umask 10H
|
|
Counts 128-bit SIMD vector integer Uops retired.
|
|
.It Li ITLB_MISS_RETIRED
|
|
.Pq Event C8H , Umask 20H
|
|
Counts the number of retired instructions that missed the ITLB when the
|
|
instruction was fetched.
|
|
.It Li MEM_LOAD_RETIRED.L1D_HIT
|
|
.Pq Event CBH , Umask 01H
|
|
Counts number of retired loads that hit the L1 data cache.
|
|
.It Li MEM_LOAD_RETIRED.L2_HIT
|
|
.Pq Event CBH , Umask 02H
|
|
Counts number of retired loads that hit the L2 data cache.
|
|
.It Li MEM_LOAD_RETIRED.L3_UNSHARED_HIT
|
|
.Pq Event CBH , Umask 04H
|
|
Counts number of retired loads that hit their own, unshared lines in the L3
|
|
cache.
|
|
.It Li MEM_LOAD_RETIRED.OTHER_CORE_L2_HIT_HITM
|
|
.Pq Event CBH , Umask 08H
|
|
Counts number of retired loads that hit in a sibling core's L2 (on die
|
|
core).
|
|
Since the L3 is inclusive of all cores on the package, this is an L3 hit.
|
|
This counts both clean and modified hits.
|
|
.It Li MEM_LOAD_RETIRED.L3_MISS
|
|
.Pq Event CBH , Umask 10H
|
|
Counts number of retired loads that miss the L3 cache.
|
|
The load was satisfied by a remote socket, local memory or an IOH.
|
|
.It Li MEM_LOAD_RETIRED.HIT_LFB
|
|
.Pq Event CBH , Umask 40H
|
|
Counts number of retired loads that miss the L1D and the address is located
|
|
in an allocated line fill buffer and will soon be committed to cache.
|
|
This is counting secondary L1D misses.
|
|
.It Li MEM_LOAD_RETIRED.DTLB_MISS
|
|
.Pq Event CBH , Umask 80H
|
|
Counts the number of retired loads that missed the DTLB.
|
|
The DTLB miss is not counted if the load operation causes a fault.
|
|
This event counts loads from cacheable memory only.
|
|
The event does not count loads by software prefetches.
|
|
Counts both primary and secondary misses to the TLB.
|
|
.It Li FP_MMX_TRANS.TO_FP
|
|
.Pq Event CCH , Umask 01H
|
|
Counts the first floating-point instruction following any MMX instruction.
|
|
You can use this event to estimate the penalties for the transitions between
|
|
floating-point and MMX technology states.
|
|
.It Li FP_MMX_TRANS.TO_MMX
|
|
.Pq Event CCH , Umask 02H
|
|
Counts the first MMX instruction following a floating-point instruction.
|
|
You can use this event to estimate the penalties for the transitions between
|
|
floating-point and MMX technology states.
|
|
.It Li FP_MMX_TRANS.ANY
|
|
.Pq Event CCH , Umask 03H
|
|
Counts all transitions from floating point to MMX instructions and from MMX
|
|
instructions to floating point instructions.
|
|
You can use this event to estimate the penalties for the transitions between
|
|
floating-point and MMX technology states.
|
|
.It Li MACRO_INSTS.DECODED
|
|
.Pq Event D0H , Umask 01H
|
|
Counts the number of instructions decoded (but not necessarily executed or
|
|
retired).
|
|
.It Li UOPS_DECODED.STALL_CYCLES
|
|
.Pq Event D1H , Umask 01H
|
|
Counts the cycles of decoder stalls.
|
|
.It Li UOPS_DECODED.MS
|
|
.Pq Event D1H , Umask 02H
|
|
Counts the number of Uops decoded by the Microcode Sequencer, MS.
|
|
The MS delivers uops when the instruction is more than 4 uops long or a
|
|
microcode assist is occurring.
|
|
.It Li UOPS_DECODED.ESP_FOLDING
|
|
.Pq Event D1H , Umask 04H
|
|
Counts number of stack pointer (ESP) instructions decoded: push, pop, call,
ret, etc.
ESP instructions do not generate a Uop to increment or decrement
ESP.
|
|
Instead, they update an ESP_Offset register that keeps track of the
|
|
delta to the current value of the ESP register.
|
|
.It Li UOPS_DECODED.ESP_SYNC
|
|
.Pq Event D1H , Umask 08H
|
|
Counts number of stack pointer (ESP) sync operations where an ESP
|
|
instruction is corrected by adding the ESP offset register to the current
|
|
value of the ESP register.
|
|
.It Li RAT_STALLS.FLAGS
|
|
.Pq Event D2H , Umask 01H
|
|
Counts the number of cycles during which execution stalled due to several
|
|
reasons, one of which is a partial flag register stall.
|
|
A partial register
|
|
stall may occur when two conditions are met: 1) an instruction modifies
|
|
some, but not all, of the flags in the flag register and 2) the next
|
|
instruction, which depends on flags, depends on flags that were not modified
|
|
by this instruction.
|
|
.It Li RAT_STALLS.REGISTERS
|
|
.Pq Event D2H , Umask 02H
|
|
This event counts the number of cycles instruction execution latency became
|
|
longer than the defined latency because the instruction used a register that
|
|
was partially written by previous instruction.
|
|
.It Li RAT_STALLS.ROB_READ_PORT
|
|
.Pq Event D2H , Umask 04H
|
|
Counts the number of cycles when ROB read port stalls occurred, which did
|
|
not allow new micro-ops to enter the out-of-order pipeline.
|
|
Note that, at
|
|
this stage in the pipeline, additional stalls may occur at the same cycle
|
|
and prevent the stalled micro-ops from entering the pipe.
|
|
In such a case,
|
|
micro-ops retry entering the execution pipe in the next cycle and the
|
|
ROB-read port stall is counted again.
|
|
.It Li RAT_STALLS.SCOREBOARD
|
|
.Pq Event D2H , Umask 08H
|
|
Counts the cycles where we stall due to microarchitecturally required
|
|
serialization.
|
|
Microcode scoreboarding stalls.
|
|
.It Li RAT_STALLS.ANY
|
|
.Pq Event D2H , Umask 0FH
|
|
Counts all Register Allocation Table stall cycles due to:
cycles when ROB read port stalls occurred, which did not allow new micro-ops
to enter the execution pipe;
cycles when partial register stalls occurred;
cycles when flag stalls occurred;
and cycles when floating-point unit (FPU) status word stalls occurred.
|
|
To count each of these conditions separately use the events:
|
|
RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and
|
|
RAT_STALLS.FPSW.
|
|
.It Li SEG_RENAME_STALLS
|
|
.Pq Event D4H , Umask 01H
|
|
Counts the number of stall cycles due to the lack of renaming resources for
|
|
the ES, DS, FS, and GS segment registers.
|
|
If a segment is renamed but not
|
|
retired and a second update to the same segment occurs, a stall occurs in
|
|
the front end of the pipeline until the renamed segment retires.
|
|
.It Li ES_REG_RENAMES
|
|
.Pq Event D5H , Umask 01H
|
|
Counts the number of times the ES segment register is renamed.
|
|
.It Li UOP_UNFUSION
|
|
.Pq Event DBH , Umask 01H
|
|
Counts unfusion events due to floating point exception to a fused uop.
|
|
.It Li BR_INST_DECODED
|
|
.Pq Event E0H , Umask 01H
|
|
Counts the number of branch instructions decoded.
|
|
.It Li BPU_MISSED_CALL_RET
|
|
.Pq Event E5H , Umask 01H
|
|
Counts number of times the Branch Prediction Unit missed predicting a call
|
|
or return branch.
|
|
.It Li BACLEAR.CLEAR
|
|
.Pq Event E6H , Umask 01H
|
|
Counts the number of times the front end is resteered, mainly when the
|
|
Branch Prediction Unit cannot provide a correct prediction and this is
|
|
corrected by the Branch Address Calculator at the front end.
|
|
This can occur
|
|
if the code has many branches such that they cannot be consumed by the BPU.
|
|
Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble
|
|
in the instruction fetch pipeline.
|
|
The effect on total execution time depends on the surrounding code.
|
|
.It Li BACLEAR.BAD_TARGET
|
|
.Pq Event E6H , Umask 02H
|
|
Counts number of Branch Address Calculator clears (BACLEAR) asserted due to
|
|
conditional branch instructions in which there was a target hit but the
|
|
direction was wrong.
|
|
Each BACLEAR asserted by the BAC generates
|
|
approximately an 8 cycle bubble in the instruction fetch pipeline.
|
|
.It Li BPU_CLEARS.EARLY
|
|
.Pq Event E8H , Umask 01H
|
|
Counts early (normal) Branch Prediction Unit clears: BPU predicted a taken
|
|
branch after incorrectly assuming that it was not taken.
|
|
The BPU clear leads to a 2 cycle bubble in the Front End.
|
|
.It Li BPU_CLEARS.LATE
|
|
.Pq Event E8H , Umask 02H
|
|
Counts late Branch Prediction Unit clears due to Most Recently Used
|
|
conflicts.
|
|
The BPU clear leads to a 3 cycle bubble in the Front End.
|
|
.It Li THREAD_ACTIVE
|
|
.Pq Event ECH , Umask 01H
|
|
Counts cycles threads are active.
|
|
.It Li L2_TRANSACTIONS.LOAD
|
|
.Pq Event F0H , Umask 01H
|
|
Counts L2 load operations due to HW prefetch or demand loads.
|
|
.It Li L2_TRANSACTIONS.RFO
|
|
.Pq Event F0H , Umask 02H
|
|
Counts L2 RFO operations due to HW prefetch or demand RFOs.
|
|
.It Li L2_TRANSACTIONS.IFETCH
|
|
.Pq Event F0H , Umask 04H
|
|
Counts L2 instruction fetch operations due to HW prefetch or demand ifetch.
|
|
.It Li L2_TRANSACTIONS.PREFETCH
|
|
.Pq Event F0H , Umask 08H
|
|
Counts L2 prefetch operations.
|
|
.It Li L2_TRANSACTIONS.L1D_WB
|
|
.Pq Event F0H , Umask 10H
|
|
Counts L1D writeback operations to the L2.
|
|
.It Li L2_TRANSACTIONS.FILL
|
|
.Pq Event F0H , Umask 20H
|
|
Counts L2 cache line fill operations due to load, RFO, L1D writeback or
|
|
prefetch.
|
|
.It Li L2_TRANSACTIONS.WB
|
|
.Pq Event F0H , Umask 40H
|
|
Counts L2 writeback operations to the L3.
|
|
.It Li L2_TRANSACTIONS.ANY
|
|
.Pq Event F0H , Umask 80H
|
|
Counts all L2 cache operations.
|
|
.It Li L2_LINES_IN.S_STATE
|
|
.Pq Event F1H , Umask 02H
|
|
Counts the number of cache lines allocated in the L2 cache in the S (shared)
|
|
state.
|
|
.It Li L2_LINES_IN.E_STATE
|
|
.Pq Event F1H , Umask 04H
|
|
Counts the number of cache lines allocated in the L2 cache in the E
|
|
(exclusive) state.
|
|
.It Li L2_LINES_IN.ANY
|
|
.Pq Event F1H , Umask 07H
|
|
Counts the number of cache lines allocated in the L2 cache.
|
|
.It Li L2_LINES_OUT.DEMAND_CLEAN
|
|
.Pq Event F2H , Umask 01H
|
|
Counts L2 clean cache lines evicted by a demand request.
|
|
.It Li L2_LINES_OUT.DEMAND_DIRTY
|
|
.Pq Event F2H , Umask 02H
|
|
Counts L2 dirty (modified) cache lines evicted by a demand request.
|
|
.It Li L2_LINES_OUT.PREFETCH_CLEAN
|
|
.Pq Event F2H , Umask 04H
|
|
Counts L2 clean cache lines evicted by a prefetch request.
|
|
.It Li L2_LINES_OUT.PREFETCH_DIRTY
|
|
.Pq Event F2H , Umask 08H
|
|
Counts L2 modified cache lines evicted by a prefetch request.
|
|
.It Li L2_LINES_OUT.ANY
|
|
.Pq Event F2H , Umask 0FH
|
|
Counts all L2 cache lines evicted for any reason.
|
|
.It Li SQ_MISC.LRU_HINTS
|
|
.Pq Event F4H , Umask 04H
|
|
Counts number of Super Queue LRU hints sent to L3.
|
|
.It Li SQ_MISC.SPLIT_LOCK
|
|
.Pq Event F4H , Umask 10H
|
|
Counts the number of SQ lock splits across a cache line.
|
|
.It Li SQ_FULL_STALL_CYCLES
|
|
.Pq Event F6H , Umask 01H
|
|
Counts cycles the Super Queue is full.
|
|
Neither of the threads on this core will be able to access the uncore.
|
|
.It Li FP_ASSIST.ALL
|
|
.Pq Event F7H , Umask 01H
|
|
Counts the number of floating point operations executed that required
|
|
micro-code assist intervention.
|
|
Assists are required in the following cases:
|
|
SSE instructions (denormal input when the DAZ flag is off, or underflow
result when the FTZ flag is off) and x87 instructions (NaN or denormal
loaded to a register or used as input from memory, division by 0, or
underflow output).
|
|
.It Li FP_ASSIST.OUTPUT
|
|
.Pq Event F7H , Umask 02H
|
|
Counts number of floating point micro-code assist when the output value
|
|
(destination register) is invalid.
|
|
.It Li FP_ASSIST.INPUT
|
|
.Pq Event F7H , Umask 04H
|
|
Counts number of floating point micro-code assist when the input value (one
|
|
of the source operands to an FP instruction) is invalid.
|
|
.It Li SIMD_INT_64.PACKED_MPY
|
|
.Pq Event FDH , Umask 01H
|
|
Counts number of SIMD integer 64 bit packed multiply operations.
|
|
.It Li SIMD_INT_64.PACKED_SHIFT
|
|
.Pq Event FDH , Umask 02H
|
|
Counts number of SIMD integer 64 bit packed shift operations.
|
|
.It Li SIMD_INT_64.PACK
|
|
.Pq Event FDH , Umask 04H
|
|
Counts number of SIMD integer 64 bit pack operations.
|
|
.It Li SIMD_INT_64.UNPACK
|
|
.Pq Event FDH , Umask 08H
|
|
Counts number of SIMD integer 64 bit unpack operations.
|
|
.It Li SIMD_INT_64.PACKED_LOGICAL
|
|
.Pq Event FDH , Umask 10H
|
|
Counts number of SIMD integer 64 bit logical operations.
|
|
.It Li SIMD_INT_64.PACKED_ARITH
|
|
.Pq Event FDH , Umask 20H
|
|
Counts number of SIMD integer 64 bit arithmetic operations.
|
|
.It Li SIMD_INT_64.SHUFFLE_MOVE
|
|
.Pq Event FDH , Umask 40H
|
|
Counts number of SIMD integer 64 bit shuffle or move operations.
|
|
.El
|
|
.Sh SEE ALSO
|
|
.Xr pmc 3 ,
|
|
.Xr pmc.atom 3 ,
|
|
.Xr pmc.core 3 ,
|
|
.Xr pmc.iaf 3 ,
|
|
.Xr pmc.ucf 3 ,
|
|
.Xr pmc.k7 3 ,
|
|
.Xr pmc.k8 3 ,
|
|
.Xr pmc.p4 3 ,
|
|
.Xr pmc.p5 3 ,
|
|
.Xr pmc.p6 3 ,
|
|
.Xr pmc.corei7 3 ,
|
|
.Xr pmc.corei7uc 3 ,
|
|
.Xr pmc.westmereuc 3 ,
|
|
.Xr pmc.soft 3 ,
|
|
.Xr pmc.tsc 3 ,
|
|
.Xr pmc_cpuinfo 3 ,
|
|
.Xr pmclog 3 ,
|
|
.Xr hwpmc 4
|
|
.Sh HISTORY
|
|
The
|
|
.Nm pmc
|
|
library first appeared in
|
|
.Fx 6.0 .
|
|
.Sh AUTHORS
|
|
The
|
|
.Lb libpmc
|
|
library was written by
|
|
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .
|