Whitespace cleanup:
o Wrap sentences on to new lines
o Rewrap lines where possible while trying to keep the diff to a minimum

Found with: textproc/igor
MFC after: 1 week
X-MFC-With: r232157

parent 1ab2433a4c
commit 7723655272
@ -92,15 +92,17 @@ Configure the Off-core Response bits.
.It Li DMND_DATA_RD
Counts the number of demand and DCU prefetch data reads of full
and partial cachelines as well as demand data page table entry
cacheline reads.
Does not count L2 data read prefetches or instruction fetches.
.It Li DMND_RFO
Counts the number of demand and DCU prefetch reads for ownership
(RFO) requests generated by a write to a data cacheline.
Does not count L2 RFO.
.It Li DMND_IFETCH
Counts the number of demand and DCU prefetch instruction cacheline
reads.
Does not count L2 code read prefetches.
WB
Counts the number of writeback (modified to exclusive) transactions.
.It Li PF_DATA_RD
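The request-type entries above are individual bits of the off-core response request field, and a combined request is their bitwise OR. The following is a minimal Python sketch, assuming the bit order given in the Intel SDM for Core i7 (DMND_DATA_RD at bit 0 through PF_DATA_RD at bit 4). `REQUEST_TYPE_BITS` and `request_mask` are hypothetical helpers and are not part of libpmc.

```python
# Hypothetical helper.  Bit positions are assumed from the Intel SDM
# description of the Core i7 off-core response request-type field;
# check the SDM for your exact processor before relying on them.
REQUEST_TYPE_BITS = {
    "DMND_DATA_RD": 0,
    "DMND_RFO": 1,
    "DMND_IFETCH": 2,
    "WB": 3,
    "PF_DATA_RD": 4,
}


def request_mask(*names: str) -> int:
    """Return the bitwise OR of the request-type bits for the given names."""
    mask = 0
    for name in names:
        mask |= 1 << REQUEST_TYPE_BITS[name]
    return mask


# All demand traffic: data reads, RFOs and instruction fetches.
print(hex(request_mask("DMND_DATA_RD", "DMND_RFO", "DMND_IFETCH")))  # 0x7
```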
@ -181,7 +183,8 @@ All Store buffer stall cycles
All stores referenced with a misaligned address
.It Li STORE_BLOCKS.AT_RET
.Pq Event 06H , Umask 04H
Counts number of loads delayed with at-Retirement block code.
The following loads need to be executed at retirement and wait for all
senior stores on the same thread to be drained: load splitting across 4K
boundary (page split), load accessing uncacheable (UC or USWC) memory, load
lock, and load
@ -225,9 +228,10 @@ ld_lat facility.
In conjunction with the ld_lat facility
.It Li MEM_STORE_RETIRED.DTLB_MISS
.Pq Event 0CH , Umask 01H
The event counts the number of retired stores that missed the DTLB.
The DTLB miss is not counted if the store operation causes a fault.
Does not count prefetches.
Counts both primary and secondary misses to the TLB.
.It Li UOPS_ISSUED.ANY
.Pq Event 0EH , Umask 01H
Counts the number of Uops issued by the Register Allocation Table to the
@ -264,9 +268,11 @@ Load instructions retired remote DRAM and remote home-remote cache HITM
Load instructions retired I/O (Precise Event)
.It Li FP_COMP_OPS_EXE.X87
.Pq Event 10H , Umask 01H
Counts the number of FP Computational Uops Executed.
This is the number of FADD,
FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer
DIVs, and IDIVs.
This event does not distinguish an FADD used in the middle
of a transcendental flow from a separate FADD instruction.
.It Li FP_COMP_OPS_EXE.MMX
.Pq Event 10H , Umask 02H
@ -316,9 +322,9 @@ Counts number of loads dispatched from the Reservation Station that bypass
the Memory Order Buffer.
.It Li LOAD_DISPATCH.RS_DELAYED
.Pq Event 13H , Umask 02H
Counts the number of delayed RS dispatches at the stage latch.
If an RS dispatch cannot bypass to the LB, it has another chance to dispatch
from the one-cycle delayed staging latch before it is written into the LB.
.It Li LOAD_DISPATCH.MOB
.Pq Event 13H , Umask 04H
Counts the number of loads dispatched from the Reservation Station to the
@ -329,13 +335,15 @@ Counts all loads dispatched from the Reservation Station.
.It Li ARITH.CYCLES_DIV_BUSY
.Pq Event 14H , Umask 01H
Counts the number of cycles the divider is busy executing divide or square
root operations.
The divide can be integer, X87 or Streaming SIMD Extensions (SSE).
The square root operation can be either X87 or SSE.
Set edge=1, invert=1 and cmask=1 to count the number of divides.
Count may be incorrect when SMT is on.
.It Li ARITH.MUL
.Pq Event 14H , Umask 02H
Counts the number of multiply operations executed.
This includes integer as well as floating point multiply operations but
excludes DPPS mul and MPSAD.
Count may be incorrect when SMT is on.
.It Li INST_QUEUE_WRITES
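The cmask, invert and edge settings mentioned above turn an event's per-cycle increments into a filtered count. The Python model below is a simplified illustration, not a description of the hardware: with cmask set, a cycle qualifies when its increment is at least cmask (or below cmask when invert is set), and edge counts only transitions into the qualifying state. `filtered_count` is a hypothetical name.

```python
from typing import Sequence


def filtered_count(per_cycle: Sequence[int], cmask: int = 0,
                   inv: bool = False, edge: bool = False) -> int:
    """Simplified model of how cmask, inv and edge filter an event."""
    if cmask == 0:
        # No threshold: the counter accumulates the raw increments.
        return sum(per_cycle)
    # With a threshold, each cycle yields a true/false condition.
    conds = [(value >= cmask) != inv for value in per_cycle]
    if not edge:
        return sum(conds)
    # Edge detect: count false-to-true transitions of the condition.
    return sum(1 for prev, cur in zip(conds, conds[1:]) if cur and not prev)


# Divider busy (1) or idle (0) in each cycle: two separate divides.
div_busy = [0, 1, 1, 1, 0, 0, 1, 1, 0]
print(filtered_count(div_busy))                                # 5
print(filtered_count(div_busy, cmask=1, inv=True, edge=True))  # 2
```

Inverting the busy condition and edge-detecting it counts each busy period once, as it ends. Under this model, two divides issued back to back with no idle cycle between them would be counted as one.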
@ -344,64 +352,72 @@ Counts the number of instructions written into the instruction queue every
cycle.
.It Li INST_DECODED.DEC0
.Pq Event 18H , Umask 01H
Counts number of instructions that require decoder 0 to be decoded.
Usually, this means that the instruction maps to more than one uop.
.It Li TWO_UOP_INSTS_DECODED
.Pq Event 19H , Umask 01H
An instruction that generates two uops was decoded.
.It Li INST_QUEUE_WRITE_CYCLES
.Pq Event 1EH , Umask 01H
This event counts the number of cycles during which instructions are written
to the instruction queue.
Dividing the number of instructions written to the instruction queue
(INST_QUEUE_WRITES) by this counter yields the average number of
instructions decoded each cycle.
If this number is less than four and the pipe stalls, this indicates that
the decoder is failing to decode enough instructions per cycle to sustain
the 4-wide pipeline.
If SSE* instructions that are 6 bytes or longer arrive one after another,
then front-end throughput may limit execution speed.
In such a case,
.It Li LSD_OVERFLOW
.Pq Event 20H , Umask 01H
Number of loops that cannot stream from the instruction queue.
.It Li L2_RQSTS.LD_HIT
.Pq Event 24H , Umask 01H
Counts number of loads that hit the L2 cache.
L2 loads include both L1D demand misses and L1D prefetches.
L2 loads can be rejected for various reasons.
Only non-rejected loads are counted.
.It Li L2_RQSTS.LD_MISS
.Pq Event 24H , Umask 02H
Counts the number of loads that miss the L2 cache.
L2 loads include both L1D demand misses and L1D prefetches.
.It Li L2_RQSTS.LOADS
.Pq Event 24H , Umask 03H
Counts all L2 load requests.
L2 loads include both L1D demand misses and L1D prefetches.
.It Li L2_RQSTS.RFO_HIT
.Pq Event 24H , Umask 04H
Counts the number of store RFO requests that hit the L2 cache.
L2 RFO requests include both L1D demand RFO misses and L1D RFO prefetches.
The count includes WC memory requests, where the data is not fetched but
permission to write the line is required.
.It Li L2_RQSTS.RFO_MISS
.Pq Event 24H , Umask 08H
Counts the number of store RFO requests that miss the L2 cache.
L2 RFO requests include both L1D demand RFO misses and L1D RFO prefetches.
.It Li L2_RQSTS.RFOS
.Pq Event 24H , Umask 0CH
Counts all L2 store RFO requests.
L2 RFO requests include both L1D demand RFO misses and L1D RFO prefetches.
.It Li L2_RQSTS.IFETCH_HIT
.Pq Event 24H , Umask 10H
Counts number of instruction fetches that hit the L2 cache.
L2 instruction fetches include both L1I demand misses and L1I instruction
prefetches.
.It Li L2_RQSTS.IFETCH_MISS
.Pq Event 24H , Umask 20H
Counts number of instruction fetches that miss the L2 cache.
L2 instruction fetches include both L1I demand misses and L1I instruction
prefetches.
.It Li L2_RQSTS.IFETCHES
.Pq Event 24H , Umask 30H
Counts all instruction fetches.
L2 instruction fetches include both L1I demand misses and L1I instruction
prefetches.
.It Li L2_RQSTS.PREFETCH_HIT
.Pq Event 24H , Umask 40H
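Two ratios follow directly from the counters above: the average number of instructions written per instruction-queue write cycle (INST_QUEUE_WRITES divided by INST_QUEUE_WRITE_CYCLES) and the L2 load hit ratio. This Python sketch uses hypothetical helper names and made-up readings. It assumes LD_HIT plus LD_MISS approximates L2_RQSTS.LOADS, since Umask 03H is the union of Umasks 01H and 02H.

```python
def decode_rate(inst_queue_writes: int, inst_queue_write_cycles: int) -> float:
    """Average instructions written to the instruction queue per write cycle."""
    if inst_queue_write_cycles == 0:
        return 0.0
    return inst_queue_writes / inst_queue_write_cycles


def l2_load_hit_ratio(ld_hit: int, ld_miss: int) -> float:
    """Fraction of counted L2 loads that hit (LD_HIT over LD_HIT + LD_MISS)."""
    total = ld_hit + ld_miss
    if total == 0:
        return 0.0
    return ld_hit / total


# Hypothetical counter readings, not measurements.
print(decode_rate(3_000_000, 1_000_000))    # 3.0, below the 4-wide limit
print(l2_load_hit_ratio(750_000, 250_000))  # 0.75
```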
@ -421,26 +437,30 @@ Counts all L2 requests for both code and data.
.It Li L2_DATA_RQSTS.DEMAND.I_STATE
.Pq Event 26H , Umask 01H
Counts number of L2 data demand loads where the cache line to be loaded is
in the I (invalid) state, i.e. a cache miss.
L2 demand loads are both L1D demand misses and L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.S_STATE
.Pq Event 26H , Umask 02H
Counts number of L2 data demand loads where the cache line to be loaded is
in the S (shared) state.
L2 demand loads are both L1D demand misses and L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.E_STATE
.Pq Event 26H , Umask 04H
Counts number of L2 data demand loads where the cache line to be loaded is
in the E (exclusive) state.
L2 demand loads are both L1D demand misses and L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.M_STATE
.Pq Event 26H , Umask 08H
Counts number of L2 data demand loads where the cache line to be loaded is
in the M (modified) state.
L2 demand loads are both L1D demand misses and L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.MESI
.Pq Event 26H , Umask 0FH
Counts all L2 data demand requests.
L2 demand loads are both L1D demand misses and L1D prefetches.
.It Li L2_DATA_RQSTS.PREFETCH.I_STATE
.Pq Event 26H , Umask 10H
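Each L2_DATA_RQSTS.DEMAND umask above selects one MESI state, and the MESI entry's Umask 0FH is the bitwise OR of the four. The same pattern appears for Event 27H, where L2_WRITE.RFO.HIT uses Umask 0EH for the S, E and M states. This Python sketch shows how such combined umasks are formed. The table and `combine_umask` are hypothetical helpers.

```python
# Umask bits for Event 26H (L2_DATA_RQSTS.DEMAND.*), as listed above.
DEMAND_STATE_UMASK = {"I": 0x01, "S": 0x02, "E": 0x04, "M": 0x08}


def combine_umask(states: str) -> int:
    """OR together the umask bits for the given cache-line states."""
    umask = 0
    for state in states:
        umask |= DEMAND_STATE_UMASK[state]
    return umask


print(hex(combine_umask("MESI")))  # 0xf, as for L2_DATA_RQSTS.DEMAND.MESI
print(hex(combine_umask("SEM")))   # 0xe, lines present in the L2 (hits)
```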
@ -449,7 +469,8 @@ in the I (invalid) state, i.e. a cache miss.
.It Li L2_DATA_RQSTS.PREFETCH.S_STATE
.Pq Event 26H , Umask 20H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the S (shared) state.
A prefetch RFO will miss on an S state line, while a prefetch read will hit
on an S state line.
.It Li L2_DATA_RQSTS.PREFETCH.E_STATE
.Pq Event 26H , Umask 40H
@ -468,23 +489,27 @@ Counts all L2 data requests.
.It Li L2_WRITE.RFO.I_STATE
.Pq Event 27H , Umask 01H
Counts number of L2 demand store RFO requests where the cache line to be
loaded is in the I (invalid) state, i.e., a cache miss.
The L1D prefetcher does not issue an RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.S_STATE
.Pq Event 27H , Umask 02H
Counts number of L2 store RFO requests where the cache line to be loaded is
in the S (shared) state.
The L1D prefetcher does not issue an RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.M_STATE
.Pq Event 27H , Umask 08H
Counts number of L2 store RFO requests where the cache line to be loaded is
in the M (modified) state.
The L1D prefetcher does not issue an RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.HIT
.Pq Event 27H , Umask 0EH
Counts number of L2 store RFO requests where the cache line to be loaded is
in the S, E or M state.
The L1D prefetcher does not issue an RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.MESI
@ -536,21 +561,23 @@ is in the M (modified) state.
Counts all L1 writebacks to the L2.
.It Li L3_LAT_CACHE.REFERENCE
.Pq Event 2EH , Umask 02H
Counts uncore Last Level Cache references.
Because cache hierarchy, cache sizes and other characteristics are
implementation-specific, comparing values to estimate performance
differences is not recommended.
See Table A-1.
.It Li L3_LAT_CACHE.MISS
.Pq Event 2EH , Umask 01H
Counts uncore Last Level Cache misses.
Because cache hierarchy, cache sizes and other characteristics are
implementation-specific, comparing values to estimate performance
differences is not recommended.
See Table A-1.
.It Li CPU_CLK_UNHALTED.THREAD_P
.Pq Event 3CH , Umask 00H
Counts the number of thread cycles while the thread is not in a halt state.
The thread enters the halt state when it is running the HLT instruction.
The core frequency may change from time to time due to power or thermal
throttling.
See Table A-1.
.It Li CPU_CLK_UNHALTED.REF_P
@ -569,7 +596,8 @@ Counts cycles of page walk due to misses in the STLB.
.It Li DTLB_MISSES.STLB_HIT
.Pq Event 49H , Umask 10H
Counts the number of first-level DTLB misses that hit in the second-level
TLB.
This event is only relevant if the core contains multiple DTLB levels.
.It Li DTLB_MISSES.LARGE_WALK_COMPLETED
.Pq Event 49H , Umask 80H
Counts number of completed large page walks due to misses in the STLB.
@ -584,17 +612,22 @@ Counts number of hardware prefetch requests dispatched out of the prefetch
FIFO.
.It Li L1D_PREFETCH.MISS
.Pq Event 4EH , Umask 02H
Counts number of hardware prefetch requests that miss the L1D.
There are two prefetchers in the L1D:
a streamer, which predicts that lines sequentially after this one should be
fetched, and the IP prefetcher, which remembers access patterns for the
current instruction.
The streamer prefetcher stops on an L1D hit, while the IP prefetcher does not.
.It Li L1D_PREFETCH.TRIGGERS
.Pq Event 4EH , Umask 04H
Counts number of prefetch requests triggered by the Finite State Machine and
pushed into the prefetch FIFO.
Some of the prefetch requests are dropped due to overwrites or competition
between the IP index prefetcher and the streamer prefetcher.
The prefetch FIFO contains 4 entries.
.It Li EPT.WALK_CYCLES
.Pq Event 4FH , Umask 10H
Counts Extended Page walk cycles.
@ -626,31 +659,33 @@ Counts the number of cacheable load lock speculated or retired instructions
accepted into the fill buffer.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA
.Pq Event 60H , Umask 01H
Counts weighted cycles of offcore demand data read requests.
Does not include L2 prefetch requests.
Counter 0 only.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE
.Pq Event 60H , Umask 02H
Counts weighted cycles of offcore demand code read requests.
Does not include L2 prefetch requests.
Counter 0 only.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.RFO
.Pq Event 60H , Umask 04H
Counts weighted cycles of offcore demand RFO requests.
Does not include L2 prefetch requests.
Counter 0 only.
.It Li OFFCORE_REQUESTS_OUTSTANDING.ANY.READ
.Pq Event 60H , Umask 08H
Counts weighted cycles of offcore read requests of any kind.
Includes L2 prefetch requests.
Counter 0 only.
.It Li CACHE_LOCK_CYCLES.L1D_L2
.Pq Event 63H , Umask 01H
Counts the number of cycles during which the L1D and L2 are locked.
A lock is asserted when there is a locked memory access due to uncacheable
memory, a locked operation that spans two cache lines, or a page walk from
an uncacheable page table.
Counters 0 and 1 only.
L1D and L2 locks have a very high performance penalty, and it is highly
recommended to avoid such accesses.
.It Li CACHE_LOCK_CYCLES.L1D
.Pq Event 63H , Umask 02H
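A weighted-cycles event adds, in every cycle, the number of requests still outstanding. Dividing it by the matching request count (for example, OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA by OFFCORE_REQUESTS.DEMAND.READ_DATA) therefore gives an average latency in cycles. This Python sketch simulates that arithmetic on a made-up trace. The helper names are hypothetical.

```python
from typing import Sequence


def weighted_cycles(outstanding_per_cycle: Sequence[int]) -> int:
    """Each cycle adds the number of requests outstanding in that cycle."""
    return sum(outstanding_per_cycle)


def average_latency(weighted: int, requests: int) -> float:
    """Average cycles per request: weighted cycles divided by request count."""
    if requests == 0:
        return 0.0
    return weighted / requests


# Hypothetical trace: request A is outstanding in cycles 0-3, request B in
# cycles 2-5, so both are outstanding during cycles 2 and 3.
trace = [1, 1, 2, 2, 1, 1]
w = weighted_cycles(trace)    # 8
print(average_latency(w, 2))  # 4.0 cycles per request
```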
@ -665,9 +700,11 @@ Counts the number of completed I/O transactions.
Counts all instruction fetches that hit the L1 instruction cache.
.It Li L1I.MISSES
.Pq Event 80H , Umask 02H
Counts all instruction fetches that miss the L1I cache.
This includes instruction cache misses, streaming buffer misses, victim cache
misses and uncacheable fetches.
An instruction fetch miss is counted only once and not once for every cycle
it is outstanding.
.It Li L1I.READS
.Pq Event 80H , Umask 03H
@ -747,10 +784,10 @@ Counts all near call branches executed, but not necessarily retired.
Counts taken near branches executed, but not necessarily retired.
.It Li BR_INST_EXEC.ANY
.Pq Event 88H , Umask 7FH
Counts all executed near branches (not necessarily retired).
This includes only instructions and not micro-op branches.
Frequent branching is not necessarily a major performance issue.
However, frequent branch mispredictions may be a problem.
.It Li BR_MISP_EXEC.COND
.Pq Event 89H , Umask 01H
Counts the number of mispredicted conditional near branch instructions
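The text above separates frequent branching from frequent misprediction. Two common ways to quantify misprediction are the fraction of executed branches that were mispredicted (a BR_MISP_EXEC count over the matching BR_INST_EXEC count) and mispredictions per thousand retired instructions (using INST_RETIRED.ANY). This Python sketch uses hypothetical helper names and invented readings.

```python
def mispredict_ratio(mispredicted: int, executed: int) -> float:
    """Fraction of executed branches that were mispredicted."""
    return mispredicted / executed if executed else 0.0


def per_kilo_instructions(events: int, instructions: int) -> float:
    """Events per 1000 instructions, e.g. branch mispredictions (MPKI)."""
    return 1000 * events / instructions if instructions else 0.0


# Hypothetical readings: many branches, few mispredictions.
print(mispredict_ratio(20_000, 2_000_000))        # 0.01
print(per_kilo_instructions(20_000, 10_000_000))  # 2.0
```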
@ -791,9 +828,10 @@ Counts the number of mispredicted near branch instructions that were
executed, but not necessarily retired.
.It Li RESOURCE_STALLS.ANY
.Pq Event A2H , Umask 01H
Counts the number of Allocator resource-related stalls.
Includes register renaming buffer entries and memory buffer entries.
In addition to resource-related stalls, this event counts some other events.
This includes stalls arising during branch misprediction recovery, such as
when retirement of the mispredicted branch is delayed, and stalls arising
while the store buffer is draining from synchronizing operations.
@ -806,7 +844,8 @@ Counts the cycles of stall due to lack of load buffer for load operation.
.Pq Event A2H , Umask 04H
This event counts the number of cycles when the number of instructions in
the pipeline waiting for execution reaches the limit the processor can
handle.
A high count of this event indicates that there are long-latency
operations in the pipe (possibly load and store operations that miss the L2
cache, or instructions dependent upon instructions further down the pipeline
that have yet to retire).
@ -816,7 +855,8 @@ start execution.
.Pq Event A2H , Umask 08H
This event counts the number of cycles that a resource-related stall will
occur due to the number of store instructions reaching the limit of the
pipeline (i.e., all store buffers are used).
The stall ends when a store instruction commits its data to the cache or
memory.
.It Li RESOURCE_STALLS.ROB_FULL
.Pq Event A2H , Umask 10H
@ -828,7 +868,8 @@ floating-point unit (FPU) control word.
.It Li RESOURCE_STALLS.MXCSR
.Pq Event A2H , Umask 40H
Stalls due to the MXCSR register rename occurring too close to a previous
MXCSR rename.
The MXCSR provides control and status for SSE operations on the XMM
registers.
.It Li RESOURCE_STALLS.OTHER
.Pq Event A2H , Umask 80H
Counts the number of cycles while execution was stalled due to other
@ -839,12 +880,14 @@ Counts the number of instructions decoded that are macro-fused but not
necessarily executed or retired.
.It Li BACLEAR_FORCE_IQ
.Pq Event A7H , Umask 01H
Counts number of times a BACLEAR was forced by the Instruction Queue.
The IQ is also responsible for providing conditional branch prediction
direction based on a static scheme and dynamic data provided by the L2
Branch Prediction Unit.
If the conditional branch target is not found in the Target Array and the
IQ predicts that the branch is taken, then the IQ will force the Branch
Address Calculator to issue a BACLEAR.
Each BACLEAR asserted by the BAC generates an approximately 8-cycle bubble
in the instruction fetch pipeline.
.It Li LSD.UOPS
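Since each BACLEAR costs roughly an 8-cycle fetch bubble, multiplying the BACLEAR_FORCE_IQ count by 8 and dividing by unhalted cycles (for example, CPU_CLK_UNHALTED.THREAD_P) gives a rough upper estimate of the share of time lost. It is only an estimate: bubbles may overlap with other stalls, and the 8-cycle figure is approximate. The helper name and readings in this Python sketch are hypothetical.

```python
BACLEAR_BUBBLE_CYCLES = 8  # approximate, per the description above


def baclear_bubble_fraction(baclears: int, unhalted_cycles: int) -> float:
    """Rough share of unhalted cycles spent in BACLEAR fetch bubbles."""
    if unhalted_cycles == 0:
        return 0.0
    return baclears * BACLEAR_BUBBLE_CYCLES / unhalted_cycles


# Hypothetical readings: 50,000 BACLEARs over 10 million unhalted cycles.
print(baclear_bubble_fraction(50_000, 10_000_000))  # 0.04
```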
@ -856,22 +899,24 @@ Use cmask=1 and invert to count cycles
Counts the number of ITLB flushes.
.It Li OFFCORE_REQUESTS.DEMAND.READ_DATA
.Pq Event B0H , Umask 01H
Counts number of offcore demand data read requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.READ_CODE
.Pq Event B0H , Umask 02H
Counts number of offcore demand code read requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.RFO
.Pq Event B0H , Umask 04H
Counts number of offcore demand RFO requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.ANY.READ
.Pq Event B0H , Umask 08H
Counts number of offcore read requests.
Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.ANY.RFO
.Pq Event B0H , Umask 10H
Counts number of offcore RFO requests.
Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.L1D_WRITEBACK
.Pq Event B0H , Umask 40H
Counts number of L1D writebacks to the uncore.
@ -880,38 +925,42 @@ Counts number of L1D writebacks to the uncore.
Counts all offcore requests.
.It Li UOPS_EXECUTED.PORT0
.Pq Event B1H , Umask 01H
Counts number of Uops executed that were issued on port 0.
Port 0 handles integer arithmetic, SIMD and FP add Uops.
.It Li UOPS_EXECUTED.PORT1
.Pq Event B1H , Umask 02H
Counts number of Uops executed that were issued on port 1.
Port 1 handles integer arithmetic, SIMD, integer shift, FP multiply and
FP divide Uops.
.It Li UOPS_EXECUTED.PORT2_CORE
.Pq Event B1H , Umask 04H
Counts number of Uops executed that were issued on port 2.
Port 2 handles the load Uops.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT3_CORE
.Pq Event B1H , Umask 08H
Counts number of Uops executed that were issued on port 3.
Port 3 handles store Uops.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT4_CORE
.Pq Event B1H , Umask 10H
Counts number of Uops executed that were issued on port 4.
Port 4 handles the value to be stored for the store Uops issued on port 3.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5
.Pq Event B1H , Umask 1FH
Counts number of cycles in which one or more uops issued on ports 0-4 are
being executed.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT5
.Pq Event B1H , Umask 20H
Counts number of Uops executed that were issued on port 5.
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES
.Pq Event B1H , Umask 3FH
Counts number of cycles in which one or more uops are being executed on any
port.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT015
.Pq Event B1H , Umask 40H
Counts number of Uops executed that were issued on port 0, 1, or 5.
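For Event B1H, the per-port umasks 01H through 20H above map to ports 0 through 5. Umask 1FH therefore selects ports 0-4 (CORE_ACTIVE_CYCLES_NO_PORT5) and 3FH selects all six (CORE_ACTIVE_CYCLES). This Python sketch decodes that bitmap. `ports_in_umask` is a hypothetical helper, and it deliberately ignores bits such as 40H (PORT015) that select combined events rather than single ports.

```python
from typing import List


def ports_in_umask(umask: int) -> List[int]:
    """List the execution ports whose per-port bit is set in a B1H umask."""
    # Bits 0x01..0x20 correspond to ports 0..5 in the entries above; other
    # bits select combined events and are not decoded here.
    return [port for port in range(6) if umask & (1 << port)]


print(ports_in_umask(0x1F))  # [0, 1, 2, 3, 4]
print(ports_in_umask(0x3F))  # [0, 1, 2, 3, 4, 5]
```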
@ -924,15 +973,18 @@ Counts number of Uops executed that were issued on port 2, 3, or 4.
Counts number of cycles the SQ is full to handle off-core requests.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.DATA
.Pq Event B3H , Umask 01H
Counts weighted cycles of snoopq requests for data.
Counter 0 only.
Use cmask=1 to count cycles in which the queue is not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.INVALIDATE
.Pq Event B3H , Umask 02H
Counts weighted cycles of snoopq invalidate requests.
Counter 0 only.
Use cmask=1 to count cycles in which the queue is not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.CODE
.Pq Event B3H , Umask 04H
Counts weighted cycles of snoopq requests for code.
Counter 0 only.
Use cmask=1 to count cycles in which the queue is not empty.
.It Li SNOOPQ_REQUESTS.CODE
.Pq Event B4H , Umask 01H
@ -970,7 +1022,8 @@ Use MSR 01A7H.
See Table A-1.
Notes: INST_RETIRED.ANY is counted by a designated fixed counter.
INST_RETIRED.ANY_P is counted by a programmable counter and is an
architectural performance event.
The event is supported if CPUID.A.EBX[1] = 0.
Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not
count as retired instructions.
.It Li INST_RETIRED.X87
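The condition CPUID.A.EBX[1] = 0 means that bit 1 of EBX returned by CPUID leaf 0AH must be clear: a set bit marks the corresponding architectural event as unavailable. This Python sketch only performs the bit test on a value supplied by the caller. It does not execute CPUID, and it assumes the caller has already checked that the EBX bit vector is long enough (EAX[31:24] of the same leaf) to cover bit 1. The function name is hypothetical.

```python
def instructions_retired_supported(cpuid_0a_ebx: int) -> bool:
    """True if bit 1 of CPUID.0AH:EBX is clear (instructions retired available)."""
    return ((cpuid_0a_ebx >> 1) & 1) == 0


print(instructions_retired_supported(0x00))  # True
print(instructions_retired_supported(0x02))  # False: bit 1 is set
```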
@ -985,8 +1038,9 @@ Counts the number of retired MMX instructions.
.It Li UOPS_RETIRED.ANY
.Pq Event C2H , Umask 01H
Counts the number of micro-ops retired (macro-fused=1, micro-fused=2,
others=1; maximum count of 8 per cycle).
Most instructions are composed of one or two micro-ops.
Some instructions are decoded into longer sequences such as repeat
instructions, floating-point transcendental instructions, and assists.
Use cmask=1 to count active cycles, or cmask=1 with invert to count stalled
cycles.
@ -1006,7 +1060,8 @@ Counts the number of machine clears due to memory order conflicts.
.Pq Event C3H , Umask 04H
Counts the number of times that a program writes to a code section.
Self-modifying code causes a sever penalty in all Intel 64 and IA-32
processors. The modified cache line is written back to the L2 and L3caches.
processors.
The modified cache line is written back to the L2 and L3caches.
.It Li BR_INST_RETIRED.ANY_P
.Pq Event C4H , Umask 00H
See Table A-1.
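Each .Pq line in these entries gives an event select and a unit mask. Several entries also mention cmask and invert, for instance UOPS_RETIRED.ANY with cmask=1 and invert to count stalled cycles. With cmask set, the counter increments once per cycle in which the event count is at least cmask; invert flips that comparison. On these processors the fields are packed into an IA32_PERFEVTSELx MSR value. The sketch below packs them using the layout in the Intel SDM: event select 7:0, umask 15:8, USR 16, OS 17, edge 18, INT 20, EN 22, INV 23, CMASK 31:24. It only computes the number and does not program any hardware. The helper name is hypothetical, and tools such as pmcstat(8) normally perform this encoding themselves.

```python
# Bit positions in IA32_PERFEVTSELx (Intel SDM, architectural perfmon).
USR = 1 << 16   # count while CPL > 0
OS = 1 << 17    # count while CPL == 0
EDGE = 1 << 18  # count rising edges of the condition
INT = 1 << 20   # interrupt on counter overflow
EN = 1 << 22    # enable the counter
INV = 1 << 23   # invert the cmask comparison


def perfevtsel(event, umask, cmask=0, inv=False, edge=False,
               usr=True, os_=True, interrupt=False):
    """Pack event/umask/cmask and flag bits into an IA32_PERFEVTSELx value."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24) | EN
    if usr:
        value |= USR
    if os_:
        value |= OS
    if edge:
        value |= EDGE
    if interrupt:
        value |= INT
    if inv:
        value |= INV
    return value


# UOPS_RETIRED.ANY (Event C2H, Umask 01H) as listed above, then the same
# event with cmask=1 and invert, which counts cycles with no uops retired.
print(hex(perfevtsel(0xC2, 0x01)))                      # 0x4301c2
print(hex(perfevtsel(0xC2, 0x01, cmask=1, inv=True)))   # 0x1c301c2
```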
@ -1063,23 +1118,25 @@ cache.
.It Li MEM_LOAD_RETIRED.OTHER_CORE_L2_HIT_HITM
.Pq Event CBH , Umask 08H
Counts number of retired loads that hit in a sibling core's L2 (on die
core). Since the L3 is inclusive of all cores on the package, this is an L3
hit. This counts both clean or modified hits.
core).
Since the L3 is inclusive of all cores on the package, this is an L3 hit.
This counts both clean or modified hits.
.It Li MEM_LOAD_RETIRED.L3_MISS
.Pq Event CBH , Umask 10H
Counts number of retired loads that miss the L3 cache. The load was
satisfied by a remote socket, local memory or an IOH.
Counts number of retired loads that miss the L3 cache.
The load was satisfied by a remote socket, local memory or an IOH.
.It Li MEM_LOAD_RETIRED.HIT_LFB
.Pq Event CBH , Umask 40H
Counts number of retired loads that miss the L1D and the address is located
in an allocated line fill buffer and will soon be committed to cache. This
is counting secondary L1D misses.
in an allocated line fill buffer and will soon be committed to cache.
This is counting secondary L1D misses.
.It Li MEM_LOAD_RETIRED.DTLB_MISS
.Pq Event CBH , Umask 80H
Counts the number of retired loads that missed the DTLB. The DTLB miss is
not counted if the load operation causes a fault. This event counts loads
from cacheable memory only. The event does not count loads by software
prefetches. Counts both primary and secondary misses to the TLB.
Counts the number of retired loads that missed the DTLB.
The DTLB miss is not counted if the load operation causes a fault.
This event counts loads from cacheable memory only.
The event does not count loads by software prefetches.
Counts both primary and secondary misses to the TLB.
.It Li FP_MMX_TRANS.TO_FP
.Pq Event CCH , Umask 01H
Counts the first floating-point instruction following any MMX instruction.
@ -1087,15 +1144,15 @@ You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li FP_MMX_TRANS.TO_MMX
.Pq Event CCH , Umask 02H
Counts the first MMX instruction following a floating-point instruction. You
can use this event to estimate the penalties for the transitions between
Counts the first MMX instruction following a floating-point instruction.
You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li FP_MMX_TRANS.ANY
.Pq Event CCH , Umask 03H
Counts all transitions from floating point to MMX instructions and from MMX
instructions to floating point instructions. You can use this event to
estimate the penalties for the transitions between floating-point and MMX
technology states.
instructions to floating point instructions.
You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li MACRO_INSTS.DECODED
.Pq Event D0H , Umask 01H
Counts the number of instructions decoded, (but not necessarily executed or
@ -1105,14 +1162,15 @@ retired).
Counts the cycles of decoder stalls.
.It Li UOPS_DECODED.MS
.Pq Event D1H , Umask 02H
Counts the number of Uops decoded by the Microcode Sequencer, MS. The MS
delivers uops when the instruction is more than 4 uops long or a microcode
assist is occurring.
Counts the number of Uops decoded by the Microcode Sequencer, MS.
The MS delivers uops when the instruction is more than 4 uops long or a
microcode assist is occurring.
.It Li UOPS_DECODED.ESP_FOLDING
.Pq Event D1H , Umask 04H
Counts number of stack pointer (ESP) instructions decoded: push , pop , call
, ret, etc. ESP instructions do not generate a Uop to increment or decrement
ESP. Instead, they update an ESP_Offset register that keeps track of the
ESP.
Instead, they update an ESP_Offset register that keeps track of the
delta to the current value of the ESP register.
.It Li UOPS_DECODED.ESP_SYNC
.Pq Event D1H , Umask 08H
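According to the UOPS_DECODED.ESP_FOLDING description, push, pop, call, and ret do not generate a separate uop to adjust ESP. Instead, the decoder accumulates the adjustment in an ESP_Offset register. The toy model below illustrates only that bookkeeping idea. It assumes 32-bit mode (4-byte stack slots) and a made-up list of mnemonics. It is not a model of the real decoder, it ignores the synchronization counted by UOPS_DECODED.ESP_SYNC, and the names are hypothetical.

```python
# ESP deltas for stack instructions, assuming 32-bit mode (4-byte slots).
ESP_DELTA = {"push": -4, "pop": +4, "call": -4, "ret": +4}


def fold_esp(instructions):
    """Return (folded_count, esp_offset) for a list of mnemonics.

    Rather than emitting an ESP add/sub uop for each stack instruction,
    the toy model only counts the instruction and adds its delta to a
    single running offset.
    """
    folded = 0
    esp_offset = 0
    for mnemonic in instructions:
        delta = ESP_DELTA.get(mnemonic)
        if delta is not None:
            folded += 1
            esp_offset += delta
    return folded, esp_offset


print(fold_esp(["push", "push", "mov", "call", "ret", "pop"]))  # (5, -4)
```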
@ -1122,7 +1180,8 @@ value of the ESP register.
.It Li RAT_STALLS.FLAGS
.Pq Event D2H , Umask 01H
Counts the number of cycles during which execution stalled due to several
reasons, one of which is a partial flag register stall. A partial register
reasons, one of which is a partial flag register stall.
A partial register
stall may occur when two conditions are met: 1) an instruction modifies
some, but not all, of the flags in the flag register and 2) the next
instruction, which depends on flags, depends on flags that were not modified
@ -1135,28 +1194,34 @@ was partially written by previous instruction.
.It Li RAT_STALLS.ROB_READ_PORT
.Pq Event D2H , Umask 04H
Counts the number of cycles when ROB read port stalls occurred, which did
not allow new micro-ops to enter the out-of-order pipeline. Note that, at
not allow new micro-ops to enter the out-of-order pipeline.
Note that, at
this stage in the pipeline, additional stalls may occur at the same cycle
and prevent the stalled micro-ops from entering the pipe. In such a case,
and prevent the stalled micro-ops from entering the pipe.
In such a case,
micro-ops retry entering the execution pipe in the next cycle and the
ROB-read port stall is counted again.
.It Li RAT_STALLS.SCOREBOARD
.Pq Event D2H , Umask 08H
Counts the cycles where we stall due to microarchitecturally required
serialization. Microcode scoreboarding stalls.
serialization.
Microcode scoreboarding stalls.
.It Li RAT_STALLS.ANY
.Pq Event D2H , Umask 0FH
Counts all Register Allocation Table stall cycles due to: Cycles when ROB
read port stalls occurred, which did not allow new micro-ops to enter the
execution pipe. Cycles when partial register stalls occurred Cycles when
execution pipe.
Cycles when partial register stalls occurred Cycles when
flag stalls occurred Cycles floating-point unit (FPU) status word stalls
occurred. To count each of these conditions separately use the events:
occurred.
To count each of these conditions separately use the events:
RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and
RAT_STALLS.FPSW.
.It Li SEG_RENAME_STALLS
.Pq Event D4H , Umask 01H
Counts the number of stall cycles due to the lack of renaming resources for
the ES, DS, FS, and GS segment registers. If a segment is renamed but not
the ES, DS, FS, and GS segment registers.
If a segment is renamed but not
retired and a second update to the same segment occurs, a stall occurs in
the front- end of the pipeline until the renamed segment retires.
.It Li ES_REG_RENAMES
@ -1176,16 +1241,18 @@ or return branch.
.Pq Event E6H , Umask 01H
Counts the number of times the front end is resteered, mainly when the
Branch Prediction Unit cannot provide a correct prediction and this is
corrected by the Branch Address Calculator at the front end. This can occur
corrected by the Branch Address Calculator at the front end.
This can occur
if the code has many branches such that they cannot be consumed by the BPU.
Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble
in the instruction fetch pipeline. The effect on total execution time
depends on the surrounding code.
in the instruction fetch pipeline.
The effect on total execution time depends on the surrounding code.
.It Li BACLEAR.BAD_TARGET
.Pq Event E6H , Umask 02H
Counts number of Branch Address Calculator clears (BACLEAR) asserted due to
conditional branch instructions in which there was a target hit but the
direction was wrong. Each BACLEAR asserted by the BAC generates
direction was wrong.
Each BACLEAR asserted by the BAC generates
approximately an 8 cycle bubble in the instruction fetch pipeline.
.It Li BPU_CLEARS.EARLY
.Pq Event E8H , Umask 01H
@ -1195,7 +1262,8 @@ The BPU clear leads to 2 cycle bubble in the Front End.
.It Li BPU_CLEARS.LATE
.Pq Event E8H , Umask 02H
Counts late Branch Prediction Unit clears due to Most Recently Used
conflicts. The PBU clear leads to a 3 cycle bubble in the Front End.
conflicts.
The PBU clear leads to a 3 cycle bubble in the Front End.
.It Li THREAD_ACTIVE
.Pq Event ECH , Umask 01H
Counts cycles threads are active.
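The entries above give approximate front-end bubble sizes. Each Event E6H, Umask 01H BACLEAR costs about 8 cycles, each BPU_CLEARS.EARLY clear 2 cycles, and each BPU_CLEARS.LATE clear 3 cycles. Multiplying the counts by these figures gives a rough estimate of front-end bubble cycles. As the BACLEAR text says, the effect on total execution time depends on the surrounding code, so the result is not a measurement of cycles lost. In this sketch the counter readings and the cycle total are made-up numbers, and the function name is hypothetical.

```python
# Approximate bubble sizes quoted in the event descriptions.
BACLEAR_BUBBLE = 8     # Event E6H, Umask 01H: about 8 cycles per BACLEAR
BPU_EARLY_BUBBLE = 2   # BPU_CLEARS.EARLY: 2 cycles per clear
BPU_LATE_BUBBLE = 3    # BPU_CLEARS.LATE: 3 cycles per clear


def frontend_bubble_estimate(baclears, bpu_early, bpu_late):
    """Rough count of front-end bubble cycles implied by the three counters."""
    return (baclears * BACLEAR_BUBBLE
            + bpu_early * BPU_EARLY_BUBBLE
            + bpu_late * BPU_LATE_BUBBLE)


# Made-up counter readings and a made-up unhalted-cycle total.
bubbles = frontend_bubble_estimate(baclears=1_000, bpu_early=5_000, bpu_late=2_000)
total_cycles = 1_000_000
print(bubbles, f"{bubbles / total_cycles:.1%}")  # 24000 2.4%
```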
@ -1258,12 +1326,13 @@ Counts number of Super Queue LRU hints sent to L3.
Counts the number of SQ lock splits across a cache line.
.It Li SQ_FULL_STALL_CYCLES
.Pq Event F6H , Umask 01H
Counts cycles the Super Queue is full. Neither of the threads on this core
will be able to access the uncore.
Counts cycles the Super Queue is full.
Neither of the threads on this core will be able to access the uncore.
.It Li FP_ASSIST.ALL
.Pq Event F7H , Umask 01H
Counts the number of floating point operations executed that required
micro-code assist intervention. Assists are required in the following cases:
micro-code assist intervention.
Assists are required in the following cases:
SSE instructions, (Denormal input when the DAZ flag is off or Underflow
result when the FTZ flag is off): x87 instructions, (NaN or denormal are
loaded to a register or used as input from memory, Division by 0 or