Commit Graph

64 Commits

Author SHA1 Message Date
Matt Macy
e6b475e0af hwpmc(9): Make pmclog buffer pcpu and update constants
On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.

Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &

Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total

After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total

For more realistic overhead measurement set the sample rate for ~2khz
on a 2.1Ghz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &

Collecting 10 samples of `make -j96 buildkernel` from each:

x before
+ after

real time:
    N           Min           Max        Median           Avg        Stddev
x  10          76.4        127.62        84.845        88.577     15.100031
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        -28.398 +/- 10.0344
        -32.0602% +/- 7.69825%
        (Student's t, pooled s = 10.6794)

system time:
    N           Min           Max        Median           Avg        Stddev
x  10       2277.96       6948.53       2949.47      3341.492     1385.2677
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        -2277.47 +/- 920.425
        -68.1574% +/- 8.77623%
        (Student's t, pooled s = 979.596)

x no pmc
+ pmc running
real time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10          76.4        127.62        84.845        88.577     15.100031
Difference at 95.0% confidence
        29.73 +/- 10.0335
        50.5208% +/- 17.0525%
        (Student's t, pooled s = 10.6785)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        1.332 +/- 0.248939
        2.2635% +/- 0.426506%
        (Student's t, pooled s = 0.264942)

system time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10       2277.96       6948.53       2949.47      3341.492     1385.2677
Difference at 95.0% confidence
        2309.97 +/- 920.443
        223.937% +/- 89.3039%
        (Student's t, pooled s = 979.616)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        32.493 +/- 16.0042
        3.15% +/- 1.5794%
        (Student's t, pooled s = 17.0331)

Reviewed by:	jeff@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15155
2018-05-12 01:26:34 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Ruslan Bukin
07ff05c2ae o Support for Kabylake CPU PMCs (fall down to PMC_CPU_INTEL_SKYLAKE).
o Fix bugs in events descriptions for Skylake, Skylake Xeon and Haswell.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12654
2017-10-13 15:02:29 +00:00
Konstantin Belousov
b99b705d9c Skylake server core PMC support for hwpmc(4).
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
Hardware provided by:	Intel
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12221
2017-09-06 17:19:48 +00:00
George V. Neville-Neil
593b0c8420 Fix PMC architecture check to handle later IPAs including Skylake
Tested with tools/test/hwpmc/pmctest.py

Obtained from:	Oliver Pinter
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9036
2017-01-04 02:15:03 +00:00
John Baldwin
1f095f7051 Apply the fix from r232612 to fixed function counters.
Reviewed by:	emaste
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7397
2016-08-03 16:52:00 +00:00
Pedro F. Giffuni
8dfea46460 Remove slightly used const values that can be replaced with nitems().
Suggested by:	jhb
2016-04-21 15:38:28 +00:00
Konstantin Belousov
411c83ccd6 If full width writes to the performance monitoring counters are
supported, use full-width aliases MSRs for writes.  This fixes the
"[pmc,X] negative increment" assertion on the context switch when
clipped counter value is sign-extended.

Add definitions for the MSR IA32_PERF_CAPABILITIES needed to detect
the feature.

PR:	207068
Submitted by:	joss.upton@yahoo.com
MFC after:	2 weeks
2016-02-12 07:27:24 +00:00
Randall Stewart
34d659d314 More fixes in the various intel processors, fixing missing
IAP_F_FM's as well as incorrect umask specifications for
some of the new Broadwell/Skylake PMC's. Also silvermont
had a *lot* of missing IAP_F_FM.

Sponsored by:	Netflix Inc.
2015-12-11 01:21:32 +00:00
Randall Stewart
f19bae413c Add support for Intel Skylake and Intel Broadwell PMC's. The Broadwell PMC's have been
tested on the Broadwell-Xeon with a hacked up version of pmcstudy -T. I still need
to circle back and add in to pmcstudy all the new tests from the Broadwell Vtune
guide (for the hacked up version I just made it so I could run the -T option). The
Skylake CPU is not yet available (even though Intel is advertising it .. imagine that).
The Skylake PMC's will need to be tested once we can get a sample skylake CPU :-)

Sponsored by: Netflix Inc.
2015-11-30 17:35:49 +00:00
John Baldwin
a1febbf667 Fix two bugs that could result in PMC sampling effectively stopping.
In both cases, the the effect of the bug was that a very small positive
number was written to the counter. This means that a large number of
events needed to occur before the next sampling interrupt would trigger.
Even with very frequently occurring events like clock cycles wrapping all
the way around could take a long time. Both bugs occurred when updating
the saved reload count for an outgoing thread on a context switch.

First, the counter-independent code compares the current reload count
against the count set when the thread switched in and generates a delta
to apply to the saved count. If this delta causes the reload counter
to go negative, it would add a full reload interval to wrap it around to
a positive value. The fix is to add the full reload interval if the
resulting counter is zero.

Second, occasionally the raw counter value read during a context switch
has actually wrapped, but an interrupt has not yet triggered. In this
case the existing logic would return a very large reload count (e.g.
2^48 - 2 if the counter had overflowed by a count of 2). This was seen
both for fixed-function and programmable counters on an E5-2643.
Workaround this case by returning a reload count of zero.

PR:		198149
Differential Revision:	https://reviews.freebsd.org/D2557
Reviewed by:	emaste
MFC after:	1 week
Sponsored by:	Norse Corp, Inc.
2015-05-19 19:15:19 +00:00
John Baldwin
2b1df86c17 Use the proper mask when reloading sampling PMCs for Core CPUs.
Differential Revision:	https://reviews.freebsd.org/D2492
Reviewed by:	emaste
MFC after:	1 month
2015-05-19 19:01:22 +00:00
John Baldwin
4a3690dfa1 Convert hwpmc(4) debug printfs over to KTR.
Differential Revision:	https://reviews.freebsd.org/D2487
Reviewed by:	davide, emaste
MFC after:	2 weeks
Sponsored by:	Norse Corp, Inc.
2015-05-08 19:40:00 +00:00
Rui Paulo
bc3464096a hwpmc: add initial Intel Broadwell support.
The full list of aliases and events will follow in a subsequent
commit.

MFC after:	1 month
2015-04-05 05:14:20 +00:00
Rui Paulo
03a24b7026 Remove whitespace. 2015-04-05 05:09:38 +00:00
Ryan Stone
67e51766bd hwpmc: Fix event number to match enum name
Differential revision:	https://reviews.freebsd.org/D1592
Reviewed by:	Joseph Kong
MFC after:	1 month
2015-03-12 23:44:28 +00:00
Ryan Stone
1761699194 Add missing counter definitions
Differential Revision:	https://reviews.freebsd.org/D1591
MFC after:	1 month
Sponsored by:	Sandvine Inc
2015-03-10 01:24:16 +00:00
Ryan Stone
bc0ad9a99d Fix Ivy Bridge+ MEM_UOPS_RETIRED counters
The MEM_UOPS_RETIRED actually work the same way as the Sandy
Bridge counters, but the counters were documented in a different
way and that seemed to cause the Ivy Bridge counters to be
implemented incorrectly.  Use the same counter definitions as
Sandy Bridge.  While I'm here, rename the counters to match
what's documented in the datasheet.

Differential Revision:	https://reviews.freebsd.org/D1590
MFC after:	1 month
Sponsored by:	Sandvine Inc.
2015-03-10 01:24:08 +00:00
Ryan Stone
9e60f3acd2 Fix Sandy Bridge+ hwpmc branch counters
On Sandy Bridge and later, to count branch-related events you
have to or together a mask indicating the type of branch
instruction to count (e.g. direct jump, branch, etc) and a bits
indicating whether to count taken and not-taken branches.  The
current counter definitions where defining this bits individually,
so the counters never worked and always just counted 0.

Fix the counter definitions to instead contain the proper
combination of masks.  Also update the man pages to reflect the
new counters.

Differential Revision:	https://reviews.freebsd.org/D1587
MFC after:	1 month
Sponsored by:	Sandvine Inc.
2015-03-10 01:23:47 +00:00
Ryan Stone
89d0633b86 Fix pmc unit restrictions to match documentation
A couple of pmc counters did not work because there were being
restricted to the wrong PMC unit.  I've verified that these
counters now work and match the documented restrictions.

Differential Revision:	https://reviews.freebsd.org/D1586
MFC after:	1 month
Sponsored by:	Sandvine Inc
2015-03-10 01:23:40 +00:00
Ryan Stone
6a429fa5d7 style(9) cleanup 2015-01-22 03:56:23 +00:00
Randall Stewart
d95b3509e1 Update the hwpmc driver to have the new type HASWELL_XEON. Also
go back through HASWELL, IVY_BRIDGE, IVY_BRIDGE_XEON and SANDY_BRIDGE
to straighten out all the missing PMCs. We also add a new pmc tool
pmcstudy, this allows one to run the various formulas from
the documents "Using Intel Vtune Amplifier XE on XXX Generation platforms" for
IB/SB and Haswell. The tool also allows one to postulate your own
formulas with any of the various PMC's. At some point I will enahance
this to work with Brendan Gregg's flame-graphs so we can flamegraph
various PMC interactions. Note the manual page also needs some
work (lots of work) but gnn has committed to help me with that ;-)
Reviewed by: gnn
MFC after:1 month
Sponsored by:	Netflix Inc.
2015-01-14 12:46:58 +00:00
Bjoern A. Zeeb
c896236487 Since introducing the extra mapping in r250103 for architectural performance
events we have actually counted 'Branch Instruction Retired' when people
asked for 'Unhalted core cycles' using the 'unhalted-core-cycles' event mask
mnemonic.

Reviewed by:	jimharris
Discussed with:	gnn, rwatson
MFC after:	3 days
Sponsored by:	DARPA/AFRL
2014-10-07 18:00:34 +00:00
Konstantin Belousov
49fe48ab0c For Xeon 7500 and 48XX (Nehalem EX and Westmere EX) variants of the
Core i7 and Westmere processors, the uncore PMC subsystem is
completely different from the uncore PMC on smaller versions of CPUs.
Disable existing uncore hwpmc code for EX, otherwise non-existing MSRs
are accessed.

The cores PMCs seems to be identical for non-EX and EX, according to
the SDM.

Reviewed by:	davide, fabient
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-06-04 16:06:38 +00:00
Hiren Panchasara
e8f021a3f7 Update hwpmc to support core events for Atom Silvermont microarchitecture.
(Model 0x4D as per Intel document 330061-001 01/2014)

Tested by:	Olivier Cochard-Labbe <olivier@cochatrd.me>
MFC after:	4 weeks
2014-03-20 20:51:08 +00:00
Eitan Adler
7793787339 Fix pointer type in call to malloc
Submitted by:	Meyer, Conrad conrad.meyer@isilon.com
2014-03-13 16:51:01 +00:00
John Baldwin
e07ef9b0f6 Move <machine/apicvar.h> to <x86/apicvar.h>. 2014-01-23 20:10:22 +00:00
Attilio Rao
026346c8f1 o Remove assertions on ipa_version as sometimes the version detection
using cpuid can be quirky (this is the case of VMWare without the
  vPMC support) but fail to probe hwpmc.
o Apply the fix for XEON family of processors as established by
  315338-020 document (bug AJ85).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	fabient
2013-12-20 14:03:56 +00:00
Adrian Chadd
cb42cde6ec Fix a >80 character long line, introduced in my previous commit.
Noticed by: hiren
2013-08-25 12:02:20 +00:00
Adrian Chadd
c55e621eed Update the MEM_UOP_RETIRED PMC operation for sandy bridge and sandy
bridge Xeon.

Summary: These are PEBS events but they're also available as normal
counter/sample events.  The source table (Table 19-2) lists the
base versions (LOAD, STLB_MISS, SPLIT, ALL) but it says they must
be qualified with other values.  This particular commit fleshes
out those umask values.

Source:

* Linux; SDM June 2013, Volume 3B, Table 19-2 and 18-21.

Tested:

* Sandy Bridge (non-Xeon)
2013-08-25 02:07:28 +00:00
Adrian Chadd
f7cd52247d Add in missing events for Sandy Bridge Xeon.
* Add in MEM_LOAD_UOPS_LLC_HIT_RETIRED for both sandy bridge and sandy
  bridge Xeon.  Right now it only is enabled for Sandy Bridge.
* D2/0F is actually a combination rather than a separate counter, so
  just flip that on for the CPU types that support it.

There's an errata for using this on SB Xeon hardware - I've documented
it in kern/181346.

Tested:

* Sandy Bridge
* Sandy Bridge Xeon

Sponsored by:	Netflix, Inc.
2013-08-18 06:08:52 +00:00
Davide Italiano
31d7f0b7b0 Suppress a GCC warning. This warning is actually bogus and newer GCC
versions than the one in base (dim@ mentioned he tried on 4.7.3 and 4.8.1)
do not whine about it, so, at some point this workaround will be reverted.

Reported by:	ache
Discussed with:	dim
2013-05-02 14:55:21 +00:00
Davide Italiano
38179e6e76 The Intel PMC architectural events have encodings which are identical to
those of some non-architectural core events. This is not a problem in the
general case as long as there's an 1:1 mapping between the two, but there
are few exceptions. For example, 3CH_01H on Nehalem/Westmere represents
both unhalted-reference-cycles and CPU_CLK_UNHALTED.REF_P.
CPU_CLK_UNHALTED.REF_P on the aforementioned architectures does not measure
reference (i.e. bus) but TSC, so there's the need to disambiguate.
In order to avoid the namespace collision rename all the architectural
events in a way they cannot be ambigous and refactor the architectural
events handling function to reflect this change.
While here, per Jim Harris request, rename
iap_architectural_event_is_unsupported() to iap_event_is_architectural().

Discussed with:	jimharris
Reviewed by:	jimharris, gnn
2013-04-30 15:31:45 +00:00
Davide Italiano
9dc0d11127 Fixup Westmere hwpmc(4) support: add missing CPU flag so that
intrucion-retired, llc-misses and llc-reference events can now be
allocated.

Reviewed by:	jimharris, gnn
2013-04-30 08:18:08 +00:00
Hiren Panchasara
c851725506 Improve/correct a comment. We now support a lot more cpu types.
PR:	kern/177496
Approved by:	sbruno (mentor)
2013-04-14 02:26:12 +00:00
Ryan Stone
c83c365008 Cosmetic change: make a comment reference Sandy Bridge *Xeon*
Reviewed by:	sbruno
MFC after:	1 week
2013-04-12 20:43:14 +00:00
Sean Bruno
4b226201da Trailing whitespace cleanup along with 80 column enforcemnt.
Submitted by:	hiren.panchasara@gmail.com
Reviewed by:	sbruno@freebsd.org
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-04-03 21:34:35 +00:00
Sean Bruno
cc0c1555d3 Update hwpmc to support Haswell class processors.
0x3C:      /* Per Intel document 325462-045US 01/2013. */

Add manpage to document all the goodness that is available in this
processor model.

Submitted by:	hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by:	jimharris, sbruno
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-03-28 19:15:54 +00:00
Sean Bruno
3f929d8cdd Update hwpmc to support the Xeon class of Ivybridge processors.
case 0x3E:      /* Per Intel document 325462-045US 01/2013. */

Add manpage to document all the goodness that is available in this
processor model.

No support for uncore events at this time.

Submitted by:	hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by:	davide, jimharris, sbruno
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-01-31 22:09:53 +00:00
Sean Bruno
fabe02f5f3 Update hwpmc to support the Xeon class of Sandybridge processors.
(Model 0x2D     /* Per Intel document 253669-044US 08/2012. */)

Add manpage to document all the goodness that is available in this
processor model.

No support for uncore events at this time.

Submitted by:	hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by:	jimharris@ fabient@
Obtained from:	Yahoo! Inc.
MFC after:	  2 weeks
2012-10-19 17:01:27 +00:00
Fabien Thomas
35db642499 Complete and merge the list between Sandy/Ivy bridge of events
that can run on specific PMC.

MFC after:	1 month
2012-09-07 14:45:59 +00:00
Fabien Thomas
1e862e5ad0 Add Intel Ivy Bridge support to hwpmc(9).
Update offcore RSP token for Sandy Bridge.
Note: No uncore support.

Will works on Family 6 Model 3a.

MFC after: 1 month
Tested by: bapt, grehan
2012-09-06 13:54:01 +00:00
George V. Neville-Neil
03963ef8d1 Fix so that ,usr and ,os work correctly with fixed function (IAF)
counters.

MFC after:	1 week
2012-05-02 16:23:36 +00:00
Fabien Thomas
f5f9340b98 Add software PMC support.
New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).

Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.

Sponsored by: NETASQ
MFC after:	1 month
2012-03-28 20:58:30 +00:00
Oleksandr Tymoshenko
ef902782ef Fix crash on VirtualBox (and probably on some real hardware):
- Do not cover error returned by pmc_core_initialize with the
    result of pmc_uncore_initialize, fail right away.
- Give a user something to report instead failing silently

Reported by:	Alexandr Kovalenko <never@nevermind.kiev.ua>
2012-03-27 18:22:14 +00:00
George V. Neville-Neil
a7ef8bbb2f Properly mask off bits that are not supported in the IAP counters.
This fixes a bug where users would see massively large counts, near
to 2**64 -1, due to the bits not being cleared.

MFC after:	3 weeks
2012-03-06 17:17:03 +00:00
Davide Italiano
78d763a29b - Add support for the Intel Sandy Bridge microarchitecture (both core and uncore counting events)
- New manpages with event lists.
- Add MSRs for the Intel Sandy Bridge microarchitecture

Reviewed by:	attilio, brueffer, fabient
Approved by:	gnn (mentor)
MFC after:	3 weeks
2012-03-01 21:23:26 +00:00
Fabien Thomas
ba89031aea Update PMC events from October 2011 Intel documentation.
Submitted by:	Davide Italiano <davide.italiano@gmail.com>
MFC after:	3 days
2012-01-04 07:58:36 +00:00
Eitan Adler
53113e4db9 - Remove extra space
Submitted by:	Davide Italiano <davide.italiano@gmail.com>
Approved by:	brucec
2011-12-21 17:51:49 +00:00
Fabien Thomas
2dde521a9a There's a small set of events on Nehalem, that are not supported in
processors with CPUID signature 06_1AH, 06_1EH, and 06_1FH.

Refuse to allocate them on unsupported model.

Submitted by:	Davide Italiano <davide.italiano@gmail.com>
MFC after:	1 month
2011-12-12 13:12:55 +00:00