Commit Graph

395 Commits

Author SHA1 Message Date
Konstantin Belousov
22d7708455 hwpmc/core: Adapt to upcoming Skylake TSX errata.
The forthcoming microcode update will fix a TSX bug by clobbering PMC3
when TSX instructions are executed (even speculatively).  There is an
alternate mode where CPU executes all TSX instructions by aborting
them, in which case PMC3 is still available to OS.  Any code that
correctly uses TSX must be ready to handle abort anyway.

Since it is believed that FreeBSD population of hwpmc(4) users is
significantly larger than the population of TSX users, switch the
microcode into TSX abort mode whenever a pmc is allocated, and back to
bug avoidance mode when the last pmc is deallocated.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-03-12 19:33:25 +00:00
Konstantin Belousov
45a2d058d2 Remove useless version check.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-03-12 18:57:11 +00:00
Mark Johnston
8f77f60f94 hwpmc: Plug memory disclosures from PMC_OP_{GETPMCINFO,GETCPUINFO}.
admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
MFC after:	1 day
Security:	Kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 18:07:02 +00:00
Matt Macy
acf50a7f68 hwpmc: limit wait for user callchain collection to 1 tick
The hwpmc pcpu sample buffer is prone to head of line blocking
when waiting for user process to return to user space and
collect a pending callchain. If more than one tick has elapsed
between the time the sample entry was marked for collection and
the time that the hardclock pmc handler runs to copy the records
to a larger temporary buffer, mark the sample entry as not in
use.

This change reduces the number of samples marked as not valid
when collecting under load from ~99.5% to 5-20%.

Reported by:	mjg@
MFC after:	3 days
2018-11-05 08:11:16 +00:00
Matt Macy
dacc43df34 Add additional counter descriptions to AMD 0x17
Submitted by:	Somalapuram Amaranath
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17401
2018-11-04 06:24:27 +00:00
Mark Johnston
9978bd996b Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain.  The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted.  Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with:	gallatin, jeff
Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17418
2018-10-30 18:26:34 +00:00
Matt Macy
d9f1b8dbf2 hwpmc: Refactor sample ring buffer handling to fix races
Refactor sample ring buffer handling to make it more robust to
long-running callchain collection handling.

r338112 introduced a (now fixed) regression that exposed a number of race
conditions within the management of the sample buffers. This
simplifies the handling and moves the decision to overwrite a
callchain sample that has taken too long out of the NMI into the
hardclock handler. With this change the problem no longer shows up as a
ring corruption but as the code spending all of its time in callchain
collection.

- Makes the producer / consumer index incrementing monotonic, making it
  easier (for me at least) to reason about.
- Moves the decision to overwrite a sample from NMI context to interrupt
  context where we can enforce serialization.
- Puts a time limit on waiting to collect a user callchain - putting a
  bound on head-of-line blocking causing samples to be dropped
- Removes the flush routine which was previously needed to purge
  dangling references to the pmc from the sample buffers but now is only
  a source of a race condition on unload.

Previously one could lock up or crash HEAD by running:
pmcstat -S inst_retired.any_p -T and then hitting ^C

After this change it is no longer possible.

PR:	231793
Reviewed by:	markj@
Approved by:	re (gjb@)
Differential Revision:	https://reviews.freebsd.org/D17011
2018-10-05 05:55:56 +00:00
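
The monotonic producer/consumer indexing described in the entry above can be illustrated with a minimal, single-threaded sketch. This is not the hwpmc code: the struct, the function names, and RING_SIZE are all hypothetical, and the real driver additionally has to deal with NMI and interrupt context. The idea shown is only that both indices count up forever and are masked when a slot is addressed, so "full" and "empty" are plain subtractions.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical size; must be a power of two */

/* Hypothetical ring: indices only ever increase, never wrap to 0. */
struct sample_ring {
	uint64_t prod;			/* total samples ever produced */
	uint64_t cons;			/* total samples ever consumed */
	int	 slots[RING_SIZE];
};

/* Store one sample; returns false when the ring is full. */
static bool
ring_put(struct sample_ring *r, int v)
{
	if (r->prod - r->cons == RING_SIZE)
		return (false);
	r->slots[r->prod & (RING_SIZE - 1)] = v;
	r->prod++;
	return (true);
}

/* Fetch the oldest sample; returns false when the ring is empty. */
static bool
ring_get(struct sample_ring *r, int *v)
{
	if (r->prod == r->cons)
		return (false);
	*v = r->slots[r->cons & (RING_SIZE - 1)];
	r->cons++;
	return (true);
}
```

Because prod - cons is always the number of occupied slots, there is no ambiguous "head == tail" state to reason about, which may be what the commit message means by the indices being easier to reason about.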
Matt Macy
0204d85a62 hwpmc: set default rate if event description lacks one / filter rate against misuse
Not all event descriptions have a sample rate (such as inst_retired.any);
this restores the legacy behavior of using 65536 in that case. It also
prevents accidental API misuse that could lead to a panic.

PR:	230985
Reported by:	markj
Reviewed by:	markj
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16958
2018-09-14 01:30:05 +00:00
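
A rough sketch of the fallback described in the entry above, assuming a rate of 0 means "the event description supplied none". The default of 65536 comes from the commit message; the function name and the MIN_SAMPLE_RATE floor are hypothetical stand-ins, since the message does not say how the rate is actually filtered.

```c
#include <stdint.h>

#define DEFAULT_SAMPLE_RATE	65536u	/* legacy default named in the commit message */
#define MIN_SAMPLE_RATE		1000u	/* hypothetical floor, for illustration only */

/* Pick the rate to program; 0 is taken to mean "no rate supplied". */
static uint64_t
effective_rate(uint64_t requested)
{
	if (requested == 0)
		return (DEFAULT_SAMPLE_RATE);
	if (requested < MIN_SAMPLE_RATE)
		return (MIN_SAMPLE_RATE);	/* clamp instead of trusting the caller */
	return (requested);
}
```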
Matt Macy
81eb4dcf9e Add library and kernel support for AMD Family 17h counters
NB: lacks default sample rate for most counters
2018-08-14 05:18:43 +00:00
Warner Losh
c81b12e0d7 Revert r336773: it removed too much.
r336773 removed all things xscale. However, some things xscale are
really armv5. Revert that entirely. A more modest removal will follow.

Noticed by: andrew@
2018-07-27 21:25:01 +00:00
Warner Losh
626930c2fd Remove xscale support
The OLD XSCALE stuff hasn't been useful in a while. The original
committer (cognet@) was the only one that had boards for it. He's
blessed this removal. Newer XSCALE (GUMSTIX) is for hardware that's
quite old. After discussion on arm@, it was clear there was no support
for keeping it.

Differential Review: https://reviews.freebsd.org/D16313
2018-07-27 18:33:09 +00:00
Eitan Adler
33f4bccaa6 Use https over http for FreeBSD pages 2018-07-27 10:40:48 +00:00
Warner Losh
ff9452772d Remove kernel support for armeb
Remove all the big-endian arm architectures (ixp425 and ixp435)
support in the kernel and associated drivers.

Differential Revision:  https://reviews.freebsd.org/D16257
2018-07-17 23:23:45 +00:00
Matt Macy
72ac73fa46 hwpmc: remove hacks to work around incorrect pc_domain 2018-07-06 06:21:24 +00:00
Matt Macy
6573d7580b epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
Mark Johnston
a9336cef39 Use the cached curthread reference in pmc_process_interrupt().
Fix indentation while here.
2018-06-11 16:27:09 +00:00
Matt Macy
4f63fbc955 hwpmc: remove dangling references to hwpmc_xscale
Reported by:	mjg
2018-06-08 20:39:49 +00:00
Matt Macy
7bca795ee0 hwpmc: retire never completed xscale support
hwpmc xscale support is not actually functional and the
architecture is well past its shelf life.
2018-06-08 18:09:19 +00:00
Matt Macy
d73912e57a hwpmc: update files missed by r334827 2018-06-08 17:41:49 +00:00
Matt Macy
7f5336f666 hwpmc: fix arm64 INVARIANTS build 2018-06-08 05:48:28 +00:00
Matt Macy
978910109d hwpmc: avoid undefined variable on LINT 2018-06-08 05:01:09 +00:00
Matt Macy
eb7c901995 hwpmc: simplify calling convention for hwpmc interrupt handling
pmc_process_interrupt takes 5 arguments when only 3 are needed.
cpu is always available in curcpu and inuserspace can always be
derived from the passed trapframe.

While facially a reasonable cleanup, this change was motivated
by the need to work around a compiler bug.

core2_intr(cpu, tf) ->
  pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
    pmc_add_sample(cpu, ring, pm, tf, inuserspace)

In the process of optimizing the tail call the tf pointer was getting
clobbered:

(kgdb) up
    at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
4709                                pmc_save_kernel_callchain(ps->ps_pc,
(kgdb) up
1205                    error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,

resulting in a crash in pmc_save_kernel_callchain.
2018-06-08 04:58:03 +00:00
Matt Macy
9616acde97 hwpmc: don't do EMIT64 on constant 2018-06-07 02:20:27 +00:00
Matt Macy
f992dd4b5c pmc: convert native to jsonl and track TSC value of samples
- add '-j' option to filter to enable converting native pmc
  log format to json lines format to enable the use of scripts
  and external tooling

% pmc filter -j pmc.log pmc.jsonl

- Record the tsc value in sampling interrupts as opposed to
  recording nanotime when the sample is copied to a global log
  in hardclock - potentially many milliseconds later.

- At initialization, record the tsc_freq and the time of day to give
  us an offset for translating the tsc values in callchain records
2018-06-07 02:03:22 +00:00
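
The translation that the entry above alludes to (TSC value plus a recorded frequency and time-of-day base) could look roughly like the following. This is a sketch under assumptions, not the pmc tool's code: struct tsc_base and tsc_to_ns are hypothetical names, and a constant, invariant TSC frequency is assumed.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical record of what the log captures at initialization. */
struct tsc_base {
	uint64_t tsc_freq;	/* TSC ticks per second */
	uint64_t base_tsc;	/* TSC value when the log was initialized */
	uint64_t base_ns;	/* time of day, in ns, at that same moment */
};

/* Translate a sample's TSC value into nanoseconds of wall-clock time. */
static uint64_t
tsc_to_ns(const struct tsc_base *b, uint64_t tsc)
{
	uint64_t delta = tsc - b->base_tsc;
	/* Split into whole seconds and remainder to avoid 64-bit overflow. */
	uint64_t secs = delta / b->tsc_freq;
	uint64_t rem = delta % b->tsc_freq;

	return (b->base_ns + secs * NS_PER_SEC +
	    rem * NS_PER_SEC / b->tsc_freq);
}
```

Splitting the delta keeps the intermediate product below 2^64 for any realistic frequency, whereas multiplying the full delta by 10^9 would overflow after only a few seconds of samples.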
Matt Macy
41abd7afa3 hwpmc: don't log pid->name more than once 2018-06-07 00:54:43 +00:00
Matt Macy
b2ca2e50b9 hwpmc: add summary command and further metadata extensions
metadata changes:
- log pmc sample rate with pmcallocate
- log proc flags with thread / process logging
  to identify user vs kernel threads

fixes:
- use log cpuid to translate event id to event name

Implement rudimentary summary command to track sample
counts by thread and process name within a pmc log.

% make -j4 buildkernel >& /dev/null &
% sudo pmcstat -S unhalted_core_cycles -S llc-misses -O foo sleep 15
% pmc summary foo
cpu_clk_unhalted.thread_p_any:
        idle: 138108207162
        clang-6.0: 105336158004
        sh: 72340108510
        make: 8642012963
        kernel: 7754011631
longest_lat_cache.miss:
        clang-6.0: 87502625
        sh: 40901227
        make: 5500165
        kernel: 3300099
        awk: 2000060

%  pmc summary -f ~/foo
idx: 278 name: cpu_clk_unhalted.thread_p_any rate: 2000003
idle: 69054
clang-6.0: 52668
sh: 36170
make: 4321
kernel: 3877
hwpmc: proc(7445): 3319
awk: 1289
xargs: 357
rand_harvestq: 181
mtree: 102
intr: 53
zfskern: 31
usb: 7
pagedaemon: 4
ntpd: 3
syslogd: 1
acpi_thermal: 1
logger: 1
syncer: 1
snmptrapd: 1
sleep: 1
idx: 17 name: longest_lat_cache.miss rate: 100003
clang-6.0: 875
sh: 409
make: 55
kernel: 33
awk: 20
hwpmc: proc(7445): 14
xargs: 9
idle: 8
intr: 3
zfskern: 2
2018-06-06 02:48:09 +00:00
Matt Macy
ebfaf69cc0 hwpmc: log name->pid, name->tid mappings
By logging all threads and processes 'pmc filter'
can now filter on process or thread name, relieving
the user of the burden of determining which tid or
pid was which when the sample was taken.

% pmc filter -T if_io_tqg -P nginx pmc.log pmc-iflib.log

% pmc filter -x -T idle pmc.log pmc-noidle.log
2018-06-05 04:26:40 +00:00
Matt Macy
ac7012d284 hwpmc: don't defer user callchain capture completion to ast 2018-06-04 21:17:30 +00:00
Matt Macy
cf823003a7 hwpmc: remove gratuitous curthread checks 2018-06-04 17:49:34 +00:00
Matt Macy
9645bcabdf hwpmc: fix fixed counters checks 2018-06-04 04:49:06 +00:00
Matt Macy
07d80fd8dc hwpmc: ABI fixes
- increase pmc cpuid field from 8 to 12 bits
- add cpuid version string to initialize entry in the log
  so that filter can identify which counter index an
  event name maps to
- GC unused config flags
- make fixed counter assignment more robust as well as the
  changes needed to be properly identified for filter
2018-06-04 02:05:48 +00:00
Matt Macy
5de96e33d6 hwpmc: support sampling both kernel and user stacks when interrupted in kernel
This adds the -U option to pmcstat which will attribute in-kernel samples
back to the user stack that invoked the system call. It is not the default,
because when looking at kernel profiles it is generally more desirable to
merge all instances of a given system call together.

Although heavily revised, this change is directly derived from D7350 by
Jonathan T. Looney.

Obtained from: jtl
Sponsored by: Juniper Networks, Limelight Networks
2018-06-04 01:10:23 +00:00
Matt Macy
2ce69a4d04 hwpmc: ensure that mapin updates are synchronous 2018-06-03 19:37:17 +00:00
Matt Macy
e92a1350b5 hwpmc: remove unused pre-table driven bits for intel
Intel now provides comprehensive tables for all performance counters
and the various valid configuration permutations as text .json files.
Libpmc has been converted to use these and hwpmc_core has been greatly
simplified by moving to passthrough of the table values.

The one gotcha is that said tables don't support pentium pro and pentium
IV. There are very few users of hwpmc on _amd64_ kernels on new hardware. It is
unlikely that anyone is doing low level optimization on 15 year old Intel
hardware. Nonetheless, if someone feels strongly enough to populate the
corresponding tables for p4 and ppro I will reinstate the files in to the
build.

Code for the K8 counters and !x86 architectures remains unchanged.
2018-05-31 22:41:07 +00:00
Matt Macy
02ce8216df hwpmc: remove stale assert
Reported by:	eadler
2018-05-30 06:29:22 +00:00
Matt Macy
b99aa0fbb2 hwpmc: don't enter epoch section across mmap hook 2018-05-29 18:03:48 +00:00
Matt Macy
23c01e5b57 hwpmc: don't incorrectly strip the ANY flag 2018-05-29 04:04:06 +00:00
Matt Macy
ba32b20330 hwpmc: make pmc class specification work to enable fixed function counters 2018-05-28 23:17:57 +00:00
Matt Macy
3de228499a hwpmc: fix brain-damaged handling of thread descriptor freeing 2018-05-28 23:16:39 +00:00
Matt Macy
39446ce5b3 hwpmc_logging.c: don't call wakeup_one with thread lock held, don't
malloc(M_WAITOK) in an epoch section
2018-05-28 23:12:26 +00:00
Matt Macy
959826ca1b pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the
vendor provided pmu-events tables and sundry cleanups.

The vendor pmu-events tables provide counter descriptions, default
sample rates, event, umask, and flag values for all the counter
configuration permutations. Using this gives us:

- much simpler kernel code for the MD component
- helpful long and short event descriptions
- simpler user code
- sample rates that won't overload the system

Update man page with newer sample types and remove unused sample type.
2018-05-26 19:29:19 +00:00
Matt Macy
5506ceb87f Revert r334242 "pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the"
because of squash commit messages
2018-05-26 19:26:19 +00:00
Matt Macy
4928135658 pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the
vendor provided pmu-events tables and sundry cleanups.

The vendor pmu-events tables provide counter descriptions, default
sample rates, event, umask, and flag values for all the counter
configuration permutations. Using this gives us:

- much simpler kernel code for the MD component
- helpful long and short event descriptions
- simpler user code
- sample rates that won't overload the system

Update man page with newer sample types and remove unused sample type.

Squashed commit of the following:

commit 4459d43eff815bec08ccc5533dbe5de846f03128
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Sat May 26 00:06:31 2018 -0700

    libpmc: fix pmu function signatures for non amd64

commit a2cb8bbc586c65d41f9b291430a2261ec67b59fe
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:38:11 2018 -0700

    pmcstat: fix indentation of usage

commit f686954b15ff56a833ac80404898977cb80a265b
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:19:49 2018 -0700

    pmclog(3): add callchain and pmcallocatedyn, remove pcsample

commit 73e13a0d2e9498c81c150d14d022050cee7511bb
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:19:00 2018 -0700

    pmclog.h: GC pcsample field

commit 3e93ffd65da641fa657539dad3c48e281f8b5798
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:05:57 2018 -0700

    hwpmc: make Intel core CPUs use external event tables

commit 634f5fae1e1644ac324003136c66cd9c619d1c93
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 22:00:06 2018 -0700

    pmclog: update log record types, bump PMC_MAJOR
    - explicitly make log record types a multiple of 8 bytes
    - hook in pmu event types for pmc_allocate records
    - remove references to the no longer present PCSAMPLE record

commit 83d84fcd2d65bdf6ddcb2e155a22f0cfa2a9c225
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 21:52:10 2018 -0700

    libpmc: add support for having vendor table driven pmc_allocate

commit 9e6ad63c40c2fce8404847ace5078ca6cb33a736
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 19:11:33 2018 -0700

    hwpmc_core: add accessors for EVSEL & UMASK, make IAP_UMASK useful to user

commit 859dceb93daa6419a48c794db99b6758e5b041c9
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 19:09:45 2018 -0700

    pmcstat: update usage and man page as well as make -L consistent with pmccontrol

commit 79c7d8597e28c2eb13f5f9113e65ec2792ca57b1
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 18:07:03 2018 -0700

    pmu_util: add support for all current intel event keywords

commit d8089c7f6a6c8527f38324252b1ffb47004694c6
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 17:45:00 2018 -0700

    add description for new arguments

commit 058336740bab53c62ec88a3a026ea848cf3878c6
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 17:38:15 2018 -0700

    libpmc: move pmu_events table and pmu_utils out of libpmcstat so that they can be used by pmc_allocate

commit 049b66b382e2f833c3f47bc8df9e750cb265709f
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 16:12:41 2018 -0700

    pmcstat: hook pmu_events counter description utility routines in

commit f5e01e7b37a691dc045e1aa16b3ebdd162515de8
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 16:11:59 2018 -0700

    pmu_events: add utility routines for listing counters and their descriptions

commit cba4d4f8907f772279f86f18f915e0d74d33ac56
Author: Matt Macy <mmacy@mattmacy.io>
Date:   Fri May 25 16:09:50 2018 -0700

    pmu-events: expand out skylake regex to simplify string matches
2018-05-26 18:12:50 +00:00
Matt Macy
0f8d79d977 CK: update consumers to use CK macros across the board
r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors
2018-05-24 23:21:23 +00:00
Matt Macy
a85289cf9b hwpmc: set threadid in callchain records - second part of r334108 2018-05-23 17:44:29 +00:00
Matt Macy
ae573a91cf pmc: detach free_gtask on unload
Reported by:	pho
2018-05-20 20:34:15 +00:00
Matt Macy
f2daab2c8f pmc: avoid potential race on shutdown
Clear shutdown flag first, conservatively allow 5ms for all hardclock consumers to
see flag before draining
2018-05-20 19:35:24 +00:00
Matt Macy
70398c2f86 epoch(9): Make epochs non-preemptible by default
There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.

Reported by:	markj
2018-05-18 17:29:43 +00:00
Matt Macy
6161b98c99 hwpmc: Implement per-thread counters for PMC sampling
This implements per-thread counters for PMC sampling. The thread
descriptors are stored in a list attached to the process descriptor.
These thread descriptors can store any per-thread information necessary
for current or future features. For the moment, they just store the counters
for sampling.

The thread descriptors are created when the process descriptor is created.
Additionally, thread descriptors are created or freed when threads
are started or stopped. Because the thread exit function is called in a
critical section, we can't directly free the thread descriptors. Hence,
they are freed to a cache, which is also used as a source of allocations
when needed for new threads.

Approved by:	sbruno
Obtained from:	jtl
Sponsored by:	Juniper Networks, Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15335
2018-05-16 22:29:20 +00:00
Matt Macy
102ccac21c hwpmc: don't reference domain index with no memory backing it
On multi-socket the domain will be correctly set for a given CPU
regardless of whether or not NUMA is enabled.

Approved by:	sbruno
2018-05-14 06:11:25 +00:00
Matt Macy
8fa7df3668 pmc: don't add pmc owner to list until setup is complete
Once a pmc owner is added to the pmc_ss_owners list it is
visible for all to see. We don't want this to happen until
setup is complete.

Reported by:	mjg
Approved by:	sbruno
2018-05-14 01:08:47 +00:00
Matt Macy
0f00315cb3 hwpmc: fix load/unload race and vm map LOR
- fix load/unload race by allocating the per-domain list structure at boot

- fix long extant vm map LOR by replacing pmc_sx sx_slock with global_epoch
  to protect the liveness of elements of the pmc_ss_owners list

Reported by:	pho
Approved by:	sbruno
2018-05-14 00:21:04 +00:00
Matt Macy
f1401123c5 hwpmc/epoch - don't reference domain if NUMA is not set
It appears that domain information is set correctly independent
of whether or not NUMA is defined. However, there is no memory
backing secondary domains leading to allocation failure.

Reported by:	pho@, np@
Approved by:	sbruno@
2018-05-12 20:00:29 +00:00
Matt Macy
d626a614b9 hwpmc(9): clear remaining sample work for hardclock
- fix last-minute change in 333509 whereby runcount references
  to a pmc would remain, causing us to pause-loop forever

Approved by:	sbruno
2018-05-12 03:45:30 +00:00
Matt Macy
e6b475e0af hwpmc(9): Make pmclog buffer pcpu and update constants
On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.

Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &

Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total

After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total

For a more realistic overhead measurement, set the sample rate to ~2 kHz
on a 2.1 GHz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &

Collecting 10 samples of `make -j96 buildkernel` from each:

x before
+ after

real time:
    N           Min           Max        Median           Avg        Stddev
x  10          76.4        127.62        84.845        88.577     15.100031
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        -28.398 +/- 10.0344
        -32.0602% +/- 7.69825%
        (Student's t, pooled s = 10.6794)

system time:
    N           Min           Max        Median           Avg        Stddev
x  10       2277.96       6948.53       2949.47      3341.492     1385.2677
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        -2277.47 +/- 920.425
        -68.1574% +/- 8.77623%
        (Student's t, pooled s = 979.596)

x no pmc
+ pmc running
real time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10          76.4        127.62        84.845        88.577     15.100031
Difference at 95.0% confidence
        29.73 +/- 10.0335
        50.5208% +/- 17.0525%
        (Student's t, pooled s = 10.6785)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        1.332 +/- 0.248939
        2.2635% +/- 0.426506%
        (Student's t, pooled s = 0.264942)

system time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10       2277.96       6948.53       2949.47      3341.492     1385.2677
Difference at 95.0% confidence
        2309.97 +/- 920.443
        223.937% +/- 89.3039%
        (Student's t, pooled s = 979.616)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        32.493 +/- 16.0042
        3.15% +/- 1.5794%
        (Student's t, pooled s = 17.0331)

Reviewed by:	jeff@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15155
2018-05-12 01:26:34 +00:00
Matt Macy
cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Fabien Thomas
9f4f1d4d1f Fix pmcstat exit from kernel introduced by r325275.
pmcstat request for close will generate a close event.
This event will be in turn received by pmcstat to close the file.

Reviewed by:	kib
Tested by:	pho
MFC after:	1 week
Sponsored by: Stormshield
2018-01-17 16:41:22 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 14:52:40 +00:00
Konstantin Belousov
f4dd123e15 Do not leak PMC_PO_OWNS_LOGFILE on error.
Note that the PMCLOG_RESERVE_WITH_ERROR() macro contains a goto error;
statement and is executed after the flag is set.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-11-13 10:45:31 +00:00
Konstantin Belousov
c9da263712 Style bug.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-11-13 10:43:31 +00:00
Konstantin Belousov
d2cd638852 Check that the pmc index is less than the number of hardware PMCs,
instead of asserting the condition.

The row index is directly supplied by userspace, the kernel must
handle invalid values.

Submitted by:	pho
MFC after:	3 days
2017-11-10 19:10:14 +00:00
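
The entry above replaces an assertion with a runtime check because the row index comes straight from userspace. A minimal sketch of that pattern follows; validate_row_index and its parameters are hypothetical names, and returning EINVAL is an assumption about the error code rather than something the commit message states.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Reject an out-of-range, user-supplied row index with an error code
 * rather than asserting on it. "npmc" stands in for the number of
 * hardware PMCs; both names are illustrative.
 */
static int
validate_row_index(uint32_t ri, uint32_t npmc)
{
	if (ri >= npmc)
		return (EINVAL);	/* bad input from userspace must not panic */
	return (0);
}
```

An assertion is only appropriate for conditions the kernel itself guarantees; anything a caller can pass through a syscall has to be checked in all build configurations, since assertions compile away without INVARIANTS.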
Konstantin Belousov
20b555e146 Do not run pmclog_configure_log() without pmc_sx protection.
The r195005 unlocked pmc_sx before calling into pmclog_configure_log()
to avoid the LOR, but it allows flush or closelog to run in parallel
with the configuration, causing many failure modes.

Revert r195005.  Pre-create the logging process, allowing it to run
after the set up succeeded, otherwise the process terminates itself.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:43:39 +00:00
Konstantin Belousov
1121a37474 Be protective and check the po_file validity before dropping the ref.
Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:37:45 +00:00
Konstantin Belousov
ea4d25f90b In hwpmc, do not double-close the logging file.
hwpmc(4) must not voluntarily call fo_close(), doing this causes
double-close of the file.  It seems to almost avoid bad consequences
for pipes, but other types of files demonstrate random memory access.

To fix, remove fo_close() calls, which also do not provide the
declared wake-up of waiters consistently.  Instead, send a signal to
the logger and configure the logger process to not block it.  Since
logger never returns to userspace, the signal only causes termination
of the interruptible sleeps in fo_write().

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:32:52 +00:00
Konstantin Belousov
bd63e82975 There is no use for dropping Giant in the pmc syscall.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:16:18 +00:00
Konstantin Belousov
cf9ef80607 Minor style tweaks.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 11:05:47 +00:00
Konstantin Belousov
1cfbc451b9 Use designated initializers for pmc sysent and module data.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D12882
2017-11-01 10:49:41 +00:00
Ruslan Bukin
07ff05c2ae o Support for Kabylake CPU PMCs (fall back to PMC_CPU_INTEL_SKYLAKE).
o Fix bugs in events descriptions for Skylake, Skylake Xeon and Haswell.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12654
2017-10-13 15:02:29 +00:00
Conrad Meyer
1356a2e6fa hwpmc(4): Actually use a sufficiently wide type
jhibbits@ points out that left shifting bits 8-11 24 bits won't fit in a 32-bit
integer either.

Corrects r324533.

Submitted by:	jhibbits
Sponsored by:	Dell EMC Isilon
2017-10-11 15:13:40 +00:00
Conrad Meyer
a7b8be82f0 hwpmc(4): Force sufficiently wide type for left shift
Ordinary input to this macro comes from pe_code, which is uint16_t.  Coverity
points out that shifting such a value discards the result of a 24 bit shift,
which is not what we want.

A follow-up to r324291.

CID:		1381676
Sponsored by:	Dell EMC Isilon
2017-10-11 14:59:04 +00:00
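
The two entries above concern the same pitfall: a 16-bit event code whose bits 8-11 are shifted left by 24 ends up in bits 32-35, which a 32-bit intermediate cannot hold. The sketch below contrasts a 32-bit and a 64-bit intermediate. The function names are hypothetical and the encoding (low byte kept in place, bits 8-11 moved up by 24) is inferred from the commit messages, not copied from the driver's macro.

```c
#include <stdint.h>

/* Too narrow: the shifted bits fall off the top of a 32-bit value. */
static uint64_t
evsel_narrow(uint16_t code)
{
	return (((uint32_t)(code & 0x0F00) << 24) | (code & 0x00FF));
}

/* Wide enough: widen to 64 bits before shifting. */
static uint64_t
evsel_wide(uint16_t code)
{
	return (((uint64_t)(code & 0x0F00) << 24) | (code & 0x00FF));
}
```

For an extended code such as 0x1C3, the narrow variant silently loses the upper nibble, while the wide variant places it at bit 32 as intended.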
Conrad Meyer
1d3aa3624d hwpmc(4): Add support for extended AMD events
Sponsored by:	Dell EMC Isilon
2017-10-04 23:35:10 +00:00
Konstantin Belousov
b99b705d9c Skylake server core PMC support for hwpmc(4).
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
Hardware provided by:	Intel
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12221
2017-09-06 17:19:48 +00:00
Konstantin Belousov
16997138d3 Fix logic error in the assert, causing the condition to be always true.
Also improve the formatting of the corresponding KASSERT message.

Based on the submission by:	Svyatoslav <razmyslov@viva64.com>
Found by:	PVS-Studio
PR:	217741
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2017-08-08 15:46:29 +00:00
Zbigniew Bodek
6cb40391c4 Fix INVARIANTS debug code in HWPMC
When HWPMC stops sampling, ps_pmc may be freed before samples
are processed. In such situation treat PMC as stopped.
Add "ifdef" to fix build without INVARIANTS code.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10912
2017-06-13 18:53:56 +00:00
Zbigniew Bodek
95ca4f5a0e Fix event table for Cortex A9.
Removed events 0x8 (INSTR_EXECUTED), 0xE (PC_PROC_RETURN) and
0x13-0x1d not supported on Cortex A9.
Add events 0x68 and 0x6E which replaced 0x8 and 0xE.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10911
2017-06-13 18:52:39 +00:00
Zbigniew Bodek
ab632d9651 Fix HWPMC interrupt handling in Counting Mode
Additionally:
 - Fix support for Cycle Counter (evsel == 0xFF)
 - Stop and mask interrupts from all counters on init and finish

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10910
2017-06-13 18:51:23 +00:00
Fabien Thomas
d42aefee43 Fix arm stack frame walking support:
- Adjust stack offset for Clang
- Correctly fill registers for fake stack frame (soft PMC)

MFC after:	1 week
Sponsored by:	Stormshield
Differential Revision:	https://reviews.freebsd.org/D7396
2017-03-14 16:06:57 +00:00
George V. Neville-Neil
593b0c8420 Fix PMC architecture check to handle later IAPs including Skylake
Tested with tools/test/hwpmc/pmctest.py

Obtained from:	Oliver Pinter
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9036
2017-01-04 02:15:03 +00:00
Andriy Gapon
593077d613 pmc_process_csw_out: ignore deleted counters
I see the following panic on AMD when exiting pmcstat:

panic: [pmc,1473] pp_pmcval outside of expected range cpu=2 ri=17
pp_pmcval=fffffffffa529f5b pm_reloadcount=10000

It seems that at least on AMD a performance counter keeps counting after
overflowing.  When pmcstat exits it sets counters that it used to
PMC_STATE_DELETED and waits until their use count goes to zero.
amd_intr() wouldn't reload a counter in that state and, thus, a counter
would be allowed to overflow.  That means that the counter's value would
be allowed to go outside the expected range.

MFC after:	2 weeks
2016-11-10 11:12:45 +00:00
Andriy Gapon
3c1f73b18d hwpmc: fix a race between amd_stop_pmc and amd_intr
It is possible that wrmsr in amd_stop_pmc() causes an overflow in a counter
that it disables.  In that case a non-maskable interrupt is generated.  The
interrupt handler code was written in such a way that it would re-enable the
counter.  That would lead to an unexpected interrupt later on.

This problem was easy to reproduce with
$ pmcstat -T -P instructions -t $pid
if the target process is sufficiently busy and there are context switches from
time to time.  There would be a lot of interrupts to "race" with amd_stop_pmc()
called during the context switches.  The problem affected only AMD processors.

While there, trace whether amd_intr() claimed an interrupt.

Reviewed by:	jhb
MFC after:	2 weeks
2016-10-30 09:38:10 +00:00
Ed Maste
cec1957ae1 hwpmc: remove sys/capability.h backwards compatibility
The Capsicum header is installed as sys/capsicum.h in stable/10 as well.
2016-09-20 12:56:03 +00:00
John Baldwin
1f095f7051 Apply the fix from r232612 to fixed function counters.
Reviewed by:	emaste
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7397
2016-08-03 16:52:00 +00:00
Andrew Turner
7bc7e3cd65 Don't panic in hwpmc when stopping sampling.
When hwpmc stops sampling it will set the pm_state to something other
than PMC_STATE_RUNNING. This means the following sequence can happen:

CPU 0: Enter the interrupt handler
CPU 0: Set the thread TDP_CALLCHAIN pflag
CPU 1: Stop sampling
CPU 0: Call pmc_process_samples, sampling is stopped so clears ps_nsamples
CPU 0: Finishes interrupt processing with the TDP_CALLCHAIN flag set
CPU 0: Call pmc_capture_user_callchain to capture the user call chain
CPU 0: Find that all the pmc samples are free so no call chains need to be captured
CPU 0: KASSERT because of this

This fixes the issue by checking if any of the samples have been stopped
and including this in the KASSERT.

PR:		204273
Reviewed by:	bz, gnn
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6581
2016-05-28 13:05:39 +00:00
John Baldwin
fdce57a042 Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
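
The vector arithmetic in this commit message can be worked through with a small sketch. The function names and the figures in the usage note are hypothetical; only the N and N * mp_ncpu relationship comes from the message itself.

```c
/*
 * Hypothetical helpers illustrating the boot CPU vector usage of one
 * multiqueue driver that wants n_per_cpu interrupts for each CPU.
 */

/* Old model: every interrupt is first routed onto the boot CPU. */
static unsigned
boot_cpu_vectors_old(unsigned n_per_cpu, unsigned ncpu)
{
	return (n_per_cpu * ncpu);
}

/* EARLY_AP_STARTUP model: only the boot CPU's own share lands on it. */
static unsigned
boot_cpu_vectors_new(unsigned n_per_cpu)
{
	return (n_per_cpu);
}
```

As an example with made-up numbers, a driver asking for 4 interrupts per CPU on a 64-CPU system would take 256 vectors from the boot CPU in the old model and 4 in the new one.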
Edward Tomasz Napierala
084d207584 Remove misc NULL checks after M_WAITOK allocations.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-10 10:26:07 +00:00
Pedro F. Giffuni
453130d9bf sys/dev: minor spelling fixes.
Most affect comments, very few have user-visible effects.
2016-05-03 03:41:25 +00:00
Pedro F. Giffuni
b790c1938d etc: minor spelling fixes.
Mostly comments but also some user-visible strings.

MFC after:	2 weeks
2016-05-02 16:47:28 +00:00
Pedro F. Giffuni
323b076e9c sys: use our nitems() macro when param.h is available.
This should cover all the remaining cases in the kernel.

Discussed in:	freebsd-current
2016-04-21 19:40:10 +00:00
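
For readers unfamiliar with nitems(), a minimal sketch follows. The macro is defined locally, in the same shape as the one in FreeBSD's sys/param.h, so that the example builds elsewhere; the table and the function are hypothetical.

```c
#include <stddef.h>

/* Local copy so the sketch builds without <sys/param.h>. */
#ifndef nitems
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical event table; the values are placeholders. */
static const int event_codes[] = { 0x68, 0x6E, 0xFF };

/*
 * The element count is derived from the array itself, so no separate
 * size constant has to be kept in sync with the table.
 */
static size_t
count_events(void)
{
	return (nitems(event_codes));
}
```

Note that nitems() only works on true arrays, not on pointers to their first element.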
Pedro F. Giffuni
8dfea46460 Remove slightly used const values that can be replaced with nitems().
Suggested by:	jhb
2016-04-21 15:38:28 +00:00
Pedro F. Giffuni
198e7845ee Remove unused e500_event_codes_size.
Found by:	jhb
2016-04-20 20:37:58 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
Justin Hibbits
c4d7f6ab97 Fix a masking bug for e500 PMC.
No idea how this slipped through my regression testing.  pe_code is the event to
count, pe_cpu is the CPU family mask.
2016-04-09 01:02:17 +00:00
Konstantin Belousov
411c83ccd6 If full width writes to the performance monitoring counters are
supported, use full-width alias MSRs for writes.  This fixes the
"[pmc,X] negative increment" assertion on the context switch when
clipped counter value is sign-extended.

Add definitions for the MSR IA32_PERF_CAPABILITIES needed to detect
the feature.

PR:	207068
Submitted by:	joss.upton@yahoo.com
MFC after:	2 weeks
2016-02-12 07:27:24 +00:00
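
The effect of a clipped, sign-extended write can be modelled in plain C. This is a sketch, not the hwpmc(4) code: the function names and the 48-bit width used in the usage note are assumptions, and the functions only simulate what the hardware does rather than touching any MSR. The FW_WRITE position (bit 13 of IA32_PERF_CAPABILITIES) follows the Intel SDM.

```c
#include <stdint.h>

/* IA32_PERF_CAPABILITIES bit 13: full-width counter writes supported. */
#define	FW_WRITE_BIT	(1ULL << 13)

/* Mask covering the low "width" bits of a counter. */
static uint64_t
counter_mask(unsigned width)
{
	return (width >= 64 ? ~0ULL : (1ULL << width) - 1);
}

/*
 * Model of a write through the legacy counter MSR: only the low 32 bits
 * are used, and bit 31 is sign-extended up to the counter width.
 */
static uint64_t
legacy_write(uint64_t value, unsigned width)
{
	uint64_t low = value & 0xFFFFFFFFULL;

	if (low & 0x80000000ULL)
		low |= 0xFFFFFFFF00000000ULL;
	return (low & counter_mask(width));
}

/* Model of a write through the full-width alias MSR. */
static uint64_t
full_width_write(uint64_t value, unsigned width)
{
	return (value & counter_mask(width));
}

/* Feature test on a value read from IA32_PERF_CAPABILITIES. */
static int
has_full_width_writes(uint64_t perf_capabilities)
{
	return ((perf_capabilities & FW_WRITE_BIT) != 0);
}
```

With an assumed 48-bit counter, writing 0x80000000 through the legacy path leaves 0xFFFF80000000 in the counter, whereas the full-width path stores 0x80000000 unchanged.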
Konstantin Belousov
0c8cc7b076 Remove tautological cast.
PR:	207068
Submitted by:	joss.upton@yahoo.com
MFC after:	2 weeks
2016-02-12 07:19:59 +00:00
Konstantin Belousov
db57c70a5b Rename P_KTHREAD struct proc p_flag to P_KPROC.
I left as is an apparent bug in ntoskrnl_var.h:AT_PASSIVE_LEVEL()
definition.

Suggested by:	jhb
Sponsored by:	The FreeBSD Foundation
2016-02-09 16:30:16 +00:00
Konstantin Belousov
0fb2c5d60c Do not call vn_fullpath(9) (through the pmc_getfilename() wrapper)
when its result is immediately ignored, i.e. for kernel processes
forked from the user process.  Do not test for non-null before freeing
string.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-02-06 15:39:04 +00:00
Ruslan Bukin
28029b68c0 Welcome the RISC-V 64-bit kernel.
This is the final step required to compile and run the RISC-V
kernel and userland from HEAD.

RISC-V is a completely open ISA that is freely available to academia
and industry.

Thanks to all the people involved! Special thanks to Andrew Turner,
David Chisnall, Ed Maste, Konstantin Belousov, John Baldwin and
Arun Thomas for their help.
Thanks to Robert Watson for organizing this project.

This project sponsored by UK Higher Education Innovation Fund (HEIF5) and
DARPA CTSRD project at the University of Cambridge Computer Laboratory.

FreeBSD/RISC-V project home: https://wiki.freebsd.org/riscv

Reviewed by:	andrew, emaste, kib
Relnotes:	Yes
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D4982
2016-01-29 15:12:31 +00:00
Justin Hibbits
cdf9344e50 e5500 HWPMC is identical to e500mc, so add support check for it.
2016-01-17 00:14:22 +00:00
Randall Stewart
34d659d314 More fixes in the various intel processors, fixing missing
IAP_F_FM's as well as incorrect umask specifications for
some of the new Broadwell/Skylake PMC's. Also silvermont
had a *lot* of missing IAP_F_FM.

Sponsored by:	Netflix Inc.
2015-12-11 01:21:32 +00:00
Randall Stewart
b01c40f171 Fix the tunable in logging so that if it's pre-11 we have the proper
line so the tunable is present.

Sponsored by:	Netflix Inc.
2015-12-09 22:46:40 +00:00