freebsd-dev

Author	SHA1	Message	Date
Matt Macy	f992dd4b5c	pmc: convert native to jsonl and track TSC value of samples - add '-j' option to filter to enable converting native pmc log format to json lines format to enable the use of scripts and external tooling % pmc filter -j pmc.log pmc.jsonl - Record the tsc value in sampling interrupts as opposed to recording nanotime when the sample is copied to a global log in hardclock - potentially many milliseconds later. - At initialize record the tsc_freq and the time of day to give us an offset for translating the tsc values in callchain records	2018-06-07 02:03:22 +00:00
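The TSC translation described in the commit above amounts to a linear mapping from cycle counts to wall-clock time. The following is a minimal sketch in Python, assuming the initialize record provides a base TSC reading, the TSC frequency in Hz, and the time of day in seconds. The names `base_tsc`, `tsc_freq`, and `base_time` are hypothetical and are not taken from the pmc log format.

```python
def tsc_to_walltime(tsc, base_tsc, tsc_freq, base_time):
    """Translate a raw TSC sample into seconds since the epoch.

    tsc       -- TSC value recorded in the sampling interrupt
    base_tsc  -- TSC value at log initialization (assumed to be available)
    tsc_freq  -- TSC frequency in Hz, as logged at initialization
    base_time -- time of day in seconds, as logged at initialization
    """
    if tsc_freq <= 0:
        raise ValueError("tsc_freq must be positive")
    # Elapsed cycles divided by cycles per second gives elapsed seconds,
    # which is then offset by the recorded time of day.
    return base_time + (tsc - base_tsc) / tsc_freq
```

Because the timestamp is taken in the interrupt itself, the result reflects when the sample fired rather than when it was later copied to the log.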
Matt Macy	41abd7afa3	hwpmc: don't log pid->name more than once	2018-06-07 00:54:43 +00:00
Matt Macy	b2ca2e50b9	hwpmc: add summary command and further metadata extensions metadata changes: - log pmc sample rate with pmcallocate - log proc flags with thread / process logging to identify user vs kernel threads fixes: - use log cpuid to translate event id to event name Implement rudimentary summary command to track sample counts by thread and process name within a pmc log. % make -j4 buildkernel >& /dev/null & % sudo pmcstat -S unhalted_core_cycles -S llc-misses -O foo sleep 15 % pmc summary foo cpu_clk_unhalted.thread_p_any: idle: 138108207162 clang-6.0: 105336158004 sh: 72340108510 make: 8642012963 kernel: 7754011631 longest_lat_cache.miss: clang-6.0: 87502625 sh: 40901227 make: 5500165 kernel: 3300099 awk: 2000060 % pmc summary -f ~/foo idx: 278 name: cpu_clk_unhalted.thread_p_any rate: 2000003 idle: 69054 clang-6.0: 52668 sh: 36170 make: 4321 kernel: 3877 hwpmc: proc(7445): 3319 awk: 1289 xargs: 357 rand_harvestq: 181 mtree: 102 intr: 53 zfskern: 31 usb: 7 pagedaemon: 4 ntpd: 3 syslogd: 1 acpi_thermal: 1 logger: 1 syncer: 1 snmptrapd: 1 sleep: 1 idx: 17 name: longest_lat_cache.miss rate: 100003 clang-6.0: 875 sh: 409 make: 55 kernel: 33 awk: 20 hwpmc: proc(7445): 14 xargs: 9 idle: 8 intr: 3 zfskern: 2	2018-06-06 02:48:09 +00:00
Matt Macy	ebfaf69cc0	hwpmc: log name->pid, name->tid mappings By logging all threads and processes 'pmc filter' can now filter on process or thread name, relieving the user of the burden of determining which tid or pid was which when the sample was taken. % pmc filter -T if_io_tqg -P nginx pmc.log pmc-iflib.log % pmc filter -x -T idle pmc.log pmc-noidle.log	2018-06-05 04:26:40 +00:00
Matt Macy	ac7012d284	hwpmc: don't defer user callchain capture completion to ast	2018-06-04 21:17:30 +00:00
Matt Macy	cf823003a7	hwpmc: remove gratuitous curthread checks	2018-06-04 17:49:34 +00:00
Matt Macy	9645bcabdf	hwpmc: fix fixed counters checks	2018-06-04 04:49:06 +00:00
Matt Macy	07d80fd8dc	hwpmc: ABI fixes - increase pmc cpuid field from 8 to 12 bits - add cpuid version string to initialize entry in the log so that filter can identify which counter index an event name maps to - GC unused config flags - make fixed counter assignment more robust as well as the changes needed to be properly identified for filter	2018-06-04 02:05:48 +00:00
Matt Macy	5de96e33d6	hwpmc: support sampling both kernel and user stacks when interrupted in kernel This adds the -U option to pmcstat which will attribute in-kernel samples back to the user stack that invoked the system call. It is not the default, because when looking at kernel profiles it is generally more desirable to merge all instances of a given system call together. Although heavily revised, this change is directly derived from D7350 by Jonathan T. Looney. Obtained from: jtl Sponsored by: Juniper Networks, Limelight Networks	2018-06-04 01:10:23 +00:00
Matt Macy	2ce69a4d04	hwpmc: ensure that mapin updates are synchronous	2018-06-03 19:37:17 +00:00
Matt Macy	e92a1350b5	hwpmc: remove unused pre-table driven bits for intel Intel now provides comprehensive tables for all performance counters and the various valid configuration permutations as text .json files. Libpmc has been converted to use these and hwpmc_core has been greatly simplified by moving to passthrough of the table values. The one gotcha is that said tables don't support pentium pro and pentium IV. There are very few users of hwpmc on _amd64_ kernels on new hardware. It is unlikely that anyone is doing low level optimization on 15 year old Intel hardware. Nonetheless, if someone feels strongly enough to populate the corresponding tables for p4 and ppro I will reinstate the files into the build. Code for the K8 counters and !x86 architectures remains unchanged.	2018-05-31 22:41:07 +00:00
Matt Macy	02ce8216df	hwpmc: remove stale assert Reported by: eadler	2018-05-30 06:29:22 +00:00
Matt Macy	b99aa0fbb2	hwpmc: don't enter epoch section across mmap hook	2018-05-29 18:03:48 +00:00
Matt Macy	23c01e5b57	hwpmc: don't incorrectly strip the ANY flag	2018-05-29 04:04:06 +00:00
Matt Macy	ba32b20330	hwpmc: make pmc class specification work to enable fixed function counters	2018-05-28 23:17:57 +00:00
Matt Macy	3de228499a	hwpmc: fix brain-damaged handling of thread descriptor freeing	2018-05-28 23:16:39 +00:00
Matt Macy	39446ce5b3	hwpmc_logging.c: don't call wakeup_one with thread lock held, don't malloc(M_WAITOK) in an epoch section	2018-05-28 23:12:26 +00:00
Matt Macy	959826ca1b	pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the vendor provided pmu-events tables and sundry cleanups. The vendor pmu-events tables provide counter descriptions, default sample rates, event, umask, and flag values for all the counter configuration permutations. Using this gives us: - much simpler kernel code for the MD component - helpful long and short event descriptions - simpler user code - sample rates that won't overload the system Update man page with newer sample types and remove unused sample type.	2018-05-26 19:29:19 +00:00
Matt Macy	5506ceb87f	Revert r334242 "pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the" because of squash commit messages	2018-05-26 19:26:19 +00:00
Matt Macy	4928135658	pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the vendor provided pmu-events tables and sundry cleanups. The vendor pmu-events tables provide counter descriptions, default sample rates, event, umask, and flag values for all the counter configuration permutations. Using this gives us: - much simpler kernel code for the MD component - helpful long and short event descriptions - simpler user code - sample rates that won't overload the system Update man page with newer sample types and remove unused sample type. Squashed commit of the following: commit 4459d43eff815bec08ccc5533dbe5de846f03128 Author: Matt Macy <mmacy@mattmacy.io> Date: Sat May 26 00:06:31 2018 -0700 libpmc: fix pmu function signatures for non amd64 commit a2cb8bbc586c65d41f9b291430a2261ec67b59fe Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 22:38:11 2018 -0700 pmcstat: fix indentation of usage commit f686954b15ff56a833ac80404898977cb80a265b Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 22:19:49 2018 -0700 pmclog(3): add callchain and pmcallocatedyn, remove pcsample commit 73e13a0d2e9498c81c150d14d022050cee7511bb Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 22:19:00 2018 -0700 pmclog.h: GC pcsample field commit 3e93ffd65da641fa657539dad3c48e281f8b5798 Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 22:05:57 2018 -0700 hwpmc: make Intel core CPUs use external event tables commit 634f5fae1e1644ac324003136c66cd9c619d1c93 Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 22:00:06 2018 -0700 pmclog: update log record types, bump PMC_MAJOR - explicitly make log record types a multiple of 8 bytes - hook in pmu event types for pmc_allocate records - remove references to no longer PCSAMPLE record commit 83d84fcd2d65bdf6ddcb2e155a22f0cfa2a9c225 Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 21:52:10 2018 -0700 libpmc: add support for having vendor table driven pmc_allocate commit 9e6ad63c40c2fce8404847ace5078ca6cb33a736 
Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 19:11:33 2018 -0700 hwpmc_core: add accessors for EVSEL & UMASK, make IAP_UMASK useful to user commit 859dceb93daa6419a48c794db99b6758e5b041c9 Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 19:09:45 2018 -0700 pmcstat: update usage and man page as well as make -L consistent with pmccontrol commit 79c7d8597e28c2eb13f5f9113e65ec2792ca57b1 Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 18:07:03 2018 -0700 pmu_util: add support for all current intel event keywords commit d8089c7f6a6c8527f38324252b1ffb47004694c6 Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 17:45:00 2018 -0700 add description for new arguments commit 058336740bab53c62ec88a3a026ea848cf3878c6 Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 17:38:15 2018 -0700 libpmc: move pmu_events table and pmu_utils out of libpmcstat so that they can be used by pmc_allocate commit 049b66b382e2f833c3f47bc8df9e750cb265709f Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 16:12:41 2018 -0700 pmcstat: hook pmu_events counter description utility routines in commit f5e01e7b37a691dc045e1aa16b3ebdd162515de8 Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 16:11:59 2018 -0700 pmu_events: add utility routines for listing counters and their descriptions commit cba4d4f8907f772279f86f18f915e0d74d33ac56 Author: Matt Macy <mmacy@mattmacy.io> Date: Fri May 25 16:09:50 2018 -0700 pmu-events: expand out skylake regex to simplify string matches	2018-05-26 18:12:50 +00:00
Matt Macy	0f8d79d977	CK: update consumers to use CK macros across the board r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors	2018-05-24 23:21:23 +00:00
Matt Macy	a85289cf9b	hwpmc: set threadid in callchain records - second part of r334108	2018-05-23 17:44:29 +00:00
Matt Macy	ae573a91cf	pmc: detach free_gtask on unload Reported by: pho	2018-05-20 20:34:15 +00:00
Matt Macy	f2daab2c8f	pmc: avoid potential race on shutdown Clear shutdown flag first, conservatively allow 5ms for all hardclock consumers to see flag before draining	2018-05-20 19:35:24 +00:00
Matt Macy	70398c2f86	epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj	2018-05-18 17:29:43 +00:00
Matt Macy	6161b98c99	hwpmc: Implement per-thread counters for PMC sampling This implements per-thread counters for PMC sampling. The thread descriptors are stored in a list attached to the process descriptor. These thread descriptors can store any per-thread information necessary for current or future features. For the moment, they just store the counters for sampling. The thread descriptors are created when the process descriptor is created. Additionally, thread descriptors are created or freed when threads are started or stopped. Because the thread exit function is called in a critical section, we can't directly free the thread descriptors. Hence, they are freed to a cache, which is also used as a source of allocations when needed for new threads. Approved by: sbruno Obtained from: jtl Sponsored by: Juniper Networks, Limelight Networks Differential Revision: https://reviews.freebsd.org/D15335	2018-05-16 22:29:20 +00:00
Matt Macy	102ccac21c	hwpmc: don't reference domain index with no memory backing it On multi-socket the domain will be correctly set for a given CPU regardless of whether or not NUMA is enabled. Approved by: sbruno	2018-05-14 06:11:25 +00:00
Matt Macy	8fa7df3668	pmc: don't add pmc owner to list until setup is complete Once a pmc owner is added to the pmc_ss_owners list it is visible for all to see. We don't want this to happen until setup is complete. Reported by: mjg Approved by: sbruno	2018-05-14 01:08:47 +00:00
Matt Macy	0f00315cb3	hwpmc: fix load/unload race and vm map LOR - fix load/unload race by allocating the per-domain list structure at boot - fix long extant vm map LOR by replacing pmc_sx sx_slock with global_epoch to protect the liveness of elements of the pmc_ss_owners list Reported by: pho Approved by: sbruno	2018-05-14 00:21:04 +00:00
Matt Macy	f1401123c5	hwpmc/epoch - don't reference domain if NUMA is not set It appears that domain information is set correctly independent of whether or not NUMA is defined. However, there is no memory backing secondary domains leading to allocation failure. Reported by: pho@, np@ Approved by: sbruno@	2018-05-12 20:00:29 +00:00
Matt Macy	d626a614b9	hwpmc(9): clear remaining sample work for hardclock - fix last minute change in 333509 whereby runcount references to a pmc would remain, causing us to loop in pause forever Approved by: sbruno	2018-05-12 03:45:30 +00:00
Matt Macy	e6b475e0af	hwpmc(9): Make pmclog buffer pcpu and update constants On non-trivial SMP systems the contention on the pmc_owner mutex leads to a substantial number of samples captured being from the pmc process itself. This change a) makes buffers larger to avoid contention on the global list b) makes the working sample buffer per cpu. Run pmcstat in the background (default event rate of 64k): pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 & Before: make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total After: make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total For more realistic overhead measurement set the sample rate for ~2khz on a 2.1Ghz processor: pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 & Collecting 10 samples of `make -j96 buildkernel` from each: x before + after real time: N Min Max Median Avg Stddev x 10 76.4 127.62 84.845 88.577 15.100031 + 10 59.71 60.79 60.135 60.179 0.29957192 Difference at 95.0% confidence -28.398 +/- 10.0344 -32.0602% +/- 7.69825% (Student's t, pooled s = 10.6794) system time: N Min Max Median Avg Stddev x 10 2277.96 6948.53 2949.47 3341.492 1385.2677 + 10 1038.7 1081.06 1070.555 1064.017 15.85404 Difference at 95.0% confidence -2277.47 +/- 920.425 -68.1574% +/- 8.77623% (Student's t, pooled s = 979.596) x no pmc + pmc running real time: HEAD: N Min Max Median Avg Stddev x 10 58.38 59.15 58.86 58.847 0.22504567 + 10 76.4 127.62 84.845 88.577 15.100031 Difference at 95.0% confidence 29.73 +/- 10.0335 50.5208% +/- 17.0525% (Student's t, pooled s = 10.6785) patched: N Min Max Median Avg Stddev x 10 58.38 59.15 58.86 58.847 0.22504567 + 10 59.71 60.79 60.135 60.179 0.29957192 Difference at 95.0% confidence 1.332 +/- 0.248939 2.2635% +/- 0.426506% (Student's t, pooled s = 0.264942) system time: HEAD: N Min Max Median Avg Stddev x 10 1010.15 1073.31 1025.465 1031.524 18.135705 + 10 2277.96 6948.53 2949.47 3341.492 1385.2677 
Difference at 95.0% confidence 2309.97 +/- 920.443 223.937% +/- 89.3039% (Student's t, pooled s = 979.616) patched: N Min Max Median Avg Stddev x 10 1010.15 1073.31 1025.465 1031.524 18.135705 + 10 1038.7 1081.06 1070.555 1064.017 15.85404 Difference at 95.0% confidence 32.493 +/- 16.0042 3.15% +/- 1.5794% (Student's t, pooled s = 17.0331) Reviewed by: jeff@ Approved by: sbruno@ Differential Revision: https://reviews.freebsd.org/D15155	2018-05-12 01:26:34 +00:00
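The percentage figures in the benchmark above follow from the reported averages. A small Python sketch of that arithmetic is given here; it uses only the `Avg` values quoted in the commit message and does not recompute the Student's t confidence intervals. The function name is illustrative.

```python
def relative_change_pct(baseline_avg, new_avg):
    """Percentage change of new_avg relative to baseline_avg."""
    return (new_avg - baseline_avg) / baseline_avg * 100.0


# Real time with pmc running: before (88.577 s) versus after (60.179 s).
before_vs_after = relative_change_pct(88.577, 60.179)

# Real time on the patched kernel: no pmc (58.847 s) versus pmc running (60.179 s).
overhead_patched = relative_change_pct(58.847, 60.179)

print(f"before vs after: {before_vs_after:.4f}%")
print(f"patched overhead: {overhead_patched:.4f}%")
```

These correspond to the -32.0602% and 2.2635% differences listed in the commit message.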
Matt Macy	cbd92ce62e	Eliminate the overhead of gratuitous repeated reinitialization of cap_rights - Add macros to allow preinitialization of cap_rights_t. - Convert most commonly used code paths to use preinitialized cap_rights_t. A 3.6% speedup in fstat was measured with this change. Reported by: mjg Reviewed by: oshogbo Approved by: sbruno MFC after: 1 month	2018-05-09 18:47:24 +00:00
Fabien Thomas	9f4f1d4d1f	Fix pmcstat exit from kernel introduced by r325275. pmcstat request for close will generate a close event. This event will be in turn received by pmcstat to close the file. Reviewed by: kib Tested by: pho MFC after: 1 week Sponsored by: Stormshield	2018-01-17 16:41:22 +00:00
Pedro F. Giffuni	718cf2ccb9	sys/dev: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 14:52:40 +00:00
Konstantin Belousov	f4dd123e15	Do not leak PMC_PO_OWNS_LOGFILE on error. Note that PMCLOG_RESERVE_WITH_ERROR() macro contains goto error; statement and executed after the flag is set. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-11-13 10:45:31 +00:00
Konstantin Belousov	c9da263712	Style bug. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-11-13 10:43:31 +00:00
Konstantin Belousov	d2cd638852	Check that the pmc index is less than the number of hardware PMCs, instead of asserting the condition. The row index is directly supplied by userspace, the kernel must handle invalid values. Submitted by: pho MFC after: 3 days	2017-11-10 19:10:14 +00:00
Konstantin Belousov	20b555e146	Do not run pmclog_configure_log() without pmc_sx protection. The r195005 unlocked pmc_sx before calling into pmclog_configure_log() to avoid the LOR, but it allows flush or closelog to run in parallel with the configuration, causing many failure modes. Revert r195005. Pre-create the logging process, allowing it to run after the set up succeeded, otherwise the process terminates itself. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 11:43:39 +00:00
Konstantin Belousov	1121a37474	Be protective and check the po_file validity before dropping the ref. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 11:37:45 +00:00
Konstantin Belousov	ea4d25f90b	In hwpmc, do not double-close the logging file. hwpmc(4) must not voluntarily call fo_close(), doing this causes double-close of the file. It seems to almost avoid bad consequences for pipes, but other types of files demonstrate random memory access. To fix, remove fo_close() calls, which also do not provide the declared wake-up of waiters consistently. Instead, send a signal to the logger and configure the logger process to not block it. Since logger never returns to userspace, the signal only causes termination of the interruptible sleeps in fo_write(). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 11:32:52 +00:00
Konstantin Belousov	bd63e82975	There is no use for dropping Giant in the pmc syscall. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 11:16:18 +00:00
Konstantin Belousov	cf9ef80607	Minor style tweaks. Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 11:05:47 +00:00
Konstantin Belousov	1cfbc451b9	Use designated initializers for pmc sysent and module data. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 10:49:41 +00:00
Ruslan Bukin	07ff05c2ae	o Support for Kabylake CPU PMCs (fall down to PMC_CPU_INTEL_SKYLAKE). o Fix bugs in events descriptions for Skylake, Skylake Xeon and Haswell. Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D12654	2017-10-13 15:02:29 +00:00
Conrad Meyer	1356a2e6fa	hwpmc(4): Actually use a sufficiently wide type jhibbits@ points out that left shifting bits 8-11 24 bits won't fit in a 32-bit integer either. Corrects r324533. Submitted by: jhibbits Sponsored by: Dell EMC Isilon	2017-10-11 15:13:40 +00:00
Conrad Meyer	a7b8be82f0	hwpmc(4): Force sufficiently wide type for left shift Ordinary input to this macro comes from pe_code, which is uint16_t. Coverity points out that shifting such a value discards the result of a 24 bit shift, which is not what we want. A follow-up to r324291. CID: 1381676 Sponsored by: Dell EMC Isilon	2017-10-11 14:59:04 +00:00
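The two shift-width commits above concern bits lost when a left shift is carried out in a type that is too narrow. The Python sketch below emulates a fixed-width register by masking, purely to show where the bits go. The function name and the mask `0x0F00` are illustrative assumptions based on the "bits 8-11" wording in the later commit, not the actual hwpmc macro, and it does not model C integer promotion or undefined behavior.

```python
def shift_high_event_bits(code, width_bits):
    """Shift bits 8-11 of an event code left by 24 within a width_bits-wide type.

    Python integers are unbounded, so the result is masked to width_bits
    to mimic the truncation a fixed-width C type would impose.
    """
    high = code & 0x0F00                    # bits 8-11 of the event code
    return (high << 24) & ((1 << width_bits) - 1)


# Bits 8-11 land at bits 32-35 after the shift, so a 32-bit type loses them
# while a 64-bit type keeps them.
narrow = shift_high_event_bits(0x1AB, 32)
wide = shift_high_event_bits(0x1AB, 64)
print(hex(narrow), hex(wide))
```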
Conrad Meyer	1d3aa3624d	hwpmc(4): Add support for extended AMD events Sponsored by: Dell EMC Isilon	2017-10-04 23:35:10 +00:00
Konstantin Belousov	b99b705d9c	Skylake server core PMC support for hwpmc(4). Reviewed by: emaste Sponsored by: The FreeBSD Foundation Hardware provided by: Intel MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D12221	2017-09-06 17:19:48 +00:00
Konstantin Belousov	16997138d3	Fix logic error in the assert, causing the condition to be always true. Also improve the formatting of the corresponding KASSERT message. Based on the submission by: Svyatoslav <razmyslov@viva64.com> Found by: PVS-Studio PR: 217741 Reviewed by: emaste Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week	2017-08-08 15:46:29 +00:00
Zbigniew Bodek	6cb40391c4	Fix INVARIANTS debug code in HWPMC When HWPMC stops sampling, ps_pmc may be freed before samples are processed. In such situation treat PMC as stopped. Add "ifdef" to fix build without INVARIANTS code. Submitted by: Michal Mazur <mkm@semihalf.com> Obtained from: Semihalf Sponsored by: Stormshield, Netgate Differential revision: https://reviews.freebsd.org/D10912	2017-06-13 18:53:56 +00:00
Zbigniew Bodek	95ca4f5a0e	Fix event table for Cortex A9. Removed events 0x8 (INSTR_EXECUTED), 0xE (PC_PROC_RETURN) and 0x13-0x1d not supported on Cortex A9. Add events 0x68 and 0x6E which replaced 0x8 and 0xE. Submitted by: Michal Mazur <mkm@semihalf.com> Obtained from: Semihalf Sponsored by: Stormshield, Netgate Differential revision: https://reviews.freebsd.org/D10911	2017-06-13 18:52:39 +00:00
Zbigniew Bodek	ab632d9651	Fix HWPMC interrupt handling in Counting Mode Additionally: - Fix support for Cycle Counter (evsel == 0xFF) - Stop and mask interrupts from all counters on init and finish Submitted by: Michal Mazur <mkm@semihalf.com> Obtained from: Semihalf Sponsored by: Stormshield, Netgate Differential revision: https://reviews.freebsd.org/D10910	2017-06-13 18:51:23 +00:00
Fabien Thomas	d42aefee43	Fix arm stack frame walking support: - Adjust stack offset for Clang - Correctly fill registers for fake stack frame (soft PMC) MFC after: 1 week Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D7396	2017-03-14 16:06:57 +00:00
George V. Neville-Neil	593b0c8420	Fix PMC architecture check to handle later IPAs including Skylake Tested with tools/test/hwpmc/pmctest.py Obtained from: Oliver Pinter MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9036	2017-01-04 02:15:03 +00:00
Andriy Gapon	593077d613	pmc_process_csw_out: ignore deleted counters I see the following panic on AMD when exiting pmcstat: panic: [pmc,1473] pp_pmcval outside of expected range cpu=2 ri=17 pp_pmcval=fffffffffa529f5b pm_reloadcount=10000 It seems that at least on AMD a performance counter keeps counting after overflowing. When pmcstat exits it sets counters that it used to PMC_STATE_DELETED and waits until their use count goes to zero. amd_intr() wouldn't reload a counter in that state and, thus, a counter would be allowed to overflow. That means that the counter's value would be allowed to go outside the expected range. MFC after: 2 weeks	2016-11-10 11:12:45 +00:00
Andriy Gapon	3c1f73b18d	hwpmc: fix a race between amd_stop_pmc and amd_intr It is possible that wrmsr in amd_stop_pmc() causes an overflow in a counter that it disables. In that case a non-maskable interrupt is generated. The interrupt handler code was written in such a way that it would re-enable the counter. That would lead to an unexpected interrupt later on. This problem was easy to reproduce with $ pmcstat -T -P instructions -t $pid if the target process is sufficiently busy and there are context switches from time to time. There would be a lot of interrupts to "race" with amd_stop_pmc() called during the context switches. The problem affected only AMD processors. While there, trace whether amd_intr() claimed an interrupt. Reviewed by: jhb MFC after: 2 weeks	2016-10-30 09:38:10 +00:00
Ed Maste	cec1957ae1	hwpmc: remove sys/capability.h backwards compatibility The Capsicum header is installed as sys/capsicum.h in stable/10 as well.	2016-09-20 12:56:03 +00:00
John Baldwin	1f095f7051	Apply the fix from r232612 to fixed function counters. Reviewed by: emaste MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7397	2016-08-03 16:52:00 +00:00
Andrew Turner	7bc7e3cd65	Don't panic in hwpmc when stopping sampling. When hwpmc stops sampling it will set the pm_state to something other than PMC_STATE_RUNNING. This means the following sequence can happen: CPU 0: Enter the interrupt handler CPU 0: Set the thread TDP_CALLCHAIN pflag CPU 1: Stop sampling CPU 0: Call pmc_process_samples, sampling is stopped so clears ps_nsamples CPU 0: Finishes interrupt processing with the TDP_CALLCHAIN flag set CPU 0: Call pmc_capture_user_callchain to capture the user call chain CPU 0: Find all the pmc samples are free so no call chains need to be captured CPU 0: KASSERT because of this This fixes the issue by checking if any of the samples have been stopped and including this in the KASSERT. PR: 204273 Reviewed by: bz, gnn Obtained from: ABT Systems Ltd Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6581	2016-05-28 13:05:39 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). 
This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Edward Tomasz Napierala	084d207584	Remove misc NULL checks after M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-05-10 10:26:07 +00:00
Pedro F. Giffuni	453130d9bf	sys/dev: minor spelling fixes. Most affect comments, very few have user-visible effects.	2016-05-03 03:41:25 +00:00
Pedro F. Giffuni	b790c1938d	etc: minor spelling fixes. Mostly comments but also some user-visible strings. MFC after: 2 weeks	2016-05-02 16:47:28 +00:00
Pedro F. Giffuni	323b076e9c	sys: use our nitems() macro when param.h is available. This should cover all the remaining cases in the kernel. Discussed in: freebsd-current	2016-04-21 19:40:10 +00:00
Pedro F. Giffuni	8dfea46460	Remove slightly used const values that can be replaced with nitems(). Suggested by: jhb	2016-04-21 15:38:28 +00:00
Pedro F. Giffuni	198e7845ee	Remove unused e500_event_codes_size. Found by: jhb	2016-04-20 20:37:58 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
Justin Hibbits	c4d7f6ab97	Fix a masking bug for e500 PMC. No idea how this slipped through my regression testing. pe_code is the event to count, pe_cpu is the CPU family mask.	2016-04-09 01:02:17 +00:00
Konstantin Belousov	411c83ccd6	If full width writes to the performance monitoring counters are supported, use full-width aliases MSRs for writes. This fixes the "[pmc,X] negative increment" assertion on the context switch when clipped counter value is sign-extended. Add definitions for the MSR IA32_PERF_CAPABILITIES needed to detect the feature. PR: 207068 Submitted by: joss.upton@yahoo.com MFC after: 2 weeks	2016-02-12 07:27:24 +00:00
Konstantin Belousov	0c8cc7b076	Remove tautological cast. PR: 207068 Submitted by: joss.upton@yahoo.com MFC after: 2 weeks	2016-02-12 07:19:59 +00:00
Konstantin Belousov	db57c70a5b	Rename P_KTHREAD struct proc p_flag to P_KPROC. I left as is an apparent bug in ntoskrnl_var.h:AT_PASSIVE_LEVEL() definition. Suggested by: jhb Sponsored by: The FreeBSD Foundation	2016-02-09 16:30:16 +00:00
Konstantin Belousov	0fb2c5d60c	Do not call vn_fullpath(9) (through the pmc_getfilename() wrapper) when its result is immediately ignored, i.e. for kernel processes forked from the user process. Do not test for non-null before freeing string. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-02-06 15:39:04 +00:00
Ruslan Bukin	28029b68c0	Welcome the RISC-V 64-bit kernel. This is the final step required allowing to compile and to run RISC-V kernel and userland from HEAD. RISC-V is a completely open ISA that is freely available to academia and industry. Thanks to all the people involved! Special thanks to Andrew Turner, David Chisnall, Ed Maste, Konstantin Belousov, John Baldwin and Arun Thomas for their help. Thanks to Robert Watson for organizing this project. This project sponsored by UK Higher Education Innovation Fund (HEIF5) and DARPA CTSRD project at the University of Cambridge Computer Laboratory. FreeBSD/RISC-V project home: https://wiki.freebsd.org/riscv Reviewed by: andrew, emaste, kib Relnotes: Yes Sponsored by: DARPA, AFRL Sponsored by: HEIF5 Differential Revision: https://reviews.freebsd.org/D4982	2016-01-29 15:12:31 +00:00
Justin Hibbits	cdf9344e50	e5500 HWPMC is identical to e500mc, so add support check for it.	2016-01-17 00:14:22 +00:00
Randall Stewart	34d659d314	More fixes in the various intel processors, fixing missing IAP_F_FM's as well as incorrect umask specifications for some of the new Broadwell/Skylake PMC's. Also silvermont had a lot of missing IAP_F_FM. Sponsored by: Netflix Inc.	2015-12-11 01:21:32 +00:00
Randall Stewart	b01c40f171	Fix the tunable in logging so that if its pre-11 we have the proper line so the tunable is present. Sponsored by: Netflix Inc.	2015-12-09 22:46:40 +00:00
Randall Stewart	f19bae413c	Add support for Intel Skylake and Intel Broadwell PMC's. The Broadwell PMC's have been tested on the Broadwell-Xeon with a hacked up version of pmcstudy -T. I still need to circle back and add in to pmcstudy all the new tests from the Broadwell Vtune guide (for the hacked up version I just made it so I could run the -T option). The Skylake CPU is not yet available (even though Intel is advertising it .. imagine that). The Skylake PMC's will need to be tested once we can get a sample skylake CPU :-) Sponsored by: Netflix Inc.	2015-11-30 17:35:49 +00:00
Jonathan T. Looney	5eaa6f01f5	Improve accuracy of PMC sampling frequency The code tracks a counter which is the number of events until the next sample. On context switch in, it loads the saved counter. On context switch out, it tries to calculate a new saved counter. Problems: 1. The saved counter was shared by all threads in a process. However, this means that all threads would be initially loaded with the same saved counter. However, that could result in sampling more often than once every X number of events. 2. The calculation to determine a new saved counter was backwards. It added when it should have subtracted, and subtracted when it should have added. Assume a single-threaded process with a reload count of 1000 events. Assuming the counter on context switch in was 100 and the counter on context switch out was 50 (meaning the thread has "consumed" 50 more events), the code would calculate a new saved counter of 150 (instead of the proper 50). Fix: 1. As soon as the saved counter is used to initialize a monitor for a thread on context switch in, set the saved counter to the reload count. That way, subsequent threads to use the saved counter will get the full reload count, assuring we sample at least once every X number of events (across all threads). 2. Change the calculation of the saved counter. Due to the change to the saved counter in #1, we simply need to add (modulo the reload count) the remaining counter time we retrieve from the CPU when a thread is context switched out. Differential Revision: https://reviews.freebsd.org/D4122 Approved by: gnn (mentor) MFC after: 1 month Sponsored by: Juniper Networks	2015-11-16 15:22:15 +00:00
Jonathan T. Looney	c66ea2ee5c	Optimizations to the way hwpmc gathers user callchains Changes to the code to gather user stacks: * Delay setting pmc_cpumask until we actually have the stack. * When recording user stack traces, only walk the portion of the ring that should have samples for us. Sponsored by: Juniper Networks Approved by: gnn (mentor) MFC after: 1 month	2015-11-14 01:45:55 +00:00
Jonathan T. Looney	a39249680f	Fix hwpmc "stalled" behavior Currently, there is a single pm_stalled flag that tracks whether a performance monitor was "stalled" due to insufficient ring buffer space for samples. However, because the same performance monitor can run on multiple processes or threads at the same time, a single pm_stalled flag that impacts them all seems insufficient. In particular, you can hit corner cases where the code fails to stop performance monitors during a context switch out, because it thinks the performance monitor is already stopped. However, in reality, it may be that only the monitor running on a different CPU was stalled. This patch attempts to fix that behavior by tracking on a per-CPU basis whether a PM desires to run and whether it is "stalled". This lets the code make better decisions about when to stop PMs and when to try to restart them. Ideally, we should avoid the case where the code fails to stop a PM during a context switch out. Sponsored by: Juniper Networks Reviewed by: jhb Approved by: gnn (mentor) Differential Revision: https://reviews.freebsd.org/D4124	2015-11-14 01:40:12 +00:00
Bjoern A. Zeeb	71f7442233	Now that we can detect the Cortex-A8 properly, fix the event list according to the Cortex-A8 TRM r3p2 section 3.2.49. The A8 list differs from the "ARM-v7 common" list, given the A8 was an earlier model. There is still more work to be done for other Cortex-Ax version as andrew points out, but I am just trying to fix A8 for now for teaching. MFC after: 2 weeks Sponsored by: DARPA/AFRL Obtained from: Cambridge/L41 Reviewed by: andrew Differential Revision: https://reviews.freebsd.org/D3876	2015-10-14 17:20:19 +00:00
Bjoern A. Zeeb	e6f4757735	When forking a child process with PMC_F_DESCENDANTS set in pmc_attach() in the parent, we will inherit the pmcids but cannot execute any operations on them in the child. The reason for this is that pmc_find_pmc() only tries to find the current process on the owners hash list, but given the child does not own the attachment, we cannot find it. Thus, in case the initial lookup fails, try to find the pmc_process state affiliated with the child process, lookup the pmc from there using the row index, and get the owner process from that pmc. Then continue as normal and lookup the pmc context of the owner (process). This allows us to call, e.g., pmc_start() in the child process before we start the work there, but to collect the accumulated results later in the parent. Sponsored by: DARPA,AFRL Obtained from: L41 Tested by: rwatson, L41 MFC after: 4 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D2052	2015-08-24 18:57:32 +00:00
Ruslan Bukin	3e0bfdd882	o Rework ARMv7 events list using aliases - same way as we have for arm64. o Extend it with Cortex A9-specific events.	2015-06-10 12:42:30 +00:00
Eric van Gyzen	63e4c6cdf9	Provide vnode in memory map info for files on tmpfs When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431 MFC after: 2 weeks Reviewed by: jhb Approved by: kib (mentor)	2015-06-02 18:37:04 +00:00
John Baldwin	a1febbf667	Fix two bugs that could result in PMC sampling effectively stopping. In both cases, the effect of the bug was that a very small positive number was written to the counter. This means that a large number of events needed to occur before the next sampling interrupt would trigger. Even with very frequently occurring events like clock cycles wrapping all the way around could take a long time. Both bugs occurred when updating the saved reload count for an outgoing thread on a context switch. First, the counter-independent code compares the current reload count against the count set when the thread switched in and generates a delta to apply to the saved count. If this delta causes the reload counter to go negative, it would add a full reload interval to wrap it around to a positive value. The fix is to add the full reload interval if the resulting counter is zero. Second, occasionally the raw counter value read during a context switch has actually wrapped, but an interrupt has not yet triggered. In this case the existing logic would return a very large reload count (e.g. 2^48 - 2 if the counter had overflowed by a count of 2). This was seen both for fixed-function and programmable counters on an E5-2643. Work around this case by returning a reload count of zero. PR: 198149 Differential Revision: https://reviews.freebsd.org/D2557 Reviewed by: emaste MFC after: 1 week Sponsored by: Norse Corp, Inc.	2015-05-19 19:15:19 +00:00
John Baldwin	2b1df86c17	Use the proper mask when reloading sampling PMCs for Core CPUs. Differential Revision: https://reviews.freebsd.org/D2492 Reviewed by: emaste MFC after: 1 month	2015-05-19 19:01:22 +00:00
John Baldwin	0ceb54c2cf	Use fixed enum values for PMC_CLASSES(). This removes one of the frequent causes of ABI breakage when new CPU types are added to hwpmc(4). Differential Revision: https://reviews.freebsd.org/D2586 Reviewed by: davide, emaste, gnn (earlier version) MFC after: 2 weeks	2015-05-19 18:58:18 +00:00
Ruslan Bukin	bc88bb2bf3	Add Performance Monitoring Counters support for AArch64. Family-common and CPU-specific counters implemented. Supported CPUs: ARM Cortex A53/57/72. Reviewed by: andrew, bz, emaste, gnn, jhb Sponsored by: ARM Limited Differential Revision: https://reviews.freebsd.org/D2555	2015-05-19 15:25:47 +00:00
Bjoern A. Zeeb	62699f3424	Convert remaining hwpmc(4) debug printfs over to KTR to unbreak the build for at least powerpc kernels. Missed in r282658. MFC after: 10 days	2015-05-09 09:21:59 +00:00
John Baldwin	4a3690dfa1	Convert hwpmc(4) debug printfs over to KTR. Differential Revision: https://reviews.freebsd.org/D2487 Reviewed by: davide, emaste MFC after: 2 weeks Sponsored by: Norse Corp, Inc.	2015-05-08 19:40:00 +00:00
John Baldwin	680f1afd94	Move hwpmc(4) debugging code under a new HWPMC_DEBUG option instead of the broader DEBUG option. Reviewed by: emaste MFC after: 2 weeks Sponsored by: Norse Corp, Inc.	2015-05-08 15:57:23 +00:00
Justin Hibbits	a745246822	Implement hwpmc(4) for Freescale e500 core. This supports e500v1, e500v2, and e500mc. Tested only on e500v2, but the performance counters are identical across all, with e500mc having some additional events. Relnotes: Yes	2015-04-18 21:39:17 +00:00
Rui Paulo	bc3464096a	hwpmc: add initial Intel Broadwell support. The full list of aliases and events will follow in a subsequent commit. MFC after: 1 month	2015-04-05 05:14:20 +00:00
Rui Paulo	03a24b7026	Remove whitespace.	2015-04-05 05:09:38 +00:00
Adrian Chadd	f6e6460dfc	Add support for the MIPS74K SoC family performance counters events. These are similar to the mips24k performance counters - some are available on perfcnt0/3, some are available on perfcnt1/4. However, the events aren't all the same. * Add the events, named the same as from Linux oprofile. * Verify they're the same as "MIPS32(R) 74KTM Processor Core Family Software User's Manual"; Document Number: MD00519; Revision 01.05. * Rename INSTRUCTIONS to something else, so it doesn't clash with the alias INSTRUCTIONS. I'll try to tidy this up later; there are a few other aliases to add and shuffle around. Tested: * QCA9558 SoC (AP135 board) - MIPS74Kc core (no FPU.) * make universe; where it didn't fail for other reasons. TODO: * It'd be nice to support the four performance counters in at least this hardware, rather than just two. Reviewed by: bsdimp ("looks good; don't break world".)	2015-04-05 02:57:02 +00:00
Bjoern A. Zeeb	54384e56c9	Remove all the handcrafted assembly in hwpmc_armv7.c and use the common (autogenerated) versions. Removes extra vertical space, and makes it easier to grep for usage throughout the tree. Conditionally compile only for arm6 [1] (yes sounds odd but is right). Submitted by: andrew [1] Reviewed by: gnn, andrew (ian earlier version I think) Differential Revision: https://reviews.freebsd.org/D2159 Obtained from: Cambridge/L41 Sponsored by: DARPA, AFRL	2015-03-28 18:57:13 +00:00
Bjoern A. Zeeb	0ede88a413	Rather than defining our own magic checks here use INKERNEL() for the PMC_IN_KERNEL() macro definition. Add missing macros to extract the return address (LR) from the trapframe. Discussed with: andrew Obtained from: Cambridge/L41 Sponsored by: DARPA, AFRL MFC after: 2 weeks	2015-03-27 08:47:16 +00:00
Ryan Stone	67e51766bd	hwpmc: Fix event number to match enum name Differential revision: https://reviews.freebsd.org/D1592 Reviewed by: Joseph Kong MFC after: 1 month	2015-03-12 23:44:28 +00:00
Randall Stewart	de8d8ca4c8	You need to have the capabilities and not skip it if you are not on head.. otherwise the file pointer will be NULL and when you try to do something with it you will crash. Make the #else be the old capabilities, and then remove the erroneous ifdefs for 11. MFC after: 1 week (with the other MFC I was going to do until the panic)	2015-03-11 20:15:49 +00:00
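
The saved-counter bookkeeping described in commit 5eaa6f01f5 above can be sketched as follows. This is a minimal illustration, not the hwpmc(4) code: the class and function names are hypothetical, and handling "modulo the reload count" with a single subtraction is an assumption made here so that the saved value never becomes zero.

```python
# Hypothetical model of the per-process saved counter from commit 5eaa6f01f5.
# Names and structure are illustrative assumptions, not kernel identifiers.

class SavedCounter:
    def __init__(self, reload_count, saved):
        self.reload_count = reload_count  # events between samples
        self.saved = saved                # events left until the next sample


def switch_in(state):
    """Return the count to load into the hardware counter for this thread.

    Once the saved value has been handed out, reset it to the full reload
    count so the next thread to switch in does not reuse the same remainder.
    """
    value = state.saved
    state.saved = state.reload_count
    return value


def switch_out(state, remaining):
    """Fold the outgoing thread's remaining count back into the saved value.

    Assumption: wrap-around is a single subtraction of the reload count,
    which keeps the result in the range 1..reload_count.
    """
    state.saved += remaining
    if state.saved > state.reload_count:
        state.saved -= state.reload_count


# The example from the commit message: reload count 1000, counter 100 at
# switch in, counter 50 at switch out (the thread consumed 50 events).
state = SavedCounter(reload_count=1000, saved=100)
loaded = switch_in(state)
switch_out(state, remaining=50)
print(loaded, state.saved)
```

With these inputs the thread is loaded with 100 and the saved counter ends at 50, the value the commit message calls proper, rather than the 150 produced by the old calculation.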

372 Commits