freebsd-nq

Author	SHA1	Message	Date
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Matt Macy	1bb4eefa6c	Fix sample check in hwpmc Don't drop samples with callchain pending Tested by: mjg@ Submitted by: Rajeeb Barman at amd.com MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17011	2019-10-13 22:26:55 +00:00
Doug Moore	2288078c5e	Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map. In case the implementation ever changes from using a chain of next pointers, then changing the macro definition will be necessary, but changing all the files that iterate over vm_map entries will not. Drop a counter in vm_object.c that would have an effect only if the vm_map entry count was wrong. Discussed with: alc Reviewed by: markj Tested by: pho (earlier version) Differential Revision: https://reviews.freebsd.org/D21882	2019-10-08 07:14:21 +00:00
Mark Johnston	8f77f60f94	hwpmc: Plug memory disclosures from PMC_OP_{GETPMCINFO,GETCPUINFO}. admbugs: 765 Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net> MFC after: 1 day Security: Kernel memory disclosure Sponsored by: The FreeBSD Foundation	2019-01-21 18:07:02 +00:00
Matt Macy	acf50a7f68	hwpmc: limit wait for user callchain collection to 1 tick The hwpmc pcpu sample buffer is prone to head of line blocking when waiting for user process to return to user space and collect a pending callchain. If more than one tick has elapsed between the time the sample entry was marked for collection and the time that the hardclock pmc handler runs to copy the records to a larger temporary buffer, mark the sample entry as not in use. This changes reduces the number of samples marked as not valid when collecting under load from ~99.5% to 5-20%. Reported by: mjg@ MFC after: 3 days	2018-11-05 08:11:16 +00:00
Mark Johnston	9978bd996b	Add malloc_domainset(9) and _domainset variants to other allocator KPIs. Remove malloc_domain(9) and most other _domain KPIs added in r327900. The new functions allow the caller to specify a general NUMA domain selection policy, rather than specifically requesting an allocation from a specific domain. The latter policy tends to interact poorly with M_WAITOK, resulting in situations where a caller is blocked indefinitely because the specified domain is depleted. Most existing consumers of the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy, in which we fall back to other domains to satisfy the allocation request. This change also defines a set of DOMAINSET_FIXED() policies, which only permit allocations from the specified domain. Discussed with: gallatin, jeff Reported and tested by: pho (previous version) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17418	2018-10-30 18:26:34 +00:00
Matt Macy	d9f1b8dbf2	hwpmc: Refactor sample ring buffer handling to fix races Refactor sample ring buffer ring handling to make it more robust to long running callchain collection handling r338112 introduced a (now fixed) regression that exposed a number of race conditions within the management of the sample buffers. This simplifies the handling and moves the decision to overwrite a callchain sample that has taken too long out of the NMI in to the hardlock handler. With this change the problem no longer shows up as a ring corruption but as the code spending all of its time in callchain collection. - Makes the producer / consumer index incrementing monotonic, making it easier (for me at least) to reason about. - Moves the decision to overwrite a sample from NMI context to interrupt context where we can enforce serialization. - Puts a time limit on waiting to collect a user callchain - putting a bound on head-of-line blocking causing samples to be dropped - Removes the flush routine which was previously needed to purge dangling references to the pmc from the sample buffers but now is only a source of a race condition on unload. Previously one could lock up or crash HEAD by running: pmcstat -S inst_retired.any_p -T and then hitting ^C After this change it is no longer possible. PR: 231793 Reviewed by: markj@ Approved by: re (gjb@) Differential Revision: https://reviews.freebsd.org/D17011	2018-10-05 05:55:56 +00:00
Matt Macy	0204d85a62	hwpmc: set default rate if event description lacks one / filter rate against misuse Not all event descriptions have a sample rate (such as inst_retired.any) this will restore the legacy behavior of using 65536 in that case. It also prevents accidental API misuse that could lead to panic. PR: 230985 Reported by: markj Reviewed by: markj Approved by: re (gjb) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16958	2018-09-14 01:30:05 +00:00
Matt Macy	6573d7580b	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
Mark Johnston	a9336cef39	Use the cached curthread reference in pmc_process_interrupt(). Fix indentation while here.	2018-06-11 16:27:09 +00:00
Matt Macy	eb7c901995	hwpmc: simplify calling convention for hwpmc interrupt handling pmc_process_interrupt takes 5 arguments when only 3 are needed. cpu is always available in curcpu and inuserspace can always be derived from the passed trapframe. While facially a reasonable cleanup this change was motivated by the need to workaround a compiler bug. core2_intr(cpu, tf) -> pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) -> pmc_add_sample(cpu, ring, pm, tf, inuserspace) In the process of optimizing the tail call the tf pointer was getting clobbered: (kgdb) up at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709 4709 pmc_save_kernel_callchain(ps->ps_pc, (kgdb) up 1205 error = pmc_process_interrupt(cpu, PMC_HR, pm, tf, resulting in a crash in pmc_save_kernel_callchain.	2018-06-08 04:58:03 +00:00
Matt Macy	f992dd4b5c	pmc: convert native to jsonl and track TSC value of samples - add '-j' options to filter to enable converting native pmc log format to json lines format to enable the use of scripts and external tooling % pmc filter -j pmc.log pmc.jsonl - Record the tsc value in sampling interrupts as opposed to recording nanotime when the sample is copied to a global log in hardclock - potentially many milliseconds later. - At initialize record the tsc_freq and the time of day to give us an offset for translating the tsc values in callchain records	2018-06-07 02:03:22 +00:00
Matt Macy	41abd7afa3	hwpmc: don't log pid->name more than once	2018-06-07 00:54:43 +00:00
Matt Macy	b2ca2e50b9	hwpmc: add summary command and further metadata extensions metadata changes: - log pmc sample rate with pmcallocate - log proc flags with thread / process logging to identify user vs kernel threads fixes: - use log cpuid to translate event id to event name Implement rudimentary summary command to track sample counts by thread and process name within a pmc log. % make -j4 buildkernel >& /dev/null & % sudo pmcstat -S unhalted_core_cycles -S llc-misses -O foo sleep 15 % pmc summary foo cpu_clk_unhalted.thread_p_any: idle: 138108207162 clang-6.0: 105336158004 sh: 72340108510 make: 8642012963 kernel: 7754011631 longest_lat_cache.miss: clang-6.0: 87502625 sh: 40901227 make: 5500165 kernel: 3300099 awk: 2000060 % pmc summary -f ~/foo idx: 278 name: cpu_clk_unhalted.thread_p_any rate: 2000003 idle: 69054 clang-6.0: 52668 sh: 36170 make: 4321 kernel: 3877 hwpmc: proc(7445): 3319 awk: 1289 xargs: 357 rand_harvestq: 181 mtree: 102 intr: 53 zfskern: 31 usb: 7 pagedaemon: 4 ntpd: 3 syslogd: 1 acpi_thermal: 1 logger: 1 syncer: 1 snmptrapd: 1 sleep: 1 idx: 17 name: longest_lat_cache.miss rate: 100003 clang-6.0: 875 sh: 409 make: 55 kernel: 33 awk: 20 hwpmc: proc(7445): 14 xargs: 9 idle: 8 intr: 3 zfskern: 2	2018-06-06 02:48:09 +00:00
Matt Macy	ebfaf69cc0	hwpmc: log name->pid, name->tid mappings By logging all threads and processes 'pmc filter' can now filter on process or thread name, relieving the user of the burden of determining which tid or pid was which when the sample was taken. % pmc filter -T if_io_tqg -P nginx pmc.log pmc-iflib.log % pmc filter -x -T idle pmc.log pmc-noidle.log	2018-06-05 04:26:40 +00:00
Matt Macy	ac7012d284	hwpmc: don't defer user callchain capture completion to ast	2018-06-04 21:17:30 +00:00
Matt Macy	cf823003a7	hwpmc: remove gratuitous curthread checks	2018-06-04 17:49:34 +00:00
Matt Macy	5de96e33d6	hwpmc: support sampling both kernel and user stacks when interrupted in kernel This adds the -U options to pmcstat which will attribute in-kernel samples back to the user stack that invoked the system call. It is not the default, because when looking at kernel profiles it is generally more desirable to merge all instances of a given system call together. Although heavily revised, this change is directly derived from D7350 by Jonathan T. Looney. Obtained from: jtl Sponsored by: Juniper Networks, Limelight Networks	2018-06-04 01:10:23 +00:00
Matt Macy	02ce8216df	hwpmc: remove stale assert Reported by: eadler	2018-05-30 06:29:22 +00:00
Matt Macy	b99aa0fbb2	hwpmc: don't enter epoch section across mmap hook	2018-05-29 18:03:48 +00:00
Matt Macy	ba32b20330	hwpmc: make pmc class specification work to enable fixed function counters	2018-05-28 23:17:57 +00:00
Matt Macy	3de228499a	hwmpc: fix brain-damaged handling of thread descriptor freeing	2018-05-28 23:16:39 +00:00
Matt Macy	39446ce5b3	hwpmc_logging.c: don't call wakeup_one with thread lock held, don't malloc(M_WAITOK) in an epoch section	2018-05-28 23:12:26 +00:00
Matt Macy	0f8d79d977	CK: update consumers to use CK macros across the board r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors	2018-05-24 23:21:23 +00:00
Matt Macy	a85289cf9b	hwppmc: set threadid in callchain records - second part of r334108	2018-05-23 17:44:29 +00:00
Matt Macy	ae573a91cf	pmc: detach free_gtask on unload Reported by: pho	2018-05-20 20:34:15 +00:00
Matt Macy	70398c2f86	epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj	2018-05-18 17:29:43 +00:00
Matt Macy	6161b98c99	hwpmc: Implement per-thread counters for PMC sampling This implements per-thread counters for PMC sampling. The thread descriptors are stored in a list attached to the process descriptor. These thread descriptors can store any per-thread information necessary for current or future features. For the moment, they just store the counters for sampling. The thread descriptors are created when the process descriptor is created. Additionally, thread descriptors are created or freed when threads are started or stopped. Because the thread exit function is called in a critical section, we can't directly free the thread descriptors. Hence, they are freed to a cache, which is also used as a source of allocations when needed for new threads. Approved by: sbruno Obtained from: jtl Sponsored by: Juniper Networks, Limelight Networks Differential Revision: https://reviews.freebsd.org/D15335	2018-05-16 22:29:20 +00:00
Matt Macy	8fa7df3668	pmc: don't add pmc owner to list until setup is complete Once a pmc owner is added to the pmc_ss_owners list it is visible for all to see. We don't want this to happen until setup is complete. Reported by: mjg Approved by: sbruno	2018-05-14 01:08:47 +00:00
Matt Macy	0f00315cb3	hwpmc: fix load/unload race and vm map LOR - fix load/unload race by allocating the per-domain list structure at boot - fix long extant vm map LOR by replacing pmc_sx sx_slock with global_epoch to protect the liveness of elements of the pmc_ss_owners list Reported by: pho Approved by: sbruno	2018-05-14 00:21:04 +00:00
Matt Macy	f1401123c5	hwpmc/epoch - don't reference domain if NUMA is not set It appears that domain information is set correctly independent of whether or not NUMA is defined. However, there is no memory backing secondary domains leading to allocation failure. Reported by: pho@, np@ Approved by: sbruno@	2018-05-12 20:00:29 +00:00
Matt Macy	e6b475e0af	hwpmc(9): Make pmclog buffer pcpu and update constants On non-trivial SMP systems the contention on the pmc_owner mutex leads to a substantial number of samples captured being from the pmc process itself. This change a) makes buffers larger to avoid contention on the global list b) makes the working sample buffer per cpu. Run pmcstat in the background (default event rate of 64k): pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 & Before: make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total After: make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total For more realistic overhead measurement set the sample rate for ~2khz on a 2.1Ghz processor: pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 & Collecting 10 samples of `make -j96 buildkernel` from each: x before + after real time: N Min Max Median Avg Stddev x 10 76.4 127.62 84.845 88.577 15.100031 + 10 59.71 60.79 60.135 60.179 0.29957192 Difference at 95.0% confidence -28.398 +/- 10.0344 -32.0602% +/- 7.69825% (Student's t, pooled s = 10.6794) system time: N Min Max Median Avg Stddev x 10 2277.96 6948.53 2949.47 3341.492 1385.2677 + 10 1038.7 1081.06 1070.555 1064.017 15.85404 Difference at 95.0% confidence -2277.47 +/- 920.425 -68.1574% +/- 8.77623% (Student's t, pooled s = 979.596) x no pmc + pmc running real time: HEAD: N Min Max Median Avg Stddev x 10 58.38 59.15 58.86 58.847 0.22504567 + 10 76.4 127.62 84.845 88.577 15.100031 Difference at 95.0% confidence 29.73 +/- 10.0335 50.5208% +/- 17.0525% (Student's t, pooled s = 10.6785) patched: N Min Max Median Avg Stddev x 10 58.38 59.15 58.86 58.847 0.22504567 + 10 59.71 60.79 60.135 60.179 0.29957192 Difference at 95.0% confidence 1.332 +/- 0.248939 2.2635% +/- 0.426506% (Student's t, pooled s = 0.264942) system time: HEAD: N Min Max Median Avg Stddev x 10 1010.15 1073.31 1025.465 1031.524 18.135705 + 10 2277.96 6948.53 2949.47 3341.492 1385.2677 Difference at 95.0% confidence 2309.97 +/- 920.443 223.937% +/- 89.3039% (Student's t, pooled s = 979.616) patched: N Min Max Median Avg Stddev x 10 1010.15 1073.31 1025.465 1031.524 18.135705 + 10 1038.7 1081.06 1070.555 1064.017 15.85404 Difference at 95.0% confidence 32.493 +/- 16.0042 3.15% +/- 1.5794% (Student's t, pooled s = 17.0331) Reviewed by: jeff@ Approved by: sbruno@ Differential Revision: https://reviews.freebsd.org/D15155	2018-05-12 01:26:34 +00:00
Fabien Thomas	9f4f1d4d1f	Fix pmcstat exit from kernel introduced by r325275. pmcstat request for close will generate a close event. This event will be in turn received by pmcstat to close the file. Reviewed by: kib Tested by: pho MFC after: 1 week Sponsored by: Stormshield	2018-01-17 16:41:22 +00:00
Pedro F. Giffuni	718cf2ccb9	sys/dev: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 14:52:40 +00:00
Konstantin Belousov	d2cd638852	Check that the pmc index is less than the number of hardware PMCs, instead of asserting the condition. The row index is directly supplied by userspace, the kernel must handle invalid values. Submitted by: pho MFC after: 3 days	2017-11-10 19:10:14 +00:00
Konstantin Belousov	20b555e146	Do not run pmclog_configure_log() without pmc_sx protection. The r195005 unlocked pmc_sx before calling into pmclog_configure_log() to avoid the LOR, but it allows flush or closelog to run in parallel with the configuration, causing many failure modes. Revert r195005. Pre-create the logging process, allowing it to run after the set up succeeded, otherwise the process terminates itself. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 11:43:39 +00:00
Konstantin Belousov	bd63e82975	There is no use for dropping Giant in the pmc syscall. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 11:16:18 +00:00
Konstantin Belousov	cf9ef80607	Minor style tweaks. Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 11:05:47 +00:00
Konstantin Belousov	1cfbc451b9	Use designated initializers for pmc sysent and module data. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D12882	2017-11-01 10:49:41 +00:00
Zbigniew Bodek	6cb40391c4	Fix INVARIANTS debug code in HWPMC When HWPMC stops sampling, ps_pmc may be freed before samples are processed. In such situation treat PMC as stopped. Add "ifdef" to fix build without INVARIANTS code. Submitted by: Michal Mazur <mkm@semihalf.com> Obtained from: Semihalf Sponsored by: Stormshield, Netgate Differential revision: https://reviews.freebsd.org/D10912	2017-06-13 18:53:56 +00:00
Andriy Gapon	593077d613	pmc_process_csw_out: ignore deleted counters I see the fllowing panic on AMD when exiting pmcstat: panic: [pmc,1473] pp_pmcval outside of expected range cpu=2 ri=17 pp_pmcval=fffffffffa529f5b pm_reloadcount=10000 It seems that at least on AMD a performance counter keeps counting after overflowing. When pmcstat exits it sets counters that it used to PMC_STATE_DELETED and waits until their use count goes to zero. amd_intr() wouldn't reload a counter in that state and, thus, a counter would be allowed to overflow. That means that the counter's value would be allowed to go outside the expected range. MFC after: 2 weeks	2016-11-10 11:12:45 +00:00
Andrew Turner	7bc7e3cd65	Don't panic in hwpmc when stopping sampling. When hwpmc stops sampling it will set the pm_state to something other than PMC_STATE_RUNNING. This means the following sequence can happen: CPU 0: Enter the interrupt handler CPU 0: Set the thread TDP_CALLCHAIN pflag CPU 1: Stop sampling CPU 0: Call pmc_process_samples, sampling is stopped so clears ps_nsamples CPU 0: Finishes interrupt processing with the TDP_CALLCHAIN flag set CPU 0: Call pmc_capture_user_callchain to capture the user call chain CPU 0: Find all the pmc sample are free so no call chains need to be captured CPU 0: KASSERT because of this This fixes the issue by checking if any of the samples have been stopped and including this in te KASSERT. PR: 204273 Reviewed by: bz, gnn Obtained from: ABT Systems Ltd Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6581	2016-05-28 13:05:39 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem on with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Pedro F. Giffuni	b790c1938d	etc: minor spelling fixes. Mostly comments but also some user-visible strings. MFC after: 2 weeks	2016-05-02 16:47:28 +00:00
Konstantin Belousov	0c8cc7b076	Remove tautological cast. PR: 207068 Submitted by: joss.upton@yahoo.com MFC after: 2 weeks	2016-02-12 07:19:59 +00:00
Konstantin Belousov	db57c70a5b	Rename P_KTHREAD struct proc p_flag to P_KPROC. I left as is an apparent bug in ntoskrnl_var.h:AT_PASSIVE_LEVEL() definition. Suggested by: jhb Sponsored by: The FreeBSD Foundation	2016-02-09 16:30:16 +00:00
Konstantin Belousov	0fb2c5d60c	Do not call vn_fullpath(9) (through the pmc_getfilename() wrapper) when its result is immediately ignored, i.e. for kernel processes forked from the user process. Do not test for non-null before freeing string. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-02-06 15:39:04 +00:00
Jonathan T. Looney	5eaa6f01f5	Improve accuracy of PMC sampling frequency The code tracks a counter which is the number of events until the next sample. On context switch in, it loads the saved counter. On context switch out, it tries to calculate a new saved counter. Problems: 1. The saved counter was shared by all threads in a process. However, this means that all threads would be initially loaded with the same saved counter. However, that could result in sampling more often than once every X number of events. 2. The calculation to determine a new saved counter was backwards. It added when it should have subtracted, and subtracted when it should have added. Assume a single-threaded process with a reload count of 1000 events. Assuming the counter on context switch in was 100 and the counter on context switch out was 50 (meaning the thread has "consumed" 50 more events), the code would calculate a new saved counter of 150 (instead of the proper 50). Fix: 1. As soon as the saved counter is used to initialize a monitor for a thread on context switch in, set the saved counter to the reload count. That way, subsequent threads to use the saved counter will get the full reload count, assuring we sample at least once every X number of events (across all threads). 2. Change the calculation of the saved counter. Due to the change to the saved counter in #1, we simply need to add (modulo the reload count) the remaining counter time we retrieve from the CPU when a thread is context switched out. Differential Revision: https://reviews.freebsd.org/D4122 Approved by: gnn (mentor) MFC after: 1 month Sponsored by: Juniper Networks	2015-11-16 15:22:15 +00:00
Jonathan T. Looney	c66ea2ee5c	Optimizations to the way hwpmc gathers user callchains Changes to the code to gather user stacks: * Delay setting pmc_cpumask until we actually have the stack. * When recording user stack traces, only walk the portion of the ring that should have samples for us. Sponsored by: Juniper Networks Approved by: gnn (mentor) MFC after: 1 month	2015-11-14 01:45:55 +00:00
Jonathan T. Looney	a39249680f	Fix hwpmc "stalled" behavior Currently, there is a single pm_stalled flag that tracks whether a performance monitor was "stalled" due to insufficent ring buffer space for samples. However, because the same performance monitor can run on multiple processes or threads at the same time, a single pm_stalled flag that impacts them all seems insufficient. In particular, you can hit corner cases where the code fails to stop performance monitors during a context switch out, because it thinks the performance monitor is already stopped. However, in reality, it may be that only the monitor running on a different CPU was stalled. This patch attempts to fix that behavior by tracking on a per-CPU basis whether a PM desires to run and whether it is "stalled". This lets the code make better decisions about when to stop PMs and when to try to restart them. Ideally, we should avoid the case where the code fails to stop a PM during a context switch out. Sponsored by: Juniper Networks Reviewed by: jhb Approved by: gnn (mentor) Differential Revision: https://reviews.freebsd.org/D4124	2015-11-14 01:40:12 +00:00

1 2 3

129 Commits