pmcstat(8) contains an option to output sampling data in a gmon format
compatible with gprof(1). Currently, it uses the default histcounter,
which is an (unsigned short). With large sets of sampling data, it
is possible to overflow the maximum value provided by an (unsigned
short).
This change adds the -e argument to pmcstat. If -e and -g are both
specified, pmcstat will use a histcounter type of uint64_t.
Differential Revision: https://reviews.freebsd.org/D4151
Reviewed by: jhb, bjk
Approved by: gnn (mentor)
MFC after: 1 month
Sponsored by: Juniper Networks
When pmcstat exits after some samples were dropped, give the user an
idea of how many were lost. (Granted, these are global numbers, but
they may still help quantify the scope of the loss.)
Differential Revision: https://reviews.freebsd.org/D4123
Approved by: gnn (mentor)
MFC after: 1 month
Sponsored by: Juniper Networks
Off by default, build behaves normally.
WITH_META_MODE we get auto objdir creation, the ability to
start build from anywhere in the tree.
Still need to add real targets under targets/ to build packages.
Differential Revision: D2796
Reviewed by: brooks imp
- Fetch the root set from cpuset_getaffinity() instead of assuming all CPUs
from 0 to hw.ncpu are the root set.
- Use CPU_SETSIZE and CPU_FFS.
- The original notion of halted CPUs the manpage and code refers to is gone.
Use the term "available" instead.
Differential Revision: https://reviews.freebsd.org/D2491
Reviewed by: emaste
MFC after: 1 week
When examining existing processes pmcstat fails to
correctly determine the locations of executable sections
of the process due to a miscalculated virtual load address.
This does not affect the newly launched processes as the
same value passed as a "start address" to the pmcstat_image_link()
thus nullifying the effect of it. The issue manifests itself
in processes not being reported in the pmcstat(8) output and
"dubious frames" being reported.
Fix it for now by ignoring all the sections except the executable
one. This won't fix the issue for objects with multiple
executable sections but helps in majority of real world usecases.
The real solution would be to modify the MAP-IN event to include
the appropriate load address so pmcstat(8) won't have to manually
parse object files to try to determine it.
PR: 198147, 198148
Reviewed by: jhb, rpaulo
MFC after: 2 weeks
hardcoding /boot/kernel. This allows pmcstat(8) to work without -k when
using nextboot -k or 'boot foo' at the loader to boot alternate kernels.
Differential Revision: https://reviews.freebsd.org/D2425
Reviewed by: adrian, emaste, gnn
MFC after: 2 weeks
Sponsored by: Norse Corp, Inc.
The -a flag reads a file saved by -O, not -o.
The -m flag requires the -R flag. Copy that paragraph from -a.
Reviewed by: adrian
Approved by: kib (mentor)
MFC after: 1 week
Sponsored by: Dell Inc
the -d argument should be passed before -p, -s, -P or -S to be taken in account
Differential Revision: https://reviews.freebsd.org/D1011
Reviewed by: adrian, gnn
MFC after: 1 week
variants. This allows usable file system images (i.e. those with both a
shell and an editor) to be created with only one copy of the curses library.
Exp-run: antoine
PR: 189842
Discussed with: bapt
Sponsored by: DARPA, AFRL
In one case generating callgraph output from a 24MB system-wide sampling
data file took 17.4 seconds on average. Profiling showed pmcstat
spending a lot of time in strcmp, due to hash collisions.
Replacing the XOR-only hash with FNV-1a reduces the run time for my
test by 40%.
'-m <file>' spits out the given stream into <file> (eg, /dev/stdout).
However, it only resolves the first symbol; it doesn't parse the entire
callgraph. If it fails to lookup then it doesn't print anything.
'-a' instead does a symbol and file:line lookup for each address in each
callgraph and will happily print the address itself with no lookup
information if it couldn't look things up.
This makes it much easier to pull out individual records from a
pmc data file and look at the callgraph information without having to
hand-decode the addresses.
Sponsored by: Netflix, Inc.
In addition to adding `static' where possible:
- bin/date: Move `retval' into extern.h to make it visible to date.c.
- bin/ed: Move globally used variables into ed.h.
- sbin/camcontrol: Move `verbose' into camcontrol.h and fix shadow warnings.
- usr.bin/calendar: Remove unneeded variables.
- usr.bin/chat: Make `line' local instead of global.
- usr.bin/elfdump: Comment out unneeded function.
- usr.bin/rlogin: Use _Noreturn instead of __dead2.
- usr.bin/tset: Pull `Ospeed' into extern.h.
- usr.sbin/mfiutil: Put global variables in mfiutil.h.
- usr.sbin/pkg: Remove unused `os_corres'.
- usr.sbin/quotaon, usr.sbin/repquota: Remove unused `qfname'.
New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).
Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.
Sponsored by: NETASQ
MFC after: 1 month
In case of multiple level of inlining all the locations are flattened.
Require recent binutils/addr2line (head works or binutils from ports
with the right $PATH order).
- Multiple fixes in the calltree output (recursion case, ...)
- Fix the calltree top view that previously hide some shared nodes.
Tested with Kcachegrind(kdesdk4)/qcachegrind(head).
Sponsored by: NETASQ
error: variable 'current_cpu' set but not used
Approved by: dim, cperciva (mentor, blanket for pre-mentorship already-approved commits)
MFC after: 3 days