malloc(9) statistics from kernel memory or a kernel coredump, to catch
up with recent changes to adopt per-CPU malloc(9) statistics. The new
routines walk the per-CPU statistics pools and coalesce them for
presentation to the user.
Fixed a nearby bug. The "play it safe" code in dosysctl() was unsafe
because it overran the buffer by 1 if sysctl() filled all of the buffer.
Fixed a nearby style bug in output. Not just 1, but 2 extra newlines
were printed at the end by "vmstat -m" and "vmstat -z". Don't print
any newlines explicitly. This depends on 2 of the many formatting
bugs in the corresponding sysctls. First, the sysctls return an extra
newline at the end of the strings. This also messes up output from
sysctl(8). Second, the sysctls return an extra newline at the beginning
of the strings. This is good for separating the 2 tables output by
"vmstat -mz" and for starting the header on a new line in plain sysctl
output, but gives a bogus extra newline at the beginning for "vm -[m | z]"
and "sysctl -n [kern.malloc | vm.zone]".
Fixed some nearby style bugs in the source code:
- the same line that misspelled 0 as NULL also spelled NULL as 0.
- the size was doubled twice in the realloc loop.
- the "play it safe" comment was misleading. Terminating the buffer
is bogus because dosysctl() is only meant to work with sysctls that
return strings and the terminator is part of a string. However, the
kern.malloc sysctl has more than style bugs. It also doesn't return
a string. Termination is needed to work around this bug.
- Replace overly-complicated (and buggy) -a logic with a much simpler
version: -a causes all interrupts to be displayed, otherwise only
those that have occurred are displayed. This removes the need for
any MD code.
- Instead of just making sure intrcnt is large enough, figure out the
exact size it needs to be. We derive nintr from this number, and we
don't want to risk printing garbage. Note that on sparc64, we end up
printing garbage anyway because the names of non-existent interrupts
are left uninitialized by the kernel.
Tested on: alpha, i386, sparc64
o nintr and inamlen must by of type size_t, not int,
o Remove now unnecessary casts,
o Handle the aflag differently, because the intr. names have a
fixed width and almost always have trailing spaces.
The use of libkvm for post-mortem analysis is still supported (though it
could use more testing). We can now remove vmstat's setgid bit.
While I'm here, hack the interrupt listing code to not display interrupts
that haven't occurred unless the -a option was given on the command line,
and document this change.
Kernel:
Change statistics to use the *uptime() timescale (ie: relative to
boottime) rather than the UTC aligned timescale. This makes the
device statistics code oblivious to clock steps.
Change timestamps to bintime format, they are cheaper.
Remove the "busy_count", and replace it with two counter fields:
"start_count" and "end_count", which are updated in the down and
up paths respectively. This removes the locking constraint on
devstat.
Add a timestamp argument to devstat_start_transaction(), this will
normally be a timestamp set by the *_bio() function in bp->bio_t0.
Use this field to calculate duration of I/O operations.
Add two timestamp arguments to devstat_end_transaction(), one is
the current time, a NULL pointer means "take timestamp yourself",
the other is the timestamp of when this transaction started (see
above).
Change calculation of busy_time to operate on "the salami principle":
Only when we are idle, which we can determine by the start+end
counts being identical, do we update the "busy_from" field in the
down path. In the up path we accumulate the timeslice in busy_time
and update busy_from.
Change the byte_* and num_* fields into two arrays: bytes[] and
operations[].
Userland:
Change the misleading "busy_time" name to be called "snap_time" and
make the time long double since that is what most users need anyway,
fill it using clock_gettime(CLOCK_MONOTONIC) to put it on the same
timescale as the kernel fields.
Change devstat_compute_etime() to operate on struct bintime.
Remove the version 2 legacy interface: the change to bintime makes
compatibility far too expensive.
Fix a bug in systat's "vm" page where boot relative busy times would
be bogus.
Bump __FreeBSD_version to 500107
Review & Collaboration by: ken
Updated the kmemzones logic such that the ks_size bitmap can be used as an
index into it to report the size of the zone used.
Create the kern.malloc sysctl which replaces the kvm mechanism to report
similar data. This will provide an easy place for statistics aggregation if
malloc_type statistics become per cpu data.
Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing
the malloc buckets.
to unsigned long long.
Don't be too overzealous with the printing of ks_calls in the total
statistics, cut back from 20 to 13 positions to print (which should last
a couple of years easily (20 digits is enough for 3168 years of calls at a
measly billion (10^9) calls per second.)).
Submitted by: bde
make sense to me) and change the printf argument from %8ld to %20llu to
accompany the printing of the totals.
Realigned the header printed above it as well.
PR: 32342
Submitted by: ryan beasley <ryanb@goddamnbastard.org>
Reviewed by: jeff, Tim J Robbins
hold a 64bit or 32bit ~0 value, i.e. 20 and 10; this anticipates
soon-to-be machines with Exahertz rtc interrupt frequencies. :-)
PR: bin/16206
Submitted by: John Capo <jc@irbs.com>
MFC after: 1 week
columns confuse the heck out of other apps trying to parse vmstat output
(eg sscope). I made sure we're still <= 80 cols per line.
Fixed warnings about unused vars and printf %format mismatches.
Requested by: Eugene Aleynikov <eugenea@infospace.com>
Reviewed by: joerg (implicitly)
MFC after: 2 weeks
which is slightly less than 4GB. To use a quote from someone who shall
remain nameless "No one will ever need more than 4 GB" :-) But FreeBSD
is prepared if we one day will.
Requested by: Eugene Aleynikov <eugenea@infospace.com>
Submitted by: Maxim Konovalov <maxim@macomnet.ru>
Silence a warning by renaming the 'pgtok' #define to 'vmstat_pgtok' so
as not to conflict with the 'pgtok' #define in sys/param.h
is ultimately silly because no locks are held in user space while traversing
the list via kvm_reads... really, this should use the sysctl interface
which *is* protected by a lock in the kernel.
the magicness of 200. Cleaned up the remaining parts. Circularisation
of the list of malloc types was a kernel bug (now fixed). Interfering
with applications' definitions of pgtok is a system header bug (not
fixed).
changed from a simple list to a circular one. We compensate by only
looping until we see the first address again. Before, things would
terminate because it was limited to 200 iterations. This lead to
bogus statistics and repeating stats for memory types.
This should be merged into 3.2, as the same bug is there.
I'm not sure why we have `mvstat -z'. `sysctl vm.zone' gives more
information. OTOH, `sysctl vm.zone' shouldn't return ASCII data,
and reporting of memory use should be integrated, at least as an
option.
the display wrapped around.
This decreases the default maximum number of disks shown to 2, so things
don't wrap around so easily. Also, it fixes the header display issues.
Submitted by: Bruce Evans <bde@FreeBSD.ORG>
peripheral drivers can determine where in the devstat(9) list they are
inserted.
This requires recompilation of libdevstat, systat, vmstat, rpc.rstatd, and
any ports that depend on the devstat code, since the size of the devstat
structure has changed. The devstat version number has been incremented as
well to reflect the change.
This sorts devices in the devstat list in "more interesting" to "less
interesting" order. So, for instance, da devices are now more important
than floppy drives, and so will appear before floppy drives in the default
output from systat, iostat, vmstat, etc.
The order of devices is, for now, kept in a central table in devicestat.h.
If individual drivers were able to make a meaningful decision on what
priority they should be at attach time, we could consider splitting the
priority information out into the various drivers. For now, though, they
have no way of knowing that, so it's easier to put them in an easy to find
table.
Also, move the checkversion() call in vmstat(8) to a more logical place.
Thanks to Bruce and David O'Brien for suggestions, for reviewing this, and
for putting up with the long time it has taken me to commit it. Bruce did
object somewhat to the central priority table (he would rather the
priorities be distributed in each driver), so his objection is duly noted
here.
Reviewed by: bde, obrien
generation was causing unaligned access faults on the Alpha.
I have incremented the devstat version number, since this is an interface
change. You'll need to recompile libdevstat, systat, iostat, vmstat and
rpc.rstatd along with your kernel.
Partially Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr w0 w1 in sy cs us sy id
1 0 04135184 6016 180 2 1 0 158 135 10 0 386 1820 77 20 6 74
looks like:
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr w0 w1 in sy cs us sy id
1 0 0 4135188 6016 180 2 1 0 158 135 10 0 387 1821 77 20 6 74