w/ non-zero data, and it turns out we don't... This is really optimized
zero filled on demand, or pages that were already zero'd for us...
MFC after: 3 days
3 columns were wasted at the left, except these columns were used to
make the header line up. Now there is no space on the same line for
the "Proc:" part of the header. Try putting this on the line above
although it clutters that line (there is already similar clutter for
the "Interrupts" header). Leave 1 column between these fields. With
the above and a previous change there is enough of space for this.
Use 5 columns instead of 3 for the number of users since 3 is not quite
enough and there was space to spare. This also fixes an off-by-2 error
in a previous fix forthe column count in the comment on STATROW.
Move all the pager fields 1 to the right so that the "count" and "pages"
descriptors more clearly apply to the pager fields and not the memory
fields. There was space to space.
Waste some of the spare space at the right of the pager fields to expand
all the pager field widths to their old values (but now with a column
between the fields). There are fields more in need of expansion but most
of them are not in places near spare space.
made it unnecessary. (Rev.1.6 had to reduce the field width to 4, and
changed 100.0 and preposterous larger values down to 99.9 since 100.0
wouldn't have fitted. Rev.1.35 handles precentages > 99.9 well enough by
changing the format to %.0f when the string given by the initial format
is too wide.)
Even with this change, during short testing I've never seen a percentage
of 100 being displayed by systat -v, although top(1) displays percentages
of 100 user or 100 idle for similar loads.
Always use snprintf()'s return value, since discarding it is a style
bug at best and using it here gives slightly simpler code and better
error checking. Use snprintf() in putlongdouble() the same as in
putfloat(). (1.25 changed most sprintf()'s to snprintf()'s to fix
non-bugs without changing the logic to use the result of snprintf();
1.27 restored one of the sprintf()s by cloning a stale version of
putfloat().)
Don't print a too-long field in the unlikely case that the fallback
to M units in putint() leaves the field still too long. (The fallback
to printing stars was lost in rev.1.58 when the fallback to M units
was added.)
cannot run into other fields or field descriptors. If the value is
too large to fit in the field width, then the output format is adjusted
so that the value (usually) fits, but with fields running together
externally this adjustment usually didn't help. Mostly it doesn't
matter to lose 1 digit of precision, but switching the output format
is bad if it happens often or gives bogus units. The loss of width
is most serious for fields near "Csw" (which are also the ones which
must often ran together) since these have a high variance and large
values relative to the possible field widths so the switch occurs more
often now, and for the memory size fields where the switch gives the
bogus units kKB or MKB.
Now only the fields for r, p, d, s and w can run into each other.
These fields have width 3, and 3 cannot be reduced to 2 without losing
all precision when the value is between 100 and 999.
Trim "pdwake" to "pdwak" at think time now that it doesn't get clobbered
at runtime. The manpage doesn't need to be changed for this because
it documents the clobbered descriptor, unlike for 4 other too-long
descriptors which only get clobbered if there are lots of interrupt
sources.
Trim "% busy" to "%busy" since most other descriptors for percentages
are spelled without the space and this change makes changing the widths
of the %busy fields unnecessary.
around PUTRATE() because PUTRATE() only looked like a function -- it was
multiple statements. Use "do {...} while(0)" as usual in PUTRATE() so
that it is a single statement that can be used like a function.
large. In most cases it is still 1 too large, so fields tend to run
together, but in the following cases it was more than 1 too large, and
the starting column was too small too, so the field started inside the
previous field or descriptor and clobbered that:
- "wire": the number for this overwrote 2 characters of the number for
"Flt". Reduce the field width by 3 (2 to avoid the overwrite and 1
so that the fields don't run together). This was already done for
the preceding number for "cow".
- "inact": the number for this overwrote 1 character of the descriptor
"Idle". Reducing the field width by 2 is enough.
- "cache:" the number for this overwrote 3 characters of the scale
"...| |". The field width should be reduced by 4 to keep things
from running together, but that is a lot and not so necessary here
since the final "|" in the scale serves as a delimiter. Only reduce
it by 3.
- "free": the number for this overwrote 2 characters of the bar graph.
The character position under the final "|" in the scale is apparently
not used, so reducing the field width by 3 is enough.
When "zfod" is in the main vmstat display:
- use the normal field width of 9 (not 5) for it since there is no shortage
of space. Fix style bugs (excessive {}) in the statement that
conditionally writes it.
Write all reduced field widths for vmstat fields as "9 - <reduction>" as
a hint that we don't want to reduce them.
number in more cases by stealing 2 characters from the count field to
give more space in the descriptor field, but it did the column adjustments
for this strangely using an off-by-2 error in the base column and
compensating off-by-2 errors in 6 offsets from the base column (4 new
errors and 2 from not changing the offsets that actually changed).
Print the "Interrupts" header directly at its offset from the base column
instead of spacing it half using the offset and half by printing a space
character.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
<netinet/tcp_var.h>'s prerequisites. Prerequistes should not grow for
userland headers, and <netinet/tcp_var.h> is unfortunately still needed
in userland.
where MB/s and tps statistics would always be zero, presumably because
they were being averaged out over the time between now and when the
system booted instead of a few seconds.
PR: 58683
Kernel:
Change statistics to use the *uptime() timescale (ie: relative to
boottime) rather than the UTC aligned timescale. This makes the
device statistics code oblivious to clock steps.
Change timestamps to bintime format, they are cheaper.
Remove the "busy_count", and replace it with two counter fields:
"start_count" and "end_count", which are updated in the down and
up paths respectively. This removes the locking constraint on
devstat.
Add a timestamp argument to devstat_start_transaction(), this will
normally be a timestamp set by the *_bio() function in bp->bio_t0.
Use this field to calculate duration of I/O operations.
Add two timestamp arguments to devstat_end_transaction(), one is
the current time, a NULL pointer means "take timestamp yourself",
the other is the timestamp of when this transaction started (see
above).
Change calculation of busy_time to operate on "the salami principle":
Only when we are idle, which we can determine by the start+end
counts being identical, do we update the "busy_from" field in the
down path. In the up path we accumulate the timeslice in busy_time
and update busy_from.
Change the byte_* and num_* fields into two arrays: bytes[] and
operations[].
Userland:
Change the misleading "busy_time" name to be called "snap_time" and
make the time long double since that is what most users need anyway,
fill it using clock_gettime(CLOCK_MONOTONIC) to put it on the same
timescale as the kernel fields.
Change devstat_compute_etime() to operate on struct bintime.
Remove the version 2 legacy interface: the change to bintime makes
compatibility far too expensive.
Fix a bug in systat's "vm" page where boot relative busy times would
be bogus.
Bump __FreeBSD_version to 500107
Review & Collaboration by: ken
ifstat Display the network traffic going through active interfaces
on the system. Idle interfaces will not be displayed until
they receive some traffic.
For each interface being displayed, the current, peak and
total statistics are displayed for incoming and outgoing
traffic. By default, the ifstat display will automatically
scale the units being used so that they are in a human-read-
able format. The scaling units used for the current and peak
traffic columns can be altered by the scale command.
Submitted by: Trent Nelson <trent@arpa.com>
non-default but reasonable values of hz this member overflowed,
breaking NFS over UDP.
Also, as long as I'm plowing up struct sockbuf ... Change certain
members from u_long/long to u_int/int in order to reduce wasted
space on 64-bit machines. This change was requested by Andrew
Gallatin.
Netstat and systat need to be rebuilt. I am incrementing
__FreeBSD_version in case any ports need to change.