when I changed the allocator bits. This implements per-CPU mbtypes
stats by keeping net number of decrements/increments of a given mbtype
per-CPU and then summing all of the per-CPU mbtypes to produce the total
net number of allocated mbufs of the given mbtype.
Counters are carefully balanced to avoid/prevent underflows/overflows.
mbtypes stats are re-enabled with the idea that we may occasionally
(although very rarely) observe slight inconsistencies in the stat
reporting. Most of the time, we should be fine, though.
Also make appropriate modifications to netstat(1) and systat(1) to do
the necessary reporting.
Submitted by: Jiangyi Liu <jyliu@163.net>
were indices in a dense array. The cpuids are a sparse set and treat
them as such, setting up containers only for CPUs activated during
mb_init().
- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
map, in accordance with the above.
This allows us to properly boot with certain CPUs disactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.
Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:
o Reduce contention for SMP by offering per-CPU pools and locks.
o Better use of data cache due to per-CPU pools.
o Much less code cache pollution due to excessively large allocation macros.
o Framework for `grouping' objects from same page together so as to be able
to possibly free wired-down pages back to the system if they are no longer
needed by the network stacks.
Additional things changed with this addition:
- Moved some mbuf specific declarations and initializations from
sys/conf/param.c into mbuf-specific code where they belong.
- m_getclr() has been renamed to m_get_clrd() because the old name is really
confusing. m_getclr() HAS been preserved though and is defined to the new
name. No tree sweep has been done "to change the interface," as the old
name will continue to be supported and is not depracated. The change was
merely done because m_getclr() sounds too much like "m_get a cluster."
- TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
systat(1) (see TODO below).
- Fixed systat(1) to display number of "free mbufs" based on new per-CPU
stat structures.
- Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
per-CPU stat structures. All infos are fetched via sysctl.
TODO (in order of priority):
- Re-enable mbtypes statistics in both netstat(1) and systat(1) after
introducing an SMP friendly way to collect the mbtypes stats under the
already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
seems too costly for a mere stat update, especially when other locks are
already present).
- Optionally have systat(1) display not only "total free mbufs" but also
"total free mbufs per CPU pool."
- Fix minor length-fetching issues in netstat(1) related to recently
re-enabled option to read mbuf stats from a core file.
- Move reference counters at least for mbuf clusters into an unused portion
of the cluster itself, to save space and need to allocate a counter.
- Look into introducing resource freeing possibly from a kproc.
Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/
and compiler warnings.
The data for network statistics are still obtained via the kvm interface
if systat was started with the needed privileges, otherwise sysctls are
used. The reason for this is that with really many open sockets, the
sysctl method is probably slower, but it systat -netstat is probably not
really usable in either mode under these conditions.
Approved by: rwatson
base system, but not in BruceBSD.
o Fix up style violations of various sorts.
o Remove redundant normalization of hertz variable, as the sysctl handler
does this work (unlike when kread was used).
Submitted by: bde
no longer contains kernel specific data structures, but rather
only scalar values and structures that are already part of the
kernel/user interface, specifically rusage and rtprio. It no
longer contains proc, session, pcred, ucred, procsig, vmspace,
pstats, mtx, sigiolst, klist, callout, pasleep, or mdproc. If
any of these changed in size, ps, w, fstat, gcore, systat, and
top would all stop working. The new structure has over 200 bytes
of unassigned space for future values to be added, yet is nearly
100 bytes smaller per entry than the structure that it replaced.
and numvnodes are longs in the kernel. They should remain longs in systat,
what really needs to change is that they should be using SYSCTL_LONG rather
than SYSCTL_INT. I also changed wantfreevnodes to SYSCTL_LONG because I
happened to notice it.
I wish there was a way to find all of these automatically..
Pointed out by: bde
maxvnodes, numvnodes, freevnodes, nchstats, and numdirtybuffers.
o Make the hw.ncpu error checking code a little more rigorous by
sanity checking the returned data size.
o Didn't fix machine-dependent non-sysctl-exported variables:
intrnames, eintrnames, intrcnt, eintrcnt, as these variables are
defined and exported from machine-dependent kernel code in
assembly. This should probably be fixed somehow.
structure member that doesn't exist anymore.
Use getsysctlbyname for kern.ipc.mbstat instead of sysctl.
Use netstat's method of displaying values from mtnames.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Missed by PR: 19809
track.
The $Id$ line is normally at the bottom of the main comment block in the
man page, separated from the rest of the manpage by an empty comment,
like so;
.\" $Id$
.\"
If the immediately preceding comment is a @(#) format ID marker than the
the $Id$ will line up underneath it with no intervening blank lines.
Otherwise, an additional blank line is inserted.
Approved by: bde