more obvious imprecision in the previous top changes.
Specifically, top uses a delta of clock_gettime() calls right after
invoking the kern.proc sysctl to fetch the process/thread list to
compute the time delta between the fetches. However, the kern.proc
sysctl handler does not run in constant time. It can spin on locks,
be preempted by an interrupt handler, etc. As a result, the time
between the gathering of stats for individual processes or threads
between subsequent kern.proc handlers can vary. If a "slow" kern.proc
run is followed by a "fast" kern.proc run, then the threads/processes
at the start of the "slow" run will have a longer time delta than the
threads/processes at the end. If the clock_gettime() time delta is
not itself skewed by preemption, then the delta may be too short for
a given thread/process resulting in a higher percent CPU than actual.
However, there is no good way to calculate the exact amount of overage,
nor to know which threads to subtract the overage from. Instead, just
punt and fix the definitely-wrong case of an individual thread having
more than 100% CPU.
Discussed with: zonk
displays after a pause, use the difference in runtime divided by the
length of the pause as the percentage of CPU used instead of the value
calculated by the kernel. In addition, when determing if a process or
thread is idle or not, treat any process or thread that has used any
runtime or performed any context switches during the interval as busy.
Note that the percent CPU is calculated as a double and stored in an
array to avoid recalculating the value multiple times in the comparison
method used to sort processes in the CPU display.
Tested by: Jamie Landeg-Jones <jamie@dyslexicfish.net>
Reviewed by: emaste (earlier version)
MFC after: 1 week
to 999.99% CPU. It still won't be aligned if you have a multithreaded
process using more than 1000% CPU (e.g. idle process on an idle 12-way
system), but 100% is a common case.
Submitted by: Jeremy Chadwick (partial)
MFC after: 1 week
in the manpage by having it display the current CPU (ki_oncpu) rather
than the previously used CPU (ki_lastcpu). ki_lastcpu is still used for
all other thread states.
Reported by: Chris Ross <cross+freebsd@distal.com>
MFC after: 1 week
cmdlengthdelta is the size of the header and we were using it to
allocate a buffer to store the command line. This would mean that
the cmdbuf could be too short. In practice this was never noticed unless
you usually run top -a. On a stock FreeBSD system you can see the
problem by running sendmail and then running top -a on a big terminal
window. In practice this doubles to size available to cmdbuf since the
header is around 65-68 bytes.
Reviewed by: adrian
usage on hosts using ZFS. The new line displays the total amount of RAM
used by the ARC along with the size of MFU, MRU, anonymous (in flight),
headers, and other (miscellaneous) sub-categories. The line is not
displayed on systems that are not using ZFS.
Reviewed by: avg, fs@
MFC after: 3 days
to the maximum number of CPUs to ensure that lcpustates[] array is always
allocated to the maximum size. Previously, if top was started without
per-CPU stats it would allocate a smaller lcpustates[] array. When
per-CPU stats were then enabled, it would overflow the array and trash
the cpustates_columns[] array causing the CPU stats to be printed in the
wrong locations.
Approved by: re (kib)
MFC after: 1 week
ki_rusage member when KERN_PROC_INC_THREAD is passed to one of the
process sysctls.
- Correctly account for the current thread's cputime in the thread when
doing the runtime fixup in calcru().
- Use TIDs as the key to lookup the previous thread to compute IO stat
deltas in IO mode in top when thread display is enabled.
Reviewed by: kib
Approved by: re (kib)
rather than at the bottom of the manpage.
- Remove an obsolete comment about SWAIT being a stale state. It was
resurrected for a different purpose in FreeBSD 5 to mark idle ithreads.
- Add a comment documenting that the SLEEP and LOCK states typically
display the name of the event being waited on with lock names being
prefixed with an asterisk and sleep event names not having a prefix.
MFC after: 1 week
idle threads). The process is displayed by default (subject to whether or
not system processes are displayed) to preserve existing behavior. The
system idle process can be hidden via the '-z' command line argument or the
'z' key while top is running. When it is hidden, top more closely matches
the behavior of FreeBSD <= 4.x where idle time was not accounted to any
process.
MFC after: 2 weeks
The bug was unnoticed on non-i386 because mp_maxid is
initialized differently, kern.cp_times doesn't print
zeroes for non-existing CPUs, so no "writing outside of
array bounds" happens.
MFC after: 3 days
kthread_add() takes the same parameters as the old kthread_create()
plus a pointer to a process structure, and adds a kernel thread
to that process.
kproc_kthread_add() takes the parameters for kthread_add,
plus a process name and a pointer to a pointer to a process instead of just
a pointer, and if the proc * is NULL, it creates the process to the
specifications required, before adding the thread to it.
All other old kthread_xxx() calls return, but act on (struct thread *)
instead of (struct proc *). One reason to change the name is so that
any old kernel modules that are lying around and expect kthread_create()
to make a process will not just accidentally link.
fix top to show kernel threads by their thread name in -SH mode
add a tdnam formatting option to ps to show thread names.
make all idle threads actual kthreads and put them into their own idled process.
make all interrupt threads kthreads and put them in an interd process
(mainly for aesthetic and accounting reasons)
rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper'
man page fixes to follow.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.
Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)
priorities, etc.) in the NICE field:
Use a combination of pri_native and pri_user instead of pri_level to
guess the original realtime priority. Using pri_level here has been
wrong since 2001/02/12. Using only pri_native here would be correct
if the kernel actually initialized it reasonably. (The kernel exports
its raw td_base_priority as pri_native, but userland mostly wants a
refined base priority). Give up on waiting pri_native to work correctly
and only use it when there is nothing better (for kthreads).
This should reduce printing of bizarre pseudo-nice values. Bizarre
values are still printed if we observe a transient borrowed priority
for a kthread (transient borrowing is the main thing that makes the
raw td_base_priority almost useless in userland), or if there is a
kernel bug. One current kernel bug involves the kernel idprio thread
pagezero permanently changing its priority from PRI_MAX_IDLE (255) to
PUSER (160). Then the bizarre value "ki-6" is printed instead of
"ki31". Here "-6" is PRI_MIN_IDLE - PUSER = -64 truncated to 2
characters. We are observing a transient borrowed priority that has
become permanent due to a bug.
ps/print.c:priorityr() needs similar changes (including ones in stage 2
here).
titles extracted from argv vector instead of the real executable names.
This is useful when you want to watch applications that set their status
information via setproctitle(3).
Approved by: alfred
MFC after: 2 weeks