Commit Graph

176 Commits

Author SHA1 Message Date
Peter Wemm
4da0d332f4 Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit
being in opt_global.h and forcing a global recompile when only a few files
reference it.

Approved by:  re
2005-06-24 00:16:57 +00:00
Joseph Koshy
36c0fd9d0f Kernel hooks to support PMC sampling modes.
Reviewed by:	alc
2005-05-30 06:29:29 +00:00
Jeff Roberson
85da7a569b - Define KTR points for KTR_SCHED. 2004-12-26 00:14:21 +00:00
John Baldwin
78c85e8dfc Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  It also now only locks sched_lock internally while
  doing the rux_runtime fixup.  calcru() now only requires the caller to
  hold the proc lock and calcru1() only requires the proc lock internally.
  calcru() also no longer allows callers to ask for an interrupt timeval
  since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
  proctree lock is used to protect the process group rather than the process
  group lock.  By holding this lock until the end of the function we now
  ensure that the process/thread that we pick to dump info about will no
  longer vanish while we are trying to output its info to the console.

Submitted by:	bde (mostly)
MFC after:	1 month
2004-10-05 18:51:11 +00:00
Marcel Moolenaar
2d50560abc Update for the KDB framework:
o  Make debugging code conditional upon KDB instead of DDB.
o  Call kdb_enter() instead of Debugger().
o  Call kdb_backtrace() instead of db_print_backtrace() or backtrace().

kern_mutex.c:
o  Replace checks for db_active with checks for kdb_active and make
   them unconditional.

kern_shutdown.c:
o  s/DDB_UNATTENDED/KDB_UNATTENDED/g
o  s/DDB_TRACE/KDB_TRACE/g
o  Save the TID of the thread doing the kernel dump so the debugger
   knows which thread to select as the current when debugging the
   kernel core file.
o  Clear kdb_active instead of db_active and do so unconditionally.
o  Remove backtrace() implementation.

kern_synch.c:
o  Call kdb_reenter() instead of db_error().
2004-07-10 21:36:01 +00:00
John Baldwin
16f9f20579 - Assert that any process that has statclock called on it has both a
stats structure and a vmspace as this should always be true rather
  than checking the always true condition in an if statement.
- Remove never-false check: if ((ru = &pstats->p_ru) != NULL)
- Remove pstats variable that is only used once and inline its one use
  instead.
2004-07-02 03:48:09 +00:00
Julian Elischer
fa88511615 Nice, is a property of a process as a whole..
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.
2004-06-16 00:26:31 +00:00
Tim J. Robbins
e4e815db72 Remove a redundant "td = curthread" statement from profclock(). 2004-06-02 12:05:06 +00:00
Colin Percival
b62b230461 Fix a race condition which could result in profprocs being decremented
more than once if stopprofclock is called multiple times on the same
process.
2004-05-03 00:48:11 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Poul-Henning Kamp
3d6e5ccb06 Make sure to disable the watchdog if we cannot honour the timeout. 2004-02-28 22:01:19 +00:00
Poul-Henning Kamp
4103b7652d Rename the WATCHDOG option to SW_WATCHDOG and make it use the
generic watchdoc(9) interface.

Make watchdogd(8) perform as watchdog(8) as well, and make it
possible to specify a check command to run, timeout and sleep
periods.

Update watchdog(4) to talk about the generic interface and add
new watchdog(8) page.
2004-02-28 20:56:35 +00:00
Peter Wemm
a89ec05e3e Catch a few places where NULL (pointer) was used where 0 (integer) was
expected.
2003-12-23 02:36:43 +00:00
Jeff Roberson
7cf90fb376 - Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td
argument rather than a kse.
2003-10-16 08:39:15 +00:00
Sean Kelly
6cda41555b Fix this to build on alpha. Build test successful.
Suggested fix from:	tjr
2003-06-27 08:35:05 +00:00
Sean Kelly
370c3cb57c - Add a software watchdog facility.
This commit has two pieces. One half is the watchdog kernel code which lives
primarily in hardclock() in sys/kern/kern_clock.c. The other half is a userland
daemon which, when run, will keep the watchdog from firing while the userland
is intact and functioning.

Approved by:	jeff (mentor)
2003-06-26 09:50:52 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
John Baldwin
9752f794c7 - Move PS_PROFIL and its new cousin PS_STOPPROF back over to p_flag and
rename them appropriately.  Protect both flags with both the proc lock
  and the sched_lock.
- Protect p_profthreads with the proc lock.
- Remove Giant from profil(2).
2003-04-22 20:54:04 +00:00
Jeff Roberson
f6f230febe - Adjust sched hooks for fork and exec to take processes as arguments instead
of ksegs since they primarily operation on processes.
 - KSEs take ticks so pass the kse through sched_clock().
 - Add a sched_class() routine that adjusts a ksegrp pri class.
 - Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp}
   that will be used to tell the scheduler about new instances of these
   structures within the same process.  These will be used by THR and KSE.
 - Change sched_4bsd to reflect this API update.
2003-04-11 03:39:07 +00:00
Jonathan Lemon
1cafed3941 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
Julian Elischer
ac2e415327 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
David Xu
768298d8c4 Remove a never true condition. 2003-02-25 05:14:18 +00:00
Julian Elischer
4a338afd7a Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by:	 parts by davidxu@
Reviewed by:	jeff@ mini@
2003-02-17 09:55:10 +00:00
Jeff Roberson
5215b1872f - Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers.  This greatly simplifies much the
   KSE code.

Submitted by:	davidxu
2003-02-17 05:14:26 +00:00
Jeff Roberson
e4625663c9 - Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc.  These counters are only examined through calcru.

Submitted by:	davidxu
Tested on:	x86, alpha, UP/SMP
2003-02-17 02:19:58 +00:00
Poul-Henning Kamp
f341ca9891 Remove #include <sys/dkstat.h> 2003-02-16 14:13:23 +00:00
Poul-Henning Kamp
3abd4ccf87 Move the tty related statistics counters to live with the tty code. 2003-02-16 13:22:15 +00:00
Julian Elischer
a282253a29 A little infrastructure, preceding some upcoming changes
to the profiling and statistics code.

Submitted by:	DavidXu@
Reviewed by:	peter@
2003-02-08 02:58:16 +00:00
Jake Burkholder
238dd3209a Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway.  This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by:	jhb, tmm
Tested on:	i386, sparc64
2003-02-03 17:53:15 +00:00
Julian Elischer
6f8132a867 Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.
2003-02-01 12:17:09 +00:00
David Xu
0dbb100b9b Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00
David Xu
8798d4f9c8 1. Support versioning and wall clock in kse mailbox,
also add rusage time in thread mailbox.
2. Minor change for thread limit code in thread_user_enter(),
   fix typo in kse_release() last I committed.

Reviewed by: deischen, mini
2002-11-18 01:59:31 +00:00
Jeff Roberson
b43179fbe8 - Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
 - Replace direct manipulation of scheduler data with hooks provided by the
   new api.
 - Remove KSE specific state modifications and single runq assumptions from
   kern_switch.c

Reviewed by:	-arch
2002-10-12 05:32:24 +00:00
Poul-Henning Kamp
e7fa55af89 Give up on calling tc_ticktock() from a timeout, we have timeout
functions which run for several milliseconds at a time and getting
in queue behind one or more of those makes us miss our rewind.

Instead call it from hardclock() like we used to do, but retain the
prescaler so we still cope with high HZ values.
2002-09-04 10:15:19 +00:00
Bruce Evans
a9a0f15a69 Fixed breakage of binary compatibility of the kern.clockrate sysctl in
sys/time.h rev.1.53, etc.  Zero out the entire struct clkinfo and not
just the new spare part of it so that there is no possibility of leaking
kernel stack context to userland.
2002-05-05 04:33:09 +00:00
Poul-Henning Kamp
9e1b5510c3 Move the winding of timecounters out of hardclock and into a normal
timeout loop.

Limit the rate at which we wind the timecounters to approx 1000 Hz.

This limits the precision of the get{bin,nano,micro}[up]time(9)
functions to roughly a millisecond.
2002-04-26 12:37:36 +00:00
Poul-Henning Kamp
b35c8f287d Take the "tickadj" element out of struct clockinfo. Our adjtime(2)
implementation is being changed and the very concept of tickadj will
no longer be meaningful.
2002-04-15 12:11:06 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Luigi Rizzo
daccb6386b MFS: synchronize the code with the version in -stable, specifically:
+ SYSCTL_ULONG -> SYSCTL_UINT
 + some procedure renaming and variable rearrangement
 + fix the 'interface going deaf' problem same as in -stable.
2002-02-11 23:56:18 +00:00
John Baldwin
c86b6ff551 Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe.  Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer.  This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs.  Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called.  (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha.  I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine.  PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken.  Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by:	peter
Tested on:	i386, alpha
2002-01-05 08:47:13 +00:00
Jake Burkholder
c9f4877d7c Change traces in hardclock and statclock to use the KTR_CLK trace
facility, rather than KTR_INTR.
2001-12-29 08:39:57 +00:00
Luigi Rizzo
af1408e33f Add/correct description for some sysctl variables where it was missing.
The description field is unused in -stable, so the MFC there is equivalent
to a comment. It can be done at any time, i am just setting a reminder
in 45 days when hopefully we are past 4.5-release.

MFC after: 45 days
2001-12-16 16:07:20 +00:00
Luigi Rizzo
e4fc250c15 Device Polling code for -current.
Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

        options DEVICE_POLLING

and at runtime enable polling with

        sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

        sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

	http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
        new file kern/kern_poll.c
sys/conf/options.i386
        new option
sys/i386/i386/trap.c
        poll in trap (disabled by default)
sys/kern/kern_clock.c
        initialization and hardclock hooks.
sys/kern/kern_intr.c
        minor swi_net changes
sys/kern/kern_poll.c
        the bulk of the code.
sys/net/if.h
        new flag
sys/net/if_var.h
        declaration for functions used in device drivers.
sys/net/netisr.h
        NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
        device driver modifications
2001-12-14 17:56:12 +00:00
John Baldwin
21a7a9aeb6 Use MTX_QUIET for the lock operations during clock interrupts so their logs
don't drown out more useful log messages.
2001-11-15 19:54:48 +00:00
John Baldwin
61d80e90a9 Add missing includes of sys/ktr.h. 2001-10-11 17:53:43 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
John Baldwin
688ebe120c - Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel.  The ast() function now loops as long
  as the PS_ASTPENDING or PS_NEEDRESCHED flags are set.  It returns with
  preemption disabled so that any further AST's that arrive via an
  interrupt will be delayed until the low-level MD code returns to user
  mode.
- Use u_int's to store the tick counts for profiling purposes so that we
  do not need sched_lock just to read p_sticks.  This also closes a
  problem where the call to addupc_task() could screw up the arithmetic
  due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
  clear_resched(), and resched_wanted() in favor of direct bit operations
  on p_sflag.
- Fix up locking with sched_lock some.  In addupc_intr(), use sched_lock
  to ensure pr_addr and pr_ticks are updated atomically with setting
  PS_OWEUPC.  In ast() we clear pr_ticks atomically with clearing
  PS_OWEUPC.  We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by:	bde (mostly)
2001-08-10 22:53:32 +00:00
John Baldwin
c9c1406f76 Add KTR_INTR tracepoints for when clock interrupts are triggered. 2001-08-03 20:54:41 +00:00