Commit Graph

238 Commits

Author SHA1 Message Date
John Baldwin
964b557211 Trim trailing whitespace. 2006-04-17 20:14:51 +00:00
Poul-Henning Kamp
e8444a7e6f CPU time accounting speedup (step 2)
Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru.  While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64
2006-02-11 09:33:07 +00:00
Poul-Henning Kamp
eb2da9a51f Simplify system time accounting for profiling.
Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly.  Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret().  Use the absolute value of
td->pticks in userret() and eliminate third argument.
2006-02-08 08:09:17 +00:00
John Baldwin
b439e431bf Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly.  In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures.  Basically,
  all the archs were taking a trapframe and converting it into a clockframe
  one way or another.  Now they can just extract the PC and usermode values
  directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
  accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
  the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
  timecounter, call hardclock() directly.  This removes an extra
  conditional check from every clock interrupt on Alpha on the BSP.
  There is probably room for even further pruning here by changing Alpha
  to use the simplified timecounter we use on x86 with the lapic timer
  since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
  is false, so add a KASSERT() to that affect and remove a condition
  to slightly optimize the non-lapic case.
- Change prototypeof  arm_handler_execute() so that it's first arg is a
  trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on:	alpha, amd64, arm, i386, ia64, sparc64
Reviewed by:	bde (mostly)
2005-12-22 22:16:09 +00:00
Nate Lawson
bd6b217753 Remove the KTR for hardclock completely. It seems to not be useful.
Requested by:	jhb
2005-12-18 18:11:55 +00:00
Nate Lawson
8615fd8696 Clean up unused or poorly utilized KTR values. Remove KTR_FS, KTR_KGDB,
and KTR_IO as they were never used.  Remove KTR_CLK since it was only
used for hardclock firing and use KTR_INTR there instead.  Remove
KTR_CRITICAL since it was only used for crit enter/exit and use
KTR_CONTENTION instead.
2005-12-17 03:57:10 +00:00
John Baldwin
5c8b444153 - Use uintfptr_t rather than int for the kernel profiling index (though it
really should be a fptrdiff_t if we had that) in profclock().
- Don't try to profile kernel pc's that are >= the kernel lowpc to avoid
  underflows when computing a profiling index.
- Use the PC_TO_I() macro to compute the kernel profiling index rather than
  doing it inline.

Discussed with:	bde
2005-12-16 22:11:52 +00:00
Ed Maste
5d89e1d0af In watchdog_config enable the software watchdog iff the WD_ACTIVE flag is
set.  When watchdogd(1) is terminated intentionally it clears the bit,
which should then disable it in the kernel.

PR:		kern/74386
Submitted by:	Alex Hoff <ahoff at sandvine dot com>
Approved by:	phk, rwatson (mentor)
2005-10-27 17:22:47 +00:00
John Baldwin
e0f66ef861 Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces.  struct intr_event holds the list
  of interrupt handlers associated with interrupt sources.
  struct intr_thread contains the data relative to an interrupt thread.
  Currently we still provide a 1:1 relationship of events to threads
  with the exception that events only have an associated thread if there
  is at least one threaded interrupt handler attached to the event.  This
  means that on x86 we no longer have 4 bazillion interrupt threads with
  no handlers.  It also means that interrupt events with only INTR_FAST
  handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
  intr_foo naming convention.  This did require renaming the powerpc
  MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
  powerpc.  This means that multiple INTR_FAST handlers can attach to the
  same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
  to the same interrupt.  Sharing INTR_FAST handlers may not always be
  desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
  either.  Drivers can always still use INTR_EXCL to ask for an interrupt
  exclusively.  The way this sharing works is that when an interrupt
  comes in, all the INTR_FAST handlers are executed first, and if any
  threaded handlers exist, the interrupt thread is scheduled afterwards.
  This type of layout also makes it possible to investigate using interrupt
  filters ala OS X where the filter determines whether or not its companion
  threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
  is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
  dumping their state.  It also has a '/v' verbose switch which dumps
  info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
  handler is removed because it would suck for things like ppbus(8)'s
  braindead behavior.  The code is present, though, it is just under
  #if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
  event into a separate function so that ithread_loop() becomes more
  readable.  Previously this code was all in the middle of ithread_loop()
  and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
  with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
  curthread's proc against idlethread's proc. (Not really related to intr
  changes)

Tested on:	alpha, amd64, i386, sparc64
Tested on:	arm, ia64 (older version of patch by cognet and marcel)
2005-10-25 19:48:48 +00:00
Gleb Smirnoff
f0796cd26c - Don't pollute opt_global.h with DEVICE_POLLING and introduce
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that
  can be compiled as loadable modules.

Reviewed by:	bde
2005-10-05 10:09:17 +00:00
Paul Saab
cff2e749e2 Use SCTL_MASK32 to determine that the sysctl call is from a 32bit
binary for kern.cp_time.

Approved by:	re
2005-06-30 17:17:29 +00:00
Peter Wemm
62919d788b Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
	procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
	and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
	sscanf fails.  They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets.  Note
	that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet.  We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.

Approved by:	re
2005-06-30 07:49:22 +00:00
Peter Wemm
4da0d332f4 Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit
being in opt_global.h and forcing a global recompile when only a few files
reference it.

Approved by:  re
2005-06-24 00:16:57 +00:00
Joseph Koshy
36c0fd9d0f Kernel hooks to support PMC sampling modes.
Reviewed by:	alc
2005-05-30 06:29:29 +00:00
Jeff Roberson
85da7a569b - Define KTR points for KTR_SCHED. 2004-12-26 00:14:21 +00:00
John Baldwin
78c85e8dfc Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  It also now only locks sched_lock internally while
  doing the rux_runtime fixup.  calcru() now only requires the caller to
  hold the proc lock and calcru1() only requires the proc lock internally.
  calcru() also no longer allows callers to ask for an interrupt timeval
  since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
  proctree lock is used to protect the process group rather than the process
  group lock.  By holding this lock until the end of the function we now
  ensure that the process/thread that we pick to dump info about will no
  longer vanish while we are trying to output its info to the console.

Submitted by:	bde (mostly)
MFC after:	1 month
2004-10-05 18:51:11 +00:00
Marcel Moolenaar
2d50560abc Update for the KDB framework:
o  Make debugging code conditional upon KDB instead of DDB.
o  Call kdb_enter() instead of Debugger().
o  Call kdb_backtrace() instead of db_print_backtrace() or backtrace().

kern_mutex.c:
o  Replace checks for db_active with checks for kdb_active and make
   them unconditional.

kern_shutdown.c:
o  s/DDB_UNATTENDED/KDB_UNATTENDED/g
o  s/DDB_TRACE/KDB_TRACE/g
o  Save the TID of the thread doing the kernel dump so the debugger
   knows which thread to select as the current when debugging the
   kernel core file.
o  Clear kdb_active instead of db_active and do so unconditionally.
o  Remove backtrace() implementation.

kern_synch.c:
o  Call kdb_reenter() instead of db_error().
2004-07-10 21:36:01 +00:00
John Baldwin
16f9f20579 - Assert that any process that has statclock called on it has both a
stats structure and a vmspace as this should always be true rather
  than checking the always true condition in an if statement.
- Remove never-false check: if ((ru = &pstats->p_ru) != NULL)
- Remove pstats variable that is only used once and inline its one use
  instead.
2004-07-02 03:48:09 +00:00
Julian Elischer
fa88511615 Nice, is a property of a process as a whole..
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.
2004-06-16 00:26:31 +00:00
Tim J. Robbins
e4e815db72 Remove a redundant "td = curthread" statement from profclock(). 2004-06-02 12:05:06 +00:00
Colin Percival
b62b230461 Fix a race condition which could result in profprocs being decremented
more than once if stopprofclock is called multiple times on the same
process.
2004-05-03 00:48:11 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Poul-Henning Kamp
3d6e5ccb06 Make sure to disable the watchdog if we cannot honour the timeout. 2004-02-28 22:01:19 +00:00
Poul-Henning Kamp
4103b7652d Rename the WATCHDOG option to SW_WATCHDOG and make it use the
generic watchdoc(9) interface.

Make watchdogd(8) perform as watchdog(8) as well, and make it
possible to specify a check command to run, timeout and sleep
periods.

Update watchdog(4) to talk about the generic interface and add
new watchdog(8) page.
2004-02-28 20:56:35 +00:00
Peter Wemm
a89ec05e3e Catch a few places where NULL (pointer) was used where 0 (integer) was
expected.
2003-12-23 02:36:43 +00:00
Jeff Roberson
7cf90fb376 - Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td
argument rather than a kse.
2003-10-16 08:39:15 +00:00
Sean Kelly
6cda41555b Fix this to build on alpha. Build test successful.
Suggested fix from:	tjr
2003-06-27 08:35:05 +00:00
Sean Kelly
370c3cb57c - Add a software watchdog facility.
This commit has two pieces. One half is the watchdog kernel code which lives
primarily in hardclock() in sys/kern/kern_clock.c. The other half is a userland
daemon which, when run, will keep the watchdog from firing while the userland
is intact and functioning.

Approved by:	jeff (mentor)
2003-06-26 09:50:52 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
John Baldwin
9752f794c7 - Move PS_PROFIL and its new cousin PS_STOPPROF back over to p_flag and
rename them appropriately.  Protect both flags with both the proc lock
  and the sched_lock.
- Protect p_profthreads with the proc lock.
- Remove Giant from profil(2).
2003-04-22 20:54:04 +00:00
Jeff Roberson
f6f230febe - Adjust sched hooks for fork and exec to take processes as arguments instead
of ksegs since they primarily operation on processes.
 - KSEs take ticks so pass the kse through sched_clock().
 - Add a sched_class() routine that adjusts a ksegrp pri class.
 - Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp}
   that will be used to tell the scheduler about new instances of these
   structures within the same process.  These will be used by THR and KSE.
 - Change sched_4bsd to reflect this API update.
2003-04-11 03:39:07 +00:00
Jonathan Lemon
1cafed3941 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
Julian Elischer
ac2e415327 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
David Xu
768298d8c4 Remove a never true condition. 2003-02-25 05:14:18 +00:00
Julian Elischer
4a338afd7a Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by:	 parts by davidxu@
Reviewed by:	jeff@ mini@
2003-02-17 09:55:10 +00:00
Jeff Roberson
5215b1872f - Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers.  This greatly simplifies much the
   KSE code.

Submitted by:	davidxu
2003-02-17 05:14:26 +00:00
Jeff Roberson
e4625663c9 - Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc.  These counters are only examined through calcru.

Submitted by:	davidxu
Tested on:	x86, alpha, UP/SMP
2003-02-17 02:19:58 +00:00
Poul-Henning Kamp
f341ca9891 Remove #include <sys/dkstat.h> 2003-02-16 14:13:23 +00:00
Poul-Henning Kamp
3abd4ccf87 Move the tty related statistics counters to live with the tty code. 2003-02-16 13:22:15 +00:00
Julian Elischer
a282253a29 A little infrastructure, preceding some upcoming changes
to the profiling and statistics code.

Submitted by:	DavidXu@
Reviewed by:	peter@
2003-02-08 02:58:16 +00:00
Jake Burkholder
238dd3209a Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway.  This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by:	jhb, tmm
Tested on:	i386, sparc64
2003-02-03 17:53:15 +00:00
Julian Elischer
6f8132a867 Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.
2003-02-01 12:17:09 +00:00
David Xu
0dbb100b9b Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00
David Xu
8798d4f9c8 1. Support versioning and wall clock in kse mailbox,
also add rusage time in thread mailbox.
2. Minor change for thread limit code in thread_user_enter(),
   fix typo in kse_release() last I committed.

Reviewed by: deischen, mini
2002-11-18 01:59:31 +00:00
Jeff Roberson
b43179fbe8 - Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
 - Replace direct manipulation of scheduler data with hooks provided by the
   new api.
 - Remove KSE specific state modifications and single runq assumptions from
   kern_switch.c

Reviewed by:	-arch
2002-10-12 05:32:24 +00:00
Poul-Henning Kamp
e7fa55af89 Give up on calling tc_ticktock() from a timeout, we have timeout
functions which run for several milliseconds at a time and getting
in queue behind one or more of those makes us miss our rewind.

Instead call it from hardclock() like we used to do, but retain the
prescaler so we still cope with high HZ values.
2002-09-04 10:15:19 +00:00
Bruce Evans
a9a0f15a69 Fixed breakage of binary compatibility of the kern.clockrate sysctl in
sys/time.h rev.1.53, etc.  Zero out the entire struct clkinfo and not
just the new spare part of it so that there is no possibility of leaking
kernel stack context to userland.
2002-05-05 04:33:09 +00:00
Poul-Henning Kamp
9e1b5510c3 Move the winding of timecounters out of hardclock and into a normal
timeout loop.

Limit the rate at which we wind the timecounters to approx 1000 Hz.

This limits the precision of the get{bin,nano,micro}[up]time(9)
functions to roughly a millisecond.
2002-04-26 12:37:36 +00:00
Poul-Henning Kamp
b35c8f287d Take the "tickadj" element out of struct clockinfo. Our adjtime(2)
implementation is being changed and the very concept of tickadj will
no longer be meaningful.
2002-04-15 12:11:06 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Luigi Rizzo
daccb6386b MFS: synchronize the code with the version in -stable, specifically:
+ SYSCTL_ULONG -> SYSCTL_UINT
 + some procedure renaming and variable rearrangement
 + fix the 'interface going deaf' problem same as in -stable.
2002-02-11 23:56:18 +00:00
John Baldwin
c86b6ff551 Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe.  Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer.  This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs.  Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called.  (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha.  I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine.  PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken.  Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by:	peter
Tested on:	i386, alpha
2002-01-05 08:47:13 +00:00
Jake Burkholder
c9f4877d7c Change traces in hardclock and statclock to use the KTR_CLK trace
facility, rather than KTR_INTR.
2001-12-29 08:39:57 +00:00
Luigi Rizzo
af1408e33f Add/correct description for some sysctl variables where it was missing.
The description field is unused in -stable, so the MFC there is equivalent
to a comment. It can be done at any time, i am just setting a reminder
in 45 days when hopefully we are past 4.5-release.

MFC after: 45 days
2001-12-16 16:07:20 +00:00
Luigi Rizzo
e4fc250c15 Device Polling code for -current.
Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

        options DEVICE_POLLING

and at runtime enable polling with

        sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

        sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

	http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
        new file kern/kern_poll.c
sys/conf/options.i386
        new option
sys/i386/i386/trap.c
        poll in trap (disabled by default)
sys/kern/kern_clock.c
        initialization and hardclock hooks.
sys/kern/kern_intr.c
        minor swi_net changes
sys/kern/kern_poll.c
        the bulk of the code.
sys/net/if.h
        new flag
sys/net/if_var.h
        declaration for functions used in device drivers.
sys/net/netisr.h
        NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
        device driver modifications
2001-12-14 17:56:12 +00:00
John Baldwin
21a7a9aeb6 Use MTX_QUIET for the lock operations during clock interrupts so their logs
don't drown out more useful log messages.
2001-11-15 19:54:48 +00:00
John Baldwin
61d80e90a9 Add missing includes of sys/ktr.h. 2001-10-11 17:53:43 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
John Baldwin
688ebe120c - Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel.  The ast() function now loops as long
  as the PS_ASTPENDING or PS_NEEDRESCHED flags are set.  It returns with
  preemption disabled so that any further AST's that arrive via an
  interrupt will be delayed until the low-level MD code returns to user
  mode.
- Use u_int's to store the tick counts for profiling purposes so that we
  do not need sched_lock just to read p_sticks.  This also closes a
  problem where the call to addupc_task() could screw up the arithmetic
  due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
  clear_resched(), and resched_wanted() in favor of direct bit operations
  on p_sflag.
- Fix up locking with sched_lock some.  In addupc_intr(), use sched_lock
  to ensure pr_addr and pr_ticks are updated atomically with setting
  PS_OWEUPC.  In ast() we clear pr_ticks atomically with clearing
  PS_OWEUPC.  We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by:	bde (mostly)
2001-08-10 22:53:32 +00:00
John Baldwin
c9c1406f76 Add KTR_INTR tracepoints for when clock interrupts are triggered. 2001-08-03 20:54:41 +00:00
John Baldwin
8bd57f8fc2 Remove unneeded includes of sys/ipl.h and machine/ipl.h. 2001-05-15 23:22:29 +00:00
John Baldwin
6caa8a1501 Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
  into hardclock_process() and statclock_process() respectively.  hardclock()
  and statclock() call the *_process() functions for the current process so
  that UP systems will run as before.  For SMP systems, it is simply necessary
  to ensure that all other processors execute the *_process() functions when the
  main clock functions are triggered on one CPU by an interrupt.  For the alpha
  4100, clock interrupts are delievered in a staggered broadcast fashion, so
  we simply call hardclock/statclock on the boot CPU and call the *_process()
  functions on the secondaries.  For x86, we call statclock and hardclock as
  usual and then call forward_hardclock/statclock in the MD code to send an IPI
  to cause the AP's to execute forwared_hardclock/statclock which then call the
  *_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
  involve less hackery.  Now the cpu doing the forward sets any flags, etc. and
  sends a very simple IPI_AST to the other cpu(s).  AST IPIs now just basically
  return so that they can execute ast() and don't bother with setting the
  astpending or needresched flags themselves.  This also removes the loop in
  forward_signal() as sched_lock closes the race condition that the loop worked
  around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
  a process to act on rather than assuming curproc so that they can be used to
  implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
  header sys/smp.h declares MI SMP variables and API's.   The IPI API's from
  machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
  SLIST of globaldata structures has become MI and moved into subr_smp.c.
  Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by:	jake, peter
Looked over by:	eivind
2001-04-27 19:28:25 +00:00
John Baldwin
e3ee8974e3 Fix an old bug related to BETTER_CLOCK. Call forward_*clock if SMP
and __i386__ are defined rather than if SMP and BETTER_CLOCK are defined.
The removal of BETTER_CLOCK would have broken this except that kern_clock.c
doesn't include <machine/smptests.h>, so it doesn't see the definition of
BETTER_CLOCK, and forward_*clock aren't called, even on 4.x.  This seems to
fix the problem where a n-way SMP system would see 100 * n clk interrupts
and 128 * n rtc interrupts.
2001-04-17 17:53:36 +00:00
John Baldwin
f34fa851e0 Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
2001-03-28 09:17:56 +00:00
Bruce Evans
866546105a Changed the aston() family to operate on a specified process instead of
always on curproc.  This is needed to implement signal delivery properly
(see a future log message for kern_sig.c).

Debogotified the definition of aston().  aston() was defined in terms
of signotify() (perhaps because only the latter already operated on
a specified process), but aston() is the primitive.

Similar changes are needed in the ia64 versions of cpu.h and trap.c.
I didn't make them because the ia64 is missing the prerequisite changes
to make astpending and need_resched per-process and those changes are
too large to make without testing.
2001-02-19 04:15:59 +00:00
John Baldwin
062d8ff5a0 - Catch up to the new swi API changes:
- Use swi_* function names.
  - Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
  lock to minimize diffs with cy(4).
2001-02-09 17:46:35 +00:00
Bosko Milekic
9ed346bab0 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
John Baldwin
01cd094c32 - Proc locking.
- P_FOO -> PS_FOO.
2001-01-24 10:43:25 +00:00
Jake Burkholder
ef73ae4b0c Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.
2001-01-10 04:43:51 +00:00
Robert Watson
7f112b0489 o Export cp_time ("CPU time statistics") using SYSCTL_OPAQUE.
This removes a reason that systat requires setgid kmem.  More to
  come.
2000-11-20 00:44:58 +00:00
Jake Burkholder
fa2fbc3dac - Protect the callout wheel with a separate spin mutex, callout_lock.
- Use the mutex in hardclock to ensure no races between it and
  softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
  CALLOUT_MPSAFE, which specifies that a callout handler does not
  need giant.  There is still no way to set this flag when
  regstering a callout.

Reviewed by:	-smp@, jlemon
2000-11-19 06:02:32 +00:00
John Baldwin
8088699f79 - Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt.  Roughly, what used to be a bit in spending
  now maps to a swi thread.  Each thread can have multiple handlers, just
  like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
  software interrupt thread to run, so spending, NSWI, and the shandlers
  array are no longer needed.  We can now have an arbitrary number of
  software interrupt threads.  When you register a software interrupt
  thread via sinthand_add(), you get back a struct intrhand that you pass
  to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
  more intuitive.  Also, prefix all the members of struct intrhand with
  'ih_'.
- Make swi_net() a MI function since there is now no point in it being
  MD.

Submitted by:	cp
2000-10-25 05:19:40 +00:00
John Baldwin
35e0e5b311 Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
2000-10-20 07:58:15 +00:00
John Baldwin
6c56727456 - Change fast interrupts on x86 to push a full interrupt frame and to
return through doreti to handle ast's.  This is necessary for the
  clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
  This is needed because both hardclock() and statclock() need to run in
  the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
  during the clock interrupt.  Instead, use two p_flag's in the proc struct
  to mark the current process as having a pending SIGVTALRM or a SIGPROF
  and let them be delivered during ast() when hardclock() has finished
  running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways.  It was
  broken on the x86 if it was turned on since cpl is gone.  It's only use
  was to bogusly run softclock() directly during hardclock() rather than
  scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
  mutex.  Since the spin mutex already handles disabling/restoring
  interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
  temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
  signals in hardclock() that are to be delivered in ast().

Submitted by:	jakeb (making statclock safe in a fast interrupt)
Submitted by:	cp (concept of delaying signals until ast())
2000-10-06 02:20:21 +00:00
John Baldwin
77044cb6d9 Clean up process accounting some more. Unfortunately, it is still not
quite right on i386 as the CPU who runs statclock() doesn't have a valid
clockframe to calculate statistics with.
2000-09-12 18:57:59 +00:00
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Poul-Henning Kamp
77978ab8bc Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by:	bde
2000-07-04 11:25:35 +00:00
Poul-Henning Kamp
82d9ae4e32 Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

        -sysctl_vm_zone SYSCTL_HANDLER_ARGS
        +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
2000-07-03 09:35:31 +00:00
Poul-Henning Kamp
3389ae9350 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
Matthew Dillon
db6a426158 The SMP cleanup commit broke UP compiles. Make UP compiles work again. 2000-03-28 18:06:49 +00:00
Poul-Henning Kamp
91266b96c4 Isolate the Timecounter internals in their own two files.
Make the public interface more systematically named.

Remove the alternate method, it doesn't do any good, only ruins performance.

Add counters to profile the usage of the 8 access functions.

Apply the beer-ware to my code.

The weird +/- counts are caused by two repocopies behind the scenes:
	kern/kern_clock.c -> kern/kern_tc.c
	sys/time.h -> sys/timetc.h
(thanks peter!)
2000-03-20 14:09:06 +00:00
Poul-Henning Kamp
f1220da49c Fix sign reversal in adjtime(2).
Approved by:	jkh
2000-02-13 10:56:32 +00:00
Poul-Henning Kamp
2ac0d0a1f7 Make adjtime(2) adjust boottime so it doesn't cause non-monotonous
uptime.
1999-12-08 10:02:12 +00:00
Bruce Evans
71a62f8a05 Fixed some comments in statclock(). The previous commit made it clearer
that one comment was attached to null code.
1999-11-27 14:37:34 +00:00
Bruce Evans
8a9d4d98b1 Moved scheduling-related code to kern_synch.c so that it is easier to fix
and extend.  The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).

Agreed with in principle by:	dufault
1999-11-27 12:32:27 +00:00
Peter Wemm
de3f888991 #ifdef PPS_SYNC around "kapi" declaration to fix a -Wunused warning. 1999-10-10 16:18:36 +00:00
John Hay
b7424f2dfb Update the PPSAPI to draft-mogul-pps-api-05.txt which is the latest.
NOTE: This will break building ntpd until ntpd has been upgraded to also
support draft 05. People that want to build ntpd in the meantime can
get patches from me.
1999-10-09 14:49:56 +00:00
Bruce Evans
37d3877723 Moved the definition of `boottime' and its sysctl to the correct file. 1999-09-13 14:22:27 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Bruce Evans
a1a10fdfc0 Oops, the previous commit only worked in the one case it was tested for. 1999-07-24 20:21:10 +00:00
Bruce Evans
6b6ef746e5 Added a sysctl "kern.timecounter.hardware" for selecting the hardware
used for timecounting.  The possible values are the names of the
physically present harware timecounters ("i8254" and "TSC" on i386's).

Fixed some nearby bitrot in comments in <sys/time.h>.

Reviewed by:	phk
1999-07-18 15:07:20 +00:00
John Polstra
7ac9503b86 Remove four no-op casts. 1999-07-18 01:35:26 +00:00
Poul-Henning Kamp
0bb2226a4d Make the machdep.i8254_freq and machdep.tsc_freq sysctls modify the
timecounter as well

Asked for by:	bde, jhay
1999-04-25 09:00:00 +00:00
Poul-Henning Kamp
c4a6db710a Don't open window for race condition.
Detected by:	Reg Clemens <reg@dwf.com>
1999-04-02 13:57:21 +00:00
Poul-Henning Kamp
30f27235cb Fix an old cut&paste bogon.
Noticed by: bde
1999-03-12 21:58:54 +00:00
Poul-Henning Kamp
37d39f0a50 Remove duplicate include.
Noticed by: bde
1999-03-12 11:09:50 +00:00
Poul-Henning Kamp
32c203577a Make even more of the PPSAPI implementations generic.
FLL support in hardpps()

Various magic shuffles and improved comments

Style fixes from Bruce.
1999-03-11 15:09:51 +00:00
Poul-Henning Kamp
c68996e271 Integrate the new "nanokernel" PLL from Dave Mills.
This code is backwards compatible with the older "microkernel" PLL, but
allows ntpd v4 to use nanosecond resolution.  Many other improvements.

PPS_SYNC and hardpps() are NOT supported yet.
1999-03-08 12:36:14 +00:00