freebsd-dev

Author	SHA1	Message	Date
Poul-Henning Kamp	5b1a8eb397	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.	2006-02-07 21:22:02 +00:00
John Baldwin	8963150678	patch(1) and I aren't friends today. Axe a duplicate copy of the msleep_spin() function definition. Spotted by: pjd	2005-12-29 21:15:32 +00:00
John Baldwin	0cb7e6aec8	Add a new function msleep_spin() which is a slightly stripped down version of msleep(). msleep_spin() doesn't support changing the priority of the thread while it is asleep nor does it support interruptible sleeps (PCATCH) or the PDROP flag. It does support timeouts however. It differs from msleep() in that the passed in mutex is a spin mutex. This means one can use msleep_spin() and wakeup() with a spin mutex similar to msleep() and wakeup() with a regular mutex. Note that the spin mutex in question needs to come before sched_lock and the sleepq locks in lock order.	2005-12-29 20:57:45 +00:00
John Baldwin	ef627e7da0	When checking to see if a process has exceeded its time limit, flag the process as over the limit when its time is >= to the limit rather than > the limit. Technically, if p->p_rux.rux_runtime.sec == p->p_pcpulimit and p->p_rux.rux_runtime.frac == 0, the process hasn't exceeded the limit yet. However, having the fraction exactly equal to 0 is rather rare, and it is not worth the overhead to handle that edge case. With just the > comparison, the process would have to exceed its limit by almost a second before it was killed. PR: kern/83192 Submitted by: Maciej Zawadzinski mzawadzinski at gmail dot com Reviewed by: bde MFC after: 1 week	2005-11-28 19:09:08 +00:00
Stephan Uphoff	d13ec71369	Use low level constructs borrowed from interrupt threads to wait for work in proc0. Remove the TDP_WAKEPROC0 workaround.	2005-05-23 23:01:53 +00:00
Stephan Uphoff	779186434a	Sprinkle some volatile magic and rearrange things a bit to avoid race conditions in critical_exit now that it no longer blocks interrupts. Reviewed by: jhb	2005-04-08 03:37:53 +00:00
John Baldwin	b80ed61487	Don't recursively panic when we call mi_switch() in a critical section, even though calling mi_switch() after a panic is likely a bug anyway as the recursive panic only serves to make things worse.	2005-03-31 20:36:44 +00:00
John Baldwin	63710c4d35	Stop explicitly touching td_base_pri outside of the scheduler and simply set a thread's priority via sched_prio() when that is the desired action. The schedulers will start managing td_base_pri internally shortly.	2004-12-30 20:29:58 +00:00
Jeff Roberson	85da7a569b	- Define KTR points for KTR_SCHED.	2004-12-26 00:14:21 +00:00
David Xu	7d2eb68b66	Unlock mutex if PDROP was set by caller.	2004-11-27 11:43:31 +00:00
Scott Long	b96741f410	If a process needs to be swapped in, wakeup the swapper from within critical_exit as the process is getting scheduled to run. This is subotimal but for now avoid the LOR between the scheduler and the sleepq systems. This is a 5.3 candidate. Submitted by: davidxu MFC After: 3 days	2004-10-16 06:38:22 +00:00
John Baldwin	2ff0e645d1	Refine the turnstile and sleep queue interfaces just a bit: - Add a new _lock() call to each API that locks the associated chain lock for a lock_object pointer or wait channel. The _lookup() functions now require that the chain lock be locked via _lock() when they are called. - Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup the associated queue structure internally via _lookup() rather than accepting a pointer from the caller. For turnstiles, this means that the actual lookup of the turnstile in the hash table is only done when the thread actually blocks rather than being done on each loop iteration in _mtx_lock_sleep(). For sleep queues, this means that sleepq_lookup() is no longer used outside of the sleep queue code except to implement an assertion in cv_destroy(). - Change sleepq_broadcast() and sleepq_signal() to require that the chain lock is already required. For condition variables, this lets the cv_broadcast() and cv_signal() functions lock the sleep queue chain lock while testing the waiters count. This means that the waiters count internal to condition variables is no longer protected by the interlock mutex and cv_broadcast() and cv_signal() now no longer require that the interlock be held when they are called. This lets consumers of condition variables drop the lock before waking other threads which can result in fewer context switches. MFC after: 1 month	2004-10-12 18:36:20 +00:00
John Baldwin	78c85e8dfc	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month	2004-10-05 18:51:11 +00:00
Julian Elischer	14f0e2e9bf	clean up thread runq accounting a bit. MFC after: 3 days	2004-09-16 07:12:59 +00:00
Julian Elischer	3389af30e8	Add some code to allow threads to nominat a sibling to run if theyu are going to sleep. MFC after: 1 week	2004-09-10 21:04:38 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
John Baldwin	007ddf7e7a	Now that the return value semantics of cv's for multithreaded processes have been unified with that of msleep(9), further refine the sleepq interface and consolidate some duplicated code: - Move the pre-sleep checks for theaded processes into a thread_sleep_check() function in kern_thread.c. - Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c. Specifically, if a thread is awakened by something other than a signal while checking for signals before going to sleep, clear TDF_SINTR in sleepq_catch_signals(). This removes a sched_lock lock/unlock combo in that edge case during an interruptible sleep. Also, fix sleepq_check_signals() to properly handle the condition if TDF_SINTR is clear rather than requiring the callers of the sleepq API to notice this edge case and call a non-_sig variant of sleepq_wait(). - Clarify the flags arguments to sleepq_add(), sleepq_signal() and sleepq_broadcast() by creating an explicit submask for sleepq types. Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of 0. Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and move the setting of TDF_SINTR to sleepq_add() if this flag is set rather than sleepq_catch_signals(). Note that it is the caller's responsibility to ensure that sleepq_catch_signals() is called if and only if this flag is passed to the preceeding sleepq_add(). Note that this also removes a sched_lock lock/unlock pair from sleepq_catch_signals(). It also ensures that for an interruptible sleep, TDF_SINTR is always set when TD_ON_SLEEPQ() is true.	2004-08-19 11:31:42 +00:00
Julian Elischer	732d95288a	Increase the amount of data exported by KTR in the KTR_RUNQ setting. This extra data is needed to really follow what is going on in the threaded case.	2004-08-09 18:21:12 +00:00
John Baldwin	c950c15c76	Workaround a possible deadlock on SMP due to a spin lock LOR by disabling the immediate awakening of proc0 (scheduler kproc, controls swapping processes in and out). The scheduler process periodically awakens already, so this will not result in processes not being swapped in, there will just be more latency in between a thread being made runnable and the scheduler waking up to swap the affected process back in.	2004-08-04 20:24:40 +00:00
David Xu	8bda8a620c	Use P_SINGLE_EXIT to check single-threading case, P_WEXIT is not for that purpose.	2004-07-28 06:30:52 +00:00
John Baldwin	52eb84641d	- Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking. - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).	2004-07-16 21:04:55 +00:00
Marcel Moolenaar	2d50560abc	Update for the KDB framework: o Make debugging code conditional upon KDB instead of DDB. o Call kdb_enter() instead of Debugger(). o Call kdb_backtrace() instead of db_print_backtrace() or backtrace(). kern_mutex.c: o Replace checks for db_active with checks for kdb_active and make them unconditional. kern_shutdown.c: o s/DDB_UNATTENDED/KDB_UNATTENDED/g o s/DDB_TRACE/KDB_TRACE/g o Save the TID of the thread doing the kernel dump so the debugger knows which thread to select as the current when debugging the kernel core file. o Clear kdb_active instead of db_active and do so unconditionally. o Remove backtrace() implementation. kern_synch.c: o Call kdb_reenter() instead of db_error().	2004-07-10 21:36:01 +00:00
John Baldwin	0c0b25ae91	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switch to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)	2004-07-02 20:21:44 +00:00
John Baldwin	bf0acc273a	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to choose to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.	2004-07-02 19:09:50 +00:00
John Baldwin	a5471e4ef4	Remove the signal_caught argument from sleepq_timedwait() as it was effectively always zero.	2004-06-28 18:57:06 +00:00
Tim J. Robbins	be5318b2ca	Remove a stale and misleading comment.	2004-06-07 09:35:00 +00:00
Bruce Evans	a13ec35b05	Fixed some common printf format errors. Don't assume that "struct foo " is "void " (it isn't) or that the default promotion of pid_t is int. Instead, assume that casting "struct foo " to "void " and printing the result with %p is useful, and that all pid_t's are representable as longs. Fixed some minor style bugs (mainly spelling errors in comments).	2004-05-14 20:51:42 +00:00
Warner Losh	7f8a436ff2	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
John Baldwin	1ed3e44f22	- Remove old sleep queues. - Remove sleepqueue argument from sleepq_set_timeout() since it is not used.	2004-03-12 19:06:18 +00:00
Robert Watson	ce89352952	Mark loadaverage callout as CALLOUT_MPSAFE. Reviewed by: jhb	2004-03-08 22:01:19 +00:00
John Baldwin	959c0c4122	Correct handling of PDROP in msleep() to just skip the mtx_lock() rather than clear the lock pointer so that sleepq_add() still gets the correct lock pointer and doesn't bogusly trip an assertion.	2004-03-02 14:58:33 +00:00
John Baldwin	44f3b09204	Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep qeueus in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleep's no longer inherently bump the priority. Instead, this is soley a propery of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.	2004-02-27 18:52:44 +00:00
Jeff Roberson	40ece05382	- Revert rev 1.240 we no longer need a kthread for loadav().	2004-02-01 05:37:36 +00:00
Jeff Roberson	e7f004fe23	- Use sched_load() rather than grabbing the sx lock and traversing the proc table to discover the load.	2004-02-01 02:51:33 +00:00
John Baldwin	d5b75694e7	Move the loadav() callout into its own kthread since it uses allproc_lock which is a sleepable lock and thus is not safe to acquire from a callout routine.	2004-01-28 20:44:41 +00:00
Jeff Roberson	d1605f0ac9	- Use a unique string for the sched_setup SYSINIT and rename sched_setup to synch_setup. The schedulers use the sched_setup function name.	2004-01-25 07:49:45 +00:00
Jeff Roberson	29bcc4514f	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and propery adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate paramter and remove direct references to the process statistics.	2004-01-25 03:54:52 +00:00
Bruce Evans	b3aeaf2ed1	Removed mostly-dead code for setting switchtime after the idle loop clobbers this variable. Long ago, when the idle loop wasn't in a process, it set switchtime.tv_sec to zero to indicate that the time needs to be read after the idle loop finishes. The special case for this isn't needed now that there is an idle process (for each CPU). The time is read in the normal way when the idle process is switched away from. The seconds component of the time is only zero for the first second after the uptime is set, and the mostly-dead code was only executed during this time. (This was slightly broken by using uptimes instead of times relative to the Epoch -- in the original version the seconds component of the time was only 0 for the first second after the Epoch.) In mi_switch(), moved the setting of switchticks to just after the first (and now only) setting of switchtime. This setting used to be delayed since a late setting was needed for the idle case and an early setting was not needed. Now the early setting is needed so that fork_exit() doesn't need to set either switchtime or switchticks. Removed now-completely-rotted comment attached to this. Most of the code described by the comment had already moved to sched_switch().	2003-10-29 15:23:09 +00:00
Jeff Roberson	ae53b483cc	- Collapse sched_switchin() and sched_switchout() into sched_switch(). Now mi_switch() calls sched_switch() which calls cpu_switch(). This is actually one less function call than it had been.	2003-10-16 08:53:46 +00:00
Bruce M Simpson	0c9601bc6b	Add a pre-emption counter, td_generation, so that threads can notice when they have been pre-empted by other threads. This is bumped from within mi_switch() every time a context switch takes place. Discussed with: pete	2003-10-05 09:35:08 +00:00
Jeff Roberson	fa3f9daae5	- On my Pentium4-M laptop, invalpg takes ~1100 cycles if the page is found in the TLB and ~1600 if it is not. Therefore, it is more effecient to invalidate the TLB after operations that use CMAP rather than before. - So that the tlb is invalidated prior to switching off of a processor, we must change the switchin functions to switchout functions. - Remove td_switchout from the thread and move it to the x86 pcb. - Move the code that calls switchout into swtch.s. These changes make this optimization truely x86 specific.	2003-09-30 08:11:36 +00:00
Sam Leffler	c06eb4e293	Change instances of callout_init that specify MPSAFE behaviour to use CALLOUT_MPSAFE instead of "1" for the second parameter. This does not change the behaviour; it just makes the intent more clear.	2003-08-19 17:51:11 +00:00
John Baldwin	70fca4277e	- Various style fixes in both code and comments. - Update some stale comments. - Sort a couple of includes. - Only set 'newcpu' in updatepri() if we use it. - No functional changes. Obtained from: bde (via an old diff I got a long time ago)	2003-08-15 21:29:06 +00:00
Peter Grehan	eac100658a	Update powerpc to use the (old thread,new thread) calling convention for cpu_throw() and cpu_switch().	2003-08-14 03:56:24 +00:00
John Baldwin	e9911cf591	- Convert Alpha over to the new calling conventions for cpu_throw() and cpu_switch() where both the old and new threads are passed in as arguments. Only powerpc uses the old conventions now. - Update comments in the Alpha swtch.s to reflect KSE changes. Tested by: obrien, marcel	2003-08-12 19:33:36 +00:00
Peter Wemm	e95babf3a8	unifdef -DLAZY_SWITCH and start to tidy up the associated glue.	2003-07-10 01:02:59 +00:00
David Xu	9dde3bc999	o Change kse_thr_interrupt to allow send a signal to a specified thread, or unblock a thread in kernel, and allow UTS to specify whether syscall should be restarted. o Add ability for UTS to monitor signal comes in and removed from process, the flag PS_SIGEVENT is used to indicate the events. o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with this flag set to wait for above signal event. o For SA based thread, kernel masks all signal in its signal mask, let UTS to use kse_thr_interrupt interrupt a thread, and install a signal frame in userland for the thread. o Add a tm_syncsig in thread mailbox, when a hardware trap occurs, it is used to deliver synchronous signal to userland, and upcall is schedule, so UTS can process the synchronous signal for the thread. Reviewed by: julian (mentor)	2003-06-28 08:29:05 +00:00
Peter Wemm	eabd19726f	Tidy up leftover lazy_switch instrumentation that is no longer needed. This cleans up some #ifdef hell.	2003-06-27 22:39:14 +00:00
David Xu	0e2a4d3aeb	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.	2003-06-15 00:31:24 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Poul-Henning Kamp	4fe77d64a0	Add a couple of XXX comments where the intent is not clear. Found by: FlexeLint	2003-05-31 20:13:58 +00:00
Marcel Moolenaar	f2c49dd248	Revamp of the syscall path, exception and context handling. The prime objectives are: o Implement a syscall path based on the epc inststruction (see sys/ia64/ia64/syscall.s). o Revisit the places were we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h). Secundairy objectives: o Remove the requirement to use contigmalloc for kernel stacks. o Better handling of the high FP registers for SMP systems. o Switch to the new cpu_switch() and cpu_throw() semantics. o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx) Many files are affected by this change. Functionally it boils down to: o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss. o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm. o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger. o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw(). o The high FP registers are either in the PCB or on some CPU. context switching for them is done lazily. This affects trap(). o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as of yet still unimplemented. Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use. Besides more efficient context saving and restoring, which of course yields a noticable speedup, this also fixes the dreaded SMP bootup problem as a side-effect. The details of which are still not fully understood. This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be notived as an improvement if it's noticed at all. Approved by: re@ (jhb)	2003-05-16 21:26:42 +00:00
John Baldwin	90af4afacb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)	2003-05-13 20:36:02 +00:00
John Baldwin	e668d8d834	Remove TD_ON_RUNQ() from a check to make sure Giant is not held when calling mi_switch(). The kernel would panic on an earlier KASSERT() in mi_switch() if TD_ON_RUNQ() was true.	2003-05-05 21:12:36 +00:00
John Baldwin	f2957f6b9a	Garbage collect unused TDF_INMSLEEP flag.	2003-05-01 17:05:24 +00:00
Peter Wemm	cb1f265c60	AMD64 uses the new-style cpu_switch()/cpu_throw() calling conventions.	2003-04-30 21:45:03 +00:00
John Baldwin	112afcb232	- Protect p_numthreads with the sched_lock. - Protect p_singlethread with both the sched_lock and the proc lock. - Protect p_suspcount with the proc lock.	2003-04-23 18:46:51 +00:00
Peter Wemm	cc66ebe2a9	Commit a partial lazy thread switch mechanism for i386. it isn't as lazy as it could be and can do with some more cleanup. Currently its under options LAZY_SWITCH. What this does is avoid %cr3 reloads for short context switches that do not involve another user process. ie: we can take an interrupt, switch to a kthread and return to the user without explicitly flushing the tlb. However, this isn't as exciting as it could be, the interrupt overhead is still high and too much blocks on Giant still. There are some debug sysctls, for stats and for an on/off switch. The main problem with doing this has been "what if the process that you're running on exits while we're borrowing its address space?" - in this case we use an IPI to give it a kick when we're about to reclaim the pmap. Its not compiled in unless you add the LAZY_SWITCH option. I want to fix a few more things and get some more feedback before turning it on by default. This is NOT a replacement for Bosko's lazy interrupt stuff. This was more meant for the kthread case, while his was for interrupts. Mine helps a little for interrupts, but his helps a lot more. The stats are enabled with options SWTCH_OPTIM_STATS - this has been a pseudo-option for years, I just added a bunch of stuff to it. One non-trivial change was to select a new thread before calling cpu_switch() in the first place. This allows us to catch the silly case of doing a cpu_switch() to the current process. This happens uncomfortably often. This simplifies a bit of the asm code in cpu_switch (no longer have to call choosethread() in the middle). This has been implemented on i386 and (thanks to jake) sparc64. The others will come soon. This is actually seperate to the lazy switch stuff. Glanced at by: jake, jhb	2003-04-02 23:53:30 +00:00
Jeff Roberson	2c10d16a4b	- Borrow the KSE single threading code for exec and exit. We use the check if (p->p_numthreads > 1) and not a flag because action is only necessary if there are other threads. The rest of the system has no need to identify thr threaded processes. - In kern_thread.c use thr_exit1() instead of thread_exit() if P_THREADED is not set.	2003-04-01 01:26:20 +00:00
David Xu	6ce75196ce	Adjust code for userland preemptive. Userland can set a quantum in kse_mailbox to schedule an upcall, this is useful for userland timeout routine, for example pthread_cond_timedwait(). Also extract upcall scheduling code from kse_reassign and create a new function called thread_switchout to include these code. Reviewed by: julain	2003-03-19 05:49:38 +00:00
John Baldwin	263067951a	Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().	2003-03-04 21:03:05 +00:00
Hartmut Brandt	b89bc9e62b	When a process has been waiting on a condition variable or mutex the td_wmesg field in the thread structure points to the description string of the condition variable or mutex. If the condvar or the mutex had been initialized from a loadable module that was unloaded in the meantime, td_wmesg may now point to invalid memory. Retrieving the process table now may panic the kernel (or access junk). Setting the td_wmesg field to NULL after unblocking on the condvar/mutex prevents this panic. PR: kern/47408 Approved by: jake (mentor)	2003-02-27 08:43:27 +00:00
Julian Elischer	ac2e415327	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.	2003-02-27 02:05:19 +00:00
Julian Elischer	4a338afd7a	Move a bunch of flags from the KSE to the thread. I was in two minds as to where to put them in the first case.. I should have listenned to the other mind. Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@	2003-02-17 09:55:10 +00:00
Alfred Perlstein	aae87a3681	Print a backtrace in case we tsleep from inside of DDB.	2003-02-14 12:44:07 +00:00
Julian Elischer	93a7aa79d6	Add code to ddb to allow backtracing an arbitrary thread. (show thread {address}) Remove the IDLE kse state and replace it with a change in the way threads sahre KSEs. Every KSE now has a thread, which is considered its "owner" however a KSE may also be lent to other threads in the same group to allow completion of in-kernel work. n this case the owner remains the same and the KSE will revert to the owner when the other work has been completed. All creations of upcalls etc. is now done from kse_reassign() which in turn is called from mi_switch or thread_exit(). This means that special code can be removed from msleep() and cv_wait(). kse_release() does not leave a KSE with no thread any more but converts the existing thread into teh KSE's owner, and sets it up for doing an upcall. It is just inhibitted from being scheduled until there is some reason to do an upcall. Remove all trace of the kse_idle queue since it is no-longer needed. "Idle" KSEs are now on the loanable queue.	2002-12-28 01:23:07 +00:00
Julian Elischer	696058c3c5	Unbreak the KSE code. Keep track of zobie threads using the Per-CPU storage during the context switch. Rearrange thread cleanups to avoid problems with Giant. Clean threads when freed or when recycled. Approved by: re (jhb)	2002-12-10 02:33:45 +00:00
Jeff Roberson	148302c9c9	- Move FSCALE back to kern_sync. This is not scheduler specific. - Create a new callout for lbolt and move it out of schedcpu(). This is not scheduler specific either. Approved by: re	2002-11-21 08:57:08 +00:00
David Xu	34e80e027d	Add an actual implementation of kse_thr_interrupt()	2002-10-30 02:28:41 +00:00
Julian Elischer	9d10277721	More work on the interaction between suspending and sleeping threads. Also clean up some code used with 'single-threading'. Reviewed by: davidxu	2002-10-25 07:11:12 +00:00
Jeff Roberson	b43179fbe8	- Create a new scheduler api that is defined in sys/sched.h - Begin moving scheduler specific functionality into sched_4bsd.c - Replace direct manipulation of scheduler data with hooks provided by the new api. - Remove KSE specific state modifications and single runq assumptions from kern_switch.c Reviewed by: -arch	2002-10-12 05:32:24 +00:00
John Baldwin	5715307f74	- Move p_cpulimit to struct proc from struct plimit and protect it with sched_lock. This means that we no longer access p_limit in mi_switch() and the p_limit pointer can be protected by the proc lock. - Remove PRS_ZOMBIE check from CPU limit test in mi_switch(). PRS_ZOMBIE processes don't call mi_switch(), and even if they did there is no longer the danger of p_limit being NULL (which is what the original zombie check was added for). - When we bump the current processes soft CPU limit in ast(), just bump the private p_cpulimit instead of the shared rlimit. This fixes an XXX for some value of fix. There is still a (probably benign) bug in that this code doesn't check that the new soft limit exceeds the hard limit. Inspired by: bde (2)	2002-10-09 17:17:24 +00:00
Julian Elischer	48bfcddd94	Round out the facilty for a 'bound' thread to loan out its KSE in specific situations. The owner thread must be blocked, and the borrower can not proceed back to user space with the borrowed KSE. The borrower will return the KSE on the next context switch where teh owner wants it back. This removes a lot of possible race conditions and deadlocks. It is consceivable that the borrower should inherit the priority of the owner too. that's another discussion and would be simple to do. Also, as part of this, the "preallocatd spare thread" is attached to the thread doing a syscall rather than the KSE. This removes the need to lock the scheduler when we want to access it, as it's now "at hand". DDB now shows a lot mor info for threaded proceses though it may need some optimisation to squeeze it all back into 80 chars again. (possible JKH project) Upcalls are now "bound" threads, but "KSE Lending" now means that other completing syscalls can be completed using that KSE before the upcall finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is one of the highest priorities in the KSE system.) The upcall when it happens will present all the completed syscalls to the KSE for selection.	2002-10-09 02:33:36 +00:00
Juli Mallett	a723033a4d	XXX Add a check for p->p_limit being NULL before dereferencing it. This is totally bogus but will hide the occurances of access of 0xbc(NULL) which people have run into lately. This is not a proper fix, just a bandaid, until the cause of this happening is tracked down and fixed. Reviewed by: rwatson	2002-10-03 04:09:00 +00:00
John Baldwin	551cf4e150	Rename the mutex thread and process states to use a more generic 'LOCK' name instead. (e.g., SLOCK instead of SMTX, TD_ON_LOCK() instead of TD_ON_MUTEX()) Eventually a turnstile abstraction will be added that will be shared with mutexes and other types of locks. SLOCK/TDI_LOCK will be used internally by the turnstile code and will not be specific to mutexes. Making the change now ensures that turnstiles can be dropped in at a later date without affecting the ABI of userland applications.	2002-10-02 20:31:47 +00:00
John Baldwin	1d56414515	- Adjust comment noting that handling of CPU limit exhaustion is done in ast(). - Actually set KEF_ASTPENDING so ast() is called. I think this is buggy for a process with multiple KSE's in that PS_XCPU is not a KSE event, it's a process-wide event. IMO there really should probably be two ASTPENDING flags, one for per-process, and one for per-KSE. Submitted by: bde	2002-10-01 14:10:08 +00:00
John Baldwin	dc183990ca	- Add a new per-process flag PS_XCPU to indicate that at least one thread has exceeded its CPU time limit. - In mi_switch(), set PS_XCPU when the CPU time limit is exceeded. - Perform actual CPU time limit exceeded work in ast() when PS_XCPU is set. Requested by: many	2002-09-30 21:13:54 +00:00
Julian Elischer	9eb1fdea37	Implement basic KSE loaning. This stops a hread that is blocked in BOUND mode from stopping another thread from completing a syscall, and this allows it to release its resources etc. Probably more related commits to follow (at least one I know of) Initial concept by: julian, dillon Submitted by: davidxu	2002-09-29 23:04:34 +00:00
Julian Elischer	71fad9fdee	Completely redo thread states. Reviewed by: davidxu@freebsd.org	2002-09-11 08:13:56 +00:00
Julian Elischer	472be95807	Rejig the code to figure out estcpu and work out how long a KSEGRP has been idle. What was there before was surprisingly ALMOST correct. Peter and I fried our brains on this for a couple of hours figuring out what this actually means in the context of multiple threads. Reviewed by: peter@freebsd.org	2002-08-30 00:25:49 +00:00
Peter Wemm	d13947c3b0	updatepri() works on a ksegrp (where the scheduling parameters are), so directly give it the ksegrp instead of the thread. The only thing it used to use in the thread was the ksegrp. Reviewed by: julian	2002-08-28 23:45:15 +00:00
Julian Elischer	04774f2357	Slight cleanup of some comments/whitespace. Make idle process state more consistant. Add an assert on thread state. Clean up idleproc/mi_switch() interaction. Use a local instead of referencing curthread 7 times in a row (I've been told curthread can be expensive on some architectures) Remove some commented out code. Add a little commented out code (completion coming soon) Reviewed by: jhb@freebsd.org	2002-08-01 18:45:10 +00:00
Seigo Tanimura	133267776c	In endtsleep() and cv_timedwait_end(), a thread marked TDF_TIMEOUT may be swapped out. Do not put such the thread directly back to the run queue. Spotted by: David Xu <davidx@viasoft.com.cn> While I am here, s/PS_TIMEOUT/TDF_TIMEOUT/.	2002-07-30 10:12:11 +00:00
Seigo Tanimura	9eb881f804	- Optimize wakeup() and its friends; if a thread waken up is being swapped in, we do not have to ask for the scheduler thread to do that. - Assert that a process is not swapped out in runq functions and swapout(). - Introduce thread_safetoswapout() for readability. - In swapout_procs(), perform a test that may block (check of a thread working on its vm map) first. This lets us call swapout() with the sched_lock held, providing a better atomicity.	2002-07-30 06:54:05 +00:00
Julian Elischer	1d7b9ed2e6	Create a new thread state to describe threads that would be ready to run except for the fact tha they are presently swapped out. Also add a process flag to indicate that the process has started the struggle to swap back in. This will be needed for the case where multiple threads start the swapin action top a collision. Also add code to stop a process fropm being swapped out if one of the threads in this process is actually off running on another CPU.. that might hurt... Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>	2002-07-29 18:33:32 +00:00
Julian Elischer	294e6308bf	slight stylisations to take into account recent code changes.	2002-07-24 23:59:15 +00:00
Julian Elischer	2d014fd7f8	Fix a reversed test. Fix some style nits. Fix a KASSERT message. Add/fix some comments. Submitted by: bde@freebsd.org	2002-07-17 19:20:48 +00:00
John Baldwin	627ed43ba7	Add a KASSERT() to assert that td_critnest is == 1 when mi_switch() is called.	2002-07-17 02:46:13 +00:00
Andrew Gallatin	fe79953325	Allow alphas to do crashdumps: Refuse to run anything in choosethread() after a panic which is not an interrupt thread, or the thread which caused the panic. Also, remove panicstr checks from msleep() and from cv_wait() in order to allow threads to go to sleep and yeild the cpu to the panicing thread, or to an interrupt thread which might be doing the crashdump. Reviewed by: jhb (and it was mostly his idea too)	2002-07-17 02:23:44 +00:00
Julian Elischer	c3b98db091	Thinking about it I came to the conclusion that the KSE states were incorrectly formulated. The correct states should be: IDLE: On the idle KSE list for that KSEG RUNQ: Linked onto the system run queue. THREAD: Attached to a thread and slaved to whatever state the thread is in. This means that most places where we were adjusting kse state can go away as it is just moving around because the thread is.. The only places we need to adjust the KSE state is in transition to and from the idle and run queues. Reviewed by: jhb@freebsd.org	2002-07-14 03:43:33 +00:00
Julian Elischer	ac8bcbb700	oops, state cannot be two different values at once.. use \|\| instead of &&	2002-07-14 01:36:48 +00:00
Matthew Dillon	fbcf77c2ea	Re-enable the idle page-zeroing code. Remove all IPIs from the idle page-zeroing code as well as from the general page-zeroing code and use a lazy tlb page invalidation scheme based on a callback made at the end of mi_switch. A number of people came up with this idea at the same time so credit belongs to Peter, John, and Jake as well. Two-way SMP buildworld -j 5 tests (second run, after stabilization) 2282.76 real 2515.17 user 704.22 sys before peter's IPI commit 2266.69 real 2467.50 user 633.77 sys after peter's commit 2232.80 real 2468.99 user 615.89 sys after this commit Reviewed by: peter, jhb Approved by: peter	2002-07-12 20:17:06 +00:00
Julian Elischer	fe0f1bf4b1	make this repect ps_sigintr if there is a pre-existing signal or suspension request. Submitted by: David Xu	2002-07-06 08:47:24 +00:00
Julian Elischer	55fb7ca894	Fix at least one of the things wrong with signals ^Z should work a lot better now. Submitted by: peter@freebsd.org	2002-07-06 02:45:11 +00:00
Julian Elischer	aa0fa33464	Try clean up some of the mess that resulted from layers and layers of p4 merges from -current as things started getting different. Corroborated by: Similar patches just mailed by BDE.	2002-07-03 09:15:20 +00:00
Julian Elischer	8b768fc82b	When going back to SLEEP state, make sure our State is correctly marked so.	2002-07-02 05:40:51 +00:00
Julian Elischer	e602ba25fd	Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..	2002-06-29 17:26:22 +00:00
Alfred Perlstein	016091145e	more caddr_t removal.	2002-06-29 02:00:02 +00:00
Matthew Dillon	727300861d	I Noticed a defect in the way wakeup() scans the tailq. Tor noticed an even worse defect in wakeup_one(). This patch cleans up both. Submitted by: tegge MFC after: 3 days	2002-06-24 00:14:36 +00:00
John Baldwin	9ba7fe1b76	- Catch up to new ktrace API. - ktrace trace points in msleep() and cv_wait() no longer need Giant.	2002-06-07 05:39:16 +00:00
Julian Elischer	628855e758	CURSIG() is not a macro so rename it cursig(). Obtained from: KSE tree	2002-05-29 23:44:32 +00:00
John Baldwin	cc5d39f81e	Minor nit: get p pointer in msleep() from td->td_proc (where td == curthread) rather than from curproc.	2002-05-23 04:14:18 +00:00
Alfred Perlstein	4d77a549fe	Remove __P.	2002-03-19 21:25:46 +00:00
Peter Wemm	30171114b3	Fix a gcc-3.1+ warning. warning: deprecated use of label at end of compound statement ie: you cannot do this anymore: switch(foo) { .... default: }	2002-03-19 11:02:06 +00:00
Poul-Henning Kamp	1cbb9c3b03	Convert p->p_runtime and PCPU(switchtime) to bintime format.	2002-02-22 13:32:01 +00:00
Julian Elischer	2c1007663f	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)	2002-02-11 20:37:54 +00:00
John Baldwin	c86b6ff551	Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed: The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex releease and swi schedule, respectively when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.) I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly. Reviewed by: peter Tested on: i386, alpha	2002-01-05 08:47:13 +00:00
John Baldwin	7e1f6dfe9d	Modify the critical section API as follows: - The MD functions critical_enter/exit are renamed to start with a cpu_ prefix. - MI wrapper functions critical_enter/exit maintain a per-thread nesting count and a per-thread critical section saved state set when entering a critical section while at nesting level 0 and restored when exiting to nesting level 0. This moves the saved state out of spin mutexes so that interlocking spin mutexes works properly. - Most low-level MD code that used critical_enter/exit now use cpu_critical_enter/exit. MI code such as device drivers and spin mutexes use the MI wrappers. Note that since the MI wrappers store the state in the current thread, they do not have any return values or arguments. - mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is assigned to curthread->td_savecrit during fork_exit(). Tested on: i386, alpha	2001-12-18 00:27:18 +00:00
Luigi Rizzo	af1408e33f	Add/correct description for some sysctl variables where it was missing. The description field is unused in -stable, so the MFC there is equivalent to a comment. It can be done at any time, i am just setting a reminder in 45 days when hopefully we are past 4.5-release. MFC after: 45 days	2001-12-16 16:07:20 +00:00
John Baldwin	ac9a258074	Assert that Giant is not held in mi_switch() unless the process state is SMTX or SRUN.	2001-10-23 17:52:49 +00:00
Ian Dowse	72ec63a53d	Introduce some jitter to the timing of the samples that determine the system load average. Previously, the load average measurement was susceptible to synchronisation with processes that run at regular intervals such as the system bufdaemon process. Each interval is now chosen at random within the range of 4 to 6 seconds. This large variation is chosen so that over the shorter 5-minute load average timescale there is a good dispersion of samples across the 5-second sample period (the time to perform 60 5-second samples now has a standard deviation of approx 4.5 seconds).	2001-10-20 16:07:17 +00:00
Ian Dowse	0eb6ce3169	Move the code that computes the system load average from vm_meter.c to kern_synch.c in preparation for adding some jitter to the inter-sample time. Note that the "vm.loadavg" sysctl still lives in vm_meter.c which isn't the right place, but it is appropriate for the current (bad) name of that sysctl. Suggested by: jhb (some time ago) Reviewed by: bde	2001-10-20 13:10:43 +00:00
John Baldwin	21832b1ec0	GC some #if 0'd code.	2001-09-21 19:21:18 +00:00
John Baldwin	3226cbf43b	Whitespace and spelling fixes.	2001-09-21 19:16:12 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Matthew Dillon	918c3b1361	Make yield() MPSAFE. Synchronize syscalls.master with all MPSAFE changes to date. Synchronize new syscall generation follows because yield() will panic if it is out of sync with syscalls.master.	2001-09-01 03:54:09 +00:00
John Baldwin	b285782b29	Release the sched_lock before bombing out in mi_switch() via db_error(). This makes things slightly easier if you call a function that calls mi_switch() as it keeps the locking before and after closer.	2001-08-21 23:10:37 +00:00
John Baldwin	161778121a	Add a hook to mi_switch() to abort via db_error() if we attempt to perform a context switch from DDB. Consulting from: bde	2001-08-21 20:09:05 +00:00
John Baldwin	91a4536f22	- Fix a bug in the previous workaround for the tsleep/endtsleep race. callout_stop() would fail in two cases: 1) The timeout was currently executing, and 2) The timeout had already executed. We only needed to work around the race for 1). We caught some instances of 2) via the PS_TIMEOUT flag, however, if endtsleep() fired after the process had been woken up but before it had resumed execution, PS_TIMEOUT would not be set, but callout_stop() would fail, so we would block the process until endtsleep() resumed it. Except that endtsleep() had already run and couldn't resume it. This adds a new flag PS_TIMOFAIL to indicate the case of 2) when PS_TIMEOUT isn't set. - Implement this race fix for condition variables as well. Tested by: sos	2001-08-21 18:42:45 +00:00
John Baldwin	688ebe120c	- Close races with signals and other AST's being triggered while we are in the process of exiting the kernel. The ast() function now loops as long as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with preemption disabled so that any further AST's that arrive via an interrupt will be delayed until the low-level MD code returns to user mode. - Use u_int's to store the tick counts for profiling purposes so that we do not need sched_lock just to read p_sticks. This also closes a problem where the call to addupc_task() could screw up the arithmetic due to non-atomic reads of p_sticks. - Axe need_proftick(), aston(), astoff(), astpending(), need_resched(), clear_resched(), and resched_wanted() in favor of direct bit operations on p_sflag. - Fix up locking with sched_lock some. In addupc_intr(), use sched_lock to ensure pr_addr and pr_ticks are updated atomically with setting PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing PS_OWEUPC. We also do not grab the lock just to test a flag. - Simplify the handling of Giant in ast() slightly. Reviewed by: bde (mostly)	2001-08-10 22:53:32 +00:00
John Baldwin	8791b43513	Work around a race between msleep() and endtsleep() where it was possible for endtsleep() to be executing when msleep() resumed, for endtsleep() to spin on sched_lock long enough for the other process to loop on msleep() and sleep again resulting in endtsleep() waking up the "wrong" msleep. Obtained from: BSD/OS	2001-08-10 21:08:56 +00:00
John Baldwin	4d33620270	Style nit: covert a couple of if (p_wchan) tests to if (p_wchan != NULL).	2001-08-10 20:56:25 +00:00
John Baldwin	8ec48c6dbf	- Remove asleep(), await(), and M_ASLEEP. - Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl-raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused. Reviewed by: jasone, peter	2001-08-10 06:37:05 +00:00
John Baldwin	b39bc3e160	Use 'p' instead of the potentially more expensive 'curproc' inside of mi_switch().	2001-08-02 22:15:31 +00:00
John Baldwin	36c2e9feb4	Apply the cluebat to myself and undo the await() -> mawait() rename. The asleep() and await() functions split the functionality of msleep() up into two halves. Only the asleep() half (which is what puts the process on the sleep queue) actually needs the lock usually passed to msleep() held to prevent lost wakeups. await() does not need the lock held, so the lock can be released prior to calling await() and does not need to be passed in to the await() function. Typical usage of these functions would be as follows: mtx_lock(&foo_mtx); ... do stuff ... asleep(&foo_cond, PRIxx, "foowt", hz); ... mtx_unlock&foo_mtx); ... await(-1, -1); Inspired by: dillon on the couch at Usenix	2001-07-31 22:06:56 +00:00
John Baldwin	e9121d0663	Add a safety belt to mawait() for the (cold \|\| panicstr) case identical to the one in msleep() such that we return immediately rather than blocking. Submitted by: peter Prodded by: sheldonh	2001-07-31 20:57:57 +00:00
Jake Burkholder	d652b3d918	Backout mwakeup, etc.	2001-07-06 01:16:43 +00:00
Jake Burkholder	9316aed2ef	Implement mwakeup, mwakeup_one, cv_signal_drop and cv_broadcast_drop. These take an additional mutex argument, which is dropped before any processes are made runnable. This can avoid contention on the mutex if the processes would immediately acquire it, and is done in such a way that wakeups will not be lost. Reviewed by: jhb	2001-07-04 00:32:50 +00:00
John Baldwin	d68a8cc0ab	Remove commented-out garbage that skipped updating schedcpu() stats for ithreads in SWAIT.	2001-07-03 08:03:56 +00:00
John Baldwin	97b4306f0f	Just check p_oncpu when determining if a process is executing or not. We already did this in the SMP case, and it is now maintained in the UP case as well, and makes the code slightly more readable. Note that curproc is always executing, thus the p != curproc test does not need to be performed if the p_oncpu check is made.	2001-07-03 08:00:57 +00:00
John Baldwin	9d36b83e2c	Axe spl's that are covered by the sched_lock (and have been for quite some time.)	2001-07-03 07:53:35 +00:00
John Baldwin	36f1548b96	Include the wait message and channel for msleep() in the KTR tracepoint.	2001-07-03 07:39:06 +00:00
John Baldwin	8f451b4114	Remove bogus need_resched() of the current CPU in roundrobin(). We don't actually need to force a context switch of the current process. The act of firing the event triggers a context switch to softclock() and then switching back out again which is equivalent to a preemption, thus no further work is needed on the local CPU.	2001-07-03 05:33:09 +00:00
John Baldwin	a300519d41	Make the schedlock saved critical section state a per-thread property.	2001-06-30 03:11:26 +00:00
John Baldwin	1df95969b5	- Lock CURSIG() with the proc lock to close the signal race with psignal. - Grab Giant around ktrace points. - Clean up KTR_PROC tracepoints to not display the value of sched_lock.mtx_lock as it isn't really needed anymore and just obfuscates the messages. - Add a few if conditions to replace gotos. - Ensure that every msleep KTR event ends up with a matching msleep resume KTR event (this was broken when we didn't do a mi_switch()). - Only note via ktrace that we resumed from a switch once rather than twice in several places in msleep(). - Remove spl's rom asleep and await as the proc lock and sched_lock provide all the needed locking. - In mawait() add in a needed ktrace point for noting that we are about to switch out.	2001-06-22 23:11:26 +00:00
John Baldwin	b516d2f5e1	Add in assertions to ensure that we always call msleep or mawait with either a timeout or a held mutex to detect unprotected infinite sleeps that can easily lead to deadlock. Submitted by: alfred	2001-05-23 19:38:26 +00:00
Alfred Perlstein	a4d22b8035	Remove KASSERT test for sleeping on mv_mtx, instead let WITNESS catch it. Requested by: jhb	2001-05-22 00:58:20 +00:00
Alfred Perlstein	5ee5c3aa1f	remove my private assertions from tsleep. add one assertion to ensure we don't sleep while holding vm.	2001-05-19 01:40:48 +00:00
Alfred Perlstein	2395531439	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
John Baldwin	74fc745594	- Remove unneeded include of sys/ipl.h. - Lock the process before calling killproc() to kill it for exceeding the maximum CPU limit.	2001-05-15 23:15:06 +00:00
John Baldwin	6caa8a1501	Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP. - It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the _process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the _process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delievered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the _process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forwared_hardclock/statclock which then call the _process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h. - The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in. Reviewed by: jake, peter Looked over by: eivind	2001-04-27 19:28:25 +00:00
John Baldwin	1005a129e5	Convert the allproc and proctree locks from lockmgr locks to sx locks.	2001-03-28 11:52:56 +00:00
John Baldwin	192846463a	Rework the witness code to work with sx locks as well as mutexes. - Introduce lock classes and lock objects. Each lock class specifies a name and set of flags (or properties) shared by all locks of a given type. Currently there are three lock classes: spin mutexes, sleep mutexes, and sx locks. A lock object specifies properties of an additional lock along with a lock name and all of the extra stuff needed to make witness work with a given lock. This abstract lock stuff is defined in sys/lock.h. The lockmgr constants, types, and prototypes have been moved to sys/lockmgr.h. For temporary backwards compatability, sys/lock.h includes sys/lockmgr.h. - Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin locks held. By making this per-cpu, we do not have to jump through magic hoops to deal with sched_lock changing ownership during context switches. - Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with proc->p_sleeplocks, which is a list of held sleep locks including sleep mutexes and sx locks. - Add helper macros for logging lock events via the KTR_LOCK KTR logging level so that the log messages are consistent. - Add some new flags that can be passed to mtx_init(): - MTX_NOWITNESS - specifies that this lock should be ignored by witness. This is used for the mutex that blocks a sx lock for example. - MTX_QUIET - this is not new, but you can pass this to mtx_init() now and no events will be logged for this lock, so that one doesn't have to change all the individual mtx_lock/unlock() operations. - All lock objects maintain an initialized flag. Use this flag to export a mtx_initialized() macro that can be safely called from drivers. Also, we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness performs the corresponding checks using the initialized flag. - The lock order reversal messages have been improved to output slightly more accurate file and line numbers.	2001-03-28 09:03:24 +00:00
John Baldwin	eed4805444	- Proc locking. - Remove unneeded spl()'s.	2001-03-07 03:01:53 +00:00
John Baldwin	c978f49e20	Add a mtx_assert() in maybe_resched() just to be sure it's always called with sched_lock held.	2001-02-22 13:47:01 +00:00
John Baldwin	5a93f3e851	- Use the new NOCPU constant. - Fix a warning. Noticed by: bde (2)	2001-02-22 00:32:13 +00:00
John Baldwin	5813dc03bd	- Don't call clear_resched() in userret(), instead, clear the resched flag in mi_switch() just before calling cpu_switch() so that the first switch after a resched request will satisfy the request. - While I'm at it, move a few things into mi_switch() and out of cpu_switch(), specifically set the p_oncpu and p_lastcpu members of proc in mi_switch(), and handle the sched_lock state change across a context switch in mi_switch(). - Since cpu_switch() no longer handles the sched_lock state change, we have to setup an initial state for sched_lock in fork_exit() before we release it.	2001-02-20 05:26:15 +00:00
Jake Burkholder	d5a08a6065	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.	2001-02-12 00:20:08 +00:00
Jake Burkholder	c11f93b3e7	Acquire sched_lock around need_resched() in roundrobin() to satisfy assertions that it is held. Since roundrobin() is a timeout there's no possible way that it could be called with sched_lock held.	2001-02-10 19:07:32 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00

1 2 3 4 5 ...

374 Commits