Commit Graph

197 Commits

Author SHA1 Message Date
Julian Elischer
ed062c8d66 Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is  now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by:	scottl, peter
MFC after:	1 week
2004-09-05 02:09:54 +00:00
Julian Elischer
2630e4c90c Give setrunqueue() and sched_add() more of a clue as to
where they are coming from and what is expected from them.

MFC after:	2 days
2004-09-01 02:11:28 +00:00
David Xu
cf1867f932 Remove TDP_USTATCLOCK, we no longer need it because we now always
update tick count for userland in thread_userret. This change
also removes a "no upcall owned" panic because fuword() schedules
an upcall under heavily loaded, and code assumes there is no upcall
can occur.

Reported and Tested by: Peter Holm <peter@holm.cc>
2004-08-31 11:52:05 +00:00
Julian Elischer
5995adc206 Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after:	3 days
2004-08-31 07:34:54 +00:00
David Xu
5897f840f0 1. try to use existing mailbox address in thread_update_usr_ticks.
2. remove '\n' in KASSERT.
2004-08-28 04:16:32 +00:00
David Xu
ad1280b593 Move TDF_CAN_UNBIND to thread private flags td_pflags, this eliminates
need of sched_lock in some places. Also in thread_userret, remove
spare thread allocation code, it is already done in thread_user_enter.

Reviewed by: julian
2004-08-28 04:08:05 +00:00
David Xu
d30412a8db Remove checking of single exit flag in thread_user_enter(), this is
generic code for threaded process, should not be here.
2004-08-23 22:54:37 +00:00
Julian Elischer
e2105bce2a Slight changes to comments and some whitespace changes. 2004-08-09 21:57:30 +00:00
David Xu
604be46d1e 1.Add KSE_INTR_DBSUSPEND command for kse_thr_interrupt to suspend a bound
thread, after the bound thread leaves critical region, the thread should
check debug flag may suspend itself by using the command.
2.Schedule upcall after thread is suspended by debugger
3.Wakeup upcall thread after process suspension.

Reviewed by: deischen
2004-08-08 22:32:20 +00:00
David Xu
4513fb36aa s/TMDF_DONOTRUNUSER/TMDF_SUSPEND/g
Dicussed with: deischen
2004-08-03 02:23:06 +00:00
Julian Elischer
4fd54632b0 Repeat after me:
"Do not apply your tested patches to your commit tree by hand"
2004-08-03 01:43:29 +00:00
Julian Elischer
c94b38af46 Remove an argument that is never used. 2004-08-02 23:48:43 +00:00
Robert Watson
3d3f5f6057 Add what appears to be a missing '*/' at the end of a comment. 2004-08-02 01:38:27 +00:00
Julian Elischer
6e0fbb01c5 Comment kse_create() and make a few minor code cleanups
Reviewed by:	davidxu
2004-08-01 23:02:00 +00:00
Julian Elischer
55d44f79ea When calling scheduler entrypoints for creating new threads and processes,
specify "us" as the thread not the process/ksegrp/kse.
You can always find the others from the thread but the converse is not true.
Theorotically this would lead to runtime being allocated to the wrong
entity in some cases though it is not clear how often this actually happenned.
(would only affect threaded processes and would probably be pretty benign,
but it WAS a bug..)

Reviewed by: peter
2004-07-18 23:36:13 +00:00
John Baldwin
52eb84641d - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags
since they are only accessed by curthread and thus do not need any
  locking.
- Move pr_addr and pr_ticks out of struct uprof (which is per-process)
  and directly into struct thread as td_profil_addr and td_profil_ticks
  as these variables are really per-thread.  (They are used to defer an
  addupc_intr() that was too "hard" until ast()).
2004-07-16 21:04:55 +00:00
David Xu
4d47dc5549 Add code to support debugging threaded process.
1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
     thread current user thread is running on. Add tm_dflags into
     kse_thr_mailbox, the flags is written by debugger, it tells
     UTS and kernel what should be done when the process is being
     debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.

     TMDF_SSTEP is used to tell kernel to turn on single stepping,
     or turn off if it is not set.

     TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
     whenever possible, to UTS, it means do not run the user thread
     until debugger clears it, this behaviour is necessary because
     gdb wants to resume only one thread when the thread's pc is
     at a breakpoint, and thread needs to go forward, in order to
     avoid other threads sneak pass the breakpoints, it needs to remove
     breakpoint, only wants one thread to go. Also, add km_lwp to
     kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
     switch time when process is not being debugged, so when process
     is attached, debugger can map kernel thread to user thread.

  2. Add p_xthread to proc strcuture and td_xsig to thread structure.
     p_xthread is used by a thread when it wants to report event
     to debugger, every thread can set the pointer, especially, when
     it is used in ptracestop, it is the last thread reporting event
     will win the race. Every thread has a td_xsig to exchange signal
     with debugger, thread uses TDF_XSIG flag to indicate it is reporting
     signal to debugger, if the flag is not cleared, thread will keep
     retrying until it is cleared by debugger, p_xthread may be
     used by debugger to indicate CURRENT thread. The p_xstat is still
     in proc structure to keep wait() to work, in future, we may
     just use td_xsig.

  3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
     a thread. When process stops, debugger can set the flag for
     thread, thread will check the flag in thread_suspend_check,
     enters a loop, unless it is cleared by debugger, process is
     detached or process is existing. The flag is also checked in
     ptracestop, so debugger can temporarily suspend a thread even
     if the thread wants to exchange signal.

  4. Current, in ptrace, we always resume all threads, but if a thread
     has already a TDF_DBSUSPEND flag set by debugger, it won't run.

Encouraged by: marcel, julian, deischen
2004-07-13 07:33:40 +00:00
David Xu
507b03186a Change kse_switchin to accept kse_thr_mailbox pointer, the syscall
will be used heavily in debugging KSE threads. This breaks libpthread
on IA64, but because libpthread was not in 5.2.1 release, I would like
to change it so we needn't to introduce another syscall.
2004-07-12 07:39:20 +00:00
Marcel Moolenaar
247aba2474 Allocate TIDs in thread_init() and deallocate them in thread_fini().
The overhead of unconditionally allocating TIDs (and likewise,
unconditionally deallocating them), is amortized across multiple
thread creations by the way UMA makes it possible to have type-stable
storage.
Previously the cost was kept down by having threads created as part
of a fork operation use the process' PID as the TID. While this had
some nice properties, it also introduced complexity in the way TIDs
were allocated. Most importantly, by using the type-stable storage
that UMA gives us this was also unnecessary.

This change affects how core dumps are created and in particular how
the PRSTATUS notes are dumped. Since we don't have a thread with a
TID equalling the PID, we now need a different way to preserve the
old and previous behavior. We do this by having the given thread (i.e.
the thread passed to the core dump code in td) dump it's state first
and fill in pr_pid with the actual PID. All other threads will have
pr_pid contain their TIDs. The upshot of all this is that the debugger
will now likely select the right LWP (=TID) as the initial thread.

Credits to: julian@ for spotting how we can utilize UMA.
Thanks to: all who provided julian@ with test results.
2004-06-26 18:58:22 +00:00
Julian Elischer
94e0a4cdf3 Shuffle some code around. 2004-06-11 17:48:20 +00:00
Julian Elischer
30276dc9f8 Move the KSE ABI specific code here and separate it from code that
is generic to any threading system. This commit does not link this
file to the build yet, nor does it remove these functions from their
current location in kern_thread.c. (that commit coming up after further review)
2004-06-07 07:25:03 +00:00
Tim J. Robbins
aa0aa7a113 Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by:	davidxu
2004-06-02 07:52:36 +00:00
David Xu
702ac0f112 Clear KSE thread flags after KSE thread mode is ended. The side effect
of not clearing the flags for execv() syscall will result that a new
program runs in KSE thread mode without enabling it.

Submitted by: tjr
Modified by: davidxu
2004-05-21 14:50:23 +00:00
Daniel Eischen
4fc21c0947 Keep track of threads waiting in kse_release() to avoid a race
condition where kse_wakeup() doesn't yet see them in (interruptible)
sleep queues.  Also add an upcall check to sleepqueue_catch_signals()
suggested by jhb.

This commit should fix recent mysql hangs.

Reviewed by:	jhb, davidxu
Mysql'd by:	Robin P. Blanchard <robin.blanchard at gactr uga edu>
2004-04-28 20:36:53 +00:00
Marcel Moolenaar
fdcac92868 Assign thread IDs to kernel threads. The purpose of the thread ID (tid)
is twofold:
1. When a 1:1 or M:N threaded process dumps core, we need to put the
   register state of each of its kernel threads in the core file.
   This can only be done by differentiating the pid field in the
   respective note. For this we need the tid.
2. When thread support is present for remote debugging the kernel
   with gdb(1), threads need to be identified by an integer due to
   limitations in the remote protocol. This requires having a tid.

To minimize the impact of having thread IDs, threads that are created
as part of a fork (i.e. the initial thread in a process) will inherit
the process ID (i.e. tid=pid). Subsequent threads will have IDs larger
than PID_MAX to avoid interference with the pid allocation algorithm.
The assignment of tids is handled by thread_new_tid().

The thread ID allocation algorithm has been written with 3 assumptions
in mind:
1. IDs need to be created as fast a possible,
2. Reuse of IDs may happen instantaneously,
3. Someone else will write a better algorithm.
2004-04-03 15:59:13 +00:00
Julian Elischer
84eef27df4 Massively up the (artificial) limit on system scope threads
in a process from 50 to 500

Also up the number of process scope threads allowed to be in the kernel
at one time from 150 to 1500 (per process)
2004-03-21 09:22:38 +00:00
Peter Wemm
37814395c1 Push Giant down a little further:
- no longer serialize on Giant for thread_single*() and family in fork,
  exit and exec
- thread_wait() is mpsafe, assert no Giant
- reduce scope of Giant in exit to not cover thread_wait and just do
  vm_waitproc().
- assert that thread_single() family are not called with Giant
- remove the DROP/PICKUP_GIANT macros from thread_single() family
- assert that thread_suspend_check() s not called with Giant
- remove manual drop_giant hack in thread_suspend_check since we know it
  isn't held.
- remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family
- mark kse_create() mpsafe
2004-03-13 22:31:39 +00:00
John Baldwin
707559e402 Check for TDF_SINTR before calling sleepq_abort() as there is a narrow
race in between sleepq_add() and sleepq_catch_signals() in that setting
td_wchan and TDF_SINTR is not atomic to sched_lock but only to the sleepq
lock.  This band-aid will stop assertion failures, but there is perhaps a
larger problem with the sleepq_add/sleepq_catch_signals race that I am not
sure how to solve.  For the signals case the race is harmless because we
always call cursig() after setting TDF_SINTR.  However, KSE doesn't do
anything in sleepq_catch_signals() to check that this race was lost, so I
am unsure if this race is harmful for this specific abort.
2004-03-01 23:07:58 +00:00
John Baldwin
44f3b09204 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep qeueus in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleep's no longer inherently bump the priority.  Instead, this is soley
  a propery of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
John Baldwin
62a0fd943c Use mtx_assert() rather than using a home-rolled version. 2004-01-28 20:26:39 +00:00
Jeff Roberson
29bcc4514f - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL.  Assert that one of these is set in mi_switch() and propery
   adjust the rusage statistics.  This is to simplify the large number of
   users of this interface which were previously all required to adjust the
   proper counter prior to calling mi_switch().  This also facilitates more
   switch and locking optimizations.
 - Change all callers of mi_switch() to pass the appropriate paramter and
   remove direct references to the process statistics.
2004-01-25 03:54:52 +00:00
Robert Watson
679365e7b9 Reduce gratuitous includes: don't include jail.h if it's not needed.
Presumably, at some point, you had to include jail.h if you included
proc.h, but that is no longer required.

Result of:	self injury involving adding something to struct prison
2004-01-21 17:10:47 +00:00
Jens Schweikhardt
85495c72ff s/Muliple/Multiple
Removed whitespace at EOL and EOF.
2004-01-10 18:34:01 +00:00
Peter Wemm
55cdddc0d8 Don't use NULL (pointer) when we mean 0 (integer) for the number of ticks
in msleep.
2003-12-23 02:28:42 +00:00
Marcel Moolenaar
ccb46feb8e Write the thread pointer (val) in the kse mailbox (loc) before we
set the new context in kse_switchin(2). This allows us to return
an error to the calling context when the suword() fails.
2003-12-10 01:59:23 +00:00
Marcel Moolenaar
702b2a179c Add kse_switchin(2). This syscall can be used by KSE implementations
to have the kernel switch to a new thread, instead of doing it in
userland. It is in fact needed on ia64 where syscall restarts do not
return to userland first. It's completely handled inside the kernel.
As such, any context created by the kernel as part of an upcall and
caused by some syscall needs to be restored by the kernel.
2003-12-07 19:34:29 +00:00
Alan Cox
bca62663ab - Giant is no longer required by vm_thread_new(). 2003-12-07 04:16:49 +00:00
John Baldwin
961a7b244d Add an implementation of turnstiles and change the sleep mutex code to use
turnstiles to implement blocking isntead of implementing a thread queue
directly.  These turnstiles are somewhat similar to those used in Solaris 7
as described in Solaris Internals but are also different.

Turnstiles do not come out of a fixed-sized pool.  Rather, each thread is
assigned a turnstile when it is created that it frees when it is destroyed.
When a thread blocks on a lock, it donates its turnstile to that lock to
serve as queue of blocked threads.  The queue associated with a given lock
is found by a lookup in a simple hash table.  The turnstile itself is
protected by a lock associated with its entry in the hash table.  This
means that sched_lock is no longer needed to contest on a mutex.  Instead,
sched_lock is only used when manipulating run queues or thread priorities.
Turnstiles also implement priority propagation inherently.

Currently turnstiles only support mutexes.  Eventually, however, turnstiles
may grow two queue's to support a non-sleepable reader/writer lock
implementation.  For more details, see the comments in sys/turnstile.h and
kern/subr_turnstile.c.

The two primary advantages from the turnstile code include: 1) the size
of struct mutex shrinks by four pointers as it no longer stores the
thread queue linkages directly, and 2) less contention on sched_lock in
SMP systems including the ability for multiple CPUs to contend on different
locks simultaneously (not that this last detail is necessarily that much of
a big win).  Note that 1) means that this commit is a kernel ABI breaker,
so don't mix old modules with a new kernel and vice versa.

Tested on:	i386 SMP, sparc64 SMP, alpha SMP
2003-11-11 22:07:29 +00:00
David Xu
ab2baa7254 Let SA process work under ULE scheduler, originally it would panic kernel.
Reviewed by: jeff
2003-08-26 11:33:15 +00:00
Sam Leffler
c06eb4e293 Change instances of callout_init that specify MPSAFE behaviour to
use CALLOUT_MPSAFE instead of "1" for the second parameter.  This
does not change the behaviour; it just makes the intent more clear.
2003-08-19 17:51:11 +00:00
Peter Grehan
eac100658a Update powerpc to use the (old thread,new thread) calling convention
for cpu_throw() and cpu_switch().
2003-08-14 03:56:24 +00:00
John Baldwin
e9911cf591 - Convert Alpha over to the new calling conventions for cpu_throw() and
cpu_switch() where both the old and new threads are passed in as
  arguments.  Only powerpc uses the old conventions now.
- Update comments in the Alpha swtch.s to reflect KSE changes.

Tested by:	obrien, marcel
2003-08-12 19:33:36 +00:00
Daniel Eischen
ab908f5935 Copyin the thread mailbox flags from the correct location
in the mailbox.
2003-08-08 20:23:10 +00:00
John Baldwin
8b149b5131 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
David Xu
d3b5e418bc Introduce a thread mailbox flag TMF_NOUPCALL. On some architectures other
than i386 or AMD64, TP register points to thread mailbox, and they can not
atomically clear km_curthread in kse mailbox, in this case, thread retrieves
its thread pointer from TP register and sets flag TMF_NOUPCALL in its thread
mailbox to indicate a critical region.
2003-08-05 12:00:55 +00:00
John Baldwin
139b7550d9 Set td_critnest to 1 when setting up a thread since it is a MI field with
MI values.  This ensures that td_critnest for a newly fork'd thread is
always valid.

Requested by:	bde (a long time ago)
2003-08-04 20:28:20 +00:00
David Xu
dd7da9aa28 o Refine kse_thr_interrupt to allow it to handle different commands.
o Remove TDF_NOSIGPOST.
o Add a member td_waitset to proc structure, it will be used for sigwait.

Tested by: deischen
2003-07-17 22:45:33 +00:00
David Xu
af161f2232 If initial thread is still a bound thread, don't change its signal mask. 2003-07-15 14:04:38 +00:00
David Xu
4b7d5d84ee Rename thread_siginfo to cpu_thread_siginfo 2003-07-15 04:26:26 +00:00
Mike Makonnen
8689793bfb kse_thr_interrupt should target the thread, specifically.
Requested by:	davidxu
2003-07-04 01:41:32 +00:00