Commit Graph

233 Commits

Author SHA1 Message Date
David Xu
d10183d94d This is initial version of POSIX priority mutex support, a new userland
mutex structure is added as following:
struct umutex {
        __lwpid_t       m_owner;
        uint32_t        m_flags;
        uint32_t        m_ceilings[2];
        uint32_t        m_spare[4];
};
The m_owner represents owner thread, it is a thread id, in non-contested
case, userland can simply use atomic_cmpset_int to lock the mutex, if the
mutex is contested, high order bit will be set, and userland should do locking
and unlocking via kernel syscall. Flag UMUTEX_PRIO_INHERIT represents
pthread's PTHREAD_PRIO_INHERIT mutex, which when contention happens, kernel
should do priority propagating. Flag UMUTEX_PRIO_PROTECT indicates it is
pthread's PTHREAD_PRIO_PROTECT mutex, userland should initialize m_owner
to contested state UMUTEX_CONTESTED, then atomic_cmpset_int will be failure
and kernel syscall should be invoked to do locking, this becauses
for such a mutex, kernel should always boost the thread's priority before
it can lock the mutex, m_ceilings is used by PTHREAD_PRIO_PROTECT mutex,
the first element is used to boost thread's priority when it locked the mutex,
second element is used when the mutex is unlocked, the PTHREAD_PRIO_PROTECT
mutex's link list is kept in userland, the m_ceiling[1] is managed by thread
library so kernel needn't allocate memory to keep the link list, when such
a mutex is unlocked, kernel reset m_owner to UMUTEX_CONTESTED.
Flag USYNC_PROCESS_SHARED indicate if the synchronization object is process
shared, if the flag is not set, it saves a vm_map_lookup() call.

The umtx chain is still used as a sleep queue, when a thread is blocked on
PTHREAD_PRIO_INHERIT mutex, a umtx_pi is allocated to support priority
propagating, it is dynamically allocated and reference count is used,
it is not optimized but works well in my tests, while the umtx chain has
its own locking protocol, the priority propagating protocol are all protected
by sched_lock because priority propagating function is called with sched_lock
held from scheduler.

No visible performance degradation is found which these changes. Some parameter
names in _umtx_op syscall are renamed.
2006-08-28 04:24:51 +00:00
Maxim Konovalov
e2668f5563 o Fix typo in the comment.
PR:		kern/99632
Submitted by:	clsung
2006-06-30 08:10:55 +00:00
David Xu
bf1a322061 Rethink it a bit, if there is a STOP flag, don't bother to resume other
threads.
2006-03-21 10:05:15 +00:00
David Xu
568b4ebbcc Because JOB control has higher priority than single threading in
thread_suspend_check(), call thread_stopped() to report SIGCHLD
if there is JOB control in progress.
2006-03-21 08:41:15 +00:00
David Xu
e170bfda56 1. Count last time slice, this intends to fix
"calcru: runtime went backwards" bug for threaded process.
2. Add comment about possible logical problem with scheduler.

MFC after: 3 days
2006-03-14 04:00:21 +00:00
David Xu
28e989e9ca Remove unused code. 2006-03-13 10:37:25 +00:00
David Xu
94f0972bec Fix a long standing race between sleep queue and thread
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.

Reviewed by:	jhb
MFC after:	1 week
2006-02-15 23:52:01 +00:00
David Xu
d8267df729 In order to speed up process suspension on MP machine, send IPI to
remote CPU. While here, abstract thread suspension code into a function
called sig_suspend_threads, the function is called when a process received
a STOP signal.
2006-02-13 03:16:55 +00:00
Robert Watson
89964dd284 When exiting a thread, submit any pending record. Today, we don't
audit thread exit, but should that happen, this will prevent
unhappiness, as the thread exit system call will never return, and
hence not commit the record.

Pointed out by/with:	cognet
Obtained from:		TrustedBSD Project
2006-02-06 01:51:08 +00:00
Robert Watson
6e8525ce84 When GC'ing a thread, assert that it has no active audit record.
This should not happen, but with this assert, brueffer and I would
not have spent 45 minutes trying to figure out why he wasn't
seeing audit records with the audit version in CVS.

Obtained from:	TrustedBSD Project
2006-02-05 21:06:09 +00:00
Robert Watson
911b84b08d Add new fields to process-related data structures:
- td_ar to struct thread, which holds the in-progress audit record during
  a system call.

- p_au to struct proc, which holds per-process audit state, such as the
  audit identifier, audit terminal, and process audit masks.

In the earlier implementation, td_ar was added to the zero'd section of
struct thread.  In order to facilitate merging to RELENG_6, it has been
moved to the end of the data structure, requiring explicit
initalization in the thread constructor.

Much help from:	wsalamon
Obtained from:	TrustedBSD Project
2006-02-02 00:37:05 +00:00
David Xu
5c4745177f Now SIGCHLD is always queued. 2005-12-09 02:27:55 +00:00
David Xu
b2f92ef96b Last step to make mq_notify conform to POSIX standard, If the process
has successfully attached a notification request to the message queue
via a queue descriptor, file closing should remove the attachment.
2005-11-30 05:12:03 +00:00
David Xu
ebceaf6dc7 Add support for queueing SIGCHLD same as other UNIX systems did.
For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.

The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.

There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.

Tested on: i386 (SMP and UP)
2005-11-08 09:09:26 +00:00
David Xu
44355392b4 Add thread_find() function to search a thread by lwpid. 2005-11-03 01:34:08 +00:00
David Xu
60354683d9 Make p_itimers as a pointer, so file sys/proc.h does not need to include
sys/timers.h.
2005-10-23 12:19:08 +00:00
David Xu
86857b368d Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC
clock are supported. I have plan to merge XSI timer ITIMER_REAL and other
two CPU timers into the new code, current three slots are available for
the XSI timers.
The SIGEV_THREAD notification type is not supported yet because our
sigevent struct lacks of two member fields:
sigev_notify_function
sigev_notify_attributes
I have found the sigevent is used in AIO, so I won't add the two members
unless the AIO code is adjusted.
2005-10-23 04:22:56 +00:00
David Xu
9104847f21 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
David Xu
763a429571 Fox a LOR of sleep and sched_lock by using a timeout wait
when process reaches maximum number of threads.

MFC after: 3 days
2005-09-30 06:09:41 +00:00
David Xu
39c4e4c202 Remove sleep queue hack, it is no longer needed with current sleep queue.
Actually, it causes process to hang when it is being debugged.

PR: gnu/77818
2005-05-27 04:27:22 +00:00
David Xu
21fc316430 Change cpu_set_kse_upcall to more generic style, so we can reuse it
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.
2005-04-23 02:32:32 +00:00
Julian Elischer
b75b03116f Fix code freeing wrong cred pointer.
Submitted by:	das
Noticed by: Coverity tool
MFC after:	3 days

Note: usually the two pointers point to the same
thing but it was still a bug.
2005-03-21 22:55:38 +00:00
Poul-Henning Kamp
773eff9d97 Sleeping is not allowed in uma->fini 2005-03-19 08:22:13 +00:00
Poul-Henning Kamp
1ea7a6f806 Use subr_unit to allocate thread ID's with.
Tested by:	davidxu
2005-03-18 12:34:14 +00:00
David Xu
bc8e6d817d Allocate umtx_q from heap instead of stack, this avoids
page fault panic in kernel under heavy swapping.
2005-03-05 09:15:03 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Jeff Roberson
7842f65e7f - Garbage collect several unused members of struct kse and struce ksegrp.
As best as I can tell, some of these were never used.
2004-12-14 10:53:55 +00:00
David Schultz
6db36923ad Remove local definitions of RANGEOF() and use __rangeof() instead.
Also remove a few bogus casts.
2004-11-20 23:00:59 +00:00
David Xu
8acf605790 Respect TDF_SINTR, don't suspend uninterruptible thread. 2004-11-05 22:40:33 +00:00
David Xu
64895117a0 Backout previous commit, the P_STOPPED_BOUNDARY flag was already
cleared at the begin of thread_single() when needed.
2004-11-05 22:31:20 +00:00
David Xu
cefe021b6c Don't forget to turn off P_SINGLE_BOUNDARY for thread_single(SINGLE_EXIT),
otherwise a threaded process which calls execv() will hang in kernel and
may can not be killed!
2004-11-04 22:13:16 +00:00
John Baldwin
ebcfea8764 Whitespace fix. 2004-10-12 19:36:00 +00:00
David Xu
906ac69d08 In original kern_execve() code, at the start of the function, it forces
all other threads to suicide, problem is execve() could be failed, and
a failed execve() would change threaded process to unthreaded, this side
effect is unexpected.
The new code introduces a new single threading mode SINGLE_BOUNDARY, in
the mode, all threads should suspend themself at user boundary except
the singler. we can not use SINGLE_NO_EXIT because we want to start from
a clean state if execve() is successful, suspending other threads at unknown
point and later resuming them from there and forcing them to exit at user
boundary may cause the process to start from a dirty state. If execve() is
successful, current thread upgrades to SINGLE_EXIT mode and forces other
threads to suicide at user boundary, otherwise, other threads will be resumed
and their interrupted syscall will be restarted.

Reviewed by: julian
2004-10-06 00:40:41 +00:00
Julian Elischer
fcb7c67b7b Slight cleanup in the single threading code.
MFC after:	4 days
2004-10-05 22:05:25 +00:00
Julian Elischer
e5bedcef92 Break out to a separate function, the code to revert a multithreaded
process back to officially being a non-threaded program.

MFC after:	4 days
2004-10-05 20:39:26 +00:00
Julian Elischer
a9b5dc7d6d Always strt out with an initilalised ksegrp structure.
MFC after:	3 days
2004-10-03 20:06:11 +00:00
Julian Elischer
2179a22cc7 Use the universal 'threaded process' flag rather than the
specific tests for different threading systems.

MFC after:	1 week
2004-09-25 00:53:46 +00:00
John Baldwin
7eaec467d8 Various small style fixes. 2004-09-22 15:24:33 +00:00
Julian Elischer
915996978d Try harder to get back to being a non threaded process.
Submitted by:	DavidXu
MFC after:	3 days
2004-09-15 18:39:09 +00:00
Julian Elischer
ed062c8d66 Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is  now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by:	scottl, peter
MFC after:	1 week
2004-09-05 02:09:54 +00:00
David Xu
45a4bfa17d Only test return_instead if P_SINGLE_EXIT is set, otherwise a fork()
syscall can interrupt other thread's syscall in sleepq_catch_signals().
Current, all callers know thread_suspend_check may suspend thread
itself, so we need't to check return_instead for normal suspension
flags (no P_SINGLE_EXIT set).

Tested by: deischen
Reported by: Maarten L. Hekkelman <m.hekkelman@cmbi.kun.nl>
2004-08-29 23:10:02 +00:00
John Baldwin
007ddf7e7a Now that the return value semantics of cv's for multithreaded processes
have been unified with that of msleep(9), further refine the sleepq
interface and consolidate some duplicated code:
- Move the pre-sleep checks for theaded processes into a
  thread_sleep_check() function in kern_thread.c.
- Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c.
  Specifically, if a thread is awakened by something other than a signal
  while checking for signals before going to sleep, clear TDF_SINTR in
  sleepq_catch_signals().  This removes a sched_lock lock/unlock combo in
  that edge case during an interruptible sleep.  Also, fix
  sleepq_check_signals() to properly handle the condition if TDF_SINTR is
  clear rather than requiring the callers of the sleepq API to notice
  this edge case and call a non-_sig variant of sleepq_wait().
- Clarify the flags arguments to sleepq_add(), sleepq_signal() and
  sleepq_broadcast() by creating an explicit submask for sleepq types.
  Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of
  0.  Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and
  move the setting of TDF_SINTR to sleepq_add() if this flag is set rather
  than sleepq_catch_signals().  Note that it is the caller's responsibility
  to ensure that sleepq_catch_signals() is called if and only if this flag
  is passed to the preceeding sleepq_add().  Note that this also removes a
  sched_lock lock/unlock pair from sleepq_catch_signals().  It also ensures
  that for an interruptible sleep, TDF_SINTR is always set when
  TD_ON_SLEEPQ() is true.
2004-08-19 11:31:42 +00:00
Julian Elischer
f0017f3321 Whitespace nit. 2004-08-14 07:21:20 +00:00
Julian Elischer
732d95288a Increase the amount of data exported by KTR in the KTR_RUNQ setting.
This extra data is needed to really follow what is going on in the
threaded case.
2004-08-09 18:21:12 +00:00
Robert Watson
cc701b73b8 In thread_exit(), include more information about the thread/process
context in the KTR trace record.  In particular, include the same
information as passed for mi_switch() and fork_exit() KTR trace
records.
2004-08-06 22:06:14 +00:00
Brian Feldman
b23f72e98a * Add a "how" argument to uma_zone constructors and initialization functions
so that they know whether the allocation is supposed to be able to sleep
  or not.
* Allow uma_zone constructors and initialation functions to return either
  success or error.  Almost all of the ones in the tree currently return
  success unconditionally, but mbuf is a notable exception: the packet
  zone constructor wants to be able to fail if it cannot suballocate an
  mbuf cluster, and the mbuf allocators want to be able to fail in general
  in a MAC kernel if the MAC mbuf initializer fails.  This fixes the
  panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
  the default.

Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.
2004-08-02 00:18:36 +00:00
Julian Elischer
55d44f79ea When calling scheduler entrypoints for creating new threads and processes,
specify "us" as the thread not the process/ksegrp/kse.
You can always find the others from the thread but the converse is not true.
Theorotically this would lead to runtime being allocated to the wrong
entity in some cases though it is not clear how often this actually happenned.
(would only affect threaded processes and would probably be pretty benign,
but it WAS a bug..)

Reviewed by: peter
2004-07-18 23:36:13 +00:00
John Baldwin
d3373e371b Whitespace fix. 2004-07-16 21:01:52 +00:00
David Xu
cbf4e354ec Add code to support debugging threaded process.
1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
   thread current user thread is running on. Add tm_dflags into
   kse_thr_mailbox, the flags is written by debugger, it tells
   UTS and kernel what should be done when the process is being
   debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.

   TMDF_SSTEP is used to tell kernel to turn on single stepping,
   or turn off if it is not set.

   TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
   whenever possible, to UTS, it means do not run the user thread
   until debugger clears it, this behaviour is necessary because
   gdb wants to resume only one thread when the thread's pc is
   at a breakpoint, and thread needs to go forward, in order to
   avoid other threads sneak pass the breakpoints, it needs to remove
   breakpoint, only wants one thread to go. Also, add km_lwp to
   kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
   switch time when process is not being debugged, so when process
   is attached, debugger can map kernel thread to user thread.

2. Add p_xthread to proc strcuture and td_xsig to thread structure.
   p_xthread is used by a thread when it wants to report event
   to debugger, every thread can set the pointer, especially, when
   it is used in ptracestop, it is the last thread reporting event
   will win the race. Every thread has a td_xsig to exchange signal
   with debugger, thread uses TDF_XSIG flag to indicate it is reporting
   signal to debugger, if the flag is not cleared, thread will keep
   retrying until it is cleared by debugger, p_xthread may be
   used by debugger to indicate CURRENT thread. The p_xstat is still
   in proc structure to keep wait() to work, in future, we may
   just use td_xsig.

3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
   a thread. When process stops, debugger can set the flag for
   thread, thread will check the flag in thread_suspend_check,
   enters a loop, unless it is cleared by debugger, process is
   detached or process is existing. The flag is also checked in
   ptracestop, so debugger can temporarily suspend a thread even
   if the thread wants to exchange signal.

4. Current, in ptrace, we always resume all threads, but if a thread
   has already a TDF_DBSUSPEND flag set by debugger, it won't run.

Encouraged by: marcel, julian, deischen
2004-07-13 07:20:10 +00:00
John Baldwin
bf0acc273a - Change mi_switch() and sched_switch() to accept an optional thread to
switch to.  If a non-NULL thread pointer is passed in, then the CPU will
  switch to that thread directly rather than calling choosethread() to pick
  a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
  TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
  requiring all callers of mi_switch() to know to do this if they can be
  called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
  the middle of the function prototypes and up above into their own
  section.
2004-07-02 19:09:50 +00:00