Commit Graph

451 Commits

Author SHA1 Message Date
obrien
ee337f2c34 Be more exact with sigaction SA_SIGINFO handling.
Reviewed by:	marcel
2007-12-18 20:39:13 +00:00
kib
9ae733819b Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
csjp
a16bb8381d Implement AUE_CORE, which adds process core dump support into the kernel.
This change introduces audit_proc_coredump() which is called by coredump(9)
to create an audit record for the coredump event.  When a process
dumps a core, it could be security relevant.  It could be an indicator that
a stack within the process has been overflowed with an incorrectly constructed
malicious payload or a number of other events.

The record that is generated looks like this:

header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec
argument,0,0xb,signal
path,/usr/home/csjp/test.core
subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2
return,success,1
trailer,111

- We allocate a completely new record to make sure we arent clobbering
  the audit data associated with the syscall that produced the core
  (assuming the core is being generated in response to SIGABRT  and not
  an invalid memory access).
- Shuffle around expand_name() so we can use the coredump name at the very
  beginning of the coredump call.  Make sure we free the storage referenced
  by "name" if we need to bail out early.
- Audit both successful and failed coredump creation efforts

Obtained from:	TrustedBSD Project
Reviewed by:	rwatson
MFC after:	1 month
2007-10-26 01:23:07 +00:00
csjp
76f4445c43 Move where we audit the PID argument such that we unconditionally
audit it at the beginning of the syscall.  This fixes a problem
where the user supplies an invalid process ID which is > 0 which
results in the PID argument not being audited.

Obtained from:	TrustedBSD Project
MFC after:	1 week
2007-10-24 00:14:19 +00:00
jeff
1961c471a3 - Calling sched_nice() in tdsigwakeup() is no longer required by ULE and
actually causes LORs and other panics.

Reported by:	mlaier
Approved by:	re
2007-07-19 08:49:16 +00:00
jeff
e93b04c286 - Add a missing PROC_SUNLOCK() in tdsignal() 2007-06-11 23:27:03 +00:00
mjacob
8578bcf993 Initialized ets to zero. This is arguably a gcc bug in that ets is always
set to rts when timeout is non-NULL and then timevalid is set and ets is
only checked later when timervalid is set.
2007-06-10 01:43:11 +00:00
jeff
8931208c40 Commit 4/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.
 - Move some common code into thread_suspend_switch() to handle the
   mechanics of suspending a thread.  The locking here is incredibly
   convoluted and should be simplified.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:52:24 +00:00
jeff
a7a8bac81f - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
kib
f13486a222 Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by:	jhb
Reviewed by:	daichi (unionfs)
Approved by:	re (kensmith)
2007-05-31 11:51:53 +00:00
rwatson
e2456913e5 Comment that tdsignal() may be entered from the debugger. 2007-05-23 17:27:42 +00:00
jhb
a84f74bb36 Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.
2007-03-21 21:20:51 +00:00
rwatson
69938bd196 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
rwatson
300d4098cf Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
delphij
73c9958792 Give which signal caller has attempted to deliver when panicking. 2007-02-09 17:48:28 +00:00
delphij
2e20bff54b Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 14:58:53 +00:00
davidxu
70875d94ab break loop early if we know that there are at least two signals. 2006-12-25 03:00:15 +00:00
trhodes
58cca8458a Merge posix4/* into normal kernel hierarchy.
Reviewed by:	glanced at by jhb
Approved by:	silence on -arch@ and -standards@
2006-11-11 16:26:58 +00:00
jb
f82c799735 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
davidxu
5bbfc34004 Use macro TAILQ_FOREACH_SAFE instead of expanding it. 2006-10-22 00:09:41 +00:00
jhb
4cc0b2f46e Remove the check that prevented signals from being delivered to exiting
processes.  It was originally added back when support for Linux threads
(and thus shared sigacts objects) was added, but no one knows why.  My
guess is that at some point during the Linux threads patches, the sigacts
object was torn down during exit1(), so this check was added to prevent
a panic for that race.  However, the stuff that was actually committed to
the tree doesn't teardown sigacts until wait() making the above race moot.
Re-allowing signals here lets one interrupt a NFS request during process
teardown (such as closing descriptors) on an interruptible mount.

Requested by:	kib (long time ago)
MFC after:	1 week
2006-10-20 16:19:21 +00:00
davidxu
0fa66af83d Move some declaration of 32-bit signal structures into file
freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.
2006-10-05 01:56:11 +00:00
jhb
0f921e0992 Remove various bits of conditional Alpha code and fixup a few comments. 2006-05-12 05:04:46 +00:00
tegge
aa5d948d47 Call vn_finished_write() before calling the coredump handler which will
indirectly call vn_start_write() as necessary for each write.
2006-05-07 22:50:22 +00:00
ps
76acdb6332 Don't try to kill embryonic processes in killpg1(). This prevents
a race condition between fork() and kill(pid,sig) with pid < 0 that
can cause a kernel panic.

Submitted by:	up
MFC after:	3 weeks
2006-04-21 19:26:21 +00:00
jhb
44f0d9f519 - Conditionalize Giant around VFS operations for ALQ, ktrace, and
generating a coredump as the result of a signal.
- Fix a bug where we could leak a Giant lock if vn_start_write() failed
  in coredump().

Reported by:	jmg (2)
2006-03-28 21:30:22 +00:00
davidxu
a84244a660 Remove _STOPEVENT call, it is already called in issignal, simplify
code for SIGKILL signal.
2006-03-09 08:31:51 +00:00
davidxu
f9daf29b84 Add signal set sq_kill to sigqueue structure, the member saves all
signals sent by kill() syscall, without this, a signal sent by
sigqueue() can cause a signal sent by kill() to be lost.
2006-03-02 14:06:40 +00:00
davidxu
890309fa66 1. Refine kern_sigtimedwait() to remove redundant code.
2. Fix a bug, if thread got a SIGKILL signal, call sigexit() to kill
   its process.

MFC after: 3 days
2006-02-23 09:24:19 +00:00
davidxu
b5b90d2902 Code cleanup, simply compare with curproc. 2006-02-23 05:50:55 +00:00
davidxu
f1ce5c8660 Fix a long standing race between sleep queue and thread
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.

Reviewed by:	jhb
MFC after:	1 week
2006-02-15 23:52:01 +00:00
wsalamon
5be1898054 Audit the arguments to the kill(2) and killpg(2) system calls.
Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)
2006-02-14 01:17:03 +00:00
davidxu
6afcb6595b In order to speed up process suspension on MP machine, send IPI to
remote CPU. While here, abstract thread suspension code into a function
called sig_suspend_threads, the function is called when a process received
a STOP signal.
2006-02-13 03:16:55 +00:00
davidxu
1f7f51d43d Create childproc_jobstate function to report job control state, this
also fixes a bug in childproc_continued which ignored PS_NOCLDSTOP.
2006-02-04 14:10:57 +00:00
davidxu
4b072f53d2 Avoid kernel panic when attaching a process which may not be stopped
by debugger, e.g process is dumping core. Only access p_xthread if
P_STOPPED_TRACE is set, this means thread is ready to exchange signal
with debugger, print a warning if P_STOPPED_TRACE is not set due to
some bugs in other code, if there is.

The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and
is slightly adjusted.
2005-12-24 02:59:29 +00:00
davidxu
52e3a6cedd Add a sysctl to force a process to sigexit if a trap signal is
being hold by current thread or ignored by current process,
otherwise, it is very possible the thread will enter an infinite loop
and lead to an administrator's nightmare.
2005-12-09 08:29:29 +00:00
davidxu
2ee31f310b Cleanup sigqueue sysctl. 2005-12-09 02:26:44 +00:00
davidxu
2d6ec412df Fix a lock leak in childproc_continued(). 2005-12-06 05:30:13 +00:00
davidxu
9208ca9d98 set signal queue values for sysconf(). 2005-12-01 00:25:50 +00:00
davidxu
d5fcf7dfa0 Make sure only remove one signal by debugger. 2005-11-12 04:22:16 +00:00
davidxu
f9da852761 WIFxxx macros requires an int type but p_xstat is short, convert it
to int before using the macros.

Bug reported by : Pyun YongHyeon pyunyh at gmail dot com
2005-11-09 07:58:16 +00:00
davidxu
37bb483679 Add support for queueing SIGCHLD same as other UNIX systems did.
For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.

The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.

There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.

Tested on: i386 (SMP and UP)
2005-11-08 09:09:26 +00:00
davidxu
ae161ac239 Fix name compatible problem with POSIX standard. the sigval_ptr and
sigval_int really should be sival_ptr and sival_int.
Also sigev_notify_function accepts a union sigval value but not a
pointer.
2005-11-04 09:41:00 +00:00
davidxu
301b115d60 Cleanup some signal interfaces. Now the tdsignal function accepts
both proc pointer and thread pointer, if thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends signal
to given thread.
Add utility function psignal_event to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.
2005-11-03 04:49:16 +00:00
davidxu
728466f712 Let itimer store itimerspec instead of itimerval, so I don't have to
convert to or from timeval frequently.

Introduce function itimer_accept() to ack a timer signal in signal
acceptance code, this allows us to return more fresh overrun counter
than at signal generating time. while POSIX says:
"the value returned by timer_getoverrun() shall apply to the most
recent expiration signal delivery or acceptance for the timer,.."
I prefer returning it at acceptance time.

Introduce SIGEV_THREAD_ID notification mode, it is used by thread
libary to request kernel to deliver signal to a specified thread,
and in turn, the thread library may use the mechanism to implement
SIGEV_THREAD which is required by POSIX.

Timer signal is managed by timer code, so it can not fail even if
signal queue is full filled by sigqueue syscall.
2005-10-30 02:56:08 +00:00
davidxu
79343f4d09 1. Make ksiginfo_alloc and ksiginfo_free public.
2. Introduce flags KSI_EXT and KSI_INS. The flag KSI_EXT allows a ksiginfo
   to be managed by outside code, the KSI_INS indicates sigqueue_add should
   directly insert passed ksiginfo into queue other than copy it.
2005-10-23 04:12:26 +00:00
davidxu
3fbdb3c215 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
davidxu
c2895a92bd Fix a bug relavant to debugging, a masked signal unexpectedly interrupts
a sleeping thread when process is being debugged.

PR: GNU/77818
Tested by: Sean C. Farley <sean-freebsd at farley org>
2005-06-06 05:13:10 +00:00
davidxu
913d50be4f Oops, forgot to update this file.
Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after    : 4 days
2005-04-19 08:11:28 +00:00
das
7a8e8974c6 Suspend all other threads in the process while generating a core dump.
The main reason for doing this is that the ELF dump handler expects
the thread list to be fixed while the dump header is generated, so an
upcall that occurs at the wrong time can lead to buffer overruns and
other Bad Things.

Another solution would be to grab sched_lock in the ELF dump handler,
but we might as well single-thread, since the process is about to die.
Furthermore, I think this should ensure that the register sets in the
core file are sequentially consistent.
2005-04-10 02:31:24 +00:00
davidxu
ab7386cc5e The td_waitset is pointing to a stack address when thread is waiting
for a signal, because kernel stack is swappable, this causes page fault
in kernel under heavy swapping case. Fix this bug by eliminating unneeded
code.
2005-03-04 22:46:31 +00:00
davidxu
5bf5079e95 In kern_sigtimedwait, remove waitset bits for td_sigmask before
sleeping, so in do_tdsignal, we no longer need to test td_waitset.
now td_waitset is only used to give a thread higher priority when
delivering signal to multithreads process.
This also fixes a bug:
when a thread in sigwait states was suspended and later resumed
by SIGCONT, it can no longer receive signals belong to waitset.
2005-03-02 13:43:51 +00:00
davidxu
699536634e Don't restart a timeout wait in kern_sigtimedwait, also allow it
to wait longer than a single integer can represent.
2005-02-19 06:05:49 +00:00
sobomax
52ae2ac0b9 Backout previous change (disabling of security checks for signals delivered
in emulation layers), since it appears to be too broad.

Requested by:   rwatson
2005-02-13 17:37:20 +00:00
sobomax
1d558007d0 Split out kill(2) syscall service routine into user-level and kernel part, the
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.

Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.

As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.

Requested (sorta) by:	rwatson
MFC after:	2 weeks
2005-02-13 16:42:08 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
jeff
6ec1292f83 - If delivering a signal will result in killing a process that has a
nice value above 0, set it to 0 so that it may proceed with haste.
   This is especially important on ULE, where adjusting the priority
   does not guarantee that a thread will be granted a greater time slice.
2004-12-13 16:45:57 +00:00
imp
4e6fd1a485 Fix an off by one error. MAXPATHLEN already has +1. 2004-11-15 20:51:32 +00:00
alfred
a2b1b554d2 Allow kill -9 to kill processes stuck in procfs STOPEVENTs. 2004-10-30 02:56:22 +00:00
alfred
29ecb2fede Backout 1.291.
re doesn't seem to think this fixes:
  Desired features for 5.3-RELEASE "More truss problems"
2004-10-29 08:24:41 +00:00
davidxu
aa22b44625 Use scheduler api to adjust thread priority. 2004-10-05 09:10:30 +00:00
davidxu
33faeb8a73 Don't bother to turn off other P_STOPPED bits for SIGKILL, doing
so would cause kernel to produce an unkillable process in some cases,
especially, P_STOPPED_SINGLE has a singling thread, turning off the
bit would mess the state.
2004-10-03 13:23:49 +00:00
alfred
0efc91b067 Clear a process's procfs trace points upon delivery of SIGKILL.
MT5 candidate. (Desired features for 5.3-RELEASE "More truss problems")
2004-10-01 14:15:20 +00:00
julian
2782d4b3fc Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after:	3 days
2004-08-31 07:34:54 +00:00
jmg
bc1805c6e8 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
jmg
2c2b6c4ef7 add option to automaticly mark core dumps with the nodump flag
PR:		57065
Submitted by:	Walter C. Pelissero
2004-08-09 05:46:46 +00:00
pjd
7a05d0a3cd Don't skip permission checks when sending signals to zombie processes.
Pointed out by:	bde
Reviewed by:	rwatson
2004-08-03 15:39:23 +00:00
pjd
809d561dd5 Syscall kill(2) called for a zombie process should return 0.
Obtained from:	Darwin
2004-07-29 20:38:19 +00:00
jhb
4585fb3d5c Improve readability a bit by changing some code at the end of a function
that did:

	if (foo)
		return
	else
		blah

to just do the simpler

	if (!foo)
		blah

instead.
2004-07-16 21:00:50 +00:00
davidxu
1920ad199e Add code to support debugging threaded process.
1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
   thread current user thread is running on. Add tm_dflags into
   kse_thr_mailbox, the flags is written by debugger, it tells
   UTS and kernel what should be done when the process is being
   debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.

   TMDF_SSTEP is used to tell kernel to turn on single stepping,
   or turn off if it is not set.

   TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
   whenever possible, to UTS, it means do not run the user thread
   until debugger clears it, this behaviour is necessary because
   gdb wants to resume only one thread when the thread's pc is
   at a breakpoint, and thread needs to go forward, in order to
   avoid other threads sneak pass the breakpoints, it needs to remove
   breakpoint, only wants one thread to go. Also, add km_lwp to
   kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
   switch time when process is not being debugged, so when process
   is attached, debugger can map kernel thread to user thread.

2. Add p_xthread to proc strcuture and td_xsig to thread structure.
   p_xthread is used by a thread when it wants to report event
   to debugger, every thread can set the pointer, especially, when
   it is used in ptracestop, it is the last thread reporting event
   will win the race. Every thread has a td_xsig to exchange signal
   with debugger, thread uses TDF_XSIG flag to indicate it is reporting
   signal to debugger, if the flag is not cleared, thread will keep
   retrying until it is cleared by debugger, p_xthread may be
   used by debugger to indicate CURRENT thread. The p_xstat is still
   in proc structure to keep wait() to work, in future, we may
   just use td_xsig.

3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
   a thread. When process stops, debugger can set the flag for
   thread, thread will check the flag in thread_suspend_check,
   enters a loop, unless it is cleared by debugger, process is
   detached or process is existing. The flag is also checked in
   ptracestop, so debugger can temporarily suspend a thread even
   if the thread wants to exchange signal.

4. Current, in ptrace, we always resume all threads, but if a thread
   has already a TDF_DBSUSPEND flag set by debugger, it won't run.

Encouraged by: marcel, julian, deischen
2004-07-13 07:20:10 +00:00
marcel
57e7de678f Implement the PT_LWPINFO request. This request can be used by the
tracing process to obtain information about the LWP that caused the
traced process to stop. Debuggers can use this information to select
the thread currently running on the LWP as the current thread.

The request has been made compatible with NetBSD for as much as
possible. This implementation differs from NetBSD in the following
ways:
1.  The data argument is allowed to be smaller than the size of the
    ptrace_lwpinfo structure known to the kernel, but not 0. This
    is opposite to what NetBSD allows. The reason for this is that
    we can extend the structure without affecting older binaries.
2.  On NetBSD the tracing process is to set the pl_lwpid field to
    the Id of the LWP it wants information of. We don't do that.
    Our ptrace interface allows passing the LWP Id instead of the
    PID. The tracing process is to set the PID to the LWP Id it
    wants information of.
3.  When the PID is actually the PID of the tracing process, this
    request returns the information about the LWP that caused the
    process to stop. This was the whole purpose of the request in
    the first place.

When the traced process has exited, this request will return the
LWP Id 0, indicating that the process state is not the result of
an event specific to a LWP.
2004-07-12 05:07:50 +00:00
jhb
1b16b181d1 - Change mi_switch() and sched_switch() to accept an optional thread to
switch to.  If a non-NULL thread pointer is passed in, then the CPU will
  switch to that thread directly rather than calling choosethread() to pick
  a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
  TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
  requiring all callers of mi_switch() to know to do this if they can be
  called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
  the middle of the function prototypes and up above into their own
  section.
2004-07-02 19:09:50 +00:00
phk
86602fc06c Deorbit COMPAT_SUNOS.
We inherited this from the sparc32 port of BSD4.4-Lite1.  We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.
2004-06-11 11:16:26 +00:00
davidxu
90554db906 According to SUSv3, sigwait is different with sigwaitinfo, sigwait
returns error code in return value, not in errno.
2004-06-07 13:35:02 +00:00
tjr
80d36400ed Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by:	davidxu
2004-06-02 07:52:36 +00:00
bde
9ec48f4cab Fixed some style bugs in tdsigwakeup(). 2004-05-21 10:02:24 +00:00
jhb
91621895aa In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if
a thread is on a sleep queue and should have it's sleep aborted.

Reported by:	Thierry Herbelot thierry at herbelot dot com
2004-05-20 20:17:28 +00:00
cperciva
7eb8531271 stop() no longer needs sched_lock held; in fact, holding sched_lock causes
a LOR against sleepq.  Fix the comment, and fix ptracestop() to pick up
sched_lock after stop() rather than before.

Reported by:	Scott Sipe <cscotts@mindspring.com>
Reviewed by:	rwatson, jhb
2004-04-12 15:56:05 +00:00
imp
74cf37bd00 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
peter
7957bc47f6 Shorten some XXXKSE commentry 2004-03-29 22:46:54 +00:00
jhb
4e1bd1e348 - Push down Giant in exit() and wait().
- Push Giant down a bit in coredump() and call coredump() with the proc
  lock already held rather than unlocking it only to turn around and
  relock it.

Requested by:	peter
2004-03-05 22:39:53 +00:00
des
e6b61c95ad Use different dummy wait channels to avoid panic in msleep().
Reviewed by:	jhb
2004-03-03 23:03:18 +00:00
jhb
d25301c858 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep qeueus in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleep's no longer inherently bump the priority.  Instead, this is soley
  a propery of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
jhb
279b2b8278 Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
rwatson
e55550188e Assert process lock in ptracestop(), since we're going to rely
on it, and later unlock it.
2004-01-29 00:58:21 +00:00
kan
a62ca42084 Move the part of the comment which applies to osigsuspend where
it belongs. The current sigsuspend syscall does expect a pointer
to the mask as argument.

Submitted by:	Igor Sysoev <is at rambler-co dot ru>
2004-01-28 06:06:04 +00:00
jeff
c85cdc3d0f - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL.  Assert that one of these is set in mi_switch() and propery
   adjust the rusage statistics.  This is to simplify the large number of
   users of this interface which were previously all required to adjust the
   proper counter prior to calling mi_switch().  This also facilitates more
   switch and locking optimizations.
 - Change all callers of mi_switch() to pass the appropriate paramter and
   remove direct references to the process statistics.
2004-01-25 03:54:52 +00:00
rwatson
9190052ce9 When not creating a core dump due to resource limits specifying
a maximum dump size of 0, return a size-related error, rather
than returning success.  Otherwise, waitpid() will incorrectly
return a status indicating that a core dump was created.  Note
that the specific error doesn't actually matter, since it's lost.

MFC after:	2 weeks
PR:		60367
Submitted by:	Valentin Nechayev <netch@netch.kiev.ua>
2004-01-11 02:28:06 +00:00
rwatson
befa7a41a2 Drop the sigacts mutex around calls to stopevent() to avoid sleeping
holding the mutex.  Because the sigacts pointer can't change while
the process is "live" (proc locking (x)), we know our pointer is still
valid.

In communication with:	truckman
Reviewed by:		jhb
2004-01-08 22:44:54 +00:00
davidxu
f39653dda8 Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr
2004-01-03 02:02:26 +00:00
davidxu
69ce33ca6e Lock and unlock sched_lock when walking through thread list, current we
insert kse upcall thread into thread list at mi_switch time, process lock
is not enough.
2003-12-07 23:47:15 +00:00
davidxu
91407731e5 Try to fetch thread mailbox address in page fault trap, so when thread
blocks in page fault hanlder, and upcall thread can be scheduled. It is
useful if process is doing lots of mmap based I/O.
2003-10-30 02:55:43 +00:00
rwatson
b43632b153 Check (locked) before performing an advisory unlock following a failure
of vn_start_write().  Otherwise, we may inconsistently attempt to release
the advisory lock.

Pointed out by:	teggej
2003-10-25 16:43:50 +00:00
rwatson
e4935eb9ae When generate a core dump, use advisory locking in an advisory way:
if we do acquire an advisory lock, great!  We'll release it later.
However, if we fail to acquire a lock, we perform the coredump
anyway.  This problem became particularly visible with NFS after
the introduction of rpc.lockd: if the lock manager isn't running,
then locking calls will fail, aborting the core dump (resulting in
a zero-byte dump file).

Reported by:	Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>
2003-10-25 16:14:09 +00:00
davidxu
b24bb74b9e Don't clear signal mask in execsig(). RELENG_4 does not clear it and POSIX
asks to inherit signal mask for execv.
2003-10-13 14:03:08 +00:00
robert
58f93096d9 Move some tracing related code into its own function as it will
be needed for system call related ptrace functionality I plan
to commit soon.
2003-09-26 15:09:46 +00:00
nectar
78ff87db8b panic() if we try to handle an out-of-range signal number in
psignal()/tdsignal().  The test was historically in psignal().  It was
changed into a KASSERT, and then later moved to tdsignal() when the
latter was introduced.

Reviewed by:	iedowse, jhb
2003-08-10 23:05:37 +00:00
davidxu
176657958f Use correct signal when calling sigexit. 2003-07-30 23:11:37 +00:00
phk
d4d7ca154a Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
mtm
e2309e4ab4 The POSIX spec also requires that kern_sigtimedwait return
EINVAL if tv_nsec of the timeout is less than zero.
2003-07-24 17:07:17 +00:00
davidxu
ae38138034 Always deliver synchronous signal to UTS for SA threads. 2003-07-21 00:26:52 +00:00
davidxu
97d2d9dfed Fix sigwait to conform to POSIX.
When a signal is being delivered to process, first find a sigwait
thread to deliver, POSIX's argument is speed of delivering signal
to sigwait thread is faster than other ways. A signal in its wait
set will cause sigwait to return the signal number, a signal not
in its wait set but in not blocked by the thread also causes sigwait
to return, but sigwait returns EINTR, sigwait is oneshot operation,
only one signal can be delivered to its wait set, when a signal is
delivered to the sigwait thread, the thread's sigwait state is canceled.
2003-07-17 22:52:55 +00:00
davidxu
15825cd99f Rename thread_siginfo to cpu_thread_siginfo 2003-07-15 04:26:26 +00:00
davidxu
59f688ef90 If a thread is sending signal to its process, if the thread can handle
the signal itself, it should get it without looking for other threads.
2003-07-11 13:42:23 +00:00
mtm
50c58f0282 Make the conditional, which decides what siglist to put a signal on,
more concise and improve the comment.

Submitted by: bde
2003-07-05 08:37:40 +00:00
mtm
6f4ee681fd Signals sent specifically to a particular thread must
be delivered to that thread, regardless of whether it
has it masked or not.

Previously, if the targeted thread had the signal masked,
it would be put on the processes' siglist. If
another thread has the signal umasked or unmasks it before
the target, then the thread it was intended for would never
receive it.

This patch attempts to solve the problem by requiring callers
of tdsignal() to say whether the signal is for the thread or
for the process. If it is for the process, then normal processing
occurs and any thread that has it unmasked can receive it.
But if it is destined for a specific thread, it is put on
that thread's pending list regardless of whether it is currently
masked or not.

The new behaviour still needs more work, though.  If the signal
is reposted for some reason it is always posted back to the
thread that handled it because the information regarding the
target of the signal has been lost by then.

Reviewed by:	jdp, jeff, bde (style)
2003-07-03 19:09:59 +00:00
davidxu
788b1fc17a o Change kse_thr_interrupt to allow send a signal to a specified thread,
or unblock a thread in kernel, and allow UTS to specify whether syscall
  should be restarted.
o Add ability for UTS to monitor signal comes in and removed from process,
  the flag PS_SIGEVENT is used to indicate the events.
o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with
  this flag set to wait for above signal event.
o For SA based thread, kernel masks all signal in its signal mask, let
  UTS to use kse_thr_interrupt interrupt a thread, and install a signal
  frame in userland for the thread.
o Add a tm_syncsig in thread mailbox, when a hardware trap occurs,
  it is used to deliver synchronous signal to userland, and upcall
  is schedule, so UTS can process the synchronous signal for the thread.

Reviewed by: julian (mentor)
2003-06-28 08:29:05 +00:00
davidxu
c6c7b174d1 Fix POSIX compatible bug for sigwaitinfo and sigtimedwait.
POSIX says siginfo pointer parameter can be NULL and if the
function success, it should return signal number but not zero.
The waitset it past should be negatived before it can be
used as thread signal mask.
2003-06-28 08:03:28 +00:00
davidxu
88ed270c3d When a STOP signal is being sent to a process, it is possible all
threads in the process have already masked the signal, so job control
is delayed. But later a thread unmasking the STOP signal should enable
job control, so in issignal(), scanning all threads in process to see
if we can direct suspend some of them, not just suspend current thread.
2003-06-20 03:36:45 +00:00
davidxu
c0a849442b Fix typo. td should be td0. 2003-06-20 01:56:28 +00:00
davidxu
1d77a8e0f6 1. Add code to support bound thread. when blocked, a bound thread never
schedules an upcall. Signal delivering to a bound thread is same as
   non-threaded process. This is intended to be used by libpthread to
   implement PTHREAD_SCOPE_SYSTEM thread.
2. Simplify kse_release() a bit, remove sleep loop.
2003-06-15 12:51:26 +00:00
davidxu
abb4420bbe Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
obrien
3b8fff9e4c Use __FBSDID(). 2003-06-11 00:56:59 +00:00
jhb
ae45522340 - Add a td_pflags field to struct thread for private flags accessed only by
curthread.  Unlike td_flags, this field does not need any locking.
- Replace the td_inktr and td_inktrace variables with equivalent private
  thread flags.
- Move TDF_OLDMASK over to the private flags field so it no longer requires
  sched_lock.
2003-06-09 17:38:32 +00:00
obrien
384dc4a2a3 Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and
PT_DETACH ptrace(2) requests from functioning as advertised in the
manual page.  As described in kern/35175, the PT_DETACH request will,
under certain circumstances, pass an unwanted signal on to the traced
process upan detaching from it.  The PT_CONTINUE request will
sometimes fail if you make it pass a signal that has "properties" that
differ from the properties of the signal that origionally caused the
traced process to be stopped.  Since PT_KILL is nothing than
PT_CONTINUE with SIGKILL, it is broken too.  In the PT_KILL case, this
leads to an unkillable process.

PR:		44011
Submitted by:	Mark Kettenis <kettenis@chello.nl>
Approved by:	re(jhb)
2003-05-16 01:34:23 +00:00
jhb
89a4eb17de - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
jhb
9efb8e111e Remove Giant from kern_sigsuspend() and osigsuspend() as these should now
be MP safe.

Approved by:	re (scottl)
2003-05-09 19:11:32 +00:00
jhb
65572963c9 Mostly sort the includes. 2003-05-05 21:26:25 +00:00
jhb
099389efb0 Lock the proc lock around calls to tdsignal() in the sigwait() family of
syscalls.
2003-05-05 21:18:10 +00:00
jhb
755cc1e549 Make issignal() private to kern_sig.c since it is only called from cursig()
and cursig() is now a function rather than a macro.
2003-05-05 21:16:28 +00:00
jhb
65f917c9f1 Forgot to remove Giant around call to kern_sigaction() in
freebsd4_sigaction() in revision 1.232.
2003-04-30 19:45:13 +00:00
jhb
57c0e7ab21 Push Giant down into kern_sigaction() instead of locking it around calls
to kern_sigaction() in the various callers of the function.
2003-04-25 20:01:19 +00:00
jhb
9b55ca02a0 Remove Giant from osigblock(), osigsetmask(), and kern_sigaltstack(). 2003-04-23 19:49:18 +00:00
jhb
89c52cff2e - Reorganize osigstack() to do the copyin first, grab the proc lock once,
do all the various sigstack dances, unlock the proc lock, and finally do
  the copyout.  This more closely resembles the behavior of
  kern_sigaltstack() and closes a small race.
- Remove Giant from osigstack as it is no longer needed.
2003-04-23 18:50:25 +00:00
davidxu
28038e92fe Unbreak sigaltstack syscall. sigonstack is now a function and
want proc lock be held.
2003-04-19 05:04:06 +00:00
jhb
801acfe1d4 - Make sigonstack() a regular function instead of an inline and add a proc
lock assertion to it.
- SIGPENDING() no longer needs sched_lock, so only grab sched_lock to set
  the TDF_NEEDSIGCHK and TDF_ASTPENDING flags in signotify().
- Add a proc lock assertion to tdsigwakeup().
- Since we always set TDF_OLDMASK while holding the proc lock, the proc
  lock is sufficient protection to check its state in postsig() and we only
  need sched_lock when clearing the actual flag.
2003-04-18 20:59:05 +00:00
jhb
fa6200c9ec Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol
so that it can be used by binary emulators.
2003-04-18 20:18:44 +00:00
jhb
5921ce0c8b Don't hold the proc lock while performing sigset conversions on local
variables.
2003-04-17 22:07:56 +00:00
jhb
4b2bc05ffe - Remove garbage SIGSETOR() that snuck into struct sigpending_args
definition.
- Use the proper constant for the last arg to kern_sigaction() in osigvec()
  instead of a magic value.
2003-04-17 22:06:43 +00:00
davidxu
d5e8438f32 Style fix. 2003-04-12 02:54:46 +00:00
davidxu
cb24fd3c57 Check SIG_HOLD action ealier to avoid missing test it in later code. 2003-04-12 00:38:47 +00:00
jeff
3c4f704ebe - p will be unused in cursig() if INVARIANTS is not defined. Access it
through td->td_proc to avoid the unused variable.

Spotted by:	Maxim Konovalov <maxim@macomnet.ru>
2003-04-01 09:07:36 +00:00
jeff
b23496dd54 - Define sigwait, sigtimedwait, and sigwaitinfo in terms of
kern_sigtimedwait() which is capable of supporting all of their semantics.
 - These should be POSIX compliant but more careful review is needed before
   we announce this.
2003-03-31 23:30:41 +00:00
jeff
46e6ba39f1 - Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
 - signotify() now operates on a thread since unmasked pending signals are
   stored in the thread.
 - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
2003-03-31 22:49:17 +00:00
jeff
6e01278555 - Mark signals which may be delivered to any thread in the process with
SA_PROC.  Signals without this flag should be directed to a particular
   thread if this is possible.
2003-03-31 22:12:09 +00:00
jeff
4a3718fb25 - Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.
2003-03-31 22:02:38 +00:00
davidxu
bb4f70ad77 Fix threaded process job control bug. SMP tested.
Reviewed by: julian
2003-03-11 00:07:53 +00:00
tjr
0b60094f80 Hold the proc lock while accessing p_procsig in trapsignal(). 2003-03-09 01:40:55 +00:00
jhb
e4bcd25517 Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().
2003-03-04 21:03:05 +00:00
julian
3fc9836d46 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
davidxu
53f79e941f Fix a bug when handling SIGCONT.
Reported By: Mike Makonnen <mtm@identd.net>
2003-02-26 12:47:46 +00:00
jeff
5c29a640b8 - Add a new function, thread_signal_add(), that is called from postsig to
add a signal to a mailbox's pending set.
 - Add a new function, thread_signal_upcall(), this causes the current thread
   to upcall so that we can deliver pending signals.

Reviewed by:	mini
2003-02-17 09:58:11 +00:00
julian
af55753a06 Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by:	 parts by davidxu@
Reviewed by:	jeff@ mini@
2003-02-17 09:55:10 +00:00
jeff
590a39e29b - Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers.  This greatly simplifies much the
   KSE code.

Submitted by:	davidxu
2003-02-17 05:14:26 +00:00
tjr
c831929bbb Acquire Giant around calls to kern_sigaction() in sigaction(),
freebsd4_sigaction() and osigaction() instead of around the whole
body of those functions. They now no longer hold Giant around calls
to copyin() and copyout(), and it is slightly more obvious what
Giant is protecting.
2003-02-15 09:56:09 +00:00
tjr
a000ef163a osigpending() no longer needs Giant, for the same reason sigpending()
does not.
2003-02-15 09:15:30 +00:00
tjr
f12b647e3e All uses of p_siglist are protected by the proc lock now, so there's
no need to acquire Giant in sigpending() anymore.
2003-02-15 08:42:02 +00:00
julian
e8efa7328e Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.
2003-02-01 12:17:09 +00:00
peter
d7e400a15f No longer force COMPAT_FREEBSD4 to be on. 2003-01-27 23:01:03 +00:00
davidxu
4b9b549ca2 Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00