Commit Graph

486 Commits

Author SHA1 Message Date
Konstantin Belousov
3f1c4c4f31 When OOM searches for a process to kill, ignore the processes already
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.

This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	4 weeks
2010-04-06 10:43:01 +00:00
Alfred Perlstein
e722820434 Merge projects/enhanced_coredumps (r204346) into HEAD:
Enhanced process coredump routines.

  This brings in the following features:
  1) Limit number of cores per process via the %I coredump formatter.
  Example:
    if corefilename is set to %N.%I.core AND num_cores = 3, then
    if a process "rpd" cores, then the corefile will be named
    "rpd.0.core", however if it cores again, then the kernel will
    generate "rpd.1.core" until we hit the limit of "num_cores".

    this is useful to get several corefiles, but also prevent filling
    the machine with corefiles.

  2) Encode machine hostname in core dump name via %H.

  3) Compress coredumps, useful for embedded platforms with limited space.
    A sysctl kern.compress_user_cores is made available if turned on.

    To enable compressed coredumps, the following config options need to be set:
    options COMPRESS_USER_CORES
    device zlib   # brings in the zlib requirements.
    device gzio   # brings in the kernel vnode gzip output module.

  4) Eventhandlers are fired to indicate coredumps in progress.

  5) The imgact sv_coredump routine has grown a flag to pass in more
  state, currently this is used only for passing a flag down to compress
  the coredump or not.

  Note that the gzio facility can be used for generic output of gzip'd
  streams via vnodes.

Obtained from: Juniper Networks
Reviewed by: kan
2010-03-02 06:58:58 +00:00
Konstantin Belousov
a5799a4f27 Staticise sigqueue manipulation functions used only in kern_sig.c.
MFC after:	1 week
2010-01-23 11:43:30 +00:00
Konstantin Belousov
6a671a6bde When traced process is about to receive the signal, the process is
stopped and debugger may modify or drop the signal. After the changes to
keep process-targeted signals on the process sigqueue, another thread
may note the old signal on the queue and act before the thread removes
changed or dropped signal from the process queue. Since process is
traced, it usually gets stopped. Or, if the same signal is delivered
while process was stopped, the thread may erronously remove it,
intending to remove the original signal.

Remove the signal from the queue before notifying the debugger. Restore
the siginfo to the head of sigqueue when signal is allowed to be
delivered to the debugee, using newly introduced KSI_HEAD ksiginfo_t
flag. This preserves required order of delivery. Always restore the
unchanged signal on the curthread sigqueue, not to the process queue,
since the thread is about to get it anyway, because sigmask cannot be
changed.

Handle failure of reinserting the siginfo into the queue by falling
back to sq_kill method, calling sigqueue_add with NULL ksi.

If debugger changed the signal to be delivered, use sigqueue_add()
with NULL ksi instead of only setting sq_signals bit.

Reported by:	Gardner Bell <gbell72 rogers com>
Analyzed and first version of fix by:	Tijl Coosemans <tijl coosemans org>
PR:	142757
Reviewed by:	davidxu
MFC after:	2 weeks
2010-01-20 11:58:04 +00:00
Konstantin Belousov
4f17d481ed Remove wrong assertion. Debugee is allowed to lose a signal.
Reported and tested by:	jh
MFC after:	2 weeks
2009-12-03 20:16:59 +00:00
Konstantin Belousov
a3de221dbe Among signal generation syscalls, only sigqueue(2) is allowed by POSIX
to fail due to lack of resources to queue siginfo. Add KSI_SIGQ flag
that allows sigqueue_add() to fail while trying to allocate memory for
new siginfo. When the flag is not set, behaviour is the same as for
KSI_TRAP: if memory cannot be allocated, set bit in sq_kill. KSI_TRAP is
kept to preserve KBI.

Add SI_KERNEL si_code, to be used in siginfo.si_code when signal is
generated by kernel. Deliver siginfo when signal is generated by kill(2)
family of syscalls (SI_USER with properly filled si_uid and si_pid), or
by kernel (SI_KERNEL, mostly job control or SIGIO). Since KSI_SIGQ flag
is not set for the ksi, low memory condition cause old behaviour.

Keep psignal(9) KBI intact, but modify it to generate SI_KERNEL
si_code. Pgsignal(9) and gsignal(9) now take ksi explicitely. Add
pksignal(9) that behaves like psignal but takes ksi, and ddb kill
command implemented as pksignal(..., ksi = NULL) to not do allocation
while in debugger.

While there, remove some register specifiers and use ANSI C prototypes.

Reviewed by:	davidxu
MFC after:	1 month
2009-11-17 11:39:15 +00:00
Konstantin Belousov
75c586a4c8 In r198506, kern_sigsuspend() started doing cursig/postsig loop to make
sure that a signal was delivered to the thread before returning from
syscall. Signal delivery puts new return frame on the user stack, and
modifies trap frame to enter signal handler. As a consequence, syscall
return code sets EINTR as error return for signal frame, instead of the
syscall return.

Also, for ia64, due to different registers layout for those two kind of
frames, usermode sigsegfaulted when returned from signal handler.

Use newly-introduced cpu_set_syscall_retval(9) to set syscall result,
and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return
code from modifying this frame [1].

Another issue is that pending SIGCONT might be cancelled by SIGSTOP,
causing postsig() not to deliver any catched signal [2]. Modify
postsig() to return 1 if signal was posted, and 0 otherwise, and use
this in the kern_sigsuspend loop.

Proposed by:	marcel [1]
Noted by:	davidxu [2]
Reviewed by:	marcel, davidxu
MFC after:	1 month
2009-11-10 11:46:53 +00:00
Konstantin Belousov
80a8b0f3bf Trapsignal() and postsig() call kern_sigprocmask() with both process
lock and curproc->p_sigacts->ps_mtx. Reschedule_signals may need to have
ps_mtx locked to decide and wakeup a thread, causing recursion on the
mutex.

Inform kern_sigprocmask() and reschedule_signals() about lock state
of the ps_mtx by new flag SIGPROCMASK_PS_LOCKED to avoid recursion.

Reported and tested by:	keramida
MFC after:	1 month
2009-10-30 10:10:39 +00:00
Konstantin Belousov
8084540253 Trapsignal() calls kern_sigprocmask() when delivering catched signal
with proc lock held.

Reported and tested by:	Mykola Dzham  freebsd at levsha org ua
MFC after:	1 month
2009-10-29 14:34:24 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Konstantin Belousov
84440afb54 In kern_sigsuspend(), better manipulate thread signal mask using
kern_sigprocmask() to properly notify other possible candidate threads
for signal delivery.

Since sigsuspend() shall only return to usermode after a signal was
delivered, do cursig/postsig loop immediately after waiting for
signal, repeating the wait if wakeup was spurious due to race with
other thread fetching signal from the process queue before us. Add
thread_suspend_check() call to allow the thread to be stopped or killed
while in loop.

Modify last argument of kern_sigprocmask() from boolean to flags,
allowing the function to be called with locked proc. Convertion of the
callers that supplied 1 to the old argument will be done in the next
commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally
correct in between.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:42:24 +00:00
Joseph Koshy
8b5adf9dd9 Improve the description of sysctl "kern.sugid_coredump".
Submitted by:	Mel Flynn <mel.flynn+fbsd.hackers at mailing.thruhere.net>
		on -hackers
2009-10-12 15:49:48 +00:00
Konstantin Belousov
7df8f6ab6f Fix typo.
Submitted by:	rdivacky
MFC after:	1 month
2009-10-12 10:09:48 +00:00
Konstantin Belousov
6b286ee8b5 Currently, when signal is delivered to the process and there is a thread
not blocking the signal, signal is placed on the thread sigqueue. If
the selected thread is in kernel executing thr_exit() or sigprocmask()
syscalls, then signal might be not delivered to usermode for arbitrary
amount of time, and for exiting thread it is lost.

Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode, since only then the thread can handle delivery of signal
reliably. For exiting thread or thread that has blocked some signals,
check whether the newly blocked signal is queued for the process, and
try to find a thread to wakeup for delivery, in reschedule_signal(). For
exiting thread, assume that all signals are blocked.

Change cursig() and postsig() to look both into the thread and process
signal queues. When there is a signal that thread returning to usermode
could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do
unlocked read of p_siglist and p_pendingcnt to check for queued signals.

Note that thread that has a signal unblocked might get spurious wakeup
and EINTR from the interruptible system call now, due to the possibility
of being selected by reschedule_signals(), while other thread returned
to usermode earlier and removed the signal from process queue. This
should not cause compliance issues, since the thread has not blocked a
signal and thus should be ready to receive it anyway.

Reported by:	Justin Teller <justin.teller gmail com>
Reviewed by:	davidxu, jilles
MFC after:	1 month
2009-10-11 16:49:30 +00:00
Konstantin Belousov
15b7a831df Fix typo.
MFC after:	3 days
2009-10-01 12:46:58 +00:00
Robert Watson
e76d823b81 Use C99 initialization for struct filterops.
Obtained from:	Mac OS X
Sponsored by:	Apple Inc.
MFC after:	3 weeks
2009-09-12 20:03:45 +00:00
Konstantin Belousov
f33a947b56 Add new msleep(9) flag PBDY that shall be specified together with
PCATCH, to indicate that thread shall not be stopped upon receipt of
SIGSTOP until it reaches the kernel->usermode boundary.

Also change thread_single(SINGLE_NO_EXIT) to only stop threads at
the user boundary unconditionally.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:52:46 +00:00
Robert Watson
14961ba789 Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type.  This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by:	brooks
Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-06-27 13:58:44 +00:00
Peter Holm
401679debe vn_open_cred() needs a non NULL ucred pointer
Reviewed by:	kib
2009-06-23 11:29:54 +00:00
Konstantin Belousov
e0c161b89c Add another flags argument to vn_open_cred. Use it to specify that some
vn_open_cred invocations shall not audit namei path.

In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by
default implementation of vop_vptocnp, and for the open done for core
file. vn_fullpath is called from the audit code, and vn_open there need
to disable audit to avoid infinite recursion. Core file is created on
return to user mode, that, in particular, happens during syscall return.
The creation of the core file is audited by direct calls, and we do not
want to overwrite audit information for syscall.

Reported, reviewed and tested by: rwatson
2009-06-21 13:41:32 +00:00
Robert Watson
885868cd8f Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by:    jeff
Reviewed by:    jeff
Discussed with: rmacklem, zach loafman @ isilon
2009-04-10 10:52:19 +00:00
Ed Schouten
c90c9021e9 Remove even more unneeded variable assignments.
kern_time.c:
- Unused variable `p'.

kern_thr.c:
- Variable `error' is always caught immediately, so no reason to
  initialize it. There is no way that error != 0 at the end of
  create_thread().

kern_sig.c:
- Unused variable `code'.

kern_synch.c:
- `rval' is always assigned in all different cases.

kern_rwlock.c:
- `v' is always overwritten with RW_UNLOCKED further on.

kern_malloc.c:
- `size' is always initialized with the proper value before being used.

kern_exit.c:
- `error' is always caught and returned immediately. abort2() never
  returns a non-zero value.

kern_exec.c:
- `len' is always assigned inside the if-statement right below it.

tty_info.c:
- `td' is always overwritten by FOREACH_THREAD_IN_PROC().

Found by:	LLVM's scan-build
2009-02-26 15:51:54 +00:00
David Xu
7b4a950a7d Revert rev 184216 and 184199, due to the way the thread_lock works,
it may cause a lockup.

Noticed by: peter, jhb
2008-11-05 03:01:23 +00:00
David Xu
3f9be10eb0 Actually, for signal and thread suspension, extra process spin lock is
unnecessary, the normal process lock and thread lock are enough. The
spin lock is still needed for process and thread exiting to mimic
single sched_lock.
2008-10-23 07:55:38 +00:00
David Xu
904c5ec4e3 Move per-thread userland debugging flags into seperated field,
this eliminates some problems of locking, e.g, a thread lock is needed
but can not be used at that time. Only the process lock is needed now
for new field.
2008-10-15 06:31:37 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
John Baldwin
da7bbd2c08 If a thread that is swapped out is made runnable, then the setrunnable()
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup().  When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup.  However, this required
grabbing proc0's thread lock to perform the wakeup.  If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.

Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked.  The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up.  The end result is that consumers of the sleepq API such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().

Discussed with:	jeff
Glanced at by:	sam
Tested by:	Jurgen Weber  jurgen - ish com au
MFC after:	2 weeks
2008-08-05 20:02:31 +00:00
John Birrell
5d217f173c Add DTrace 'proc' provider probes using the Statically Defined Trace
(sdt) mechanism.
2008-05-24 06:22:16 +00:00
Jeff Roberson
b7edba7704 - Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs
to enter thread_suspend_check().
 - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the
   thread_suspend_check() to ast() rather than userret().
 - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so
   that we don't miss a suspend request.  If this is set use the
   expensive signal path.
 - Set NEEDSUSPCHK when creating a new thread in thr in case the
   creating thread is due to be suspended as well but has not yet.

Reviewed by:	davidxu (Authored original patch)
2008-03-21 08:23:25 +00:00
Jeff Roberson
374ae2a393 - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
 - Reflect these changes in the proc.h documentation and consumers throughout
   the kernel.  This is a substantial reduction in locking cost for these
   fields and was made possible by recent changes to threading support.
2008-03-19 06:19:01 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Robert Watson
36b208e008 Use sbuf routines to construct core dump filenames rather than custom
string buffer handling, making the code both easier to read and more
robust against string-handling bugs.

MFC after:	1 week
2008-03-08 16:31:29 +00:00
Robert Watson
eeccc36738 Unlock the process lock when expand_name() fails, or we may leak the
process lock leading to a hang.  This bug was introduced in
kern_sig.c:1.351, when the call to expand_name() was moved earlier
bit this particular error case was not updated.
2008-03-08 15:48:06 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
David E. O'Brien
10c2b8e128 Be more exact with sigaction SA_SIGINFO handling.
Reviewed by:	marcel
2007-12-18 20:39:13 +00:00
Konstantin Belousov
89b57fcf01 Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
Christian S.J. Peron
57274c513c Implement AUE_CORE, which adds process core dump support into the kernel.
This change introduces audit_proc_coredump() which is called by coredump(9)
to create an audit record for the coredump event.  When a process
dumps a core, it could be security relevant.  It could be an indicator that
a stack within the process has been overflowed with an incorrectly constructed
malicious payload or a number of other events.

The record that is generated looks like this:

header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec
argument,0,0xb,signal
path,/usr/home/csjp/test.core
subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2
return,success,1
trailer,111

- We allocate a completely new record to make sure we arent clobbering
  the audit data associated with the syscall that produced the core
  (assuming the core is being generated in response to SIGABRT  and not
  an invalid memory access).
- Shuffle around expand_name() so we can use the coredump name at the very
  beginning of the coredump call.  Make sure we free the storage referenced
  by "name" if we need to bail out early.
- Audit both successful and failed coredump creation efforts

Obtained from:	TrustedBSD Project
Reviewed by:	rwatson
MFC after:	1 month
2007-10-26 01:23:07 +00:00
Christian S.J. Peron
5ff3816d82 Move where we audit the PID argument such that we unconditionally
audit it at the beginning of the syscall.  This fixes a problem
where the user supplies an invalid process ID which is > 0 which
results in the PID argument not being audited.

Obtained from:	TrustedBSD Project
MFC after:	1 week
2007-10-24 00:14:19 +00:00
Jeff Roberson
6eeb364b4c - Calling sched_nice() in tdsigwakeup() is no longer required by ULE and
actually causes LORs and other panics.

Reported by:	mlaier
Approved by:	re
2007-07-19 08:49:16 +00:00
Jeff Roberson
efe641b939 - Add a missing PROC_SUNLOCK() in tdsignal() 2007-06-11 23:27:03 +00:00
Matt Jacob
9b73d2396a Initialized ets to zero. This is arguably a gcc bug in that ets is always
set to rts when timeout is non-NULL and then timevalid is set and ets is
only checked later when timervalid is set.
2007-06-10 01:43:11 +00:00
Jeff Roberson
a54e85fdbf Commit 4/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.
 - Move some common code into thread_suspend_switch() to handle the
   mechanics of suspending a thread.  The locking here is incredibly
   convoluted and should be simplified.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:52:24 +00:00
Jeff Roberson
1c4bcd050a - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
Konstantin Belousov
9e223287c0 Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by:	jhb
Reviewed by:	daichi (unionfs)
Approved by:	re (kensmith)
2007-05-31 11:51:53 +00:00
Robert Watson
4dec0e67ea Comment that tdsignal() may be entered from the debugger. 2007-05-23 17:27:42 +00:00
John Baldwin
aa89d8cd52 Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.
2007-03-21 21:20:51 +00:00
Robert Watson
873fbcd776 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
Robert Watson
0c14ff0eb5 Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
Xin LI
d60226bd43 Give which signal caller has attempted to deliver when panicking. 2007-02-09 17:48:28 +00:00
Xin LI
4f506694bb Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 14:58:53 +00:00
David Xu
016fa30228 break loop early if we know that there are at least two signals. 2006-12-25 03:00:15 +00:00
Tom Rhodes
6aeb05d7be Merge posix4/* into normal kernel hierarchy.
Reviewed by:	glanced at by jhb
Approved by:	silence on -arch@ and -standards@
2006-11-11 16:26:58 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
David Xu
5c28a8d474 Use macro TAILQ_FOREACH_SAFE instead of expanding it. 2006-10-22 00:09:41 +00:00
John Baldwin
0fc32899f1 Remove the check that prevented signals from being delivered to exiting
processes.  It was originally added back when support for Linux threads
(and thus shared sigacts objects) was added, but no one knows why.  My
guess is that at some point during the Linux threads patches, the sigacts
object was torn down during exit1(), so this check was added to prevent
a panic for that race.  However, the stuff that was actually committed to
the tree doesn't teardown sigacts until wait() making the above race moot.
Re-allowing signals here lets one interrupt a NFS request during process
teardown (such as closing descriptors) on an interruptible mount.

Requested by:	kib (long time ago)
MFC after:	1 week
2006-10-20 16:19:21 +00:00
David Xu
c6511aea86 Move some declaration of 32-bit signal structures into file
freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.
2006-10-05 01:56:11 +00:00
John Baldwin
73dbd3da73 Remove various bits of conditional Alpha code and fixup a few comments. 2006-05-12 05:04:46 +00:00
Tor Egge
11991ab418 Call vn_finished_write() before calling the coredump handler which will
indirectly call vn_start_write() as necessary for each write.
2006-05-07 22:50:22 +00:00
Paul Saab
95f16c1e2c Don't try to kill embryonic processes in killpg1(). This prevents
a race condition between fork() and kill(pid,sig) with pid < 0 that
can cause a kernel panic.

Submitted by:	up
MFC after:	3 weeks
2006-04-21 19:26:21 +00:00
John Baldwin
33f19bee6f - Conditionalize Giant around VFS operations for ALQ, ktrace, and
generating a coredump as the result of a signal.
- Fix a bug where we could leak a Giant lock if vn_start_write() failed
  in coredump().

Reported by:	jmg (2)
2006-03-28 21:30:22 +00:00
David Xu
7b8d5e4865 Remove _STOPEVENT call, it is already called in issignal, simplify
code for SIGKILL signal.
2006-03-09 08:31:51 +00:00
David Xu
3dfcaad667 Add signal set sq_kill to sigqueue structure, the member saves all
signals sent by kill() syscall, without this, a signal sent by
sigqueue() can cause a signal sent by kill() to be lost.
2006-03-02 14:06:40 +00:00
David Xu
7e0221a251 1. Refine kern_sigtimedwait() to remove redundant code.
2. Fix a bug, if thread got a SIGKILL signal, call sigexit() to kill
   its process.

MFC after: 3 days
2006-02-23 09:24:19 +00:00
David Xu
7c9a98f15b Code cleanup, simply compare with curproc. 2006-02-23 05:50:55 +00:00
David Xu
94f0972bec Fix a long standing race between sleep queue and thread
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.

Reviewed by:	jhb
MFC after:	1 week
2006-02-15 23:52:01 +00:00
Wayne Salamon
bfd7575a39 Audit the arguments to the kill(2) and killpg(2) system calls.
Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)
2006-02-14 01:17:03 +00:00
David Xu
d8267df729 In order to speed up process suspension on MP machine, send IPI to
remote CPU. While here, abstract thread suspension code into a function
called sig_suspend_threads, the function is called when a process received
a STOP signal.
2006-02-13 03:16:55 +00:00
David Xu
7f96995ebd Create childproc_jobstate function to report job control state, this
also fixes a bug in childproc_continued which ignored PS_NOCLDSTOP.
2006-02-04 14:10:57 +00:00
David Xu
d7bc12b096 Avoid kernel panic when attaching a process which may not be stopped
by debugger, e.g process is dumping core. Only access p_xthread if
P_STOPPED_TRACE is set, this means thread is ready to exchange signal
with debugger, print a warning if P_STOPPED_TRACE is not set due to
some bugs in other code, if there is.

The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and
is slightly adjusted.
2005-12-24 02:59:29 +00:00
David Xu
f71a882f15 Add a sysctl to force a process to sigexit if a trap signal is
being hold by current thread or ignored by current process,
otherwise, it is very possible the thread will enter an infinite loop
and lead to an administrator's nightmare.
2005-12-09 08:29:29 +00:00
David Xu
761a4d9423 Cleanup sigqueue sysctl. 2005-12-09 02:26:44 +00:00
David Xu
027f760408 Fix a lock leak in childproc_continued(). 2005-12-06 05:30:13 +00:00
David Xu
b51d237a67 set signal queue values for sysconf(). 2005-12-01 00:25:50 +00:00
David Xu
413cf3bbe1 Make sure only remove one signal by debugger. 2005-11-12 04:22:16 +00:00
David Xu
f4d8522334 WIFxxx macros requires an int type but p_xstat is short, convert it
to int before using the macros.

Bug reported by : Pyun YongHyeon pyunyh at gmail dot com
2005-11-09 07:58:16 +00:00
David Xu
ebceaf6dc7 Add support for queueing SIGCHLD same as other UNIX systems did.
For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.

The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.

There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.

Tested on: i386 (SMP and UP)
2005-11-08 09:09:26 +00:00
David Xu
8f0371f19d Fix name compatible problem with POSIX standard. the sigval_ptr and
sigval_int really should be sival_ptr and sival_int.
Also sigev_notify_function accepts a union sigval value but not a
pointer.
2005-11-04 09:41:00 +00:00
David Xu
6d7b314b14 Cleanup some signal interfaces. Now the tdsignal function accepts
both proc pointer and thread pointer, if thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends signal
to given thread.
Add utility function psignal_event to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.
2005-11-03 04:49:16 +00:00
David Xu
56c06c4b67 Let itimer store itimerspec instead of itimerval, so I don't have to
convert to or from timeval frequently.

Introduce function itimer_accept() to ack a timer signal in signal
acceptance code, this allows us to return more fresh overrun counter
than at signal generating time. while POSIX says:
"the value returned by timer_getoverrun() shall apply to the most
recent expiration signal delivery or acceptance for the timer,.."
I prefer returning it at acceptance time.

Introduce SIGEV_THREAD_ID notification mode, it is used by thread
libary to request kernel to deliver signal to a specified thread,
and in turn, the thread library may use the mechanism to implement
SIGEV_THREAD which is required by POSIX.

Timer signal is managed by timer code, so it can not fail even if
signal queue is full filled by sigqueue syscall.
2005-10-30 02:56:08 +00:00
David Xu
5da49fcb8a 1. Make ksiginfo_alloc and ksiginfo_free public.
2. Introduce flags KSI_EXT and KSI_INS. The flag KSI_EXT allows a ksiginfo
   to be managed by outside code, the KSI_INS indicates sigqueue_add should
   directly insert passed ksiginfo into queue other than copy it.
2005-10-23 04:12:26 +00:00
David Xu
9104847f21 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
David Xu
ec8297bda1 Fix a bug relavant to debugging, a masked signal unexpectedly interrupts
a sleeping thread when process is being debugged.

PR: GNU/77818
Tested by: Sean C. Farley <sean-freebsd at farley org>
2005-06-06 05:13:10 +00:00
David Xu
407948a530 Oops, forgot to update this file.
Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after    : 4 days
2005-04-19 08:11:28 +00:00
David Schultz
f97c3df18d Suspend all other threads in the process while generating a core dump.
The main reason for doing this is that the ELF dump handler expects
the thread list to be fixed while the dump header is generated, so an
upcall that occurs at the wrong time can lead to buffer overruns and
other Bad Things.

Another solution would be to grab sched_lock in the ELF dump handler,
but we might as well single-thread, since the process is about to die.
Furthermore, I think this should ensure that the register sets in the
core file are sequentially consistent.
2005-04-10 02:31:24 +00:00
David Xu
627451c1d9 The td_waitset is pointing to a stack address when thread is waiting
for a signal, because kernel stack is swappable, this causes page fault
in kernel under heavy swapping case. Fix this bug by eliminating unneeded
code.
2005-03-04 22:46:31 +00:00
David Xu
6675b36ec5 In kern_sigtimedwait, remove waitset bits for td_sigmask before
sleeping, so in do_tdsignal, we no longer need to test td_waitset.
now td_waitset is only used to give a thread higher priority when
delivering signal to multithreads process.
This also fixes a bug:
when a thread in sigwait states was suspended and later resumed
by SIGCONT, it can no longer receive signals belong to waitset.
2005-03-02 13:43:51 +00:00
David Xu
1089f0319b Don't restart a timeout wait in kern_sigtimedwait, also allow it
to wait longer than a single integer can represent.
2005-02-19 06:05:49 +00:00
Maxim Sobolev
1a88a252fd Backout previous change (disabling of security checks for signals delivered
in emulation layers), since it appears to be too broad.

Requested by:   rwatson
2005-02-13 17:37:20 +00:00
Maxim Sobolev
d8ff44b79f Split out kill(2) syscall service routine into user-level and kernel part, the
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.

Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.

As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.

Requested (sorta) by:	rwatson
MFC after:	2 weeks
2005-02-13 16:42:08 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Jeff Roberson
3ef6ac3361 - If delivering a signal will result in killing a process that has a
nice value above 0, set it to 0 so that it may proceed with haste.
   This is especially important on ULE, where adjusting the priority
   does not guarantee that a thread will be granted a greater time slice.
2004-12-13 16:45:57 +00:00
Warner Losh
6d1ab6edac Fix an off by one error. MAXPATHLEN already has +1. 2004-11-15 20:51:32 +00:00
Alfred Perlstein
90d75f785a Allow kill -9 to kill processes stuck in procfs STOPEVENTs. 2004-10-30 02:56:22 +00:00
Alfred Perlstein
cd71c41476 Backout 1.291.
re doesn't seem to think this fixes:
  Desired features for 5.3-RELEASE "More truss problems"
2004-10-29 08:24:41 +00:00
David Xu
b3a4fb14b3 Use scheduler api to adjust thread priority. 2004-10-05 09:10:30 +00:00
David Xu
482d099c50 Don't bother to turn off other P_STOPPED bits for SIGKILL, doing
so would cause kernel to produce an unkillable process in some cases,
especially, P_STOPPED_SINGLE has a singling thread, turning off the
bit would mess the state.
2004-10-03 13:23:49 +00:00
Alfred Perlstein
50434413b0 Clear a process's procfs trace points upon delivery of SIGKILL.
MT5 candidate. (Desired features for 5.3-RELEASE "More truss problems")
2004-10-01 14:15:20 +00:00
Julian Elischer
5995adc206 Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after:	3 days
2004-08-31 07:34:54 +00:00
John-Mark Gurney
ad3b9257c2 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
John-Mark Gurney
6141e04a7e add option to automaticly mark core dumps with the nodump flag
PR:		57065
Submitted by:	Walter C. Pelissero
2004-08-09 05:46:46 +00:00
Pawel Jakub Dawidek
24b2151f4d Don't skip permission checks when sending signals to zombie processes.
Pointed out by:	bde
Reviewed by:	rwatson
2004-08-03 15:39:23 +00:00
Pawel Jakub Dawidek
0b011ea3da Syscall kill(2) called for a zombie process should return 0.
Obtained from:	Darwin
2004-07-29 20:38:19 +00:00
John Baldwin
6dbc085016 Improve readability a bit by changing some code at the end of a function
that did:

	if (foo)
		return
	else
		blah

to just do the simpler

	if (!foo)
		blah

instead.
2004-07-16 21:00:50 +00:00
David Xu
cbf4e354ec Add code to support debugging threaded process.
1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
   thread current user thread is running on. Add tm_dflags into
   kse_thr_mailbox, the flags is written by debugger, it tells
   UTS and kernel what should be done when the process is being
   debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.

   TMDF_SSTEP is used to tell kernel to turn on single stepping,
   or turn off if it is not set.

   TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
   whenever possible, to UTS, it means do not run the user thread
   until debugger clears it, this behaviour is necessary because
   gdb wants to resume only one thread when the thread's pc is
   at a breakpoint, and thread needs to go forward, in order to
   avoid other threads sneak pass the breakpoints, it needs to remove
   breakpoint, only wants one thread to go. Also, add km_lwp to
   kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
   switch time when process is not being debugged, so when process
   is attached, debugger can map kernel thread to user thread.

2. Add p_xthread to proc strcuture and td_xsig to thread structure.
   p_xthread is used by a thread when it wants to report event
   to debugger, every thread can set the pointer, especially, when
   it is used in ptracestop, it is the last thread reporting event
   will win the race. Every thread has a td_xsig to exchange signal
   with debugger, thread uses TDF_XSIG flag to indicate it is reporting
   signal to debugger, if the flag is not cleared, thread will keep
   retrying until it is cleared by debugger, p_xthread may be
   used by debugger to indicate CURRENT thread. The p_xstat is still
   in proc structure to keep wait() to work, in future, we may
   just use td_xsig.

3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
   a thread. When process stops, debugger can set the flag for
   thread, thread will check the flag in thread_suspend_check,
   enters a loop, unless it is cleared by debugger, process is
   detached or process is existing. The flag is also checked in
   ptracestop, so debugger can temporarily suspend a thread even
   if the thread wants to exchange signal.

4. Current, in ptrace, we always resume all threads, but if a thread
   has already a TDF_DBSUSPEND flag set by debugger, it won't run.

Encouraged by: marcel, julian, deischen
2004-07-13 07:20:10 +00:00
Marcel Moolenaar
fbc3247d81 Implement the PT_LWPINFO request. This request can be used by the
tracing process to obtain information about the LWP that caused the
traced process to stop. Debuggers can use this information to select
the thread currently running on the LWP as the current thread.

The request has been made compatible with NetBSD for as much as
possible. This implementation differs from NetBSD in the following
ways:
1.  The data argument is allowed to be smaller than the size of the
    ptrace_lwpinfo structure known to the kernel, but not 0. This
    is opposite to what NetBSD allows. The reason for this is that
    we can extend the structure without affecting older binaries.
2.  On NetBSD the tracing process is to set the pl_lwpid field to
    the Id of the LWP it wants information of. We don't do that.
    Our ptrace interface allows passing the LWP Id instead of the
    PID. The tracing process is to set the PID to the LWP Id it
    wants information of.
3.  When the PID is actually the PID of the tracing process, this
    request returns the information about the LWP that caused the
    process to stop. This was the whole purpose of the request in
    the first place.

When the traced process has exited, this request will return the
LWP Id 0, indicating that the process state is not the result of
an event specific to a LWP.
2004-07-12 05:07:50 +00:00
John Baldwin
bf0acc273a - Change mi_switch() and sched_switch() to accept an optional thread to
switch to.  If a non-NULL thread pointer is passed in, then the CPU will
  switch to that thread directly rather than calling choosethread() to pick
  a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
  TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
  requiring all callers of mi_switch() to know to do this if they can be
  called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
  the middle of the function prototypes and up above into their own
  section.
2004-07-02 19:09:50 +00:00
Poul-Henning Kamp
1930e303cf Deorbit COMPAT_SUNOS.
We inherited this from the sparc32 port of BSD4.4-Lite1.  We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.
2004-06-11 11:16:26 +00:00
David Xu
36939a0a5c According to SUSv3, sigwait is different with sigwaitinfo, sigwait
returns error code in return value, not in errno.
2004-06-07 13:35:02 +00:00
Tim J. Robbins
aa0aa7a113 Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by:	davidxu
2004-06-02 07:52:36 +00:00
Bruce Evans
a4c2da1503 Fixed some style bugs in tdsigwakeup(). 2004-05-21 10:02:24 +00:00
John Baldwin
80c4433c18 In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if
a thread is on a sleep queue and should have it's sleep aborted.

Reported by:	Thierry Herbelot thierry at herbelot dot com
2004-05-20 20:17:28 +00:00
Colin Percival
4a3b3dcb55 stop() no longer needs sched_lock held; in fact, holding sched_lock causes
a LOR against sleepq.  Fix the comment, and fix ptracestop() to pick up
sched_lock after stop() rather than before.

Reported by:	Scott Sipe <cscotts@mindspring.com>
Reviewed by:	rwatson, jhb
2004-04-12 15:56:05 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Peter Wemm
9a6a4cb50d Shorten some XXXKSE commentry 2004-03-29 22:46:54 +00:00
John Baldwin
4ae89b957c - Push down Giant in exit() and wait().
- Push Giant down a bit in coredump() and call coredump() with the proc
  lock already held rather than unlocking it only to turn around and
  relock it.

Requested by:	peter
2004-03-05 22:39:53 +00:00
Dag-Erling Smørgrav
86b5e56351 Use different dummy wait channels to avoid panic in msleep().
Reviewed by:	jhb
2004-03-03 23:03:18 +00:00
John Baldwin
44f3b09204 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep qeueus in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleep's no longer inherently bump the priority.  Instead, this is soley
  a propery of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
Robert Watson
30a9f26db2 Assert process lock in ptracestop(), since we're going to rely
on it, and later unlock it.
2004-01-29 00:58:21 +00:00
Alexander Kabaev
975634280a Move the part of the comment which applies to osigsuspend where
it belongs. The current sigsuspend syscall does expect a pointer
to the mask as argument.

Submitted by:	Igor Sysoev <is at rambler-co dot ru>
2004-01-28 06:06:04 +00:00
Jeff Roberson
29bcc4514f - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL.  Assert that one of these is set in mi_switch() and propery
   adjust the rusage statistics.  This is to simplify the large number of
   users of this interface which were previously all required to adjust the
   proper counter prior to calling mi_switch().  This also facilitates more
   switch and locking optimizations.
 - Change all callers of mi_switch() to pass the appropriate paramter and
   remove direct references to the process statistics.
2004-01-25 03:54:52 +00:00
Robert Watson
def055686c When not creating a core dump due to resource limits specifying
a maximum dump size of 0, return a size-related error, rather
than returning success.  Otherwise, waitpid() will incorrectly
return a status indicating that a core dump was created.  Note
that the specific error doesn't actually matter, since it's lost.

MFC after:	2 weeks
PR:		60367
Submitted by:	Valentin Nechayev <netch@netch.kiev.ua>
2004-01-11 02:28:06 +00:00
Robert Watson
047aa39b25 Drop the sigacts mutex around calls to stopevent() to avoid sleeping
holding the mutex.  Because the sigacts pointer can't change while
the process is "live" (proc locking (x)), we know our pointer is still
valid.

In communication with:	truckman
Reviewed by:		jhb
2004-01-08 22:44:54 +00:00
David Xu
a30ec4b99c Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr
2004-01-03 02:02:26 +00:00
David Xu
a9a48d6862 Lock and unlock sched_lock when walking through thread list, current we
insert kse upcall thread into thread list at mi_switch time, process lock
is not enough.
2003-12-07 23:47:15 +00:00
David Xu
7eeaaf9b97 Try to fetch thread mailbox address in page fault trap, so when thread
blocks in page fault hanlder, and upcall thread can be scheduled. It is
useful if process is doing lots of mmap based I/O.
2003-10-30 02:55:43 +00:00
Robert Watson
36bbf86ba6 Check (locked) before performing an advisory unlock following a failure
of vn_start_write().  Otherwise, we may inconsistently attempt to release
the advisory lock.

Pointed out by:	teggej
2003-10-25 16:43:50 +00:00
Robert Watson
c447f5b2f4 When generate a core dump, use advisory locking in an advisory way:
if we do acquire an advisory lock, great!  We'll release it later.
However, if we fail to acquire a lock, we perform the coredump
anyway.  This problem became particularly visible with NFS after
the introduction of rpc.lockd: if the lock manager isn't running,
then locking calls will fail, aborting the core dump (resulting in
a zero-byte dump file).

Reported by:	Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>
2003-10-25 16:14:09 +00:00
David Xu
3a2e2a0ec8 Don't clear signal mask in execsig(). RELENG_4 does not clear it and POSIX
asks to inherit signal mask for execv.
2003-10-13 14:03:08 +00:00
Robert Drehmel
4cc9f52f78 Move some tracing related code into its own function as it will
be needed for system call related ptrace functionality I plan
to commit soon.
2003-09-26 15:09:46 +00:00
Jacques Vidrine
41b3077a6c panic() if we try to handle an out-of-range signal number in
psignal()/tdsignal().  The test was historically in psignal().  It was
changed into a KASSERT, and then later moved to tdsignal() when the
latter was introduced.

Reviewed by:	iedowse, jhb
2003-08-10 23:05:37 +00:00
David Xu
1fc434dc9a Use correct signal when calling sigexit. 2003-07-30 23:11:37 +00:00
Poul-Henning Kamp
7c89f162bc Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
Mike Makonnen
a6ca48085c The POSIX spec also requires that kern_sigtimedwait return
EINVAL if tv_nsec of the timeout is less than zero.
2003-07-24 17:07:17 +00:00
David Xu
432b45de08 Always deliver synchronous signal to UTS for SA threads. 2003-07-21 00:26:52 +00:00
David Xu
3074d1b454 Fix sigwait to conform to POSIX.
When a signal is being delivered to process, first find a sigwait
thread to deliver, POSIX's argument is speed of delivering signal
to sigwait thread is faster than other ways. A signal in its wait
set will cause sigwait to return the signal number, a signal not
in its wait set but in not blocked by the thread also causes sigwait
to return, but sigwait returns EINTR, sigwait is oneshot operation,
only one signal can be delivered to its wait set, when a signal is
delivered to the sigwait thread, the thread's sigwait state is canceled.
2003-07-17 22:52:55 +00:00
David Xu
4b7d5d84ee Rename thread_siginfo to cpu_thread_siginfo 2003-07-15 04:26:26 +00:00
David Xu
ffb2e92a98 If a thread is sending signal to its process, if the thread can handle
the signal itself, it should get it without looking for other threads.
2003-07-11 13:42:23 +00:00
Mike Makonnen
14b5ae1a98 Make the conditional, which decides what siglist to put a signal on,
more concise and improve the comment.

Submitted by: bde
2003-07-05 08:37:40 +00:00
Mike Makonnen
c197abc49a Signals sent specifically to a particular thread must
be delivered to that thread, regardless of whether it
has it masked or not.

Previously, if the targeted thread had the signal masked,
it would be put on the processes' siglist. If
another thread has the signal umasked or unmasks it before
the target, then the thread it was intended for would never
receive it.

This patch attempts to solve the problem by requiring callers
of tdsignal() to say whether the signal is for the thread or
for the process. If it is for the process, then normal processing
occurs and any thread that has it unmasked can receive it.
But if it is destined for a specific thread, it is put on
that thread's pending list regardless of whether it is currently
masked or not.

The new behaviour still needs more work, though.  If the signal
is reposted for some reason it is always posted back to the
thread that handled it because the information regarding the
target of the signal has been lost by then.

Reviewed by:	jdp, jeff, bde (style)
2003-07-03 19:09:59 +00:00
David Xu
9dde3bc999 o Change kse_thr_interrupt to allow send a signal to a specified thread,
or unblock a thread in kernel, and allow UTS to specify whether syscall
  should be restarted.
o Add ability for UTS to monitor signal comes in and removed from process,
  the flag PS_SIGEVENT is used to indicate the events.
o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with
  this flag set to wait for above signal event.
o For SA based thread, kernel masks all signal in its signal mask, let
  UTS to use kse_thr_interrupt interrupt a thread, and install a signal
  frame in userland for the thread.
o Add a tm_syncsig in thread mailbox, when a hardware trap occurs,
  it is used to deliver synchronous signal to userland, and upcall
  is schedule, so UTS can process the synchronous signal for the thread.

Reviewed by: julian (mentor)
2003-06-28 08:29:05 +00:00
David Xu
418228df24 Fix POSIX compatible bug for sigwaitinfo and sigtimedwait.
POSIX says siginfo pointer parameter can be NULL and if the
function success, it should return signal number but not zero.
The waitset it past should be negatived before it can be
used as thread signal mask.
2003-06-28 08:03:28 +00:00
David Xu
062cf543fc When a STOP signal is being sent to a process, it is possible all
threads in the process have already masked the signal, so job control
is delayed. But later a thread unmasking the STOP signal should enable
job control, so in issignal(), scanning all threads in process to see
if we can direct suspend some of them, not just suspend current thread.
2003-06-20 03:36:45 +00:00
David Xu
8b56079e2b Fix typo. td should be td0. 2003-06-20 01:56:28 +00:00
David Xu
cd4f6ebb13 1. Add code to support bound thread. when blocked, a bound thread never
schedules an upcall. Signal delivering to a bound thread is same as
   non-threaded process. This is intended to be used by libpthread to
   implement PTHREAD_SCOPE_SYSTEM thread.
2. Simplify kse_release() a bit, remove sleep loop.
2003-06-15 12:51:26 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
John Baldwin
5e26dcb560 - Add a td_pflags field to struct thread for private flags accessed only by
curthread.  Unlike td_flags, this field does not need any locking.
- Replace the td_inktr and td_inktrace variables with equivalent private
  thread flags.
- Move TDF_OLDMASK over to the private flags field so it no longer requires
  sched_lock.
2003-06-09 17:38:32 +00:00
David E. O'Brien
8d542cb56d Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and
PT_DETACH ptrace(2) requests from functioning as advertised in the
manual page.  As described in kern/35175, the PT_DETACH request will,
under certain circumstances, pass an unwanted signal on to the traced
process upan detaching from it.  The PT_CONTINUE request will
sometimes fail if you make it pass a signal that has "properties" that
differ from the properties of the signal that origionally caused the
traced process to be stopped.  Since PT_KILL is nothing than
PT_CONTINUE with SIGKILL, it is broken too.  In the PT_KILL case, this
leads to an unkillable process.

PR:		44011
Submitted by:	Mark Kettenis <kettenis@chello.nl>
Approved by:	re(jhb)
2003-05-16 01:34:23 +00:00