freebsd-dev

Author	SHA1	Message	Date
Edward Tomasz Napierala	b38520f09c	Fix long-standing thinko regarding maxproc accounting. Basically, we were accounting the newly created process to its parent instead of the child itself. This caused problems later, when the child changed its credentials - the per-uid, per-jail etc counters were not properly updated, because the maxproc counter in the child process was 0. Approved by: re (kib)	2011-09-17 19:55:32 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Jonathan Anderson	cfb5f76865	Add experimental support for process descriptors A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc	2011-08-18 22:51:30 +00:00
Edward Tomasz Napierala	afcc55f318	All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".	2011-07-06 20:06:44 +00:00
David E. O'Brien	f528c3fdbc	We should not return ECHILD when debugging a child and the parent does a "wait4(-1, ..., WNOHANG, ...)". Instead wait(2) should behave as if the child does not wish to report status at this time. Reviewed by: jhb	2011-06-14 17:09:30 +00:00
Sergey Kandaurov	ced9253e4e	Remove stale M_ZOMBIE malloc type. This type is unused since embedding p_ru into struct proc. MFC after: 1 week	2011-04-14 14:25:47 +00:00
Konstantin Belousov	2cfd5d3c91	Some callers of proc_reparent() already have the parent process locked. Detect the situation and avoid process lock recursion. Reported by: Fabian Keil <freebsd-listen fabiankeil de>	2011-04-10 17:07:02 +00:00
Edward Tomasz Napierala	58c77a9d53	Enable accounting for RACCT_NPROC and RACCT_NTHR. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-31 19:22:11 +00:00
Edward Tomasz Napierala	097055e26d	Add racct. It's an API to keep per-process, per-jail, per-loginclass and per-loginclass resource accounting information, to be used by the new resource limits code. It's connected to the build, but the code that actually calls the new functions will come later. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-29 17:47:25 +00:00
Alexander Leidinger	bb63fdde6d	By using the 32-bit Linux version of Sun's Java Development Kit 1.6 on FreeBSD (amd64), invocations of "javac" (or "java") eventually end with the output of "Killed" and exit code 137. This is caused by: 1. After calling exec() in multithreaded linux program threads are not destroyed and continue running. They get killed after program being executed finishes. 2. linux_exit_group doesn't return correct exit code when called not from group leader. Which happens regularly using sun jvm. The submitters fix this in a similar way to how NetBSD handles this. I took the PRs away from dchagin, who seems to be out of touch of this since a while (no response from him). The patches committed here are from [2], with some little modifications from me to the style. PR: 141439 [1], 144194 [2] Submitted by: Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk Reviewed by: rdivacky (in april 2010) MFC after: 5 days	2010-11-22 09:06:59 +00:00
John Baldwin	d680caab73	- When disabling ktracing on a process, free any pending requests that may be left. This fixes a memory leak that can occur when tracing is disabled on a process via disabling tracing of a specific file (or if an I/O error occurs with the tracefile) if the process's next system call is exit(). The trace disabling code clears p_traceflag, so exit1() doesn't do any KTRACE-related cleanup leading to the leak. I chose to make the free'ing of pending records synchronous rather than patching exit1(). - Move KTRACE-specific logic out of kern_(exec\|exit\|fork).c and into kern_ktrace.c instead. Make ktrace_mtx private to kern_ktrace.c as a result. MFC after: 1 month	2010-10-21 19:17:40 +00:00
David Xu	cf7d9a8ca8	Create a global thread hash table to speed up thread lookup, use rwlock to protect the table. In old code, thread lookup is done with process lock held, to find a thread, kernel has to iterate through process and thread list, this is quite inefficient. With this change, test shows in extreme case performance is dramatically improved. Earlier patch was reviewed by: jhb, julian	2010-10-09 02:50:23 +00:00
Rui Paulo	79856499bd	Add an extra comment to the SDT probes definition. This allows us to get use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]	2010-08-22 11:18:57 +00:00
John Baldwin	ad6eec7b9e	Tweak the in-kernel API for sending signals to threads: - Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c. - Add tdsignal() and tdksignal() routines that mirror psignal() and pksignal() except that they accept a thread as an argument instead of a process. They send a signal to a specific thread rather than to an individual process. Reviewed by: kib	2010-06-29 20:41:52 +00:00
Ed Schouten	8dc9b4cf04	Let access overriding to TTYs depend on the cdev_priv, not the vnode. Basically this commit changes two things, which improves access to TTYs in exceptional conditions. Basically the problem was that when you ran jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if you want to attach to screens quickly, use ssh(1), etc. The fixes: - Cache the cdev_priv of the controlling TTY in struct session. Change devfs_access() to compare against the cdev_priv instead of the vnode. This allows you to bypass UNIX permissions, even across different mounts of devfs. - Extend devfs_prison_check() to unconditionally expose the device node of the controlling TTY, even if normal prison nesting rules normally don't allow this. This actually allows you to interact with this device node. To be honest, I'm not really happy with this solution. We now have to store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp). In an ideal world, we should just get rid of the latter two and only use s_ttyp, but this makes certian pieces of code very impractical (e.g. devfs, kern_exit.c). Reported by: Many people	2009-12-19 18:42:12 +00:00
Konstantin Belousov	6728dc169d	Refine r195509, instead of checking that vnode type is VBAD, that is set quite late in the revocation path, properly verify that vnode is not doomed before calling VOP. Reported and tested by: Harald Schmalzbauer <h.schmalzbauer omnilan de> MFC after: 3 days	2009-10-10 21:17:30 +00:00
Marius Strobl	5486ffc898	Add a temporary workaround which just lets init die instead of causing a panic if it is killed due to a unsolved stack overflow seen very late during shutdown on sparc64 when the gmirror worker process exists, which is a regression introduced in 8.0. Reviewed by: kib MFC after: 3 days	2009-08-26 21:10:47 +00:00
Jamie Gritton	7afcbc18b3	Remove the interim vimage containers, struct vimage and struct procg, and the ioctl-based interface that supported them. Approved by: re (kib), bz (mentor)	2009-07-17 14:48:21 +00:00
Konstantin Belousov	7c6d401c75	The control terminal revocation at the session leader exit does not correctly checks for reclaimed vnode, possibly calling VOP_REVOKE for such vnode. If the terminal is already revoked, or devfs mount was forcibly unmounted, the revocation of doomed ctty vnode causes panic. Reported and tested by: lstewart Approved by: re (kensmith) MFC after: 2 weeks	2009-07-09 18:54:38 +00:00
Robert Watson	2ef24dde7c	udit the 'options' argument to wait4(2). Approved by: re (kib) MFC after: 3 days	2009-07-01 12:36:10 +00:00
Robert Watson	14961ba789	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week	2009-06-27 13:58:44 +00:00
Ed Schouten	eaaaf1906b	Perform some more cleanups to in-kernel session handling. The code that was in place in exit1() was mainly based on code from the old TTY layer. The main reason behind this, was because at one moment I ran a system that had two TTY layers in place at the same time. It is now sufficient to do the following: - Remove references from the session structure to the TTY vnode and the session leader. - If we have a controlling TTY and the session used by the TTY is equal to our session, send the SIGHUP. - If we have a vnode to the controlling TTY which has not been revoked, revoke it. While there, change sys/kern/tty.c to use s_ttyp in the comparison instead of s_ttyvp. It should not make any difference, because s_ttyvp can only become null when the session leader already left, but it's nicer to compare against the proper value.	2009-06-15 20:45:51 +00:00
Ed Schouten	9c373a81a3	Make tcsetsid(3) work on revoked TTYs. Right now the only way to make tcsetsid(3)/TIOCSCTTY work, is by ensuring the session leader is dead. This means that an application that catches SIGHUPs and performs a sleep prevents us from assigning a new session leader. Change the code to make it work on revoked TTYs as well. This allows us to change init(8) to make the shutdown script run in a more clean environment.	2009-06-15 19:17:52 +00:00
Robert Watson	324fb6be0f	Move zombie-reaping code out of kern_wait() and into its own function, proc_reap(). Reviewed by: jhb MFC after: 3 days Sponsored by: Google, Inc.	2009-06-08 15:26:09 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Jamie Gritton	0304c73163	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)	2009-05-27 14:11:23 +00:00
Marko Zec	29b02909eb	Introduce a new virtualization container, provisionally named vprocg, to hold virtualized instances of hostname and domainname, as well as a new top-level virtualization struct vimage, which holds pointers to struct vnet and struct vprocg. Struct vprocg is likely to become replaced in the near future with a new jail management API import. As a consequence of this change, change struct ucred to point to a struct vimage, instead of directly pointing to a vnet. Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage branch. Permit kldload / kldunload operations to be executed only from the default vimage context. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz Approved by: julian (mentor)	2009-05-08 14:11:06 +00:00
Konstantin Belousov	6fae832ad7	Fix typo. Noted by: jhb MFC after: 2 weeks	2009-04-20 15:10:03 +00:00
Konstantin Belousov	007abb3d0f	On the exit of the child process which parent either set SA_NOCLDWAIT or ignored SIGCHLD, unconditionally wake up the parent instead of doing this only when the child is a last child. This brings us in line with other U**xes that support SA_NOCLDWAIT. If the parent called waitpid(childpid), then exit of the child should wake up the parent immediately instead of forcing it to wait for all children to exit. Reported by: Alan Ferrency <alan pair com> Submitted by: Jilles Tjoelker <jilles stack nl> PR: 108390 MFC after: 2 weeks	2009-04-20 14:34:55 +00:00
Ed Schouten	c90c9021e9	Remove even more unneeded variable assignments. kern_time.c: - Unused variable `p'. kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread(). kern_sig.c: - Unused variable `code'. kern_synch.c: - `rval' is always assigned in all different cases. kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on. kern_malloc.c: - `size' is always initialized with the proper value before being used. kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value. kern_exec.c: - `len' is always assigned inside the if-statement right below it. tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC(). Found by: LLVM's scan-build	2009-02-26 15:51:54 +00:00
Konstantin Belousov	aeb325719a	Several threads in a process may do vfork() simultaneously. Then, all parent threads sleep on the parent' struct proc until corresponding child releases the vmspace. Each sleep is interlocked with proc mutex of the child, that triggers assertion in the sleepq_add(). The assertion requires that at any time, all simultaneous sleepers for the channel use the same interlock. Silent the assertion by using conditional variable allocated in the child. Broadcast the variable event on exec() and exit(). Since struct proc * sleep wait channel is overloaded for several unrelated events, I was unable to remove wakeups from the places where cv_broadcast() is added, except exec(). Reported and tested by: ganbold Suggested and reviewed by: jhb MFC after: 2 week	2008-12-05 20:50:24 +00:00
Bjoern A. Zeeb	413628a7e3	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
David Xu	904c5ec4e3	Move per-thread userland debugging flags into seperated field, this eliminates some problems of locking, e.g, a thread lock is needed but can not be used at that time. Only the process lock is needed now for new field.	2008-10-15 06:31:37 +00:00
David Xu	3eb8b8bbeb	Don't remove queued SIGCHLD if options contain WNOWAIT, so other threads still can be notified by the signal.	2008-08-29 01:34:05 +00:00
Konstantin Belousov	cbc158449b	Implement WNOWAIT flag for wait4(2). It specifies that process whose status is returned shall be kept in the waitable state. Add WSTOPPED as an alias for WUNTRACED. Submitted by: Jukka Ukkonen <jau at iki fi> PR: standards/116221 MFC after: 2 weeks	2008-08-26 12:37:16 +00:00
Ed Schouten	bc093719ca	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan	2008-08-20 08:31:58 +00:00
John Birrell	5d217f173c	Add DTrace 'proc' provider probes using the Statically Defined Trace (sdt) mechanism.	2008-05-24 06:22:16 +00:00
Poul-Henning Kamp	4218a7310b	In abort2(2): Accept a NULL arg pointer if nargs == 0	2008-03-22 16:32:52 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
Kris Kennaway	e17660e79c	Switch from conditionally dropping Giant in exit1() to asserting it is not held, which appears to be always true.	2008-02-17 15:28:28 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Julian Elischer	7ab24ea3b9	Introduce a way to make pure kernal threads. kthread_add() takes the same parameters as the old kthread_create() plus a pointer to a process structure, and adds a kernel thread to that process. kproc_kthread_add() takes the parameters for kthread_add, plus a process name and a pointer to a pointer to a process instead of just a pointer, and if the proc * is NULL, it creates the process to the specifications required, before adding the thread to it. All other old kthread_xxx() calls return, but act on (struct thread ) instead of (struct proc ). One reason to change the name is so that any old kernel modules that are lying around and expect kthread_create() to make a process will not just accidentally link. fix top to show kernel threads by their thread name in -SH mode add a tdnam formatting option to ps to show thread names. make all idle threads actual kthreads and put them into their own idled process. make all interrupt threads kthreads and put them in an interd process (mainly for aesthetic and accounting reasons) rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper' man page fixes to follow.	2007-10-26 08:00:41 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
John Baldwin	34a9edafbc	Improve the ktrace locking somewhat to reduce overhead: - Depessimize userret() in kernels where KTRACE is enabled by doing an unlocked check of the per-process queue of pending events before acquiring any locks. Previously ktr_userret() unconditionally acquired the global ktrace_sx lock on every return to userland for every thread, even if ktrace wasn't enabled for the thread. - Optimize the locking in exit() to first perform an unlocked read of p_traceflag to see if ktrace is enabled and only acquire locks and teardown ktrace if the test succeeds. Also, explicitly disable tracing before draining any pending events so the pending events actually get written out. The unlocked read is safe because proc lock is acquired earlier after single-threading so p_traceflag can't change between then and this check (well, it can currently due to a bug in ktrace I will fix next, but that race existed prior to this change as well). Reviewed by: rwatson	2007-06-13 20:01:42 +00:00
Attilio Rao	a1fe14bc33	rufetch and calcru sometimes should be called atomically together. This patch fixes places where they should be called atomically changing their locking requirements (both assume per-proc spinlock held) and introducing rufetchcalc which wrappers both calls to be performed in atomic way. Reviewed by: jeff Approved by: jeff (mentor)	2007-06-09 21:48:44 +00:00
Attilio Rao	a140976eb4	The current rusage code show peculiar problems: - Unsafeness on ruadd() in thread_exit() - Unatomicity of thread_exiit() in the exit1() operations This patch addresses these problems allocating p_fd as part of the process and modifying the way it is accessed. A small chunk of this patch, resolves a race about p_state in kern_wait(), since we have to be sure about the zombif-ing process. Submitted by: jeff Approved by: jeff (mentor)	2007-06-09 18:56:11 +00:00
Robert Watson	faef53711b	Move per-process audit state from a pointer in the proc structure to embedded storage in struct ucred. This allows audit state to be cached with the thread, avoiding locking operations with each system call, and makes it available in asynchronous execution contexts, such as deep in the network stack or VFS. Reviewed by: csjp Approved by: re (kensmith) Obtained from: TrustedBSD Project	2007-06-07 22:27:15 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
John Baldwin	1bba2a940b	Move cpu_exit() earlier in exit1() to close a race between SIGCHLD/kevent(2) notification of process termination and wait(). Now we no longer drop locks between sending the notification and marking the process as a zombie. Previously, if another process attempted to do a wait() with W_NOHANG after receiving a SIGCHLD or kevent and locked the process while the exiting thread was in cpu_exit(), then wait() would fail to find the process, which is quite astonishing to the process calling wait(). MFC after: 3 days	2007-05-14 22:21:58 +00:00
John Baldwin	aa89d8cd52	Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, rwlocks, and sx locks to 'lock_object'.	2007-03-21 21:20:51 +00:00
Robert Watson	873fbcd776	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.	2007-03-05 13:10:58 +00:00
Robert Watson	0c14ff0eb5	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.	2007-03-04 22:36:48 +00:00
David Xu	ff7668079f	Move sigqueue_take() call into proc_reparent(), this fixed bugs where proc_reparent() is called but sigqueue_take() is forgotten.	2006-10-25 06:18:04 +00:00
David Xu	e94cc4ac30	Protect sigqueue_take() call by child process's lock, it fixed a potential race with ptrace 'attach' which changes parent of the child process.	2006-10-24 12:04:21 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
David Xu	f71e748d89	Since revision 1.333 of kern_sig.c no longer uses P_WEXIT, the change opened a race window which can cause memory leak in signal queue. Here we free memory for signal queue when process state is set to PRS_ZOMBIE.	2006-10-21 23:59:15 +00:00
Christian S.J. Peron	7ca6b7823d	Back out one of the Giant removals from revision 1.272. Giant was not here to protect the vnode, it was present to synchronize access to TTY session information between exit(2) and the TTY code. While we are here, note that Giant is required for TTY protection. Clue from: bde Discussed with: jhb MFC after: 1 week	2006-09-13 15:47:53 +00:00
Tor Egge	57051fdc4b	Close race between vmspace_exitfree() and exit1() and races between vmspace_exitfree() and vmspace_free() which could result in the same vmspace being freed twice. Factor out part of exit1() into new function vmspace_exit(). Attach to vmspace0 to allow old vmspace to be freed earlier. Add new function, vmspace_acquire_ref(), for obtaining a vmspace reference for a vmspace belonging to another process. Avoid changing vmspace refcount from 0 to 1 since that could also lead to the same vmspace being freed twice. Change vmtotal() and swapout_procs() to use vmspace_acquire_ref(). Reviewed by: alc	2006-05-29 21:28:56 +00:00
Christian S.J. Peron	d5e5634075	Kill the last Giant acquisition in the exit(2) code. This Giant acquisition doesn't appear to be protecting anything. Most of consumers funsetownlst(9) do not appear to be picking up Giant anywhere. This was originally a part of my Giant exit(2) clean up revision 1.272 but I thought it was a good idea to leave it out until we were able to analyze it better. Tested by: kris MFC after: 3 weeks	2006-04-10 14:07:28 +00:00
Peter Wemm	b9eee07e36	Remove the unused sva and eva arguments from pmap_remove_pages().	2006-04-03 21:16:10 +00:00
David Xu	e170bfda56	1. Count last time slice, this intends to fix "calcru: runtime went backwards" bug for threaded process. 2. Add comment about possible logical problem with scheduler. MFC after: 3 days	2006-03-14 04:00:21 +00:00
John Baldwin	06ad42b2f7	Close some races between procfs/ptrace and exit(2): - Reorder the events in exit(2) slightly so that we trigger the S_EXIT stop event earlier. After we have signalled that, we set P_WEXIT and then wait for any processes with a hold on the vmspace via PHOLD to release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops to zero. - Change proc_rwmem() to require that the processing read from has its vmspace held via PHOLD by the caller and get rid of all the junk to screw around with the vmspace reference count as we no longer need it. - In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it doesn't exist. - Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem() to clear an earlier single-step simualted via a breakpoint). We only do one to avoid races. Also, by making the EINVAL error for unknown requests be part of the default: case in the switch, the various switch cases can now just break out to return which removes a _lot_ of duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug where a LWP ptrace command could return EINVAL with the proc lock still held. - Changed the locking for ptrace_single_step(), ptrace_set_pc(), and ptrace_clear_single_step() to always be called with the proc lock held (it was a mixed bag previously). Alpha and arm have to drop the lock while the mess around with breakpoints, but other archs avoid extra lock release/acquires in ptrace(). I did have to fix a couple of other consumers in kern_kse and a few other places to hold the proc lock and PHOLD. Tested by: ps (1 mostly, but some bits of 2-4 as well) MFC after: 1 week	2006-02-22 18:57:50 +00:00
John Baldwin	bd106be404	Move the ruadd() in kern_exit() to save our final stats in our child stats even further down in exit1() so that it includes the runtime and tick counts from the final time slice for the dying thread. Reviewed by: phk	2006-02-21 21:48:42 +00:00
Poul-Henning Kamp	e8444a7e6f	CPU time accounting speedup (step 2) Keep accounting time (in per-cpu) cputicks and the statistics counts in the thread and summarize into struct proc when at context switch. Don't reach across CPUs in calcru(). Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines). Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true. Use 27MHz counter on i386/Geode. Use TSC on amd64 & i386 if present. Use tick counter on sparc64	2006-02-11 09:33:07 +00:00
Poul-Henning Kamp	5b1a8eb397	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.	2006-02-07 21:22:02 +00:00
John Baldwin	934ba9b2cf	- Move the wakeup() for exiting kthreads out of exit1() and into kthread_exit() as that is cleaner and less obscured. It also does the wakeup sooner. - Add some comments to kthread_exit().	2006-02-06 21:56:13 +00:00
Wayne Salamon	de3007e8f3	Audit the pid being requested in wait4(). Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-02-06 00:19:09 +00:00
Robert Watson	00c28d9678	On process exit, audit the return value of the process, and commit the record immediately, as this system call never returns. Obtained from: TrustedBSD Project	2006-02-05 21:08:25 +00:00
John Baldwin	37f84a6018	Add a comment.	2006-02-03 21:09:40 +00:00
Robert Watson	fcf7f27a36	Hook up audit to fork() and exit() events. These changes manage the audit state on processes, not auditing of these events. Much work by: wsalamon Obtained from: TrustedBSD Project	2006-02-02 01:32:58 +00:00
Stephan Uphoff	03001f59c8	Hopefully fix the "calcru: runtime went backwards from ..." problem by keeping the resource values locked (where needed) while we use them for calculations. MFC after: 3 days	2006-01-23 19:15:13 +00:00
Poul-Henning Kamp	25f6e35a05	Regenerate sysent with new abort2 system call. Implement abort2(const char reason, int narg, void *args); Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>	2005-12-23 11:58:42 +00:00
David Xu	d26b1a1fb9	Register itimers_event_hook as a kernel event handler, so I don't have to duplicate code to call it in exec() and exit1().	2005-12-09 05:43:26 +00:00
Robert Watson	2c255e9df6	Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>	2005-11-13 13:27:44 +00:00
Christian S.J. Peron	576068804d	Giant clean up for exit(2) -Change unconditional aquisition of Giant to only pickup Giant if the vnode for the controlling tty resides on a non-mpsafe file system. -Pickup Giant around executable vnode reference counting operations only if the executable resides on a non-mpsafe file system. -If this process is being traced, pickup Giant for trace file reference count operations only if it resides on a non-mpsafe file system. Discussed with: jhb Tested by: kris	2005-11-08 17:11:03 +00:00
David Xu	ebceaf6dc7	Add support for queueing SIGCHLD same as other UNIX systems did. For each child process whose status has been changed, a SIGCHLD instance is queued, if the signal is stilling pending, and process changed status several times, signal information is updated to reflect latest process status. If wait() returns because the status of a child process is available, pending SIGCHLD signal associated with the child process is discarded. Any other pending SIGCHLD signals remain pending. The signal information is allocated at the same time when proc structure is allocated, if process signal queue is fully filled or there is a memory shortage, it can still send the signal to process. There is a booting time tunable kern.sigqueue.queue_sigchild which can control the behavior, setting it to zero disables the SIGCHLD queueing feature, the tunable will be removed if the function is proved that it is stable enough. Tested on: i386 (SMP and UP)	2005-11-08 09:09:26 +00:00
John Baldwin	68a17869c1	Push down Giant into fdfree() and remove it from two of the callers. Other callers such as some rfork() cases weren't locking Giant anyway. Reviewed by: csjp MFC after: 1 week	2005-11-01 17:13:05 +00:00
Gleb Smirnoff	c0bc2867c1	- Fix leak of struct nlminfo on process exit. - Fix malloc type collision, that made the above problem difficult to understand. Reported by: Vladimir Sharun <sharun ukr.net>	2005-10-26 07:18:37 +00:00
David Xu	60354683d9	Make p_itimers as a pointer, so file sys/proc.h does not need to include sys/timers.h.	2005-10-23 12:19:08 +00:00
David Xu	86857b368d	Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC clock are supported. I have plan to merge XSI timer ITIMER_REAL and other two CPU timers into the new code, current three slots are available for the XSI timers. The SIGEV_THREAD notification type is not supported yet because our sigevent struct lacks of two member fields: sigev_notify_function sigev_notify_attributes I have found the sigevent is used in AIO, so I won't add the two members unless the AIO code is adjusted.	2005-10-23 04:22:56 +00:00
David Xu	9104847f21	1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64	2005-10-14 12:43:47 +00:00
John Baldwin	83de502d59	Add witness warnings to panic if a thread tries to exit while holding any locks. Requested by: jeff MFC after: 3 days	2005-09-02 20:20:01 +00:00
John Baldwin	ddf9c4f771	- Slightly reorder the events around the setting of PRS_ZOMBIE to be less hokie and much more readable and expand the comment to explain why it is the way that it is. - Close a race where one CPU could free the process belonging to a thread on another CPU that hasn't quite finished exiting yet but is beyond the point of setting the process state as PRS_ZOMBIE. Reported and tested by: ps (2) MFC after: 3 days	2005-07-18 20:08:14 +00:00
Stephan Uphoff	d13ec71369	Use low level constructs borrowed from interrupt threads to wait for work in proc0. Remove the TDP_WAKEPROC0 workaround.	2005-05-23 23:01:53 +00:00
David Xu	5a2f73e624	Only check signal event, single threading event shouldn't be reported.	2005-05-05 06:42:02 +00:00
David Xu	bc247e78c0	Wake up swapper process if needed. PR: kern/78474 Submitted by: Sam Lawrance <boris at brooknet dot com dot au>	2005-04-23 05:06:44 +00:00
David Xu	902c0d8297	Clear P_STATCHILD earlier to avoid unnecessary retrying.	2005-04-19 12:31:15 +00:00
David Xu	95992d56f5	Fix a race condition between kern_wait() and thread_stopped(). Problem is in kern_wait(), parent process steps through children list, once a child process is skipped, and later even if the child is stopped, parent process still sleeps in msleep(), the race happens if parent masked SIGCHLD. Submitted by : Peter Edwards peadar.edwards at gmail dot com MFC after : 4 days	2005-04-19 08:07:28 +00:00
Robert Watson	babe9a2bb3	Introduce p_canwait() and MAC Framework and MAC Policy entry points mac_check_proc_wait(), which control the ability to wait4() specific processes. This permits MAC policies to limit information flow from children that have changed label, although has to be handled carefully due to common programming expectations regarding the behavior of wait4(). The cr_seeotheruids() check in p_canwait() is #if 0'd for this reason. The mac_stub and mac_test policies are updated to reflect these new entry points. Sponsored by: SPAWAR, SPARTA Obtained from: TrustedBSD Project	2005-04-18 13:36:57 +00:00
Jeff Roberson	2b3183a8b7	- A lock is required before calling VOP_REVOKE. Our reference protects us from accessing another vnode so a naked VOP_LOCK is sufficient. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:47:04 +00:00
Poul-Henning Kamp	a258707313	In 1.276 of kern/subr_trap.c I introduced a mechanism for delaying a process return to userspace if it had pending GEOM events. We need to have the same check in the exit pass to catch the case where a GEOM related filedescriptor is not explicitly closed by the process. Bumped into by: people using dd(1) to build releases, nanobsd etc.	2005-01-29 14:03:41 +00:00
Robert Watson	fd544ee8f7	In kern_wait(), let the compiler copy the rusage structure rather than an explicit bcopy() -- it probably does a better job.	2005-01-08 04:17:48 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
John Baldwin	c88379381b	- Move the function prototypes for kern_setrlimit() and kern_wait() to sys/syscallsubr.h where all the other kern_foo() prototypes live. - Resort kern_execve() while I'm there.	2005-01-05 22:19:44 +00:00
David Schultz	6004362e66	Don't include sys/user.h merely for its side-effect of recursively including other headers.	2004-11-27 06:51:39 +00:00
David Xu	c283653201	Remove P_STOPPED_TRACE bit if debugger dies without a chance to detach debugged process.	2004-10-23 11:20:26 +00:00
John Baldwin	78c85e8dfc	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month	2004-10-05 18:51:11 +00:00
John Baldwin	a9a64385e7	Some more whitespace, style, and comment fixes. Submitted by: bde (mostly)	2004-09-24 20:27:04 +00:00
John Baldwin	6111dcd2ef	A modest collection of various and sundry style, spelling, and whitespace fixes. Submitted by: bde (mostly)	2004-09-24 00:38:15 +00:00
John Baldwin	7eaec467d8	Various small style fixes.	2004-09-22 15:24:33 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Alan Cox	9be60284a6	Giant is no longer required by vm_waitproc() and vmspace_exitfree(). Eliminate it acquisition and release around vm_waitproc() in kern_wait().	2004-07-30 20:31:02 +00:00
Alan Cox	1a276a3f91	- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.	2004-07-27 03:53:41 +00:00
Julian Elischer	55d44f79ea	When calling scheduler entrypoints for creating new threads and processes, specify "us" as the thread not the process/ksegrp/kse. You can always find the others from the thread but the converse is not true. Theorotically this would lead to runtime being allocated to the wrong entity in some cases though it is not clear how often this actually happenned. (would only affect threaded processes and would probably be pretty benign, but it WAS a bug..) Reviewed by: peter	2004-07-18 23:36:13 +00:00
David Xu	cbf4e354ec	Add code to support debugging threaded process. 1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel thread current user thread is running on. Add tm_dflags into kse_thr_mailbox, the flags is written by debugger, it tells UTS and kernel what should be done when the process is being debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER. TMDF_SSTEP is used to tell kernel to turn on single stepping, or turn off if it is not set. TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall whenever possible, to UTS, it means do not run the user thread until debugger clears it, this behaviour is necessary because gdb wants to resume only one thread when the thread's pc is at a breakpoint, and thread needs to go forward, in order to avoid other threads sneak pass the breakpoints, it needs to remove breakpoint, only wants one thread to go. Also, add km_lwp to kse_mailbox, the lwp id is copied to kse_thr_mailbox at context switch time when process is not being debugged, so when process is attached, debugger can map kernel thread to user thread. 2. Add p_xthread to proc strcuture and td_xsig to thread structure. p_xthread is used by a thread when it wants to report event to debugger, every thread can set the pointer, especially, when it is used in ptracestop, it is the last thread reporting event will win the race. Every thread has a td_xsig to exchange signal with debugger, thread uses TDF_XSIG flag to indicate it is reporting signal to debugger, if the flag is not cleared, thread will keep retrying until it is cleared by debugger, p_xthread may be used by debugger to indicate CURRENT thread. The p_xstat is still in proc structure to keep wait() to work, in future, we may just use td_xsig. 3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend a thread. When process stops, debugger can set the flag for thread, thread will check the flag in thread_suspend_check, enters a loop, unless it is cleared by debugger, process is detached or process is existing. The flag is also checked in ptracestop, so debugger can temporarily suspend a thread even if the thread wants to exchange signal. 4. Current, in ptrace, we always resume all threads, but if a thread has already a TDF_DBSUSPEND flag set by debugger, it won't run. Encouraged by: marcel, julian, deischen	2004-07-13 07:20:10 +00:00
Alan Cox	ce8da3091f	Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.) Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().	2004-07-13 02:49:22 +00:00
Marcel Moolenaar	fbc3247d81	Implement the PT_LWPINFO request. This request can be used by the tracing process to obtain information about the LWP that caused the traced process to stop. Debuggers can use this information to select the thread currently running on the LWP as the current thread. The request has been made compatible with NetBSD for as much as possible. This implementation differs from NetBSD in the following ways: 1. The data argument is allowed to be smaller than the size of the ptrace_lwpinfo structure known to the kernel, but not 0. This is opposite to what NetBSD allows. The reason for this is that we can extend the structure without affecting older binaries. 2. On NetBSD the tracing process is to set the pl_lwpid field to the Id of the LWP it wants information of. We don't do that. Our ptrace interface allows passing the LWP Id instead of the PID. The tracing process is to set the PID to the LWP Id it wants information of. 3. When the PID is actually the PID of the tracing process, this request returns the information about the LWP that caused the process to stop. This was the whole purpose of the request in the first place. When the traced process has exited, this request will return the LWP Id 0, indicating that the process state is not the result of an event specific to a LWP.	2004-07-12 05:07:50 +00:00
Bruce Evans	40a3fa2d59	(1) Removed the bogus condition "p->p_pid != 1" on calling sched_exit() from exit1(). sched_exit() must be called unconditionally from exit1(). It was called almost unconditionally because the only exits on system shutdown if at all. (2) Removed the comment that presumed to know what sched_exit() does. sched_exit() does different things for the ULE case. The call became essential when it started doing load average stuff, but its caller should not know that. (3) Didn't fix bugs caused by bitrot in the condition. The condition was last correct in rev.1.208 when it was in wait1(). There p was spelled curthread->td_proc and was for the waiting parent; now p is for the exiting child. The condition was to avoid lowering init's priority. It should be in sched_exit() itself. Lowering of priorities is broken in other ways in at least the 4BSD scheduler, and doing it for init causes less noticeable problems than doing it for for shells. Noticed by: julian (1)	2004-06-21 14:49:50 +00:00
Bruce Evans	871684b822	Update p_runtime on exit. This fixes calcru() on zombies, and prepares for not calling calcru() on exit. calcru() on a zombie can happen if ttyinfo() (^T) picks one. PR: 52490	2004-06-21 14:03:38 +00:00
David Xu	b370279ef8	Add comment to reflect that we should retry after thread singling failed.	2004-06-18 11:13:49 +00:00
David Xu	0aabef657e	Remove a bogus panic. It is possible more than one threads will be suspended in thread_suspend_check, after they are resumed, all threads will call thread_single, but only one can be success, others should retry and will exit in thread_suspend_check.	2004-06-18 06:21:09 +00:00
Tim J. Robbins	f55530b436	Remove remnants of PGINPROF.	2004-06-08 10:37:30 +00:00
Tim J. Robbins	aa0aa7a113	Move TDF_SA from td_flags to td_pflags (and rename it accordingly) so that it is no longer necessary to hold sched_lock while manipulating it. Reviewed by: davidxu	2004-06-02 07:52:36 +00:00
Thomas Moestl	65e29c4822	Retire cpu_sched_exit(); it is not used any more.	2004-05-26 12:09:39 +00:00
David Xu	702ac0f112	Clear KSE thread flags after KSE thread mode is ended. The side effect of not clearing the flags for execv() syscall will result that a new program runs in KSE thread mode without enabling it. Submitted by: tjr Modified by: davidxu	2004-05-21 14:50:23 +00:00
Julian Elischer	b324899838	Remove misplaced duplicate comment and slightly reformat the version that was in the right place.	2004-05-09 22:29:14 +00:00
Warner Losh	7f8a436ff2	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
Brian Feldman	150883179a	Add the missing Giant when doing anything with VFS -- in this case, releasing the ktrace vnode.	2004-03-18 18:15:58 +00:00
John Baldwin	b7e23e826c	- Replace wait1() with a kern_wait() function that accepts the pid, options, status pointer and rusage pointer as arguments. It is up to the caller to copyout the status and rusage to userland if needed. This lets us axe the 'compat' argument and hide all that functionality in owait(), by the way. This also cleans up some locking in kern_wait() since it no longer has to drop locks around copyout() since all the copyout()'s are deferred. - Convert owait(), wait4(), and the various ABI compat wait() syscalls to use kern_wait() rather than wait1() or wait4(). This removes a bit more stackgap usage. Tested on: i386 Compiled on: i386, alpha, amd64	2004-03-17 20:00:00 +00:00
Peter Wemm	a5bdcb2a2f	Make the process_exit eventhandler run without Giant. Add Giant hooks in the two consumers that need it.. processes using AIO and netncp. Update docs. Say that process_exec is called with Giant, but not to depend on it. All our consumers can handle it without Giant.	2004-03-14 02:06:28 +00:00
Peter Wemm	37814395c1	Push Giant down a little further: - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() s not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe	2004-03-13 22:31:39 +00:00
John Baldwin	4ae89b957c	- Push down Giant in exit() and wait(). - Push Giant down a bit in coredump() and call coredump() with the proc lock already held rather than unlocking it only to turn around and relock it. Requested by: peter	2004-03-05 22:39:53 +00:00
John Baldwin	e5bb601d87	Drop sched_lock around the wakeup of the parent process after setting the process state to zombie when a process exits to avoid a lock order reversal with the sleepqueue locks. This appears to be the only place that we call wakeup() with sched_lock held.	2004-02-27 18:39:09 +00:00
Don Lewis	6567eef757	A Linux thread created using clone() should not send SIGCHLD to its parent if no signal is specified in the clone() flags argument. PR: 42457 MFC after: 2 weeks	2004-02-19 06:43:48 +00:00
Don Lewis	55b5f2a202	When reparenting a process to init, make sure that p_sigparent is set to SIGCHLD. This avoids the creation of orphaned Linux-threaded zombies that init is unable to reap. This can occur when the parent process sets its SIGCHLD to SIG_IGN. Fix a similar situation in the PT_DETACH code. Tested by: "Steven Hartland" <killing AT multiplay.co.uk>	2004-02-11 22:06:02 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Robert Watson	679365e7b9	Reduce gratuitous includes: don't include jail.h if it's not needed. Presumably, at some point, you had to include jail.h if you included proc.h, but that is no longer required. Result of: self injury involving adding something to struct prison	2004-01-21 17:10:47 +00:00
Olivier Houchard	1a29c80648	Better fix than my previous commit: in exit1(), make sure the p_klist is empty after sending NOTE_EXIT. The process won't report fork() or execve() and won't be able to handle NOTE_SIGNAL knotes anyway. This fixes some race conditions with do_tdsignal() calling knote() while the process is exiting. Reported by: Stefan Farfeleder <stefan@fafoe.narf.at> MFC after: 1 week	2003-11-14 18:49:01 +00:00
David Xu	0e2a4d3aeb	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.	2003-06-15 00:31:24 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
John Baldwin	5499ea019d	Wait for the real interval timer callout handler to finish executing if it is currently executing when we try to remove it in exit1(). Without this, it was possible for the callout to bogusly rearm itself and eventually refire after the process had been free'd resulting in a panic. PR: kern/51964 Reported by: Jilles Tjoelker <jilles@stack.nl> Reviewed by: tegge, bde	2003-06-09 21:46:22 +00:00
John Baldwin	90af4afacb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)	2003-05-13 20:36:02 +00:00
John Baldwin	7d447c956b	Initialize and destroy the struct proc mutex in the proc zone's init and fini routines instead of in fork() and wait(). This has the nice side benefit that the proc lock of any process on the allproc list is always valid and sched_lock doesn't have to be used to test against PRS_NEW anymore.	2003-05-01 21:16:38 +00:00
John Baldwin	112afcb232	- Protect p_numthreads with the sched_lock. - Protect p_singlethread with both the sched_lock and the proc lock. - Protect p_suspcount with the proc lock.	2003-04-23 18:46:51 +00:00
David Xu	11b20c685b	Fix lock order reversal problem.	2003-04-21 14:42:04 +00:00
John Baldwin	462f31bff0	Adjust a few comments.	2003-04-17 22:22:47 +00:00
Jeff Roberson	f6f230febe	- Adjust sched hooks for fork and exec to take processes as arguments instead of ksegs since they primarily operation on processes. - KSEs take ticks so pass the kse through sched_clock(). - Add a sched_class() routine that adjusts a ksegrp pri class. - Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp} that will be used to tell the scheduler about new instances of these structures within the same process. These will be used by THR and KSE. - Change sched_4bsd to reflect this API update.	2003-04-11 03:39:07 +00:00
Jeff Roberson	2c10d16a4b	- Borrow the KSE single threading code for exec and exit. We use the check if (p->p_numthreads > 1) and not a flag because action is only necessary if there are other threads. The rest of the system has no need to identify thr threaded processes. - In kern_thread.c use thr_exit1() instead of thread_exit() if P_THREADED is not set.	2003-04-01 01:26:20 +00:00
Jeff Roberson	4093529dee	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.	2003-03-31 22:49:17 +00:00
John Baldwin	75b8b3b25c	Replace the at_fork, at_exec, and at_exit functions with the slightly more flexible process_fork, process_exec, and process_exit eventhandlers. This reduces code duplication and also means that I don't have to go duplicate the eventhandler locking three more times for each of at_fork, at_exec, and at_exit. Reviewed by: phk, jake, almost complete silence on arch@	2003-03-24 21:15:35 +00:00
Dag-Erling Smørgrav	830c3153c6	Unregisterize, ansify.	2003-03-19 00:49:40 +00:00
Dag-Erling Smørgrav	4e8074eba2	Whitespace cleanup.	2003-03-19 00:33:38 +00:00
John Baldwin	a5881ea55a	- Cache a reference to the credential of the thread that starts a ktrace in struct proc as p_tracecred alongside the current cache of the vnode in p_tracep. This credential is then used for all later ktrace operations on this file rather than using the credential of the current thread at the time of each ktrace event. - Now that we have multiple ktrace-related items in struct proc that are pointers, rename p_tracep to p_tracevp to make it less ambiguous. Requested by: rwatson (1)	2003-03-13 18:24:22 +00:00
Tim J. Robbins	6ec62361c8	Tidy up previous change: move comment about obtaining an exclusive reference where it belongs, and remove a blank line to make it more obvious what the comment applies to.	2003-03-13 00:57:47 +00:00
Tim J. Robbins	3890793e9c	In wait1(), remove the zombie process from zombproc before removing it from its pgrp to avoid leaving zombies around with p_pgrp == NULL. This bug was apparent as a NULL-dereference in the pid selection code in fork1().	2003-03-12 11:10:04 +00:00
David Xu	e574e444e0	Fix threaded process job control bug. SMP tested. Reviewed by: julian	2003-03-11 00:07:53 +00:00
Julian Elischer	ac2e415327	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.	2003-02-27 02:05:19 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00

1 2 3 4 5 ...

444 Commits