freebsd-dev

Author	SHA1	Message	Date
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still requires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
Pawel Jakub Dawidek	2342d5216e	Remove duplicated $FreeBSD$.	2006-09-30 16:33:29 +00:00
Martin Blapp	8be563721a	Move Giant up even further since P_CONTROLT isn't really fully locked yet (p_flag is, but P_CONTROLT isn't really). Submitted by: jhb	2006-09-27 16:42:10 +00:00
Martin Blapp	45e6819160	Protect enterpgrp() against another tty/proc race case until the tty locking work has been fixed. MFC after: 1 week	2006-09-23 17:35:24 +00:00
Martin Blapp	d7b167b57b	Fix races between tty.c and sessrele() / doenterpgrp() / leavepgrp(). The tty code is still under giant lock, but the session/pgrp release code just used proctree_locks. This explains why moving the proctree_lock in sys/kern/tty.c rev. 1.258 did fix the panics in our SMP systems. This should also fix some race panics with revoked ttys. Reviewed by: jhb MFC after: 1 week	2006-09-19 19:25:11 +00:00
Poul-Henning Kamp	e8444a7e6f	CPU time accounting speedup (step 2) Keep accounting time (in per-cpu) cputicks and the statistics counts in the thread and summarize into struct proc when at context switch. Don't reach across CPUs in calcru(). Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines). Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true. Use 27MHz counter on i386/Geode. Use TSC on amd64 & i386 if present. Use tick counter on sparc64	2006-02-11 09:33:07 +00:00
Poul-Henning Kamp	5b1a8eb397	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestamps at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.	2006-02-07 21:22:02 +00:00
Julian Elischer	11f4763dd4	Return the thread name in the kinfo_proc structure. Also correct the comment describing what the value is.	2006-01-18 20:27:43 +00:00
Juli Mallett	b241b0a239	Since p_cansee will end up dereferencing p_ucred, don't check for p_ucred equal to NULL several times later. p_ucred "should probably not" be NULL if the process isn't PRS_NEW anyway. This is strongly reinforced by the fact that we don't see frequent crashes here. Remove the checks after p_cansee and add a KASSERT right before it. Found by: Coverity Prevent (tm) Also trim one nearby trailing space.	2006-01-17 20:25:01 +00:00
David Xu	3357835a46	Add code to report zombie state. PR: threads/91044 MFC after: 3 days	2005-12-29 13:00:42 +00:00
Robert Watson	2c255e9df6	Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueing and dropped records. Record loss was previously possible when the global pool of records became depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events.
Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>	2005-11-13 13:27:44 +00:00
David Xu	ebceaf6dc7	Add support for queueing SIGCHLD same as other UNIX systems did. For each child process whose status has been changed, a SIGCHLD instance is queued, if the signal is still pending, and process changed status several times, signal information is updated to reflect latest process status. If wait() returns because the status of a child process is available, pending SIGCHLD signal associated with the child process is discarded. Any other pending SIGCHLD signals remain pending. The signal information is allocated at the same time when proc structure is allocated, if process signal queue is fully filled or there is a memory shortage, it can still send the signal to process. There is a booting time tunable kern.sigqueue.queue_sigchild which can control the behavior, setting it to zero disables the SIGCHLD queueing feature, the tunable will be removed if the function is proved that it is stable enough. Tested on: i386 (SMP and UP)	2005-11-08 09:09:26 +00:00
John Baldwin	f55ab99409	Document in #ifdef notnow code the actions that proc_fini would need to take if struct procs were actually freed.	2005-10-24 20:15:23 +00:00
Don Lewis	5032ff8197	Always wire the sysctl output buffer in sysctl_kern_proc() before calling sysctl_out_proc(). -- fix from jhb Move the code in fill_kinfo_thread() that gathers data from struct proc into the new function fill_kinfo_proc_only(). Change all callers of fill_kinfo_thread() to call both fill_kinfo_proc_only() and fill_kinfo_thread(). When gathering data from a multi-threaded process, fill_kinfo_proc_only() only needs to be called once. Grab sched_lock before accessing the process thread list or calling fill_kinfo_thread(). PR: kern/84684 MFC after: 3 days	2005-10-02 23:27:56 +00:00
John Baldwin	55b4a5ae0d	Use the refcount API to implement reference counts on process argument structures rather than using a global mutex to protect the reference counts. Tested on: i386, alpha, sparc64	2005-09-27 18:03:15 +00:00
David Schultz	fe769cdd95	Add a sysctl that returns the full path of a process' text file. This information is needed by things like `gdb -p' and Sun's javac, and previously it could only be obtained via procfs	2005-04-18 02:10:37 +00:00
John Baldwin	c6a37e8413	Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any effect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case. Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch. This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example). Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more	2005-04-04 21:53:56 +00:00
Pawel Jakub Dawidek	c78941e69e	Add ki_jid field to the kinfo_proc structure and store jail ID there. Reviewed by: gad MFC after: 3 days	2005-03-20 10:35:23 +00:00
Poul-Henning Kamp	572b4402d1	In strange circumstances we may end up being the last reference to a session in tprintf(). SESSRELE() needs to properly dispose of the sessions mutex. Add sessrele() which does the proper cleanup and have SESSRELE() call it. Use SESSRELE also in pgdelete(). Found by: Coverity (ID:526)	2005-03-17 08:44:41 +00:00
Pawel Jakub Dawidek	cefcecbefd	Function jailed() looks into ucred structure, so be sure ucred is not NULL. Reviewed by: rwatson MFC after: 1 week	2005-03-12 14:31:04 +00:00
Pawel Jakub Dawidek	d079d0a0d2	Clean up a bit. Reviewed by: rwatson MFC after: 1 week	2005-03-12 14:28:34 +00:00
Poul-Henning Kamp	0c898376fa	Make a bunch of SYSCTL_NODEs static.	2005-02-10 12:15:49 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
David Schultz	1eecfae3e5	Axe a.out core dump support. Neither older gdb binaries nor current bfd sources understand the present format.	2004-11-27 06:46:59 +00:00
David Schultz	6db36923ad	Remove local definitions of RANGEOF() and use __rangeof() instead. Also remove a few bogus casts.	2004-11-20 23:00:59 +00:00
David Schultz	8b059651ba	Malloc p_stats instead of putting it in the U area. We should consider simply embedding it in struct proc. Reviewed by: arch@	2004-11-20 02:28:48 +00:00
Julian Elischer	9b036bdf5a	Remove duplicate line.	2004-10-10 05:07:43 +00:00
John Baldwin	78c85e8dfc	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. 
- calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month	2004-10-05 18:51:11 +00:00
David Schultz	8daa8c602a	The zone from which proc structures are allocated is marked UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should never be called. Move an assertion from proc_fini() to proc_dtor() and garbage-collect the rest of the unreachable code. I have retained vm_proc_dispose(), since I consider its disuse a bug.	2004-09-19 18:34:17 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies the scheduler that no more than N threads from that ksegrp should be allowed to be concurrently scheduled. This is also used to enforce 'fairness' at this time so that a ksegrp with 10000 threads can not swamp the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventually develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
Robert Watson	6cbea71c82	Cause pfind() not to return processes in the PRS_NEW state. As a result, threads consuming the result of pfind() will not need to check for a NULL credential pointer or other signs of an incompletely created process. However, this also means that pfind() cannot be used to test for the existence or find such a process. Annotate pfind() to indicate that this is the case. A review of current consumers seems to indicate that this is not a problem for any of them. This closes a number of race conditions that could result in NULL pointer dereferences and related failure modes. Other related races continue to exist, especially during iteration of the allproc list without due caution. Discussed with: tjr, green	2004-08-14 17:15:16 +00:00
Julian Elischer	332e72ddb7	Remove typos on KASSERT messages.	2004-08-09 20:13:07 +00:00
Brian Feldman	b23f72e98a	* Add a "how" argument to uma_zone constructors and initialization functions so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialization functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default. Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.	2004-08-02 00:18:36 +00:00
Pawel Jakub Dawidek	cebabef04f	Fill in some information about zombie processes as well. Before this change every zombie process was reported as an owner of PID 0 in ps(1) output. Reviewed by: julian	2004-07-29 20:27:59 +00:00
Garance A Drosehn	7638fa19a7	Fill in the values for the ki_tid and ki_numthreads which have been added to kproc_info. PR: bin/65803 (a tiny part...) Submitted by: Cyrille Lefevre	2004-06-20 22:17:22 +00:00
Garance A Drosehn	99d2ecbc7d	Add a call to calcru() to update the kproc_info fields of ki_rusage.ru_utime and ki_rusage.ru_stime. This greatly improves the accuracy of those fields. Suggested by: bde	2004-06-20 02:03:33 +00:00
Garance A Drosehn	078842c5c9	Fill in some new fields of 'struct kinfo_proc', namely ki_childstime, ki_childutime, and ki_emul. Also uses the timevaladd() routine to correct the calculation of ki_childtime. That will correct the value returned when ki_childtime.tv_usec > 1,000,000. This also implements a new KERN_PROC_GID option for kvm_getprocs(). (there will be a similar update to lib/libkvm/kvm_proc.c) Submitted by: Cyrille Lefevre	2004-06-19 14:03:00 +00:00
Poul-Henning Kamp	f3732fd15b	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
Julian Elischer	fa88511615	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Poul-Henning Kamp	2195e4207a	Reference count struct tty. Add two new functions: ttyref() and ttyrel(). ttymalloc() creates a struct tty with a reference count of one. when ttyrel sees the count go to zero, struct tty is freed. Hold references for open ttys and for ttys which are controlling terminal for sessions. Until drivers start using ttyrel(), this commit will make no difference.	2004-06-09 09:41:30 +00:00
Poul-Henning Kamp	a59df4e1ee	Fix a race in destruction of sessions.	2004-06-09 09:29:08 +00:00
Garance A Drosehn	b8fdc89d79	Implement the new KERN_PROC_RGID option, and also implement the KERN_PROC_SESSION option which had been previously defined but never implemented. PR: bin/65803 (a very tiny piece of the PR) Submitted by: Cyrille Lefevre	2004-05-22 23:11:44 +00:00
Warner Losh	7f8a436ff2	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
Pawel Jakub Dawidek	5e2c0c0b0e	Remove ps_argsopen check. It was bogus in the past and was corrected not quite well by me - if kern.ps_argsopen was set to 0, users weren't permitted to see arguments of even own processes. But kern.ps_argsopen is going away, so just remove this check and leave security checks for p_cansee() function.	2004-04-01 00:08:20 +00:00
Pawel Jakub Dawidek	9cdb62160b	Fix information leakage. Without this fix it is possible to cheat policies like: - sysctl security.bsd.see_other_[gu]ids=0, - mac_seeotheruids(4), - jail(2) and get full processes list with their arguments. This problem exists from revision 1.62 of kern_proc.c when it was introduced. Reviewed by: nectar, rwatson.	2004-03-17 13:19:43 +00:00
Don Lewis	47934cef8f	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms	2004-02-26 00:27:04 +00:00
Daniel Eischen	2648efa621	Add sysctls to allow showing threads for pgrp, tty, uid, ruid, and pid.	2004-02-22 17:54:32 +00:00
Jeff Roberson	7cf90fb376	- Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td argument rather than a kse.	2003-10-16 08:39:15 +00:00
Peter Wemm	25e247af44	The KERN_PROC_PROC sysctl took 4 args in 5.0-REL and 5.1-REL. We need to accept this for a bit longer. Requiring the new order of 3 args only was not very helpful.	2003-10-15 03:11:46 +00:00

244 Commits