freebsd-skq

Author	SHA1	Message	Date
John Baldwin	ac07d659c3	Don't hold the process mutex across calls to FREE() since the vm system uses lockmgr locks and this leads to a lock order reversal. At this point in wait1() the process is not on any process lists or in the process tree, so no other process should be able to find it or have a reference to it anyways, so the locking is not needed.	2001-05-04 16:13:28 +00:00
Seigo Tanimura	ebdc3f1d2d	Do not leave a process with no credential in zombproc. Reviewed by: jhb	2001-04-25 10:22:35 +00:00
John Baldwin	33a9ed9d0e	Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning. Reviewed by: -smp, dfr, jake	2001-04-24 00:51:53 +00:00
John Baldwin	1005a129e5	Convert the allproc and proctree locks from lockmgr locks to sx locks.	2001-03-28 11:52:56 +00:00
John Baldwin	f34fa851e0	Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>	2001-03-28 09:17:56 +00:00
John Baldwin	c65437a326	- Call proc_reparent() when handing a process off to init in exit rather than dinking around in the process lists explicitly. - Hold both the proctree lock and proc lock of the child process when reparenting a process via proc_reparent. - Lock processes while sending them signals. - Miscellaenous proc locking. - proc_reparent() now asserts that the child is locked in addition to an exclusive proctree lock.	2001-03-07 02:22:31 +00:00
Tor Egge	9d0ddf1861	Streamline updating of switchtime (don't copy code from kern_sync.c). Submitted by: jhb	2001-02-22 20:16:51 +00:00
Tor Egge	0d139b3741	Protect update of the per processor switchtime variable against interrupts. Protect usage of the per processor switchtime variable against interrupts in calcru(). This seem to eliminate the "microuptime() went backwards" warnings.	2001-02-22 19:50:37 +00:00
Robert Watson	91421ba234	o Move per-process jail pointer (p->pr_prison) to inside of the subject credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project	2001-02-21 06:39:57 +00:00
John Baldwin	c3a6f33758	Revert the previous revision for two reasons: - I can't seem to reproduce the warning I got from WITNESS anymore. - The fix was wrong. Since a uidinfo struct is a member of proc, it makes sense for the locking order to be such that you are allowed to hold proc and then grab the uidinfo lock.	2001-02-09 20:51:11 +00:00
John Baldwin	8ad802d82c	Release the proc lock around crfree() and uifree() in wait1(). It leads to a lock order violation, and since p is already a zombie at this point, I'm not sure that we even need all the locking currently in wait1().	2001-02-09 16:43:18 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
John Baldwin	a914fb6b27	- Proc locking. - Protect calcru() with sched_lock.	2001-01-24 00:33:44 +00:00
Jake Burkholder	ef73ae4b0c	Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables other then curproc.	2001-01-10 04:43:51 +00:00
Jake Burkholder	98f03f9030	Protect proc.p_pptr and proc.p_children/p_sibling with the proctree_lock. linprocfs not locked pending response from informal maintainer. Reviewed by: jhb, -smp@	2000-12-23 19:43:10 +00:00
Jake Burkholder	1156bc4de2	Whitespace. Fix a comment block and an if statement that were wider than 80 characters.	2000-12-18 07:10:04 +00:00
Jake Burkholder	c0c2557090	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.	2000-12-13 00:17:05 +00:00
Jake Burkholder	85b039fe64	Remove if defined(tahoe) cobwebs.	2000-12-04 09:49:34 +00:00
John Baldwin	4971f62a86	- Add a mutex to the proc structure p_mtx that will be used to lock accesses to each individual proc. - Initialize the lock during fork1(), and destroy it in wait1().	2000-12-03 01:22:34 +00:00
John Baldwin	2925cbe569	Protect p_stat with sched_lock.	2000-12-01 16:59:02 +00:00
John Baldwin	472fd56ea5	Don't update p_stat in exit1() to SZOMB until after releasing the allproc lock. Otherwise, if we block on the backing mutex while releasing the allproc lock, then when we resume, we will be at SRUN, and we will stay that way all the way through cpu_exit. As a result, our parent will never harvest us.	2000-12-01 03:42:17 +00:00
Jake Burkholder	4f55983606	Use callout_reset instead of timeout(9). Most callouts are statically allocated, 2 have been added to struct proc for setitimer and sleep. Reviewed by: jhb, jlemon	2000-11-27 22:52:31 +00:00
Jake Burkholder	553629ebc9	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd	2000-11-22 07:42:04 +00:00
John Baldwin	35e0e5b311	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h	2000-10-20 07:58:15 +00:00
Bruce Evans	621dbe43df	Added used include of <sys/mutex.h> (don't depend on pollution in <sys/signalvar.h>).	2000-09-17 12:20:49 +00:00
Jason Evans	0384fff8c5	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh	2000-09-07 01:33:02 +00:00
Don Lewis	f535380cb6	Remove uidinfo hash table lookup and maintenance out of chgproccnt() and chgsbsize(), which are called rather frequently and may be called from an interrupt context in the case of chgsbsize(). Instead, do the hash table lookup and maintenance when credentials are changed, which is a lot less frequent. Add pointers to the uidinfo structures to the ucred and pcred structures for fast access. Pass a pointer to the credential to chgproccnt() and chgsbsize() instead of passing the uid. Add a reference count to the uidinfo structure and use it to decide when to free the structure rather than freeing the structure when the resource consumption drops to zero. Move the resource tracking code from kern_proc.c to kern_resource.c. Move some duplicate code sequences in kern_prot.c to separate helper functions. Change KASSERTs in this code to unconditional tests and calls to panic().	2000-09-05 22:11:13 +00:00
Peter Wemm	ac2b067b9a	Change the 'exit()' system call to 'sys_exit()'. This avoids overlapping gcc's internal exit() prototypes and the (futile) hackery that we did to try and avoid warnings. main() was renamed for similar reasons. Remove an exit related hack from makesyscalls.sh.	2000-07-29 00:16:28 +00:00
Alfred Perlstein	c636255150	fix races in the uidinfo subsystem, several problems existed: 1) while allocating a uidinfo struct malloc is called with M_WAITOK, it's possible that while asleep another process by the same user could have woken up earlier and inserted an entry into the uid hash table. Having redundant entries causes inconsistancies that we can't handle. fix: do a non-waiting malloc, and if that fails then do a blocking malloc, after waking up check that no one else has inserted an entry for us already. 2) Because many checks for sbsize were done as "test then set" in a non atomic manner it was possible to exceed the limits put up via races. fix: instead of querying the count then setting, we just attempt to set the count and leave it up to the function to return success or failure. 3) The uidinfo code was inlining and repeating, lookups and insertions and deletions needed to be in their own functions for clarity. Reviewed by: green	2000-06-22 22:27:16 +00:00
Jake Burkholder	e39756439c	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others	2000-05-26 02:09:24 +00:00
Jake Burkholder	740a1973a6	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd	2000-05-23 20:41:01 +00:00
Brian Feldman	a274d19ba2	Back out NOTE_EXIT status reporting pending discussion.	2000-05-21 16:27:41 +00:00
Brian Feldman	a24b514d72	Put the wait(2) exit status in "data" for NOTE_EXIT kevents.	2000-05-17 01:16:11 +00:00
Jonathan Lemon	cb679c385e	Introduce kqueue() and kevent(), a kernel event notification facility.	2000-04-16 18:53:38 +00:00
Sean Eric Fagan	893618352c	Handle the case where we truss an SUGID program -- in particular, we need to wake up any processes waiting via PIOCWAIT on process exit, and truss needs to be more aware that a process may actually disappear while it's waiting. Reviewed by: Paul Saab <ps@yahoo-inc.com>	2000-01-10 04:09:05 +00:00
Bruce Evans	bdf423572e	Scheduler fixes equivalent to the ones logged in the following NetBSD commit to kern_synch.c: ---------------------------- revision 1.55 date: 1999/02/23 02:56:03; author: ross; state: Exp; lines: +39 -10 Scheduler bug fixes and reorganization * fix the ancient nice(1) bug, where nice +20 processes incorrectly steal 10 - 20% of the CPU, (or even more depending on load average) * provide a new schedclk() mechanism at a new clock at schedhz, so high platform hz values don't cause nice +0 processes to look like they are niced * change the algorithm slightly, and reorganize the code a lot * fix percent-CPU calculation bugs, and eliminate some no-op code === nice bug === Correctly divide the scheduler queues between niced and compute-bound processes. The current nice weight of two (sort of, see `algorithm change' below) neatly divides the USRPRI queues in half; this should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op, and it was done after decay_cpu() which can only _reduce_ the value. It has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can scheduler-penalize themselves onto the same queue as nice +20 processes. (Or even a higher one.) === New schedclk() mechansism === Some platforms should be cutting down stathz before hitting the scheduler, since the scheduler algorithm only works right in the vicinity of 64 Hz. Rather than prescale hz, then scale back and forth by 4 every time p_estcpu is touched (each occurance an abstraction violation), use p_estcpu without scaling and require schedhz to be generated directly at the right frequency. Use a default stathz (well, actually, profhz) / 4, so nothing changes unless a platform defines schedhz and a new clock. Define these for alpha, where hz==1024, and nice was totally broke. === Algorithm change === The nice value used to be added to the exponentially-decayed scheduler history value p_estcpu, in _addition_ to be incorporated directly (with greater wieght) into the priority calculation. At first glance, it appears to be a pointless increase of 1/8 the nice effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that because it will ramp up linearly but be decayed only exponentially, thus converging to an additional .75 nice for a loadaverage of one. I killed this, it makes the behavior hard to control, almost impossible to analyze, and the effect (~~nothing at for the first second, then somewhat increased niceness after three seconds or more, depending on load average) pointless. === Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation. Collect scheduler functionality. Try to put each abstraction in just one place. ---------------------------- The details are a little different in FreeBSD: === nice bug === Fixing this is the main point of this commit. We use essentially the same clipping rule as NetBSD (our limit on p_estcpu differs by a scale factor). However, clipping at all is fundamentally bad. It gives free CPU the hoggiest hogs once they reach the limit, and reaching the limit is normal for long-running hogs. This will be fixed later. === New schedclk() mechanism === We don't use the NetBSD schedclk() (now schedclock()) mechanism. We require (real)stathz to be about 128 and scale by an extra factor of 2 compared with NetBSD's statclock(). We scale p_estcpu instead of scaling the clock. This is more accurate and flexible. === Algorithm change === Same change. === Other bugs === The p_pctcpu bug was fixed long ago. We don't try as hard to abstract functionality yet. Related changes: the new limit on p_estcpu must be exported to kern_exit.c for clipping in wait1(). Agreed with by: dufault	1999-11-28 12:12:14 +00:00
Poul-Henning Kamp	da654d9070	s/p_cred->pc_ucred/p_ucred/g	1999-11-21 12:38:21 +00:00
Poul-Henning Kamp	93efcae809	The at_exit and at_fork functions currently use a 'roll your own' linked list to store the callbak routines. The patch converts the lists to queue(3) TAILQs, making the code slightly clearer and ensuring that callbacks are executed in FIFO order. Man page also updated as necesary. (discontinued use of M_TEMP malloc type while here anyway /phk) Submitted by: Jake Burkholder jake@checker.org PR: 14912	1999-11-19 21:29:03 +00:00
Poul-Henning Kamp	b9df5231ca	Introduce commandline caching in the kernel. This fixes some nasty procfs problems for SMP, makes ps(1) run much faster, and makes ps(1) even less dependent on /proc which will aid chroot and jails alike. To disable this facility and revert to previous behaviour: sysctl -w kern.ps_arg_cache_limit=0 For full details see the current@FreeBSD.org mail-archives.	1999-11-16 20:31:58 +00:00
Poul-Henning Kamp	2e3c8fcbd0	This is a partial commit of the patch from PR 14914: Alot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914	1999-11-16 10:56:05 +00:00
Luoqi Chen	645682fd40	Add a per-signal flag to mark handlers registered with osigaction, so we can provide the correct context to each signal handler. Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK as we did before the linuxthreads support merge (submitted by bde). Move ps_sigstk from to p_sigacts to the main proc structure since signal stack should not be shared among threads. Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag. Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag. Reviewed by: marcel, jdp, bde	1999-10-11 20:33:17 +00:00
Peter Wemm	0894f4a92c	Clean up some cruft. We don't run <= 4.3 binaries on hp300 or luna68k arches using owait(2).	1999-10-11 15:15:45 +00:00
Marcel Moolenaar	2c42a14602	sigset_t change (part 2 of 5) ----------------------------- The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements. The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t. struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done though function sigprop(). The latter because the table doesn't holds properties for all signals, but only for the first NSIG signals. signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace polution. Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types. NOTE: kdump (and port linux_kdump) must be recompiled. Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.	1999-09-29 15:03:48 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Marcel Moolenaar	c6dfea0ebd	Add sysctl variables for the Linuxulator. These reside under `compat.linux' as discussed on current. The following variables are defined (for now): osname (defaults to "Linux") Allow users to change the name of the OS as returned by uname(2), specially added for all those Linux Netscape users and statistics maniacs :-) We now have what we all wanted! osrelease (defaults to "2.2.5") Allow users to change the version of the OS as returned by uname(2). Since -current supports glibc2.1 now, change the default to 2.2.5 (was 2.0.36). oss_version (defaults to 198144 [0x030600]) This one will be used by the OSS_GETVERSION ioctl (PR 12917) which I can commit now that we have the MIB. The default version number is the lowest version possible with the current 'encoding'. A note about imprisoned processes (see jail(2)): These variables are copy-on-write (as suggested by phk). This means that imprisoned processes will use the system wide value unless it is written/set by the process. From that moment on, a copy local to the prison will be used. A note about the implementation: I choose to add a single pointer to struct prison, because I didn't like the idea of changing struct prison every time I come up with a new variable. As a side effect, the extra storage is only needed when a variable is set from within the prison. This also minimizes kernel bloat when the Linuxulator is not used; both compiled in or as a module. Reviewed by: bde (first version only) and phk	1999-08-27 19:47:41 +00:00
Mike Smith	79fc0bf4a0	From the submitter: - this causes POSIX locking to use the thread group leader (p->p_leader) as the locking thread for all advisory locks. In non-kernel-threaded code p->p_leader == p, so this will have no effect. This results in (more) correct POSIX threaded flock-ing semantics. It also prevents the leader from exiting before any of the children. (so that p->p_leader will never be stale) in exit1(). We have been running this patch for over a month now in our lab under load and at customer sites. Submitted by: John Plevyak <jplevyak@inktomi.com>	1999-06-07 20:37:29 +00:00
Poul-Henning Kamp	75c1354190	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/	1999-04-28 11:38:52 +00:00
Luoqi Chen	5206bca10a	Enable vmspace sharing on SMP. Major changes are, - %fs register is added to trapframe and saved/restored upon kernel entry/exit. - Per-cpu pages are no longer mapped at the same virtual address. - Each cpu now has a separate gdt selector table. A new segment selector is added to point to per-cpu pages, per-cpu global variables are now accessed through this new selector (%fs). The selectors in gdt table are rearranged for cache line optimization. - fask_vfork is now on as default for both UP and SMP. - Some aio code cleanup. Reviewed by: Alan Cox <alc@cs.rice.edu> John Dyson <dyson@iquest.net> Julian Elischer <julian@whistel.com> Bruce Evans <bde@zeta.org.au> David Greenman <dg@root.com>	1999-04-28 01:04:33 +00:00
Peter Wemm	e91896117b	Well folks, this is it - The second stage of the removal for build support for LKM's..	1999-04-17 08:36:07 +00:00
Bruce Evans	56ce1a8dc4	Fixed runtime accounting. The time since the previous context switch was discarded on every call to calcru(). Hacking on the `switchtime' global for a related fix in rev.1.38 of kern_resource.c was too fragile and broke when p_switchtime went away. PR: 10402	1999-03-11 21:53:12 +00:00

1 2 3

126 Commits