freebsd-dev

Author	SHA1	Message	Date
Robert Watson	faef53711b	Move per-process audit state from a pointer in the proc structure to embedded storage in struct ucred. This allows audit state to be cached with the thread, avoiding locking operations with each system call, and makes it available in asynchronous execution contexts, such as deep in the network stack or VFS. Reviewed by: csjp Approved by: re (kensmith) Obtained from: TrustedBSD Project	2007-06-07 22:27:15 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
John Baldwin	4649e92b4e	Align 'struct thread' on 16 byte boundaries so that the lower 4 bits are always 0. Previously we aligned threads on a minimum of 8-byte boundaries. Note: This changes the uma zone to no longer cache align threads. We really want the uma zone to do align threads to MAX(16, cache line size) but there currently isn't a good way to express that to uma. Submitted by: attilio	2007-03-27 16:51:34 +00:00
Jeff Roberson	f0393f063a	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
Xin LI	4f506694bb	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.	2007-01-17 14:58:53 +00:00
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
David Xu	60d4823594	Copy base user priority in NO_KSE case.	2006-11-12 11:48:37 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Robert Watson	88b85279a9	SI_ORDER_THIRD + 2, not SI_ORDER_FOURTH + 2. MFC after: 3 days Submitted by: mlaier	2006-09-26 00:15:56 +00:00
Robert Watson	5add74b4a7	Add "FreeBSD" trademark statement to copyright section of boot messages. MFC after: 3 days Approved by: core, board at FreeBSDFoundation dot org	2006-09-25 23:19:01 +00:00
David Xu	4386313871	Initialize kg_base_user_pri.	2006-08-25 06:29:16 +00:00
Benno Rice	77fe443878	The VERBOSE_SYSINIT stuff sees the DDB define a lot better if we include opt_ddb.h. Spotted by: benno Pointy hat to: benno	2006-05-14 07:11:28 +00:00
Benno Rice	26ab616fdc	Add a new kernel config option, VERBOSE_SYSINIT. When porting FreeBSD to a new platform, one of the more useful things to do is get mi_startup() to let you know which SYSINIT it's up to. Most people tend to whack a printf in the SYSINIT loop to print the address of the function it's about to call. Going one better, jhb made a version that uses DDB to look up the name of the function and print that instead. This version is essentially his with the addition of some ifdeffery to make it optional and to allow it to work (although using only the function address, not the symbol) if you forgot to enable DDB. All the cool bits by: jhb Approved by: scottl, rink, cognet, imp	2006-05-12 02:01:38 +00:00
Poul-Henning Kamp	5b1a8eb397	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.	2006-02-07 21:22:02 +00:00
Olivier Houchard	2a3b10658d	rwlock expects the struct thread to be aligned on 8 bytes, so make sure thread0 is.	2006-02-06 16:03:10 +00:00
Robert Watson	3683665bbd	Hook up audit to the initial process creation events (proc0, proc1). Much help from: wsalamon Obtained from: TrustedBSD Project	2006-02-02 01:16:31 +00:00
Robert Watson	2c255e9df6	Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>	2005-11-13 13:27:44 +00:00
Robert Watson	223aaaecb0	Remove mac_create_root_mount() and mpo_create_root_mount(), which provided access to the root file system before the start of the init process. This was used briefly by SEBSD before it knew about preloading data in the loader, and using that method to gain access to data earlier results in fewer inconsistencies in the approach. Policy modules still have access to the root file system creation event through the mac_create_mount() entry point. Removed now, and will be removed from RELENG_6, in order to gain third party policy dependencies on the entry point for the lifetime of the 6.x branch. MFC after: 3 days Submitted by: Chris Vance <Christopher dot Vance at SPARTA dot com> Sponsored by: SPARTA	2005-09-19 13:59:57 +00:00
Ralf S. Engelschall	724447ac41	Fix system shutdown timeout handling by again supporting longer running shutdown procedures (which have a duration of more than 120 seconds). We have two user-space affecting shutdown timeouts: a "soft" one in /etc/rc.shutdown and a "hard" one in init(8). The first one can be configured via /etc/rc.conf variable "rcshutdown_timeout" and defaults to 30 seconds. The second one was originally (in 1998) intended to be configured via sysctl(8) variable "kern.shutdown_timeout" and defaults to 120 seconds. Unfortunately, the "kern.shutdown_timeout" was declared "unused" in 1999 (as it obviously is actually not used within the kernel itself) and hence was intentionally but misleadingly removed in revision 1.107 from init_main.c. Kernel sysctl(8) variables are certainly a wrong way to control user-space processes in general, but in this particular case the sysctl(8) variable should have remained as it supports init(8), which isn't passed command line flags (which in turn could have been set via /etc/rc.conf), etc. As there is already a similar "kern.init_path" sysctl(8) variable which directly affects init(8), resurrect the init(8) shutdown timeout under sysctl(8) variable "kern.init_shutdown_timeout". But this time document it as being intentionally unused within the kernel and used by init(8). Also document it in the manpages init(8) and rc.conf(5). Reviewed by: phk MFC after: 2 weeks	2005-09-15 13:16:07 +00:00
Suleiman Souhlal	571dcd15e2	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone	2005-07-01 16:28:32 +00:00
Dag-Erling Smørgrav	f3f4baf099	Add /rescue/init to the default init_path, before /stand/sysinstall. MFC after: 2 weeks	2005-02-17 10:00:10 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Poul-Henning Kamp	20a92a18f1	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).	2004-12-07 08:15:41 +00:00
David Schultz	6004362e66	Don't include sys/user.h merely for its side-effect of recursively including other headers.	2004-11-27 06:51:39 +00:00
David Schultz	8b059651ba	Malloc p_stats instead of putting it in the U area. We should consider simply embedding it in struct proc. Reviewed by: arch@	2004-11-20 02:28:48 +00:00
Poul-Henning Kamp	8ec21e3a68	Allow fdinit() to be called with a NULL fdp argument so we can use it when setting up init. Make fdinit() lock the fdp argument as needed.	2004-11-07 12:39:28 +00:00
John Baldwin	78c85e8dfc	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month	2004-10-05 18:51:11 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
Julian Elischer	2630e4c90c	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days	2004-09-01 02:11:28 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Poul-Henning Kamp	d634f69316	Remove global variable rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroot's in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.	2004-07-28 20:21:04 +00:00
Alfred Perlstein	f257b7a54b	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
Julian Elischer	fa88511615	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Poul-Henning Kamp	2cf6bdac50	Loudly announce WITNESS and DIAGNOSTIC options and warn about reduced performance.	2004-02-29 16:56:54 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Robert Watson	bafc8f255a	KASSERT() that initproc->p_pid is 1. Very bad things happen if init's pid isn't 1, and it can actually occur if kthread_create() is called before SUB_SI_CREATE_INIT without RFHIGHPID. Discussed with: jhb	2004-01-16 20:29:23 +00:00
Dag-Erling Smørgrav	a2fe44e8cf	New file descriptor allocation code, derived from similar code introduced in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated file descriptors which is used to locate available descriptors when a new one is needed. It also moves the task of growing the file descriptor table out of fdalloc(), reducing complexity in both fdalloc() and do_dup(). Debts of gratitude are owed to tjr@ (who provided the original patch on which this work is based), grog@ (for the gdb(4) man page) and rwatson@ (for assistance with pxeboot(8)).	2004-01-15 10:15:04 +00:00
Robert Watson	c142b0fcfe	Remove the global variable 'cmask', which was used to initialize the fd_cmask field in the file descriptor structure for the first process indirectly from CMASK, and when an fd structure is initialized before being filled in, and instead just use CMASK. This appears to be an artifact left over from the initial integration of quotas into BSD. Suggested by: peter	2003-10-02 03:57:59 +00:00
Peter Wemm	c460ac3a00	Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 heirarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.	2003-09-25 01:10:26 +00:00
Sam Leffler	c06eb4e293	Change instances of callout_init that specify MPSAFE behaviour to use CALLOUT_MPSAFE instead of "1" for the second parameter. This does not change the behaviour; it just makes the intent more clear.	2003-08-19 17:51:11 +00:00
Poul-Henning Kamp	68f2d20b70	Revert stuff which accidentally ended up in the previous commit.	2003-07-22 10:36:36 +00:00
Poul-Henning Kamp	55d1d7034f	Don't attempt to inline large functions mb_alloc() and mb_free(), it more than doubles the text size of this file. GCC has wisely ignored us on this previously	2003-07-22 10:24:41 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Tor Egge	ad05d58087	Add tracking of process leaders sharing a file descriptor table and allow a file descriptor table to be shared between multiple process leaders. PR: 50923	2003-06-02 16:05:32 +00:00
John Baldwin	90af4afacb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)	2003-05-13 20:36:02 +00:00
Dag-Erling Smørgrav	87ccef7b77	Instead of recording the Unix time in a process when it starts, record the uptime. Where necessary, convert it back to Unix time by adding boottime to it. This fixes a potential problem in the accounting code, which would compute the elapsed time incorrectly if the Unix time was stepped during the lifetime of the process.	2003-05-01 16:59:23 +00:00
Jake Burkholder	2373499592	Made vmspace0 non-static. Its useful to be able to identify a vmspace as the kernel vmspace.	2003-04-13 21:29:11 +00:00
Julian Elischer	060563ec50	Move the _oncpu entry from the KSE to the thread. The entry in the KSE still exists but it's purpose will change a bit when we add the ability to lock a KSE to a cpu.	2003-04-10 17:35:44 +00:00
Tim J. Robbins	96d7f8ef46	Use the proc lock to protect p_realtimer instead of Giant, and obtain sched_lock around accesses to p_stats->p_timer[] to avoid a potential race with hardclock. getitimer(), setitimer() and the realitexpire() callout are now Giant-free.	2003-02-17 10:03:02 +00:00
Jeff Roberson	5215b1872f	- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much the KSE code. Submitted by: davidxu	2003-02-17 05:14:26 +00:00
Dag-Erling Smørgrav	f6cebd7310	It seems the extra precautions are no longer needed.	2003-02-13 10:05:20 +00:00
Dag-Erling Smørgrav	c524b1a8cf	Correct grammatical error in previous commit.	2003-02-04 18:47:17 +00:00
Dag-Erling Smørgrav	91dd013b1e	Extra precautions before trying to start init(8).	2003-02-04 18:16:50 +00:00
Julian Elischer	6f8132a867	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.	2003-02-01 12:17:09 +00:00
Poul-Henning Kamp	f6a1852dcc	NODEVFS cleanup: remove #ifdefs.	2003-01-30 12:35:17 +00:00
David Xu	0dbb100b9b	Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian	2003-01-26 11:41:35 +00:00
Poul-Henning Kamp	7e760e148a	Originally when DEVFS was added, a global variable "devfs_present" was used to control code which were conditional on DEVFS' precense since this avoided the need for large-scale source pollution with #include "opt_geom.h" Now that we approach making DEVFS standard, replace these tests with an #ifdef to facilitate mechanical removal once DEVFS becomes non-optional. No functional change by this commit.	2003-01-19 11:03:07 +00:00
Jake Burkholder	a360a43dd5	Improve the way that an elf image activator for an alternate word size is included in the kernel. Include imgact_elf.c in conf/files, instead of both imgact_elf32.c and imgact_elf64.c, which will use the default word size for an architecture as defined in machine/elf.h. Architectures that wish to build an additional image activator for an alternate word size can include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which allows it to be dependent on MD options instead of solely on architecture. Glanced at by: peter	2003-01-04 22:07:48 +00:00
Julian Elischer	93a7aa79d6	Add code to ddb to allow backtracing an arbitrary thread. (show thread {address}) Remove the IDLE kse state and replace it with a change in the way threads sahre KSEs. Every KSE now has a thread, which is considered its "owner" however a KSE may also be lent to other threads in the same group to allow completion of in-kernel work. n this case the owner remains the same and the KSE will revert to the owner when the other work has been completed. All creations of upcalls etc. is now done from kse_reassign() which in turn is called from mi_switch or thread_exit(). This means that special code can be removed from msleep() and cv_wait(). kse_release() does not leave a KSE with no thread any more but converts the existing thread into teh KSE's owner, and sets it up for doing an upcall. It is just inhibitted from being scheduled until there is some reason to do an upcall. Remove all trace of the kse_idle queue since it is no-longer needed. "Idle" KSEs are now on the loanable queue.	2002-12-28 01:23:07 +00:00
Giorgos Keramidas	0c920c0de8	Fix typo in comment. It's SYSINIT, not SYSINT. Approved by: re (murray)	2002-11-30 22:15:30 +00:00
Jeff Roberson	de028f5a4a	- Implement a mechanism for allowing schedulers to place scheduler dependant data in the scheduler independant structures (proc, ksegrp, kse, thread). - Implement unused stubs for this mechanism in sched_4bsd. Approved by: re Reviewed by: luigi, trb Tested on: x86, alpha	2002-11-21 01:22:38 +00:00
Robert Drehmel	e80fb43467	Use strlcpy() instead of strncpy() to copy NUL terminated strings for safety and consistency.	2002-10-17 20:03:38 +00:00
John Baldwin	5715307f74	- Move p_cpulimit to struct proc from struct plimit and protect it with sched_lock. This means that we no longer access p_limit in mi_switch() and the p_limit pointer can be protected by the proc lock. - Remove PRS_ZOMBIE check from CPU limit test in mi_switch(). PRS_ZOMBIE processes don't call mi_switch(), and even if they did there is no longer the danger of p_limit being NULL (which is what the original zombie check was added for). - When we bump the current processes soft CPU limit in ast(), just bump the private p_cpulimit instead of the shared rlimit. This fixes an XXX for some value of fix. There is still a (probably benign) bug in that this code doesn't check that the new soft limit exceeds the hard limit. Inspired by: bde (2)	2002-10-09 17:17:24 +00:00
Julian Elischer	48bfcddd94	Round out the facilty for a 'bound' thread to loan out its KSE in specific situations. The owner thread must be blocked, and the borrower can not proceed back to user space with the borrowed KSE. The borrower will return the KSE on the next context switch where teh owner wants it back. This removes a lot of possible race conditions and deadlocks. It is consceivable that the borrower should inherit the priority of the owner too. that's another discussion and would be simple to do. Also, as part of this, the "preallocatd spare thread" is attached to the thread doing a syscall rather than the KSE. This removes the need to lock the scheduler when we want to access it, as it's now "at hand". DDB now shows a lot mor info for threaded proceses though it may need some optimisation to squeeze it all back into 80 chars again. (possible JKH project) Upcalls are now "bound" threads, but "KSE Lending" now means that other completing syscalls can be completed using that KSE before the upcall finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is one of the highest priorities in the KSE system.) The upcall when it happens will present all the completed syscalls to the KSE for selection.	2002-10-09 02:33:36 +00:00
Scott Long	316ec49abd	Some kernel threads try to do significant work, and the default KSTACK_PAGES doesn't give them enough stack to do much before blowing away the pcb. This adds MI and MD code to allow the allocation of an alternate kstack who's size can be speficied when calling kthread_create. Passing the value 0 prevents the alternate kstack from being created. Note that the ia64 MD code is missing for now, and PowerPC was only partially written due to the pmap.c being incomplete there. Though this patch does not modify anything to make use of the alternate kstack, acpi and usb are good candidates. Reviewed by: jake, peter, jhb	2002-10-02 07:44:29 +00:00
Juli Mallett	1d9c56964d	Back our kernel support for reliable signal queues. Requested by: rwatson, phk, and many others	2002-10-01 17:15:53 +00:00
Juli Mallett	1226f694e6	First half of implementation of ksiginfo, signal queues, and such. This gets signals operating based on a TailQ, and is good enough to run X11, GNOME, and do job control. There are some intricate parts which could be more refined to match the sigset_t versions, but those require further evaluation of directions in which our signal system can expand and contract to fit our needs. After this has been in the tree for a while, I will make in kernel API changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo more robustly, such that we can actually pass information with our (queued) signals to the userland. That will also result in using a struct ksiginfo pointer, rather than a signal number, in a lot of kern_sig.c, to refer to an individual pending signal queue member, but right now there is no defined behaviour for such. CODAFS is unfinished in this regard because the logic is unclear in some places. Sponsored by: New Gold Technology Reviewed by: bde, tjr, jake [an older version, logic similar]	2002-09-30 20:20:22 +00:00
Jake Burkholder	05ba50f522	Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.	2002-09-21 22:07:17 +00:00
Julian Elischer	4f0db5e08c	Allocate KSEs and KSEGRPs separatly and remove them from the proc structure. next step is to allow > 1 to be allocated per process. This would give multi-processor threads. (when the rest of the infrastructure is in place) While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc are diverging more than they should.. corrective action needed soon.	2002-09-15 23:52:25 +00:00
Julian Elischer	71fad9fdee	Completely redo thread states. Reviewed by: davidxu@freebsd.org	2002-09-11 08:13:56 +00:00
Jake Burkholder	f36ba45234	Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to sysentvec. Initialized all fields of all sysentvecs, which will allow them to be used instead of constants in more places. Provided stack fixup routines for emulations that previously used the default.	2002-09-01 21:41:24 +00:00
Ian Dowse	8f19eb88df	Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropiate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap. Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel	2002-09-01 20:37:28 +00:00
Robert Watson	4d1a4bb79f	Refresh the credential on the first initproc thread following divorcing the initproc credential from the proc0 credential. Otherwise, the proc0 credential is used instead of initproc's credentil when authorizing start_init() activities prior to initproc hitting userland for the first time. This could result in the incorrect credential being used to authorize mounting of the root file system, which could in turn cause problems for NFS when used in combination with uid/gid ipfw rules, or with MAC. Discussed with: julian	2002-08-07 17:53:31 +00:00
Robert Watson	a87cdf8335	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the necessary MAC entry points to maintain labels on mount structures. In particular, invoke entry points for intialization and destruction in various scenarios (root, non-root). Also introduce an entry point in the boot procedure following the mount of the root file system, but prior to the start of the userland init process to permit policies to perform further initialization. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 01:11:29 +00:00
Robert Watson	4024496496	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the necessary MAC entry points to maintain labels on process credentials. In particular, invoke entry points for the initialization and destruction of struct ucred, the copying of struct ucred, and permit the initial labels to be set for both process 0 (parent of all kernel processes) and process 1 (parent of all user processes). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 00:39:19 +00:00
Peter Wemm	3ebc124838	Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable handler in the kernel at the same time. Also, allow for the exec_new_vmspace() code to build a different sized vmspace depending on the executable environment. This is a big help for execing i386 binaries on ia64. The ELF exec code grows the ability to map partial pages when there is a page size difference, eg: emulating 4K pages on 8K or 16K hardware pages. Flesh out the i386 emulation support for ia64. At this point, the only binary that I know of that fails is cvsup, because the cvsup runtime tries to execute code in pages not marked executable. Obtained from: dfr (mostly, many tweaks from me).	2002-07-20 02:56:12 +00:00
Julian Elischer	c3b98db091	Thinking about it I came to the conclusion that the KSE states were incorrectly formulated. The correct states should be: IDLE: On the idle KSE list for that KSEG RUNQ: Linked onto the system run queue. THREAD: Attached to a thread and slaved to whatever state the thread is in. This means that most places where we were adjusting kse state can go away as it is just moving around because the thread is.. The only places we need to adjust the KSE state is in transition to and from the idle and run queues. Reviewed by: jhb@freebsd.org	2002-07-14 03:43:33 +00:00
Maxime Henrion	563af2ec15	Remove an unused argument in vfs_mountroot().	2002-07-03 08:52:37 +00:00
Julian Elischer	e602ba25fd	Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..	2002-06-29 17:26:22 +00:00
Tom Rhodes	d394511de3	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
Maxime Henrion	6dbde1fe23	Convert devfs to nmount. Reviewed by: phk	2002-05-02 20:27:42 +00:00
Robert Watson	5a06cb0ca6	Divorce proc0 and proc1 credentials earlier; while this isn't technically needed in the current code, in the MAC tree, create_init() relies on the ability to modify the credentials present for initproc, and should not perform that modification on a shared credential. Pro-active diff reduction against MAC changes that are in the queue; also facilitates other work, including the capabilities implementation. Submitted by: green Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-04-19 13:35:53 +00:00
Maxime Henrion	d786139c76	Rework the kernel environment subsystem. We now convert the static environment needed at boot time to a dynamic subsystem when VM is up. The dynamic kernel environment is protected by an sx lock. This adds some new functions to manipulate the kernel environment : freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be called after every getenv() when you have finished using the string. testenv() only tests if an environment variable is present, and doesn't require a freeenv() call. setenv() and unsetenv() are self explanatory. The kenv(2) syscall exports these new functionalities to userland, mainly for kenv(1). Reviewed by: peter	2002-04-17 13:06:36 +00:00
John Baldwin	6008862bc2	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
Seigo Tanimura	5cf4bcebbf	The description of fd_mtx is "filedesc structure."	2002-03-29 11:26:05 +00:00
Jeff Roberson	f22a4b62f5	Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks with this flag. Remove the dup_list and dup_ok code from subr_witness. Now we just check for the flag instead of doing string compares. Also, switch the process lock, process group lock, and uma per cpu locks over to this interface. The original mechanism did not work well for uma because per cpu lock names are unique to each zone. Approved by: jhb	2002-03-27 09:23:41 +00:00
Poul-Henning Kamp	fb92273bdc	Move the mount of the root filesystem to happen in the init process before the exec if /sbin/init. This allows the scheduler to get started and kthreads a chance to run before we start filesystem operations.	2002-03-08 10:33:11 +00:00
Peter Wemm	e2256f43ed	Fix warning. s/microuptime()/binuptime()/ for switchtime initial value.	2002-02-26 01:03:39 +00:00
Seigo Tanimura	f591779bb5	Lock struct pgrp, session and sigio. New locks are: - pgrpsess_lock which locks the whole pgrps and sessions, - pg_mtx which protects the pgrp members, and - s_mtx which protects the session members. Please refer to sys/proc.h for the coverage of these locks. Changes on the pgrp/session interface: - pgfind() needs the pgrpsess_lock held. - The caller of enterpgrp() is responsible to allocate a new pgrp and session. - Call enterthispgrp() in order to enter an existing pgrp. - pgsignal() requires a pgrp lock held. Reviewed by: jhb, alfred Tested on: cvsup.jp.FreeBSD.org (which is a quad-CPU machine running -current)	2002-02-23 11:12:57 +00:00
Poul-Henning Kamp	1cbb9c3b03	Convert p->p_runtime and PCPU(switchtime) to bintime format.	2002-02-22 13:32:01 +00:00
Julian Elischer	2c1007663f	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)	2002-02-11 20:37:54 +00:00
Julian Elischer	079b7badea	Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,	2002-02-07 20:58:47 +00:00
Alfred Perlstein	9e209b124a	Include sys/_lock.h and sys/_mutex.h to reduce namespace pollution. Requested by: jhb	2002-01-13 21:37:49 +00:00
Alfred Perlstein	426da3bcfb	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file fp); / increments reference count on a file / struct file fhold_locked(struct file fp); / like fhold but expects file to locked / struct file ffind_hold(struct thread , int fd); / finds the struct file in thread, adds one reference and returns it unlocked / struct file ffind_lock(struct thread , int fd); / ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.	2002-01-13 11:58:06 +00:00
Luigi Rizzo	af1408e33f	Add/correct description for some sysctl variables where it was missing. The description field is unused in -stable, so the MFC there is equivalent to a comment. It can be done at any time, i am just setting a reminder in 45 days when hopefully we are past 4.5-release. MFC after: 45 days	2001-12-16 16:07:20 +00:00
John Baldwin	0bbc882680	Overhaul the per-CPU support a bit: - The MI portions of struct globaldata have been consolidated into a MI struct pcpu. The MD per-CPU data are specified via a macro defined in machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the interface would be cleaner (PCPU_GET(my_md_field) vs. PCPU_GET(md.md_my_md_field)). - All references to globaldata are changed to pcpu instead. In a UP kernel, this data was stored as global variables which is where the original name came from. In an SMP world this data is per-CPU and ideally private to each CPU outside of the context of debuggers. This also included combining machine/globaldata.h and machine/globals.h into machine/pcpu.h. - The pointer to the thread using the FPU on i386 was renamed from npxthread to fpcurthread to be identical with other architectures. - Make the show pcpu ddb command MI with a MD callout to display MD fields. - The globaldata_register() function was renamed to pcpu_init() and now init's MI fields of a struct pcpu in addition to registering it with the internal array and list. - A pcpu_destroy() function was added to remove a struct pcpu from the internal array and list. Tested on: alpha, i386 Reviewed by: peter, jake	2001-12-11 23:33:44 +00:00
John Baldwin	8e2e767b1f	Add a per-thread ucred reference for syscalls and synchronous traps from userland. The per thread ucred reference is immutable and thus needs no locks to be read. However, until all the proc locking associated with writes to p_ucred are completed, it is still not safe to use the per-thread reference. Tested on: x86 (SMP), alpha, sparc64	2001-10-26 08:12:54 +00:00
John Baldwin	8cc06751dd	Don't initialize proc0's mutex twice. It is already done earlier on in the MD startup code.	2001-09-18 22:09:47 +00:00
Peter Wemm	b711616825	In the devfs case, have initproc attempt the easy cases of mounting /dev. This works if /dev exists, or if / is read/write (nfsroot). If it is too hard, leave it up to init -d (which will probably fail if /dev does not exist, but there isn't much else we can do short of making a union mount on /). This means we get a proper /dev if you boot a 5.x kernel on a 4.x world, which I happen to do often (the ramdisks on our install netboot servers have 4.x userland worlds on them).	2001-09-15 11:15:22 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Matthew Dillon	0cddd8f023	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
Peter Wemm	f41325db5f	With this commit, I hereby pronounce gensetdefs past its use-by date. Replace the a.out emulation of 'struct linker_set' with something a little more flexible. <sys/linker_set.h> now provides macros for accessing elements and completely hides the implementation. The linker_set.h macros have been on the back burner in various forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()), John Polstra (ELF clue) and myself (cleaned up API and the conversion of the rest of the kernel to use it). The macros declare a strongly typed set. They return elements with the type that you declare the set with, rather than a generic void *. For ELF, we use the magic ld symbols (__start_<setname> and __stop_<setname>). Thanks to Richard Henderson <rth@redhat.com> for the trick about how to force ld to provide them for kld's. For a.out, we use the old linker_set struct. NOTE: the item lists are no longer null terminated. This is why the code impact is high in certain areas. The runtime linker has a new method to find the linker set boundaries depending on which backend format is in use. linker sets are still module/kld unfriendly and should never be used for anything that may be modular one day. Reviewed by: eivind	2001-06-13 10:58:39 +00:00
Robert Watson	b1fc0ec1a7	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit	2001-05-25 16:59:11 +00:00
John Baldwin	1b2555b243	- Lock the VM when initializing the vmspace for proc0. - Don't bother releasing Giant while doing a lookup on the vm_map of initproc while starting up init. We have to grab it again right after the lookup anyways.	2001-05-23 22:06:47 +00:00
Alfred Perlstein	2395531439	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
John Baldwin	7a08bae6ec	- Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT. - Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be toggled after boot. - Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just afer the display of the copyright message instead of doing it by hand in three MD places.	2001-05-17 22:28:46 +00:00
Greg Lehey	60fb0ce365	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
John Baldwin	7b531e6037	Stick proc0 in the PID hash table.	2001-04-11 18:50:50 +00:00
John Baldwin	1005a129e5	Convert the allproc and proctree locks from lockmgr locks to sx locks.	2001-03-28 11:52:56 +00:00
David E. O'Brien	828c9e13a3	Do not set a default ELF syscall ABI fallback. If one runs an un-branded Linux static binary that calls Linux's fcntl the machine will reboot when interupted by the FreeBSD syscall ABI.	2001-03-04 11:58:50 +00:00
Ian Dowse	a90ef2ae0f	The kernel did not hold a vnode reference associated with the `rootvnode' pointer, but vfs_syscalls.c's checkdirs() assumed that it did. This bug reliably caused a panic at reboot time if any filesystem had been mounted directly over /. The checkdirs() function is called at mount time to find any process fd_cdir or fd_rdir pointers referencing the covered mountpoint vnode. It transfers these to point at the root of the new filesystem. However, this process was not reversed at unmount time, so processes with a cwd/root at a mount point would unexpectedly lose their cwd/root following a mount-unmount cycle at that mountpoint. This change should fix both of the above issues. Start_init() now holds an extra vnode reference corresponding to `rootvnode', and dounmount() releases this reference when the root filesystem is unmounted just before reboot. Dounmount() now undoes the actions taken by checkdirs() at mount time; any process cdir/rdir pointers that reference the root vnode of the unmounted filesystem are transferred to the now-uncovered vnode. Reviewed by: bde, phk	2001-02-28 20:54:28 +00:00
Jake Burkholder	5b270b2a55	Sigh. Try to get priorities sorted out. Don't bother trying to update native priority, it is diffcult to get right and likely to end up horribly wrong. Use an honestly wrong fixed value that seems to work; PUSER for user threads, and the interrupt priority for ithreads. Set it once when the process is created and forget about it. Suggested by: bde Pointy hat: me	2001-02-28 02:53:44 +00:00
Jake Burkholder	be15bfc091	Initialize native priority to PRI_MAX. It was usually 0 which made a process's priority go through the roof when it released a (contested) mutex. Only set the native priority in mtx_lock if hasn't already been set. Reviewed by: jhb	2001-02-26 23:27:35 +00:00
John Baldwin	130c1f25a4	It turns out the kernel console works fine and thus doesn't need quite this much extra testing.	2001-02-24 03:40:23 +00:00
Peter Wemm	3e688165a9	Stricter style(9) conformance - remove unnecessary blank lines in previous commit.	2001-02-23 23:05:46 +00:00
John Baldwin	0b1d793211	Test out the kernel console just before launching the AP's.	2001-02-23 19:44:25 +00:00
Robert Watson	91421ba234	o Move per-process jail pointer (p->pr_prison) to inside of the subject credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project	2001-02-21 06:39:57 +00:00
Jake Burkholder	d5a08a6065	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.	2001-02-12 00:20:08 +00:00
John Baldwin	929604ec9b	Move the initailization of the proc lock for proc0 very early into the MD startup code.	2001-02-09 16:25:16 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
John Baldwin	f202965e95	- Catch up to p_sflag changes. - The MD code now initializes proc0.p_heldmtx, proc0.p_contested, and curproc. - The MD code calls here with Giant already held. - Proc locking.	2001-01-24 10:40:56 +00:00
Jake Burkholder	ef73ae4b0c	Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables other then curproc.	2001-01-10 04:43:51 +00:00
Jake Burkholder	c0c2557090	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.	2000-12-13 00:17:05 +00:00
John Baldwin	4971f62a86	- Add a mutex to the proc structure p_mtx that will be used to lock accesses to each individual proc. - Initialize the lock during fork1(), and destroy it in wait1().	2000-12-03 01:22:34 +00:00
Jake Burkholder	1512b5d6ab	Use an mp-safe callout for endtsleep.	2000-12-01 04:55:52 +00:00
Jake Burkholder	4f55983606	Use callout_reset instead of timeout(9). Most callouts are statically allocated, 2 have been added to struct proc for setitimer and sleep. Reviewed by: jhb, jlemon	2000-11-27 22:52:31 +00:00
Jake Burkholder	553629ebc9	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd	2000-11-22 07:42:04 +00:00
David E. O'Brien	00910f2882	ELF kernels should use an ELF sysvec. This allows us to move a.out specific files to those platforms that acutally support a.out.	2000-11-05 10:41:35 +00:00
John Baldwin	35e0e5b311	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h	2000-10-20 07:58:15 +00:00
John Baldwin	db72809d24	Release Giant before starting up init. Submitted by: jake	2000-09-15 19:25:29 +00:00
Doug Rabson	36240ea5bf	Move the include of <sys/systm.h> so that KTR gets a declaration for snprintf().	2000-09-10 13:54:52 +00:00
Jason Evans	0384fff8c5	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh	2000-09-07 01:33:02 +00:00
Don Lewis	f535380cb6	Remove uidinfo hash table lookup and maintenance out of chgproccnt() and chgsbsize(), which are called rather frequently and may be called from an interrupt context in the case of chgsbsize(). Instead, do the hash table lookup and maintenance when credentials are changed, which is a lot less frequent. Add pointers to the uidinfo structures to the ucred and pcred structures for fast access. Pass a pointer to the credential to chgproccnt() and chgsbsize() instead of passing the uid. Add a reference count to the uidinfo structure and use it to decide when to free the structure rather than freeing the structure when the resource consumption drops to zero. Move the resource tracking code from kern_proc.c to kern_resource.c. Move some duplicate code sequences in kern_prot.c to separate helper functions. Change KASSERTs in this code to unconditional tests and calls to panic().	2000-09-05 22:11:13 +00:00
Poul-Henning Kamp	db90128160	Avoid the modules madness I inadvertently introduced by making the cloning infrastructure standard in kern_conf. Modules are now the same with or without devfs support. If you need to detect if devfs is present, in modules or elsewhere, check the integer variable "devfs_present". This happily removes an ugly hack from kern/vfs_conf.c. This forces a rename of the eventhandler and the standard clone helper function. Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include like <sys/queue.h> Remove all #includes of opt_devfs.h they no longer matter.	2000-09-02 19:17:34 +00:00
Poul-Henning Kamp	3f54a085a6	Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c) Remove old DEVFS support fields from dev_t. Make uid, gid & mode members of dev_t and set them in make_dev(). Use correct uid, gid & mode in make_dev in disk minilayer. Add support for registering alias names for a dev_t using the new function make_dev_alias(). These will show up as symlinks in DEVFS. Use makedev() rather than make_dev() for MFSs magic devices to prevent DEVFS from noticing this abuse. Add a field for DEVFS inode number in dev_t. Add new DEVFS in fs/devfs. Add devfs cloning to: disk minilayer (ie: ad(4), sd(4), cd(4) etc etc) md(4), tun(4), bpf(4), fd(4) If DEVFS add -d flag to /sbin/inits args to make it mount devfs. Add commented out DEVFS to GENERIC	2000-08-20 21:34:39 +00:00
Peter Wemm	37b087a645	Clean up some low level bootstrap code: - stop using the evil 'struct trapframe' argument for mi_startup() (formerly main()). There are much better ways of doing it. - do not use prepare_usermode() - setregs() in execve() will do it all for us as long as the p_md.md_regs pointer is set. (which is now done in machdep.c rather than init_main.c. The Alpha port did it this way all along and is much cleaner). - collect all the magic %cr0 etc register settings into one place and have the AP's call that instead of using magic numbers (!!) that keep changing over and over again. - Make it safe to call kthread_create() earlier, including during the device probe sequence. It doesn't need the callback mechanism that NetBSD's version uses. - kthreads created this way are root-less as they exist before the root filesystem is mounted. init(1) is set up so that it aquires the root pointers prior to running. If other kthreads want filesystem acccess we can make this code more generic. - set all threads start times once we have decided what time it is. - init uses a trampoline rather than the evil prepare_usermode() hack. - kern_descrip.c has a couple of tweaks to deal with forking when there is no rootdir or cwd etc. - adjust the early SYSINIT() sequence so that a few prereqisites are in place. eg: make sure the run queue is initialized before doing forks. With this, the USB code can easily create a kthread to do the device tree discovery. (I have tested it, it works nicely). There are still some open issues before this is truely useful. - tsleep() does not like working before the clock is running. It sort-of tries to spin wait, but it can do more useful things now. - stopping a kthread in kld code at unload time is "interesting" but we have a solution for that. The Alpha code needs no changes for this. It already uses pretty much the same strategies, but a little cleaner.	2000-08-11 09:05:12 +00:00
Peter Wemm	af4b2d2d1c	Fix the SYSINIT() bubble sort. This was fixed in kern_linker.c already.	2000-08-02 21:05:21 +00:00
Mark Murray	b6e67f5c7d	Remove no-longer-relevant comment.	2000-06-25 10:14:06 +00:00
Alfred Perlstein	c636255150	fix races in the uidinfo subsystem, several problems existed: 1) while allocating a uidinfo struct malloc is called with M_WAITOK, it's possible that while asleep another process by the same user could have woken up earlier and inserted an entry into the uid hash table. Having redundant entries causes inconsistancies that we can't handle. fix: do a non-waiting malloc, and if that fails then do a blocking malloc, after waking up check that no one else has inserted an entry for us already. 2) Because many checks for sbsize were done as "test then set" in a non atomic manner it was possible to exceed the limits put up via races. fix: instead of querying the count then setting, we just attempt to set the count and leave it up to the function to return success or failure. 3) The uidinfo code was inlining and repeating, lookups and insertions and deletions needed to be in their own functions for clarity. Reviewed by: green	2000-06-22 22:27:16 +00:00
Jordan K. Hubbard	8d0bf3d6f8	Add new oid, debug.boothowto. This allows userland apps to see how the kernel was booted and perhaps do conditional things based upon it (sysinstall, for example, will now turn Debug mode on automatically if boot -v was done). Submitted by: msmith Suggested by: ulf	2000-02-25 11:43:08 +00:00
Greg Lehey	580e7e5a0f	If we fail to find init, print out the search path used. This helps differentiate between one of three different scenarios: 1. No init. 2. Path to init munged by an incorrect loader configuration. 3. Root file system not mounted. Reviewed-by: billf	1999-12-20 02:50:49 +00:00
Poul-Henning Kamp	0429e37ade	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967	1999-11-20 10:00:46 +00:00
Poul-Henning Kamp	2e3c8fcbd0	This is a partial commit of the patch from PR 14914: Alot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914	1999-11-16 10:56:05 +00:00
Mike Smith	92b3c1b050	swapinit isn't called from vfs_mountroot, so don't complain about it in a #if 0'ed comment. Call the machine-dependant cpu_rootconf functions from sysinits in their respective areas, don't do it from a stub here.	1999-11-01 23:53:27 +00:00
Poul-Henning Kamp	923502ff91	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.	1999-10-29 18:09:36 +00:00
Peter Wemm	d1f088dab5	Trim unused options (or #ifdef for undoc options). Submitted by: phk	1999-10-11 15:19:12 +00:00
Bruce Evans	37d3877723	Moved the definition of `boottime' and its sysctl to the correct file.	1999-09-13 14:22:27 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Poul-Henning Kamp	9dcbe2404a	Convert DEVFS hooks in (most) drivers to make_dev(). Diskslice/label code not yet handled. Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers) Add the correct hook for devfs to kern_conf.c The net result of this excercise is that a lot less files depends on DEVFS, and devtoname() gets more sensible output in many cases. A few drivers had minor additional cleanups performed relating to cdevsw registration. A few drivers don't register a cdevsw{} anymore, but only use make_dev().	1999-08-23 20:59:21 +00:00
Peter Wemm	9c8b8baa38	Slight reorganization of kernel thread/process creation. Instead of using SYSINIT_KT() etc (which is a static, compile-time procedure), use a NetBSD-style kthread_create() interface. kproc_start is still available as a SYSINIT() hook. This allowed simplification of chunks of the sysinit code in the process. This kthread_create() is our old kproc_start internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work the same as the NetBSD one. One thing I'd like to do shortly is get rid of nfsiod as a user initiated process. It makes sense for the nfs client code to create them on the fly as needed up to a user settable limit. This means that nfsiod doesn't need to be in /sbin and is always "available". This is a fair bit easier to do outside of the SYSINIT_KT() framework.	1999-07-01 13:21:46 +00:00
Peter Wemm	df8abd0bb9	Slight tweak to fork1() calling conventions. Add a third argument so the caller can easily find the child proc struct. fork(), rfork() etc syscalls set p->p_retval[] themselves. Simplify the SYSINIT_KT() code and other kernel thread creators to not need to use pfind() to find the child based on the pid. While here, partly tidy up some of the fork1() code for RF_SIGSHARE etc.	1999-06-30 15:33:41 +00:00
John Birrell	e40822840b	Use colons instead of semi-colons to behave like UNIX instead of DOS. Suggested by: bde	1999-05-11 10:08:10 +00:00
Peter Wemm	7067d561cc	Lites2 seems to have pretty much disappeared from the radar, and I suspect far more than this hack would be needed now..	1999-05-09 20:42:45 +00:00
Peter Wemm	18c3dd1745	s/main/mi_startup/ for the kernel entry point so that egcs doesn't get upset about it (and generate things like __main() calls that are reserved for main()). Renaming was phk's suggestion, but I'd already thought about it too. (phk liked my suggested name tada() but I decided against it :-) Reviewed by: phk	1999-05-09 19:01:49 +00:00
Dag-Erling Smørgrav	b83308b00b	Nit fix.	1999-05-07 17:37:08 +00:00
John Birrell	67481196cc	Allow the init_path to be customised in an embedded system using the INIT_PATH config option. Also fix two bugs which caused an infinite loop in none of the programs in the init_path were found. That code was obviously not tested!	1999-05-05 12:20:23 +00:00
Bill Fumerola	3d177f465a	Add sysctl descriptions to many SYSCTL_XXXs PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)	1999-05-03 23:57:32 +00:00
Dmitrij Tejblum	188554bba1	Set curproc at the end of proc0_init(). This patch also moves the bogus comment (the comment is still not quite right) and (as a side effect) removes some verbose initialisations (we depend on static initialisation to 0 for almost everything in proc0). The alpha kernels are bootable again. The change won't affect i386's until machdep.c is changed. Submitted by: bde	1999-04-29 22:51:59 +00:00
Poul-Henning Kamp	75c1354190	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/	1999-04-28 11:38:52 +00:00
Luoqi Chen	5206bca10a	Enable vmspace sharing on SMP. Major changes are, - %fs register is added to trapframe and saved/restored upon kernel entry/exit. - Per-cpu pages are no longer mapped at the same virtual address. - Each cpu now has a separate gdt selector table. A new segment selector is added to point to per-cpu pages, per-cpu global variables are now accessed through this new selector (%fs). The selectors in gdt table are rearranged for cache line optimization. - fask_vfork is now on as default for both UP and SMP. - Some aio code cleanup. Reviewed by: Alan Cox <alc@cs.rice.edu> John Dyson <dyson@iquest.net> Julian Elischer <julian@whistel.com> Bruce Evans <bde@zeta.org.au> David Greenman <dg@root.com>	1999-04-28 01:04:33 +00:00
Dmitrij Tejblum	ba41a07d04	Fixed printf format errors on alpha.	1999-04-24 18:50:48 +00:00
Dag-Erling Smørgrav	5f967b24fc	Make the location of init(8) tunable at boot time.	1999-04-20 21:15:13 +00:00
Bruce Evans	e7ba67f274	Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu, not per-process. Keep it in `switchtime' consistently. It is now clear that the timestamp is always valid in fork_trampoline() except when the child is running on a previously idle cpu, which can only happen if there are multiple cpus, so don't check or set the timestamp in fork_trampoline except in the (i386) SMP case. Just remove the alpha code for setting it unconditionally, since there is no SMP case for alpha and the code had rotted. Parts reviewed by: dfr, phk	1999-02-28 10:53:29 +00:00
Bruce Evans	1b0b259ed2	Don't forget to update `switchticks' in corner cases (except for the alpha fork_trampoline(), forget it because it I believe it is only necessary for the unsupported SMP case).	1999-02-25 11:03:08 +00:00
Luoqi Chen	b1028ad122	Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper. Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>	1999-02-19 14:25:37 +00:00
Luoqi Chen	75ffaf5939	Initialize procsig0.ps_refcnt to 1 (instead of 2), this would silence complaints about ps_refcnt greater than two when we try to fork() a kthread from proc0 with RFSIGSHARE flag set. Noticed by: Tor Egge <tegge@fast.no> Reviewed by: Richard Seaman, Jr. <dick@tar.com>	1999-02-17 21:03:14 +00:00
Mike Smith	e25810a699	Remove unused "kern.shutdown_timeout" sysctl node.	1999-01-30 19:36:02 +00:00
Matthew Dillon	bc81493155	More const fixes for -Wall, -Wcast-qual	1999-01-29 23:18:50 +00:00
Matthew Dillon	3cfc69e6c2	More -Wall / -Wcast-qual cleanup. Also, EXEC_SET can't use C_DECLARE_MODULE due to the linker_file_sysinit() function making modifications to the data.	1999-01-29 08:36:45 +00:00
Julian Elischer	88c5ea4574	Enable Linux threads support by default. This takes the conditionals out of the code that has been tested by various people for a while. ps and friends (libkvm) will need a recompile as some proc structure changes are made. Submitted by: "Richard Seaman, Jr." <dick@tar.com>	1999-01-26 02:38:12 +00:00
Julian Elischer	dc9c271aa1	Changes to the LINUX_THREADS support to only allocate extra memory for shared signal handling when there is shared signal handling being used. This removes the main objection to making the shared signal handling a standard ability in rfork() and friends and 'unconditionalising' this code. (i.e. the allocation of an extra 328 bytes per process). Signal handling information remains in the U area until such a time as it's reference count would be incremented to > 1. At that point a new struct is malloc'd and maintained in KVM so that it can be shared between the processes (threads) using it. A function to check the reference count and move the struct back to the U area when it drops back to 1 is also supplied. Signal information is therefore now swapable for all processes that are not sharing that information with other processes. THis should addres the concerns raised by Garrett and others. Submitted by: "Richard Seaman, Jr." <dick@tar.com>	1999-01-07 21:23:50 +00:00
Doug Rabson	9c0fed3dcf	Various changes to support OSF1 emulation: * Move the user stack from VM_MAXUSER_ADDRESS to a place below the 32bit boundary (needed to support 32bit OSF programs). This should also save one pagetable per process. * Add cvtqlsv to the set of instructions handled by the floating point software completion code. * Disable all floating point exceptions by default. * A minor change to execve to allow the OSF1 image activator to support dynamic loading.	1998-12-30 10:38:59 +00:00
Julian Elischer	39fb8e6b3e	Fix two bogons created by 'patch(1)' in my last commit.	1998-12-19 08:23:31 +00:00
Julian Elischer	6626c6045c	Reviewed by: Luoqi Chen, Jordan Hubbard Submitted by: "Richard Seaman, Jr." <lists@tar.com> Obtained from: linux :-) Code to allow Linux Threads to run under FreeBSD. By default not enabled This code is dependent on the conditional COMPAT_LINUX_THREADS (suggested by Garret) This is not yet a 'real' option but will be within some number of hours.	1998-12-19 02:55:34 +00:00
Peter Wemm	ddd62546e1	Fix sysinit_add(). - Don't include multiple copies of the previous sysinit in the new one. - Leave space for and explicitly null terminate the new list.	1998-10-15 17:09:19 +00:00
Peter Wemm	94e9d7c12d	Implement merging SYSINIT's from preloaded KLD modules. This means we check off SYSINIT entries as they are run, and when more arrive, we re-sort and restart (skipping the already-run entries). This can only be done after KMEM (and malloc) is up and running - this is fine because KLD is the only consumer of this and it's done after that. The nice thing about this is that the SYSINIT's within preloaded KLD modules are executed in their natural order. It should be possible to register devices for the probes which follow, etc. (soon.. several key things prevent this, such as use of linker sets for things like pci devices).	1998-10-09 23:42:47 +00:00
Doug Rabson	a20d77550a	Make sure that the argv pointers for init are aligned to the correct boundary on the alpha.	1998-10-06 11:55:40 +00:00
Søren Schmidt	d024c95599	Remove the SLICE code. This clearly needs alot more thought, and we dont need this to hunt us down in 3.0-RELEASE.	1998-09-14 19:56:42 +00:00
Bruce Evans	c2da0fd903	Cast pointers to intptr_t instead of or before casting to long. Fixed bitrot in K&R support (suword() now takes a long word). Didn't fix corresponding bitrot in store.9 and fetch.9. The correct types for the store and fetch families are problematic. The `word' functions are unfortunately named and need to be split to handle ints/longs/object pointers/function pointers. Storing argv[] as longs is quite broken when longs are longer than pointers, but usually works because it clobbers variables that will soon be reinitialized.	1998-07-15 05:21:48 +00:00
Doug Rabson	ecbb00a262	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.	1998-06-07 17:13:14 +00:00
Poul-Henning Kamp	e796e00de3	Some cleanups related to timecounters and weird ifdefs in <sys/time.h>. Clean up (or if antipodic: down) some of the msgbuf stuff. Use an inline function rather than a macro for timecounter delta. Maintain process "on-cpu" time as 64 bits of microseconds to avoid needless second rollover overhead. Avoid calling microuptime the second time in mi_switch() if we do not pass through _idle in cpu_switch() This should reduce our context-switch overhead a bit, in particular on pre-P5 and SMP systems. WARNING: Programs which muck about with struct proc in userland will have to be fixed. Reviewed, but found imperfect by: bde	1998-05-28 09:30:28 +00:00
Poul-Henning Kamp	c21410e119	s/nanoruntime/nanouptime/g s/microruntime/microuptime/g Reviewed by: bde	1998-05-17 11:53:46 +00:00
Julian Elischer	3e425b968d	Add changes and code to implement a functional DEVFS. This code will be turned on with the TWO options DEVFS and SLICE. (see LINT) Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes. /dev will be automatically mounted by init (thanks phk) on bootup. See /sys/dev/slice/slice.4 for more info. All code should act the same without these options enabled. Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others This code does not support the following: bad144 handling. Persistance. (My head is still hurting from the last time we discussed this) ATAPI flopies are not handled by the SLICE code yet. When this code is running, all major numbers are arbitrary and COULD be dynamically assigned. (this is not done, for POLA only) Minor numbers for disk slices ARE arbitray and dynamically assigned.	1998-04-19 23:32:49 +00:00
Dag-Erling Smørgrav	dc73342347	Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.	1998-04-17 22:37:19 +00:00
Poul-Henning Kamp	a2481bbe8e	When pmap_pinit0() allocates a page for proc0's page directory, kernal page table may need to be extended. But while growing the kernel page table (pmap_growkernel()), newly allocated kernel page table pages are entered into every process' page directory. For proc0, the page directory is not allocated yet, and results in a page fault. Eventually, the machine panics with "lockmgr: not holding exclusive lock". PR: 5458 Reviewed by: phk Submitted by: Luoqi Chen <luoqi@luoqi.watermarkgroup.com>	1998-04-11 17:24:06 +00:00
Poul-Henning Kamp	5f88ec3625	Minor adjustments to the timecounting and proc0. Mostly Submitted by: bde	1998-04-08 09:01:53 +00:00
Peter Wemm	aacdc613e5	curproc is initialized in locore at the same time for both SMP and UP now.	1998-04-06 15:51:22 +00:00
Poul-Henning Kamp	00af9731c9	Time changes mark 2: * Figure out UTC relative to boottime. Four new functions provide time relative to boottime. * move "runtime" into struct proc. This helps fix the calcru() problem in SMP. * kill mono_time. * add timespec{add\|sub\|cmp} macros to time.h. (XXX: These may change!) * nanosleep, select & poll takes long sleeps one day at a time Reviewed by: bde Tested by: ache and others	1998-04-04 13:26:20 +00:00
Poul-Henning Kamp	227ee8a188	Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now uses the new variable time_second instead. gettime() changed to getmicrotime(0. Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeard time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde	1998-03-30 09:56:58 +00:00
John Dyson	9f24f214c3	Make the rootdir handling more consistent. Now, processes always have a root vnode associated with them, and no special checks for the null case are needed. Submitted by: terry@freebsd.org	1998-02-15 04:17:09 +00:00
Eivind Eklund	0b08f5f737	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
Eivind Eklund	47cfdb166d	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
Poul-Henning Kamp	c5b193bfba	Retire LFS. If you want to play with it, you can find the final version of the code in the repository the tag LFS_RETIREMENT. If somebody makes LFS work again, adding it back is certainly desireable, but as it is now nobody seems to care much about it, and it has suffered considerable bitrot since its somewhat haphazard integration. R.I.P	1998-01-30 11:34:06 +00:00
John Dyson	2d8acc0f4a	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)	1998-01-22 17:30:44 +00:00
John Dyson	8256655132	After one of my analysis passes to evaluate methods for SMP TLB mgmt, I noticed some major enhancements available for UP situations. The number of UP TLB flushes is decreased much more than significantly with these changes. Since a TLB flush appears to cost minimally approx 80 cycles, this is a "nice" enhancement, equiv to eliminating between 40 and 160 instructions per TLB flush. Changes include making sure that kernel threads all use the same PTD, and eliminate unneeded PTD switches at context switch time.	1997-12-14 02:11:23 +00:00

... 2 3 4 5 6 ...

428 Commits