freebsd-dev

Author	SHA1	Message	Date
John Baldwin	f4daf05619	- Correct the translation of old rlimit values to properly handle the old RLIM_INFINITY case for ogetrlimit(). - Use %jd and intmax_t to output negative time in usec in calcru(). - Rework getrusage() to make a copy of the rusage struct into a local variable while holding Giant and then do the copyout from the local variable to avoid having to have the original process rusage struct locked while doing the copyout (which would not be safe). This also includes a few style fixes from Bruce to getrusage(). Submitted by: bde (1, parts of 3) Suggested by: bde (2)	2004-02-06 19:30:12 +00:00
John Baldwin	99b6e02ba6	A few more style fixes from Bruce including a few I missed last time. Submitted by: bde	2004-02-06 19:25:34 +00:00
John Baldwin	4c3558aa82	Always set a process' state to normal when it is fully constructed in fork1() rather than only doing it for the RFSTOPPED case and then having to fix it up in other places later on.	2004-02-05 21:01:37 +00:00
John Baldwin	b4323d7729	- A lot of style and whitespace fixes. - Update a few comments regarding locking notes. Submitted by: bde (1, mostly)	2004-02-05 20:53:25 +00:00
Jacques Vidrine	b00a3c85da	Correct a reference counting bug in shmat(2). If vm_map_find(9) failed, the reference count for the virtual memory object referenced by the specified shared memory segment would have been erroneously incremented. Reported by: Joost Pol <joost@pine.nl>	2004-02-05 18:00:35 +00:00
Alexander Kabaev	dec8868dcc	Rename cn_unavailable to cnunavailable for a little more consistency. Garbage collect unused cndebug() function. Suggested by: bde	2004-02-05 17:35:28 +00:00
Mike Silbersack	b711d74eaf	Style fixes: don't indent variable names. Submitted by: bde	2004-02-05 08:29:27 +00:00
Alexander Kabaev	e99c09e2dc	Eliminate global cons_unavailable flag and replace it by the status bit maintained on a per-device basis. Single variable is inadequate on machines running with multiple consoles enabled.	2004-02-05 01:56:43 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
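The copy-on-write discipline described in 91d5354a2c can be sketched in userland C. This is a minimal illustration, not the kernel code: `struct demo_limits` and the `demo_lim_*` helpers are hypothetical names, and a pthread mutex stands in for the kernel mutex. The lock guards only the reference count; a structure that is shared is never modified in place, so any holder of a reference may read it without locking.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NLIMITS 4			/* hypothetical number of resources */

struct demo_limits {
	pthread_mutex_t lock;		/* protects refcnt only */
	int refcnt;
	long cur[DEMO_NLIMITS];		/* immutable once the struct is shared */
};

/* Allocate a zeroed limits structure holding one reference. */
static struct demo_limits *
demo_lim_alloc(void)
{
	struct demo_limits *l = calloc(1, sizeof(*l));

	if (l == NULL)
		return (NULL);
	pthread_mutex_init(&l->lock, NULL);
	l->refcnt = 1;
	return (l);
}

/* Take an additional reference; the data itself needs no lock to read. */
static struct demo_limits *
demo_lim_hold(struct demo_limits *l)
{
	pthread_mutex_lock(&l->lock);
	l->refcnt++;
	pthread_mutex_unlock(&l->lock);
	return (l);
}

/* Drop a reference and free the structure when the last one goes away. */
static void
demo_lim_free(struct demo_limits *l)
{
	int last;

	pthread_mutex_lock(&l->lock);
	last = (--l->refcnt == 0);
	pthread_mutex_unlock(&l->lock);
	if (last) {
		pthread_mutex_destroy(&l->lock);
		free(l);
	}
}

/*
 * Copy on write: build a private copy with one value changed, then drop
 * the caller's reference on the old structure.  Other holders of the old
 * structure keep seeing the old values.
 */
static struct demo_limits *
demo_lim_set(struct demo_limits *old, int which, long value)
{
	struct demo_limits *new = demo_lim_alloc();

	if (new == NULL)
		return (NULL);
	memcpy(new->cur, old->cur, sizeof(new->cur));
	new->cur[which] = value;
	demo_lim_free(old);
	return (new);
}
```

In this sketch the pointer swap itself (replacing the process's pointer with the value returned by `demo_lim_set()`) would be done under a separate lock, which is the role the commit message assigns to the proc lock.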
Mike Silbersack	ff5e43a3fd	Rename iov_to_uio to uiofromiov to be more consistent with other uio* functions. Suggested by: bde	2004-02-04 08:43:21 +00:00
Pawel Jakub Dawidek	19b0efd32d	Allow assert that the current thread does not hold the sx(9) lock. Reviewed by: jhb In cooperation with: juli, jhb Approved by: jhb, scottl (mentor)	2004-02-04 08:14:58 +00:00
Mike Silbersack	2ccbe4b596	Style fixes Submitted by: bde	2004-02-04 08:14:47 +00:00
Robert Watson	5e312ddcc6	A variety of further cleanups to ttyinfo(): - Rename temporary variable names ("tmp", "tmp2") to more informative names ("load", "pctcpu", "rss", ...) - Unclutter indentation and return paths: rather than lots of nested ifs, simply return earlier if it's not going to work out. Simplify general structure and avoid "deep" code. - Comment on the thread/process selection and locking. - Correct handling of "running"/"runnable" states, avoid "unknown" that people were seeing for running processes. This was due to a misunderstanding of the more complex state machine / inhibitors behavior of KSE. - Do perform ttyinfo() printing on KSE (P_SA) processes, it seems generally to work. While I initially attempted to formulate this as two commits (one layout, the other content), I concluded that the layout changes were really structural changes. Many elements submitted by: bde	2004-02-04 05:46:05 +00:00
John Baldwin	3e9ac3ebf2	Remove a bogus assertion. Noticed by: bde Pointy hat to: jhb	2004-02-03 15:14:27 +00:00
Daniel Eischen	b5426f096b	Regen after adding ksem_timedwait().	2004-02-03 05:11:31 +00:00
Daniel Eischen	aae94fbbb6	Add ksem_timedwait() to complement ksem_wait(). Glanced at by: alfred	2004-02-03 05:08:32 +00:00
Robert Watson	4f638130c3	Don't dec/inc the amountpipes counter every time we resize a pipe -- instead, just dec/inc in the ctor/dtor. For now, increment/decrement in two's, since we're now performing the operation once per pair, not once per pipe. Not really any measurable performance change in my micro-benchmarks, but doing less work is good, especially when it comes to atomic operations. Suggested by: alc	2004-02-03 04:55:24 +00:00
Robert Watson	9a830ddc54	Catch instances of (pipe == NULL) that were obsoleted with recent changes to jointly allocated pipe pairs. Replace these checks with pipe_present checks. This avoids a NULL pointer dereference when a pipe is half-closed. Submitted by: Peter Edwards <peter.edwards@openet-telecom.com>	2004-02-03 02:50:51 +00:00
John Baldwin	9c9c52a3ed	- Assert that witness_cold is not true in enroll(). - Only check witness_watch once in enroll(). Reported by: ru (2)	2004-02-02 22:15:17 +00:00
Pawel Jakub Dawidek	3410b19324	Fix many issues related to mount/unmount: 1. Root from inside a jail was able to unmount any file system (except /). 2. Unprivileged root was able to unmount file systems mounted by privileged root (except /). 3. User from inside a jail was able to mount file system when sysctl vfs.usermount was set to 1. 4. User was able to mount file system when vfs.usermount was set to 1 (that's ok) and unmount it even if vfs.usermount was equal to 0 (that's not correct). Possibility from point 1 was reported by: Dariusz Kowalski <darek@76.pl> Only a part of this fix will be MFC'ed (if approved). PR: kern/60149 Reviewed by: rwatson Approved by: scottl (mentor) MFC after: 3 days	2004-02-02 19:02:05 +00:00
Mike Silbersack	02ec600572	Remove debugging code that slipped into the previous commit. Spotted by: bde	2004-02-02 09:09:59 +00:00
Jeff Roberson	b209e5e3e4	- style fixes to the critical_exit() KASSERT(). Submitted by: bde	2004-02-02 08:13:27 +00:00
Jeff Roberson	0392e39dff	- Allow interactive tasks to use the maximum time-slice. This is not as detrimental as I thought it would be in the case of massive process storms from a shell and it makes regular desktop usage noticeably better.	2004-02-01 10:38:13 +00:00
Mike Silbersack	beb699c7ba	Rewrite sendfile's header support so that headers are now sent in the first packet along with data, instead of in their own packet. When serving files of size (packetsize - headersize) or smaller, this will result in one less packet crossing the network. Quick testing with thttpd and http_load has shown a noticeable performance improvement in this case (350 vs 330 fetches per second.) Included in this commit are two support routines, iov_to_uio, and m_uiotombuf; these routines are used by sendfile to construct the header mbuf chain that will be linked to the rest of the data in the socket buffer.	2004-02-01 07:56:44 +00:00
Jeff Roberson	f2f51f8ab8	- Disable ithread binding in all cases for now. This doesn't make as much sense with sched_4bsd as it does with sched_ule. - Use P_NOLOAD instead of the absence of td->td_ithd to determine whether or not a thread should be accounted for in sched_tdcnt.	2004-02-01 06:20:18 +00:00
Robert Watson	4795b82c13	Coalesce pipe allocations and frees. Previously, the pipe code would allocate two 'struct pipe's from the pipe zone, and malloc a mutex. - Create a new "struct pipepair" object holding the two 'struct pipe' instances, struct mutex, and struct label reference. Pipe structures now have a back-pointer to the pipe pair, and a 'pipe_present' flag to indicate whether the half has been closed. - Perform mutex init/destroy in zone init/destroy, avoiding reallocating the mutex for each pipe. Perform most pipe structure setup in zone constructor. - VM memory mappings for pageable buffers are still done outside of the UMA zone. - Change MAC API to speak 'struct pipepair' instead of 'struct pipe', update many policies. MAC labels are also handled outside of the UMA zone for now. Label-only policy modules don't have to be recompiled, but if a module is recompiled, its pipe entry points will need to be updated. If a module actually reached into the pipe structures (unlikely), that would also need to be modified. These changes substantially simplify failure handling in the pipe code as there are many fewer possible failure modes. On half-close, pipes no longer free the 'struct pipe' for the closed half until a full-close takes place. However, VM mapped buffers are still released on half-close. Some code refactoring is now possible to clean up some of the back references, etc; this patch attempts not to change the structure of most of the pipe implementation, only allocation/free code paths, so as to avoid introducing bugs (hopefully). This cuts about 8%-9% off the cost of sequential pipe allocation and free in system call tests on UP and SMP in my micro-benchmarks. May or may not make a difference in macro-benchmarks, but doing less work is good. Reviewed by: juli, tjr Testing help: dwhite, fenestro, scottl, et al	2004-02-01 05:56:51 +00:00
Jeff Roberson	40ece05382	- Revert rev 1.240 we no longer need a kthread for loadav().	2004-02-01 05:37:36 +00:00
Jeff Roberson	e7f004fe23	- Use sched_load() rather than grabbing the sx lock and traversing the proc table to discover the load.	2004-02-01 02:51:33 +00:00
Jeff Roberson	33916c360e	- Add a new member to struct kseq called ksq_sysload. This is intended to track the load for the sched_load() function. In the SMP case this member is not defined because it would be redundant with the ksg_load member which already tracks the non ithd load. - For sched_load() in the UP case simply return ksq_sysload. In the SMP case traverse the list of kseq groups and sum up their ksg_load fields.	2004-02-01 02:48:36 +00:00
Jeff Roberson	ca59f15272	- Keep a variable 'sched_tdcnt' that is used for the local implementation of sched_load(). This variable tracks the number of running and runnable non ithd threads. This removes the need to traverse the proc table and discover how many threads are runnable.	2004-02-01 02:46:47 +00:00
Robert Watson	fca542bcaa	Move KASSERT regarding td_critnest to after the value of td is set to curthread, to avoid warning and incorrect behavior. Hoped not to mind: jeff	2004-02-01 02:31:36 +00:00
Jeff Roberson	6767c6547b	- Assert that td_critnest > 0 in critical_exit() to catch cases of unbalanced uses of the critical_* api.	2004-02-01 01:24:54 +00:00
Robert Watson	26518e8d8c	Fix an error in a KASSERT string: it's pipe_free_kmem(), not pipespace(), that contains this KASSERT.	2004-01-31 23:03:22 +00:00
Poul-Henning Kamp	be8a62e821	Introduce the SO_BINTIME option which takes a high-resolution timestamp at packet arrival. For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL since it has higher resolution and lower overhead. Simultaneous use of the two options is possible and they will return consistent timestamps. This introduces an extra test and a function call for SO_TIMEVAL, but I have not been able to measure that.	2004-01-31 10:40:25 +00:00
Robert Watson	30a9f26db2	Assert process lock in ptracestop(), since we're going to rely on it, and later unlock it.	2004-01-29 00:58:21 +00:00
Robert Watson	94ffb20d72	Add a reset sysctl for mutex profiling: zeros all of the mutex profiling buffers and hash table. This makes it a lot easier to do multiple profiling runs without rebooting or performing gratuitous arithmetic. Sysctl is named debug.mutex.prof.reset. Reviewed by: jake	2004-01-28 22:11:53 +00:00
John Baldwin	d5b75694e7	Move the loadav() callout into its own kthread since it uses allproc_lock which is a sleepable lock and thus is not safe to acquire from a callout routine.	2004-01-28 20:44:41 +00:00
John Baldwin	8d768e7676	Rework witness_lock() to make it slightly more useful and flexible. - witness_lock() is split into two pieces: witness_checkorder() and witness_lock(). witness_checkorder() determines if acquiring a specified lock at the time it is called would result in a lock order reversal. It optionally adds a new lock order relationship as well. witness_lock() updates witness's data structures to assume that a lock has been acquired by sticking a new lock instance in the appropriate lock instance list. - The mutex and sx lock functions now call checkorder() prior to trying to acquire a lock and continue to call witness_lock() after the acquire is completed. This will let witness catch a deadlock before it happens rather than trying to do so after the threads have deadlocked (i.e. never actually report it). - A new function witness_defineorder() has been added that adds a lock order between two locks at runtime without having to acquire the locks. If the lock order cannot be added it will return an error. This function is available to programmers via the WITNESS_DEFINEORDER() macro which accepts either two mutexes or two sx locks as its arguments. - A few simple wrapper macros were added to allow developers to call witness_checkorder() anywhere as a way of enforcing locking assertions in code that might acquire a certain lock in some situations. The macros are: witness_check_{mutex,shared_sx,exclusive_sx} and take an appropriate lock as the sole argument. - The code to remove a lock instance from a lock list in witness_unlock() was unnested by using a goto to vastly improve the readability of this function.	2004-01-28 20:39:57 +00:00
John Baldwin	62a0fd943c	Use mtx_assert() rather than using a home-rolled version.	2004-01-28 20:26:39 +00:00
Alexander Kabaev	975634280a	Move the part of the comment which applies to osigsuspend where it belongs. The current sigsuspend syscall does expect a pointer to the mask as argument. Submitted by: Igor Sysoev <is at rambler-co dot ru>	2004-01-28 06:06:04 +00:00
Dag-Erling Smørgrav	84344f9fbf	Rename the kern.vm.kmem.size tunable to the more logical vm.kmem_size. To assure backward compatibility (conditional on !BURN_BRIDGES), look it up by its old name first, and log a warning (but accept the setting) if it was found. If both the old and new name are defined, the new name takes precedence. Also export vm.kmem_size as a read-only sysctl variable; I find it hard to tune a parameter when I don't know its default value, especially when that default value is computed at boot time.	2004-01-27 15:59:38 +00:00
Robert Watson	6bea667f63	When aborting fork() due to a failure, if using MAC, make sure to clean up the p_label field. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-01-25 18:42:18 +00:00
Ruslan Ermilov	33fe8fd0df	Register the uart(4)'s spin lock with witness(4).	2004-01-25 15:04:37 +00:00
Jeff Roberson	c77ac1fdee	- sched_strict has been dead for a long time now. Get rid of it.	2004-01-25 08:58:14 +00:00
Jeff Roberson	c494ddc8a1	- Clean up KASSERTS.	2004-01-25 08:57:38 +00:00
Jeff Roberson	5a2b158d8d	- Correct function names listed in KASSERTs. These were copied from other code and it was sloppy of me not to adjust these sooner.	2004-01-25 08:21:46 +00:00
Jeff Roberson	e17c57b14b	- Implement cpu pinning and binding. This is accomplished by keeping a per-cpu run queue that is only used for pinned or bound threads. Submitted by: Chris Bradfield <chrisb@ation.org>	2004-01-25 08:00:04 +00:00
Jeff Roberson	d1605f0ac9	- Use a unique string for the sched_setup SYSINIT and rename sched_setup to synch_setup. The schedulers use the sched_setup function name.	2004-01-25 07:49:45 +00:00
Jeff Roberson	29bcc4514f	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and properly adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate parameter and remove direct references to the process statistics.	2004-01-25 03:54:52 +00:00
Robert Watson	8dc10be885	Add some basic support for measuring sleep mutex contention to the mutex profiling code. As with existing mutex profiling, measurement is done with respect to mtx_lock() instances in the code, as opposed to specific mutexes. In particular, measure two things: (1) Lock contention. How often did this mtx_lock() call get made and have to sleep (or almost sleep) waiting for the lock. This helps identify the "victims" of contention. (2) Hold contention. How often, while the lock was held by a thread as a result of this mtx_lock(), did another thread try to acquire the same mutex. This helps identify the causes of contention. I'm currently exploring adding measurement of "time waited for the lock", but the current implementation has proven useful to me so far so I figured I'd commit it so others could try it out. Note that this increases the size of mutexes when MUTEX_PROFILING is enabled, so you might find you need to further bump UMA_BOOT_PAGES. Fixes welcome. The once over: des, others	2004-01-25 01:59:27 +00:00
Poul-Henning Kamp	551260fc36	Deal with MOD_FREQUENCY before MOD_OFFSET because the latter is the one which runs the actual update. This fixes a bug where there was a delay in applying the frequency adjustment. In extreme cases this could result in marginal stability of the kernel-pll.	2004-01-24 21:48:43 +00:00
Jeff Roberson	b9509b56fa	- Move smp_topology to subr_smp.c so that it is defined on all architectures.	2004-01-24 19:52:48 +00:00
Robert Watson	646e29ccac	Don't grab Giant in crfree(), since prison_free() no longer requires it. The uidinfo code appears to be MPSAFE, and is referenced without Giant elsewhere. While this grab of Giant was only made in fairly rare circumstances (actually GC'ing on refcount==0), grabbing Giant here potentially introduces lock order issues with any locks held by the caller. So this probably won't help performance much unless you change credentials a lot in an application, and leave a lot of file descriptors and cached credentials around. However, it simplifies locking down consumers of the credential interfaces. Bumped into by: sam Appeased: tjr	2004-01-23 21:07:52 +00:00
Robert Watson	b3059e09f6	Defer the vrele() on a jail's root vnode reference from prison_free() to a new prison_complete() task run by a task queue. This removes a requirement for grabbing Giant in crfree(). Embed the 'struct task' in 'struct prison' so that we don't have to allocate memory from prison_free() (which means we also defer the FREE()). With this change, I believe grabbing Giant from crfree() can now be removed, but need to check the uidinfo code paths. To avoid header pollution, move the definition of 'struct task' to _task.h, and recursively include from taskqueue.h and jail.h; much preferable to all files including jail.h picking up a requirement to include taskqueue.h. Bumped into by: sam Reviewed by: bde, tjr	2004-01-23 20:44:26 +00:00
Poul-Henning Kamp	ee57aeea65	Write 100 times for tomorrow: "Always print time_t as %jd, you never know what width it has"	2004-01-22 19:50:06 +00:00
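The rule quoted in ee57aeea65 can be shown in a few lines of standard C. A minimal sketch: `format_time()` is a hypothetical helper name; the point is only that the value is cast to `intmax_t` and printed with `%jd`, so the format stays correct whatever width `time_t` has on the platform.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Format a time_t portably: cast to intmax_t and print with %jd.
 * Returns the number of characters snprintf() would have written.
 */
static int
format_time(char *buf, size_t len, time_t t)
{
	return (snprintf(buf, len, "%jd", (intmax_t)t));
}
```

Printing with `%d` or `%ld` instead would silently depend on `time_t` matching `int` or `long`, which is exactly the assumption the commit message warns against.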
Ralf S. Engelschall	446655ac4f	Fix generation of random multicast MAC address. In case no real/physical IEEE 802 address is available, both the expired "draft-leach-uuids-guids-01" (section "4. Node IDs when no IEEE 802 network card is available") and RFC 2518 (section "6.4.1 Node Field Generation Without the IEEE 802 Address") recommend (quoted from RFC 2518): "The ideal solution is to obtain a 47 bit cryptographic quality random number, and use it as the low 47 bits of the node ID, with the _most_ significant bit of the first octet of the node ID set to 1. This bit is the unicast/multicast bit, which will never be set in IEEE 802 addresses obtained from network cards; hence, there can never be a conflict between UUIDs generated by machines with and without network cards." Unfortunately, this incorrectly explains how to implement this and the FreeBSD UUID generator code inherited this generation bug from the broken reference code in the standards draft. They should instead specify the "_least_ significant bit of the first octet of the node ID" as the multicast bit in a memory and hexadecimal string representation of a 48-bit IEEE 802 MAC address. This standards bug arose from a false interpretation, as the multicast bit is actually the _most_ significant bit in IEEE 802.3 (Ethernet) _transmission order_ of an IEEE 802 MAC address. The standards authors forgot that the bitwise order of an _octet_ from a MAC address _memory_ and hexadecimal string representation is still always from left (MSB, bit 7) to right (LSB, bit 0). Fortunately, this UUID generation bug could have occurred on systems without any Ethernet NICs only.	2004-01-22 13:34:11 +00:00
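The bit position argued for in 446655ac4f can be illustrated with a short C sketch. The helper names are hypothetical and `rand()` is only a placeholder for the cryptographic-quality source the quoted text asks for; the part that matters is that the multicast flag is the least significant bit (0x01) of the first octet in memory order, not the most significant bit (0x80).

```c
#include <stdint.h>
#include <stdlib.h>

#define NODE_LEN 6	/* 48-bit IEEE 802 style node ID */

/* Test the multicast bit: LSB of the first octet as stored in memory. */
static int
node_is_multicast(const uint8_t node[NODE_LEN])
{
	return ((node[0] & 0x01) != 0);
}

/*
 * Fill a node ID with random octets and mark it as multicast so it
 * cannot collide with an address read from a network card.
 * rand() is a stand-in here, not a suitable random source.
 */
static void
node_random(uint8_t node[NODE_LEN])
{
	for (int i = 0; i < NODE_LEN; i++)
		node[i] = (uint8_t)(rand() & 0xff);
	node[0] |= 0x01;
}
```

Setting 0x80 in the first octet, as the reference code described in the commit did, leaves the real multicast bit random, so roughly half of the generated node IDs would look like unicast addresses.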
Poul-Henning Kamp	4e74721cac	Add a sysctl (default: off) which enables a log(LOG_INFO...) warning if the clock is stepped.	2004-01-21 21:05:40 +00:00
Robert Watson	679365e7b9	Reduce gratuitous includes: don't include jail.h if it's not needed. Presumably, at some point, you had to include jail.h if you included proc.h, but that is no longer required. Result of: self injury involving adding something to struct prison	2004-01-21 17:10:47 +00:00
Andrey A. Chernov	9bbee25931	pread/pwrite: follow lseek spirit - return EINVAL on negative offset for non-VCHR	2004-01-20 01:27:42 +00:00
Poul-Henning Kamp	50d23be140	Add linenumber and source filename to panic(9) output. Ideally a traceback should be printed too, any takers ?	2004-01-19 21:27:11 +00:00
Alexander Kabaev	54556cc7b8	One more instance of magic number used in place of IO_SEQSHIFT. Submitted by: alc	2004-01-19 20:45:43 +00:00
Ruslan Ermilov	0541040c46	Since "m" is not part of the "mp" chain, need to free() it. Reported by: Stanford Metacompilation research group	2004-01-18 14:02:53 +00:00
Andrew Gallatin	1c318b9665	Handle sf_buf_alloc() returning null. This can happen if the process takes a signal while waiting for an sf_buf to become available. Reviewed by: alc	2004-01-17 21:16:51 +00:00
Dag-Erling Smørgrav	a6d4491c71	Restore correct semantics for F_DUPFD fcntl. This should fix the errors people have been getting with configure scripts.	2004-01-17 00:59:04 +00:00
Dag-Erling Smørgrav	56a9fc0e93	WITNESS won't let us hold two filedesc locks at the same time, so juggle fdp and newfdp around a bit.	2004-01-16 21:54:56 +00:00
Robert Watson	bafc8f255a	KASSERT() that initproc->p_pid is 1. Very bad things happen if init's pid isn't 1, and it can actually occur if kthread_create() is called before SI_SUB_CREATE_INIT without RFHIGHPID. Discussed with: jhb	2004-01-16 20:29:23 +00:00
Dag-Erling Smørgrav	ddce426f69	Remove two KASSERTs which were overly paranoid.	2004-01-16 08:45:56 +00:00
Dag-Erling Smørgrav	12d568c2b1	Take care to drop locks when calling malloc()	2004-01-15 18:50:11 +00:00
Dag-Erling Smørgrav	a2fe44e8cf	New file descriptor allocation code, derived from similar code introduced in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated file descriptors which is used to locate available descriptors when a new one is needed. It also moves the task of growing the file descriptor table out of fdalloc(), reducing complexity in both fdalloc() and do_dup(). Debts of gratitude are owed to tjr@ (who provided the original patch on which this work is based), grog@ (for the gdb(4) man page) and rwatson@ (for assistance with pxeboot(8)).	2004-01-15 10:15:04 +00:00
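The bitmap approach described in a2fe44e8cf can be sketched as follows. This is an illustration of the general technique only, not the committed code: `struct fd_bitmap` and both functions are hypothetical names, the map has a fixed size, and growing the table is left to the caller (matching the commit's note that growth was moved out of the allocator).

```c
#include <limits.h>
#include <stddef.h>

#define MAP_BITS (sizeof(unsigned long) * CHAR_BIT)

/* One bit per descriptor; a set bit means the descriptor is in use. */
struct fd_bitmap {
	unsigned long *words;
	size_t nwords;
};

/*
 * Find the lowest free descriptor >= minfd (minfd must be >= 0),
 * mark it used and return it.  Returns -1 when no bit is free,
 * in which case the caller would have to grow the table.
 */
static int
fd_bitmap_alloc(struct fd_bitmap *m, int minfd)
{
	size_t nbits = m->nwords * MAP_BITS;
	size_t fd = (size_t)minfd;

	while (fd < nbits) {
		size_t w = fd / MAP_BITS;
		unsigned long bit = 1UL << (fd % MAP_BITS);

		if (m->words[w] == ~0UL) {
			/* Whole word is allocated: skip to the next word. */
			fd = (w + 1) * MAP_BITS;
			continue;
		}
		if ((m->words[w] & bit) == 0) {
			m->words[w] |= bit;
			return ((int)fd);
		}
		fd++;
	}
	return (-1);
}

/* Release a descriptor by clearing its bit. */
static void
fd_bitmap_free(struct fd_bitmap *m, int fd)
{
	m->words[(size_t)fd / MAP_BITS] &= ~(1UL << ((size_t)fd % MAP_BITS));
}
```

Skipping fully allocated words is what makes the bitmap cheaper than scanning the descriptor table entry by entry when many low descriptors are in use.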
Don Lewis	288e351b55	If a device attach routine fails during boot and calls bus_teardown_intr(), ithread_remove_handler() may fail to remove the interrupt handler if it decides to let the ithread do the removal. The problem is that during boot "cold" is set, which causes msleep() to return immediately. This will cause ithread_remove_handler() to fail to wait for the ithread to do the removal from the handler TAILQ before freeing the handler back to the heap. Bad things will happen when some other user of the TAILQ, such as ithread_add_handler() or the actual ithread attempts to use the freed handler. Fix the problem by forcing ithread_remove_handler() to do the actual removal itself if the "cold" flag is set. Reviewed by: jhb	2004-01-13 22:55:46 +00:00
Dag-Erling Smørgrav	ac34dc4e79	Back out 1.160, which was committed by mistake.	2004-01-11 20:08:57 +00:00
Dag-Erling Smørgrav	d7a1c7e34b	Back out 1.166, which was committed by mistake.	2004-01-11 20:07:15 +00:00
Dag-Erling Smørgrav	f1ea6d813d	Mechanical whitespace cleanup + other minor style nits.	2004-01-11 19:56:42 +00:00
Dag-Erling Smørgrav	0e5dfade00	Mechanical whitespace cleanup.	2004-01-11 19:54:45 +00:00
Dag-Erling Smørgrav	05c3c5c8b6	Mechanical whitespace cleanup; parenthesize return values; other minor style nits. The #ifdefs in this file give me a headache...	2004-01-11 19:52:10 +00:00
Dag-Erling Smørgrav	e5aeaa0c67	Mechanical whitespace cleanup; parenthesize return values; other minor style nits.	2004-01-11 19:48:19 +00:00
Dag-Erling Smørgrav	012b5531f4	Mechanical whitespace cleanup + minor style nits.	2004-01-11 19:43:14 +00:00
Dag-Erling Smørgrav	c9de31f55f	Mechanical whitespace cleanup.	2004-01-11 19:39:14 +00:00
Alan Cox	0e88a71798	Remove long dead code, specifically, code related to munmapfd(). (See also vm/vm_mmap.c revision 1.173.)	2004-01-11 06:59:21 +00:00
Robert Watson	def055686c	When not creating a core dump due to resource limits specifying a maximum dump size of 0, return a size-related error, rather than returning success. Otherwise, waitpid() will incorrectly return a status indicating that a core dump was created. Note that the specific error doesn't actually matter, since it's lost. MFC after: 2 weeks PR: 60367 Submitted by: Valentin Nechayev <netch@netch.kiev.ua>	2004-01-11 02:28:06 +00:00
Jens Schweikhardt	85495c72ff	s/Muliple/Multiple Removed whitespace at EOL and EOF.	2004-01-10 18:34:01 +00:00
Dag-Erling Smørgrav	d41457da80	More unparenthesized return values.	2004-01-10 17:14:53 +00:00
Dag-Erling Smørgrav	b91a599717	Style: parenthesize return values.	2004-01-10 13:03:43 +00:00
Don Lewis	2b77864f1e	Add a somewhat redundant check on the len argument to getsockaddr() to avoid relying on the minimum memory allocation size to avoid problems. The check is somewhat redundant because the consumers of the returned structure will check that sa_len is a protocol-specific larger size. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: nectar MFC after: 30 days	2004-01-10 08:28:54 +00:00
Olivier Houchard	5cded90454	Prevent a race condition between fork1() and whatever changes the pgrp by setting the new process' p_pgrp again before inserting it in the p_pglist. Without it we can get the new process to be inserted in a different p_pglist than the one p2->p_pgrp points to, and this is not something we want to happen. This is not a fix, merely a bandaid, but it will work until someone finds a better way to do it. Discussed with: jhb (a long time ago)	2004-01-09 23:42:36 +00:00
Robert Watson	07eacae0d2	Improve the expressiveness of ttyinfo (^T) when dealing with threads in slightly less usual states: If the thread is on a run queue, display "running" if the thread is actually running, otherwise, "runnable". If the thread is sleeping, and it's on a sleep queue, display the name of the queue, otherwise "unknown" -- previously, in this situation we would display "iowait". If the thread is waiting on a lock, display *lockname. If the thread is suspended, display "suspended" -- previously, in this situation we would display "iowait". If the thread is waiting for an interrupt, display "intrwait" -- previously, in this situation we would display "iowait". If the thread is in a state not handled by the above, display "unknown" -- previously, we would print "iowait". Among other things, this avoids displaying "iowait" when the foreground process turns out to be suspended waiting for a debugger to properly attach.	2004-01-08 22:49:23 +00:00
Robert Watson	047aa39b25	Drop the sigacts mutex around calls to stopevent() to avoid sleeping holding the mutex. Because the sigacts pointer can't change while the process is "live" (proc locking (x)), we know our pointer is still valid. In communication with: truckman Reviewed by: jhb	2004-01-08 22:44:54 +00:00
Alexander Kabaev	c969c60c60	Add pid to the info printed in lockmgr_printinfo. This makes VFS diagnostic messages slightly more useful.	2004-01-06 04:34:13 +00:00
Alexander Kabaev	580ddfa64b	More style fixes. Obtained from: bde	2004-01-05 23:40:46 +00:00
John Baldwin	eac097962f	- Allow mtx_trylock() to recurse on a recursive mutex. Attempts to recurse on a non-recursive mutex will fail but will not trigger any assertions. - Add an assertion to mtx_lock() that one never recurses on a non-recursive mutex. This is mostly useful for the non-WITNESS case. Requested by: deischen, julian, others (1)	2004-01-05 23:09:51 +00:00
Alexander Kabaev	b0fdf71656	style(9): Add empty line before first code line in functions with no local variables. Properly terminate comment sentences. Indent lines which are longer than 80 characters. Move v_addpollinfo closer to the rest of poll-related functions. Move DEBUG_VFS_LOCKS ifdefed block to the end of file. Obtained from: bde (partly)	2004-01-05 19:04:29 +00:00
Alexander Kabaev	3ff1b7c23f	Cosmetics: strip '\n' from a string passed to Debugger().	2004-01-04 03:42:20 +00:00
David Xu	a30ec4b99c	Make sigaltstack per-thread, because per-process sigaltstack state is useless for threaded programs: multiple threads cannot share the same stack. The alternate signal stack is private to the thread, so no lock is needed. The original P_ALTSTACK is now moved into td_pflags and renamed to TDP_ALTSTACK. For single-threaded or Linux clone() based threaded programs, there is no semantic change, because those programs only have one kernel thread in every process. Reviewed by: deischen, dfr	2004-01-03 02:02:26 +00:00
Nate Lawson	44bb5f52d3	Move the kernel power change printf under bootverbose since the power_profile script now duplicates the message via syslog.	2004-01-02 18:24:13 +00:00
Sam Leffler	4f9f9cf3a4	m_tag fixups in preparation for heavier use: o promote several m_tag_* routines to inline o add an m_tag_setup inline to set the fixed fields in a packet tag o add an m_tag_free method pointer to each mtag to support, for example, allocating tags from zones o have m_tag_find check if the tag list is not empty before calling m_tag_locate to search Reviewed by: brooks, silence from others	2004-01-02 17:27:39 +00:00
David Malone	70ad6c2190	Plug a leak of open files that happens when you exec a suid program with one of std{in,out,err} open. This helps with the file descriptor leaks reported on -current. This should probably be merged into 5.2. Reviewed by: ru Tested by: Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>	2003-12-28 19:27:14 +00:00
Bruce Evans	9efe7d9d83	v_vxproc was a bogus name for a thread (pointer).	2003-12-28 09:12:56 +00:00
Mike Silbersack	ddeb5b242e	Track three new sendfile-related statistics: - The number of times sendfile had to do disk I/O - The number of times sfbuf allocation failed - The number of times sfbuf allocation had to wait	2003-12-28 08:57:09 +00:00
Bruce Evans	d6c847f378	Fixed some style bugs (mainly, try to always use explicit comparisons with NULL when checking for null pointers).	2003-12-28 04:37:59 +00:00
Bruce Evans	ca46e90ef4	Fixed some disordering in revs. 1.194 and 1.196. Moved the execve() syscall function back to near the beginning of the file. Rev. 1.194 moved it into the middle of auxiliary functions following kern_execve(). Moved the __mac_execve() syscall function up together with execve(). It was new in rev. 1.196 and perfectly misplaced after execve().	2003-12-28 04:18:13 +00:00
Mike Silbersack	69fba1650a	Fix the maxpipekva warning message so that it points to the correct sysctl, and shorten the message. Noticed by: bde	2003-12-28 01:19:58 +00:00
Alan Cox	34d2675761	Remove GIANT_REQUIRED from exec_unmap_first_page().	2003-12-27 19:40:03 +00:00
Mike Silbersack	5eda9873e9	Track current and peak sfbuf usage, export the values via sysctl.	2003-12-27 07:52:47 +00:00
John Baldwin	c55bbb6cb7	Create a separate kthread that executes sched_cpu() once a second. Because sched_cpu() locks an sx lock (allproc_lock) which can sleep if it fails to acquire the lock, it is not safe to execute this in a callout handler from softclock().	2003-12-26 17:07:29 +00:00
Alfred Perlstein	866e3b7e73	Put restrict back in, the compilation failure was my fault when I did a bad merge from the PR. Thanks to Bruce Evans for explaining.	2003-12-26 05:58:16 +00:00
Alfred Perlstein	4abb4ff34d	Add __restrict qualifiers to copyinfrom, copyinstrfrom, copystr, copyinstr, copyin and copyout.	2003-12-26 05:54:35 +00:00
David Malone	9322078275	In socket(2) we only need Giant around the call to socreate, so just grab it there.	2003-12-25 23:44:38 +00:00
David Malone	1c58509c25	Don't TAILQ_INIT kq_head twice, once is enough.	2003-12-25 23:42:36 +00:00
Mike Silbersack	8dee2f6746	Fix another 0 / NULL mixup.	2003-12-25 01:17:27 +00:00
Alfred Perlstein	6502da1307	We're not ready for restrict qualifiers here.	2003-12-24 19:09:45 +00:00
Alfred Perlstein	9f144cff85	Add restrict qualifiers. PR: 44394 Submitted by: Craig Rodrigues <rodrige@attbi.com>	2003-12-24 18:47:43 +00:00
Robert Watson	69546b2fbb	Document that when we are addressing an open()/close() race, the reason we call vn_close() manually rather than letting fdrop() take care of it is that we haven't yet hooked up the various 'struct file' fields.	2003-12-24 17:13:01 +00:00
Alfred Perlstein	1805ed0772	Introduce mp_maxcpus which can be used by libkvm utils to find out how many CPUs the system was compiled for. Export the variable via a sysctl node 'kern.smp.maxcpus' as well.	2003-12-23 13:54:16 +00:00
Peter Wemm	2c74309622	Regen - this should be essentially a NOP, except for rcsid changes.	2003-12-23 03:52:14 +00:00
Peter Wemm	eec525a435	Remove namespc column and attempt to un-fold some of the longer lines that now fit.	2003-12-23 03:51:36 +00:00
Peter Wemm	1a58b07149	Remove the namespace column from the syscalls tables. We don't actually use it, if we ever did. They have been VERY poorly maintained for some time, possibly because they were a NOP. FWIW, This brings our table formats back closer to the other *BSD's.	2003-12-23 03:50:43 +00:00
Peter Wemm	9b68618df0	Add an additional field to the elf brandinfo structure to support quicker exec-time replacement of the elf interpreter on an emulation environment where an entire /compat/* tree isn't really warranted.	2003-12-23 02:42:39 +00:00
Peter Wemm	a89ec05e3e	Catch a few places where NULL (pointer) was used where 0 (integer) was expected.	2003-12-23 02:36:43 +00:00
Peter Wemm	55cdddc0d8	Don't use NULL (pointer) when we mean 0 (integer) for the number of ticks in msleep.	2003-12-23 02:28:42 +00:00
Jeff Roberson	249e0bea8f	- Make our transfer decisions based on load and not transferable load. A cpu could have been bogged down with non-transferable load and still not migrated a new thread to an idle cpu. This required some benchmarking and tuning to get right as the comment above it suggests.	2003-12-20 22:35:20 +00:00
Jeff Roberson	e7a976f415	- Enable ithread migration on x86. This is done to work around a bug in the IO APIC on Xeons that prevents round-robin interrupt assignment from working.	2003-12-20 20:36:19 +00:00
Alan Cox	96a7b42213	Remove a variable that has been initialized but otherwise unused since revision 1.315.	2003-12-20 19:46:21 +00:00
Jeff Roberson	670c524f08	- In kseq_transfer() return if smp has not been started. - In sched_add(), do the idle check prior to the transfer check so that we don't try to transfer load from an idle cpu. This fixes panics caused by IPIs on UP machines running SMP kernels. Reported/Debugged by: seanc	2003-12-20 14:03:14 +00:00
Jeff Roberson	9b5f6f623d	- Running interactive tasks with the minimum time-slice is fine for vi and sh, but not so great for mozilla, X, etc. Add a fixed define for the slice size granted to interactive KSEs.	2003-12-20 12:54:35 +00:00
Tim J. Robbins	f5925b7436	Reduce the overhead of semop() by using the kernel stack instead of malloc'd memory to store the operations array if it is small enough to fit.	2003-12-19 13:07:17 +00:00
John Baldwin	eb5b0e0565	Various style fixes. Submitted by: bde (mostly, if not all)	2003-12-17 21:13:04 +00:00
Jeff Roberson	958557e9c7	- In vget() if LK_NOWAIT is specified we should return EBUSY and not ENOENT. Submitted by: Stephan Uphoff <ups@stups.com>	2003-12-16 17:08:27 +00:00
Jeff Roberson	d85213669b	- When doing a forced unmount, VFS attempts to keep VCHR vnodes valid by reassigning their v_ops field to specfs, detaching from the mountpoint, etc. However, this is not sufficient. If we vclean() the vnode the pages owned by the vnode are lost, potentially while buffers reference them. Implement parts of vclean() separately in vgonechrl() so that the pages and bufs associated with a device vnode are not destroyed while in use.	2003-12-16 17:05:05 +00:00
Bruce M Simpson	5406529771	style(9) pass and type fixups. Submitted by: bde	2003-12-16 14:13:47 +00:00
Bruce M Simpson	37621fd5d9	Push m_apply() and m_getptr() up into the collection of standard mbuf routines, and purge them from opencrypto. Reviewed by: sam Obtained from: NetBSD Sponsored by: spc.org	2003-12-15 21:49:41 +00:00
Jeff Roberson	86e1c22aa4	- Assign the ke_cpu field in kseq_notify() so that all of our callers do not have to do it. - Set the ke_runq to NULL in sched_add() before calling kseq_notify(). Otherwise we may panic in sched_add() if INVARIANTS is on.	2003-12-14 02:06:29 +00:00
Robert Watson	09a4a69c1d	Although sometimes to the uninitiated, it may seem like goup, KSEGOUP is actually spelt KSEGROUP. Go figure. Reported by: samy@kerneled.com	2003-12-12 21:25:56 +00:00
Jeff Roberson	cac77d0422	- Now that we have kseq groups, balance them separately. - The new sched_balance_groups() function does intra-group balancing while sched_balance() balances the available groups. - Pick a random time between 0 ticks and hz * 2 ticks to restart each balancing process. Each balancer has its own timeout. - Pick a random place in the list of groups to start the search for lowest and highest group loads. This prevents us from preferring a group based on numeric position. - Use a nasty hack to stop us from preferring cpu 0. The problem is that softclock always runs on cpu 0, so it always has a little extra load. We ignore this load in the balancer for now. In the future softclock should run on a random cpu and these hacks can go away.	2003-12-12 07:33:51 +00:00
Jeff Roberson	2e227f0406	- Don't let the pctcpu rate limiter throttle us if we have recorded over SCHED_CPU_TICKS ticks. This was allowing processes to display (1/SCHED_CPU_TIME * 100) % more cpu than they had used.	2003-12-11 04:23:39 +00:00
Jeff Roberson	b11fdad0fc	- In sched_switch(), if a thread has been assigned, don't touch the runqueues or load. These things have already been taken care of in sched_bind() which should be the only place that we're switching in an assigned thread.	2003-12-11 04:00:49 +00:00
Jeff Roberson	80f86c9f88	- Add support for CPU groups to ule. All SMT cores on the same physical cpu are added to a group. - Don't place a cpu into the kseq_idle bitmask until all cpus in that group have idled. - Prefer idle groups over idle group members in the new kseq_transfer() function. In this way we will prefer to balance load across full cores rather than add further load to a partial core. - Before a cpu goes idle, check the other group members for threads. Since SMT cpus may freely share threads, this is cheap. - SMT cores may be individually pinned and bound to now. This contrasts the old mechanism where binding or pinning would have allowed a thread to run on any available cpu. - Remove some unnecessary logic from sched_switch(). Priority propagation should be properly taken care of in sched_prio() now.	2003-12-11 03:57:10 +00:00
Peter Wemm	5be4b10c89	Regen	2003-12-10 22:18:54 +00:00
Peter Wemm	5352eb6bb1	Update file locations for syscall tables to copy to.	2003-12-10 22:08:37 +00:00
Marcel Moolenaar	ccb46feb8e	Write the thread pointer (val) in the kse mailbox (loc) before we set the new context in kse_switchin(2). This allows us to return an error to the calling context when the suword() fails.	2003-12-10 01:59:23 +00:00
John Baldwin	67ba867827	Adjust an assertion for the TDF_TSNOBLOCK race handling in turnstile_unpend(). A racing thread that does not have TDI_LOCK set may either be running on another CPU or it may be sitting on a run queue if it was preempted during the very small window in turnstile_wait() between unlocking the turnstile chain lock and locking sched_lock.	2003-12-09 21:14:31 +00:00
John Baldwin	da1d503b22	Assert that we never give a thread a NULL turnstile when waking it up.	2003-12-09 21:09:54 +00:00
John Baldwin	6b6bd95ee5	Revert the previous race fix and replace it with a more general fix. The case of a turnstile having no threads is just one instance of the more general case where the thread we are examining has been partially awakened already in that it has been removed from the turnstile's blocked list but still has TDI_LOCK set. We detect that case by checking to see if the thread has already had a turnstile reassigned to it.	2003-12-09 21:09:04 +00:00
David Xu	a9a48d6862	Lock and unlock sched_lock when walking through the thread list; currently we insert the kse upcall thread into the thread list at mi_switch time, so the process lock is not enough.	2003-12-07 23:47:15 +00:00
Don Lewis	50105bcf1a	Pass MTX_DEF as the last argument to mtx_init() instead of 0. This is not a functional change. The code happened to work properly only because MTX_DEF is defined as 0.	2003-12-07 21:53:41 +00:00
Poul-Henning Kamp	377e7be416	Make the DIAGNOSTIC code which complains about long {call|time}out(9) functions less noisy: We printf if a new function took longer than the previous record holder, or if the previous record holder took more than twice as long as the current record.	2003-12-07 20:03:28 +00:00
Marcel Moolenaar	cfa4b1e7b1	Regen due to kse_switchin(2).	2003-12-07 19:36:16 +00:00
Marcel Moolenaar	702b2a179c	Add kse_switchin(2). This syscall can be used by KSE implementations to have the kernel switch to a new thread, instead of doing it in userland. It is in fact needed on ia64 where syscall restarts do not return to userland first. It's completely handled inside the kernel. As such, any context created by the kernel as part of an upcall and caused by some syscall needs to be restored by the kernel.	2003-12-07 19:34:29 +00:00
Peter Wemm	a2640c9ba9	rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64). Be sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad things happen(TM). This is why beast.freebsd.org panicked with ULE. Reviewed by: jeff	2003-12-07 09:57:51 +00:00
Scott Long	774114995e	Re-arrange and consolidate some random debugging stuff	2003-12-07 05:04:49 +00:00
Alan Cox	bca62663ab	- Giant is no longer required by vm_thread_new().	2003-12-07 04:16:49 +00:00
Robert Watson	56d9e93207	Rename mac_create_cred() MAC Framework entry point to mac_copy_cred(), and the mpo_create_cred() MAC policy entry point to mpo_copy_cred_label(). This is more consistent with similar entry points for creation and label copying, as mac_create_cred() was called from crdup() as opposed to during process creation. For a number of policies, this removes the requirement for special handling when copying credential labels, and improves consistency. Approved by: re (scottl) Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-12-06 21:48:03 +00:00
John Baldwin	b6c71225a9	Fix all users of mp_maxid to use the same semantics, namely: 1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1. 2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid. Approved by: re (scottl) Tested on: i386, amd64, alpha	2003-12-03 14:57:26 +00:00
John Baldwin	45c1c90f6a	Export a few SMP related symbols in UP kernels as well. This is needed to aid other kernel code, especially code which can be in a module such as the acpi_cpu(4) driver, to work properly with both SMP and UP kernels. The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and the smp_rendezvous() function. This also means that CPU_ABSENT() is now always implemented the same on all kernels. Approved by: re (scottl)	2003-12-03 14:55:31 +00:00
David Greenman	186e347f2c	Fixed a bug in sendfile(2) where the sent data would be corrupted due to sendfile(2) being erroneously automatically restarted after a signal is delivered. Fixed by converting ERESTART to EINTR prior to exiting. Updated manual page to indicate the potential EINTR error, its cause and consequences. Approved by: re@freebsd.org	2003-12-01 22:12:50 +00:00
Ian Dowse	25cb5d7a6b	In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the forced unmount case. Otherwise, a file system that is referenced only by process fd_cdir/fd_rdir references to the file system root vnode will be successfully unmounted without the MNT_FORCE flag. The previous behaviour was not compatible with the unmount semantics required by amd(8), so file systems could be unexpectedly unmounted while there were still references to the file system root directory. Reported by: Erez Zadok <ezk@cs.sunysb.edu> Approved by: re (scottl)	2003-11-30 23:30:09 +00:00
Jeff Roberson	a6c6a93c89	- Don't forget to unlock the vnode interlock in the LK_NOWAIT case. Submitted by: Stephan Uphoff <ups@stups.com> Approved by: re (rwatson)	2003-11-30 22:09:58 +00:00
Alexander Kabaev	97c43a540a	Do not attempt to destroy NULL vfs options list. Approved by: re (scottl) Reported by: Christian Laursen <xi atborderworlds dot dk>	2003-11-23 17:13:48 +00:00
John Baldwin	798a45964d	- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid. cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is actually present and sets mp_ncpus and all_cpus. Splitting these up allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the CPU probing code to live in a module, for example, since sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is needed to re-enable the ACPI module on i386. - For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating its contents in a few places. Also, add a smp_cpu_enabled() function to avoid duplicating some code. There is room for further code reduction later since much of this code is also present in cpu_mp_start(). - All archs besides i386 still set mp_maxid to the same values they set it to before this change. i386 now sets mp_maxid to MAXCPU. Tested on: alpha, amd64, i386, ia64, sparc64 Approved by: re (scottl)	2003-11-21 22:23:26 +00:00
Mark Murray	4e3a7a14d9	Fix a major faux pas of mine. I was causing 2 very bad things to happen in interrupt context; 1) sleep locks, and 2) malloc/free calls. 1) is fixed by using spin locks instead. 2) is fixed by preallocating a FIFO (implemented with a STAILQ) and using elements from this FIFO instead. This turns out to be rather fast. OK'ed by: re (scottl) Thanks to: peter, jhb, rwatson, jake Apologies to: *	2003-11-20 15:35:48 +00:00
Mark Murray	3fed54aaaa	Hackfix to patch around a kernel panic I introduced. Real fix to follow. In the meanwhile, we are not harvesting interrupt entropy. Approved by: re (jhb)	2003-11-18 14:35:43 +00:00
Robert Watson	a557af222b	Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-18 00:39:07 +00:00
Robert Watson	64d19c2ea7	Add a sysctl, security.bsd.see_other_gids, similar in semantics to see_other_uids but with the logical conversion. This is based on (but not identical to) the patch submitted by Samy Al Bahra. Submitted by: Samy Al Bahra <samy@kerneled.com>	2003-11-17 20:20:53 +00:00
Peter Wemm	0d2a298904	Initial landing of SMP support for FreeBSD/amd64. - This is heavily derived from John Baldwin's apic/pci cleanup on i386. - I have completely rewritten or drastically cleaned up some other parts. (in particular, bootstrap) - This is still a WIP. It seems that there are some highly bogus bioses on nVidia nForce3-150 boards. I can't stress how broken these boards are. I have a workaround in mind, but right now the Asus SK8N is broken. The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed. - Most of my testing has been with SCHED_ULE. SCHED_4BSD works. - the apic and acpi components are 'standard'. - If you have an nVidia nForce3-150 board, you are stuck with 'device atpic' in addition, because they somehow managed to forget to connect the 8254 timer to the apic, even though its in the same silicon! ARGH! This directly violates the ACPI spec.	2003-11-17 08:58:16 +00:00
Jeff Roberson	fa9c971710	- Mark ksq_assigned as volatile so that when this code is used without sched_lock we can be sure that we'll pick up the new value.	2003-11-17 08:27:11 +00:00
Jeff Roberson	093c05e39d	- Remove long dead code. rslices hasn't been used in some time and neither has sched_pickcpu().	2003-11-17 08:24:14 +00:00
Peter Wemm	90e3387e54	Expand the argument to the ithread enable/disable helper hooks from an int to something big enough to hold a pointer. amd64 needs this.	2003-11-17 06:08:10 +00:00
Robert Watson	b0323ea3aa	Implement sockets support for __mac_get_fd() and __mac_set_fd() system calls, and prefer these calls over getsockopt()/setsockopt() for ABI reasons. When addressing UNIX domain sockets, these calls retrieve and modify the socket label, not the label of the rendezvous vnode. - Create mac_copy_socket_label() entry point based on mac_copy_pipe_label() entry point, intended to copy the socket label into temporary storage that doesn't require a socket lock to be held (currently Giant). - Implement mac_copy_socket_label() for various policies. - Expose socket label allocation, free, internalize, externalize entry points as non-static from mac_net.c. - Use mac_socket_label_set() in __mac_set_fd(). MAC-aware applications may now use mac_get_fd(), mac_set_fd(), and mac_get_peer() to retrieve and set various socket labels without directly invoking the getsockopt() interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 23:31:45 +00:00
Robert Watson	9e71dd0feb	Reduce gratuitous redundancy and length in function names: mac_setsockopt_label_set() -> mac_setsockopt_label() mac_getsockopt_label_get() -> mac_getsockopt_label() mac_getsockopt_peerlabel_get() -> mac_getsockopt_peerlabel() Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 18:25:20 +00:00
Alan Cox	e45db9b837	- Modify alpha's sf_buf implementation to use the direct virtual-to- physical mapping. - Move the sf_buf API to its own header file; make struct sf_buf's definition machine dependent. In this commit, we remove an unnecessary field from struct sf_buf on the alpha, amd64, and ia64. Ultimately, we may eliminate struct sf_buf on those architectures except as an opaque pointer that references a vm page.	2003-11-16 06:11:26 +00:00
Robert Watson	12cbb9dc56	When implementing getsockopt() for SO_LABEL and SO_PEERLABEL, make sure to sooptcopyin() the (struct mac) so that the MAC Framework knows which label types are being requested. This fixes process queries of socket labels. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 03:53:36 +00:00
Bruce Evans	416ab90e6b	Localized the cy driver's locking.	2003-11-16 00:55:54 +00:00
Poul-Henning Kamp	d87526cf43	Rename the debugging mutex "callout_no_sleep" to "dont_sleep_in_callout".	2003-11-15 18:33:54 +00:00
Tim J. Robbins	4d93f53e74	Initialize sequence numbers to 0 in seminit() instead of using whatever garbage happens to be in memory. This did not seem to cause any problems except making semaphore ID's unpredictable (and ugly in ipcs(1) output).	2003-11-15 11:56:53 +00:00
Poul-Henning Kamp	00cbe31bd8	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
Alan Cox	28c9416429	- Remove the remaining now unnecessary checks for the buf's b_object being NULL. See revision 1.421 for more detail. - Remove GIANT_REQUIRED from vfs_unbusy_pages(). Discussed with: jeff	2003-11-15 08:45:36 +00:00
Jeff Roberson	155b9987a3	- Introduce kseq_runq_{add,rem}() which are used to insert and remove kses from the run queues. Also, on SMP, we track the transferable count here. Threads are transferable only as long as they are on the run queue. - Previously, we adjusted our load balancing based on the transferable count minus the number of actual cpus. This was done to account for the threads which were likely to be running. All of this logic is simpler now that transferable accounts for only those threads which can actually be taken. Updated various places in sched_add() and kseq_balance() to account for this. - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're really doing. The load is accounted for separately from the runq because the load is accounted for even as the thread is running. - Fix a bug in sched_class() where we weren't properly using the PRI_BASE() version of the kg_pri_class. - Add a large comment that describes the impact of a seemingly simple conditional in sched_add(). - Also in sched_add() check the transferable count and KSE_CAN_MIGRATE() prior to checking kseq_idle. This reduces the frequency of access for kseq_idle which is a shared resource.	2003-11-15 07:32:07 +00:00
Olivier Houchard	1a29c80648	Better fix than my previous commit: in exit1(), make sure the p_klist is empty after sending NOTE_EXIT. The process won't report fork() or execve() and won't be able to handle NOTE_SIGNAL knotes anyway. This fixes some race conditions with do_tdsignal() calling knote() while the process is exiting. Reported by: Stefan Farfeleder <stefan@fafoe.narf.at> MFC after: 1 week	2003-11-14 18:49:01 +00:00
Alexander Kabaev	3b39740df8	Fix a number of style(9) bugs introduced in r1.113 by me. Suggested by: bde	2003-11-14 05:27:41 +00:00
Jeff Roberson	808674fd0e	- regen.	2003-11-14 03:49:41 +00:00
Jeff Roberson	5c49a0566a	- Revision 1.156 marked ptrace() SMP safe. Unfortunately, alpha implements parts of ptrace using proc_rwmem(). proc_rwmem() requires giant, and giant must be acquired prior to the proc lock, so ptrace must require giant still.	2003-11-14 03:48:37 +00:00
Poul-Henning Kamp	555a5de270	Various minor details: Give the HZ/overflow check a 10% margin. Eliminate bogus newline. If timecounters have equal quality, prefer higher frequency. Some inspiration from: bde	2003-11-13 10:03:58 +00:00
John Baldwin	79a13d0182	- Close a race where a thread on another CPU could release a contested lock and empty its turnstile while the blocking threads still pointed to the turnstile. If the thread on the first CPU blocked on a lock owned by one of the threads blocked on the turnstile just woken up, then the first CPU could try to manipulate a bogus thread queue in the turnstile during priority propagation. - Update locking notes for ts_owner and always clear ts_owner, not just under INVARIANTS. Tested by: sam (1)	2003-11-12 23:48:42 +00:00
Kirk McKusick	48b0f4b67d	At the request of several developers, restore the DIAGNOSTIC code deleted in 1.81. Increase the initial timeout limit to 2ms to eliminate spurious messages of excessive timeouts in the NFS client code. Requested by: Poul-Henning Kamp <phk@phk.freebsd.dk> Requested by: Mike Silbersack <silby@silby.com> Requested by: Sam Leffler <sam@errno.com>	2003-11-12 22:28:27 +00:00
Robert Watson	f0ab044241	Mark __mac_get_pid() as MPSAFE in the comment, as it runs without Giant and is also MPSAFE. Push Giant further down into __mac_get_fd() and __mac_set_fd(), grabbing it only for constrained regions dealing with VFS, and dropping it entirely for operations related to labeling of pipes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 22:19:15 +00:00
Peter Wemm	cde6302bf0	MNAMELEN is back to an int again after Kirk's statfs commit kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4) *** Error code 1	2003-11-12 17:09:12 +00:00
John Baldwin	861a7db56f	Fix a typo in a comment. Submitted by: das	2003-11-12 14:55:45 +00:00
Poul-Henning Kamp	1415a09d42	Replace B_PHYS conditional assignment to bio_offset with KASSERT check to see that the originating code already did it right.	2003-11-12 10:27:06 +00:00
Kirk McKusick	1977597b34	Update the five files derived from /sys/kern/syscalls.master after the additions made for the new statfs structure (version 1.157). These must be updated in a separate checkin after syscalls.master has been checked in so that they reflect its new CVS identity. As these are purely derived files, it is not clear to me why they are under CVS at all. I presume that it has something to do with having `make world' operate properly.	2003-11-12 08:09:19 +00:00
Kirk McKusick	fde81c7d8e	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hordes of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
Robert Watson	eca8a663d4	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 03:14:31 +00:00
Alexander Kabaev	5c957adbf1	1. Consolidate mount struct allocation/destruction into a common code in vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. Mount struct has grown in complexity recently and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include a vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson	2003-11-12 02:54:47 +00:00
John Baldwin	961a7b244d	Add an implementation of turnstiles and change the sleep mutex code to use turnstiles to implement blocking instead of implementing a thread queue directly. These turnstiles are somewhat similar to those used in Solaris 7 as described in Solaris Internals but are also different. Turnstiles do not come out of a fixed-sized pool. Rather, each thread is assigned a turnstile when it is created that it frees when it is destroyed. When a thread blocks on a lock, it donates its turnstile to that lock to serve as queue of blocked threads. The queue associated with a given lock is found by a lookup in a simple hash table. The turnstile itself is protected by a lock associated with its entry in the hash table. This means that sched_lock is no longer needed to contest on a mutex. Instead, sched_lock is only used when manipulating run queues or thread priorities. Turnstiles also implement priority propagation inherently. Currently turnstiles only support mutexes. Eventually, however, turnstiles may grow two queues to support a non-sleepable reader/writer lock implementation. For more details, see the comments in sys/turnstile.h and kern/subr_turnstile.c. The two primary advantages from the turnstile code include: 1) the size of struct mutex shrinks by four pointers as it no longer stores the thread queue linkages directly, and 2) less contention on sched_lock in SMP systems including the ability for multiple CPUs to contend on different locks simultaneously (not that this last detail is necessarily that much of a big win). Note that 1) means that this commit is a kernel ABI breaker, so don't mix old modules with a new kernel and vice versa. Tested on: i386 SMP, sparc64 SMP, alpha SMP	2003-11-11 22:07:29 +00:00
Joseph Koshy	a5896914f0	Bound the number of iterations a thread can perform inside ktr_resize_pool(); this eliminates a potential livelock. Return ENOSPC only if we encountered an out-of-memory condition when trying to increase the pool size. Reviewed by: jhb, bde (style)	2003-11-11 09:09:26 +00:00
Joseph Koshy	b10221ffd9	Have utrace(2) return ENOMEM if malloc() fails. Document this error return in its manual page. Reviewed by: jhb	2003-11-11 04:54:11 +00:00
Alan Cox	e35e0182c3	- Revision 1.469 of vfs_subr.c resulted in the buf's b_object field being consistently initialized. Consequently, a number of conditionals that checked the validity of b_object before passing it to VM_OBJECT_LOCK() and VM_OBJECT_UNLOCK() are no longer needed.	2003-11-11 04:45:37 +00:00
Robert Watson	c8e7bf92ad	Whitespace sync to MAC branch, expand comment at the head of the file.	2003-11-11 03:40:04 +00:00
Alfred Perlstein	cd3c61b93d	Fix a bug where the taskqueue kproc was being parented by init because RFNOWAIT was being passed to kproc_create. The result was that shutdown took quite a bit longer because this errant "child" would not respond to termination signals from init at system shutdown. RFNOWAIT disassociates itself from the caller by attaching to init as a parent proc. We could have had the taskqueue proc listen for SIGKILL, but being able to SIGKILL a potentially critical system process doesn't seem like a good idea.	2003-11-10 20:39:44 +00:00
Tim J. Robbins	541c3b66b5	When there are no free sem_undo structs available in semu_alloc(), only free one sem_undo with un_cnt == 0 instead of all of them. This is a temporary workaround until the SLIST_FOREACH_PREVPTR loop gets fixed so that it doesn't cause cycles in semu_list when removing multiple adjacent items. It might be easier to just use (doubly-linked) LISTs here instead of complicated SLIST code to achieve O(1) removals. This bug manifested itself as a complete lockup under heavy semaphore use by multiple processes with the SEM_UNDO flag set. PR: 58984	2003-11-10 07:22:41 +00:00
Marcel Moolenaar	fcaa2925a9	Change the clear_ret argument of get_mcontext() to be a flags argument. Since all callers either passed 0 or 1 for clear_ret, define bit 0 in the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI code for possible (but unlikely) future use. The remaining bits are for use by MD code. This change is triggered by a need on ia64 to have another knob for get_mcontext().	2003-11-09 20:31:04 +00:00
Bruce Evans	b698380f33	Quick fix for scaling of statclock ticks in the SMP case. As explained in the log message for kern_sched.c 1.83 (which should have been repo-copied to preserve history for this file), the (4BSD) scheduler algorithm only works right if stathz is nearly 128 Hz. The old commit log said 64 Hz; the scheduler actually wants nearly 16 Hz but there was a scale factor of 4 to give the requirement of 64 Hz, and rev.1.83 changed the scale factor so that the requirement became 128 Hz. The change of the scale factor was incomplete in the SMP case. Then scheduling ticks are provided by smp_ncpu CPUs, and the scheduler cannot tell the difference between this and 1 CPU providing scheduling ticks smp_ncpu times faster, so we need another scale factor of smp_ncpu or an algorithm change. This quick fix uses the scale factor without even trying to optimize the runtime divisions required for this as is done for the other scale factor. The main algorithmic problem is the clamp on the scheduling tick counts. This was 295; it is now approximately 295 * smp_ncpu. When the limit is reached, threads get free timeslices and scheduling becomes very unfair to the threads that don't hit the limit. The limit can be reached and maintained in the worst case if the load average is larger than (limit / effective_stathz - 1) / 2 = 0.65 now (was just 0.08 with 2 CPUs before this change), so there are algorithmic problems even for a load average of 1. Fortunately, the worst case isn't common enough for the problem to be very noticeable (it is mainly for niced CPU hogs competing with less nice CPU hogs).	2003-11-09 13:45:54 +00:00
Seigo Tanimura	512824f8f7	- Implement selwakeuppri() which allows raising the priority of a thread being woken up. The thread woken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current	2003-11-09 09:17:26 +00:00
Sam Leffler	7902224c6b	o add a flags parameter to netisr_register that is used to specify whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive Supported by: FreeBSD Foundation	2003-11-08 22:28:40 +00:00
David Xu	685a6c448a	Return a reasonable number for top or ps to display for an M:N thread. Since there is no direct association between an M:N thread and a kse, a thread sometimes does not have a kse; in that case, return the pctcpu from its last kse. It is not perfect, but gives a good number to be displayed.	2003-11-08 03:03:17 +00:00
John Baldwin	dac33f12cc	Regen.	2003-11-07 20:30:30 +00:00
John Baldwin	c055e5d412	Mark ptrace(), ktrace(), utrace(), sysarch(), and issetugid() as MP safe. The parts of these calls that are not yet MP safe acquire Giant explicitly.	2003-11-07 20:23:23 +00:00
Robert Watson	a2f88a8b7c	Slight whitespace consistency improvement: Trim trailing whitespace. Remove unmatched " " before ")".	2003-11-07 04:47:14 +00:00
Jeff Roberson	f28b3340c1	- Somehow I botched my last commit. Add an extra ( to fix things up. I'm still not sure how this happened. Reported by: ps	2003-11-06 07:56:01 +00:00
Alan Cox	3b2c54e7bc	- Delay the allocation of memory for the pipe mutex until we need it. This avoids the need to free said memory in various error cases along the way.	2003-11-06 05:58:26 +00:00
Alan Cox	fc17df5264	- Simplify pipespace() by eliminating the explicit creation of vm objects. Instead, let the vm objects be lazily instantiated at fault time. This results in the allocation of fewer vm objects and vm map entries due to aggregation in the vm system.	2003-11-06 05:08:12 +00:00
Robert Watson	83b7b0edca	Remove the flags argument from mac_externalize_*_label(), as it's not passed into policies or used internally to the MAC Framework. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-06 03:42:43 +00:00
Jeff Roberson	a70d729bff	- Remove the local definition of sched_pin and unpin. They are provided in sched.h now. - Respect the td pin count.	2003-11-06 03:09:51 +00:00
Sam Leffler	d3be1471c7	o make debug_mpsafenet globally visible o move it from subr_bus.c to netisr.c where it more properly belongs o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to grab Giant as needed when MPSAFE operation is enabled Supported by: FreeBSD Foundation	2003-11-05 23:42:51 +00:00
Warner Losh	252af39a96	Minor style(9) nit	2003-11-05 06:14:48 +00:00
Jeff Roberson	46f8b26550	- It's ok if sched_runnable() has races in it, we don't need the sched_lock here unless we have something on the assigned queue.	2003-11-05 05:30:12 +00:00
Alexander Kabaev	ca430f2e92	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
Max Khon	2332251c6a	Back out the following revisions: 1.36 +73 -60 src/sys/compat/linux/linux_ipc.c 1.83 +102 -48 src/sys/kern/sysv_shm.c 1.8 +4 -0 src/sys/sys/syscallsubr.h That change was intended to support vmware3, but wantrem parameter is useless because vmware3 uses SYSV shared memory to talk with X server and X server is native application. The patch worked because check for wantrem was not valid (wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments). Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set to 1 allows to return removed segments in shm_find_segment_by_shmid() and shm_find_segment_by_shmidx(). MFC after: 1 week	2003-11-05 01:53:10 +00:00
Kirk McKusick	b932dd9b28	Get rid of DIAGNOSTIC that gives false positives on slow CPUs.	2003-11-04 08:03:11 +00:00
Jeff Roberson	9bacd788a1	- Add initial support for pinning and binding.	2003-11-04 07:45:41 +00:00
Kirk McKusick	15a93fcc31	Allow the bufdaemon and update daemon processes to skip the waitrunningbufspace() calls so that they are always able to proceed and clean up buffer space. Submitted by: Brian Fundakowski Feldman <green@freebsd.org>	2003-11-04 06:30:00 +00:00
Sam Leffler	3465702f13	disable MPSAFE network drivers; we aren't ready yet	2003-11-04 02:01:42 +00:00
Olivier Houchard	7922cdc855	I believe kbyanc@ really meant this in rev 1.58. Use zpfind() to see if the process became a zombie if pfind() doesn't find it and if the caller wants to know about process death, so that the caller knows the process died even if it happened before the kevent was actually registered. MFC after: 1 week	2003-11-04 01:41:47 +00:00
Olivier Houchard	f44004690c	Do not attempt to report proc event if NOTE_EXIT has already been received. This fixes a race condition (specifically with signal events) that could lead to the kn being re-inserted into the list after it has been destroyed, which is not something we want to happen. PR: kern/58258	2003-11-04 01:14:58 +00:00
John Baldwin	8bc0846476	Don't require INTR_FAST handlers to be exclusive in the MI layer. Instead, let the MD code choose whether or not to implement such a policy. The new i386 interrupt code allows multiple FAST handlers for a given source for example. However, the code does not allow FAST and non-FAST handlers to be mixed.	2003-11-03 22:42:58 +00:00
John Baldwin	b95bb3e62b	Update spin lock order list for new i386 interrupt and SMP code.	2003-11-03 22:38:30 +00:00
Robert Watson	730ecf8254	Unlock pipe mutex when failing MAC pipe ioctl access control check. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-03 17:58:23 +00:00
Jeff Roberson	112b6d3aa9	- Remove kseq_find(), we no longer scan other cpu's run queues when we go idle. They figure out that we're idle fast enough that the cache pollution introduced by scanning their run queue is more expensive than waiting a little longer. - Add kseq_setidle() to mark us as being idle. Use this in place of kseq_find(). - Remove kseq_load_highest(), kseq_find() was the only consumer of this interface. kseq_balance() has its own customized version that finds the lowest and highest loads simultaneously. Continuously told that this would be faster by: terry	2003-11-03 03:27:22 +00:00
Jeff Roberson	ef1134c9ad	- Remove the ksq_loads[] array. We are only interested in three counts, the total load, the timeshare load, and the number of threads that can be migrated to another cpu. Account for these separately. - Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE can be migrated to another CPU. Currently, this only checks to see if we're an interrupt handler. Eventually this will also be used to support CPU binding.	2003-11-02 10:56:48 +00:00
Alexander Kabaev	cb9ddc80ae	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case where td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.	2003-11-02 04:52:53 +00:00
Jeff Roberson	769a363537	- In sched_prio() only force us onto the current queue if our priority is being elevated (numerically smaller).	2003-11-02 04:25:59 +00:00
Jeff Roberson	7d1a81b4dc	- Rename SCHED_PRI_NTHRESH to SCHED_SLICE_NTHRESH since it is only used in slice assignment. Add a comment describing what it does. - Remove a stale XXX comment, the nice should not impact the interactivity, nice adjustments only affect non-interactive tasks in ULE. - Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them at least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve nice +20 tasks as intended.	2003-11-02 04:10:15 +00:00
Jeff Roberson	a0a931cec7	- Remove uses of PRIO_TOTAL and replace them with SCHED_PRI_NRESV - SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we do not have to account for it in the few places that we use it. Requested by: bde	2003-11-02 03:49:32 +00:00
Jeff Roberson	d322132c62	- Change sched_interact_update() to only accept slp+runtime values between 0 and SCHED_SLP_RUN_MAX * 2. This allows us to simplify the algorithm quite a bit. Before, it dealt with arbitrary values which required us to do nasty integer division tricks that didn't quite work out correctly. - Change sched_wakeup() to detect conditions where the slp+runtime could exceed SCHED_SLP_RUN_MAX * 2. This can happen if we go to sleep for longer than 6 seconds. In this case, we'll just clear the runtime and set the sleep time to the max. - Define a new function, sched_interact_fork() which updates the slp+runtime of a newly forked thread. We want to limit the amount of history retained from the parent so that we learn the child's behavior quickly. We don't, however want to decay it to nothing. Previously, we would simply divide each parameter by 100 whenever we forked. After a few forks the values would reach 0 and tasks would not be considered interactive. - Add another KTR entry, cleanup some existing entries. - Remove a useless sched_interact_update() from sched_priority(). This is already done by the callers that require it.	2003-11-02 03:36:33 +00:00
Alexander Kabaev	492c1e68fb	Temporarily undo parts of the struct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff	2003-11-01 05:51:54 +00:00
Jeff Roberson	22bf7d9a0e	- Add static to local functions and data where it was missing. - Add an IPI based mechanism for migrating kses. This mechanism is broken down into several components. This is intended to reduce cache thrashing by eliminating most cases where one cpu touches another's run queues. - kseq_notify() appends a kse to a lockless singly linked list and conditionally sends an IPI to the target processor. Right now this is protected by sched_lock but at some point I'd like to get rid of the global lock. This is why I used something more complicated than a standard queue. - kseq_assign() processes our list of kses that have been assigned to us by other processors. This simply calls sched_add() for each item on the list after clearing the new KEF_ASSIGNED flag. This flag is used to indicate that we have been appended to the assigned queue but not added to the run queue yet. - In sched_add(), instead of adding a KSE to another processor's queue we use kse_notify() so that we don't touch their queue. Also in sched_add(), if KEF_ASSIGNED is already set return immediately. This can happen if a thread is removed and readded so that the priority is recorded properly. - In sched_rem() return immediately if KEF_ASSIGNED is set. All callers immediately readd simply to adjust priorities etc. - In sched_choose(), if we're running an IDLE task or the per cpu idle thread set our cpumask bit in 'kseq_idle' so that other processors may know that we are idle. Before this, make a single pass through the run queues of other processors so that we may find work more immediately if it is available. - In sched_runnable(), don't scan each processor's run queue, they will IPI us if they have work for us to do. - In sched_add(), if we're adding a thread that can be migrated and we have plenty of work to do, try to migrate the thread to an idle kseq. - Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into consideration.
- No longer use kseq_choose() to steal threads, it can lose its last argument. - Create a new function runq_steal() which operates like runq_choose() but skips threads based on some criteria. Currently it will not steal PRI_ITHD threads. In the future this will be used for CPU binding. - Create a kseq_steal() that checks each run queue with runq_steal(), use kseq_steal() in the places where we used kseq_choose() to steal with before.	2003-10-31 11:16:04 +00:00
John Baldwin	e57ea233d9	Ensure that mp_ncpus is set to 1 if mp_cpu_probe() fails.	2003-10-30 21:44:01 +00:00
Alexander Kabaev	0823d2996c	Relock mntvnode_mtx if vget fails in vfs_stdsync. The loop should always be entered with the mutex locked.	2003-10-30 16:22:51 +00:00
David Xu	7eeaaf9b97	Try to fetch thread mailbox address in page fault trap, so when a thread blocks in the page fault handler, an upcall thread can be scheduled. It is useful if process is doing lots of mmap based I/O.	2003-10-30 02:55:43 +00:00
Sam Leffler	90fc7b7cb8	Add a temporary mechanism to disable INTR_MPSAFE from network interface drivers. This is preparatory to running more parts of the network system w/o Giant.	2003-10-29 18:29:50 +00:00
Bruce Evans	b3aeaf2ed1	Removed mostly-dead code for setting switchtime after the idle loop clobbers this variable. Long ago, when the idle loop wasn't in a process, it set switchtime.tv_sec to zero to indicate that the time needs to be read after the idle loop finishes. The special case for this isn't needed now that there is an idle process (for each CPU). The time is read in the normal way when the idle process is switched away from. The seconds component of the time is only zero for the first second after the uptime is set, and the mostly-dead code was only executed during this time. (This was slightly broken by using uptimes instead of times relative to the Epoch -- in the original version the seconds component of the time was only 0 for the first second after the Epoch.) In mi_switch(), moved the setting of switchticks to just after the first (and now only) setting of switchtime. This setting used to be delayed since a late setting was needed for the idle case and an early setting was not needed. Now the early setting is needed so that fork_exit() doesn't need to set either switchtime or switchticks. Removed now-completely-rotted comment attached to this. Most of the code described by the comment had already moved to sched_switch().	2003-10-29 15:23:09 +00:00
Bruce Evans	89674a9f77	Removed sched_nest variable in sched_switch(). Context switches always begin with sched_lock held but not recursed, so this variable was always 0. Removed fixup of sched_lock.mtx_recurse after context switches in sched_switch(). Context switches always end with this variable in the same state that it began in, so there is no need to fix it up. Only sched_lock.mtx_lock really needs a fixup. Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion that sched_lock is owned and not recursed after it is fixed up. This assertion must match the one in mi_switch(), and if sched_lock were recursed then a non-null fixup of sched_lock.mtx_recurse would probably be needed again, unlike in sched_switch(), since fork_exit() doesn't return to its caller in the normal way.	2003-10-29 14:40:41 +00:00
Sam Leffler	9c855a36c1	Introduce the notion of "persistent mbuf tags"; these are tags that stay with an mbuf until it is reclaimed. This is in contrast to tags that vanish when an mbuf chain passes through an interface. Persistent tags are used, for example, by MAC labels. Add an m_tag_delete_nonpersistent function to strip non-persistent tags from mbufs and use it to strip such tags from packets as they pass through the loopback interface and when turned around by icmp. This fixes problems with "tag leakage". Pointed out by: Jonathan Stone Reviewed by: Robert Watson	2003-10-29 05:40:07 +00:00
Sam Leffler	395bb18680	speedup stream socket recv handling by tracking the tail of the mbuf chain instead of walking the list for each append Submitted by: ps/jayanth Obtained from: netbsd (jason thorpe)	2003-10-28 05:47:40 +00:00
Jeff Roberson	1aca9909e5	- Only change the run queue in sched_prio() if the kse is non null. threads can be in the TD_ON_RUNQ state and not have an associated kse. - Remove the PRI_IDLE special case from sched_clock(), it was not actually necessary.	2003-10-28 03:28:48 +00:00
Jeff Roberson	eab9cabf34	- Don't set td_priority directly here, use sched_prio().	2003-10-27 07:15:47 +00:00
Jeff Roberson	3f741ca117	- Use a better algorithm in sched_pctcpu_update() Contributed by: Thomaswuerfl@gmx.de - In sched_prio(), adjust the run queue for threads which may need to move to the current queue due to priority propagation. - In sched_switch(), fix style bug introduced when the KSE support went in. Columns are 80 chars wide, not 90. - In sched_switch(), Fix the comparison in the idle case and explicitly re-initialize the runq in the not propagated case. - Remove dead code in sched_clock(). - In sched_clock(), If we're an IDLE class td set NEEDRESCHED so that threads that have become runnable will get a chance to. - In sched_runnable(), if we're not the IDLETD, we should not consider curthread when examining the load. This mimics the 4BSD behavior of returning 0 when the only runnable thread is running. - In sched_userret(), remove the code for setting NEEDRESCHED entirely. This is not necessary and is not implemented in 4BSD. - Use the correct comparison in sched_add() when checking to see if an idle prio task has had its priority temporarily elevated.	2003-10-27 06:47:05 +00:00
Alfred Perlstein	6ff7636ea5	constify the second args to timevaladd() and timevalsub().	2003-10-26 02:19:00 +00:00
Robert Watson	36bbf86ba6	Check (locked) before performing an advisory unlock following a failure of vn_start_write(). Otherwise, we may inconsistently attempt to release the advisory lock. Pointed out by: teggej	2003-10-25 16:43:50 +00:00
Robert Watson	c447f5b2f4	When generating a core dump, use advisory locking in an advisory way: if we do acquire an advisory lock, great! We'll release it later. However, if we fail to acquire a lock, we perform the coredump anyway. This problem became particularly visible with NFS after the introduction of rpc.lockd: if the lock manager isn't running, then locking calls will fail, aborting the core dump (resulting in a zero-byte dump file). Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>	2003-10-25 16:14:09 +00:00
Robert Watson	67536f038c	Allow MAC policies to block/revoke kern_alq write access to a file. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories Reviewed by: jeff	2003-10-25 16:10:41 +00:00
Warner Losh	17e02bb39b	Convenience functions to generate notifications from the kernel. The ACPI code will start using these shortly. Reviewed by: njl	2003-10-24 22:41:54 +00:00

7167 Commits