freebsd-dev

Author	SHA1	Message	Date
Scott Long	740d9ba692	Convert the other use of flags to mflags in soalloc().	2004-03-01 01:14:28 +00:00
Robert Watson	2bc87dcfbe	Modify soalloc() API so that it accepts a malloc flags argument rather than a "waitok" argument. Callers now passing M_WAITOK or M_NOWAIT rather than 0 or 1. This simplifies the soalloc() logic, and also makes the waiting behavior of soalloc() more clear in the calling context. Submitted by: sam	2004-02-29 17:54:05 +00:00
Poul-Henning Kamp	2cf6bdac50	Loudly announce WITNESS and DIAGNOSTIC options and warn about reduced performance.	2004-02-29 16:56:54 +00:00
Poul-Henning Kamp	3d6e5ccb06	Make sure to disable the watchdog if we cannot honour the timeout.	2004-02-28 22:01:19 +00:00
Poul-Henning Kamp	4103b7652d	Rename the WATCHDOG option to SW_WATCHDOG and make it use the generic watchdoc(9) interface. Make watchdogd(8) perform as watchdog(8) as well, and make it possible to specify a check command to run, timeout and sleep periods. Update watchdog(4) to talk about the generic interface and add new watchdog(8) page.	2004-02-28 20:56:35 +00:00
John Baldwin	44f3b09204	Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep qeueus in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleep's no longer inherently bump the priority. Instead, this is soley a propery of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.	2004-02-27 18:52:44 +00:00
John Baldwin	e5bb601d87	Drop sched_lock around the wakeup of the parent process after setting the process state to zombie when a process exits to avoid a lock order reversal with the sleepqueue locks. This appears to be the only place that we call wakeup() with sched_lock held.	2004-02-27 18:39:09 +00:00
John Baldwin	dd75b0a90d	Add an implementation of a generic sleep queue abstraction that is used to queue threads sleeping on a wait channel similar to how turnstiles are used to queue threads waiting for a lock. This subsystem will be used as the backend for sleep/wakeup and condition variables initially. Eventually it will also be used to replace the ithread-specific iwait thread inhibitor. Sleep queues are also not locked by sched_lock, so this splits sched_lock up a bit further increasing concurrency within the scheduler. Sleep queues also natively support timeouts on sleeps and interruptible sleeps allowing for the reduction of a lot of duplicated code between the sleep/wakeup and condition variable implementations. For more details on the sleep queue implementation, check the comments in sys/sleepqueue.h and kern/subr_sleepqueue.c.	2004-02-27 18:33:09 +00:00
Dag-Erling Smørgrav	21885af505	Add sysctl_move_oid() which reparents an existing OID.	2004-02-27 17:13:23 +00:00
John Baldwin	5b7de7e19e	Clarify and tweak some comments.	2004-02-27 16:14:27 +00:00
John Baldwin	03129ba97f	Fix _sx_assert() to panic() rather than printf() when an assertion fails and ignore assertions if we have already paniced.	2004-02-27 16:13:44 +00:00
John Baldwin	f4114c3d7f	Replace the ktrace queue's semaphore with a condition variable instead as it is slightly more efficient since we already have a mutex to protect the queue. Ktrace originally used a semaphore more as a proof of concept.	2004-02-26 19:30:22 +00:00
Don Lewis	47934cef8f	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms	2004-02-26 00:27:04 +00:00
Robert Watson	049ffe98a8	Assert pipe mutex in pipeselwakeup(), as we manipulate pipe_state in a non-atomic manner. It appears to always be called with the mutex (good).	2004-02-26 00:18:22 +00:00
Robert Watson	094bdd260c	Update comment regarding MAC labels: we no longer pass endpoints into the MAC Framework, just the pipe pair. GC 'hadpeer' used in pipedestroy(), which is no longer needed as we check pipe_present flags on the pair.	2004-02-25 23:30:56 +00:00
Dag-Erling Smørgrav	854a417d92	Whitespace cleanup	2004-02-24 19:31:30 +00:00
Poul-Henning Kamp	652d04726d	Fix two oversights here: don't trash the freelist, and properly cleanup the cdevsw{}. Submitted by: tegge	2004-02-23 08:42:55 +00:00
Brian Feldman	240160d48b	Correct some major SMP-harmful problems in the pipe implementation. First of all, PIPE_EOF is not checked pervasively after everything that can drop the pipe mutex and msleep(), so fix. Additionally, though it might not harm anything, pipelock() and pipeunlock() are not used consistently. Third, the kqueue support functions do not use the pipe mutex correctly. Last, but absolutely not least, is a race: if pipe_busy is not set on the closing side of the pipe, the other side that is trying to write to that will crash BECAUSE PIPE_EOF IS NOT SET! Unconditionally set PIPE_EOF, and get rid of all the lockups/crashes I have seen trying to build ports.	2004-02-22 23:00:14 +00:00
Daniel Eischen	2648efa621	Add sysctls to allow showing threads for pgrp, tty, uid, ruid, and pid.	2004-02-22 17:54:32 +00:00
Pawel Jakub Dawidek	63dba32b76	Reimplement sysctls handling by MAC framework. Now I believe it is done in the right way. Removed some XXMAC cases, we now assume 'high' integrity level for all sysctls, except those with CTLFLAG_ANYBODY flag set. No more magic. Reviewed by: rwatson Approved by: rwatson, scottl (mentor) Tested with: LINT (compilation), mac_biba(4) (functionality)	2004-02-22 12:31:44 +00:00
Colin Percival	b17dd2bcc0	If we're going to panic(), do it before dereferencing a NULL pointer. Reported by: "Ted Unangst" <tedu@coverity.com> Approved by: rwatson (mentor)	2004-02-22 01:11:53 +00:00
Robert Watson	f6a4109212	Update my personal copyrights and NETA copyrights in the kernel to use the "year1-year3" format, as opposed to "year1, year2, year3". This seems to make lawyers more happy, but also prevents the lines from getting excessively long as the years start to add up. Suggested by: imp	2004-02-22 00:33:12 +00:00
Poul-Henning Kamp	ded67d0f77	Check for NODEV return from udev2dev()	2004-02-21 23:52:03 +00:00
Poul-Henning Kamp	cd690b60de	Device megapatch 6/6: This is what we came here for: Hang dev_t's from their cdevsw, refcount cdevsw and dev_t and generally keep track of things a lot better than we used to: Hold a cdevsw reference around all entrances into the device driver, this will be necessary to safely determine when we can unload driver code. Hold a dev_t reference while the device is open. KASSERT that we do not enter the driver on a non-referenced dev_t. Remove old D_NAG code, anonymous dev_t's are not a problem now. When destroy_dev() is called on a referenced dev_t, move it to dead_cdevsw's list. When the refcount drops, free it. Check that cdevsw->d_version is correct. If not, set all methods to the dead_*() methods to prevent entrance into driver. Print warning on console to this effect. The device driver may still explode if it is also incompatible with newbus, but in that case we probably didn't get this far in the first place.	2004-02-21 21:57:26 +00:00
Poul-Henning Kamp	816d62bbb9	Device megapatch 5/6: Remove the unused second argument from udev2dev(). Convert all remaining users of makedev() to use udev2dev(). The semantic difference is that udev2dev() will only locate a pre-existing dev_t, it will not line makedev() create a new one. Apart from the tiny well controlled windown in D_PSEUDO drivers, there should no longer be any "anonymous" dev_t's in the system now, only dev_t's created with make_dev() and make_dev_alias()	2004-02-21 21:32:15 +00:00
Poul-Henning Kamp	dc08ffec87	Device megapatch 4/6: Introduce d_version field in struct cdevsw, this must always be initialized to D_VERSION. Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.	2004-02-21 21:10:55 +00:00
Poul-Henning Kamp	8e1f1df080	Device megapatch 3/6: Add missing D_TTY flags to various drivers. Complete asserts that dev_t's passed to ttyread(), ttywrite(), ttypoll() and ttykqwrite() have (d_flags & D_TTY) and a struct tty pointer. Make ttyread(), ttywrite(), ttypoll() and ttykqwrite() the default cdevsw methods for D_TTY drivers and remove the explicit initializations in various drivers cdevsw structures.	2004-02-21 20:41:11 +00:00
Poul-Henning Kamp	b0b0334878	Device megapatch 2/6: This commit adds a couple of functions for pseudodrivers to use for implementing cloning in a manner we will be able to lock down (shortly). Basically what happens is that pseudo drivers get a way to ask for "give me the dev_t with this unit number" or alternatively "give me a dev_t with the lowest guaranteed free unit number" (there is unfortunately a lot of non-POLA in the exact numeric value of this number, just live with it for now) Managing the unit number space this way removes the need to use rman(9) to do so in the drivers this greatly simplifies the code in the drivers because even using rman(9) they still needed to manage their dev_t's anyway. I have taken the if_tun, if_tap, snp and nmdm drivers through the mill, partly because they (ab)used makedev(), but mostly because together they represent three different problems for device-cloning: if_tun and snp is the plain case: just give me a device. if_tap has two kinds of devices, with a flag for device type. nmdm has paired devices (ala pty) can you can clone either of them.	2004-02-21 20:29:52 +00:00
Brian Feldman	fe5f3a72ac	Make sure to wake up any select waiters when closing a kqueue (also, not crash). I am fairly sure that only people with SMP and multi-threaded apps using kqueue will be affected by this, so I have a stress-testing program on my web site: <URL:http://green.homeunix.org/~green/getaddrinfo-pthreads-stresstest.c>	2004-02-20 04:00:48 +00:00
John Baldwin	712f57d8ab	Tidy up the thread taskqueue implementation and close a lost wakeup race. Instead of creating a mutex that we msleep on but don't actually lock when doing the corresponding wakeup(), in the kthread, lock the mutex associated with our taskqueue and msleep while the queue is empty. Assert that the queue is locked when the callback function is called to wake the kthread.	2004-02-19 22:03:52 +00:00
Jacques Vidrine	57f22bd4af	Rework jail_attach(2) so that an already jailed process cannot hop to another jail. Submitted by: rwatson	2004-02-19 21:03:20 +00:00
Pawel Jakub Dawidek	461167c289	Added sysctl security.jail.jailed. It returns 1 is process is inside of jail and 0 if it is not. Information if we are in jail or not is not a secret, there is plenty of ways to discover it. Many people are using own hack to check this and this will be a legal way from now on. It will be great if our starting scripts will take advantage of this sysctl to allow clean "boot" inside jail. Approved by: rwatson, scottl (mentor)	2004-02-19 14:29:14 +00:00
Pawel Jakub Dawidek	f6739b1ddc	Simplify check. We are only able to check exclusive lock and if 2nd condition is true, first one is true for sure. Approved by: jhb, scottl (mentor)	2004-02-19 14:19:31 +00:00
Don Lewis	cf93aa166c	When reparenting a process in the PT_DETACH code, only set p_sigparent to SIGCHLD if the new parent process is initproc. MFC after: 2 weeks	2004-02-19 10:39:42 +00:00
Don Lewis	6567eef757	A Linux thread created using clone() should not send SIGCHLD to its parent if no signal is specified in the clone() flags argument. PR: 42457 MFC after: 2 weeks	2004-02-19 06:43:48 +00:00
Nate Lawson	32869e71fb	Add support for 'h' and 'hh' modifiers for printf(9). Submitted by: Bruno Ducrot <ducrot AT poupinou.org> Reviewed by: bde	2004-02-19 05:29:39 +00:00
Colin Percival	3a1bdbf8d1	Don't ignore errors from vfs_allocate_syncvnode. PR: kern/18503 Submitted by: Anatoly Vorobey <mellon@pobox.com> Approved by: rwatson (mentor)	2004-02-18 05:20:54 +00:00
Peter Wemm	df7c361e64	Checkpoint a hack to enable running i386 libc_r binaries on a 64 bit kernel. I'm not happy with it yet - refinements are to come. This hack allows the kern.ps_strings and kern.usrstack sysctls to respond to a 32 bit request, such as those coming from emulated i386 binaries.	2004-02-18 00:54:17 +00:00
David Malone	a1cc6206fb	Correct a comment. Reviewed by: alfred, tanimura	2004-02-17 12:30:32 +00:00
Dag-Erling Smørgrav	963385cf22	Mechanical whistespace cleanup.	2004-02-17 10:21:03 +00:00
Dag-Erling Smørgrav	44f4b94b38	Don't bother storing a result when all you need are the side effects.	2004-02-16 18:38:46 +00:00
David Malone	a82294d01c	In fdcheckstd the descriptor table should never be shared, so just KASSERT this rather than trying to deal with what happens when file descriptors change out from under us.	2004-02-15 21:14:48 +00:00
Bruce Evans	72632ef235	Fixed style bugs near previous commit (mainly formatting errors and missing parentheses). Use default handling (trap to debugger) for udev2dev(x, 1) since it is an error and doesn't happen anywhere in the sys tree except in bogusly commented out code in coda.	2004-02-15 20:14:47 +00:00
Colin Percival	a20e9655b9	Remove opv_desc_vector from vfs_add_vnodeops, since it is defined and given a value, but never used. This has no effect on the resulting binaries, since gcc optimizes the variable away anyway. PR: kern/62684 Approved by: rwatson (mentor)	2004-02-15 17:27:33 +00:00
Poul-Henning Kamp	2a3faf2fbd	Split the initialization of the cdevsw into a separate function.	2004-02-15 10:35:33 +00:00
Robert Watson	402d7aa884	Remove excess brackets.	2004-02-15 00:43:22 +00:00
Poul-Henning Kamp	d60d18d491	Use standard style for cdevsw initialization.	2004-02-14 20:03:36 +00:00
Robert Watson	679a106075	By default, don't allow processes in a jail to list the set of jails in the system. Previous behavior (allowed) may be restored by setting security.jail.list_allowed=1.	2004-02-14 19:19:47 +00:00
Robert Watson	7e440242e5	Fix mismerge in last commit: check that cred->cr_prison is NULL before dereferencing the prison pointer.	2004-02-14 18:52:43 +00:00
Robert Watson	f08df373a3	By default, when a process in jail calls getfsstat(), only return the data for the file system on which the jail's root vnode is located. Previous behavior (show data for all mountpoints) can be restored by setting security.jail.getfsstatroot_only to 0. Note: this also has the effect of hiding other mounts inside a jail, such as /dev, /tmp, and /proc, but errs on the side of leaking less information.	2004-02-14 18:31:11 +00:00
Poul-Henning Kamp	d08c5d0b9b	Remove the check which used to protect us against make_dev() being called until DEVFS had a chance to initialize. Since DEVFS is mandatory and things over in that department coincidentally works from without any initialization now, this is safe.	2004-02-14 17:19:43 +00:00
Brian Feldman	a0ed09c0af	T -CURRENT DO NOT CRASH UPON ^T K PLZ THX. Also, use sched_pctcpu() instead of assuming td->td_kse is non-NULL.	2004-02-14 01:30:06 +00:00
Brian Feldman	f662a93197	Always socantsendmore() before deallocating a socket. This, in turn, calls selwakeup() if necessary (which it is, if you don't want freed memory hanging around on your td->td_selq). Props to: alfred	2004-02-12 01:48:40 +00:00
Don Lewis	55b5f2a202	When reparenting a process to init, make sure that p_sigparent is set to SIGCHLD. This avoids the creation of orphaned Linux-threaded zombies that init is unable to reap. This can occur when the parent process sets its SIGCHLD to SIG_IGN. Fix a similar situation in the PT_DETACH code. Tested by: "Steven Hartland" <killing AT multiplay.co.uk>	2004-02-11 22:06:02 +00:00
John Baldwin	e7a44cace2	Argh! Fix a bogon. lim_cur() was returning the hard (max) limit rather than the soft (cur) limit. Submitted by: bde	2004-02-11 18:04:13 +00:00
Mike Silbersack	b49d824e8b	Add the SF_NODISKIO flag to sendfile. This flag causes sendfile to be mindful of blocking on disk I/O and instead return EBUSY when such blocking would occur. Results from the DeBox project indicate that blocking on disk I/O can slow the performance of a kqueue/poll based webserver. Using a flag such as SF_NODISKIO and throwing connections that would block to helper processes/threads helped increase performance. Currently, only the Flash webserver uses this flag, although it could probably be applied to thttpd with relative ease. Idea by: Yaoping Ruan & Vivek Pai	2004-02-08 07:35:48 +00:00
Alan Cox	c5aebf380c	swp_pager_async_iodone() no longer requires Giant. Modify bufdone() and swapgeom_done() to perform swp_pager_async_iodone() without Giant. Reviewed by: tegge	2004-02-07 08:54:50 +00:00
John Baldwin	a875f38546	- Convert the plimit lock to a pool mutex lock. - Hide struct plimit from userland. Submitted by: bde (2)	2004-02-06 19:35:14 +00:00
John Baldwin	f4daf05619	- Correct the translation of old rlimit values to properly handle the old RLIM_INFINITY case for ogetrlimit(). - Use %jd and intmax_t to output negative time in usec in calcru(). - Rework getrusage() to make a copy of the rusage struct into a local variable while holding Giant and then do the copyout from the local variable to avoid having to have the original process rusage struct locked while doing the copyout (which would not be safe). This also includes a few style fixes from Bruce to getrusage(). Submitted by: bde (1, parts of 3) Suggested by: bde (2)	2004-02-06 19:30:12 +00:00
John Baldwin	99b6e02ba6	A few more style fixes from Bruce including a few I missed last time. Submitted by: bde	2004-02-06 19:25:34 +00:00
John Baldwin	4c3558aa82	Always set a process' state to normal when it is fully constructed in fork1() rather than only doing it for the RFSTOPPED case and then having to fix it up in other places later on.	2004-02-05 21:01:37 +00:00
John Baldwin	b4323d7729	- A lot of style and whitespace fixes. - Update a few comments regarding locking notes. Submitted by: bde (1, mostly)	2004-02-05 20:53:25 +00:00
Jacques Vidrine	b00a3c85da	Correct a reference counting bug in shmat(2). If vm_map_find(9) failed, the reference count for the virtual memory object referenced by the specified shared memory segment would have been erroneously incremented. Reported by: Joost Pol <joost@pine.nl>	2004-02-05 18:00:35 +00:00
Alexander Kabaev	dec8868dcc	Rename cn_unavailable to cnunavailable for little more consistency. Garbage collect unused cndebug() function. Suggested by: bde	2004-02-05 17:35:28 +00:00
Mike Silbersack	b711d74eaf	Style fixes: don't indent variable names. Submitted by: bde	2004-02-05 08:29:27 +00:00
Alexander Kabaev	e99c09e2dc	Eliminate global cons_unavailable flag and replace it by the status bit maintained on a per-device basis. Single variable is inadequate on machines running with multiple consoles enabled.	2004-02-05 01:56:43 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Mike Silbersack	ff5e43a3fd	Rename iov_to_uio to uiofromiov to be more consistent with other uio* functions. Suggested by: bde	2004-02-04 08:43:21 +00:00
Pawel Jakub Dawidek	19b0efd32d	Allow assert that the current thread does not hold the sx(9) lock. Reviewed by: jhb In cooperation with: juli, jhb Approved by: jhb, scottl (mentor)	2004-02-04 08:14:58 +00:00
Mike Silbersack	2ccbe4b596	Style fixes Submitted by: bde	2004-02-04 08:14:47 +00:00
Robert Watson	5e312ddcc6	A variety of further cleanups to ttyinfo(): - Rename temporary variable names ("tmp", "tmp2") to more informative names ("load", "pctcpu", "rss", ...) - Unclutter indentation and return paths: rather than lots of nested ifs, simply return earlier if it's not going to work out. Simplify general structure and avoid "deep" code. - Comment on the thread/process selection and locking. - Correct handling of "running"/"runnable" states, avoid "unknown" that people were seeing for running processes. This was due to a misunderstanding of the more complex state machine / inhibitors behavior of KSE. - Do perform ttyinfo() printing on KSE (P_SA) processes, it seems generally to work. While I initially attempted to formulate this as two commits (one layout, the other content), I concluded that the layout changes were really structural changes. Many elements submitted by: bde	2004-02-04 05:46:05 +00:00
John Baldwin	3e9ac3ebf2	Remove a bogus assertion. Noticed by: bde Pointy hat to: jhb	2004-02-03 15:14:27 +00:00
Daniel Eischen	b5426f096b	Regen after adding ksem_timedwait().	2004-02-03 05:11:31 +00:00
Daniel Eischen	aae94fbbb6	Add ksem_timedwait() to complement ksem_wait(). Glanced at by: alfred	2004-02-03 05:08:32 +00:00
Robert Watson	4f638130c3	Don't dec/inc the amountpipes counter every time we resize a pipe -- instead, just dec/inc in the ctor/dtor. For now, increment/decrement in two's, since we're now performing the operation once per pair, not once per pipe. Not really any measurable performance change in my micro-benchmarks, but doing less work is good, especially when it comes to atomic operations. Suggested by: alc	2004-02-03 04:55:24 +00:00
Robert Watson	9a830ddc54	Catch instances of (pipe == NULL) that were obsoleted with recent changes to jointly allocated pipe pairs. Replace these checks with pipe_present checks. This avoids a NULL pointer dereference when a pipe is half-closed. Submitted by: Peter Edwards <peter.edwards@openet-telecom.com>	2004-02-03 02:50:51 +00:00
John Baldwin	9c9c52a3ed	- Assert that witness_cold is not true in enroll(). - Only check witness_watch once in enroll(). Reported by: ru (2)	2004-02-02 22:15:17 +00:00
Pawel Jakub Dawidek	3410b19324	Fix many issues related to mount/unmount: 1. Root from inside a jail was able to unmount any file system (except /). 2. Unprivileged root was able to unmount file systems mounted by privileged root (execpt /). 3. User from inside a jail was able to mount file system when sysctl vfs.usermount was set to 1. 4. User was able to mount file system when vfs.usermount was set to 1 (that's ok) and unmount it even if vfs.usermount was equal to 0 (that's not correct). Possibility from point 1 was reported by: Dariusz Kowalski <darek@76.pl> Only a part of this fix will be MFC'ed (if approved). PR: kern/60149 Reviewed by: rwatson Approved by: scottl (mentor) MFC after: 3 days	2004-02-02 19:02:05 +00:00
Mike Silbersack	02ec600572	Remove debugging code that slipped into the previous commit. Spotted by: bde	2004-02-02 09:09:59 +00:00
Jeff Roberson	b209e5e3e4	- style fixes to the critical_exit() KASSERT(). Submitted by: bde	2004-02-02 08:13:27 +00:00
Jeff Roberson	0392e39dff	- Allow interactive tasks to use the maximum time-slice. This is not as detrimental as I thought it would be in the case of massive process storms from a shell and it makes regular desktop usage noticeably better.	2004-02-01 10:38:13 +00:00
Mike Silbersack	beb699c7ba	Rewrite sendfile's header support so that headers are now sent in the first packet along with data, instead of in their own packet. When serving files of size (packetsize - headersize) or smaller, this will result in one less packet crossing the network. Quick testing with thttpd and http_load has shown a noticeable performance improvement in this case (350 vs 330 fetches per second.) Included in this commit are two support routines, iov_to_uio, and m_uiotombuf; these routines are used by sendfile to construct the header mbuf chain that will be linked to the rest of the data in the socket buffer.	2004-02-01 07:56:44 +00:00
Jeff Roberson	f2f51f8ab8	- Disable ithread binding in all cases for now. This doesn't make as much sense with sched_4bsd as it does with sched_ule. - Use P_NOLOAD instead of the absence of td->td_ithd to determine whether or not a thread should be accounted for in sched_tdcnt.	2004-02-01 06:20:18 +00:00
Robert Watson	4795b82c13	Coalesce pipe allocations and frees. Previously, the pipe code would allocate two 'struct pipe's from the pipe zone, and malloc a mutex. - Create a new "struct pipepair" object holding the two 'struct pipe' instances, struct mutex, and struct label reference. Pipe structures now have a back-pointer to the pipe pair, and a 'pipe_present' flag to indicate whether the half has been closed. - Perform mutex init/destroy in zone init/destroy, avoiding reallocating the mutex for each pipe. Perform most pipe structure setup in zone constructor. - VM memory mappings for pageable buffers are still done outside of the UMA zone. - Change MAC API to speak 'struct pipepair' instead of 'struct pipe', update many policies. MAC labels are also handled outside of the UMA zone for now. Label-only policy modules don't have to be recompiled, but if a module is recompiled, its pipe entry points will need to be updated. If a module actually reached into the pipe structures (unlikely), that would also need to be modified. These changes substantially simplify failure handling in the pipe code as there are many fewer possible failure modes. On half-close, pipes no longer free the 'struct pipe' for the closed half until a full-close takes place. However, VM mapped buffers are still released on half-close. Some code refactoring is now possible to clean up some of the back references, etc; this patch attempts not to change the structure of most of the pipe implementation, only allocation/free code paths, so as to avoid introducing bugs (hopefully). This cuts about 8%-9% off the cost of sequential pipe allocation and free in system call tests on UP and SMP in my micro-benchmarks. May or may not make a difference in macro-benchmarks, but doing less work is good. Reviewed by: juli, tjr Testing help: dwhite, fenestro, scottl, et al	2004-02-01 05:56:51 +00:00
Jeff Roberson	40ece05382	- Revert rev 1.240 we no longer need a kthread for loadav().	2004-02-01 05:37:36 +00:00
Jeff Roberson	e7f004fe23	- Use sched_load() rather than grabbing the sx lock and traversing the proc table to discover the load.	2004-02-01 02:51:33 +00:00
Jeff Roberson	33916c360e	- Add a new member to struct kseq called ksq_sysload. This is intended to track the load for the sched_load() function. In the SMP case this member is not defined because it would be redundant with the ksg_load member which already tracks the non ithd load. - For sched_load() in the UP case simply return ksq_sysload. In the SMP case traverse the list of kseq groups and sum up their ksg_load fields.	2004-02-01 02:48:36 +00:00
Jeff Roberson	ca59f15272	- Keep a variable 'sched_tdcnt' that is used for the local implementation of sched_load(). This variable tracks the number of running and runnable non ithd threads. This removes the need to traverse the proc table and discover how many threads are runnable.	2004-02-01 02:46:47 +00:00
Robert Watson	fca542bcaa	Move KASSERT regarding td_critnest to after the value of td is set to curthread, to avoid warning and incorrect behavior. Hoped not to mind: jeff	2004-02-01 02:31:36 +00:00
Jeff Roberson	6767c6547b	- Assert that td_critnest > 0 in critical_exit() to catch cases of unbalanced uses of the critical_* api.	2004-02-01 01:24:54 +00:00
Robert Watson	26518e8d8c	Fix an error in a KASSERT string: it's pipe_free_kmem(), not pipespace(), that contains this KASSERT.	2004-01-31 23:03:22 +00:00
Poul-Henning Kamp	be8a62e821	Introduce the SO_BINTIME option which takes a high-resolution timestamp at packet arrival. For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL since it has higher resolution and lower overhead. Simultaneous use of the two options is possible and they will return consistent timestamps. This introduces an extra test and a function call for SO_TIMEVAL, but I have not been able to measure that.	2004-01-31 10:40:25 +00:00
Robert Watson	30a9f26db2	Assert process lock in ptracestop(), since we're going to rely on it, and later unlock it.	2004-01-29 00:58:21 +00:00
Robert Watson	94ffb20d72	Add a reset sysctl for mutex profiling: zeros all of the mutex profiling buffers and hash table. This makes it a lot easier to do multiple profiling runs without rebooting or performing gratuitous arithmetic. Sysctl is named debug.mutex.prof.reset. Reviewed by: jake	2004-01-28 22:11:53 +00:00
John Baldwin	d5b75694e7	Move the loadav() callout into its own kthread since it uses allproc_lock which is a sleepable lock and thus is not safe to acquire from a callout routine.	2004-01-28 20:44:41 +00:00
John Baldwin	8d768e7676	Rework witness_lock() to make it slightly more useful and flexible. - witness_lock() is split into two pieces: witness_checkorder() and witness_lock(). Witness_checkorder() determines if acquiring a specified lock at the time it is called would result in a lock order. It optionally adds a new lock order relationship as well. witness_lock() updates witness's data structures to assume that a lock has been acquired by stick a new lock instance in the appropriate lock instance list. - The mutex and sx lock functions now call checkorder() prior to trying to acquire a lock and continue to call witness_lock() after the acquire is completed. This will let witness catch a deadlock before it happens rather than trying to do so after the threads have deadlocked (i.e. never actually report it). - A new function witness_defineorder() has been added that adds a lock order between two locks at runtime without having to acquire the locks. If the lock order cannot be added it will return an error. This function is available to programmers via the WITNESS_DEFINEORDER() macro which accepts either two mutexes or two sx locks as its arguments. - A few simple wrapper macros were added to allow developers to call witness_checkorder() anywhere as a way of enforcing locking assertions in code that might acquire a certain lock in some situations. The macros are: witness_check_{mutex,shared_sx,exclusive_sx} and take an appropriate lock as the sole argument. - The code to remove a lock instance from a lock list in witness_unlock() was unnested by using a goto to vastly improve the readability of this function.	2004-01-28 20:39:57 +00:00
John Baldwin	62a0fd943c	Use mtx_assert() rather than using a home-rolled version.	2004-01-28 20:26:39 +00:00
Alexander Kabaev	975634280a	Move the part of the comment which applies to osigsuspend where it belongs. The current sigsuspend syscall does expect a pointer to the mask as argument. Submitted by: Igor Sysoev <is at rambler-co dot ru>	2004-01-28 06:06:04 +00:00
Dag-Erling Smørgrav	84344f9fbf	Rename the kern.vm.kmem.size tunable to the more logical vm.kmem_size. To assure backward compatibility (conditional on !BURN_BRIDGES), look it up by its old name first, and log a warning (but accept the setting) if it was found. If both the old and new name are defined, the new name takes precedence. Also export vm.kmem_size as a read-only sysctl variable; I find it hard to tune a parameter when I don't know its default value, especially when that default value is computed at boot time.	2004-01-27 15:59:38 +00:00
Robert Watson	6bea667f63	When aborting fork() due to a failure, if using MAC, make sure to clean up the p_label field. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-01-25 18:42:18 +00:00
Ruslan Ermilov	33fe8fd0df	Register the uart(4)'s spin lock with witness(4).	2004-01-25 15:04:37 +00:00
Jeff Roberson	c77ac1fdee	- sched_strict has been dead for a long time now. Get rid of it.	2004-01-25 08:58:14 +00:00
Jeff Roberson	c494ddc8a1	- Clean up KASSERTS.	2004-01-25 08:57:38 +00:00
Jeff Roberson	5a2b158d8d	- Correct function names listed in KASSERTs. These were copied from other code and it was sloppy of me not to adjust these sooner.	2004-01-25 08:21:46 +00:00
Jeff Roberson	e17c57b14b	- Implement cpu pinning and binding. This is acomplished by keeping a per- cpu run queue that is only used for pinned or bound threads. Submitted by: Chris Bradfield <chrisb@ation.org>	2004-01-25 08:00:04 +00:00
Jeff Roberson	d1605f0ac9	- Use a unique string for the sched_setup SYSINIT and rename sched_setup to synch_setup. The schedulers use the sched_setup function name.	2004-01-25 07:49:45 +00:00
Jeff Roberson	29bcc4514f	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and propery adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate paramter and remove direct references to the process statistics.	2004-01-25 03:54:52 +00:00
Robert Watson	8dc10be885	Add some basic support for measuring sleep mutex contention to the mutex profiling code. As with existing mutex profiling, measurement is done with respect to mtx_lock() instances in the code, as opposed to specific mutexes. In particular, measure two things: (1) Lock contention. How often did this mtx_lock() call get made and have to sleep (or almost sleep) waiting for the lock. This helps identify the "victims" of contention. (2) Hold contention. How often, while the lock was held by a thread as a result of this mtx_lock(), did another thread try to acquire the same mutex. This helps identify the causes of contention. I'm currently exploring adding measurement of "time waited for the lock", but the current implementation has proven useful to me so far so I figured I'd commit it so others could try it out. Note that this increases the size of mutexes when MUTEX_PROFILING is enabled, so you might find you need to further bump UMA_BOOT_PAGES. Fixes welcome. The once over: des, others	2004-01-25 01:59:27 +00:00
Poul-Henning Kamp	551260fc36	Deal with MOD_FREQUENCY before MOD_OFFSET because the latter is the one which runs the actual update. This fixes a bug where there were a delay in applying the frequency adjustment. In extreme cases this could result in marginal stability of the kernel-pll.	2004-01-24 21:48:43 +00:00
Jeff Roberson	b9509b56fa	- Move smp_topology to subr_smp.c so that it is defined on all architectures.	2004-01-24 19:52:48 +00:00
Robert Watson	646e29ccac	Don't grab Giant in crfree(), since prison_free() no longer requires it. The uidinfo code appears to be MPSAFE, and is referenced without Giant elsewhere. While this grab of Giant was only made in fairly rare circumstances (actually GC'ing on refcount==0), grabbing Giant here potentially introduces lock order issues with any locks held by the caller. So this probably won't help performance much unless you change credentials a lot in an application, and leave a lot of file descriptors and cached credentials around. However, it simplifies locking down consumers of the credential interfaces. Bumped into by: sam Appeased: tjr	2004-01-23 21:07:52 +00:00
Robert Watson	b3059e09f6	Defer the vrele() on a jail's root vnode reference from prison_free() to a new prison_complete() task run by a task queue. This removes a requirement for grabbing Giant in crfree(). Embed the 'struct task' in 'struct prison' so that we don't have to allocate memory from prison_free() (which means we also defer the FREE()). With this change, I believe grabbing Giant from crfree() can now be removed, but need to check the uidinfo code paths. To avoid header pollution, move the definition of 'struct task' to _task.h, and recursively include from taskqueue.h and jail.h; much preferably to all files including jail.h picking up a requirement to include taskqueue.h. Bumped into by: sam Reviewed by: bde, tjr	2004-01-23 20:44:26 +00:00
Poul-Henning Kamp	ee57aeea65	Write 100 times for tomorrow: "Always print time_t as %jd, you never know what width it has"	2004-01-22 19:50:06 +00:00
Ralf S. Engelschall	446655ac4f	Fix generation of random multicast MAC address. In case no real/physical IEEE 802 address is available, both the expired "draft-leach-uuids-guids-01" (section "4. Node IDs when no IEEE 802 network card is available") and RFC 2518 (section "6.4.1 Node Field Generation Without the IEEE 802 Address") recommend (quoted from RFC 2518): "The ideal solution is to obtain a 47 bit cryptographic quality random number, and use it as the low 47 bits of the node ID, with the _most_ significant bit of the first octet of the node ID set to 1. This bit is the unicast/multicast bit, which will never be set in IEEE 802 addresses obtained from network cards; hence, there can never be a conflict between UUIDs generated by machines with and without network cards." Unfortunately, this incorrectly explains how to implement this and the FreeBSD UUID generator code inherited this generation bug from the broken reference code in the standards draft. They should instead specify the "_least_ significant bit of the first octet of the node ID" as the multicast bit in a memory and hexadecimal string representation of a 48-bit IEEE 802 MAC address. This standards bug arised from a false interpretation, as the multicast bit is actually the _most_ significant bit in IEEE 802.3 (Ethernet) _transmission order_ of an IEEE 802 MAC address. The standards authors forgot that the bitwise order of an _octet_ from a MAC address _memory_ and hexadecimal string representation is still always from left (MSB, bit 7) to right (LSB, bit 0). Fortunately, this UUID generation bug could have occurred on systems without any Ethernet NICs only.	2004-01-22 13:34:11 +00:00
Poul-Henning Kamp	4e74721cac	Add a sysctl (default: off) which enables a log(LOG_INFO...) warning if the clock is stepped.	2004-01-21 21:05:40 +00:00
Robert Watson	679365e7b9	Reduce gratuitous includes: don't include jail.h if it's not needed. Presumably, at some point, you had to include jail.h if you included proc.h, but that is no longer required. Result of: self injury involving adding something to struct prison	2004-01-21 17:10:47 +00:00
Andrey A. Chernov	9bbee25931	pread/pwrite: follow lseek spirit - return EINVAL on negative offset for non-VCHR	2004-01-20 01:27:42 +00:00
Poul-Henning Kamp	50d23be140	Add linenumber and source filename to panic(9) output. Ideally a traceback should be printed too, any takers ?	2004-01-19 21:27:11 +00:00
Alexander Kabaev	54556cc7b8	One more instance of magic number used in place of IO_SEQSHIFT. Submitted by: alc	2004-01-19 20:45:43 +00:00
Ruslan Ermilov	0541040c46	Since "m" is not part of the "mp" chain, need to free() it. Reported by: Stanford Metacompilation research group	2004-01-18 14:02:53 +00:00
Andrew Gallatin	1c318b9665	Handle sf_buf_alloc() returning null. This can happen if the process takes a signal while waiting for an sf_buf to become available. Reviewed by: alc	2004-01-17 21:16:51 +00:00
Dag-Erling Smørgrav	a6d4491c71	Restore correct semantics for F_DUPFD fcntl. This should fix the errors people have been getting with configure scripts.	2004-01-17 00:59:04 +00:00
Dag-Erling Smørgrav	56a9fc0e93	WITNESS won't let us hold two filedesc locks at the same time, so juggle fdp and newfdp around a bit.	2004-01-16 21:54:56 +00:00
Robert Watson	bafc8f255a	KASSERT() that initproc->p_pid is 1. Very bad things happen if init's pid isn't 1, and it can actually occur if kthread_create() is called before SUB_SI_CREATE_INIT without RFHIGHPID. Discussed with: jhb	2004-01-16 20:29:23 +00:00
Dag-Erling Smørgrav	ddce426f69	Remove two KASSERTs which were overly paranoid.	2004-01-16 08:45:56 +00:00
Dag-Erling Smørgrav	12d568c2b1	Take care to drop locks when calling malloc()	2004-01-15 18:50:11 +00:00
Dag-Erling Smørgrav	a2fe44e8cf	New file descriptor allocation code, derived from similar code introduced in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated file descriptors which is used to locate available descriptors when a new one is needed. It also moves the task of growing the file descriptor table out of fdalloc(), reducing complexity in both fdalloc() and do_dup(). Debts of gratitude are owed to tjr@ (who provided the original patch on which this work is based), grog@ (for the gdb(4) man page) and rwatson@ (for assistance with pxeboot(8)).	2004-01-15 10:15:04 +00:00
Don Lewis	288e351b55	If a device attach routine fails during boot and calls bus_teardown_intr(), ithread_remove_handler() may fail to remove the interrupt handler if it decides to let the ithread do the removal. The problem is that during boot "cold" is set, which causes msleep() to return immediately. This will cause ithread_remove_handler() to fail to wait for the ithread to do the removal from the handler TAILQ before freeing the handler back to the heap. Bad things will happen when some other user of the TAILQ, such as ithread_add_handler() or the actual ithread attempts to use the freed handler. Fix the problem by forcing ithread_remove_handler() to do the actual removal itself if the "cold" flag is set. Reviewed by: jhb	2004-01-13 22:55:46 +00:00
Dag-Erling Smørgrav	ac34dc4e79	Back out 1.160, which was committed by mistake.	2004-01-11 20:08:57 +00:00
Dag-Erling Smørgrav	d7a1c7e34b	Back out 1.166, which was committed by mistake.	2004-01-11 20:07:15 +00:00
Dag-Erling Smørgrav	f1ea6d813d	Mechanical whitespace cleanup + other minor style nits.	2004-01-11 19:56:42 +00:00
Dag-Erling Smørgrav	0e5dfade00	Mechanical whitespace cleanup.	2004-01-11 19:54:45 +00:00
Dag-Erling Smørgrav	05c3c5c8b6	Mechanical whitespace cleanup; parenthesize return values; other minor style nits. The #ifdefs in this file give me a headache...	2004-01-11 19:52:10 +00:00
Dag-Erling Smørgrav	e5aeaa0c67	Mechanical whitespace cleanup; parenthesize return values; other minor style nits.	2004-01-11 19:48:19 +00:00
Dag-Erling Smørgrav	012b5531f4	Mechanical whitespace cleanup + minor style nits.	2004-01-11 19:43:14 +00:00
Dag-Erling Smørgrav	c9de31f55f	Mechanical whitespace cleanup.	2004-01-11 19:39:14 +00:00
Alan Cox	0e88a71798	Remove long dead code, specifically, code related to munmapfd(). (See also vm/vm_mmap.c revision 1.173.)	2004-01-11 06:59:21 +00:00
Robert Watson	def055686c	When not creating a core dump due to resource limits specifying a maximum dump size of 0, return a size-related error, rather than returning success. Otherwise, waitpid() will incorrectly return a status indicating that a core dump was created. Note that the specific error doesn't actually matter, since it's lost. MFC after: 2 weeks PR: 60367 Submitted by: Valentin Nechayev <netch@netch.kiev.ua>	2004-01-11 02:28:06 +00:00
Jens Schweikhardt	85495c72ff	s/Muliple/Multiple Removed whitespace at EOL and EOF.	2004-01-10 18:34:01 +00:00
Dag-Erling Smørgrav	d41457da80	More unparenthesized return values.	2004-01-10 17:14:53 +00:00
Dag-Erling Smørgrav	b91a599717	Style: parenthesize return values.	2004-01-10 13:03:43 +00:00
Don Lewis	2b77864f1e	Add a somewhat redundant check on the len arguement to getsockaddr() to avoid relying on the minimum memory allocation size to avoid problems. The check is somewhat redundant because the consumers of the returned structure will check that sa_len is a protocol-specific larger size. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: nectar MFC after: 30 days	2004-01-10 08:28:54 +00:00
Olivier Houchard	5cded90454	Prevent a race condition between fork1() and whatever changes the pgrp by setting the new process' p_pgrp again before inserting it in the p_pglist. Without it we can get the new process to be inserted in a different p_pglist than the one p2->p_pgrp points to, and this is not something we want to happen. This is not a fix, merely a bandaid, but it will work until someone finds a better way to do it. Discussed with: jhb (a long time ago)	2004-01-09 23:42:36 +00:00
Robert Watson	07eacae0d2	Improve the expressiveness of ttyinfo (^T) when dealing with threads in slightly less usual states: If the thread is on a run queue, display "running" if the thread is actually running, otherwise, "runnable". If the thread is sleeping, and it's on a sleep queue, display the name of the queue, otherwise "unknown" -- previously, in this situation we would display "iowait". If the thread is waiting on a lock, display *lockname. If the thread is suspended, display "suspended" -- previously, in this situation we would display "iowait". If the thread is waiting for an interrupt, display "intrwait" -- previously, in this situation we would display "iowait". If the thread is in a state not handled by the above, display "unknown" -- previously, we would print "iowait". Among other things, this avoids displaying "iowait" when the foreground process turns out to be suspended waiting for a debugger to properly attach.	2004-01-08 22:49:23 +00:00
Robert Watson	047aa39b25	Drop the sigacts mutex around calls to stopevent() to avoid sleeping holding the mutex. Because the sigacts pointer can't change while the process is "live" (proc locking (x)), we know our pointer is still valid. In communication with: truckman Reviewed by: jhb	2004-01-08 22:44:54 +00:00
Alexander Kabaev	c969c60c60	Add pid to the info printed in lockmgr_printinfo. This makes VFS diagnostic messages slightly more useful.	2004-01-06 04:34:13 +00:00
Alexander Kabaev	580ddfa64b	More style fixes. Obtained from: bde	2004-01-05 23:40:46 +00:00
John Baldwin	eac097962f	- Allow mtx_trylock() to recurse on a recursive mutex. Attempts to recurse on a non-recursive mutex will fail but will not trigger any assertions. - Add an assertion to mtx_lock() that one never recurses on a non-recursive mutex. This is mostly useful for the non-WITNESS case. Requested by: deischen, julian, others (1)	2004-01-05 23:09:51 +00:00
Alexander Kabaev	b0fdf71656	style(9): Add empty line before first code line in functions with no local variables. Properly terminate comment sentences. Indent lines which are longer that 80 characters. Move v_addpollinfo closer to the rest of poll-related functions. Move DEBUG_VFS_LOCKS ifdefed block to the end of file. Obtained from: bde (partly)	2004-01-05 19:04:29 +00:00
Alexander Kabaev	3ff1b7c23f	Cosmetics: strip '\n' from a string passed to Debugger().	2004-01-04 03:42:20 +00:00
David Xu	a30ec4b99c	Make sigaltstack as per-threaded, because per-process sigaltstack state is useless for threaded programs, multiple threads can not share same stack. The alternative signal stack is private for thread, no lock is needed, the orignal P_ALTSTACK is now moved into td_pflags and renamed to TDP_ALTSTACK. For single thread or Linux clone() based threaded program, there is no semantic changed, because those programs only have one kernel thread in every process. Reviewed by: deischen, dfr	2004-01-03 02:02:26 +00:00
Nate Lawson	44bb5f52d3	Move the kernel power change printf under bootverbose since the power_profile script now duplicates the message via syslog.	2004-01-02 18:24:13 +00:00
Sam Leffler	4f9f9cf3a4	m_tag fixups in preparation for heavier use: o promote several m_tag_* routines to inline o add an m_tag_setup inline to set the fixed fields in a packet tag o add an m_tag_free method pointer to each mtag to support, for example, allocating tags from zones o have m_tag_find check if the tag list is not empty before calling m_tag_locate to search Reviewed by: brooks, silence from others	2004-01-02 17:27:39 +00:00
David Malone	70ad6c2190	Plug a leak of open files that happens when you exec a suid program with one of std{in,out,err} open. This helps with the file descriptor leaks reported on -current. This should probably be merged into 5.2. Reviewed by: ru Tested by: Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>	2003-12-28 19:27:14 +00:00
Bruce Evans	9efe7d9d83	v_vxproc was a bogus name for a thread (pointer).	2003-12-28 09:12:56 +00:00
Mike Silbersack	ddeb5b242e	Track three new sendfile-related statistics: - The number of times sendfile had to do disk I/O - The number of times sfbuf allocation failed - The number of times sfbuf allocation had to wait	2003-12-28 08:57:09 +00:00
Bruce Evans	d6c847f378	Fixed some style bugs (mainly, try to always use explicit comparisons with NULL when checking for null pointers).	2003-12-28 04:37:59 +00:00
Bruce Evans	ca46e90ef4	Fixed some disordering in revs.1.194 and 1,196. Moved the exceve() syscall function back to near the beginning of the file. Rev.1.194 moved it into the middle of auxiliary functions following kern_execve(). Moved the __mac_execve() syscall function up together with execve(). It was new in rev1.1.196 and perfectly misplaced after execve().	2003-12-28 04:18:13 +00:00
Mike Silbersack	69fba1650a	Fix the maxpipekva warning message so that it points to the correct sysctl, and shorten the message. Noticed by: bde	2003-12-28 01:19:58 +00:00
Alan Cox	34d2675761	Remove GIANT_REQUIRED from exec_unmap_first_page().	2003-12-27 19:40:03 +00:00
Mike Silbersack	5eda9873e9	Track current and peak sfbuf usage, export the values via sysctl.	2003-12-27 07:52:47 +00:00
John Baldwin	c55bbb6cb7	Create a separate kthread that executes sched_cpu() once a second. Because sched_cpu() locks an sx lock (allproc_lock) which can sleep if it fails to acquire the lock, it is not safe to execute this in a callout handler from softclock().	2003-12-26 17:07:29 +00:00
Alfred Perlstein	866e3b7e73	Put restrict back in, the compilation failure was my fault when I did a bad merge from the PR. Thanks to Bruce Evans for explaining.	2003-12-26 05:58:16 +00:00
Alfred Perlstein	4abb4ff34d	Add __restrict qualifiers to copyinfrom, copyinstrfrom, copystr, copyinstr, copyin and copyout.	2003-12-26 05:54:35 +00:00
David Malone	9322078275	In socket(2) we only need Giant around the call to socreate, so just grab it there.	2003-12-25 23:44:38 +00:00
David Malone	1c58509c25	Don't TAILQ_INIT kq_head twice, once is enough.	2003-12-25 23:42:36 +00:00
Mike Silbersack	8dee2f6746	Fix another 0 / NULL mixup.	2003-12-25 01:17:27 +00:00
Alfred Perlstein	6502da1307	We're not ready for restrict qualifiers here.	2003-12-24 19:09:45 +00:00
Alfred Perlstein	9f144cff85	Add restrict qualifiers. PR: 44394 Submitted by: Craig Rodrigues <rodrige@attbi.com>	2003-12-24 18:47:43 +00:00
Robert Watson	69546b2fbb	Document that when we are addressing an open()/close() race, the reason we call vn_close() manually rather than letting fdrop() take care of it is that we haven't yet hooked up the various 'struct file' fields.	2003-12-24 17:13:01 +00:00
Alfred Perlstein	1805ed0772	Introduce mp_maxcpus which can be used by libkvm utils to find out how many CPUs the system was compiled for. Export the variable via a sysctl node 'kern.smp.maxcpus' as well.	2003-12-23 13:54:16 +00:00
Peter Wemm	2c74309622	Regen - this should be essentially a NOP, except for rcsid changes.	2003-12-23 03:52:14 +00:00
Peter Wemm	eec525a435	Remove namespc column and attempt to un-fold some of the longer lines that now fit.	2003-12-23 03:51:36 +00:00
Peter Wemm	1a58b07149	Remove the namespace column from the syscalls tables. We don't actually use it, if we ever did. They have been been VERY poorly maintained for some time, possibly because they were a NOP. FWIW, This brings our table formats back closer to the other *BSD's.	2003-12-23 03:50:43 +00:00
Peter Wemm	9b68618df0	Add an additional field to the elf brandinfo structure to support quicker exec-time replacement of the elf interpreter on an emulation environment where an entire /compat/* tree isn't really warranted.	2003-12-23 02:42:39 +00:00
Peter Wemm	a89ec05e3e	Catch a few places where NULL (pointer) was used where 0 (integer) was expected.	2003-12-23 02:36:43 +00:00
Peter Wemm	55cdddc0d8	Don't use NULL (pointer) when we mean 0 (integer) for the number of ticks in msleep.	2003-12-23 02:28:42 +00:00
Jeff Roberson	249e0bea8f	- Make our transfer decisions based on load and not transferable load. A cpu could have been bogged down with non-transferable load and still not migrated a new thread to an idle cpu. This required some benchmarking and tuning to get right as the comment above it suggests.	2003-12-20 22:35:20 +00:00
Jeff Roberson	e7a976f415	- Enable ithread migration on x86. This is done to work around a bug in the IO APIC on Xeons that prevents round-robin interrupt assignment from working.	2003-12-20 20:36:19 +00:00
Alan Cox	96a7b42213	Remove a variable that has been initialized but otherwise unused since revision 1.315.	2003-12-20 19:46:21 +00:00
Jeff Roberson	670c524f08	- In kseq_transfer() return if smp has not been started. - In sched_add(), do the idle check prior to the transfer check so that we don't try to transfer load from an idle cpu. This fixes panics caused by IPIs on UP machines running SMP kernels. Reported/Debugged by: seanc	2003-12-20 14:03:14 +00:00
Jeff Roberson	9b5f6f623d	- Running interactive tasks with the minimum time-slice is fine for vi and sh, but not so great for mozilla, X, etc. Add a fixed define for the slice size granted to interactive KSEs.	2003-12-20 12:54:35 +00:00
Tim J. Robbins	f5925b7436	Reduce the overhead of semop() by using the kernel stack instead of malloc'd memory to store the operations array if it is small enough to fit.	2003-12-19 13:07:17 +00:00
John Baldwin	eb5b0e0565	Various style fixes. Submitted by: bde (mostly, if not all)	2003-12-17 21:13:04 +00:00
Jeff Roberson	958557e9c7	- In vget() if LK_NOWAIT is specified we should return EBUSY and not ENOENT. Submitted by: Stephan Uphoff <ups@stups.com>	2003-12-16 17:08:27 +00:00
Jeff Roberson	d85213669b	- When doing a forced unmount, VFS attempts to keep VCHR vnodes valid by reassigning their v_ops field to specfs, detaching from the mountpoint, etc. However, this is not sufficient. If we vclean() the vnode the pages owned by the vnode are lost, potentially while buffers reference them. Implement parts of vclean() seperately in vgonechrl() so that the pages and bufs associated with a device vnode are not destroyed while in use.	2003-12-16 17:05:05 +00:00
Bruce M Simpson	5406529771	style(9) pass and type fixups. Submitted by: bde	2003-12-16 14:13:47 +00:00
Bruce M Simpson	37621fd5d9	Push m_apply() and m_getptr() up into the colleciton of standard mbuf routines, and purge them from opencrypto. Reviewed by: sam Obtained from: NetBSD Sponsored by: spc.org	2003-12-15 21:49:41 +00:00
Jeff Roberson	86e1c22aa4	- Assign the ke_cpu field in kseq_notify() so that all of our callers do not have to do it. - Set the ke_runq to NULL in sched_add() before calling kseq_notify(). Otherwise we may panic in sched_add() if INVARIANTS is on.	2003-12-14 02:06:29 +00:00
Robert Watson	09a4a69c1d	Although sometimes to the uninitiated, it may seem like goup, KSEGOUP is actually spelt KSEGROUP. Go figure. Reported by: samy@kerneled.com	2003-12-12 21:25:56 +00:00
Jeff Roberson	cac77d0422	- Now that we have kseq groups, balance them seperately. - The new sched_balance_groups() function does intra-group balancing while sched_balance() balances the available groups. - Pick a random time between 0 ticks and hz * 2 ticks to restart each balancing process. Each balancer has its own timeout. - Pick a random place in the list of groups to start the search for lowest and highest group loads. This prevents us from prefering a group based on numeric position. - Use a nasty hack to stop us from preferring cpu 0. The problem is that softclock always runs on cpu 0, so it always has a little extra load. We ignore this load in the balancer for now. In the future softclock should run on a random cpu and these hacks can go away.	2003-12-12 07:33:51 +00:00
Jeff Roberson	2e227f0406	- Don't let the pctcpu rate limiter throttle us if we have recorded over SCHED_CPU_TICKS ticks. This was allowing processes to display (1/SCHED_CPU_TIME * 100) % more cpu than they had used.	2003-12-11 04:23:39 +00:00
Jeff Roberson	b11fdad0fc	- In sched_switch(), if a thread has been assigned, don't touch the runqueues or load. These things have already been taken care of in sched_bind() which should be the only place that we're switching in an assigned thread.	2003-12-11 04:00:49 +00:00
Jeff Roberson	80f86c9f88	- Add support for CPU groups to ule. All SMT cores on the same physical cpu are added to a group. - Don't place a cpu into the kseq_idle bitmask until all cpus in that group have idled. - Prefer idle groups over idle group members in the new kseq_transfer() function. In this way we will prefer to balance load across full cores rather than add further load a partial core. - Before a cpu goes idle, check the other group members for threads. Since SMT cpus may freely share threads, this is cheap. - SMT cores may be individually pinned and bound to now. This contrasts the old mechanism where binding or pinning would have allowed a thread to run on any available cpu. - Remove some unnecessary logic from sched_switch(). Priority propagation should be properly taken care of in sched_prio() now.	2003-12-11 03:57:10 +00:00
Peter Wemm	5be4b10c89	Regen	2003-12-10 22:18:54 +00:00
Peter Wemm	5352eb6bb1	Update file locations for syscall tables to copy to.	2003-12-10 22:08:37 +00:00
Marcel Moolenaar	ccb46feb8e	Write the thread pointer (val) in the kse mailbox (loc) before we set the new context in kse_switchin(2). This allows us to return an error to the calling context when the suword() fails.	2003-12-10 01:59:23 +00:00
John Baldwin	67ba867827	Adjust an assertion for the TDF_TSNOBLOCK race handling in turnstile_unpend(). A racing thread that does not have TDI_LOCK set may either be running on another CPU or it may be sitting on a run queue if it was preempted during the very small window in turnstile_wait() between unlocking the turnstile chain lock and locking sched_lock.	2003-12-09 21:14:31 +00:00
John Baldwin	da1d503b22	Assert that the we never give a thread a NULL turnstile when waking it up.	2003-12-09 21:09:54 +00:00
John Baldwin	6b6bd95ee5	Revert the previous race fix and replace it with a more general fix. The case of a turnstile having no threads is just one instance of the more general case where the thread we are examining has been partially awakened already in that it has been removed from the turnstile's blocked list but still has TDI_LOCK set. We detect that case by checking to see if the thread has already had a turnstile reassigned to it.	2003-12-09 21:09:04 +00:00
David Xu	a9a48d6862	Lock and unlock sched_lock when walking through thread list, current we insert kse upcall thread into thread list at mi_switch time, process lock is not enough.	2003-12-07 23:47:15 +00:00
Don Lewis	50105bcf1a	Pass MTX_DEF as the last argument to mtx_init() instead of 0. This is not a functional change. The code happened to work properly only because MTX_DEF is defined as 0.	2003-12-07 21:53:41 +00:00
Poul-Henning Kamp	377e7be416	Make the DIAGNOSTIC code which complains about long {call\|time}out(9) functions less noisy: We printf if a new function took longer than the previous record holder, or of the previous record holder took more than twice as long as the current record.	2003-12-07 20:03:28 +00:00
Marcel Moolenaar	cfa4b1e7b1	Regen due to kse_switchin(2).	2003-12-07 19:36:16 +00:00
Marcel Moolenaar	702b2a179c	Add kse_switchin(2). This syscall can be used by KSE implementations to have the kernel switch to a new thread, instead of doing it in userland. It is in fact needed on ia64 where syscall restarts do not return to userland first. It's completely handled inside the kernel. As such, any context created by the kernel as part of an upcall and caused by some syscall needs to be restored by the kernel.	2003-12-07 19:34:29 +00:00
Peter Wemm	a2640c9ba9	rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64). Be sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad things happen(TM). This is why beast.freebsd.org paniced with ULE. Reviewed by: jeff	2003-12-07 09:57:51 +00:00
Scott Long	774114995e	Re-arrange and consolidate some random debugging stuff	2003-12-07 05:04:49 +00:00
Alan Cox	bca62663ab	- Giant is no longer required by vm_thread_new().	2003-12-07 04:16:49 +00:00
Robert Watson	56d9e93207	Rename mac_create_cred() MAC Framework entry point to mac_copy_cred(), and the mpo_create_cred() MAC policy entry point to mpo_copy_cred_label(). This is more consistent with similar entry points for creation and label copying, as mac_create_cred() was called from crdup() as opposed to during process creation. For a number of policies, this removes the requirement for special handling when copying credential labels, and improves consistency. Approved by: re (scottl) Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-12-06 21:48:03 +00:00
John Baldwin	b6c71225a9	Fix all users of mp_maxid to use the same semantics, namely: 1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1. 2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid. Approved by: re (scottl) Tested on: i386, amd64, alpha	2003-12-03 14:57:26 +00:00
John Baldwin	45c1c90f6a	Export a few SMP related symbols in UP kernels as well. This is needed to aid other kernel code, especially code which can be in a module such as the acpi_cpu(4) driver, to work properly with both SMP and UP kernels. The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and the smp_rendezvous() function. This also means that CPU_ABSENT() is now always implemented the same on all kernels. Approved by: re (scottl)	2003-12-03 14:55:31 +00:00
David Greenman	186e347f2c	Fixed a bug in sendfile(2) where the sent data would be corrupted due to sendfile(2) being erroneously automatically restarted after a signal is delivered. Fixed by converting ERESTART to EINTR prior to exiting. Updated manual page to indicate the potential EINTR error, its cause and consequences. Approved by: re@freebsd.org	2003-12-01 22:12:50 +00:00
Ian Dowse	25cb5d7a6b	In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the forced unmount case. Otherwise, a file system that is referenced only by process fd_cdir/fd_rdir references to the file system root vnode will be successfully unmounted without the MNT_FORCE flag. The previous behaviour was not compatible with the unmount semantics required by amd(8), so file systems could be unexpectedly unmounted while there were still references to the file system root directory. Reported by: Erez Zadok <ezk@cs.sunysb.edu> Approved by: re (scottl)	2003-11-30 23:30:09 +00:00
Jeff Roberson	a6c6a93c89	- Don't forget to unlock the vnode interlock in the LK_NOWAIT case. Submitted by: Stephan Uphoff <ups@stups.com> Approved by: re (rwatson)	2003-11-30 22:09:58 +00:00
Alexander Kabaev	97c43a540a	Do not attempt to destroy NULL vfs options list. Approved by: re (scottl) Reported by: Christian Laursen <xi atborderworlds dot dk>	2003-11-23 17:13:48 +00:00
John Baldwin	798a45964d	- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid. cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is actually present and sets mp_ncpus and all_cpus. Splitting these up allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the CPU probing code to live in a module, for example, since modules sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is needed to re-enable the ACPI module on i386. - For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating its contents in a few places. Also, add a smp_cpu_enabled() function to avoid duplicating some code. There is room for further code reduction later since much of this code is also present in cpu_mp_start(). - All archs besides i386 still set mp_maxid to the same values they set it to before this change. i386 now sets mp_maxid to MAXCPU. Tested on: alpha, amd64, i386, ia64, sparc64 Approved by: re (scottl)	2003-11-21 22:23:26 +00:00
Mark Murray	4e3a7a14d9	Fix a major faux pas of mine. I was causing 2 very bad things to happen in interrupt context; 1) sleep locks, and 2) malloc/free calls. 1) is fixed by using spin locks instead. 2) is fixed by preallocating a FIFO (implemented with a STAILQ) and using elements from this FIFO instead. This turns out to be rather fast. OK'ed by: re (scottl) Thanks to: peter, jhb, rwatson, jake Apologies to: *	2003-11-20 15:35:48 +00:00
Mark Murray	3fed54aaaa	Hackfix to patch around a kernel panic I introduced. Real fix to follow. In the meanwhile, we are not harvesting interrupt entropy. Approved by: re (jhb)	2003-11-18 14:35:43 +00:00
Robert Watson	a557af222b	Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-18 00:39:07 +00:00
Robert Watson	64d19c2ea7	Add a sysctl, security.bsd.see_other_gids, similar in semantics to see_other_uids but with the logical conversion. This is based on (but not identical to) the patch submitted by Samy Al Bahra. Submitted by: Samy Al Bahra <samy@kerneled.com>	2003-11-17 20:20:53 +00:00
Peter Wemm	0d2a298904	Initial landing of SMP support for FreeBSD/amd64. - This is heavily derived from John Baldwin's apic/pci cleanup on i386. - I have completely rewritten or drastically cleaned up some other parts. (in particular, bootstrap) - This is still a WIP. It seems that there are some highly bogus bioses on nVidia nForce3-150 boards. I can't stress how broken these boards are. I have a workaround in mind, but right now the Asus SK8N is broken. The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed. - Most of my testing has been with SCHED_ULE. SCHED_4BSD works. - the apic and acpi components are 'standard'. - If you have an nVidia nForce3-150 board, you are stuck with 'device atpic' in addition, because they somehow managed to forget to connect the 8254 timer to the apic, even though its in the same silicon! ARGH! This directly violates the ACPI spec.	2003-11-17 08:58:16 +00:00
Jeff Roberson	fa9c971710	- Mark ksq_assigned as volatile so that when this code is used without sched_lock we can be sure that we'll pick up the new value.	2003-11-17 08:27:11 +00:00
Jeff Roberson	093c05e39d	- Remove long dead code. rslices hasn't been used in some time and neither has sched_pickcpu().	2003-11-17 08:24:14 +00:00
Peter Wemm	90e3387e54	Expand the argument to the ithread enable/disable helper hooks from an int to something big enough to hold a pointer. amd64 needs this.	2003-11-17 06:08:10 +00:00
Robert Watson	b0323ea3aa	Implement sockets support for __mac_get_fd() and __mac_set_fd() system calls, and prefer these calls over getsockopt()/setsockopt() for ABI reasons. When addressing UNIX domain sockets, these calls retrieve and modify the socket label, not the label of the rendezvous vnode. - Create mac_copy_socket_label() entry point based on mac_copy_pipe_label() entry point, intended to copy the socket label into temporary storage that doesn't require a socket lock to be held (currently Giant). - Implement mac_copy_socket_label() for various policies. - Expose socket label allocation, free, internalize, externalize entry points as non-static from mac_net.c. - Use mac_socket_label_set() in __mac_set_fd(). MAC-aware applications may now use mac_get_fd(), mac_set_fd(), and mac_get_peer() to retrieve and set various socket labels without directly invoking the getsockopt() interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 23:31:45 +00:00
Robert Watson	9e71dd0feb	Reduce gratuitous redundancy and length in function names: mac_setsockopt_label_set() -> mac_setsockopt_label() mac_getsockopt_label_get() -> mac_getsockopt_label() mac_getsockopt_peerlabel_get() -> mac_getsockopt_peerlabel() Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 18:25:20 +00:00
Alan Cox	e45db9b837	- Modify alpha's sf_buf implementation to use the direct virtual-to- physical mapping. - Move the sf_buf API to its own header file; make struct sf_buf's definition machine dependent. In this commit, we remove an unnecessary field from struct sf_buf on the alpha, amd64, and ia64. Ultimately, we may eliminate struct sf_buf on those architecures except as an opaque pointer that references a vm page.	2003-11-16 06:11:26 +00:00
Robert Watson	12cbb9dc56	When implementing getsockopt() for SO_LABEL and SO_PEERLABEL, make sure to sooptcopyin() the (struct mac) so that the MAC Framework knows which label types are being requested. This fixes process queries of socket labels. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 03:53:36 +00:00
Bruce Evans	416ab90e6b	Localized the cy driver's locking.	2003-11-16 00:55:54 +00:00
Poul-Henning Kamp	d87526cf43	Rename the debugging mutex "callout_no_sleep" to "dont_sleep_in_callout".	2003-11-15 18:33:54 +00:00
Tim J. Robbins	4d93f53e74	Initialize sequence numbers to 0 in seminit() instead of using whatever garbage happens to be in memory. This did not seem to cause any problems except making semaphore ID's unpredictable (and ugly in ipcs(1) output).	2003-11-15 11:56:53 +00:00
Poul-Henning Kamp	00cbe31bd8	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
Alan Cox	28c9416429	- Remove the remaining now unnecessary checks for the buf's b_object being NULL. See revision 1.421 for more detail. - Remove GIANT_REQUIRED from vfs_unbusy_pages(). Discussed with: jeff	2003-11-15 08:45:36 +00:00
Jeff Roberson	155b9987a3	- Introduce kseq_runq_{add,rem}() which are used to insert and remove kses from the run queues. Also, on SMP, we track the transferable count here. Threads are transferable only as long as they are on the run queue. - Previously, we adjusted our load balancing based on the transferable count minus the number of actual cpus. This was done to account for the threads which were likely to be running. All of this logic is simpler now that transferable accounts for only those threads which can actually be taken. Updated various places in sched_add() and kseq_balance() to account for this. - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're really doing. The load is accounted for seperately from the runq because the load is accounted for even as the thread is running. - Fix a bug in sched_class() where we weren't properly using the PRI_BASE() version of the kg_pri_class. - Add a large comment that describes the impact of a seemingly simple conditional in sched_add(). - Also in sched_add() check the transferable count and KSE_CAN_MIGRATE() prior to checking kseq_idle. This reduces the frequency of access for kseq_idle which is a shared resource.	2003-11-15 07:32:07 +00:00
Olivier Houchard	1a29c80648	Better fix than my previous commit: in exit1(), make sure the p_klist is empty after sending NOTE_EXIT. The process won't report fork() or execve() and won't be able to handle NOTE_SIGNAL knotes anyway. This fixes some race conditions with do_tdsignal() calling knote() while the process is exiting. Reported by: Stefan Farfeleder <stefan@fafoe.narf.at> MFC after: 1 week	2003-11-14 18:49:01 +00:00
Alexander Kabaev	3b39740df8	Fix a number of style(9) bugs introduced in r1.113 by me. Suggested by: bde	2003-11-14 05:27:41 +00:00
Jeff Roberson	808674fd0e	- regen.	2003-11-14 03:49:41 +00:00
Jeff Roberson	5c49a0566a	- Revision 1.156 marked ptrace() SMP safe. Unfortunately, alpha implements parts of ptrace using proc_rwmem(). proc_rwmem() requires giant, and giant must be acquired prior to the proc lock, so ptrace must require giant still.	2003-11-14 03:48:37 +00:00
Poul-Henning Kamp	555a5de270	Various minor details: Give the HZ/overflow check a 10% margin. Eliminate bogus newline. If timecounters have equal quality, prefer higher frequency. Some inspiration from: bde	2003-11-13 10:03:58 +00:00
John Baldwin	79a13d0182	- Close a race where a thread on another CPU could release a contested lock and empty its turnstile while the blocking threads still pointed to the turnstile. If the thread on the first CPU blocked on a lock owned by one of the threads blocked on the turnstile just woken up, then the first CPU could try to manipulate a bogus thread queue in the turnstile during priority propagation. - Update locking notes for ts_owner and always clear ts_owner, not just under INVARIANTS. Tested by: sam (1)	2003-11-12 23:48:42 +00:00
Kirk McKusick	48b0f4b67d	At the request of several developers, restore the DIAGNOSIC code deleted in 1.81. Increase the initial timeout limit to 2ms to eliminate spurious messages of excessive timeouts in the NFS client code. Requested by: Poul-Henning Kamp <phk@phk.freebsd.dk> Requested by: Mike Silbersack <silby@silby.com> Requested by: Sam Leffler <sam@errno.com>	2003-11-12 22:28:27 +00:00
Robert Watson	f0ab044241	Mark __mac_get_pid() as MPSAFE in the comment, as it runs without Giant and is also MPSAFE. Push Giant further down into __mac_get_fd() and __mac_set_fd(), grabbing it only for constrained regions dealing with VFS, and dropping it entirely for operations related to labeling of pipes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 22:19:15 +00:00
Peter Wemm	cde6302bf0	MNAMELEN is back to an int again after Kirk's statfs commit kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4) *** Error code 1	2003-11-12 17:09:12 +00:00
John Baldwin	861a7db56f	Fix a typo in a comment. Submitted by: das	2003-11-12 14:55:45 +00:00
Poul-Henning Kamp	1415a09d42	Replace B_PHYS conditional assignment to bio_offset with KASSERT check to see that the originating code already did it right.	2003-11-12 10:27:06 +00:00
Kirk McKusick	1977597b34	Update the five files derived from /sys/kern/syscalls.master after the additions made for the new statfs structure (version 1.157). These must be updated in a separate checkin after syscalls.master has been checked in so that they reflect its new CVS identity. As these are purely derived files, it is not clear to me why they are under CVS at all. I presume that it has something to do with having `make world' operate properly.	2003-11-12 08:09:19 +00:00
Kirk McKusick	fde81c7d8e	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
Robert Watson	eca8a663d4	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) do to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 03:14:31 +00:00
Alexander Kabaev	5c957adbf1	1. Consolidate mount struct allocation/destruction into a common code in vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. Mount struct has grown in coplexity recently and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include a vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson	2003-11-12 02:54:47 +00:00
John Baldwin	961a7b244d	Add an implementation of turnstiles and change the sleep mutex code to use turnstiles to implement blocking isntead of implementing a thread queue directly. These turnstiles are somewhat similar to those used in Solaris 7 as described in Solaris Internals but are also different. Turnstiles do not come out of a fixed-sized pool. Rather, each thread is assigned a turnstile when it is created that it frees when it is destroyed. When a thread blocks on a lock, it donates its turnstile to that lock to serve as queue of blocked threads. The queue associated with a given lock is found by a lookup in a simple hash table. The turnstile itself is protected by a lock associated with its entry in the hash table. This means that sched_lock is no longer needed to contest on a mutex. Instead, sched_lock is only used when manipulating run queues or thread priorities. Turnstiles also implement priority propagation inherently. Currently turnstiles only support mutexes. Eventually, however, turnstiles may grow two queue's to support a non-sleepable reader/writer lock implementation. For more details, see the comments in sys/turnstile.h and kern/subr_turnstile.c. The two primary advantages from the turnstile code include: 1) the size of struct mutex shrinks by four pointers as it no longer stores the thread queue linkages directly, and 2) less contention on sched_lock in SMP systems including the ability for multiple CPUs to contend on different locks simultaneously (not that this last detail is necessarily that much of a big win). Note that 1) means that this commit is a kernel ABI breaker, so don't mix old modules with a new kernel and vice versa. Tested on: i386 SMP, sparc64 SMP, alpha SMP	2003-11-11 22:07:29 +00:00

... 3 4 5 6 7 ...

7225 Commits