freebsd-nq

Author	SHA1	Message	Date
Poul-Henning Kamp	cf7742997a	Pass the file descriptor index down to vn_open. If the method vector was replaced and we got the "special return code" smile and trust that whatever happened below DTRT.	2003-07-27 20:09:13 +00:00
Poul-Henning Kamp	3ab6b09c53	Pass the fdidx argument from vn_open{_cred}() onto VOP_OPEN()	2003-07-27 20:05:36 +00:00
Poul-Henning Kamp	7c89f162bc	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.	2003-07-27 17:04:56 +00:00
Poul-Henning Kamp	1b6c609507	Call the new argument "fdidx" that is more precise than "fd".	2003-07-27 17:03:20 +00:00
David Malone	e41cbeba6d	Now that we can call kmem_malloc without Giant it should be safe to do mbuf allocation without Giant, so remove the GIANT_REQUIRED from mb_alloc in the M_TRYWAIT case.	2003-07-27 14:19:23 +00:00
Poul-Henning Kamp	a8d43c90af	Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file " since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/ For now pass -1 all over the place.	2003-07-26 07:32:23 +00:00
Scott Long	c43cad1ac1	Guard against MLEN growing larger than a uint8_t due to MSIZE growing to a value of 512 in LINT. This keeps gcc from complaining.	2003-07-26 07:23:24 +00:00
Alan Cox	18e8d4e79c	revision 1.51 of vm/uma_core.c modified uma_large_malloc() to acquire Giant when needed.	2003-07-25 22:26:43 +00:00
Mike Makonnen	a6ca48085c	The POSIX spec also requires that kern_sigtimedwait return EINVAL if tv_nsec of the timeout is less than zero.	2003-07-24 17:07:17 +00:00
Peter Wemm	80611144e4	Initialize 'blocked' to NULL. I think this was a real problem, but I am not sure about that. The lack of -Werror and the inline noise hid this for a while.	2003-07-23 20:29:13 +00:00
Poul-Henning Kamp	68f2d20b70	Revert stuff which accidentally ended up in the previous commit.	2003-07-22 10:36:36 +00:00
Poul-Henning Kamp	55d1d7034f	Don't attempt to inline large functions mb_alloc() and mb_free(), it more than doubles the text size of this file. GCC has wisely ignored us on this previously	2003-07-22 10:24:41 +00:00
David Xu	432b45de08	Always deliver synchronous signal to UTS for SA threads.	2003-07-21 00:26:52 +00:00
Mike Makonnen	6022ec6737	Turn a KASSERT back into an EINVAL return value. So, next time someone comes across it, it will turn into a core dump in userland instead of a kernel panic. I had also inverted the sense of the test, so Double pointy hat to: mtm	2003-07-19 11:32:48 +00:00
Mike Silbersack	f8bf8e397b	Three fixes: - Make m_prepend use m_gethdr instead of m_get where appropriate - Make m_copym use m_gethdr instead of m_get where appropriate - Add a call to m_fixhdr in m_defrag; m_defrag can't deal with corrupted pkthdr.len counts. MFC after: 3 days	2003-07-19 06:03:48 +00:00
Mike Makonnen	5c6edbec80	Remove a lock held across casuptr() that snuck in last commit.	2003-07-18 21:26:45 +00:00
Mike Makonnen	7df7f5c5ab	Move the decision on whether to unset the contested bit or not from lock to unlock time. Suggested by: jhb	2003-07-18 17:58:37 +00:00
Robert Drehmel	4e19fe1081	To avoid a kernel panic provoked by a NULL pointer dereference, do not clear the `sb_sel' member of the sockbuf structure while invalidating the receive sockbuf in sorflush(), called from soshutdown(). The panic was reproducible from user land by attaching a knote with EVFILT_READ filters to a socket, disabling further reads from it using shutdown(2), and then closing it. knote_remove() was called to remove all knotes from the socket file descriptor by detaching each using its associated filterops' detach callback function, sordetach() in this case, which tried to remove itself from the invalidated sockbuf's klist (sb_sel.si_note). PR: kern/54331	2003-07-17 23:49:10 +00:00
David Xu	3074d1b454	Fix sigwait to conform to POSIX. When a signal is being delivered to a process, first find a sigwait thread to deliver it to; POSIX's argument is that delivering a signal to a sigwait thread is faster than other ways. A signal in its wait set will cause sigwait to return the signal number; a signal not in its wait set but not blocked by the thread also causes sigwait to return, but sigwait returns EINTR. sigwait is a oneshot operation: only one signal can be delivered to its wait set, and when a signal is delivered to the sigwait thread, the thread's sigwait state is canceled.	2003-07-17 22:52:55 +00:00
David Xu	dd7da9aa28	o Refine kse_thr_interrupt to allow it to handle different commands. o Remove TDF_NOSIGPOST. o Add a member td_waitset to proc structure, it will be used for sigwait. Tested by: deischen	2003-07-17 22:45:33 +00:00
Robert Drehmel	e76bad968c	Correct six return statements which returned zero instead of an appropriate error number after a failure condition. In particular, three of the changed statements return ESRCH for a failed pfind(), and in three other places a non-zero return from p_cansee() will be passed back. Also noticed by: rwatson	2003-07-17 22:44:41 +00:00
Mike Makonnen	994599d782	Fix umtx locking, for libthr, in the kernel. 1. There was a race condition between a thread unlocking a umtx and the thread contesting it. If the unlocking thread won the race it may try to wake up a thread that was not yet in msleep(). The contesting thread would then go to sleep to await a wakeup that would never come. It's not possible to close the race by using a lock because calls to casuptr() may have to fault a page in from swap. Instead, the race was closed by introducing a flag that the unlocking thread will set when waking up a thread. The contesting thread will check for this flag before going to sleep. For now the flag is kept in td_flags, but it may be better to use some other member or create a new one because of the possible performance/contention issues of having to own sched_lock. Thanks to jhb for pointing me in the right direction on this one. 2. Once a umtx was contested all future locks and unlocks were happening in the kernel, regardless of whether it was contested or not. To prevent this from happening, when a thread locks a umtx it checks the queue for that umtx and unsets the contested bit if there are no other threads waiting on it. Again, this is slightly more complicated than it needs to be because we can't hold a lock across casuptr(). So, the thread has to check the queue again after unsetting the bit, and reset the contested bit if it finds that another thread has put itself on the queue in the meantime. 3. Remove the if... block for unlocking an uncontested umtx, and replace it with a KASSERT. The _only_ time a thread should be unlocking a umtx in the kernel is if it is contested.	2003-07-17 11:06:40 +00:00
Bosko Milekic	48719ca7c8	Change the style of the English used to print accounting enabled and disabled. This means no period at the end and changing "Process accounting <foo>" to "Accounting <foo>". Pointed out by: bde	2003-07-16 13:20:10 +00:00
Bosko Milekic	d2dbf5bc0b	Log process accounting activation/deactivation. Useful for some auditing purposes. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/54529	2003-07-16 03:59:50 +00:00
Don Lewis	6ff1481d5c	Rearrange the SYSINIT order to call lockmgr_init() earlier so that the runtime lockmgr initialization code in lockinit() can be eliminated. Reviewed by: jhb	2003-07-16 01:00:39 +00:00
David Xu	af161f2232	If initial thread is still a bound thread, don't change its signal mask.	2003-07-15 14:04:38 +00:00
Hartmut Brandt	7e9024cdd9	Add a facility for devices, specifically network interfaces, that require large to huge amounts of small or medium sized receive buffers. The problem with these situations is that they eat up the available DMA address space very quickly when using mbufs or even mbuf clusters. Additionally this facility provides a direct mapping between 32-bit integers and these buffers. This is needed for devices originally designed for 32-bit systems. Usually the virtual address of the buffer is used as a handle to find the buffer as soon as it is returned by the card. This does not work for 64-bit machines and hence this mapping is needed.	2003-07-15 08:59:38 +00:00
David Xu	4b7d5d84ee	Rename thread_siginfo to cpu_thread_siginfo	2003-07-15 04:26:26 +00:00
Jeffrey Hsu	330841c763	Rev 1.121 meant to pass the value 1 to soalloc() to indicate waitok. Reported by: arr	2003-07-14 20:39:22 +00:00
Don Lewis	857d9c60d0	Extend the mutex pool implementation to permit the creation and use of multiple mutex pools with different options and sizes. Mutex pools can be created with either the default sleep mutexes or with spin mutexes. A dynamically created mutex pool can now be destroyed if it is no longer needed. Create two pools by default, one that matches the existing pool that uses the MTX_NOWITNESS option that should be used for building higher level locks, and a new pool with witness checking enabled. Modify the users of the existing mutex pool to use the appropriate pool in the new implementation. Reviewed by: jhb	2003-07-13 01:22:21 +00:00
Robert Drehmel	baf731e6ed	Make the system call vector name of a process accessible to user land applications by introducing the KERN_PROC_SV_NAME sysctl node, which is searchable by PID.	2003-07-12 02:00:16 +00:00
David Xu	ffb2e92a98	If a thread is sending a signal to its process and the thread can handle the signal itself, it should get it without looking for other threads.	2003-07-11 13:42:23 +00:00
Mike Silbersack	347194c172	Add init_param3() to subr_param. This function is called immediately after the kernel map has been sized, and is the optimal place for the autosizing of memory allocations which occur within the kernel map to occur. Suggested by: bde	2003-07-11 00:01:03 +00:00
Peter Wemm	e95babf3a8	unifdef -DLAZY_SWITCH and start to tidy up the associated glue.	2003-07-10 01:02:59 +00:00
Mike Silbersack	ff56f15e26	A few minor changes: - Use atomic ops to update the bigpipe count - Make the bigpipe count sysctl readable - Remove a duplicate comparison in an if statement - Comment two SYSCTLs.	2003-07-09 21:59:48 +00:00
Mike Silbersack	41f16f8208	Pull in the entire kmem_map size calculation from kern_malloc, rather than the shortcircuited version I had been using, which only worked properly on i386 & amd64. Also, change an autoscale constant to account for the more correct kmem_map size. Problem noticed by: mux	2003-07-08 18:59:21 +00:00
Jeff Roberson	0c0a98b231	- When stealing a kse in kseq_move() ignore the current kseq's min nice value. We want to steal any thread, even one that is not given a slice on its current queue.	2003-07-08 06:19:40 +00:00
Mike Silbersack	289016f2d1	Put some concrete limits on pipe memory consumption: - Limit the total number of pipes so that we do not exhaust all vm objects in the kernel map. When this limit is reached, a ratelimited message will be printed to the console. - Put a soft limit on the amount of memory consumable by pipes. Once the limit has been reached, all new pipes will be limited to 4K in size, rather than the default of 16K. - Put a limit on the number of pages that may be used for high speed page flipping in order to reduce the amount of wired memory. Pipe writes that occur while this limit is exceeded will fall back to non-page flipping mode. The above values are auto-tuned in subr_param.c and are scaled to take into account both the size of physical memory and the size of the kernel map. These limits help to reduce the "kernel resources exhausted" panics that could be caused by opening a large number of pipes. (Pipes alone are no longer able to exhaust all resources, but other kernel memory hogs in league with pipes may still be able to do so.) PR: 53627 Ideas / comments from: hsu, tjr, dillon@apollo.backplane.com MFC after: 1 week	2003-07-08 04:02:31 +00:00
Jeff Roberson	0ec896fd28	- Clean up an unused variable. Submitted by: Steve Kargl <skg@routmask.apl.washington.edu>	2003-07-07 21:08:28 +00:00
Mike Makonnen	14b5ae1a98	Make the conditional, which decides what siglist to put a signal on, more concise and improve the comment. Submitted by: bde	2003-07-05 08:37:40 +00:00
Mike Makonnen	e55c35c433	I was so happy I found the semi-colon from hell that I didn't notice another typo in the same line. This typo makes libthr unusable, but its effects were counter-balanced by the extra semicolon, which made libthr remarkably usable for the past several months.	2003-07-04 23:28:42 +00:00
Jeff Roberson	749d01b011	- Parse the cpu topology map in sched_setup(). - Associate logical CPUs on the same physical core with the same kseq. - Adjust code that assumed there would only be one running thread in any kseq. - Wrap the HTT code with a ULE_HTT_EXPERIMENTAL ifdef. This is a start towards HyperThreading support but it isn't quite there yet.	2003-07-04 19:59:00 +00:00
Poul-Henning Kamp	1226914c17	Use the f_vnode field to tell which file descriptors have a vnode.	2003-07-04 12:20:27 +00:00
Mike Makonnen	1069e3a6f4	It's unfair how one extraneous semi-colon can cause so much grief.	2003-07-04 11:18:07 +00:00
Mike Makonnen	71cfaac0b0	style(9) o Remove double-spacing, and while I'm here add a couple of braces as well. Requested by: bde	2003-07-04 06:59:28 +00:00
Olivier Houchard	a10d5f02c8	In setpgrp(), don't assume a pgrp won't exist if the provided pgid is the same as the target process' pid, it may exist if the process forked before leaving the pgrp. This fixes a panic that happens when calling setpgid to make a process re-enter the pgrp with the same pgid as its pid if the pgrp still exists.	2003-07-04 02:21:28 +00:00
Mike Makonnen	8689793bfb	kse_thr_interrupt should target the thread, specifically. Requested by: davidxu	2003-07-04 01:41:32 +00:00
Mike Makonnen	c197abc49a	Signals sent specifically to a particular thread must be delivered to that thread, regardless of whether it has it masked or not. Previously, if the targeted thread had the signal masked, it would be put on the process's siglist. If another thread has the signal unmasked or unmasks it before the target, then the thread it was intended for would never receive it. This patch attempts to solve the problem by requiring callers of tdsignal() to say whether the signal is for the thread or for the process. If it is for the process, then normal processing occurs and any thread that has it unmasked can receive it. But if it is destined for a specific thread, it is put on that thread's pending list regardless of whether it is currently masked or not. The new behaviour still needs more work, though. If the signal is reposted for some reason it is always posted back to the thread that handled it because the information regarding the target of the signal has been lost by then. Reviewed by: jdp, jeff, bde (style)	2003-07-03 19:09:59 +00:00
John Baldwin	f7ee15901a	- Add comments about the maintenance of the per-thread list of contested locks held by each thread. - Fix a bug in the original BSD/OS code where a contested lock was not properly handed off from the old thread to the new thread when a contested lock with more than one blocked thread was transferred from one thread to another. - Don't use an atomic operation to write the MTX_CONTESTED value to mtx_lock in the aforementioned special case. The memory barriers and exclusion provided by sched_lock are sufficient. Spotted by: alc (2)	2003-07-02 16:14:09 +00:00
John Baldwin	6591b31040	Add a resource_disabled() helper function that returns true (non-zero) if a specified resource has been disabled via a non-zero 'disabled' hint and false otherwise.	2003-07-02 16:01:38 +00:00
Poul-Henning Kamp	d94e36521e	typo fix in comment.	2003-07-02 08:01:52 +00:00
David Xu	34178711be	Allow an SA process to unblock a thread blocked in a condition variable. Reviewed by: deischen	2003-07-02 01:19:15 +00:00
Ian Dowse	318f2fb4bf	Add a new mount flag MNT_BYFSID that can be used to unmount a file system by specifying the file system ID instead of a path. Use this by default in umount(8). This avoids the need to perform any vnode operations to look up the mount point, so it makes it possible to unmount a file system whose root vnode cannot be looked up (e.g. due to a dead NFS server, or a file system that has become detached from the hierarchy because an underlying file system was unmounted). It also provides an unambiguous way to specify which file system is to be unmounted. Since the ability to unmount using a path name is retained only for compatibility, that case now just uses a simple string comparison of the supplied path against f_mntonname of each mounted file system. Discussed on: freebsd-arch mdoc help from: ru	2003-07-01 17:40:23 +00:00
Scott Long	79501b66a7	Make swi_vm be INTR_MPSAFE. On all platforms, it is only used to activate busdma_swi(). Now that busdma_swi() uses driver-provided locking, this should be safe.	2003-07-01 16:00:38 +00:00
David Xu	df9c6cda37	Fix typo.	2003-06-30 10:04:04 +00:00
Marcel Moolenaar	4e4422d4d4	Don't use fuword() and suword() on struct members of type int. This happens to work on 32-bit platforms as sizeof(long)=sizeof(int), but wreaks all kinds of havoc (garbage reads, corrupting writes and misaligned loads/stores) on 64-bit architectures. The fix for now is to use fuword32() and suword32() and change the type of the applicable int fields to int32. This is to make it explicit that we depend on these fields being 32-bit. We may want to revisit this later. Reviewed by: deischen	2003-06-28 19:45:15 +00:00
Jeff Roberson	7a20304f84	- Don't migrate to stopped cpus.	2003-06-28 09:09:33 +00:00
David Xu	9dde3bc999	o Change kse_thr_interrupt to allow sending a signal to a specified thread or unblocking a thread in the kernel, and allow the UTS to specify whether the syscall should be restarted. o Add the ability for the UTS to monitor signals arriving at and being removed from the process; the flag PS_SIGEVENT is used to indicate these events. o Add a KMF_WAITSIGEVENT KSE mailbox flag; the UTS calls kse_release with this flag set to wait for the above signal events. o For SA-based threads, the kernel masks all signals in the thread's signal mask and lets the UTS use kse_thr_interrupt to interrupt a thread and install a signal frame in userland for the thread. o Add a tm_syncsig in the thread mailbox; when a hardware trap occurs, it is used to deliver the synchronous signal to userland and an upcall is scheduled, so the UTS can process the synchronous signal for the thread. Reviewed by: julian (mentor)	2003-06-28 08:29:05 +00:00
Jeff Roberson	86f8ae9663	- If smp is not started yet don't try to load balance or we'll put threads on cpus that aren't running yet.	2003-06-28 08:24:42 +00:00
David Xu	418228df24	Fix POSIX compatibility bug for sigwaitinfo and sigtimedwait. POSIX says the siginfo pointer parameter can be NULL and, if the function succeeds, it should return the signal number rather than zero. The waitset passed in must be negated before it can be used as the thread signal mask.	2003-06-28 08:03:28 +00:00
Jeff Roberson	a91172ade1	- Throttle the inherited sleep and run time in sched_fork_kseg(). This allows us to learn the behavior of a thread much more quickly after it starts up.	2003-06-28 06:19:56 +00:00
Jeff Roberson	e493a5d90c	- Adjust the default maximum slice value to ~140ms. This has improved the nice distribution without significantly impacting interactive response. As a side effect it should also allow batch processes to run for a slightly longer period which will positively impact their performance.	2003-06-28 06:04:47 +00:00
Peter Wemm	eabd19726f	Tidy up leftover lazy_switch instrumentation that is no longer needed. This cleans up some #ifdef hell.	2003-06-27 22:39:14 +00:00
Sean Kelly	6cda41555b	Fix this to build on alpha. Build test successful. Suggested fix from: tjr	2003-06-27 08:35:05 +00:00
Sean Kelly	370c3cb57c	- Add a software watchdog facility. This commit has two pieces. One half is the watchdog kernel code which lives primarily in hardclock() in sys/kern/kern_clock.c. The other half is a userland daemon which, when run, will keep the watchdog from firing while the userland is intact and functioning. Approved by: jeff (mentor)	2003-06-26 09:50:52 +00:00
Warner Losh	4f2073fb4c	Fix leap second processing by the kernel time keeping routines. Before, we would add/subtract the leap second when the system had been up for an even multiple of days, rather than at the end of the day, as a leap second is defined (at least wrt ntp). We do this by calculating the notion of UTC earlier in the loop, and passing that to get it adjusted. Any adjustments that ntp_update_second makes to this time are then transferred to boot time. We can't pass it either the boot time or the uptime because their sum is what determines when a leap second is needed. This code adds an extra assignment and two extra compares in the typical case, which is as cheap as I could make it. I have confirmed with this code the kernel time does the correct thing for both positive and negative leap seconds. Since the ntp interface doesn't allow for +2 or -2, those cases can't be tested (and the folks in the know here say there will never be a +2s or -2s leap event, but rather two +1s or -1s leap events). There will very likely be no leap seconds for a while, given how the earth is speeding up and slowing down, so there will be plenty of time for this fix to propagate. UT1-UTC is currently at "about -0.4s" and decrementing by .1s every 8 months or so. 6 * 8 is 48 months, or 4 years. -stable has different code, but a similar bug that was introduced about the time of the last leap second, which is why nobody has noticed until now. MFC After: 3 weeks Reviewed by: phk "Furthermore, leap seconds must die." -- Cato the Elder	2003-06-25 21:23:51 +00:00
Warner Losh	eac3c62b51	During a positive leap second, the tai_time offset should be incremented at the start of the leap second, not after the leap second has been inserted. This is because at the start of the leap second, we set the time back one second. This setting back one second is the moment that the offset changes. The old code set it back after the leap second, but that's one second too late. The negative leap second case is handled correctly. Reviewed by: phk	2003-06-25 20:56:40 +00:00
Olivier Houchard	7f3bfd6651	At this point targp will always be NULL, so remove the useless if.	2003-06-25 13:28:32 +00:00
Warner Losh	4e82e5f6f1	Use UTC rather than GMT to describe time scale. latter is obsolete.	2003-06-23 20:14:08 +00:00
Robert Watson	f51e58036e	Redesign the externalization APIs from the MAC Framework to the MAC policy modules to improve robustness against C string bugs and vulnerabilities. Following these revisions, all string construction of labels for export to userspace (or elsewhere) is performed using the sbuf API, which prevents the consumer from having to perform laborious and intricate pointer and buffer checks. This substantially simplifies the externalization logic, both at the MAC Framework level, and in individual policies; this becomes especially useful when policies export more complex label data, such as with compartments in Biba and MLS. Bundled in here are some other minor fixes associated with externalization: including avoiding malloc while holding the process mutex in mac_lomac, and hence avoid a failure mode when printing labels during a downgrade operation due to the removal of the M_NOWAIT case. This has been running in the MAC development tree for about three weeks without problems. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-23 01:26:34 +00:00
Robert Watson	6b42f0a2eb	Prefer the vop_rmextattr() vnode operation for removing extended attributes from objects over vop_setextattr() with a NULL uio; if the file system doesn't support the vop_rmextattr() method, fall back to the vop_setextattr() method. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-22 23:03:07 +00:00
Robert Watson	77533ed2aa	Expose vop_rmextattr as an explicit operation at the vnode operation interface, rather than relying on a NULL uio for the deletion operation. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-22 22:45:24 +00:00
Robert Watson	4b090e41ff	Add an explicit credential argument to alq_open() to allow the caller to specify what credential to use when authorizing vn_open() and later write operations, rather than curthread->td_ucred. When writing KTR traces to an ALQ, specify the credential of the thread generating the sysctl request. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-22 22:28:56 +00:00
Poul-Henning Kamp	3b6d965263	Add a f_vnode field to struct file. Several of the subtypes have an associated vnode which is used for stuff like the f*() functions. By giving the vnode a separate field, a number of checks for the specific subtype can be replaced simply with a check for f_vnode != NULL, and we can later free f_data up to subtype specific use. At this point in time, f_data still points to the vnode, so any code I might have overlooked will still work.	2003-06-22 08:41:43 +00:00
Ian Dowse	adef9265ef	When DDB is active, always send printf() output directly to the console, even if there is a TIOCCONS console tty. We were already doing this after a panic, but it's also useful when entering DDB for some other reason too.	2003-06-22 03:20:24 +00:00
Ian Dowse	d29bf12ff8	Use a new message buffer `consmsgbuf' to forward messages to a TIOCCONS console (e.g. xconsole) via a timeout routine instead of calling into the tty code directly from printf(). This fixes a number of cases where calling printf() at the wrong time (such as with locks held) would cause a panic if xconsole is running. The TIOCCONS message buffer is 8k in size by default, but this can be changed with the kern.consmsgbuf_size sysctl. By default, messages are checked for 5 times per second. The timer runs and the buffer memory remains allocated only at times when a TIOCCONS console is active. Discussed on: freebsd-arch	2003-06-22 02:54:33 +00:00
Ian Dowse	4784a46912	Replace the code for reading and writing the kernel message buffer with a new implementation that has a mostly reentrant "addchar" routine, supports multiple message buffers in the kernel, and hides the implementation details from callers. The new code uses a kind of sequence number to represent the current read and write positions in the buffer. This approach (suggested mainly by bde) permits the read and write pointers to be maintained separately, which reduces the number of atomic operations that are required. The "mostly reentrant" above refers to the way that while it is now always safe to have any number of concurrent writers, readers could see the message buffer after a writer has advanced the pointers but before it has written the new character. Discussed on: freebsd-arch	2003-06-22 02:18:31 +00:00
Jeff Roberson	1a7a9d0ec2	- lticks was erroneously being updated in sched_pctcpu(). This was causing us to skip the pctcpu_update() call which led to inaccurate cpu usage statistics for processes that didn't run often.	2003-06-21 02:31:49 +00:00
Jeff Roberson	665cb285a8	- Don't allow nice to have such a large effect on priority. This was causing poor interactive performance while unnice processes were running. The new scheme still allows nice to have an effect on priority but it is not as dramatic as the effect of the interactivity score.	2003-06-21 02:22:47 +00:00
Bosko Milekic	b2b417bb41	Fix a divide-by-zero on kern.log_wakeups_per_second tunable. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/53557	2003-06-20 22:18:38 +00:00
Stefan Eßer	c2ef4dd48a	Add comment about **vpp being special-cased in vnode_if.awk (1.38)	2003-06-20 12:24:06 +00:00
David Xu	ab78d4d641	cpu_set_upcall_kse needs to access userspace, release schedule lock before calling it for bound thread. To avoid this problem, change thread_schedule_upcall to not put new thread on run queue, let caller do it, so we can tweak the new thread before setting it to run. Reported by: pho	2003-06-20 09:12:12 +00:00
Poul-Henning Kamp	166400b7e6	Don't put callout_lock under #ifdef DIAGNOSTIC despite the fact that it works anyway.	2003-06-20 08:39:04 +00:00
Poul-Henning Kamp	568733688b	Initialize b_saveaddr when we hand out buffers	2003-06-20 08:26:38 +00:00
Poul-Henning Kamp	ce6912c420	Crude but efficient: #ifdef DIAGNOSTIC hold a mutex while calling callout's so that we hear about it if they sleep.	2003-06-20 08:07:15 +00:00
Poul-Henning Kamp	eaaca5deee	Don't (re)initialize f_gcflag to zero. Move initialization of DTYPE_VNODE specific field f_seqcount into the DTYPE_VNODE specific code.	2003-06-20 08:02:30 +00:00
David Xu	062cf543fc	When a STOP signal is being sent to a process, it is possible all threads in the process have already masked the signal, so job control is delayed. But later a thread unmasking the STOP signal should enable job control, so in issignal(), scanning all threads in process to see if we can direct suspend some of them, not just suspend current thread.	2003-06-20 03:36:45 +00:00
David Xu	8b56079e2b	Fix typo. td should be td0.	2003-06-20 01:56:28 +00:00
Alfred Perlstein	bab88630ba	Unlock the struct file lock before acquiring Giant, otherwise we can deadlock because of lock order reversals. This was not caught because Witness ignores pool mutexes right now. Diagnosis and help: truckman Noticed by: pho	2003-06-19 18:13:07 +00:00
Mike Silbersack	b083ea5114	Add a ratelimited message of the form "maxproc limit exceeded by uid %i, please see tuning(7) and login.conf(5)." Which will be triggered whenever a user hits his/her maxproc limit or the systemwide maxproc limit is reached. MFC after: 1 week	2003-06-19 05:57:25 +00:00
Don Lewis	6084b6c9d5	FILE_LOCK() uses a pool mutex, as does the vnode v_vnlock. Since pool mutexes are supposed to only be used as leaf mutexes, and what appear to be separate pool mutexes could be aliased together, it is a bad idea for a thread to attempt to hold two pool mutexes at the same time. Slightly rearrange the code in kern_open() so that FILE_UNLOCK() is called before calling VOP_GETVOBJECT(), which will grab the v_vnlock mutex.	2003-06-19 04:10:56 +00:00
Mike Silbersack	4d7dfc31b8	Add a rate limited message reporting when kern.maxfiles is exceeded, reporting who did it. Also, fix a style bug introduced in the previous change. MFC after: 1 week	2003-06-19 04:07:12 +00:00
Don Lewis	8d5f9131fc	VOP_GETVOBJECT() wants to be called with the vnode lock held.	2003-06-19 03:55:01 +00:00
Poul-Henning Kamp	2db4b023bb	Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that rather than assume that only DTYPE_VNODE is seekable.	2003-06-18 19:53:59 +00:00
Mike Silbersack	438f085b2f	Reserve the last 5% of file descriptors for root use. This should allow systems to fail more gracefully when a file descriptor exhaustion situation occurs. Original patch by: David G. Andersen <dga@lcs.mit.edu> PR: 45353 MFC after: 1 week	2003-06-18 18:57:58 +00:00
Poul-Henning Kamp	7c2d2efd58	Initialize struct fileops with C99 sparse initialization.	2003-06-18 18:16:40 +00:00
Jeff Roberson	d07ac847ef	- Use a more robust mechanism for determining whether or not a kse is on a kseq.	2003-06-17 19:49:18 +00:00
Scott Long	04d2f20f6b	Drop the proc lock around SYSCTL_OUT in the no-threads case. Submitted by: truckman	2003-06-17 19:14:00 +00:00
Jeff Roberson	7cd0f83355	- Temporarily patch a problem where the interact score could be negative because the run time exceeds the largest value a signed int can hold. The real solution involves calculating how far we are over the limit. To quickly solve this problem we loop removing 1/5th of the current value until it falls below the limit. The common case requires no passes.	2003-06-17 10:21:34 +00:00
Jeff Roberson	4b60e3242e	- Add a new function "sched_interact_update()" that scales back the sleep and run time. - Scale the sleep and run time back via sched_interact_update() in more places. This is to keep the statistic more accurate. - Charge a parent one tick for forking a child. - Add only the run time and not the sleep time to the parents kg when a thread exits. This allows us to give a penalty for having an expensive thread exit but does not give a bonus for having an interactive thread exit. - Change the SLP_RUN_THROTTLE to limit us to 4/5th and not 1/2. - Change the SLP_RUN_MAX to two seconds. This keeps bursty interactive applications like mozilla and openoffice in the interactive range even through expensive tasks. - Recalculate the slice after every sleep. This ensures that once a task has been marked interactive it only has a slice of 1 at the risk of giving tasks that sleep for a very brief period a longer time slice.	2003-06-17 06:39:51 +00:00
Mike Silbersack	51710a4597	Hide the m_defrag* statistics under MBUF_STRESS_TEST, there seems to be no need to see them in the general case (and they aren't smp-safe anyway.) Suggested by: hmp MFC after: 1 week	2003-06-17 02:34:40 +00:00
David Xu	4184d79115	Forgot to commit code to disable creating a bound thread in the same group again, except in the first kse_create syscall. Noticed by: julian	2003-06-16 23:46:41 +00:00
David Xu	075102cc4e	Reset ncpus to 1 for bound thread group since there is only one thread in such group. Change message text from kse_rel to kserel, it is better displayed in top.	2003-06-16 13:14:52 +00:00
Poul-Henning Kamp	e725c18c3a	Get rid of the b_spc specialty field in struct buf by using an already available caller private field.	2003-06-16 07:18:39 +00:00
Poul-Henning Kamp	2a0f8aeb52	I have not had any reports of trouble for a long time, so remove the gentle versions of the vop_strategy()/vop_specstrategy() mismatch methods and use vop_panic() instead.	2003-06-15 19:49:14 +00:00
Robert Watson	2bceb0f2b2	Various cr*() calls believed to be MPSAFE, since the uidinfo code is locked down.	2003-06-15 15:57:42 +00:00
David Xu	cd4f6ebb13	1. Add code to support bound thread. When blocked, a bound thread never schedules an upcall. Signal delivery to a bound thread is the same as for a non-threaded process. This is intended to be used by libpthread to implement PTHREAD_SCOPE_SYSTEM thread. 2. Simplify kse_release() a bit, remove sleep loop.	2003-06-15 12:51:26 +00:00
Ian Dowse	4f1b457770	Don't overwrite the static panicstr buffer for secondary and further panics. Before revision 1.38, we used to just point panicstr at the format string if panicstr was NULL, but since we now use a static buffer for the formatted panic message, we have to be careful to only write to it during the first panic. Pointed out by: bde	2003-06-15 11:43:00 +00:00
Jeff Roberson	3c12473229	- Increase the ksegrp's cpu time history buffer to 250ms. - Decrease the history buffer divisor to 2 so that we remember more of the old behavior.	2003-06-15 04:14:25 +00:00
David Xu	1d5a24bec6	1. Migrate TDF_UPCALLING from td_flags to td_pflags. 2. Add a flag TDF_SA, it will be used to distinguish SA based thread from bound thread.	2003-06-15 03:18:58 +00:00
Jeff Roberson	b41f3d22cc	- Cap the growth of sleep and run time in sched_exit_kse().	2003-06-15 02:52:29 +00:00
Jeff Roberson	210491d3d9	- Fix the maximum slice value. I accidentally checked in a value of '2' which meant no process would run for longer than 20ms. - Slightly redo the interactivity scorer. It follows the same algorithm but in a slightly more correct way. Previously values above half were incorrect. - Lower the interactivity threshold to 20. It seems that in testing non-interactive tasks are hardly ever near there and expensive interactive tasks can sometimes surpass it. This area needs more testing. - Remove an unnecessary KTR. - Fix a case where an idle thread that had an elevated priority due to priority prop. would be placed back on the idle queue. - Delay setting NEEDRESCHED until userret() for threads that had their priority elevated while in kernel. This gives us the same context switch optimization as SCHED_4BSD. - Limit the child's slice to 1 in sched_fork_kse() so we detect its behavior more quickly. - Inherit some of the run/slp time from the child in sched_exit_ksegrp(). - Redo some of the priority comparisons so they are more clear. - Throttle the frequency of sched_pctcpu_update() so that rounding errors do not make it invalid.	2003-06-15 02:18:29 +00:00
David Xu	0e2a4d3aeb	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.	2003-06-15 00:31:24 +00:00
Alan Cox	49a2507bd1	Migrate the thread stack management functions from the machine-dependent to the machine-independent parts of the VM. At the same time, this introduces vm object locking for the non-i386 platforms. Two details: 1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The different machine-dependent implementations used various combinations of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard pages, set KSTACK_GUARD_PAGES to 0. 2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In 5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed to vm_page_alloc() or vm_page_grab().	2003-06-14 23:23:55 +00:00
Alan Cox	89f4fca265	Move the _new_altkstack() and _dispose_altkstack() functions out of the various pmap implementations into the machine-independent vm. They were all identical.	2003-06-14 06:20:25 +00:00
Maxime Henrion	fca737117a	Style(9).	2003-06-13 19:39:21 +00:00
Dag-Erling Smørgrav	c2935410f6	Make the VFS cache use zones instead of malloc(9). This results in a small but noticeable increase in performance for name lookup operations. The code uses two zones, one for short names (less than 32 characters) and one for long names (up to NAME_MAX). Since most file names are fairly short, this saves a considerable amount of space that would otherwise be wasted if we always allocated NAME_MAX bytes. The cutoff value of 32 characters was picked arbitrarily and may benefit from some tweaking; it could also be made into a tunable. Submitted by: hmp	2003-06-13 08:46:13 +00:00
Alan Cox	8630c1173e	Add vm object locking to various pagers' "get pages" methods, i386 stack management functions, and a u area management function.	2003-06-13 03:02:28 +00:00
Poul-Henning Kamp	7652131bee	Initialize struct vfsops C99-sparsely. Submitted by: hmp Reviewed by: phk	2003-06-12 20:48:38 +00:00
Dag-Erling Smørgrav	633f506489	Document some sysctl variables. Submitted by: hmp	2003-06-12 19:46:51 +00:00
Scott Long	30c6f34e00	Add support to sysctl_kern_proc to return all threads in a proc, not just the first one. The old behaviour can be switched by specifying KERN_PROC_PROC. Submitted by: julian, tweaks and added functionality by myself	2003-06-12 16:41:50 +00:00
Alan Cox	c10c537816	Finish the vm object locking in sendfile(2). More generally, the vm locking in sendfile(2) is complete.	2003-06-12 05:52:09 +00:00
Alan Cox	2ab3670aad	Lock the vm object when removing a page.	2003-06-11 21:23:04 +00:00
Alan Cox	f717a9d063	Lock the vm object when removing a page.	2003-06-11 16:37:33 +00:00
Dag-Erling Smørgrav	ffe92432e3	Whitespace cleanup.	2003-06-11 07:35:56 +00:00
Alan Cox	c40f7377a4	Add vm object locking.	2003-06-11 06:43:48 +00:00
David E. O'Brien	f4636c5959	Use __FBSDID().	2003-06-11 06:34:30 +00:00
Paul Saab	1795d0cdec	Don't overflow when calculating vm_kmem_size. This fixes kmem_map too small panics on PAE machines which have odd > 4GB sizes (4.5 gig would render 20MB of KVA for kmem_map instead of 200MB). Submitted by: John Cagle <john.cagle@hp.com>, jeff Reviewed by: jeff, peter, scottl, lots of USENIX folks	2003-06-11 05:18:59 +00:00
David Xu	7677ce18b8	Fix error in my last commit. Correctly maintain p_maxthrwaits and unlock sched_lock.	2003-06-11 01:08:33 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
David Xu	36407bec4f	If there are signals delivered to the current thread, break out of the loop; userret() will be called again by ast() and thread_userret() will be called again by userret(). Reported by: tegge	2003-06-10 02:21:32 +00:00
Maxime Henrion	0ca5dc1c3e	style(9).	2003-06-09 21:57:48 +00:00
John Baldwin	5499ea019d	Wait for the real interval timer callout handler to finish executing if it is currently executing when we try to remove it in exit1(). Without this, it was possible for the callout to bogusly rearm itself and eventually refire after the process had been free'd resulting in a panic. PR: kern/51964 Reported by: Jilles Tjoelker <jilles@stack.nl> Reviewed by: tegge, bde	2003-06-09 21:46:22 +00:00
John Baldwin	8bccf7034e	The issetugid() function is MPSAFE.	2003-06-09 21:34:19 +00:00
Alan Cox	06fa71cdcc	Update the vm object and page locking in exec_map_first_page(). Mark the one still anticipated change with XXX. Otherwise, this function is done.	2003-06-09 19:37:14 +00:00
Alan Cox	4412dc5468	- Add vm object locking to vm_pgmoveco(). - Add a comment to vm_pgmoveco() describing what remains to be done for vm locking.	2003-06-09 19:23:03 +00:00
Juli Mallett	c02d762181	Attempt to fix Alpha build by renaming ident[] to kern_ident[].	2003-06-09 18:19:33 +00:00
John Baldwin	5e26dcb560	- Add a td_pflags field to struct thread for private flags accessed only by curthread. Unlike td_flags, this field does not need any locking. - Replace the td_inktr and td_inktrace variables with equivalent private thread flags. - Move TDF_OLDMASK over to the private flags field so it no longer requires sched_lock.	2003-06-09 17:38:32 +00:00
Juli Mallett	da1186f2c7	Expose kern.ident by way of OID_AUTO. Requested by: phk	2003-06-09 10:54:23 +00:00
Jeff Roberson	356500a306	- Add a simple CPU load balancing algorithm. This works by executing once a second and equalizing the load between the two most imbalanced CPUs. This is intended to clear up long term load imbalances that would not be handled by the 'pull' method in sched_choose(). - Pull out some bits of sched_choose() into a kseq_move() function that moves an arbitrary thread from one kseq to another.	2003-06-09 00:39:09 +00:00
Alan Cox	fd0cc9a862	Lock the vm object when performing vm_page_grab().	2003-06-08 07:14:30 +00:00
Jeff Roberson	b90816f188	- When a new thread is added to a kseq the load is incremented prior to adding it to the nice tables. Therefore, in kseq_add_nice, we should keep in mind that the load will be 1 if we are the only thread, and not 0. - Assert that the sched lock is held in all the appropriate places. - Increase the scope of the sched lock in sched_pctcpu_update(). - Hold the sched lock in sched_runnable(). It is not held by the caller.	2003-06-08 00:47:33 +00:00
Poul-Henning Kamp	84c080a85e	Improve the root-dev prompt facility for printing devices which could possibly be a root filesystem.	2003-06-07 15:46:53 +00:00
David Xu	b0bd5f38a6	thread_signal_add now is called with ps_mtx held, unlock it before calling copyin.	2003-06-06 02:17:38 +00:00
Robert Watson	777621799b	If a system call comes in requesting to retrieve an attribute named "", temporarily map it to a call to extattr_list_vp() to provide compatibility for older applications using the "" API to retrieve EA lists. Use VOP_LISTEXTATTR() to support extattr_list_vp() rather than VOP_GETEXTATTR(..., "", ...). Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-05 05:55:34 +00:00
Robert Watson	a6f1342ff6	Add vop_listextattr(), similar to vop_getextattr() but without a specific attribute name. It will have the same semantics as the older vop_getextattr() "retrieve the names" hack, returning a buffer with ASCII nul-separated names. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-05 05:53:35 +00:00
Marcel Moolenaar	11e0f8e16d	Change the second (and last) argument of cpu_set_upcall(). Previously we were passing in a void* representing the PCB of the parent thread. Now we pass a pointer to the parent thread itself. The prime reason for this change is to allow cpu_set_upcall() to copy (parts of) the trapframe instead of having it done in MI code in each caller of cpu_set_upcall(). Copying the trapframe cannot always be done with a simple bcopy() or may not always be optimal that way. On ia64 specifically the trapframe contains information that is specific to an entry into the kernel and can only be used by the corresponding exit from the kernel. A trapframe copied verbatim from another frame is in most cases useless without some additional normalization. Note that this change removes the assignment to td->td_frame in some implementations of cpu_set_upcall(). The assignment is redundant. A previous call to cpu_thread_setup() already did the exact same assignment. An added benefit of removing the redundant assignment is that we can now change td_pcb without nasty side-effects. This change officially marks the ability on ia64 for 1:1 threading. Not tested on: amd64, powerpc Compile & boot tested on: alpha, sparc64 Functionally tested on: i386, ia64	2003-06-04 21:13:21 +00:00
Poul-Henning Kamp	22ee8c4f50	Add instrumentation which tells us how much work softclock() does per invocation.	2003-06-04 05:25:58 +00:00
Robert Watson	8bebbb1a32	Implementations of extattr_list_fd(), extattr_list_file(), and extattr_list_link() system calls, which return a list of extended attributes defined for a vnode referenced by a file descriptor or path name. Currently, we just invoke VOP_GETEXTATTR() since it will convert a request for an empty name into a query for a name list, which was the old (more hackish) API. At some point in the near future, we'll push the distinction between get and list down to the vnode operation layer, but this provides access to the new API for applications in the short term. Pointed out by: Dominic Giampaolo <dbg@apple.com> Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-04 03:57:28 +00:00
Robert Watson	31d13e2a29	Regen from syscalls.master:1.149, addition of extended attribute list system calls for fd, file, link.	2003-06-04 03:50:20 +00:00
Robert Watson	9e18f27730	Add system calls to explicitly list extended attributes on a file/directory/link, rather than using a less explicit hack on the extattr retrieval API: extattr_list_fd() extattr_list_file() extattr_list_link() The existing API was counter-intuitive, and poorly documented. The prototypes for these system calls are identical to extattr_get_*(), but without a specific attribute name to leave NULL. Pointed out by: Dominic Giampaolo <dbg@apple.com> Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-04 03:49:31 +00:00
Robert Watson	0b95513444	Assert the vnode lock when returning successfully from vn_open_cred().	2003-06-04 00:54:27 +00:00
Julian Elischer	2b035cbe5a	Remove un-needed code. Don't copyin() data we are about to overwrite. Add a flag to tell userland that KSE is officially "DONE" with the mailbox and has gone away. Obtained from: davidxu@	2003-06-04 00:12:57 +00:00
Bosko Milekic	479728fd77	Fix a potential bucket leak where when freeing to an empty bucket we failed to put the bucket back into the general cache/container. Also, fix a bad assumption. There was a KASSERT() that aimed to guarantee that whenever the pcpu container's mc_starved was > 0, that whatever the bucket we were freeing to was an empty bucket, assuming it belonged to the pcpu container cache. However, there is at least one case where this is not true anymore; consider: 1) All containers empty, next thread to try to alloc will touch a pcpu container, notice it's empty, and increment the pcpu container's mc_starved. 2) Some other thread frees an mbuf belonging to a bucket in the general cache/container. Then it frees another mbuf belonging to the same bucket (still in gen container). 3) Some third thread tries to allocate an mbuf from the pcpu container and, since empty, grabs one mbuf now available in the general cache and moves the non-empty bucket from which it took 1 mbuf and to which the thread in (2) freed to, and moves it to the pcpu container. 4) A final thread tries to free an mbuf belonging to the NON-EMPTY bucket mentioned in (2) and (3) and, since the pcpu container's mc_starved is > 0, but the bucket is obviously non-empty, it trips on the KASSERT. This meant that one could potentially get a panic in some cases when out of mbufs and clusters. The problem could be mitigated by commenting out some cv_signal() calls, but I'm assuming that was pure coincidence and this is the correct fix.	2003-06-03 19:19:13 +00:00
Jeff Roberson	980c75b4d8	- Remove the blocked pointer from the umtx structure. - Use a hash of umtx queues to queue blocked threads. We hash on pid and the virtual address of the umtx structure. This eliminates cases where we previously held a lock across a casuptr call. Reviewed by: jhb (quickly)	2003-06-03 05:24:46 +00:00
Tor Egge	ad05d58087	Add tracking of process leaders sharing a file descriptor table and allow a file descriptor table to be shared between multiple process leaders. PR: 50923	2003-06-02 16:05:32 +00:00
Marcel Moolenaar	bf822712f7	Remove the ia64 hackery in threadinit() that was needed to work around the lameness of the kstack code. The EPC overhaul de-lame-ified the kstack code by removing the need for contigmalloc(). We can now allocate stacks using malloc(). We probably want to make the stacks swappable as well so that we can make it MI. But that's another story.	2003-06-01 05:57:58 +00:00
Robert Watson	ef2e1ca561	Attempt to further comment and clarify System V IPC logic: document why certain exceptions are made, note an inconsistency between FreeBSD and some other implementations regarding IPC_M, and let suser() generate our EPERM rather than forcing it ourselves. Remove a carriage return that crept in in the last commit. Reviewed by: gordon Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-05-31 23:31:51 +00:00
Robert Watson	a0ccd3f6ad	Attempt to marginally de-obfuscate sections of the System V IPC access control logic. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-05-31 23:17:30 +00:00
Poul-Henning Kamp	b82af320cf	Add "" around mutex name to make message less confusing.	2003-05-31 21:11:01 +00:00
Poul-Henning Kamp	670966596b	Remove unused variable(s). Found by: FlexeLint	2003-05-31 20:29:34 +00:00
Poul-Henning Kamp	1e93e04fa9	Remove return after panic. Found by: FlexeLint	2003-05-31 20:18:23 +00:00
Poul-Henning Kamp	90471005e1	Remove needless return Found by: FlexeLint	2003-05-31 20:16:44 +00:00
Poul-Henning Kamp	4fe77d64a0	Add a couple of XXX comments where the intent is not clear. Found by: FlexeLint	2003-05-31 20:13:58 +00:00
Poul-Henning Kamp	74f1af0191	Remove unused variable(s). Remove break after goto Found by: FlexeLint	2003-05-31 20:11:33 +00:00
Poul-Henning Kamp	b1921a6f33	Remove return after panic. Found by: FlexeLint	2003-05-31 20:09:42 +00:00
Poul-Henning Kamp	a62f80f8e0	Remove unused variable and now unbalanced call to splbio(); Found by: FlexeLint	2003-05-31 20:09:01 +00:00
Marcel Moolenaar	a063facbf6	Fix ia32 compat on ia64. Recent ia64 MD changes caused the garbage on the stack to be changed in a way incompatible with elf32_map_insert() where we used data_buf without initializing it for when the partial mapping results in a misaligned image (typical when the page size implied by the image is not the same as the page size in use by the kernel). Since data_buf is passed by reference to vm_map_find(), the compiler cannot warn about it. While here, move all local variables to the top of the function.	2003-05-31 19:55:05 +00:00
Poul-Henning Kamp	850cb24ef8	"break" rather than fall through to a break in the default clause. Found by: FlexeLint	2003-05-31 16:53:16 +00:00
Poul-Henning Kamp	8313328657	Introduce {be,le}_uuid_{enc,dec}() functions for explicitly encoding and decoding UUID's in big endian and little endian binary format.	2003-05-31 16:47:07 +00:00
Poul-Henning Kamp	17a1391990	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.	2003-05-31 16:42:45 +00:00
Peter Wemm	d5167abf3c	Add __amd64__ to the ifdefs that introduce the "pcicfg" spinlock to witness. Approved by: re (safe amd64 support)	2003-05-31 06:42:37 +00:00
Maxime Henrion	193f2edbf9	When loading a module that contains a sysctl which is already compiled in the kernel, the sysctl_register() call would fail, as expected. However, when unloading this module again, the kernel would then panic in sysctl_unregister(). Print an error message instead. Submitted by: Nicolai Petri <nicolai@catpipe.net> Reviewed by: imp Approved by: re@ (jhb)	2003-05-29 21:19:18 +00:00
David Malone	0f7e5f778a	Add an INVARIANTS-only check to make sure Giant is held if mbuf allocation is attempted with M_TRYWAIT. Reviewed by: bmilekic Approved by: re (scottl)	2003-05-29 18:38:24 +00:00
David Malone	de1cab2b60	Grab giant in sendit rather than kern_sendit because sockargs may allocate mbufs with M_TRYWAIT, which may require Giant. Reviewed by: bmilekic Approved by: re (scottl)	2003-05-29 18:36:26 +00:00
Ian Dowse	ad6adb4f18	In cluster_wbuild(), initialise b_iocmd to BIO_WRITE before calling buf_start() to avoid triggering a panic in softdep_disk_io_initiation() if b_iocmd happened to be BIO_READ. The later initialisation of b_iocmd in cluster_wbuild() could probably be moved to before the buf_start() call, but this patch keeps the change as simple as possible. This is reported to fix occasional "softdep_disk_io_initiation: read" panics, especially on NFS servers. Reported by: Nick Hilliard <nick@netability.ie> Tested by: Nick Hilliard <nick@netability.ie> Approved by: re (rwatson)	2003-05-28 13:22:10 +00:00
Peter Wemm	a9a0bbad19	Copy the va_list in sbuf_vprintf() before passing it to vsnprintf(), because we could fail due to a small buffer and loop and rerun. If this happens, then the vsnprintf() will have already taken the arguments off the va_list. For i386 and others, this doesn't matter because the va_list type is passed as a copy. But on powerpc and amd64, this is fatal because the va_list is a reference to an external structure that keeps the vararg state due to the more complicated argument passing system. On amd64, arguments can be passed as follows: First 6 int/pointer type arguments go in registers, the rest go on the memory stack. Float and double are similar, except using SSE registers. long double (80 bit precision) are similar except using the x87 stack. Where the 'next argument' comes from depends on how many have been processed so far and what type it is. For amd64, gcc keeps this state somewhere that is referenced by the va_list. I found a description that showed the va_copy was required here: http://mirrors.ccs.neu.edu/cgi-bin/unixhelp/man-cgi?va_end+9 The single unix spec doesn't mention va_copy() at all. Anyway, the problem was that the sysctl kern.geom.conf* nodes would panic due to walking off the end of the va_arg lists in vsnprintf. A better fix would be to have sbuf_vprintf() use a single pass and call kvprintf() with a callback function that stored the results and grew the buffer as needed. Approved by: re (scottl)	2003-05-25 19:03:08 +00:00
Jeff Roberson	0003d1b74e	- Create a new lock, umtx_lock, for use instead of the proc lock for protecting the umtx queues. We can't use the proc lock because we need to hold the lock across calls to casuptr, which can fault. Approved by: re	2003-05-25 18:18:32 +00:00
Jeff Roberson	30fd5d085d	- Reset the free ent to NULL if we have consumed the last free entry. This fixes a problem where we would overwrite old data if we ran out of free entries. Submitted by: sam Approved by: re (scottl)	2003-05-25 08:48:42 +00:00
Alan Cox	2e05d89828	Make the maximum number of vnodes a function of both the physical memory size and the kernel's heap size, specifically, vm_kmem_size. This function allows a maximum of 40% of the vm_kmem_size to be used for vnodes and vm objects. This is a conservative bound based upon recent problem reports. (In other words, a slight increase in this percentage may be safe.) Finally, machines with less than ~3GB of RAM should be unaffected by this change, i.e., the maximum number of vnodes should remain the same. If necessary, machines with 3GB or more of RAM can increase the maximum number of vnodes by increasing vm_kmem_size. Desired by: scottl Tested by: jake Approved by: re (rwatson,scottl)	2003-05-23 19:54:02 +00:00
Julian Elischer	faaa20f639	When we are spilling threads out of the run queue during panic, make sure we keep the thread state variable consistent with its real state. i.e. Don't say it's on the run queue when it isn't. Also clarify the associated comment. Turns a double panic back to a single panic :-/ Approved by: re@ (jhb)	2003-05-21 18:53:25 +00:00
Marcel Moolenaar	f2c49dd248	Revamp of the syscall path, exception and context handling. The prime objectives are: o Implement a syscall path based on the epc instruction (see sys/ia64/ia64/syscall.s). o Revisit the places where we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h). Secondary objectives: o Remove the requirement to use contigmalloc for kernel stacks. o Better handling of the high FP registers for SMP systems. o Switch to the new cpu_switch() and cpu_throw() semantics. o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx). Many files are affected by this change. Functionally it boils down to: o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss. o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm. o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger. o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw(). o The high FP registers are either in the PCB or on some CPU. Context switching for them is done lazily. This affects trap(). o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as of yet still unimplemented. Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use. Besides more efficient context saving and restoring, which of course yields a noticeable speedup, this also fixes the dreaded SMP bootup problem as a side-effect, the details of which are still not fully understood. This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be noticed as an improvement if it's noticed at all. Approved by: re@ (jhb)	2003-05-16 21:26:42 +00:00
Don Lewis	1e9bc9f889	Detect that a vnode has been reclaimed while vflush() was waiting to lock the vnode and restart the loop. Vflush() is vulnerable since it does not hold a reference to the vnode and it holds no other locks while waiting for the vnode lock. The vnode will no longer be on the list when the loop is restarted. Approved by: re (rwatson)	2003-05-16 19:46:51 +00:00
David E. O'Brien	8d542cb56d	Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and PT_DETACH ptrace(2) requests from functioning as advertised in the manual page. As described in kern/35175, the PT_DETACH request will, under certain circumstances, pass an unwanted signal on to the traced process upon detaching from it. The PT_CONTINUE request will sometimes fail if you make it pass a signal that has "properties" that differ from the properties of the signal that originally caused the traced process to be stopped. Since PT_KILL is nothing more than PT_CONTINUE with SIGKILL, it is broken too. In the PT_KILL case, this leads to an unkillable process. PR: 44011 Submitted by: Mark Kettenis <kettenis@chello.nl> Approved by: re(jhb)	2003-05-16 01:34:23 +00:00
Robert Watson	c1dca9ab07	VOP_PATHCONF() requires a vnode lock; this patch adds locking to fpathconf(). The lock is held for direct calls to VOP_PATHCONF() in pathconf() already. Approved by: re (jhb) Pointed out by: DEBUG_VFS_LOCKS	2003-05-15 21:13:08 +00:00
Bosko Milekic	11583f6c93	Make the mb_alloc low-watermark sysctl-tunable read-only and make netstat(1) not display it for now because its effects are not yet completely implemented and we're about to cut 5.2-RELEASE. This is temporary. Approved by: re (scottl, rwatson)	2003-05-15 19:05:28 +00:00
Paul Saab	13d56a9a90	p_sigignore moved into struct sigacts. move one which was missed. Approved by: re (scottl)	2003-05-14 00:03:55 +00:00
John Baldwin	90af4afacb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)	2003-05-13 20:36:02 +00:00
John Baldwin	25b4d3a8a6	In setitimer(2), if the it_value of the new itimer value is clear, then don't add the current time to it, but leave it as clear so that when the timer is disabled, the it_value is always clear. Reviewed by: bde Approved by: re (rwatson)	2003-05-13 19:21:46 +00:00
Alan Cox	099e981aa1	Optimize the use of splay in gbincore(). During a "make buildworld" the desired buffer is found at one of the roots more than 60% of the time. Thus, checking both roots before performing either splay eliminates unnecessary splays on the first tree splayed. Approved by: re (jhb)	2003-05-13 04:36:02 +00:00
Poul-Henning Kamp	87b1831f1d	Bail out if there were not two loadable sections. Add XXX comment about one other issue. Approved by: re/rwatson.	2003-05-12 15:08:10 +00:00
Robert Watson	1964fb9ba2	Remove bogus locking from DDB's "show lockedvnods" command: using synchronization primitives from inside DDB is generally a bad idea, and in this case it frequently results in panics due to DDB commands being executed from the sio fast interrupt context on a serial console. Replace the locking with a note that a lack of locking means that DDB may see inconsistent views of the mount and vnode lists, which could also result in a panic. More frequently, though, this avoids a panic than causes it. Discussed with ages ago: bde Approved by: re (scottl)	2003-05-12 14:37:47 +00:00
Poul-Henning Kamp	1282e9acea	Don't pass NULL pointer to memset if we are compiled with DIAGNOSTIC Approved by: re/rwatson	2003-05-12 05:09:56 +00:00
Bosko Milekic	969bab3efb	Make m_freem() just use m_free() instead of duplicating the code. The reason for the duplication was that m_freem() was meant to eventually be optimized to hold the lock of the cache being freed to as long as possible across frees but the difficulty of implementing said optimization right now is too high, given that in some cases (see MAC and non-cluster external buffers), we need to call into other subsystems, something not permissible when the cache lock is held. This change minimizes code duplication while keeping at least the atomic mbuf+cluster free optimization. Suggested by: luigi	2003-05-10 18:08:23 +00:00
John Baldwin	b1bf1c3a98	Remove Giant from kern_sigsuspend() and osigsuspend() as these should now be MP safe. Approved by: re (scottl)	2003-05-09 19:11:32 +00:00
Robert Watson	b2aef57123	Rename MAC_MAX_POLICIES to MAC_MAX_SLOTS, since the variables and constants in question refer to the number of label slots, not the maximum number of policies that may be loaded. This should reduce confusion regarding an element in the MAC sysctl MIB, as well as make it clearer what the effect of changing the compile-time constants is. Approved by: re (jhb) Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-05-08 19:49:42 +00:00
Robert Watson	41a17fe326	Clean up locking for the MAC Framework: (1) Accept that we're now going to use mutexes, so don't attempt to avoid treating them as mutexes. This cleans up locking accessor function names some. (2) Rename variables to _mtx, _cv, _count, simplifying the naming. (3) Add a new form of the _busy() primitive that conditionally makes the list busy: if there are entries on the list, bump the busy count. If there are no entries, don't bump the busy count. Return a boolean indicating whether or not the busy count was bumped. (4) Break mac_policy_list into two lists: one with the same name holding dynamic policies, and a new list, mac_static_policy_list, which holds policies loaded before mac_late and without the unload flag set. The static list may be accessed without holding the busy count, since it can't change at run-time. (5) In general, prefer making the list busy conditionally, meaning we pay only one mutex lock per entry point if all modules are on the static list, rather than two (since we don't have to lower the busy count when we're done with the framework). For systems running just Biba or MLS, this will halve the mutex accesses in the network stack, and may offer substantial performance benefits. (6) Lay the groundwork for a dynamic-free kernel option which eliminates all locking associated with dynamically loaded or unloaded policies, for pre-configured systems requiring maximum performance but less run-time flexibility. These changes have been running for a few weeks on MAC development branch systems. Approved by: re (jhb) Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-05-07 17:49:24 +00:00
Alan Cox	658ad5fff5	Lock the vm_object when performing vm_pager_deallocate().	2003-05-06 02:45:28 +00:00
John Baldwin	01de25134f	Tweak the clearing of TDF_DEADLKTREAT so that we only bother grabbing the lock and clearing the flag if it was clear when uiomove() was called.	2003-05-05 21:27:29 +00:00
John Baldwin	854dc8c2a1	Mostly sort the includes.	2003-05-05 21:26:25 +00:00

6661 Commits