freebsd-dev

Author	SHA1	Message	Date
Robert Watson	d8939d82cb	Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we attempt to IPI other cpus when entering the debugger in order to stop them while in the debugger. The default remains to issue the stop; however, that can result in a hang if another cpu has interrupts disabled and is spinning, since the IPI won't be received and the KDB will wait indefinitely. We probably need to add a timeout, but this is a useful stopgap in the mean time. Reviewed by: marcel	2004-08-15 02:06:27 +00:00
Robert Watson	6cbea71c82	Cause pfind() not to return processes in the PRS_NEW state. As a result, threads consuming the result of pfind() will not need to check for a NULL credential pointer or other signs of an incompletely created process. However, this also means that pfind() cannot be used to test for the existence or find such a process. Annotate pfind() to indicate that this is the case. A review of current consumers seems to indicate that this is not a problem for any of them. This closes a number of race conditions that could result in NULL pointer dereferences and related failure modes. Other related races continue to exist, especially during iteration of the allproc list without due caution. Discussed with: tjr, green	2004-08-14 17:15:16 +00:00
Poul-Henning Kamp	d8e8b6755c	Add some KASSERTS.	2004-08-14 08:33:49 +00:00
Julian Elischer	f0017f3321	Whitespace nit.	2004-08-14 07:21:20 +00:00
Robert Watson	b295bdcded	After completing a name lookup for a target UNIX domain socket to connect to, re-check that the local UNIX domain socket hasn't been closed while we slept, and if so, return EINVAL. This affects the system running both with and without Giant over the network stack, and recent ULE changes appear to cause it to trigger more frequently than previously under load. While here, improve catching of possibly closed UNIX domain sockets in one or two additional circumstances. I have a much larger set of related changes in Perforce, but they require more testing before they can be merged. One debugging printf is left in place to indicate when such a race takes place: this is typically triggered by a buggy application that simultaneously connect()'s and close()'s a UNIX domain socket file descriptor. I'll remove this at some point in the future, but am interested in seeing how frequently this is reported. In the case of Martin's reported problem, it appears to be a result of a non-thread safe syslog() implementation in the C library, which does not synchronize access to its logging file descriptor. Reported by: mbr	2004-08-14 03:43:49 +00:00
John-Mark Gurney	ac77164d64	clean up whitespace...	2004-08-13 17:43:53 +00:00
John-Mark Gurney	7d5e45a391	looks like rwatson forgot tabs... :)	2004-08-13 07:38:58 +00:00
Julian Elischer	c00661f83c	Don't keep evaluating our own cpu mask.. it's not likely to have changed....	2004-08-13 00:57:43 +00:00
Robert Watson	44f31f7556	Trim trailing white space.	2004-08-12 18:06:21 +00:00
Warner Losh	9f7f340a0f	Minor formatting fixes for lines > 80 characters	2004-08-12 17:26:22 +00:00
Jeff Roberson	f2b74cbf28	- Introduce a new flag KEF_HOLD that prevents sched_add() from doing a migration. Use this in sched_prio() and sched_switch() to stop us from migrating threads that are in short term sleeps or are runnable. These extra migrations were added in the patches to support KSE. - Only set NEEDRESCHED if the thread we're adding in sched_add() is a lower priority and is being placed on the current queue. - Fix some minor whitespace problems.	2004-08-12 07:56:33 +00:00
Julian Elischer	0f54f48225	Properly keep track of how many kses are on the system run queue(s).	2004-08-11 20:54:48 +00:00
Robert Watson	217a4b6e4e	Replace a reference to splnet() with a reference to locking in a comment.	2004-08-11 03:43:10 +00:00
Marcel Moolenaar	4da47b2fec	Add __elfN(dump_thread). This function is called from __elfN(coredump) to allow dumping per-thread machine specific notes. On ia64 we use this function to flush the dirty registers onto the backingstore before we write out the PRSTATUS notes. Tested on: alpha, amd64, i386, ia64 & sparc64 Not tested on: arm, powerpc	2004-08-11 02:35:06 +00:00
Robert Watson	87e83e7d4c	In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However, we may sleep when doing so; check that we didn't race with another thread allocating storage for the vnode after allocation is made to a local pointer, and only update the vnode pointer if it's still NULL. Otherwise, accept that another thread got there first, and release the local storage. Discussed with: jmg	2004-08-11 01:27:53 +00:00
Alan Cox	fad44deea3	Eliminate the acquisition and release of Giant within physio(). Remove the spl calls. Reviewed by: phk@ Discussed with: scottl@	2004-08-10 21:47:11 +00:00
John Baldwin	274f8f48e8	Synchronize the extra SA threading checks and return value handling of condition variables with that of msleep(). Reviewed by: davidxu	2004-08-10 17:42:59 +00:00
Jeff Roberson	2454aaf51c	- Use a new flag, KEF_XFERABLE, to record with certainty that this kse had contributed to the transferable load count. This prevents any potential problems with sched_pin() being used around calls to setrunqueue(). - Change the sched_add() load balancing algorithm to try to migrate on wakeup. This attempts to place threads that communicate with each other on the same CPU. - Don't clear the idle counts in kseq_transfer(), let the cpus do that when they call sched_add() from kseq_assign(). - Correct a few out of date comments. - Make sure the ke_cpu field is correct when we preempt. - Call kseq_assign() from sched_clock() to catch any assignments that were done without IPI. Presently all assignments are done with an IPI, but I'm trying a patch that limits that. - Don't migrate a thread if it is still runnable in sched_add(). Previously, this could only happen for KSE threads, but due to changes to sched_switch() all threads went through this path. - Remove some code that was added with preemption but is not necessary.	2004-08-10 07:52:21 +00:00
Nate Lawson	c8c216d558	Skip the syncing disks loop if there are no dirty buffers. Remove a variable used to flag the initial printf. Submitted by: truckman (earlier version)	2004-08-10 01:32:05 +00:00
Scott Long	0f4ad91810	Add a temporary debugging hack to detect a deadlock in setrunqueue(). This is here so that we can gather stats on the nature of the recent rash of hard lockups, and in this particular case panic the machine instead of letting it deadlock forever.	2004-08-10 00:26:25 +00:00
Julian Elischer	e2105bce2a	Slight changes to comments and some whitespace changes.	2004-08-09 21:57:30 +00:00
Julian Elischer	1a5cd27b4b	Make kg->kg_runnable actually count runnable threads in the ksegrp run queue instead of only doing it sometimes. This is not used outside of debugging code in the current code, but that will probably change.	2004-08-09 20:36:03 +00:00
Julian Elischer	332e72ddb7	Remove typos on KASSERT messages.	2004-08-09 20:13:07 +00:00
Brian Feldman	83dd6b37e1	Normalize the VM wiring done with SPARSE_MAPPING: check for errors, and unmap when done. For whatever reason, SPARSE_MAPPING is not even a config option, so this is dead code.	2004-08-09 18:46:13 +00:00
Julian Elischer	732d95288a	Increase the amount of data exported by KTR in the KTR_RUNQ setting. This extra data is needed to really follow what is going on in the threaded case.	2004-08-09 18:21:12 +00:00
John-Mark Gurney	6141e04a7e	add option to automatically mark core dumps with the nodump flag PR: 57065 Submitted by: Walter C. Pelissero	2004-08-09 05:46:46 +00:00
David Xu	604be46d1e	1.Add KSE_INTR_DBSUSPEND command for kse_thr_interrupt to suspend a bound thread, after the bound thread leaves critical region, the thread should check debug flag may suspend itself by using the command. 2.Schedule upcall after thread is suspended by debugger 3.Wakeup upcall thread after process suspension. Reviewed by: deischen	2004-08-08 22:32:20 +00:00
David Xu	2b70a83aff	Call thread_user_enter for M:N thread, ast() should be treated as another entrance of kernel.	2004-08-08 22:28:33 +00:00
David Xu	1f2eac6cf3	Add pl_flags to ptrace_lwpinfo, two flags PL_FLAG_SA and PL_FLAG_BOUND indicate that a thread is in UTS critical region. Reviewed by: deischen Approved by: marcel	2004-08-08 22:26:11 +00:00
Doug Rabson	cfaf7e60cc	Make sure that AT_PHDR has a useful value even for static programs.	2004-08-08 09:48:10 +00:00
John-Mark Gurney	227559d11f	rearrange some code that handles the thread taskqueue so that it is more generic. Introduce a new define TASKQUEUE_DEFINE_THREAD that takes a single arg, which is the name of the queue. Document these changes.	2004-08-08 02:37:22 +00:00
Robert Watson	b223d06425	We're not yet ready to assert !Giant in kern_fcntl(), as it's called with Giant from ABI wrappers such as Linux emulation. Foot shoot off: phk	2004-08-07 14:09:02 +00:00
Robert Watson	db532b63c2	Flag a broad range of VFS operations as GIANT_REQUIRED in order to catch leaking into VFS without Giant. Inch Giant a little lower in several file descriptor operations on vnodes to cover only VFS operations that need it, rather than file flag reading, etc.	2004-08-06 22:25:35 +00:00
Robert Watson	cc701b73b8	In thread_exit(), include more information about the thread/process context in the KTR trace record. In particular, include the same information as passed for mi_switch() and fork_exit() KTR trace records.	2004-08-06 22:06:14 +00:00
Robert Watson	5dd3a4ed6c	Push UIDINFO_UNLOCK() slightly earlier in chgsbsize(), as it's not needed if we print the local variable version of the limit rather than the shared version.	2004-08-06 22:04:33 +00:00
Robert Watson	a0a819747c	Avoid acquiring Giant for some common light-weight or already MPSAFE fcntl() operations, including: F_DUPFD (dup() alias), F_GETFD (retrieve close-on-exec flag), F_SETFD (set close-on-exec flag), and F_GETFL (retrieve file descriptor flags). For the remaining fcntl() operations, do acquire Giant, especially where we call into fo_ioctl() as a result. We're not yet ready to push Giant into fo_ioctl(). Once we do, this can all become quite a bit prettier.	2004-08-06 22:00:55 +00:00
Robert Watson	ff7ec58af8	Cut a KTR record whenever a callout is invoked. Mark whether it runs with Giant or not, and include the function point so it can be looked up against the kernel symbol table during trace analysis.	2004-08-06 21:49:00 +00:00
John Baldwin	44fe3c1ff0	Don't scare users with a warning about preemption being off when it isn't yet safe to have on by default.	2004-08-06 15:49:44 +00:00
Robert Watson	6f40c417ca	In ithread_schedule(), when we plan to go harvest some entropy as a result of scheduling an ithread, cut a KTR_INTR trace record so that it's clear in tracing interrupt activity where and when the entropy harvesting code is invoked.	2004-08-06 03:39:28 +00:00
Colin Percival	0413bacd09	When resetting a pending callout, perform the deregistration in callout_reset rather than calling callout_stop. This results in a few lines of code duplication, but it provides a significant performance improvement because it avoids recursing on callout_lock. Requested by: rwatson	2004-08-06 02:44:58 +00:00
John Baldwin	5cc00cfc67	Fix the code in rman that merges adjacent unallocated resources to use a better check for 'adjacent'. The old code assumed that if two resources were adjacent in the linked list that they were also adjacent range wise. This is not true when a resource manager has to manage disparate regions. For example, the current interrupt code on i386/amd64 will instruct irq_rman to manage two disjoint regions: 0-1 and 3-15 for the non-APIC case. If IRQs 1 and 3 were allocated and then released, the old code would coalesce across the 1 to 3 boundary because the resources were adjacent in the linked list thus adding 2 to the area of resources that irq_rman managed as a side effect. The fix adds extra checks so that adjacent unallocated resources are only merged with the resource being freed if the start and end values of the resources also match up. The patch also consolidates the checks for adjacent resources being allocated.	2004-08-05 15:48:18 +00:00
John Baldwin	0e5a07e533	Remove a potential deadlock on i386 SMP by changing the lazypmap ipi and spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi and spin-wait code snippets so that you can't get into the situation of one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap shootdown each of which are waiting on each other. With this change, only one of the CPUs would do an IPI and spin-wait at a time.	2004-08-04 20:31:19 +00:00
John Baldwin	c950c15c76	Workaround a possible deadlock on SMP due to a spin lock LOR by disabling the immediate awakening of proc0 (scheduler kproc, controls swapping processes in and out). The scheduler process periodically awakens already, so this will not result in processes not being swapped in, there will just be more latency in between a thread being made runnable and the scheduler waking up to swap the affected process back in.	2004-08-04 20:24:40 +00:00
John Baldwin	bdcfcf5bc4	Cache the value of curthread in the _get_sleep_lock() and _get_spin_lock() macros and pass the value to the associated _mtx_*() functions to avoid more curthread dereferences in the function implementations. This provided a very modest perf improvement in some benchmarks. Suggested by: rwatson Tested by: scottl	2004-08-04 20:18:45 +00:00
Robert Watson	7a36e1d6c7	Assert Giant in namei(). Bugs have been reported in which, following a sleep() call waking up in namei(), a later assertion triggers that Giant is not held. By asserting Giant at the start of namei(), we can know that if that assertion triggers, Giant is lost during the call to namei(), and not before.	2004-08-04 18:39:07 +00:00
Robert Watson	0be8ad5fbc	Assert Giant in the following file descriptor-related functions (function: reason): fdfree(): VFS; setugidsafety(): KQueue; fdcheckstd(): VFS; _fgetvp(): VFS; fgetsock(): conditional assertion based on debug.mpsafenet.	2004-08-04 18:35:33 +00:00
Robert Watson	1b93405c7c	Remove spl's from kern_resource.c.	2004-08-04 18:19:09 +00:00
Maxime Henrion	9f1b87f106	Instead of calling ia32_pause() conditionally on __i386__ or __amd64__ being defined, define and use a new MD macro, cpu_spinwait(). It only expands to something on i386 and amd64, so the compiled code should be identical. Name of the macro found by: jhb Reviewed by: jhb	2004-08-03 18:44:27 +00:00
Pawel Jakub Dawidek	24b2151f4d	Don't skip permission checks when sending signals to zombie processes. Pointed out by: bde Reviewed by: rwatson	2004-08-03 15:39:23 +00:00
Mike Silbersack	e10ecdea88	Standardize pipe locking, ensuring that everything is locked via pipelock(), not via a mixture of mutexes and pipelock(). Additionally, add a few KASSERTS, and change some statements that should have been KASSERTS into KASSERTS. As a result of these cleanups, some segments of code have become significantly shorter and/or easier to read.	2004-08-03 02:59:15 +00:00
David Xu	4513fb36aa	s/TMDF_DONOTRUNUSER/TMDF_SUSPEND/g Discussed with: deischen	2004-08-03 02:23:06 +00:00
Julian Elischer	4fd54632b0	Repeat after me: "Do not apply your tested patches to your commit tree by hand"	2004-08-03 01:43:29 +00:00
Julian Elischer	c94b38af46	Remove an argument that is never used.	2004-08-02 23:48:43 +00:00
David E. O'Brien	64298d52cc	Put a cap on the auto-tuning of kern.maxvnodes. Cap value chosen by: scottl	2004-08-02 21:52:43 +00:00
Robert Watson	3d3f5f6057	Add what appears to be a missing '*/' at the end of a comment.	2004-08-02 01:38:27 +00:00
Brian Feldman	b23f72e98a	* Add a "how" argument to uma_zone constructors and initialization functions so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialization functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default. Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.	2004-08-02 00:18:36 +00:00
Julian Elischer	6e0fbb01c5	Comment kse_create() and make a few minor code cleanups Reviewed by: davidxu	2004-08-01 23:02:00 +00:00
Poul-Henning Kamp	5e8c582ac2	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activities on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the nameidata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.	2004-07-30 22:08:52 +00:00
Alan Cox	9be60284a6	Giant is no longer required by vm_waitproc() and vmspace_exitfree(). Eliminate its acquisition and release around vm_waitproc() in kern_wait().	2004-07-30 20:31:02 +00:00
Nate Lawson	b1c8139147	Minor message cleanup.	2004-07-30 01:30:05 +00:00
Pawel Jakub Dawidek	0b011ea3da	Syscall kill(2) called for a zombie process should return 0. Obtained from: Darwin	2004-07-29 20:38:19 +00:00
Pawel Jakub Dawidek	cebabef04f	Fill in some information about zombie processes as well. Before this change every zombie process was reported as an owner of PID 0 in ps(1) output. Reviewed by: julian	2004-07-29 20:27:59 +00:00
Poul-Henning Kamp	d634f69316	Remove global variable rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroot's in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.	2004-07-28 20:21:04 +00:00
Alexander Kabaev	00fbcda80d	Avoid casts as lvalues.	2004-07-28 06:42:41 +00:00
David Xu	8bda8a620c	Use P_SINGLE_EXIT to check single-threading case, P_WEXIT is not for that purpose.	2004-07-28 06:30:52 +00:00
Poul-Henning Kamp	3dfe213e61	Convert the vfsconf list to a TAILQ. Introduce vfs_byname() function to find things on it. Staticize vfs_nmount() function under the name vfs_donmount(). Various cleanups.	2004-07-27 22:32:01 +00:00
Robert Watson	1a8cfbc450	Pass a thread argument into cpu_critical_{enter,exit}() rather than dereference curthread. It is called only from critical_{enter,exit}(), which already dereferences curthread. This doesn't seem to affect SMP performance in my benchmarks, but improves MySQL transaction throughput by about 1% on UP on my Xeon. Head nodding: jhb, bmilekic	2004-07-27 16:41:01 +00:00
Robert Watson	a9abdce44a	Add "options ADAPTIVE_GIANT" which causes Giant to also be treated in an adaptive fashion when adaptive mutexes are enabled. The theory behind non-adaptive Giant is that Giant will be held for long periods of time, and therefore spinning waiting on it is wasteful. However, in MySQL benchmarks which are relatively Giant-free, running Giant adaptive makes an observable difference on SMP (5% transaction rate improvement). As such, make adaptive behavior on Giant an option so it can be more widely benchmarked.	2004-07-27 16:34:48 +00:00
Alan Cox	1a276a3f91	- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.	2004-07-27 03:53:41 +00:00
Bosko Milekic	0047b9a96a	Move the schedlock owner state update following the context switch in fork_exit() to before anything else is done (but keep schedlock for the deadthread check). This means one less nasty bug if ever in the future whatever might have been called before the update played with schedlock or critical sections. Discussed with: tjr	2004-07-27 03:46:31 +00:00
Colin Percival	66d5c640fa	In revision 1.228, I accidentally broke the "total number of processes in the system" resource limit code: When checking if the caller has superuser privileges, we should be checking the real user, not the effective user. (In general, resource limiting is done based on the real user, in order to avoid resource-exhaustion-by-setuid-program attacks.) Now that a SUSER_RUID flag to suser_cred exists, use it here to return this code to its correct behaviour. Pointed out by: rwatson	2004-07-26 07:54:39 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Robert Watson	feb9bd18c6	Revert modification of subr_turnstile.c accidentally included in the last commit; this assertion was provided by jhb for local debugging and not intended for broader consumption.	2004-07-25 23:32:32 +00:00
Robert Watson	fd179ee91d	In uipc_connect(), assert that the passed thread is curthread, and pass td into unp_connect() instead of reading curthread.	2004-07-25 23:30:43 +00:00
Robert Watson	99901d0afb	Do some initial locking on accept filter registration and attach. While here, close some races that existed in the pre-locking world during low memory conditions. This locking isn't perfect, but it's closer than before.	2004-07-25 23:29:47 +00:00
Poul-Henning Kamp	cf95b5c381	Eliminate unused second argument to reassignbuf() and simplify it accordingly.	2004-07-25 21:24:23 +00:00
Robert Watson	3ed994c6c3	Add netatalk mutexes to hard-coded WITNESS lock order.	2004-07-25 20:16:51 +00:00
Warner Losh	4411688509	Expand the generic, but bogusly formed, copyright notice to include the license from /usr/src/COPYRIGHT. Since cvs annotate shows that this was written by jasone, julian, jhb, peter, bmilekic and obrien. cvs log shows that many others may have contributed to this file. As such, go ahead and use the author of 'FreeBSD Project' for this file. If this is a problem, please notify me. # this eliminates the last file in the kernel with an indirect reference # to /usr/src/COPYRIGHT in the kernel. A few more in userland remain.	2004-07-25 19:49:01 +00:00
Poul-Henning Kamp	a3d57cfbfd	Neuter this warning for now, I think I know the remaining issues.	2004-07-25 08:09:21 +00:00
Julian Elischer	aa3c8c02ae	White space fix.. diff reduction for upcoming commit.	2004-07-24 04:57:41 +00:00
Scott Long	e038d35422	Clean up whitespace, increase consistency and correctness. Submitted by: bde	2004-07-23 23:09:00 +00:00
Robert Watson	ff381670df	Don't include a "\n" in KTR output, it confuses automatic parsing.	2004-07-23 20:12:56 +00:00
Scott Long	18f480f8f6	Remove the previous hack since it doesn't make a difference and is getting in the way of debugging.	2004-07-23 19:59:16 +00:00
Alan Cox	b332cea583	Use kmem_alloc_nofault() rather than kmem_alloc_pageable() for allocating KVA for explicitly managed mappings, i.e., mappings created with pmap_qenter().	2004-07-23 19:36:18 +00:00
Robert Watson	4da86f8826	Export KTR_COMPILE as a sysctl so you can easily check from user space what event mask has been compiled into the kernel.	2004-07-23 17:41:44 +00:00
Robert Watson	46b25cb5f6	Don't perform pipe endpoint locking during pipe_create(), as the pipe can't yet be referenced by other threads. In microbenchmarks, this appears to reduce the cost of pipe();close();close() on UP by 10%, and SMP by 7%. The vast majority of the cost of allocating a pipe remains VM magic. Suggested by: silby	2004-07-23 14:11:04 +00:00
Robert Watson	71a057bc73	In setpgid(), since td is passed in as a system call argument, use it in preference to curthread, which costs slightly more.	2004-07-23 04:26:49 +00:00
Robert Watson	a6719c82b1	Push Giant acquisition down into fo_stat() from most callers. Acquire Giant conditional on debug.mpsafenet in the socket soo_stat() routine, unconditionally in vn_statfile() for VFS, and otherwise don't acquire Giant. Accept an unlocked read in kqueue_stat(), and cryptof_stat() is a no-op. Don't acquire Giant in fstat() system call. Note: in fdescfs, fo_stat() is called while holding Giant due to the VFS stack sitting on top, and therefore there will still be Giant recursion in this case.	2004-07-22 20:40:23 +00:00
Robert Watson	1c1ce9253f	Push acquisition of Giant from fdrop_closed() into fo_close() so that individual file object implementations can optionally acquire Giant if they require it: - soo_close(): depends on debug.mpsafenet - pipe_close(): Giant not acquired - kqueue_close(): Giant required - vn_close(): Giant required - cryptof_close(): Giant required (conservative) Notes: Giant is still acquired in close() even when closing MPSAFE objects due to kqueue requiring Giant in the calling closef() code. Microbenchmarks indicate that this removal of Giant cuts 3%-3% off of pipe create/destroy pairs from user space with SMP compiled into the kernel. The cryptodev and opencrypto code appears MPSAFE, but I'm unable to test it extensively and so have left Giant over fo_close(). It can probably be removed given some testing and review.	2004-07-22 18:35:43 +00:00
Robert Watson	df04411ac4	suser() accepts a thread argument; as suser() dereferences td_ucred, a thread-local pointer, in practice that thread needs to be curthread. If we're running with INVARIANTS, generate a warning if not. If we have KDB compiled in, generate a stack trace. This doesn't fire at all in my local test environment, but could be irritating if it fires frequently for someone, so there will be motivation to fix things quickly when it does.	2004-07-22 17:05:04 +00:00
Scott Long	9493183e77	Disable the PREEMPTION-enabled code in critical_exit() that encourages switching to a different thread. This is just a hack to try to improve stability some more, but likely points closer to the real culprit.	2004-07-22 14:32:48 +00:00
Bosko Milekic	01e9ccbd9c	Back out just a portion of Alfred's last commit. Remove the MBUF_CHECK (WITNESS) for code paths that always call uma_zalloc_arg() shortly after where the check was, because uma_zalloc_arg() already does a similar check. No objections from Alfred. Thanks Alfred.	2004-07-21 21:03:01 +00:00
Robert Watson	46e38ce826	Don't sync the file system on panic by default. This seems to basically work very infrequently, and often results in a compound panic which confuses debugging; locking/SMP have made the layering violation (and risks) of this more obvious over time. Discussed with: green, bde, et al.	2004-07-21 16:04:46 +00:00
Alfred Perlstein	05656b6e2b	put several of the options for DEBUG_VFS_LOCKS under control of sysctls.	2004-07-21 07:13:14 +00:00
Alfred Perlstein	063d811465	Make sure we don't call mbuf allocation functions with mutexes held. Discussed with: rwatson	2004-07-21 07:12:24 +00:00
Marcel Moolenaar	3d4f313695	Add kdb_thr_from_pid(), which given a PID returns the first thread in the process. This is useful when working from or with a process.	2004-07-21 04:49:48 +00:00
Mike Silbersack	eb3d2c61b4	Fix a minor error in pipe_stat - st_size was always reported as 0 when direct writes kicked in. Whether this affected any applications is unknown.	2004-07-20 07:06:43 +00:00
Peter Wemm	b09cb1027b	#ifdef __i386__ -> __i386__ || __amd64__	2004-07-20 02:15:10 +00:00
Julian Elischer	3a63b92c12	You always spot the typos after you have committed.. Start sentence with a Cap.	2004-07-19 18:06:12 +00:00
Julian Elischer	f6449d9d31	Allow the user who calls doadump() from the kernel debugger to not get a page fault if he has not defined a dump device. Panic can often not do a dump as it can hang forever in some cases. The original PR was for amd64 only. This is a generalised version of that change. PR: amd64/67712 Submitted by: wjw@withagen.nl <Willen Jan Withagen>	2004-07-19 18:03:02 +00:00
Brian Feldman	4362fada8f	Reimplement contigmalloc(9) with an algorithm which stands a greatly-improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time. The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the systems' free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated. The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked. There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes. A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation. When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well. This invalidates the BUGS section of the contigmalloc(9) manpage.	2004-07-19 06:21:27 +00:00
Julian Elischer	55d44f79ea	When calling scheduler entrypoints for creating new threads and processes, specify "us" as the thread not the process/ksegrp/kse. You can always find the others from the thread but the converse is not true. Theoretically this would lead to runtime being allocated to the wrong entity in some cases though it is not clear how often this actually happened. (would only affect threaded processes and would probably be pretty benign, but it WAS a bug..) Reviewed by: peter	2004-07-18 23:36:13 +00:00
Pawel Jakub Dawidek	ece2d9891e	Now we have NO_ADAPTIVE_MUTEXES option, so use it here too. Missed by: scottl	2004-07-18 23:27:14 +00:00
Marcel Moolenaar	1f7a1baa37	After maintaining previous behaviour in writing out the core notes, it's time now to break with the past: do not write the PID in the first note. Rationale: 1. [impact of the breakage] Process IDs in core files serve no immediate purpose to the debugger itself. They are only useful to relate a core file to a process. This can provide context to the person looking at the core file, provided one keeps track of this. Overall, not having the PID in the core file is only in very rare occasions unfortunate. 2. [reason of the breakage] Having one PRSTATUS note contain the PID, while all others contain the LWPID of the corresponding kernel thread creates an irregularity for the debugger that cannot easily be worked around. This is caused by libthread_db correlating user thread IDs to kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs. Update comments accordingly.	2004-07-18 20:28:07 +00:00
David Malone	cdb71f7526	The recent changes to control message passing broke some things that get certain types of control messages (ping6 and rtsol are examples). This gets the new code closer to working: 1) Collect control mbufs for processing in the controlp == NULL case, so that they can be freed by externalize. 2) Loop over the list of control mbufs, as the externalize function may not know how to deal with chains. 3) In the case where there is no externalize function, remember to add the control mbuf to the controlp list so that it will be returned. 4) After adding stuff to the controlp list, walk to the end of the list of stuff that was added, in case we added a chain. This code can be further improved, but this is enough to get most things working again. Reviewed by: rwatson	2004-07-18 19:10:36 +00:00
Doug Rabson	4c4392e791	Add doxygen doc comments for most of newbus and the BUS interface.	2004-07-18 16:30:31 +00:00
Scott Long	701f140800	Enable ADAPTIVE_MUTEXES by default by changing the sense of the option to NO_ADAPTIVE_MUTEXES. This option has been enabled by default on amd64 for quite some time, and has been extensively tested on i386 and sparc64. It shows measurable performance gains in many circumstances, and few negative effects. It would be nice in the future if adaptive mutexes actually went to sleep after a certain amount of spinning, but that will require quite a bit more testing.	2004-07-18 15:59:03 +00:00
Alan Cox	d8582da660	Remove GIANT_REQUIRED from vmapbuf().	2004-07-18 04:57:49 +00:00
Robert Watson	2260c03d77	Drop Giant and acquire the UNIX domain socket subsystem lock a bit earlier in unp_connect() so that vp->v_socket can't change between our copying its value to a local variable and later use of that variable. This may have been responsible for a panic during shutdown that I experienced, involving the simultaneous closing of a listen socket by rpcbind and a new connection being made to rpcbind by mountd.	2004-07-18 01:29:43 +00:00
David Xu	c3d88cbab8	Fix typo.	2004-07-17 23:15:41 +00:00
David Malone	e140eb430c	Add a kern_setsockopt and kern_getsockopt which can read the option values from either user land or from the kernel. Use them for [gs]etsockopt and to clean up some calls to [gs]etsockopt in the Linux emulation code that uses the stackgap.	2004-07-17 21:06:36 +00:00
John Baldwin	52eb84641d	- Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking. - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).	2004-07-16 21:04:55 +00:00
John Baldwin	d3373e371b	Whitespace fix.	2004-07-16 21:01:52 +00:00
John Baldwin	6dbc085016	Improve readability a bit by changing some code at the end of a function that did: if (foo) return else blah to just do the simpler if (!foo) blah instead.	2004-07-16 21:00:50 +00:00
Colin Percival	24283cc01b	Add a SUSER_RUID flag to suser_cred. This flag indicates that we want to check if the real user is the superuser (vs. the normal behaviour, which checks the effective user). Reviewed by: rwatson	2004-07-16 15:57:16 +00:00
Robert Watson	dad7b41a9b	When entering soclose(), assert that SS_NOFDREF is not already set.	2004-07-16 00:37:34 +00:00
Poul-Henning Kamp	672c05d49c	Preparation commit for the tty cleanups that will follow in the near future: rename ttyopen() -> tty_open() and ttyclose() -> tty_close(). We need the ttyopen() and ttyclose() for the new generic cdevsw functions for tty devices in order to have consistent naming.	2004-07-15 20:47:41 +00:00
Poul-Henning Kamp	3e019deaed	Do a pass over all modules in the kernel and make them return EOPNOTSUPP for unknown events. A number of modules return EINVAL in this instance, and I have left those alone for now and instead taught MOD_QUIESCE to accept this as "didn't do anything".	2004-07-15 08:26:07 +00:00
Alfred Perlstein	bb5faea34f	Cleanup shutdown output.	2004-07-15 08:01:00 +00:00
Alfred Perlstein	da6303bacc	Tidy up system shutdown.	2004-07-15 04:29:48 +00:00
Alfred Perlstein	a88295bb83	Disable SIGIO for now, leave a comment as to why it's busted and hard to fix.	2004-07-15 03:49:52 +00:00
Nate Lawson	8916adb1c9	Clean up the output on reboot by keeping completion messages on the same line as the announcement. Someone should probably update the "buffers remaining" message since we now no longer should have any buffers remaining at that point.	2004-07-15 03:20:08 +00:00
Poul-Henning Kamp	e2ad640e13	A module with no modevent function gets modevent_nop() as default. Until now the function has just returned zero for any event, but that is downright wrong for MOD_UNLOAD and not very useful for any future events we add where it may be crucial to be able to tell if the event was unhandled or successful. Change the function to return as follows: MOD_LOAD -> 0 MOD_UNLOAD -> EBUSY anything else -> EOPNOTSUPP	2004-07-14 22:37:36 +00:00
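The return mapping described in the commit above can be sketched in userland C roughly as follows. This is only an illustration of the stated mapping, not the kernel source: the enum and the function name sketch_modevent_nop() are hypothetical stand-ins, and only the errno values come from <errno.h>.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's module event types. */
enum sketch_modevent {
	SK_MOD_LOAD,
	SK_MOD_UNLOAD,
	SK_MOD_SHUTDOWN,
	SK_MOD_QUIESCE
};

/*
 * Default handler for a module without its own modevent function,
 * following the mapping given in the commit message:
 * load succeeds, unload is refused, anything else is "not handled".
 */
static int
sketch_modevent_nop(enum sketch_modevent what)
{
	switch (what) {
	case SK_MOD_LOAD:
		return (0);
	case SK_MOD_UNLOAD:
		return (EBUSY);
	default:
		return (EOPNOTSUPP);
	}
}
```

Returning EOPNOTSUPP rather than 0 for unknown events lets a caller distinguish "handled successfully" from "never looked at".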
Christian S.J. Peron	ed6c545cf0	In addition to the real user ID check, do an explicit jail check to ensure that the caller is not prison root. The intention is to fix file descriptor creation so that prison root can not use the last remaining file descriptors. This privilege should be reserved for non-jailed root users. Approved by: bmilekic (mentor)	2004-07-14 19:04:31 +00:00
Alfred Perlstein	67543ab1e3	Make FIOASYNC, FIOSETOWN and FIOGETOWN work on kqueues.	2004-07-14 07:02:03 +00:00
John Baldwin	6942d4339e	Set TDF_NEEDRESCHED when a higher priority thread is scheduled in sched_add() rather than just doing it in sched_wakeup(). The old ithread preemption code used to set NEEDRESCHED unconditionally if it didn't preempt which masked this bug in SCHED_4BSD. Noticed by: jake Reported by: kensmith, marcel	2004-07-13 20:49:13 +00:00
Poul-Henning Kamp	65a311fcb2	Give kldunload a -f(orce) argument. Add a MOD_QUIESCE event for modules. This should return error (EBUSY) if the module is in use. MOD_UNLOAD should now only fail if it is impossible (as opposed to inconvenient) to unload the module. Valid reasons are memory references into the module which cannot be tracked down and eliminated. When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is not given, MOD_QUIESCE failing will also prevent the unload. For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as success. Document that modules should return EOPNOTSUPP for unknown events.	2004-07-13 19:36:59 +00:00
Poul-Henning Kamp	1a946b9fef	Add kldunloadf() system call. Stay tuned for following commit messages.	2004-07-13 19:35:11 +00:00
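The unload rules in the MOD_QUIESCE commit above might be summarized by a small decision function like the one below. It is a sketch under stated assumptions: sketch_unload_decision() is a hypothetical name, and it takes the results of the two events as plain arguments instead of dispatching them to a module as the kernel would.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether an unload goes ahead.
 *   quiesce_error: what the module returned for MOD_QUIESCE
 *   unload_error:  what the module would return for MOD_UNLOAD
 *   force:         true when the force flag was given
 * Returns 0 if the module ends up unloaded, otherwise an errno value.
 */
static int
sketch_unload_decision(int quiesce_error, int unload_error, bool force)
{
	/* Older modules do not know MOD_QUIESCE; treat that as success. */
	if (quiesce_error == EOPNOTSUPP)
		quiesce_error = 0;

	/* A busy module blocks the unload unless the caller forces it. */
	if (quiesce_error != 0 && !force)
		return (quiesce_error);

	/* MOD_UNLOAD failing always abandons the unload, forced or not. */
	return (unload_error);
}
```

For example, sketch_unload_decision(EBUSY, 0, false) reports EBUSY, while the same call with force set to true lets the unload proceed.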
Poul-Henning Kamp	49bddf0c9f	fix compilation.	2004-07-13 16:33:38 +00:00
Colin Percival	65bba83fef	Replace "uid != 0" with "suser(td->td_ucred) != 0" when checking if we've hit the maximum number of processes. The last ten processes are reserved for the non-jailed superuser.	2004-07-13 13:10:07 +00:00
David Xu	4d47dc5549	Add code to support debugging threaded process. 1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel thread current user thread is running on. Add tm_dflags into kse_thr_mailbox, the flags are written by debugger, they tell UTS and kernel what should be done when the process is being debugged, currently there are two flags, TMDF_SSTEP and TMDF_DONOTRUNUSER. TMDF_SSTEP is used to tell kernel to turn on single stepping, or turn off if it is not set. TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall whenever possible, to UTS, it means do not run the user thread until debugger clears it, this behaviour is necessary because gdb wants to resume only one thread when the thread's pc is at a breakpoint, and thread needs to go forward, in order to avoid other threads sneaking past the breakpoints, it needs to remove breakpoint, only wants one thread to go. Also, add km_lwp to kse_mailbox, the lwp id is copied to kse_thr_mailbox at context switch time when process is not being debugged, so when process is attached, debugger can map kernel thread to user thread. 2. Add p_xthread to proc structure and td_xsig to thread structure. p_xthread is used by a thread when it wants to report event to debugger, every thread can set the pointer, especially, when it is used in ptracestop, it is the last thread reporting event that will win the race. Every thread has a td_xsig to exchange signal with debugger, thread uses TDF_XSIG flag to indicate it is reporting signal to debugger, if the flag is not cleared, thread will keep retrying until it is cleared by debugger, p_xthread may be used by debugger to indicate CURRENT thread. The p_xstat is still in proc structure to keep wait() working, in future, we may just use td_xsig. 3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend a thread.
When process stops, debugger can set the flag for thread, thread will check the flag in thread_suspend_check, enters a loop, unless it is cleared by debugger, process is detached or process is exiting. The flag is also checked in ptracestop, so debugger can temporarily suspend a thread even if the thread wants to exchange signal. 4. Currently, in ptrace, we always resume all threads, but if a thread already has a TDF_DBSUSPEND flag set by debugger, it won't run. Encouraged by: marcel, julian, deischen	2004-07-13 07:33:40 +00:00
David Xu	ef9457becb	Implement following commands: PT_CLEARSTEP, PT_SETSTEP, PT_SUSPEND PT_RESUME, PT_GETNUMLWPS, PT_GETLWPLIST.	2004-07-13 07:25:24 +00:00
David Xu	cbf4e354ec	Add code to support debugging threaded process. 1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel thread current user thread is running on. Add tm_dflags into kse_thr_mailbox, the flags are written by debugger, they tell UTS and kernel what should be done when the process is being debugged, currently there are two flags, TMDF_SSTEP and TMDF_DONOTRUNUSER. TMDF_SSTEP is used to tell kernel to turn on single stepping, or turn off if it is not set. TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall whenever possible, to UTS, it means do not run the user thread until debugger clears it, this behaviour is necessary because gdb wants to resume only one thread when the thread's pc is at a breakpoint, and thread needs to go forward, in order to avoid other threads sneaking past the breakpoints, it needs to remove breakpoint, only wants one thread to go. Also, add km_lwp to kse_mailbox, the lwp id is copied to kse_thr_mailbox at context switch time when process is not being debugged, so when process is attached, debugger can map kernel thread to user thread. 2. Add p_xthread to proc structure and td_xsig to thread structure. p_xthread is used by a thread when it wants to report event to debugger, every thread can set the pointer, especially, when it is used in ptracestop, it is the last thread reporting event that will win the race. Every thread has a td_xsig to exchange signal with debugger, thread uses TDF_XSIG flag to indicate it is reporting signal to debugger, if the flag is not cleared, thread will keep retrying until it is cleared by debugger, p_xthread may be used by debugger to indicate CURRENT thread. The p_xstat is still in proc structure to keep wait() working, in future, we may just use td_xsig. 3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend a thread.
When process stops, debugger can set the flag for thread, thread will check the flag in thread_suspend_check, enters a loop, unless it is cleared by debugger, process is detached or process is exiting. The flag is also checked in ptracestop, so debugger can temporarily suspend a thread even if the thread wants to exchange signal. 4. Currently, in ptrace, we always resume all threads, but if a thread already has a TDF_DBSUSPEND flag set by debugger, it won't run. Encouraged by: marcel, julian, deischen	2004-07-13 07:20:10 +00:00
Alan Cox	ce8da3091f	Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.) Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().	2004-07-13 02:49:22 +00:00
David Malone	dcee93dcf9	Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a better name. I have a kern_[sg]etsockopt which I plan to commit shortly, but the arguments to these functions will be quite different from so_setsockopt. Approved by: alfred	2004-07-12 21:42:33 +00:00
Mike Makonnen	c21e3b38bd	writers must hold both sched_lock and the process lock; therefore, readers need only obtain the process lock.	2004-07-12 15:28:31 +00:00
Alfred Perlstein	f257b7a54b	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
David Xu	507b03186a	Change kse_switchin to accept kse_thr_mailbox pointer, the syscall will be used heavily in debugging KSE threads. This breaks libpthread on IA64, but because libpthread was not in 5.2.1 release, I would like to change it so we needn't introduce another syscall.	2004-07-12 07:39:20 +00:00
Alfred Perlstein	d58d3648dd	Use SO_REUSEADDR and SO_REUSEPORT when reconnecting NFS mounts. Tune the timeout from 5 seconds to 12 seconds. Provide a sysctl to show how many reconnects the NFS client has done. Seems to fix IPv6 from: kuriyama	2004-07-12 06:22:42 +00:00
Marcel Moolenaar	fbc3247d81	Implement the PT_LWPINFO request. This request can be used by the tracing process to obtain information about the LWP that caused the traced process to stop. Debuggers can use this information to select the thread currently running on the LWP as the current thread. The request has been made compatible with NetBSD for as much as possible. This implementation differs from NetBSD in the following ways: 1. The data argument is allowed to be smaller than the size of the ptrace_lwpinfo structure known to the kernel, but not 0. This is opposite to what NetBSD allows. The reason for this is that we can extend the structure without affecting older binaries. 2. On NetBSD the tracing process is to set the pl_lwpid field to the Id of the LWP it wants information of. We don't do that. Our ptrace interface allows passing the LWP Id instead of the PID. The tracing process is to set the PID to the LWP Id it wants information of. 3. When the PID is actually the PID of the tracing process, this request returns the information about the LWP that caused the process to stop. This was the whole purpose of the request in the first place. When the traced process has exited, this request will return the LWP Id 0, indicating that the process state is not the result of an event specific to a LWP.	2004-07-12 05:07:50 +00:00
Alfred Perlstein	7ae8ce5df1	Dump the actual bad values when this assertion is tripped.	2004-07-12 04:13:38 +00:00
Marcel Moolenaar	3bcd2440db	Make kdb_dbbe_select() available as an interface function. This allows changing the backend from outside the KDB frontend. For example from within a backend. Rewrite kdb_sysctl_current to make use of this function as well.	2004-07-12 01:15:55 +00:00
Robert Watson	a294c3664f	Use sockbuf_pushsync() to synchronize stack and socket buffer state in soreceive() after removing an MT_SONAME mbuf from the head of the socket buffer. When processing MT_CONTROL mbufs in soreceive(), first remove all of the MT_CONTROL mbufs from the head of the socket buffer to a local mbuf chain, then feed them into dom_externalize() as a set, which both avoids thrashing the socket buffer lock when handling multiple control mbufs, and also avoids races with other threads acting on the socket buffer when the socket buffer mutex is released to enter the externalize code. Existing races that might occur if the protocol externalize method blocked during processing have also been closed. Now that we synchronize socket buffer and stack state following modifications to the socket buffer, turn the manual synchronization that previously followed control mbuf processing with a set of assertions. This can eventually be removed. The soreceive() code is now substantially more MPSAFE.	2004-07-11 23:13:14 +00:00
Robert Watson	b7562e178c	Add sockbuf_pushsync(), an inline function that, following a change to the head of the mbuf chains in a socket buffer, re-synchronizes the cache pointers used to optimize socket buffer appends. This will be used by soreceive() before dropping socket buffer mutexes to make sure a consistent version of the socket buffer is visible to other threads. While here, update copyright to account for substantial rewrite of much socket code required for fine-grained locking.	2004-07-11 22:59:32 +00:00
Poul-Henning Kamp	9ef295b7e0	Better descriptions of the cdev malloc class and mutex.	2004-07-11 19:26:43 +00:00
Robert Watson	d861372b14	Add additional annotations to soreceive(), documenting the effects of locking on 'nextrecord' and concerns regarding potentially inconsistent or stale use of socket buffer or stack fields if they aren't carefully synchronized whenever the socket buffer mutex is released. Document that the high-level sblock() prevents races against other readers on the socket. Also document the 'type' logic as to how soreceive() guarantees that it will only return one of normal data or inline out-of-band data.	2004-07-11 18:29:47 +00:00
Doug Rabson	67f8f14a6e	Expand and rewrite documentation using doxygen markup so that we can generate funky web pages from it.	2004-07-11 16:17:42 +00:00
Marcel Moolenaar	a8bfba1a27	Fix braino: Make sure there is a current backend before we return its name in the debug.kdb.current sysctl. All other dereferences are properly guarded, but this one was overlooked. Reported by: Morten Rodal (morten at rodal dot no)	2004-07-11 15:22:43 +00:00
Poul-Henning Kamp	911dbd84c7	Introduce ttygone() which indicates that the hardware is detached. Move dtrwait logic to the generic TTY level.	2004-07-11 15:18:39 +00:00
Robert Watson	0014b343e0	In the 'dontblock' section of soreceive(), assert that the mbuf on hand ('m') is in fact the first mbuf in the receive socket buffer.	2004-07-11 01:44:12 +00:00
Robert Watson	5e44d93ffc	Break out non-inline out-of-band data receive code from soreceive() and put it in its own helper function soreceive_rcvoob().	2004-07-11 01:34:34 +00:00
Robert Watson	a04b09398c	Assign pointers values of NULL rather than 0 in soreceive().	2004-07-11 01:22:40 +00:00
Marcel Moolenaar	32240d082c	Update for the KDB framework: o Call kdb_enter() instead of Debugger().	2004-07-10 21:47:53 +00:00
Robert Watson	7e17bc9f26	When the MT_SONAME mbuf is popped off of a receive socket buffer associated with a PR_ADDR protocol, make sure to update the m_nextpkt pointer of the new head mbuf on the chain to point to the next record. Otherwise, when we release the socket buffer mutex, the socket buffer mbuf chain may be in an inconsistent state.	2004-07-10 21:43:35 +00:00
Marcel Moolenaar	82ebaee7a3	Update for the KDB framework: o Check kdb_active instead of db_active and do so unconditionally.	2004-07-10 21:43:23 +00:00
Marcel Moolenaar	eba21ad501	Update for the KDB framework: o Make debugging code conditional upon KDB instead of DDB. o s/WITNESS_DDB/WITNESS_KDB/g o s/witness_ddb/witness_kdb/g o Rename the debug.witness_ddb sysctl to debug.witness_kdb. o Call kdb_backtrace() instead of backtrace(). o Call kdb_enter() instead of Debugger(). o Assert kdb_active instead of db_active.	2004-07-10 21:42:16 +00:00
Marcel Moolenaar	2c3490b1a8	Update for the KDB framework: o Call kdb_backtrace() instead of backtrace().	2004-07-10 21:38:22 +00:00
Marcel Moolenaar	ecb01c64d2	Make the GDB dynamic linker hooks (r_debug_state) conditional upon GDB instead of DDB.	2004-07-10 21:37:30 +00:00
Marcel Moolenaar	2d50560abc	Update for the KDB framework: o Make debugging code conditional upon KDB instead of DDB. o Call kdb_enter() instead of Debugger(). o Call kdb_backtrace() instead of db_print_backtrace() or backtrace(). kern_mutex.c: o Replace checks for db_active with checks for kdb_active and make them unconditional. kern_shutdown.c: o s/DDB_UNATTENDED/KDB_UNATTENDED/g o s/DDB_TRACE/KDB_TRACE/g o Save the TID of the thread doing the kernel dump so the debugger knows which thread to select as the current when debugging the kernel core file. o Clear kdb_active instead of db_active and do so unconditionally. o Remove backtrace() implementation. kern_synch.c: o Call kdb_reenter() instead of db_error().	2004-07-10 21:36:01 +00:00
Marcel Moolenaar	cbc174356c	Introduce the KDB debugger frontend. The frontend provides a framework in which multiple (presumably different) debugger backends can be configured and which provides basic services to those backends. Besides providing services to backends, it also serves as the single point of contact for any and all code that wants to make use of the debugger functions, such as entering the debugger or handling of the alternate break sequence. For this purpose, the frontend has been made non-optional. All debugger requests are forwarded or handed over to the current backend, if applicable. Selection of the current backend is done by the debug.kdb.current sysctl. A list of configured backends can be obtained with the debug.kdb.available sysctl. One can enter the debugger by writing to the debug.kdb.enter sysctl.	2004-07-10 18:40:12 +00:00
Poul-Henning Kamp	552afd9c12	Clean up and wash struct iovec and struct uio handling. Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees. Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate. Add cloneuio() which returns a malloc'ed copy. Caller frees. Use them throughout.	2004-07-10 15:42:16 +00:00
Robert Watson	5c2b7a2273	Now socket buffer locks are being asserted at higher code blocks in soreceive(), remove some leaf assertions that are redundant.	2004-07-10 04:38:06 +00:00
Robert Watson	32775a01da	Assert socket buffer lock at strategic points between sections of code in soreceive() to confirm we've moved from block to block properly maintaining locking invariants.	2004-07-10 03:47:15 +00:00
John Baldwin	776b99ee1a	Check the lock lists to see if they are empty directly rather than assigning a pointer to the list and then dereferencing the pointer as a second step. When the first spin lock is acquired, curthread is not in a critical section so it may be preempted and would end up using another CPU's lock list instead of its own. When this code was in witness_lock() this sequence was safe as curthread was in a critical section already since witness_lock() is called after the lock is acquired. Tested by: Daniel Lang dl at leo.org	2004-07-09 17:46:27 +00:00
Dag-Erling Smørgrav	520df27692	Cosmetic adjustment to previous commit: name the second argument to sbuf_bcat() and sbuf_bcpy() "buf" rather than "data".	2004-07-09 11:37:44 +00:00
Dag-Erling Smørgrav	d751f0a935	Have sbuf_bcat() and sbuf_bcpy() take a const void * instead of a const char *, since callers are likely to pass in pointers to all kinds of structs and whatnot.	2004-07-09 11:35:30 +00:00
Alan Cox	0049f8b27b	Eliminate struct shm_handle. It is an unnecessary level of indirection to a vm_object.	2004-07-09 05:28:38 +00:00
Robert Watson	6ec70e64c6	Remove spl()'s from do_sendfile().	2004-07-09 01:46:03 +00:00
John Baldwin	63fcce68f1	- Move contents of sched_add() into a sched_add_internal() function that takes an argument to specify if it should preempt or not. Don't preempt when sched_add_internal() is called from kseq_idled() or kseq_assign() as in those cases we are about to call mi_switch() anyways. Also, doing so during the first context switch on an AP leads to a NULL pointer deref because curthread is NULL. - Reenable preemption for ULE. Submitted by: Taku YAMAMOTO taku at tackymt.homeip.net	2004-07-08 21:45:04 +00:00
Alfred Perlstein	057589c485	fixup sysctl by fsid node	2004-07-08 06:11:36 +00:00
Alfred Perlstein	14d543ddfb	style(9)	2004-07-07 07:00:02 +00:00
Alfred Perlstein	81d16e2d64	do the vfsstd thing instead of messing up our VFS_SYSCTL macro.	2004-07-07 06:58:29 +00:00
Peter Edwards	0f01586867	Fix bug introduced in rev 1.434: When avoiding the zeroing of "bogus_page" when it appears in a buf, be sure to advance the pointers into the data for successive pages. The bug caused file corruption when read(2)ing from a "hole" in a file where a previous page of the read block had already been faulted in: fsx tripped up on this pretty quickly. The particular access pattern is probably pretty unusual, so other applications probably wouldn't have had problems, but you'd never know. Reviewed By: alc@	2004-07-06 23:40:40 +00:00
Alfred Perlstein	1ea6061793	Use vfs_suser() where appropriate.	2004-07-06 09:39:32 +00:00
Alfred Perlstein	ea0104b032	Introduce vfs_suser(), used to test if a user should have special privs for a mount.	2004-07-06 09:37:43 +00:00
Alfred Perlstein	c713aaaeca	NFS mobility PHASE I, II & III (phases IV and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.	2004-07-06 09:12:03 +00:00
Robert Watson	df623e3c2f	Temporarily disable preemption in SCHED_ULE due to reported panics and hangs due to recent preemption changes. This change appears to remove the panic that I was running into, but at the cost of increasing ithread scheduling latency, and as such is a temporary band-aid until jhb has a chance to resolve the ule<->preemption interaction that is the source of the problem. If it doesn't fix the problem for others-- sorry!	2004-07-06 05:57:29 +00:00
Don Lewis	27875d9c88	Unconditionally set last_work_seen while in the SYNCER_RUNNING state so that last_work_seen has a reasonable value at the transition to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened to be zero at the time. Initialize last_work_seen to zero as a safety measure in case the syncer never ran in the SYNCER_RUNNING state. Tested by: phk	2004-07-05 21:32:01 +00:00
Robert Watson	6a72b225b7	Drop the socket buffer lock around a call to m_copym() with M_TRYWAIT. A subset of locking changes to soreceive() in the queue for merging. Bumped into by: Willem Jan Withagen <wjw@withagen.nl>	2004-07-05 19:29:33 +00:00
Don Lewis	faf1b66d1d	Rework syncer termination code: Speed up the syncer when shutting down by sleeping for a shorter period of time instead of cranking up rushjob and using the normal one second sleep. Skip empty worklist slots when shutting down to avoid lengthy intervals of inactivity. Give I/O more time to complete between steps by not speeding the syncer quite as much. Terminate the syncer after one full pass through the worklist plus one second with the worklist containing nothing but syncer vnodes. Print an indication of shutdown progress to the console. Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist to be monitored.	2004-07-05 01:07:33 +00:00
Poul-Henning Kamp	c555963fd1	Give synthetic root filesystem device vnodes a v_bsize of DEV_BSIZE.	2004-07-04 22:33:22 +00:00
Alfred Perlstein	2d1dca73ee	Pass the operation in with the fsidctl. Remove some fsidctls that we will not be using. Correct prototypes for fs sysctls.	2004-07-04 20:21:58 +00:00
Poul-Henning Kamp	7f6599fec6	Make the last commit handle non-phk root devices better.	2004-07-04 19:42:25 +00:00
Stefan Farfeleder	5908d366fb	Consistently use __inline instead of __inline__ as the former is an empty macro in <sys/cdefs.h> for compilers without support for inline.	2004-07-04 16:11:03 +00:00
Poul-Henning Kamp	1cbb1e02c4	Blocksize for I/O should be a property of the vnode and not found by groping around in the vnode's surroundings when we allocate a block. Assign a blocksize when we create a vnode, and yell a warning (and ignore it) if we got the wrong size. Please email all such warnings to me.	2004-07-04 12:49:04 +00:00
Alfred Perlstein	94ed9c8af5	Introduce a new kevent filter. EVFILT_FS that will be used to signal generic filesystem events to userspace. Currently only mount and unmount of filesystems are signalled. Soon to be added, up/down status of NFS. Introduce a sysctl node used to route requests to/from filesystems based on filesystem ids. Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/ entrypoint by the sysctl code to change individual filesystems.	2004-07-04 10:52:54 +00:00
Alfred Perlstein	903ac7c219	Revision 1.496 would not boot on my system due to ffs_mount -> bdevvp -> getnewvnode(..., mp = NULL, ...) -> insmntqueue(vp, mp = NULL) -> KASSERT -> panic Make getnewvnode() only call insmntqueue() if the mountpoint parameter is not NULL.	2004-07-04 10:19:15 +00:00
Poul-Henning Kamp	e3c5a7a4dd	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On an SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously gets to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which do just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.	2004-07-04 08:52:35 +00:00
Poul-Henning Kamp	cfa5e80af8	Remove stale comment	2004-07-03 19:37:06 +00:00
Poul-Henning Kamp	279f949ee5	Add NULL arg to mi_switch() call to stop kernel compiles from breaking.	2004-07-03 16:57:51 +00:00
John Baldwin	b5cbda5055	Add a NULL param to an mi_switch() that I missed. Reported by: Jung-uk Kim jkim at niksun dot com	2004-07-03 02:38:03 +00:00
Bosko Milekic	abdb4e5d01	Fix SCHED_ULE build on SMP. The previous revision (1.110) introduced a KSE_CAN_MIGRATE() invocation with one argument missing (class). Either this is a genuine forget or it crept in from JHB's repo where he may have modified it. If it's the latter then it may require more attention. For now fix the make depend.	2004-07-03 01:19:46 +00:00
Marcel Moolenaar	8b44a2e2c9	Unbreak build for the !PREEMPTION case: don't define variables that aren't used in that case.	2004-07-03 00:57:43 +00:00
John Baldwin	0c0b25ae91	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switched to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)	2004-07-02 20:21:44 +00:00
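The three outcomes that the native preemption commit above describes for maybe_preempt() can be sketched as a pure decision function. This is an illustration only: the names below are hypothetical, the "safe to preempt" test is reduced to a boolean the caller supplies, and the sketch assumes the FreeBSD convention that a numerically lower priority value means a more important thread.

```c
#include <stdbool.h>

/* Hypothetical labels for what happens to the newly runnable thread. */
enum sketch_outcome {
	SK_RUNQ,		/* not preempting: just put it on the run queue */
	SK_RUNQ_OWEPREEMPT,	/* queue it and record a deferred preemption */
	SK_SWITCH		/* switch to it now, bypassing the run queue */
};

/*
 * new_pri and cur_pri are priorities of the new and the current thread
 * (assumption: lower value wins). nested_critical says whether the
 * current thread is inside a nested critical section.
 */
static enum sketch_outcome
sketch_maybe_preempt(bool safe, int new_pri, int cur_pri, bool nested_critical)
{
	/* Unsafe, or the new thread is not more important: no preemption. */
	if (!safe || new_pri >= cur_pri)
		return (SK_RUNQ);

	/* Preemption is owed but must wait until the section is exited. */
	if (nested_critical)
		return (SK_RUNQ_OWEPREEMPT);

	/* Otherwise the new thread runs immediately. */
	return (SK_SWITCH);
}
```

In the deferred case the commit message says the owed preemption is carried out when the outermost critical section is exited; that step is outside this sketch.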
John Baldwin	bf0acc273a	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to switch to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.	2004-07-02 19:09:50 +00:00
David Xu	f3b929bf42	Allow ptrace to deal with lwpid. Reviewed by: marcel	2004-07-02 09:19:22 +00:00
Alfred Perlstein	95f004dccd	We allocate an array of pointers to the global file table while not holding the filelist_lock. This means the filelist can change size while allocating. Detect this race and retry the allocation.	2004-07-02 07:40:10 +00:00
John Baldwin	a3a7017895	Tidy up uprof locking. Mostly the fields are protected by both the proc lock and sched_lock so they can be read with either lock held. Document the locking as well. The one remaining bogosity is that pr_addr and pr_ticks should be per-thread but profiling of multithreaded apps is currently undefined.	2004-07-02 03:50:48 +00:00
John Baldwin	16f9f20579	- Assert that any process that has statclock called on it has both a stats structure and a vmspace as this should always be true rather than checking the always true condition in an if statement. - Remove never-false check: if ((ru = &pstats->p_ru) != NULL) - Remove pstats variable that is only used once and inline its one use instead.	2004-07-02 03:48:09 +00:00
Marcel Moolenaar	cd28f17da2	Change the thread ID (thr_id_t) used for 1:1 threading from being a pointer to the corresponding struct thread to the thread ID (lwpid_t) assigned to that thread. The primary reason for this change is that libthr now internally uses the same ID as the debugger and the kernel when referencing to a kernel thread. This allows us to implement the support for debugging without additional translations and/or mappings. To preserve the ABI, the 1:1 threading syscalls, including the umtx locking API have not been changed to work on a lwpid_t. Instead the 1:1 threading syscalls operate on long and the umtx locking API has not been changed except for the contested bit. Previously this was the least significant bit. Now it's the most significant bit. Since the contested bit should not be tested by userland, this change is not expected to be visible. Just to be sure, UMTX_CONTESTED has been removed from <sys/umtx.h>. Reviewed by: mtm@ ABI preservation tested on: i386, ia64	2004-07-02 00:40:07 +00:00
Marcel Moolenaar	c2589102b0	Regen.	2004-07-02 00:38:56 +00:00
Don Lewis	e06500dde5	When shutting down the syncer kernel thread, first tell it to run faster and iterate over its work list a few times in an attempt to empty the work list before the syncer terminates. This leaves fewer dirty blocks to be written at the "syncing disks" stage and keeps the "giving up on N buffers" problem from being triggered by the presence of a large soft updates work list at system shutdown time. The downside is that the syncer takes noticeably longer to terminate. Tested by: "Arjan van Leeuwen" <avleeuwen AT piwebs DOT com> Approved by: mckusick	2004-07-01 23:59:19 +00:00
Warner Losh	da35daffaf	Add ability to set start/end for rman	2004-07-01 16:22:10 +00:00
John Baldwin	39981fed82	Trim a few things from the dmesg output and stick them under bootverbose to cut down on the clutter including PCI interrupt routing, MTRR, pcibios, etc. Discussed with: USENIX Cabal	2004-07-01 07:46:29 +00:00
Warner Losh	0363a12688	Hide struct resource and struct rman. You must define __RMAN_RESOURCE_VISIBLE to see inside these now. Reviewed by: dfr, njl (not njr)	2004-06-30 16:54:10 +00:00
Warner Losh	37b4e4f471	Include more information about the device in the devadded and devremoved events. This reduces the races around these events. We now include the pnp info in both. This lets one do more interesting things with devd on device insertion. Submitted by: Bernd Walter	2004-06-30 02:46:25 +00:00
John Baldwin	01bd10e163	Oops, this didn't make it into my submit before I committed: Defer creation of the sysctl tree for the turnstile profiling stats until a SI_SUB_LOCK sysinit. Doing it in init_turnstiles() is too early as it is called before mi_startup().	2004-06-29 03:48:49 +00:00
Peter Wemm	5b201fdcaa	Wrap long line.	2004-06-29 03:13:54 +00:00
John Baldwin	ef0ebfc351	Add two new kernel options to allow rudimentary profiling of the internal hash tables used in the sleep queue and turnstile code. Each option adds a sysctl tree under debug containing the maximum depth of any bucket in the hash table as well as a separate node for each bucket (or chain) containing the current depth and maximum depth for that bucket.	2004-06-29 02:30:12 +00:00
John Baldwin	a5471e4ef4	Remove the signal_caught argument from sleepq_timedwait() as it was effectively always zero.	2004-06-28 18:57:06 +00:00
John Baldwin	bd83e879fd	- Execute all of the tasks on the taskqueue during taskqueue_free() after the queue has been removed from the global taskqueue_queues list. This removes the need for the draining queue hack. - Allow taskqueue_run() to be called with the taskqueue mutex held. It can still be called without the lock for API compatibility. In that case it will acquire the lock internally. - Don't lock the individual queue mutex in taskqueue_find() until after the strcmp as the global queues mutex is sufficient for the strcmp. - Simplify taskqueue_thread_loop() now that it can hold the lock across taskqueue_run(). Submitted by: bde (mostly)	2004-06-28 16:28:23 +00:00
John Baldwin	c086588f32	Adjust the priority of the idle threads to be the lowest possible priority. This is just a cosmetic nit as the idle thread priorities aren't used by the schedulers. Reported by: bde	2004-06-28 16:19:50 +00:00
Warner Losh	29b95d5a7e	Turns out that jhb didn't really like this. And nate pointed out that it wasn't a good idea to have the test for NULL on only a limited subset. Go back because I'm not sure adding NULL to all the others is a good idea.	2004-06-28 03:40:23 +00:00
Warner Losh	d5ca7f4f2b	Allow dev to be NULL and assume that a device is not alive or not attached. Reviewed by: njl(?) and jhb	2004-06-28 02:24:04 +00:00
Pawel Jakub Dawidek	46e3b1cbe7	Add two missing includes and remove two unneeded ones. This is quite a serious fix, because even with the MAC framework compiled in, MAC entry points in those two files were simply ignored.	2004-06-27 09:03:22 +00:00
Robert Watson	7717cf07f8	Acquire the socket buffer lock when calling unp_scan() on so->so_rcv.sb_mb to prevent the mbuf chain from changing during the scan.	2004-06-27 03:29:25 +00:00
Robert Watson	a290574663	Add a new global mutex, so_global_mtx, which protects the global variables so_gencnt, numopensockets, and the per-socket field so_gencnt. Annotate that this might be better done with atomic operations. Annotate what accept_mtx protects.	2004-06-27 03:22:15 +00:00
Robert Watson	1e4d7da707	Reduce the number of unnecessary unlock-relocks on socket buffer mutexes associated with performing a wakeup on the socket buffer: - When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append(). - When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/receive flags and dropping data, and in uipc_rcvd() when adjusting back-pressure on the sockets. For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduced mutex costs.	2004-06-26 19:10:39 +00:00
Marcel Moolenaar	247aba2474	Allocate TIDs in thread_init() and deallocate them in thread_fini(). The overhead of unconditionally allocating TIDs (and likewise, unconditionally deallocating them), is amortized across multiple thread creations by the way UMA makes it possible to have type-stable storage. Previously the cost was kept down by having threads created as part of a fork operation use the process' PID as the TID. While this had some nice properties, it also introduced complexity in the way TIDs were allocated. Most importantly, by using the type-stable storage that UMA gives us this was also unnecessary. This change affects how core dumps are created and in particular how the PRSTATUS notes are dumped. Since we don't have a thread with a TID equalling the PID, we now need a different way to preserve the old and previous behavior. We do this by having the given thread (i.e. the thread passed to the core dump code in td) dump its state first and fill in pr_pid with the actual PID. All other threads will have pr_pid contain their TIDs. The upshot of all this is that the debugger will now likely select the right LWP (=TID) as the initial thread. Credits to: julian@ for spotting how we can utilize UMA. Thanks to: all who provided julian@ with test results.	2004-06-26 18:58:22 +00:00
Robert Watson	11c40a39b6	Replace comment on spl state when calling soabort() with a comment on locking state. No socket locks should be held when calling soabort() as it will call into protocol code that may acquire socket locks.	2004-06-26 17:12:29 +00:00
Poul-Henning Kamp	cb9ea5f4cb	Pick the hotchar out of the tty structure instead of caching private copies. No current line disciplines have a dynamically changing hotchar, and expecting to receive anything sensible during a change in ldisc is insane so no locking of the hotchar field is necessary.	2004-06-26 09:20:07 +00:00
Poul-Henning Kamp	4776c07426	Fix line discipline switching issues: If opening a new ldisc fails, we have to revert to TTYDISC which we know will successfully open rather than try the previous ldisc which might also fail to open. Do not let ldisc implementations muck about with ->t_line, and remove code which checks for reopens, it should never happen. Move ldisc->l_hotchar to tty->t_hotchar and have ldisc implementation initialize it in their open routines. Reset to zero when we enter TTYDISC. ("no" should really be -1 since zero could be a valid hotchar for certain old european mainframe protocols.)	2004-06-26 08:44:04 +00:00
Poul-Henning Kamp	ccfac9e40e	Gah! commit from wrong tree. Remove now unused variables from last commit.	2004-06-25 22:10:20 +00:00
Poul-Henning Kamp	950cce9b30	Retire the TIOC_REMOTE ioctl. It was added 22 years ago for emacs to use, but emacs gave up on it 17 years ago.	2004-06-25 21:54:49 +00:00
Robert Watson	a5993a9778	Release UNIX domain socket subsystem lock earlier -- don't need to hold it over free of unp_addr if we've already removed all references to unp.	2004-06-25 20:12:06 +00:00
Poul-Henning Kamp	e77b206f0e	Add two new methods to struct tty: One for manipulating BREAK condition and one for fiddling modem-control signals. Add generic code to deal with the relevant ioctls if these methods are present.	2004-06-25 10:24:10 +00:00
Robert Watson	4f3bf9b9b4	Don't cuddle else's so much as we removed additional parts of each block.	2004-06-24 17:22:29 +00:00
Robert Watson	5e11031e05	Remove temporary API bandage that allowed applications speaking the older API to list attributes on a file (zero-length attribute name) to function. extattr_list_*() are now the only available APIs to use when listing attributes.	2004-06-24 17:14:28 +00:00
Poul-Henning Kamp	075ef10234	#include <sys/serial.h>	2004-06-24 10:32:30 +00:00
Poul-Henning Kamp	98de21b633	Use CTASSERT to enforce the relationship between the new serial port modem definitions and the old definitions from ioctls.	2004-06-24 10:06:55 +00:00
Robert Watson	c6b93bf29a	Lock socket buffers when processing setting socket options SO_SNDLOWAT or SO_RCVLOWAT for read-modify-write.	2004-06-24 04:28:30 +00:00
Robert Watson	ad6b0efff5	Acquire socket lock in the "waiting for connection" loop in kern_connect(), replacing tsleep() with msleep() with the socket mutex.	2004-06-24 01:43:23 +00:00
Robert Watson	3f11a2f374	Introduce sbreserve_locked(), which asserts the socket buffer lock on the socket buffer having its limits adjusted. sbreserve() now acquires the lock before calling sbreserve_locked(). In soreserve(), acquire socket buffer locks across read-modify-writes of socket buffer fields, and calls into sbreserve/sbrelease; make sure to acquire in keeping with the socket buffer lock order. In tcp_mss(), acquire the socket buffer lock in the calling context so that we have atomic read-modify-write on buffer sizes.	2004-06-24 01:37:04 +00:00
Robert Watson	adb4cf0fbc	Slide socket buffer lock earlier in sopoll() to cover the call into selrecord(), setting up select and flagging the socket buffers as SB_SEL and setting up select under the lock.	2004-06-24 00:54:26 +00:00
Bruce M Simpson	a3146ff925	Fix an inconsistency in socket option propagation on accept(). Propagate the SS_NBIO flag from the parent socket to the child socket during an accept() operation. The file descriptor O_NONBLOCK flag would have been propagated already by the fflag assignment, and therefore would have been inconsistent with the underlying socket's so_state member. This makes accept() more closely adhere to the API contract we effectively outline in the manual page. Note also that Linux continues to differ here; O_NONBLOCK is not propagated. The other BSDs do propagate the flag, as does Solaris. The Single UNIX Specification does not offer specific advice on this issue. PR: kern/45733 Requested by: Jayanth Vijayaraghavan Reviewed by: rwatson	2004-06-22 23:58:09 +00:00
Lukas Ertl	9a98ae94ba	Fix a few spelling mistakes in comments and clean them up a bit.	2004-06-22 20:22:24 +00:00
Robert Watson	5282c61738	Regenerate after updating syscalls.master.	2004-06-22 04:36:25 +00:00
Robert Watson	2ed57081a7	Mark unlink() as MPSAFE as we now acquire Giant in the unlink() system call.	2004-06-22 04:34:55 +00:00
Robert Watson	9260798fd7	Acquire Giant in link() so that the system call can be marked MPSAFE. Don't want to acquire Giant in kern_link() since linux compat code performs actions requiring Giant prior to calling kern_link().	2004-06-22 04:34:05 +00:00
Robert Watson	7af72ad7b6	Rebuild following marking link() as MPSAFE.	2004-06-22 04:29:59 +00:00
Robert Watson	61d87ffdc0	Mark link() system call as MPSAFE.	2004-06-22 04:29:27 +00:00
Robert Watson	694b21cf7b	Acquire Giant in link() so that we can mark it as MSTD in syscalls.master. Don't want to do it in kern_link() since the Linux emulation code calls kern_link() after performing other actions requiring Giant.	2004-06-22 04:29:07 +00:00
Robert Watson	fea24c0a71	Remove spl's from uipc_socket to ease in merging.	2004-06-22 03:49:22 +00:00
Scott Long	36c6fd1c0f	Fix another typo in the previous commit.	2004-06-21 23:47:47 +00:00
Poul-Henning Kamp	ec66f15d14	Put the pre FreeBSD-2.x tty compat code under BURN_BRIDGES.	2004-06-21 22:57:16 +00:00
Scott Long	c38dd4b6bd	Fix typo that somehow crept into the previous commit	2004-06-21 22:42:46 +00:00
Kelly Yancey	de0a924120	Update previous commit to: * Obtain/release schedlock around calls to calcru. * Sort switch cases which do not cascade per style(9). * Sort local variables per style(9). * Remove "superfluous" whitespace. * Cleanup handling of NULL uap->tp in clock_getres(). It would probably be better to return EFAULT like clock_gettime() does by passing the pointer to copyout(), but I presume it was written to not fail on purpose in the original code. I'll defer to -standards on this one. Reported by: bde	2004-06-21 22:34:57 +00:00
Scott Long	dc09579417	Add the sysctl node 'kern.sched.name' that has the name of the scheduler currently in use. Move the 4bsd kern.quantum node to kern.sched.quantum for consistency.	2004-06-21 22:05:46 +00:00
Julian Elischer	dcc9954eb9	Mark the thread in an exiting program as inactive. This is not really used by the process but it's confusing to some status readers to see zombie processes with "running" threads. Pointed out by: Don Lewis <truckman@FreeBSD.org>	2004-06-21 20:44:02 +00:00
Bruce Evans	ba39a1c5a4	Turned off the "calcru: negative time" warning for certain SMP cases where it is known to detect a problem but the problem is not very easy to fix. The warning became very common recently after a call to calcru() was added to fill_kinfo_thread(). Another (much older) cause of "negative times" (actually non-monotonic times) was fixed in rev.1.237 of kern_exit.c. Print separate messages for non-monotonic and negative times.	2004-06-21 17:46:27 +00:00
Bruce Evans	40a3fa2d59	(1) Removed the bogus condition "p->p_pid != 1" on calling sched_exit() from exit1(). sched_exit() must be called unconditionally from exit1(). It was called almost unconditionally because init only exits on system shutdown, if at all. (2) Removed the comment that presumed to know what sched_exit() does. sched_exit() does different things for the ULE case. The call became essential when it started doing load average stuff, but its caller should not know that. (3) Didn't fix bugs caused by bitrot in the condition. The condition was last correct in rev.1.208 when it was in wait1(). There p was spelled curthread->td_proc and was for the waiting parent; now p is for the exiting child. The condition was to avoid lowering init's priority. It should be in sched_exit() itself. Lowering of priorities is broken in other ways in at least the 4BSD scheduler, and doing it for init causes less noticeable problems than doing it for shells. Noticed by: julian (1)	2004-06-21 14:49:50 +00:00
Bruce Evans	871684b822	Update p_runtime on exit. This fixes calcru() on zombies, and prepares for not calling calcru() on exit. calcru() on a zombie can happen if ttyinfo() (^T) picks one. PR: 52490	2004-06-21 14:03:38 +00:00
Poul-Henning Kamp	55dbc267cb	New style functions, kill register keyword.	2004-06-21 12:28:56 +00:00
Robert Watson	a34b704666	Merge next step in socket buffer locking: - sowakeup() now asserts the socket buffer lock on entry. Move the call to KNOTE higher in sowakeup() so that it is made with the socket buffer lock held for consistency with other calls. Release the socket buffer lock prior to calling into pgsigio(), so_upcall(), or aio_swake(). Locking for this event management will need revisiting in the future, but this model avoids lock order reversals when upcalls into other subsystems result in socket/socket buffer operations. Assert that the socket buffer lock is not held at the end of the function. - Wrapper macros for sowakeup(), sorwakeup() and sowwakeup(), now have _locked versions which assert the socket buffer lock on entry. If a wakeup is required by sb_notify(), invoke sowakeup(); otherwise, unconditionally release the socket buffer lock. This results in the socket buffer lock being released whether a wakeup is required or not. - Break out socantsendmore() into socantsendmore_locked() that asserts the socket buffer lock. socantsendmore() unconditionally locks the socket buffer before calling socantsendmore_locked(). Note that both functions return with the socket buffer unlocked as socantsendmore_locked() calls sowwakeup_locked() which has the same properties. Assert that the socket buffer is unlocked on return. - Break out socantrcvmore() into socantrcvmore_locked() that asserts the socket buffer lock. socantrcvmore() unconditionally locks the socket buffer before calling socantrcvmore_locked(). Note that both functions return with the socket buffer unlocked as socantrcvmore_locked() calls sorwakeup_locked() which has similar properties. Assert that the socket buffer is unlocked on return. - Break out sbrelease() into a sbrelease_locked() that asserts the socket buffer lock. sbrelease() unconditionally locks the socket buffer before calling sbrelease_locked(). sbrelease_locked() now invokes sbflush_locked() instead of sbflush(). 
- Assert the socket buffer lock in socket buffer sanity check functions sblastrecordchk(), sblastmbufchk(). - Assert the socket buffer lock in SBLINKRECORD(). - Break out various sbappend() functions into sbappend_locked() (and variations on that name) that assert the socket buffer lock. The !_locked() variations unconditionally lock the socket buffer before calling their _locked counterparts. Internally, make sure to call _locked() support routines, etc, if already holding the socket buffer lock. - Break out sbinsertoob() into sbinsertoob_locked() that asserts the socket buffer lock. sbinsertoob() unconditionally locks the socket buffer before calling sbinsertoob_locked(). - Break out sbflush() into sbflush_locked() that asserts the socket buffer lock. sbflush() unconditionally locks the socket buffer before calling sbflush_locked(). Update panic strings for new function names. - Break out sbdrop() into sbdrop_locked() that asserts the socket buffer lock. sbdrop() unconditionally locks the socket buffer before calling sbdrop_locked(). - Break out sbdroprecord() into sbdroprecord_locked() that asserts the socket buffer lock. sbdroprecord() unconditionally locks the socket buffer before calling sbdroprecord_locked(). - sofree() now calls socantsendmore_locked() and re-acquires the socket buffer lock on return. It also now calls sbrelease_locked(). - sorflush() now calls socantrcvmore_locked() and re-acquires the socket buffer lock on return. Clean up/mess up other behavior in sorflush() relating to the temporary stack copy of the socket buffer used with dom_dispose by more properly initializing the temporary copy, and selectively bzeroing/copying more carefully to prevent WITNESS from getting confused by improperly initialized mutexes. Annotate why that's necessary, or at least, needed. - soisconnected() now calls sbdrop_locked() before unlocking the socket buffer to avoid locking overhead. 
Some parts of this change were: Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-21 00:20:43 +00:00
Garance A Drosehn	7638fa19a7	Fill in the values for the ki_tid and ki_numthreads which have been added to kproc_info. PR: bin/65803 (a tiny part...) Submitted by: Cyrille Lefevre	2004-06-20 22:17:22 +00:00
Robert Watson	c9f69064af	In uipc_rcvd(), lock the socket buffers at either end of the UNIX domain socket when updating fields at both ends. Submitted by: sam Sponsored by: FreeBSD Foundation	2004-06-20 21:43:13 +00:00
Robert Watson	1b2e3b4b46	Hold SOCK_LOCK(so) when frobbing so_state when disconnecting a connected UNIX domain datagram socket.	2004-06-20 21:29:56 +00:00
Robert Watson	fa8368a8fe	When retrieving the SO_LINGER socket option for user space, hold the socket lock over pulling so_options and so_linger out of the socket structure in order to retrieve a consistent snapshot. This may be overkill if user space doesn't require a consistent snapshot.	2004-06-20 17:50:42 +00:00
Robert Watson	6f4b1b5578	Convert an if->panic in soclose() into a call to KASSERT().	2004-06-20 17:47:51 +00:00
Robert Watson	ed2f7766b0	Annotate some ordering-related issues in solisten() which are not yet resolved by socket locking: in particular, that we test the connection state at the socket layer without locking, request that the protocol begin listening, and then set the listen state on the socket non-atomically, resulting in a non-atomic cross-layer test-and-set.	2004-06-20 17:38:19 +00:00
Robert Watson	d43c1f67cc	Annotate two intentionally unlocked reads with comments. Annotate a potentially inconsistent result returned to user space when performing fstat() on a socket due to not using socket buffer locking.	2004-06-20 17:35:50 +00:00
Thomas Moestl	3971dcfa4b	Initialize ni_cnd.cn_cred before calling lookup() (this is normally done by namei(), which cannot easily be used here however). This fixes boot time crashes on sparc64 and probably other platforms. Reviewed by: phk	2004-06-20 17:31:01 +00:00
Garance A Drosehn	99d2ecbc7d	Add a call to calcru() to update the kproc_info fields of ki_rusage.ru_utime and ki_rusage.ru_stime. This greatly improves the accuracy of those fields. Suggested by: bde	2004-06-20 02:03:33 +00:00
Marcel Moolenaar	0068114dd5	Define __lwpid_t as an int32_t in <sys/_types.h> and define lwpid_t as an __lwpid_t in <sys/types.h>. Retype td_tid from an int to a lwpid_t and change related definitions accordingly.	2004-06-19 17:58:32 +00:00
Tim J. Robbins	68ba7a1d57	When no fixed address is given in a shmat() request, pass a hint address to vm_map_find() that is less likely to be outside of addressable memory for 32-bit processes: just past the end of the largest possible heap. This is the same hint that mmap() uses.	2004-06-19 14:46:13 +00:00
Garance A Drosehn	078842c5c9	Fill in some new fields in 'struct kinfo_proc', namely ki_childstime, ki_childutime, and ki_emul. Also uses the timevaladd() routine to correct the calculation of ki_childtime. That will correct the value returned when ki_childtime.tv_usec > 1,000,000. This also implements a new KERN_PROC_GID option for kvm_getprocs(). (there will be a similar update to lib/libkvm/kvm_proc.c) Submitted by: Cyrille Lefevre	2004-06-19 14:03:00 +00:00
Poul-Henning Kamp	d7086f313a	Only initialize f_data and f_ops if nobody else did so already.	2004-06-19 11:41:45 +00:00
Poul-Henning Kamp	a769355f9b	Explicitly initialize f_data and f_vnode to NULL. Report f_vnode to userland in struct xfile.	2004-06-19 11:40:08 +00:00
Robert Watson	31f555a1c5	Assert socket buffer lock in sb_lock() to protect socket buffer sleep lock state. Convert tsleep() into msleep() with socket buffer mutex as argument. Hold socket buffer lock over sbunlock() to protect sleep lock state. Assert socket buffer lock in sbwait() to protect the socket buffer wait state. Convert tsleep() into msleep() with socket buffer mutex as argument. Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK() in order to call into these functions with the lock, as well as to start protecting other socket buffer use in their implementation. Drop the socket buffer mutexes around calls into the protocol layer, around potentially blocking operations, for copying to/from user space, and VM operations relating to zero-copy. Assert the socket buffer mutex strategically after code sections or at the beginning of loops. In some cases, modify return code to ensure locks are properly dropped. Convert the potentially blocking allocation of storage for the remote address in soreceive() into a non-blocking allocation; we may wish to move the allocation earlier so that it can block prior to acquisition of the socket buffer lock. Drop some spl use. NOTE: Some races exist in the current structuring of sosend() and soreceive(). This commit only merges basic socket locking in this code; follow-up commits will close additional races. As merged, these changes are not sufficient to run without Giant safely. Reviewed by: juli, tjr	2004-06-19 03:23:14 +00:00
Brian Feldman	8e1b797456	Add a sysctl/tunable, "kern.always_console_output", that lets you set output to permanently (not ephemerally) go to the console. It is also sent to any other console specified by TIOCCONS as normal. While I'm here, document the kern.log_console_output sysctl.	2004-06-18 20:12:42 +00:00
David Xu	b370279ef8	Add comment to reflect that we should retry after thread singling failed.	2004-06-18 11:13:49 +00:00
David Xu	0aabef657e	Remove a bogus panic. It is possible that more than one thread will be suspended in thread_suspend_check; after they are resumed, all threads will call thread_single, but only one can succeed. The others should retry and will exit in thread_suspend_check.	2004-06-18 06:21:09 +00:00
David Xu	ec008e96a8	If thread singler wants to terminate other threads, make sure it includes all threads except itself. Obtained from: julian	2004-06-18 06:15:21 +00:00
Robert Watson	7b574f2e45	Hold SOCK_LOCK(so) while frobbing so_options. Note that while the local race is corrected, there's still a global race in sosend() relating to so_options and the SO_DONTROUTE flag.	2004-06-18 04:02:56 +00:00
Robert Watson	c012260726	Merge some additional leaf node socket buffer locking from rwatson_netperf: Introduce conditional locking of the socket buffer in fifofs kqueue filters; KNOTE() will be called holding the socket buffer locks in fifofs, but sometimes the kqueue() system call will poll using the same entry point without holding the socket buffer lock. Introduce conditional locking of the socket buffer in the socket kqueue filters; KNOTE() will be called holding the socket buffer locks in the socket code, but sometimes the kqueue() system call will poll using the same entry points without holding the socket buffer lock. Simplify the logic in sodisconnect() since we no longer need spls. NOTE: To remove conditional locking in the kqueue filters, it would make sense to use a separate kqueue API entry into the socket/fifo code when calling from the kqueue() system call.	2004-06-18 02:57:55 +00:00
Kelly Yancey	b8817154c3	Implement CLOCK_VIRTUAL and CLOCK_PROF for clock_gettime(2) and clock_getres(2). Reviewed by: phk PR: 23304	2004-06-17 23:12:12 +00:00
Robert Watson	9535efc00d	Merge additional socket buffer locking from rwatson_netperf: - Lock down low-hanging fruit use of sb_flags with socket buffer lock. - Lock down low-hanging fruit use of so_state with socket lock. - Lock down low-hanging fruit use of so_options. - Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with socket buffer lock. - Annotate situations in which we unlock the socket lock and then grab the receive socket buffer lock, which are currently actually the same lock. Depending on how we want to play our cards, we may want to coalesce these lock uses to reduce overhead. - Convert an if()->panic() into a KASSERT relating to so_state in soaccept(). - Remove a number of splnet()/splx() references. More complex merging of socket and socket buffer locking to follow.	2004-06-17 22:48:11 +00:00
Poul-Henning Kamp	b90c855961	Reduce the thaumaturgical level of root filesystem mounts: Instead of using an otherwise redundant clone routine in geom_disk.c, mount a temporary DEVFS and do a proper lookup. Submitted by: thomas	2004-06-17 21:24:13 +00:00
Poul-Henning Kamp	f3732fd15b	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Julian Elischer	fa88511615	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Peter Wemm	a8774e396e	Change strategy based on a suggestion from Ian Dowse. Instead of trying to keep track of different section base addresses at a symbol-by-symbol level, just set the symbol values at load time.	2004-06-15 23:57:02 +00:00
Robert Watson	7721f5d760	Grab the socket buffer send or receive mutex when performing a read-modify-write on the sb_state field. This commit catches only the "easy" ones where it doesn't interact with as yet unmerged locking.	2004-06-15 03:51:44 +00:00
Peter Wemm	1cab0c857e	Fix symbol lookups between modules. This caused modules that depend on other modules to explode. eg: snd_ich->snd_pcm and umass->usb. The problem was that I was using the unified base address of the module instead of finding the start address of the section in question.	2004-06-15 01:35:57 +00:00
Peter Wemm	add21e178f	Insurance: cause a proper symbol lookup failure for symbol entries that reference unknown sections, rather than returning a small value.	2004-06-15 01:33:39 +00:00
John Polstra	4717d22a7c	Change the return value of sema_timedwait() so it returns 0 on success and a proper errno value on failure. This makes it consistent with cv_timedwait(), and paves the way for the introduction of functions such as sema_timedwait_sig() which can fail in multiple ways. Bump __FreeBSD_version and add a note to UPDATING. Approved by: scottl (ips driver), arch	2004-06-14 18:19:05 +00:00
Robert Watson	c0b99ffa02	The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coalesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.	2004-06-14 18:16:22 +00:00
Poul-Henning Kamp	170593a9b5	Remove a left over from userland buffer-cache access to disks.	2004-06-14 14:25:03 +00:00
Robert Watson	310e7ceb94	Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so): - Hold socket lock over calls to MAC entry points reading or manipulating socket labels. - Assert socket lock in MAC entry point implementations. - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.	2004-06-13 02:50:07 +00:00
Robert Watson	cce9e3f104	Introduce socket and UNIX domain socket locks into hard-coded lock order definition for witness. Send lock before receive lock, and socket locks after accept but before select: filedesc -> accept -> so_snd -> so_rcv -> sellck All routing locks after send lock: so_rcv -> radix node head All protocol locks before socket locks: unp -> so_snd udp -> udpinp -> so_snd tcp -> tcpinp -> so_snd	2004-06-13 00:23:03 +00:00
Robert Watson	3e87b34a25	Correct whitespace errors in merge from rwatson_netperf: tabs instead of spaces, no trailing tab at the end of line. Pointed out by: csjp	2004-06-12 23:36:59 +00:00
Robert Watson	395a08c904	Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 20:47:32 +00:00
Robert Watson	f6c0cce6d9	Introduce a mutex into struct sockbuf, sb_mtx, which will be used to protect fields in the socket buffer. Add accessor macros to use the mutex (SOCKBUF_()). Initialize the mutex in soalloc(), and destroy it in sodealloc(). In addition, add SOCK_() access macros which will protect most remaining fields in the socket; for the time being, use the receive socket buffer mutex to implement socket level locking to reduce memory overhead. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 16:08:41 +00:00
Poul-Henning Kamp	2653139fd2	Fix registration of loadable line disciplines. This should make watch(8)/snp(4) work again.	2004-06-12 12:31:42 +00:00
Bosko Milekic	96e124135b	Gah! Plug a mbuf leak I introduced in the last commit. I don the pointy-hat. Problem reported by: Peter Holm <pho@>	2004-06-11 18:17:25 +00:00
Julian Elischer	94e0a4cdf3	Shuffle some code around.	2004-06-11 17:48:20 +00:00
Poul-Henning Kamp	1930e303cf	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.	2004-06-11 11:16:26 +00:00
Brian Feldman	b4adfcf2f4	Make sysctl_wire_old_buffer() respect ENOMEM from vslock() by marking the valid length as 0. This prevents vsunlock() from removing a system wire from memory that was not successfully wired (by us). Submitted by: tegge	2004-06-11 02:20:37 +00:00
Robert Watson	0d9ce3a1ac	Introduce a subsystem lock around UNIX domain sockets in order to protect global and allocated variables. This strategy is derived from work originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam Leffler: - Add unp_mtx, a global mutex which will protect all UNIX domain socket related variables, structures, etc. - Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros. - Acquire unp_mtx on entering most UNIX domain socket code, drop/re-acquire around calls into VFS, and release it on return. - Avoid performing sodupsockaddr() while holding the mutex, so in general move to allocating storage before acquiring the mutex to copy the data. - Make a stack copy of the xucred rather than copying out while holding unp_mtx. Copy the peer credential out after releasing the mutex. - Add additional assertions of vnode locks following VOP_CREATE(). A few notes: - Use of an sx lock for the file list mutex may cause problems with regard to unp_mtx when garbage collecting passed file descriptors. - The locking in unp_pcblist() for sysctl monitoring is correct subject to the unpcb zone not returning memory for reuse by other subsystems (consistent with similar existing concerns). - Sam's version of this change, as with the BSD/OS version, made use of both a global lock and per-unpcb locks. However, in practice, the global lock covered all accesses, so I have simplified out the unpcb locks in the interest of getting this merged faster (reducing the overhead but not sacrificing granularity in most cases). We will want to explore possibilities for improving lock granularity in this code in the future. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS 5 snapshot provided by BSDi	2004-06-10 21:34:38 +00:00
Bosko Milekic	b5b2ea9a46	Plug a race where upon free this scenario could occur: (time grows downward) thread 1 thread 2 ------------\|------------ dec ref_cnt \| \| dec ref_cnt <-- ref_cnt now zero cmpset \| free all \| return \| \| alloc again,\| reuse prev \| ref_cnt \| \| cmpset, read \| already freed \| ref_cnt ------------\|------------ This should fix that by performing only a single atomic test-and-set that will serve to decrement the ref_cnt, only if it hasn't changed since the earlier read, otherwise it'll loop and re-read. This forces ordering of decrements so that truly the thread which did the LAST decrement is the one that frees. This is how atomic-instruction-based refcnting should probably be handled. Submitted by: Julian Elischer	2004-06-10 00:04:27 +00:00
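
The fix described in b5b2ea9a46 (a single atomic compare-and-set that decrements ref_cnt only if it has not changed since it was read, retrying otherwise) can be illustrated roughly as follows. This is a minimal userland sketch in C11, not the committed kernel code: `struct refobj` and `refobj_release()` are hypothetical names, and `<stdatomic.h>` stands in for the kernel's own atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not the kernel's structure. */
struct refobj {
	atomic_uint ref_cnt;
};

/*
 * Drop one reference. Returns true only for the caller whose decrement
 * took the count from 1 to 0; that caller alone may free the object.
 * Assumes the caller holds a reference, so the count is at least 1.
 */
static bool
refobj_release(struct refobj *obj)
{
	unsigned int old = atomic_load(&obj->ref_cnt);

	/*
	 * The decrement is applied only if ref_cnt still equals the value
	 * read earlier. On failure, compare-exchange stores the current
	 * value into `old`, so the loop retries with a fresh read.
	 */
	while (!atomic_compare_exchange_weak(&obj->ref_cnt, &old, old - 1))
		;

	return (old == 1);
}
```

Because the decrement and the decision about who frees come from the same successful compare-and-set, the ordering of decrements is forced and only the thread that performed the last one sees `true`.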

7809 Commits