freebsd-skq

Author	SHA1	Message	Date
marcel	91cc6ca5e3	Make the GDB dynamic linker hooks (r_debug_state) conditional upon GDB instead of DDB.	2004-07-10 21:37:30 +00:00
marcel	a9ad69d5af	Update for the KDB framework: o Make debugging code conditional upon KDB instead of DDB. o Call kdb_enter() instead of Debugger(). o Call kdb_backtrace() instead of db_print_backtrace() or backtrace(). kern_mutex.c: o Replace checks for db_active with checks for kdb_active and make them unconditional. kern_shutdown.c: o s/DDB_UNATTENDED/KDB_UNATTENDED/g o s/DDB_TRACE/KDB_TRACE/g o Save the TID of the thread doing the kernel dump so the debugger knows which thread to select as the current when debugging the kernel core file. o Clear kdb_active instead of db_active and do so unconditionally. o Remove backtrace() implementation. kern_synch.c: o Call kdb_reenter() instead of db_error().	2004-07-10 21:36:01 +00:00
marcel	60b53542e9	Introduce the KDB debugger frontend. The frontend provides a framework in which multiple (presumably different) debugger backends can be configured and which provides basic services to those backends. Besides providing services to backends, it also serves as the single point of contact for any and all code that wants to make use of the debugger functions, such as entering the debugger or handling of the alternate break sequence. For this purpose, the frontend has been made non-optional. All debugger requests are forwarded or handed over to the current backend, if applicable. Selection of the current backend is done by the debug.kdb.current sysctl. A list of configured backends can be obtained with the debug.kdb.available sysctl. One can enter the debugger by writing to the debug.kdb.enter sysctl.	2004-07-10 18:40:12 +00:00
phk	b9f13e4266	Clean up and wash struct iovec and struct uio handling. Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees. Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate. Add cloneuio() which returns a malloc'ed copy. Caller frees. Use them throughout.	2004-07-10 15:42:16 +00:00
rwatson	a4644f9bd8	Now socket buffer locks are being asserted at higher code blocks in soreceive(), remove some leaf assertions that are redundant.	2004-07-10 04:38:06 +00:00
rwatson	e3e0b9a496	Assert socket buffer lock at strategic points between sections of code in soreceive() to confirm we've moved from block to block properly maintaining locking invariants.	2004-07-10 03:47:15 +00:00
jhb	761713b2ff	Check the lock lists to see if they are empty directly rather than assigning a pointer to the list and then dereferencing the pointer as a second step. When the first spin lock is acquired, curthread is not in a critical section so it may be preempted and would end up using another CPU's lock list instead of its own. When this code was in witness_lock() this sequence was safe as curthread was in a critical section already since witness_lock() is called after the lock is acquired. Tested by: Daniel Lang dl at leo.org	2004-07-09 17:46:27 +00:00
des	9c6b715afb	Cosmetic adjustment to previous commit: name the second argument to sbuf_bcat() and sbuf_bcpy() "buf" rather than "data".	2004-07-09 11:37:44 +00:00
des	3bf01ad1d7	Have sbuf_bcat() and sbuf_bcpy() take a const void * instead of a const char *, since callers are likely to pass in pointers to all kinds of structs and whatnot.	2004-07-09 11:35:30 +00:00
alc	b5e3777efc	Eliminate struct shm_handle. It is an unnecessary level of indirection to a vm_object.	2004-07-09 05:28:38 +00:00
rwatson	fb654efba8	Remove spl()'s from do_sendfile().	2004-07-09 01:46:03 +00:00
jhb	eeb3c91445	- Move contents of sched_add() into a sched_add_internal() function that takes an argument to specify if it should preempt or not. Don't preempt when sched_add_internal() is called from kseq_idled() or kseq_assign() as in those cases we are about to call mi_switch() anyways. Also, doing so during the first context switch on an AP leads to a NULL pointer deref because curthread is NULL. - Reenable preemption for ULE. Submitted by: Taku YAMAMOTO taku at tackymt.homeip.net	2004-07-08 21:45:04 +00:00
alfred	b65386ecc3	fixup sysctl by fsid node	2004-07-08 06:11:36 +00:00
alfred	05d9335437	style(9)	2004-07-07 07:00:02 +00:00
alfred	edeee1cf4e	do the vfsstd thing instead of messing up our VFS_SYSCTL macro.	2004-07-07 06:58:29 +00:00
peadar	adb7022709	Fix bug introduced in rev 1.434: When avoiding the zeroing of "bogus_page" when it appears in a buf, be sure to advance the pointers into the data for successive pages. The bug caused file corruption when read(2)ing from a "hole" in a file where a previous page of the read block had already been faulted in: fsx tripped up on this pretty quickly. The particular access pattern is probably pretty unusual, so other applications probably wouldn't have had problems, but you'd never know. Reviewed By: alc@	2004-07-06 23:40:40 +00:00
alfred	131eae0f4c	Use vfs_suser() where appropriate.	2004-07-06 09:39:32 +00:00
alfred	e0a5f530c2	Introduce vfs_suser(), used to test if a user should have special privs for a mount.	2004-07-06 09:37:43 +00:00
alfred	97a6f04270	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.	2004-07-06 09:12:03 +00:00
rwatson	fef549cb01	Temporarily disable preemption in SCHED_ULE due to reported panics and hangs due to recent preemption changes. This change appears to remove the panic that I was running into, but at the cost of increasing ithread scheduling latency, and as such is a temporary band-aid until jhb has a chance to resolve the ule<->preemption interaction that is the source of the problem. If it doesn't fix the problem for others-- sorry!	2004-07-06 05:57:29 +00:00
truckman	690b842bc5	Unconditionally set last_work_seen while in the SYNCER_RUNNING state so that last_work_seen has a reasonable value at the transition to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened to be zero at the time. Initialize last_work_seen to zero as a safety measure in case the syncer never ran in the SYNCER_RUNNING state. Tested by: phk	2004-07-05 21:32:01 +00:00
rwatson	01ce51e897	Drop the socket buffer lock around a call to m_copym() with M_TRYWAIT. A subset of locking changes to soreceive() in the queue for merging. Bumped into by: Willem Jan Withagen <wjw@withagen.nl>	2004-07-05 19:29:33 +00:00
truckman	471ab74bb2	Rework syncer termination code: Speed up the syncer when shutting down by sleeping for a shorter period of time instead of cranking up rushjob and using the normal one second sleep. Skip empty worklist slots when shutting down to avoid lengthy intervals of inactivity. Give I/O more time to complete between steps by not speeding the syncer quite as much. Terminate the syncer after one full pass through the worklist plus one second with the worklist containing nothing but syncer vnodes. Print an indication of shutdown progress to the console. Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist to be monitored.	2004-07-05 01:07:33 +00:00
phk	49a3aa211e	Give synthetic root filesystem device vnodes a v_bsize of DEV_BSIZE.	2004-07-04 22:33:22 +00:00
alfred	95f8f1e089	Pass the operation in with the fsidctl. Remove some fsidctls that we will not be using. Correct prototypes for fs sysctls.	2004-07-04 20:21:58 +00:00
phk	59c88fd71a	Make the last commit handle non-phk root devices better.	2004-07-04 19:42:25 +00:00
stefanf	9dea8aeba1	Consistently use __inline instead of __inline__ as the former is an empty macro in <sys/cdefs.h> for compilers without support for inline.	2004-07-04 16:11:03 +00:00
phk	b52c81e5db	Blocksize for I/O should be a property of the vnode and not found by groping around in the vnodes surroundings when we allocate a block. Assign a blocksize when we create a vnode, and yell a warning (and ignore it) if we got the wrong size. Please email all such warnings to me.	2004-07-04 12:49:04 +00:00
alfred	bbaa6c3ec0	Introduce a new kevent filter. EVFILT_FS that will be used to signal generic filesystem events to userspace. Currently only mount and unmount of filesystems are signalled. Soon to be added, up/down status of NFS. Introduce a sysctl node used to route requests to/from filesystems based on filesystem ids. Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/ entrypoint by the sysctl code to change individual filesystems.	2004-07-04 10:52:54 +00:00
alfred	4a61cff009	Revision 1.496 would not boot on my system due to ffs_mount -> bdevvp -> getnewvnode(..., mp = NULL, ...) -> insmntqueue(vp, mp = NULL) -> KASSERT -> panic Make getnewvnode() only call insmntqueue() if the mountpoint parameter is not NULL.	2004-07-04 10:19:15 +00:00
phk	070a613a48	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.	2004-07-04 08:52:35 +00:00
phk	7af2451eed	Remove stale comment	2004-07-03 19:37:06 +00:00
phk	abeab8c454	Add NULL arg to mi_switch() call to stop kernel compiles from breaking.	2004-07-03 16:57:51 +00:00
jhb	ff0e20b1b6	Add a NULL param to an mi_switch() that I missed. Reported by: Jung-uk Kim jkim at niksun dot com	2004-07-03 02:38:03 +00:00
bmilekic	067d8e4e13	Fix SCHED_ULE build on SMP. The previous revision (1.110) introduced a KSE_CAN_MIGRATE() invocation with one argument missing (class). Either this is a genuine forget or it crept in from JHB's repo where he may have modified it. If it's the latter then it may require more attention. For now fix the make depend.	2004-07-03 01:19:46 +00:00
marcel	82affa1f89	Unbreak build for the !PREEMPTION case: don't define variables that aren't used in that case.	2004-07-03 00:57:43 +00:00
jhb	696704716d	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switch to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)	2004-07-02 20:21:44 +00:00
jhb	1b16b181d1	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to switch to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.	2004-07-02 19:09:50 +00:00
davidxu	e7a5dd69ac	Allow ptrace to deal with lwpid. Reviewed by: marcel	2004-07-02 09:19:22 +00:00
alfred	f05df8a881	We allocate an array of pointers to the global file table while not holding the filelist_lock. This means the filelist can change size while allocating. Detect this race and retry the allocation.	2004-07-02 07:40:10 +00:00
jhb	ca6f6cfd39	Tidy up uprof locking. Mostly the fields are protected by both the proc lock and sched_lock so they can be read with either lock held. Document the locking as well. The one remaining bogosity is that pr_addr and pr_ticks should be per-thread but profiling of multithreaded apps is currently undefined.	2004-07-02 03:50:48 +00:00
jhb	2c858bc6df	- Assert that any process that has statclock called on it has both a stats structure and a vmspace as this should always be true rather than checking the always true condition in an if statement. - Remove never-false check: if ((ru = &pstats->p_ru) != NULL) - Remove pstats variable that is only used once and inline its one use instead.	2004-07-02 03:48:09 +00:00
marcel	622fe058c9	Change the thread ID (thr_id_t) used for 1:1 threading from being a pointer to the corresponding struct thread to the thread ID (lwpid_t) assigned to that thread. The primary reason for this change is that libthr now internally uses the same ID as the debugger and the kernel when referencing to a kernel thread. This allows us to implement the support for debugging without additional translations and/or mappings. To preserve the ABI, the 1:1 threading syscalls, including the umtx locking API have not been changed to work on a lwpid_t. Instead the 1:1 threading syscalls operate on long and the umtx locking API has not been changed except for the contested bit. Previously this was the least significant bit. Now it's the most significant bit. Since the contested bit should not be tested by userland, this change is not expected to be visible. Just to be sure, UMTX_CONTESTED has been removed from <sys/umtx.h>. Reviewed by: mtm@ ABI preservation tested on: i386, ia64	2004-07-02 00:40:07 +00:00
marcel	e84fdd61ba	Regen.	2004-07-02 00:38:56 +00:00
truckman	9ed03e6eb3	When shutting down the syncer kernel thread, first tell it to run faster and iterate over its work list a few times in an attempt to empty the work list before the syncer terminates. This leaves fewer dirty blocks to be written at the "syncing disks" stage and keeps the "giving up on N buffers" problem from being triggered by the presence of a large soft updates work list at system shutdown time. The downside is that the syncer takes noticeably longer to terminate. Tested by: "Arjan van Leeuwen" <avleeuwen AT piwebs DOT com> Approved by: mckusick	2004-07-01 23:59:19 +00:00
imp	dc57b667a0	Add ability to set start/end for rman	2004-07-01 16:22:10 +00:00
jhb	900e7c295d	Trim a few things from the dmesg output and stick them under bootverbose to cut down on the clutter including PCI interrupt routing, MTRR, pcibios, etc. Discussed with: USENIX Cabal	2004-07-01 07:46:29 +00:00
imp	9f2638da1f	Hide struct resource and struct rman. You must define __RMAN_RESOURCE_VISIBLE to see inside these now. Reviewed by: dfr, njl (not njr)	2004-06-30 16:54:10 +00:00
imp	3e05633a44	Include more information about the device in the devadded and devremoved events. This reduces the races around these events. We now include the pnp info in both. This lets one do more interesting things with devd on device insertion. Submitted by: Bernd Walter	2004-06-30 02:46:25 +00:00
jhb	9c6cf2340f	Oops, this didn't make it into my submit before I committed: Defer creation of the sysctl tree for the turnstile profiling stats until a SI_SUB_LOCK sysinit. Doing it in init_turnstiles() is too early as it is called before mi_startup().	2004-06-29 03:48:49 +00:00
peter	87fa6f8535	Wrap long line.	2004-06-29 03:13:54 +00:00
jhb	6502f84a50	Add two new kernel options to allow rudimentary profiling of the internal hash tables used in the sleep queue and turnstile code. Each option adds a sysctl tree under debug containing the maximum depth of any bucket in the hash table as well as a separate node for each bucket (or chain) containing the current depth and maximum depth for that bucket.	2004-06-29 02:30:12 +00:00
jhb	4dab07ef95	Remove the signal_caught argument from sleepq_timedwait() as it was effectively always zero.	2004-06-28 18:57:06 +00:00
jhb	9234a42a69	- Execute all of the tasks on the taskqueue during taskqueue_free() after the queue has been removed from the global taskqueue_queues list. This removes the need for the draining queue hack. - Allow taskqueue_run() to be called with the taskqueue mutex held. It can still be called without the lock for API compatibility. In that case it will acquire the lock internally. - Don't lock the individual queue mutex in taskqueue_find() until after the strcmp as the global queues mutex is sufficient for the strcmp. - Simplify taskqueue_thread_loop() now that it can hold the lock across taskqueue_run(). Submitted by: bde (mostly)	2004-06-28 16:28:23 +00:00
jhb	515258abfe	Adjust the priority of the idle threads to be the lowest possible priority. This is just a cosmetic nit as the idle thread priorities aren't used by the schedulers. Reported by: bde	2004-06-28 16:19:50 +00:00
imp	d334c4b305	Turns out that jhb didn't really like this. And nate pointed out that it wasn't a good idea to have the test for NULL on only a limited subset. Go back because I'm not sure adding NULL to all the others is a good idea.	2004-06-28 03:40:23 +00:00
imp	bd52bbc3c7	Allow dev to be NULL and assume that a device is not alive or not attached. Reviewed by: njl(?) and jhb	2004-06-28 02:24:04 +00:00
pjd	5055061c5d	Add two missing includes and remove two unneeded ones. This is quite a serious fix, because even with the MAC framework compiled in, MAC entry points in those two files were simply ignored.	2004-06-27 09:03:22 +00:00
rwatson	b9d22ffbfa	Acquire the socket buffer lock when calling unp_scan() on so->so_rcv.sb_mb to prevent the mbuf chain from changing during the scan.	2004-06-27 03:29:25 +00:00
rwatson	33abe94990	Add a new global mutex, so_global_mtx, which protects the global variables so_gencnt, numopensockets, and the per-socket field so_gencnt. Annotate that this might be better done with atomic operations. Annotate what accept_mtx protects.	2004-06-27 03:22:15 +00:00
rwatson	758f90deb8	Reduce the number of unnecessary unlock-relocks on socket buffer mutexes associated with performing a wakeup on the socket buffer: - When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append(). - When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/ receive flags and dropping data, and in uipc_rcvd() when adjusting back-pressure on the sockets. For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduced mutex costs.	2004-06-26 19:10:39 +00:00
marcel	49e32d12eb	Allocate TIDs in thread_init() and deallocate them in thread_fini(). The overhead of unconditionally allocating TIDs (and likewise, unconditionally deallocating them), is amortized across multiple thread creations by the way UMA makes it possible to have type-stable storage. Previously the cost was kept down by having threads created as part of a fork operation use the process' PID as the TID. While this had some nice properties, it also introduced complexity in the way TIDs were allocated. Most importantly, by using the type-stable storage that UMA gives us this was also unnecessary. This change affects how core dumps are created and in particular how the PRSTATUS notes are dumped. Since we don't have a thread with a TID equalling the PID, we now need a different way to preserve the old and previous behavior. We do this by having the given thread (i.e. the thread passed to the core dump code in td) dump its state first and fill in pr_pid with the actual PID. All other threads will have pr_pid contain their TIDs. The upshot of all this is that the debugger will now likely select the right LWP (=TID) as the initial thread. Credits to: julian@ for spotting how we can utilize UMA. Thanks to: all who provided julian@ with test results.	2004-06-26 18:58:22 +00:00
rwatson	8ecd20c6f7	Replace comment on spl state when calling soabort() with a comment on locking state. No socket locks should be held when calling soabort() as it will call into protocol code that may acquire socket locks.	2004-06-26 17:12:29 +00:00
phk	0567d4ef5f	Pick the hotchar out of the tty structure instead of caching private copies. No current line disciplines have a dynamically changing hotchar, and expecting to receive anything sensible during a change in ldisc is insane so no locking of the hotchar field is necessary.	2004-06-26 09:20:07 +00:00
phk	1aa6c5a754	Fix line discipline switching issues: If opening a new ldisc fails, we have to revert to TTYDISC which we know will successfully open rather than try the previous ldisc which might also fail to open. Do not let ldisc implementations muck about with ->t_line, and remove code which checks for reopens, it should never happen. Move ldisc->l_hotchar to tty->t_hotchar and have ldisc implementation initialize it in their open routines. Reset to zero when we enter TTYDISC. ("no" should really be -1 since zero could be a valid hotchar for certain old european mainframe protocols.)	2004-06-26 08:44:04 +00:00
phk	e18498a4a4	Gah! commit from wrong tree. Remove now unused variables from last commit.	2004-06-25 22:10:20 +00:00
phk	11df1584ae	Retire the TIOC_REMOTE ioctl. It was added 22 years ago for emacs to use, but emacs gave up on it 17 years ago.	2004-06-25 21:54:49 +00:00
rwatson	7203fb63d4	Release UNIX domain socket subsystem lock earlier -- don't need to hold it over free of unp_addr if we've already removed all references to unp.	2004-06-25 20:12:06 +00:00
phk	1f0612daa9	Add two new methods to struct tty: One for manipulating BREAK condition and one for fiddling modem-control signals. Add generic code to deal with the relevant ioctls if these methods are present.	2004-06-25 10:24:10 +00:00
rwatson	55c984d0c9	Don't cuddle else's so much as we removed additional parts of each block.	2004-06-24 17:22:29 +00:00
rwatson	96d33374ff	Remove temporary API bandage that allowed applications speaking the older API to list attributes on a file (zero-length attribute name) to function. extattr_list_*() are now the only available APIs to use when listing attributes.	2004-06-24 17:14:28 +00:00
phk	b1dd8c1222	#include <sys/serial.h>	2004-06-24 10:32:30 +00:00
phk	24f207afc6	Use CTASSERT to enforce the relationship between the new serial port modem definitions and the old definitions from ioctls.	2004-06-24 10:06:55 +00:00
rwatson	5b148b0cd4	Lock socket buffers when processing setting socket options SO_SNDLOWAT or SO_RCVLOWAT for read-modify-write.	2004-06-24 04:28:30 +00:00
rwatson	deac06df05	Acquire socket lock in the "waiting for connection" loop in kern_connect(), replacing tsleep() with msleep() with the socket mutex.	2004-06-24 01:43:23 +00:00
rwatson	caac080ec9	Introduce sbreserve_locked(), which asserts the socket buffer lock on the socket buffer having its limits adjusted. sbreserve() now acquires the lock before calling sbreserve_locked(). In soreserve(), acquire socket buffer locks across read-modify-writes of socket buffer fields, and calls into sbreserve/sbrelease; make sure to acquire in keeping with the socket buffer lock order. In tcp_mss(), acquire the socket buffer lock in the calling context so that we have atomic read-modify -write on buffer sizes.	2004-06-24 01:37:04 +00:00
rwatson	e71609f557	Slide socket buffer lock earlier in sopoll() to cover the call into selrecord(), so that setting up select and flagging the socket buffers as SB_SEL both happen under the lock.	2004-06-24 00:54:26 +00:00
bms	00a26380d4	Fix an inconsistency in socket option propagation on accept(). Propagate the SS_NBIO flag from the parent socket to the child socket during an accept() operation. The file descriptor O_NONBLOCK flag would have been propagated already by the fflag assignment, and therefore would have been inconsistent with the underlying socket's so_state member. This makes accept() more closely adhere to the API contract we effectively outline in the manual page. Note also that Linux continues to differ here; O_NONBLOCK is not propagated. The other BSDs do propagate the flag, as does Solaris. The Single UNIX Specification does not offer specific advice on this issue. PR: kern/45733 Requested by: Jayanth Vijayaraghavan Reviewed by: rwatson	2004-06-22 23:58:09 +00:00
le	5d3555a2f9	Fix a few spelling mistakes in comments and clean them up a bit.	2004-06-22 20:22:24 +00:00
rwatson	c6ebad2a3a	Regenerate after updating syscalls.master.	2004-06-22 04:36:25 +00:00
rwatson	498a844902	Mark unlink() as MPSAFE as we now acquire Giant in the unlink() system call.	2004-06-22 04:34:55 +00:00
rwatson	1d271c7712	Acquire Giant in link() so that the system call can be marked MPSAFE. Don't want to acquire Giant in kern_link() since Linux compat code performs actions requiring Giant prior to calling kern_link().	2004-06-22 04:34:05 +00:00
rwatson	78c99bbbea	Rebuild following marking link() as MPSAFE.	2004-06-22 04:29:59 +00:00
rwatson	761a6e100d	Mark link() system call as MPSAFE.	2004-06-22 04:29:27 +00:00
rwatson	8a33376794	Acquire Giant in link() so that we can mark it as MSTD in syscalls.master. Don't want to do it in kern_link() since the Linux emulation code calls kern_link() after performing other actions requiring Giant.	2004-06-22 04:29:07 +00:00
rwatson	935a4c087e	Remove spl's from uipc_socket to ease in merging.	2004-06-22 03:49:22 +00:00
scottl	16254a8419	Fix another typo in the previous commit.	2004-06-21 23:47:47 +00:00
phk	0033eabc1b	Put the pre FreeBSD-2.x tty compat code under BURN_BRIDGES.	2004-06-21 22:57:16 +00:00
scottl	2c953c0dce	Fix typo that somehow crept into the previous commit	2004-06-21 22:42:46 +00:00
kbyanc	1e3cfa985e	Update previous commit to: * Obtain/release schedlock around calls to calcru. * Sort switch cases which do not cascade per style(9). * Sort local variables per style(9). * Remove "superfluous" whitespace. * Cleanup handling of NULL uap->tp in clock_getres(). It would probably be better to return EFAULT like clock_gettime() does by passing the pointer to copyout(), but I presume it was written to not fail on purpose in the original code. I'll defer to -standards on this one. Reported by: bde	2004-06-21 22:34:57 +00:00
scottl	95c9bdb62a	Add the sysctl node 'kern.sched.name' that has the name of the scheduler currently in use. Move the 4bsd kern.quantum node to kern.sched.quantum for consistency.	2004-06-21 22:05:46 +00:00
julian	545024cdd9	Mark the thread in an exiting program as inactive. This is not really used by the process but it's confusing to some status readers to see zombie processes with "running" threads. Pointed out by: Don Lewis <truckman@FreeBSD.org>	2004-06-21 20:44:02 +00:00
bde	459a8e3950	Turned off the "calcru: negative time" warning for certain SMP cases where it is known to detect a problem but the problem is not very easy to fix. The warning became very common recently after a call to calcru() was added to fill_kinfo_thread(). Another (much older) cause of "negative times" (actually non-monotonic times) was fixed in rev.1.237 of kern_exit.c. Print separate messages for non-monotonic and negative times.	2004-06-21 17:46:27 +00:00
bde	747f331358	(1) Removed the bogus condition "p->p_pid != 1" on calling sched_exit() from exit1(). sched_exit() must be called unconditionally from exit1(). It was called almost unconditionally because init only exits on system shutdown, if at all. (2) Removed the comment that presumed to know what sched_exit() does. sched_exit() does different things for the ULE case. The call became essential when it started doing load average stuff, but its caller should not know that. (3) Didn't fix bugs caused by bitrot in the condition. The condition was last correct in rev.1.208 when it was in wait1(). There p was spelled curthread->td_proc and was for the waiting parent; now p is for the exiting child. The condition was to avoid lowering init's priority. It should be in sched_exit() itself. Lowering of priorities is broken in other ways in at least the 4BSD scheduler, and doing it for init causes less noticeable problems than doing it for shells. Noticed by: julian (1)	2004-06-21 14:49:50 +00:00
bde	b65b61b58a	Update p_runtime on exit. This fixes calcru() on zombies, and prepares for not calling calcru() on exit. calcru() on a zombie can happen if ttyinfo() (^T) picks one. PR: 52490	2004-06-21 14:03:38 +00:00
phk	9c97a4d517	New style functions, kill register keyword.	2004-06-21 12:28:56 +00:00
rwatson	21164a78ac	Merge next step in socket buffer locking: - sowakeup() now asserts the socket buffer lock on entry. Move the call to KNOTE higher in sowakeup() so that it is made with the socket buffer lock held for consistency with other calls. Release the socket buffer lock prior to calling into pgsigio(), so_upcall(), or aio_swake(). Locking for this event management will need revisiting in the future, but this model avoids lock order reversals when upcalls into other subsystems result in socket/socket buffer operations. Assert that the socket buffer lock is not held at the end of the function. - Wrapper macros for sowakeup(), sorwakeup() and sowwakeup(), now have _locked versions which assert the socket buffer lock on entry. If a wakeup is required by sb_notify(), invoke sowakeup(); otherwise, unconditionally release the socket buffer lock. This results in the socket buffer lock being released whether a wakeup is required or not. - Break out socantsendmore() into socantsendmore_locked() that asserts the socket buffer lock. socantsendmore() unconditionally locks the socket buffer before calling socantsendmore_locked(). Note that both functions return with the socket buffer unlocked as socantsendmore_locked() calls sowwakeup_locked() which has the same properties. Assert that the socket buffer is unlocked on return. - Break out socantrcvmore() into socantrcvmore_locked() that asserts the socket buffer lock. socantrcvmore() unconditionally locks the socket buffer before calling socantrcvmore_locked(). Note that both functions return with the socket buffer unlocked as socantrcvmore_locked() calls sorwakeup_locked() which has similar properties. Assert that the socket buffer is unlocked on return. - Break out sbrelease() into a sbrelease_locked() that asserts the socket buffer lock. sbrelease() unconditionally locks the socket buffer before calling sbrelease_locked(). sbrelease_locked() now invokes sbflush_locked() instead of sbflush(). 
- Assert the socket buffer lock in socket buffer sanity check functions sblastrecordchk(), sblastmbufchk(). - Assert the socket buffer lock in SBLINKRECORD(). - Break out various sbappend() functions into sbappend_locked() (and variations on that name) that assert the socket buffer lock. The !_locked() variations unconditionally lock the socket buffer before calling their _locked counterparts. Internally, make sure to call _locked() support routines, etc, if already holding the socket buffer lock. - Break out sbinsertoob() into sbinsertoob_locked() that asserts the socket buffer lock. sbinsertoob() unconditionally locks the socket buffer before calling sbinsertoob_locked(). - Break out sbflush() into sbflush_locked() that asserts the socket buffer lock. sbflush() unconditionally locks the socket buffer before calling sbflush_locked(). Update panic strings for new function names. - Break out sbdrop() into sbdrop_locked() that asserts the socket buffer lock. sbdrop() unconditionally locks the socket buffer before calling sbdrop_locked(). - Break out sbdroprecord() into sbdroprecord_locked() that asserts the socket buffer lock. sbdroprecord() unconditionally locks the socket buffer before calling sbdroprecord_locked(). - sofree() now calls socantsendmore_locked() and re-acquires the socket buffer lock on return. It also now calls sbrelease_locked(). - sorflush() now calls socantrcvmore_locked() and re-acquires the socket buffer lock on return. Clean up/mess up other behavior in sorflush() relating to the temporary stack copy of the socket buffer used with dom_dispose by more properly initializing the temporary copy, and selectively bzeroing/copying more carefully to prevent WITNESS from getting confused by improperly initialized mutexes. Annotate why that's necessary, or at least, needed. - soisconnected() now calls sbdrop_locked() before unlocking the socket buffer to avoid locking overhead. 
Some parts of this change were: Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-21 00:20:43 +00:00
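The foo()/foo_locked() split described in this commit can be sketched in userland C. This is only an illustration of the pattern: `struct buf`, `buf_drop()` and `buf_drop_locked()` are hypothetical names, a pthread mutex stands in for the socket buffer mutex, and the hand-maintained `locked` flag stands in for the kernel's lock assertion.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket buffer; not the kernel's struct sockbuf. */
struct buf {
	pthread_mutex_t	mtx;
	int		locked;	/* tracked by hand so the _locked variant can assert */
	size_t		cc;	/* bytes currently queued */
};

static void
buf_init(struct buf *b, size_t cc)
{
	pthread_mutex_init(&b->mtx, NULL);
	b->locked = 0;
	b->cc = cc;
}

static void
buf_lock(struct buf *b)
{
	pthread_mutex_lock(&b->mtx);
	b->locked = 1;
}

static void
buf_unlock(struct buf *b)
{
	b->locked = 0;
	pthread_mutex_unlock(&b->mtx);
}

/* Caller must already hold the lock; this is asserted on entry. */
static void
buf_drop_locked(struct buf *b, size_t len)
{
	assert(b->locked);
	b->cc -= (len < b->cc) ? len : b->cc;
}

/* Unlocked entry point: take the lock, call the _locked variant, release. */
static void
buf_drop(struct buf *b, size_t len)
{
	buf_lock(b);
	buf_drop_locked(b, len);
	buf_unlock(b);
}
```

Code that already holds the lock calls `buf_drop_locked()` directly and so avoids a second acquisition, while all other callers go through `buf_drop()`.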
gad	be237c4660	Fill in the values for the ki_tid and ki_numthreads which have been added to kproc_info. PR: bin/65803 (a tiny part...) Submitted by: Cyrille Lefevre	2004-06-20 22:17:22 +00:00
rwatson	ffabeb7229	In uipc_rcvd(), lock the socket buffers at either end of the UNIX domain socket when updating fields at both ends. Submitted by: sam Sponsored by: FreeBSD Foundation	2004-06-20 21:43:13 +00:00
rwatson	8c7c75cc62	Hold SOCK_LOCK(so) when frobbing so_state when disconnecting a connected UNIX domain datagram socket.	2004-06-20 21:29:56 +00:00
rwatson	e1348f0140	When retrieving the SO_LINGER socket option for user space, hold the socket lock over pulling so_options and so_linger out of the socket structure in order to retrieve a consistent snapshot. This may be overkill if user space doesn't require a consistent snapshot.	2004-06-20 17:50:42 +00:00
rwatson	d77a417b71	Convert an if->panic in soclose() into a call to KASSERT().	2004-06-20 17:47:51 +00:00
rwatson	1183f24cde	Annotate some ordering-related issues in solisten() which are not yet resolved by socket locking: in particular, that we test the connection state at the socket layer without locking, request that the protocol begin listening, and then set the listen state on the socket non-atomically, resulting in a non-atomic cross-layer test-and-set.	2004-06-20 17:38:19 +00:00
rwatson	c2c08bfea9	Annotate two intentionally unlocked reads with comments. Annotate a potentially inconsistent result returned to user space when performing fstat() on a socket due to not using socket buffer locking.	2004-06-20 17:35:50 +00:00
tmm	2dca765b97	Initialize ni_cnd.cn_cred before calling lookup() (this is normally done by namei(), which cannot easily be used here however). This fixes boot time crashes on sparc64 and probably other platforms. Reviewed by: phk	2004-06-20 17:31:01 +00:00
gad	f83481b547	Add a call to calcru() to update the kproc_info fields of ki_rusage.ru_utime and ki_rusage.ru_stime. This greatly improves the accuracy of those fields. Suggested by: bde	2004-06-20 02:03:33 +00:00
marcel	ce6c7857d6	Define __lwpid_t as an int32_t in <sys/_types.h> and define lwpid_t as an __lwpid_t in <sys/types.h>. Retype td_tid from an int to a lwpid_t and change related definitions accordingly.	2004-06-19 17:58:32 +00:00
tjr	8fd212e66e	When no fixed address is given in a shmat() request, pass a hint address to vm_map_find() that is less likely to be outside of addressable memory for 32-bit processes: just past the end of the largest possible heap. This is the same hint that mmap() uses.	2004-06-19 14:46:13 +00:00
gad	3f2ff133b4	Fill in some new fields of 'struct kinfo_proc', namely ki_childstime, ki_childutime, and ki_emul. Also use the timevaladd() routine to correct the calculation of ki_childtime. That will correct the value returned when ki_childtime.tv_usec > 1,000,000. This also implements a new KERN_PROC_GID option for kvm_getprocs(). (There will be a similar update to lib/libkvm/kvm_proc.c.) Submitted by: Cyrille Lefevre	2004-06-19 14:03:00 +00:00
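The tv_usec overflow mentioned in this commit is the usual carry problem when summing two struct timeval values. A minimal userland sketch follows. `tv_add()` is a hypothetical analogue written for illustration, not the kernel's timevaladd() source, and it assumes both inputs are already normalized (tv_usec below 1,000,000).

```c
#include <sys/time.h>

/* Hypothetical userland analogue of timevaladd(): t1 += t2. */
static void
tv_add(struct timeval *t1, const struct timeval *t2)
{
	t1->tv_sec += t2->tv_sec;
	t1->tv_usec += t2->tv_usec;
	/* Carry into seconds so tv_usec stays below one million. */
	if (t1->tv_usec >= 1000000) {
		t1->tv_sec++;
		t1->tv_usec -= 1000000;
	}
}
```

Adding the fields separately without the carry step would leave tv_usec out of range, which is the kind of value the commit says it corrects.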
phk	3237babd8b	Only initialize f_data and f_ops if nobody else did so already.	2004-06-19 11:41:45 +00:00
phk	1ce305fbfd	Explicitly initialize f_data and f_vnode to NULL. Report f_vnode to userland in struct xfile.	2004-06-19 11:40:08 +00:00
rwatson	e5f4cab982	Assert socket buffer lock in sb_lock() to protect socket buffer sleep lock state. Convert tsleep() into msleep() with socket buffer mutex as argument. Hold socket buffer lock over sbunlock() to protect sleep lock state. Assert socket buffer lock in sbwait() to protect the socket buffer wait state. Convert tsleep() into msleep() with socket buffer mutex as argument. Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK() in order to call into these functions with the lock, as well as to start protecting other socket buffer use in their implementation. Drop the socket buffer mutexes around calls into the protocol layer, around potentially blocking operations, for copying to/from user space, and VM operations relating to zero-copy. Assert the socket buffer mutex strategically after code sections or at the beginning of loops. In some cases, modify return code to ensure locks are properly dropped. Convert the potentially blocking allocation of storage for the remote address in soreceive() into a non-blocking allocation; we may wish to move the allocation earlier so that it can block prior to acquisition of the socket buffer lock. Drop some spl use. NOTE: Some races exist in the current structuring of sosend() and soreceive(). This commit only merges basic socket locking in this code; follow-up commits will close additional races. As merged, these changes are not sufficient to run without Giant safely. Reviewed by: juli, tjr	2004-06-19 03:23:14 +00:00
green	e57eac4be2	Add a sysctl/tunable, "kern.always_console_output", that lets you set output to permanently (not ephemerally) go to the console. It is also sent to any other console specified by TIOCCONS as normal. While I'm here, document the kern.log_console_output sysctl.	2004-06-18 20:12:42 +00:00
davidxu	d11f8ce42b	Add comment to reflect that we should retry after thread singling failed.	2004-06-18 11:13:49 +00:00
davidxu	70a732669b	Remove a bogus panic. It is possible that more than one thread will be suspended in thread_suspend_check. After they are resumed, all threads will call thread_single, but only one can succeed; the others should retry and will exit in thread_suspend_check.	2004-06-18 06:21:09 +00:00
davidxu	673364f0ef	If thread singler wants to terminate other threads, make sure it includes all threads except itself. Obtained from: julian	2004-06-18 06:15:21 +00:00
rwatson	89d347105a	Hold SOCK_LOCK(so) while frobbing so_options. Note that while the local race is corrected, there's still a global race in sosend() relating to so_options and the SO_DONTROUTE flag.	2004-06-18 04:02:56 +00:00
rwatson	d87fad9f08	Merge some additional leaf node socket buffer locking from rwatson_netperf: Introduce conditional locking of the socket buffer in fifofs kqueue filters; KNOTE() will be called holding the socket buffer locks in fifofs, but sometimes the kqueue() system call will poll using the same entry point without holding the socket buffer lock. Introduce conditional locking of the socket buffer in the socket kqueue filters; KNOTE() will be called holding the socket buffer locks in the socket code, but sometimes the kqueue() system call will poll using the same entry points without holding the socket buffer lock. Simplify the logic in sodisconnect() since we no longer need spls. NOTE: To remove conditional locking in the kqueue filters, it would make sense to use a separate kqueue API entry into the socket/fifo code when calling from the kqueue() system call.	2004-06-18 02:57:55 +00:00
kbyanc	c81446b87e	Implement CLOCK_VIRTUAL and CLOCK_PROF for clock_gettime(2) and clock_getres(2). Reviewed by: phk PR: 23304	2004-06-17 23:12:12 +00:00
rwatson	855c4bb01f	Merge additional socket buffer locking from rwatson_netperf: - Lock down low-hanging fruit use of sb_flags with socket buffer lock. - Lock down low-hanging fruit use of so_state with socket lock. - Lock down low-hanging fruit use of so_options. - Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with socket buffer lock. - Annotate situations in which we unlock the socket lock and then grab the receive socket buffer lock, which are currently actually the same lock. Depending on how we want to play our cards, we may want to coalesce these lock uses to reduce overhead. - Convert an if()->panic() into a KASSERT relating to so_state in soaccept(). - Remove a number of splnet()/splx() references. More complex merging of socket and socket buffer locking to follow.	2004-06-17 22:48:11 +00:00
phk	7dd1d04ac0	Reduce the thaumaturgical level of root filesystem mounts: Instead of using an otherwise redundant clone routine in geom_disk.c, mount a temporary DEVFS and do a proper lookup. Submitted by: thomas	2004-06-17 21:24:13 +00:00
phk	40dd98a3bd	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
phk	dfd1f7fd50	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
julian	6c9d81ae0d	Nice is a property of a process as a whole. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
peter	efc9e973bf	Change strategy based on a suggestion from Ian Dowse. Instead of trying to keep track of different section base addresses at a symbol-by-symbol level, just set the symbol values at load time.	2004-06-15 23:57:02 +00:00
rwatson	029226f3a8	Grab the socket buffer send or receive mutex when performing a read-modify-write on the sb_state field. This commit catches only the "easy" ones where it doesn't interact with as yet unmerged locking.	2004-06-15 03:51:44 +00:00
peter	0d395ced06	Fix symbol lookups between modules. This caused modules that depend on other modules to explode. eg: snd_ich->snd_pcm and umass->usb. The problem was that I was using the unified base address of the module instead of finding the start address of the section in question.	2004-06-15 01:35:57 +00:00
peter	99c1fd6c77	Insurance: cause a proper symbol lookup failure for symbol entries that reference unknown sections, rather than returning a small value.	2004-06-15 01:33:39 +00:00
jdp	32b926e0fb	Change the return value of sema_timedwait() so it returns 0 on success and a proper errno value on failure. This makes it consistent with cv_timedwait(), and paves the way for the introduction of functions such as sema_timedwait_sig() which can fail in multiple ways. Bump __FreeBSD_version and add a note to UPDATING. Approved by: scottl (ips driver), arch	2004-06-14 18:19:05 +00:00
rwatson	f2c0db1521	The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coalesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.	2004-06-14 18:16:22 +00:00
phk	ba2d5c4937	Remove a left over from userland buffer-cache access to disks.	2004-06-14 14:25:03 +00:00
rwatson	f1bc833e95	Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so): - Hold socket lock over calls to MAC entry points reading or manipulating socket labels. - Assert socket lock in MAC entry point implementations. - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.	2004-06-13 02:50:07 +00:00
rwatson	e3d9cae8b6	Introduce socket and UNIX domain socket locks into hard-coded lock order definition for witness. Send lock before receive lock, and socket locks after accept but before select: filedesc -> accept -> so_snd -> so_rcv -> sellck All routing locks after send lock: so_rcv -> radix node head All protocol locks before socket locks: unp -> so_snd udp -> udpinp -> so_snd tcp -> tcpinp -> so_snd	2004-06-13 00:23:03 +00:00
rwatson	7c0b73a950	Correct whitespace errors in merge from rwatson_netperf: tabs instead of spaces, no trailing tab at the end of line. Pointed out by: csjp	2004-06-12 23:36:59 +00:00
rwatson	82295697cd	Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 20:47:32 +00:00
rwatson	7bfe3e80fc	Introduce a mutex into struct sockbuf, sb_mtx, which will be used to protect fields in the socket buffer. Add accessor macros to use the mutex (SOCKBUF_()). Initialize the mutex in soalloc(), and destroy it in sodealloc(). In addition, add SOCK_() access macros which will protect most remaining fields in the socket; for the time being, use the receive socket buffer mutex to implement socket level locking to reduce memory overhead. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 16:08:41 +00:00
phk	ad8388ad62	Fix registration of loadable line disciplines. This should make watch(8)/snp(4) work again.	2004-06-12 12:31:42 +00:00
bmilekic	b75fa8ff5c	Gah! Plug a mbuf leak I introduced in the last commit. I don the pointy-hat. Problem reported by: Peter Holm <pho@>	2004-06-11 18:17:25 +00:00
julian	8b8e5c020c	Shuffle some code around.	2004-06-11 17:48:20 +00:00
phk	86602fc06c	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.	2004-06-11 11:16:26 +00:00
green	ff7604586b	Make sysctl_wire_old_buffer() respect ENOMEM from vslock() by marking the valid length as 0. This prevents vsunlock() from removing a system wire from memory that was not successfully wired (by us). Submitted by: tegge	2004-06-11 02:20:37 +00:00
rwatson	fcbd16cf61	Introduce a subsystem lock around UNIX domain sockets in order to protect global and allocated variables. This strategy is derived from work originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam Leffler: - Add unp_mtx, a global mutex which will protect all UNIX domain socket related variables, structures, etc. - Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros. - Acquire unp_mtx on entering most UNIX domain socket code, drop/re-acquire around calls into VFS, and release it on return. - Avoid performing sodupsockaddr() while holding the mutex, so in general move to allocating storage before acquiring the mutex to copy the data. - Make a stack copy of the xucred rather than copying out while holding unp_mtx. Copy the peer credential out after releasing the mutex. - Add additional assertions of vnode locks following VOP_CREATE(). A few notes: - Use of an sx lock for the file list mutex may cause problems with regard to unp_mtx when garbage collecting passed file descriptors. - The locking in unp_pcblist() for sysctl monitoring is correct subject to the unpcb zone not returning memory for reuse by other subsystems (consistent with similar existing concerns). - Sam's version of this change, as with the BSD/OS version, made use of both a global lock and per-unpcb locks. However, in practice, the global lock covered all accesses, so I have simplified out the unpcb locks in the interest of getting this merged faster (reducing the overhead but not sacrificing granularity in most cases). We will want to explore possibilities for improving lock granularity in this code in the future. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS 5 snapshot provided by BSDi	2004-06-10 21:34:38 +00:00
bmilekic	bfeac9f9f9	Plug a race where upon free this scenario could occur: (time grows downward)
    thread 1      thread 2
    ------------|------------
    dec ref_cnt |
                | dec ref_cnt  <-- ref_cnt now zero
    cmpset      |
    free all    |
    return      |
                |
    alloc again,|
    reuse prev  |
    ref_cnt     |
                | cmpset, read
                | already freed
                | ref_cnt
    ------------|------------
This should fix that by performing only a single atomic test-and-set that will serve to decrement the ref_cnt, only if it hasn't changed since the earlier read, otherwise it'll loop and re-read. This forces ordering of decrements so that truly the thread which did the LAST decrement is the one that frees. This is how atomic-instruction-based refcnting should probably be handled. Submitted by: Julian Elischer	2004-06-10 00:04:27 +00:00
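The fix described in this commit, a single compare-and-set that both decrements the count and decides who frees, can be sketched with C11 atomics. This is a userland illustration and not the committed mbuf code: `ref_release()` is a hypothetical name, and `atomic_compare_exchange_weak()` stands in for the kernel's cmpset primitive.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Drop one reference. Returns true only for the caller whose decrement
 * took the count from 1 to 0, i.e. the one that must free the object.
 */
static bool
ref_release(atomic_uint *cnt)
{
	unsigned int old = atomic_load(cnt);

	/*
	 * The exchange succeeds only if the count still equals 'old'.
	 * On failure 'old' is refreshed with the current value and the
	 * decrement is retried against that fresh read.
	 */
	while (!atomic_compare_exchange_weak(cnt, &old, old - 1))
		;
	return (old == 1);
}
```

Because the decrement and the "was I last?" decision come from the same successful exchange, no thread touches the counter again after the final decrement, which is the window the diagram in the commit message shows.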
mux	b09a5ac74d	Fix a panic happening when m_getm() is called with len < MCLBYTES. Reported by: ale Tested by: ale Reviewed by: bosko	2004-06-09 14:53:35 +00:00
jmallett	d57aeb149e	Add a comment explaining td_critnest's initial state and its life from that point on, as it happens relatively indirectly, and in a codepath the casual reader may not be acquainted with or find obvious. Glanced at by: jhb	2004-06-09 14:06:44 +00:00
phk	de5f777272	Rename struct pt_ioctl to "ptsc" and pointers to it from "pti" to "pt"	2004-06-09 10:21:53 +00:00
phk	88992250bd	Ditch K&R function style	2004-06-09 10:16:14 +00:00
phk	2d1181e619	Reference count struct tty. Add two new functions: ttyref() and ttyrel(). ttymalloc() creates a struct tty with a reference count of one. when ttyrel sees the count go to zero, struct tty is freed. Hold references for open ttys and for ttys which are controlling terminal for sessions. Until drivers start using ttyrel(), this commit will make no difference.	2004-06-09 09:41:30 +00:00
phk	f9d30f0a79	Fix a race in destruction of sessions.	2004-06-09 09:29:08 +00:00
phk	6c64927139	Move PTY private defines into PTY private files.	2004-06-09 09:09:54 +00:00
stefanf	d7af95e868	Avoid assignments to cast expressions. Reviewed by: md5 Approved by: das (mentor)	2004-06-08 13:08:19 +00:00
tjr	58dbd6e669	Remove remnants of PGINPROF.	2004-06-08 10:37:30 +00:00
rwatson	8555f72de8	Correct a resource leak introduced in recent accept locking changes: when I reordered events in accept1() to allocate a file descriptor earlier, I didn't properly update use of goto on exit to unwind for cases where the file descriptor is now held, but wasn't previously. The result was that, in the event of accept() on a non-blocking socket, or in the event of a socket error, a file descriptor would be leaked. This ended up being non-fatal in many cases, as the file descriptor would be properly GC'd on process exit, so only showed up for processes that do a lot of non-blocking accept() calls, and also live for a long time (such as qmail). This change updates the use of goto targets to do additional unwinding. Eyes provided by: Brian Feldman <green@freebsd.org> Feet, hands provided by: Stefan Ehmann <shoesoft@gmx.net>, Dimitry Andric <dimitry@andric.com> Arjan van Leeuwen <avleeuwen@piwebs.com>	2004-06-07 21:45:44 +00:00
phk	bfb13da831	Make linesw[] an array of pointers to linedesc instead of an array of linedisc.	2004-06-07 20:45:45 +00:00
julian	769daa5d1d	Split kern_thread.c into 2 parts. kern_kse.c and kern_thread.c Kern_kse has already been committed. This separates out the KSE threading ABI from generic thread support.	2004-06-07 19:00:57 +00:00
davidxu	90554db906	According to SUSv3, sigwait differs from sigwaitinfo: sigwait returns the error code in its return value, not in errno.	2004-06-07 13:35:02 +00:00
pjd	c66d0ff628	Remove unused code. Submitted by: Bjoern A. Zeeb	2004-06-07 12:19:55 +00:00
ume	3a5bdeaf2c	Allow more than MLEN bytes for ancillary data to meet the requirement of Section 20.1 of RFC3542. Obtained from: KAME MFC after: 1 week	2004-06-07 09:59:50 +00:00
tjr	24fcba21fb	Remove a stale and misleading comment.	2004-06-07 09:35:00 +00:00
julian	85b03d3641	Move the KSE ABI specific code here and separate it from code that is generic to any threading system. This commit does not link this file to the build yet, nor does it remove these functions from their current location in kern_thread.c. (that commit coming up after further review)	2004-06-07 07:25:03 +00:00
phk	4c3fd8116d	Remove filename+line number from panic messages.	2004-06-06 21:26:49 +00:00
bde	e02f078768	Detect interrupt storms better. The storm detection didn't work at all with an ASUS A7N8X-E motherboard in APIC mode, since storming interrupts don't repeat immediately. Use DELAY(1) to wait a bit for them to repeat. This affects all systems. Only delay for the first (10 * intr_storm_threshold) interrupts (per interrupt handler) so that this is only a pessimization while warming up. Throttle after calling the sub-handlers instead of before so that the long delay given by throttling can be used instead of the DELAY(1) to detect storms after warming up. Reduced the throttling period from 1/10 second to 1/hz seconds so that throttling doesn't destroy performance so much. Interrupts that are detected as storming are effectively handled by polling at a frequency of hz Hz. On A7N8X-E's there is another hardware or configuration bug that makes the throttled frequency closer to 2*hz Hz.	2004-06-05 18:27:28 +00:00
mux	b7f9b2983e	When we don't have any meaningful value to print for the device sysctl tree, output an empty string instead of "?". This is already what happened with DEVICE_SYSCTL_LOCATION and DEVICE_SYSCTL_PNPINFO. This makes the output of "sysctl dev" much nicer (it won't display those empty sysctls). Reviewed by: des	2004-06-05 11:39:05 +00:00
tjr	02a7d287a2	Change the types of vn_rdwr_inchunks()'s len and aresid arguments to size_t and size_t *, respectively. Update callers for the new interface. This is a better fix for overflows that occurred when dumping segments larger than 2GB to core files.	2004-06-05 02:18:28 +00:00
tjr	445b7fecaa	Back out workaround for vn_rdwr_inchunks()'s INT_MAX length limitation after discussions with bde; vn_rdwr_inchunks() itself should be fixed.	2004-06-05 02:00:12 +00:00
phk	17b52df3d7	Centralize the line discipline optimization determination in a function called ttyldoptim(). Use this function from all the relevant drivers. I believe no drivers finger linesw[] directly anymore, paving the way for locking and refcounting.	2004-06-04 21:55:55 +00:00
phk	06049d3eaf	Manual edits to change linesw[]-frobbing to ttyld_*() calls.	2004-06-04 20:04:52 +00:00
phk	ba3920e2a2	Machine generated patch which changes linedisc calls from accessing linesw[] directly to using the ttyld...() functions. The ttyld...() functions are inline so there is no performance hit.	2004-06-04 16:02:56 +00:00
tjr	5c5d136c33	Remove a stale comment.	2004-06-04 11:00:22 +00:00
des	7fec1d4931	Add a devclass level to the dev sysctl tree, in order to support per-class variables in addition to per-device variables. In plain English, this means that dev.foo0.bar is now called dev.foo.0.bar, and it is possible to have dev.foo.bar as well.	2004-06-04 10:23:00 +00:00
phk	41a29cfd2f	Get rid of ttyregister(). All drivers now use ttymalloc() for struct tty, so now we stand a chance of implementing refcounting and getting rid of the damn things again.	2004-06-04 07:17:03 +00:00
phk	7e6e0efd64	Use ttymalloc() instead of ttyregister(). Use ttyioctl() instead of direct calls to the linedisc.	2004-06-04 06:50:35 +00:00
tjr	85aaf94278	Write segments to core dump files in maximally-sized chunks that neither exceed vn_rdwr_inchunks()'s INT_MAX length limitation nor span a block boundary. This fixes dumping segments larger than 2GB. PR: 67546	2004-06-04 06:30:16 +00:00
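One plausible reading of "maximally-sized chunks that neither exceed INT_MAX nor span a block boundary" is that each chunk is capped at the largest multiple of the block size that fits in an int. The sketch below works under that assumption. `next_chunk()` and its `blksize` parameter are hypothetical and are not taken from the committed code, and the sketch assumes a 32-bit int and a write offset that starts block-aligned.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Size of the next write: never more than INT_MAX and (by assumption)
 * a whole number of blocks unless it is the final, short chunk.
 */
static uint64_t
next_chunk(uint64_t resid, uint64_t blksize)
{
	/* Largest multiple of blksize that does not exceed INT_MAX. */
	uint64_t max = (uint64_t)INT_MAX - ((uint64_t)INT_MAX % blksize);

	return (resid < max ? resid : max);
}
```

A caller would loop, writing `next_chunk(resid, blksize)` bytes and subtracting that from `resid` until nothing remains, so a 3 GB segment goes out as two writes rather than one oversized request.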
rwatson	87449e4f90	Mark sun_noname as const since it's immutable. Update definitions of functions that potentially accept &sun_noname (sbappendaddr(), et al) to accept a const sockaddr pointer.	2004-06-04 04:07:08 +00:00
alc	b5cd9ba03c	Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to blist.h, enabling the removal of numerous #includes from subr_blist.c. (subr_blist.c and swap_pager.c are the only users of these definitions.)	2004-06-04 04:03:26 +00:00
jhb	66f3d8ffca	- Comment out NULL, NULL barrier for Unix domain sockets section as the double NULL entries signal Witness to stop processing the array of order entries meaning none of the spin locks are added resulting in panics on boot. - Add a missing NULL, NULL terminator to the Slip locks list to keep them separate from the spin locks.	2004-06-03 20:07:44 +00:00
tjr	48c79c9521	Remove checks for curthread == NULL - it can't happen.	2004-06-03 10:22:47 +00:00
tjr	7a46b27935	Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys(). Reviewed by: jhb	2004-06-03 01:47:37 +00:00
rwatson	de0c6ecd47	Expand the hard-coded WITNESS lock order to include the following relationships: Sockets: filedesc->accept->sellck Routing: radix node head->rtentry->ifaddr UDP: udp->udpinp TCP: tcp->tcpinp SLIP: slip_mtx->slip sc_mtx Drop in a place holder section for UNIX domain sockets. Various sections to be expanded over the next few days.	2004-06-02 23:28:06 +00:00
mux	0ccfefe220	As discussed on arch@, flatten the device sysctl tree to make it more convenient to deal with. The notion of hierarchy is however preserved by adding a new %parent node.	2004-06-02 22:43:35 +00:00
tjr	9bd12a2fd9	Remove a redundant "td = curthread" statement from profclock().	2004-06-02 12:05:06 +00:00
tjr	80d36400ed	Move TDF_SA from td_flags to td_pflags (and rename it accordingly) so that it is no longer necessary to hold sched_lock while manipulating it. Reviewed by: davidxu	2004-06-02 07:52:36 +00:00
jeff	33a226cf5e	- Run sched_balance() and sched_balance_groups() from hardclock via sched_clock() rather than using callouts. This means we no longer have to take the load of the callout thread into consideration while balancing and should make the balancing decisions simpler and more accurate. Tested on: x86/UP, amd64/SMP	2004-06-02 05:46:48 +00:00
rwatson	576b26bafd	Integrate accept locking from rwatson_netperf, introducing a new global mutex, accept_mtx, which serializes access to the following fields across all sockets: so_qlen so_incqlen so_qstate so_comp so_incomp so_list so_head While providing only coarse granularity, this approach avoids lock order issues between sockets by avoiding ownership of the fields by a specific socket and its per-socket mutexes. While here, rewrite soclose(), sofree(), soaccept(), and sonewconn() to add assertions, close additional races and address lock order concerns. In particular: - Reorganize the optimistic concurrency behavior in accept1() to always allocate a file descriptor with falloc() so that if we do find a socket, we don't have to encounter the "Oh, there wasn't a socket" race that can occur if falloc() sleeps in the current code, which broke inbound accept() ordering, not to mention requiring backing out socket state changes in a way that raced with the protocol level. We may want to add a lockless read of the queue state if polling of empty queues proves to be important to optimize. - In accept1(), soref() the socket while holding the accept lock so that the socket cannot be free'd in a race with the protocol layer. Likewise in netgraph equivalents of the accept1() code. - In sonewconn(), loop waiting for the queue to be small enough to insert our new socket once we've committed to inserting it, or races can occur that cause the incomplete socket queue to overfill. In the previous implementation, it was sufficient to simply test once since calling soabort() didn't release synchronization permitting another thread to insert a socket as we discard a previous one.
- In soclose()/sofree()/et al, it is the responsibility of the caller to remove a socket from the incomplete connection queue before calling soabort(), which prevents soabort() from having to walk into the accept socket to release the socket from its queue, and avoids races when releasing the accept mutex to enter soabort(), permitting soabort() to avoid lock ordering issues with the caller. - Generally cluster accept queue related operations together throughout these functions in order to facilitate locking. Annotate new locking in socketvar.h.	2004-06-02 04:15:39 +00:00
rwatson	41a003003f	Rather than assert f_type==DTYPE_VNODE, conditionally perform the file lock release based on f_type==DTYPE_VNODE. vn_closefile() is used by non-vnode types as well (fifo).	2004-06-01 23:36:47 +00:00
rwatson	5adf35c004	Add GIANT_REQUIRED to kqueue_close(), since kqueue currently requires Giant.	2004-06-01 18:05:41 +00:00
rwatson	1e76056c09	Push the VOP_ADVLOCK() call to release advisory locks on vnode file descriptors out of fdrop_locked() and into vn_closefile(). This removes all knowledge of vnodes from fdrop_locked(), since the lock behavior was specific to vnodes. This also removes the specific requirement for Giant in fdrop_locked(), it's now only required by code that it calls into. Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.	2004-06-01 18:03:20 +00:00
bmilekic	9e06a1e05a	Fix a couple of bugs in the mbuf and packet ctors. In the latter case, nextpkt within the m_hdr was not being initialized to NULL for !M_PKTHDR cases. Maybe this will fix weird socket buffer inconsistency panics, but we'll see.	2004-06-01 16:17:10 +00:00
phk	3521579704	Introduce a ttyioctl() cdevsw default function.	2004-06-01 13:39:02 +00:00
phk	e0c89dae13	There is no need to explicitly call the stop function. In all likelihood ->l_close() did it and ttyclose certainly will.	2004-06-01 11:57:15 +00:00
rwatson	5a32935851	Add a global mutex, accept_filter_mtx, to protect the global list of accept filters and prevent read-modify-write races.	2004-06-01 04:08:48 +00:00
rwatson	bddadcf71a	The SS_COMP and SS_INCOMP flags in the so_state field indicate whether the socket is on an accept queue of a listen socket. This change renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new state field on the socket, so_qstate, as the locking for these flags is substantially different from the locking on the remainder of the flags in so_state.	2004-06-01 02:42:56 +00:00
truckman	d503c79cad	Add MSG_NBIO flag option to soreceive() and sosend() that causes them to behave the same as if the SS_NBIO socket flag had been set for this call. The SS_NBIO flag for ordinary sockets is set by fcntl(fd, F_SETFL, O_NONBLOCK). Pass the MSG_NBIO flag to the soreceive() and sosend() calls in fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag on the underlying socket for each I/O operation. The O_NONBLOCK flag is a property of the descriptor, and unlike ordinary sockets, fifos may be referenced by multiple descriptors.	2004-06-01 01:18:51 +00:00
bmilekic	f7574a2276	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and looks up the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but-mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues; it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)	2004-05-31 21:46:06 +00:00
rwatson	13656d723e	Assert Giant in vn_start_write() and vn_finished_write().	2004-05-31 20:56:10 +00:00
rwatson	afc098b3e1	Assert Giant in vrele().	2004-05-31 19:06:01 +00:00
phk	30a7ac8468	Add missing #include <sys/module.h>	2004-05-30 20:34:58 +00:00
phk	d6f7d2bde6	Add some missing <sys/module.h> includes which are masked by the one on death-row in <sys/kernel.h>	2004-05-30 17:57:46 +00:00
tjr	2bc3263ac9	Enable MI bits for gcc -ftest-coverage -fprofile-arcs on amd64.	2004-05-29 01:18:14 +00:00
pjd	19d2b54248	Sysctl hw.bus.devctl_disable shouldn't be writable from inside a jail. Approved by: imp	2004-05-26 16:36:32 +00:00

7555 Commits