freebsd-skq

Author	SHA1	Message	Date
rwatson	2b329ffd2f	Rework sofree() logic to take into account a possible race with accept(). Sockets in the listen queues have reference counts of 0, so if the protocol decides to disconnect the pcb and try to free the socket, this triggered a race with accept() wherein accept() would bump the reference count before sofree() had removed the socket from the listen queues, resulting in a panic in sofree() when it discovered it was freeing a referenced socket. This might happen if a RST came in prior to accept() on a TCP connection. The fix is two-fold: to expand the coverage of the accept mutex earlier in sofree() to prevent accept() from grabbing the socket after the "is it really safe to free" tests, and to expand the logic of the "is it really safe to free" tests to check that the refcount is still 0 (i.e., we didn't race). RELENG_5 candidate. Much discussion with and work by: green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>	2004-10-11 08:11:26 +00:00
glebius	0dda31b4f9	Revert last commit since it breaks API. Requested by: sam	2004-10-10 09:16:48 +00:00
julian	30d2ba06b9	Don't release the slot twice.. sched_rem() has already done it. Submitted by: stephan uphoff (ups at tree dot com) MFC after: 3 days	2004-10-10 05:19:22 +00:00
julian	8c3d54b9e4	Remove duplicate line.	2004-10-10 05:07:43 +00:00
glebius	0c7bb9f633	Remove inlined m_tag_free(). Rename _m_tag_free() to m_tag_free() and make it visible (same way as in OpenBSD). Describe usage in manpage. This change is useful for creating custom free methods, which call default free method at their end. While here, make malloc declaration for mbuf tags more informative. Approved by: julian (mentor), sam MFC after: 1 month	2004-10-09 13:25:19 +00:00
green	3a482df790	Don't "implicitly order all sleep locks before spin locks" in witness when the spin lock in question isn't -- it's the critical_enter() that KDB set. No more panic in DDB for console -> syscons -> tty -> knote operations.	2004-10-09 08:16:37 +00:00
davidxu	94500a0336	Add an execve command for kse_thr_interrupt to allow libpthread to restore signal mask correctly, this is required by POSIX. Reviewed by: deischen	2004-10-07 13:50:10 +00:00
davidxu	e85209d12c	Regen to unbreak world. Pointy hat to: mtm	2004-10-07 01:09:46 +00:00
das	35b6f981ab	Back out rev 1.240; it is unnecessary. In particular, p1 == curthread, so _PHOLD(p1) will not have to block to swap in p1. Noticed by: jhb	2004-10-06 23:53:49 +00:00
mtm	0a21f474dc	Close a race between a thread exiting and the freeing of its stack. After some discussion the best option seems to be to signal the thread's death from within the kernel. This requires that thr_exit() take an argument. Discussed with: davidxu, deischen, marcel MFC after: 3 days	2004-10-06 14:23:00 +00:00
davidxu	e1ce006b64	Close a race between thr_create and sysctl -w: thr_scope_sys could be changed while thr_create is running, and we test it several times.	2004-10-06 02:29:19 +00:00
grog	152055d94b	vtryrecycle: Don't rely on type VBAD alone to mean that we don't need to clean the vnode. If v_data is set, we still need to clean it. This code change should catch all incidents of the previous commit (INVARIANTS only).	2004-10-06 02:09:59 +00:00
grog	882d69104e	getnewvnode: Weaken the panic "cleaned vnode isn't" to a warning. Discussion: this panic (or warning) only occurs when the kernel is compiled with INVARIANTS. Otherwise the problem (which means that the vp->v_data field isn't NULL, and represents a coding error and possibly a memory leak) is silently ignored by setting it to NULL later on. Panicking here isn't very helpful: by this time, we can only find the symptoms. The panic occurs long after the reason for "not cleaning" has been forgotten; in the case in point, it was the result of severe file system corruption which left the v_type field set to VBAD. That issue will be addressed by a separate commit.	2004-10-06 02:06:11 +00:00
davidxu	7acde29a24	Restore some code removed in revisions 1.193 and 1.194; julian said he'd like to keep this code.	2004-10-06 00:49:41 +00:00
davidxu	793ea9317e	In the original kern_execve() code, the start of the function forces all other threads to suicide. The problem is that execve() can fail, and a failed execve() would change a threaded process to unthreaded; this side effect is unexpected. The new code introduces a new single threading mode, SINGLE_BOUNDARY, in which all threads except the singler suspend themselves at the user boundary. We cannot use SINGLE_NO_EXIT because we want to start from a clean state if execve() is successful; suspending other threads at an unknown point, later resuming them from there, and forcing them to exit at the user boundary may cause the process to start from a dirty state. If execve() is successful, the current thread upgrades to SINGLE_EXIT mode and forces other threads to suicide at the user boundary; otherwise, the other threads are resumed and their interrupted syscalls are restarted. Reviewed by: julian	2004-10-06 00:40:41 +00:00
julian	d5dfe59f9e	Fix whitespace botch that only showed up in the commit message diff :-/ MFC after: 4 days	2004-10-05 22:14:02 +00:00
julian	b4640b18f7	Slight cleanup in the single threading code. MFC after: 4 days	2004-10-05 22:05:25 +00:00
julian	57fb03da54	When preempting a thread, put it back on the HEAD of its run queue. (Only really implemented in 4bsd) MFC after: 4 days	2004-10-05 22:03:10 +00:00
julian	7d0504ed38	Oops. left out part of the diff. MFC after: 4 days	2004-10-05 21:26:27 +00:00
julian	7b170fd9fa	Use some macros to track available scheduler slots to allow easier debugging. MFC after: 4 days	2004-10-05 21:10:44 +00:00
julian	8587c9806d	light rearrangement of some code to get some locking more correct MFC after: 4 days	2004-10-05 20:48:16 +00:00
julian	2094122f86	Break out to a separate function, the code to revert a multithreaded process back to officially being a non-threaded program. MFC after: 4 days	2004-10-05 20:39:26 +00:00
jhb	ce2d3f89af	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. 
- calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month	2004-10-05 18:51:11 +00:00
jhb	9536269a6d	Add a critical section in turnstile_unpend() from before dropping the turnstile chain lock until after making all the awakened threads runnable. First, this fixes a priority inversion race. Second, this attempts to finish waking up all of the threads waiting on a turnstile before doing a preemption. Reviewed by: Stephan Uphoff (who found the priority inversion race)	2004-10-05 18:00:30 +00:00
pjd	c944ef39d6	Back out changes which were introduced to delay mounting root file system. Those changes were made on gmirror needs, but now gmirror handles this by itself.	2004-10-05 11:26:43 +00:00
davidxu	aa22b44625	Use scheduler api to adjust thread priority.	2004-10-05 09:10:30 +00:00
imp	cf32c9fe79	Add taskqueue_drain. This waits for the specified task to finish, if running, or returns. The calling program is responsible for making sure that nothing new is enqueued. # man page coming soon.	2004-10-05 04:16:01 +00:00
phk	bd3b1af9a6	Change the perfectly precise message printf("No buffers busy after final sync"); to printf("All buffers synced."); in order to not leave the users wondering if there should be.	2004-10-04 13:13:23 +00:00
julian	395c906e95	Another case where we need to guard against a partially constructed process. Submitted by: Stephan Uphoff ( ups at tree.com ) MFC after: 3 days	2004-10-04 06:45:48 +00:00
julian	96dbdb17db	Always start out with an initialised ksegrp structure. MFC after: 3 days	2004-10-03 20:06:11 +00:00
davidxu	33faeb8a73	Don't bother to turn off other P_STOPPED bits for SIGKILL, doing so would cause kernel to produce an unkillable process in some cases, especially, P_STOPPED_SINGLE has a singling thread, turning off the bit would mess the state.	2004-10-03 13:23:49 +00:00
alc	ad2a4ca3e0	Add a SOCKBUF_LOCK() to a rarely executed path in do_sendfile().	2004-10-02 05:37:47 +00:00
alfred	0efc91b067	Clear a process's procfs trace points upon delivery of SIGKILL. MT5 candidate. (Desired features for 5.3-RELEASE "More truss problems")	2004-10-01 14:15:20 +00:00
phk	2e5b8b9883	Fix a LOR relating to freeing cdevs.	2004-10-01 06:33:39 +00:00
alfred	a72e384f52	cover soreadable and sowriteable with the corresponding socketbuffer locks.	2004-10-01 05:54:06 +00:00
das	e399d76f1b	Avoid calling _PHOLD(p1) with p2's lock held, since _PHOLD() may block to swap in p1. Instead, call _PHOLD earlier, at a point where the only lock held happens to be p1's.	2004-10-01 05:01:29 +00:00
jhb	59af2fcb61	Fix a typo to fix the !DIAGNOSTIC build. Submitted by: many	2004-09-30 18:13:18 +00:00
phk	9743317a02	Assign a global unit number for the tty slave devices (init/lock) using the new subr_unit.c code. For now assert Giant in ttycreate() and ttyfree(). It is not obvious that it will ever pay off to lock these with anything else.	2004-09-30 10:38:48 +00:00
phk	fbd7b98a6a	Add a new API for allocating unit number (-like) resources. Allocation is always lowest free unit number. A mixed range/bitmap strategy for maximum memory efficiency. In the typical case where no unit numbers are freed total memory usage is 56 bytes on i386. malloc is called M_WAITOK but no locking is provided (yet). A bit of experience will be necessary to determine the best strategy. Hopefully a "caller provides locking" strategy can be maintained, but that may require use of M_NOWAIT allocation and failure handling. A userland test driver is included.	2004-09-30 07:04:03 +00:00
green	70acfe3e4f	Account for alias devices when tearing them down in destroy_dev() so we don't panic on a NULL cdev->si_devsw.	2004-09-29 16:38:38 +00:00
des	f665f60342	Turn VM_KMEM_SIZE_MAX and VM_KMEM_SIZE_SCALE into tunables. MFC after: 3 days	2004-09-29 14:21:40 +00:00
phk	d08ddc3f6b	Add functions to create and free the "tty-ness" of a serial port in a generic way. This code will allow a similar amount of code to be removed from most if not all serial port drivers. Add generic cdevsw for tty devices. Add generic slave cdevsw for init/lock devices. Add ttypurge function which wakes up all known generic sleep points in the tty code, and calls into the hw-driver if it provides a method. Add ttycreate function which creates tty device and optionally cua device. In both cases .init/.lock devices are created as well. Change ttygone() slightly to also call the hw driver provided purge routine. Add ttyfree() which will purge and destroy the cdevs. Add ttyconsole mode for setting console friendly termios on a port.	2004-09-28 19:33:49 +00:00
jmg	0d1f936e78	improve the mbuf m_print function.. Only pull length from pkthdr if there is one, detect mbuf loops and stop, add an extra arg so you can only print the first x bytes of the data per mbuf (print all if arg is -1), print flags using %b (bitmask)... No code in the tree appears to use m_print, and it's just a matter of adding -1 as an additional arg to m_print to restore original behavior.. MFC after: 4 days	2004-09-28 18:40:18 +00:00
phk	6e31d065d3	Give cluster_write() an explicit vnode argument. In the future a struct buf will not automatically point out a vnode for us.	2004-09-27 19:14:10 +00:00
phk	3234741a00	Used cached cdevsw pointer.	2004-09-27 06:34:30 +00:00
phk	27fb35d0b1	Add cdevsw->d_purge() support. This device method shall wake up any threads sleeping in the device driver and make them depart the driver's code for good.	2004-09-27 06:18:25 +00:00
marcel	266d410b93	Fix a bug introduced in the previous commit: kdb_cpu_trap() gets to the trapframe via kdb_frame, but kdb_frame was not initialized until after the call to kdb_cpu_trap(). Ergo: kdb_cpu_trap() was moved too far up. Pointy hat: marcel	2004-09-26 06:48:59 +00:00
julian	01b7ff330e	Use the universal 'threaded process' flag rather than the specific tests for different threading systems. MFC after: 1 week	2004-09-25 00:53:46 +00:00
jhb	3bd2c1b67b	Some more whitespace, style, and comment fixes. Submitted by: bde (mostly)	2004-09-24 20:27:04 +00:00
pjd	db95a45215	Rename 'mount_root_delay' tunable to 'vfs.root.mountdelay', which fits a bit better to our current naming scheme. Discussed with: ru	2004-09-24 09:19:03 +00:00
phk	c67c50f50e	Remove the cdevsw() function which is now unused.	2004-09-24 08:30:57 +00:00
phk	6ee26d135f	Hold threadcount while throbbing cdevsw in our underlying driver. This is a bit heavyhanded, and will be simplified once the tty code learns to properly deal with disappearing hw and drivers.	2004-09-24 08:26:03 +00:00
phk	14a7813f86	Hold threadcount reference when we call into the underlying console driver.	2004-09-24 07:16:56 +00:00
phk	615a2ebb57	Eliminate devsw() call, we are not dereferencing the pointer.	2004-09-24 07:11:02 +00:00
phk	88cf2bf7b8	Hold threadref while we throb cdevsw in devtoname()	2004-09-24 06:29:23 +00:00
phk	19aa7ffe99	Use vn_isdisk() to check if vnode is a disk. (repeat, CVS core dumped on me)	2004-09-24 06:23:31 +00:00
phk	d04b40f97c	use vn_isdisk() to see if vnode is a disk.	2004-09-24 06:21:43 +00:00
phk	ee853f3efd	Hold dev_lock and check for NULL devsw pointer when we service FIODTYPE ioctl.	2004-09-24 06:16:48 +00:00
phk	5536d5757b	Hold dev_lock and check for NULL devsw pointer when we determine if a vnode is a disk.	2004-09-24 06:16:08 +00:00
phk	18b8697aaa	use dev_re[fl]thread() rather than home rolled versions.	2004-09-24 05:55:03 +00:00
phk	7ede6d888c	Introduce dev_re[lf]thread() functions. dev_refthread() will return the cdevsw pointer or NULL. If the return value is non-NULL a threadcount is held which must be released with dev_relthread(). If the returned cdevsw is NULL no threadcount is held on the device.	2004-09-24 05:54:32 +00:00
jhb	7b666b4137	A modest collection of various and sundry style, spelling, and whitespace fixes. Submitted by: bde (mostly)	2004-09-24 00:38:15 +00:00
cognet	49654e152d	On arm, set the default elf brand to FreeBSD, until the binutils do it for us.	2004-09-23 23:29:24 +00:00
jhb	d0df115aaa	Don't try to protect td_sticks with sched_lock. It doesn't need it as it is only accessed by curthread.	2004-09-23 21:03:58 +00:00
jhb	f6dc0c3d5f	- Assert sched_lock in upcall_remove() since it is needed there and all callers already lock it there. - Lock sched_lock slightly earlier in kse_create() so that it covers kg_numupcalls.	2004-09-23 21:03:16 +00:00
jhb	1f2758a712	- Don't try to unlock Giant if single threading fails since we don't have it locked. - Unlock Giant before calling exit1() since exit1() does not require Giant.	2004-09-23 21:01:50 +00:00
phk	eb5eea42df	Split the ioctl function in control and slave side, this eliminated a troublesome devsw() call.	2004-09-23 16:13:46 +00:00
phk	1d992e18ec	Eliminate DEV_STRATEGY() macro: call dev_strategy() directly. Make dev_strategy() handle errors and departing devices properly.	2004-09-23 14:45:04 +00:00
pjd	99b0ffd3c0	Introduce new /boot/loader.conf variable: root_mount_delay. It can be used to delay mounting the root partition to give GEOM providers a chance to show up. Now, when the needed provider is absent, the vfs_rootmount() function will look for it every second, and if it can't be found in the defined time, it'll ask for the root device name (before this change it was done immediately). This allows booting from a gmirror device in degraded mode.	2004-09-23 10:13:18 +00:00
phk	3947e54e89	Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount of the number of threads which are inside whatever is behind the cdevsw for this particular cdev. Make the device mutex visible through dev_lock() and dev_unlock(). We may want finer granularity later. Replace spechash_mtx use with dev_lock()/dev_unlock().	2004-09-23 07:17:41 +00:00
jhb	3956303607	Various small style fixes.	2004-09-22 15:24:33 +00:00
julian	5015e1ce1f	Revert the last change.. Better to kill all other threads than to panic the system if 2 threads call execve() at the same time. A better fix will be committed later. Note that this only affects the case where the execve fails.	2004-09-22 01:30:23 +00:00
julian	fca3e8d0f0	In a threaded process, don't kill off all the other threads until we have a reasonable chance that the execve() is going to succeed. I.e. wait until we've done the permission checks etc. MFC after: 1 week	2004-09-21 21:05:13 +00:00
phk	3441ee7248	If a vnode has no v_rdev we cannot hope to answer FIODTYPE ioctl.	2004-09-21 08:33:05 +00:00
jhb	777f907276	Remove unused macro.	2004-09-20 19:01:44 +00:00
brian	255869f387	CTASSERT that MSIZE is a power of 2 (otherwise dtom() breaks) Ask uma_zcreate() to align mbufs to MSIZE bytes (otherwise dtom() breaks) As it happens, uma_zalloc_arg() always returned mbufs aligned to MSIZE anyway, but that was an implementation side-effect.... KASSERT -> CTASSERT suggested by: dd@ Approved by: silence on -net	2004-09-20 08:52:04 +00:00
das	8b64b8f028	The zone from which proc structures are allocated is marked UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should never be called. Move an assertion from proc_fini() to proc_dtor() and garbage-collect the rest of the unreachable code. I have retained vm_proc_dispose(), since I consider its disuse a bug.	2004-09-19 18:34:17 +00:00
phk	37dd5c7419	Initialize new ttys a bit more. Check TS_GONE flag for gone-ness.	2004-09-18 17:02:18 +00:00
marcel	f2afc2cd7a	Move makectx() after kdb_cpu_trap(), so the PCB will have possible MD corrections made to the trapframe. This is more logical.	2004-09-17 22:27:23 +00:00
phk	1b1b81c0fc	Add ttyopen and ttyclose functions which will do the right stuff for most if not all of our tty drivers in the future. Centralizing this stuff enables us to remove about 100 lines of almost but not quite perfectly copy&paste code from each tty driver.	2004-09-17 11:43:35 +00:00
phk	f8ef366cb9	Add ttyalloc() which in due time will be the successor to ttymalloc(), but without the "struct tty *" argument.	2004-09-17 06:13:47 +00:00
phk	a64f45cafb	Use the tty->t_sc field to find our softc.	2004-09-16 12:07:25 +00:00
julian	6461286b21	clean up thread runq accounting a bit. MFC after: 3 days	2004-09-16 07:12:59 +00:00
julian	b4933d4405	Use specific code to revert a partial add to the run queue, not remrunqueue() which can't handle a partially added thread. MFC after: 1 week	2004-09-16 05:37:40 +00:00
phk	02df7323ee	Remove unused B_WRITEINPROG flag	2004-09-15 21:49:22 +00:00
phk	43f0dbec3c	Simplify initialization of va_null a little bit.	2004-09-15 21:42:03 +00:00
phk	7bf1722b65	undent some functions a bit.	2004-09-15 21:08:58 +00:00
phk	b2c5cf5b2a	stylistic polishing.	2004-09-15 20:54:23 +00:00
julian	2e88fd3281	Try harder to get back to being a non threaded process. Submitted by: DavidXu MFC after: 3 days	2004-09-15 18:39:09 +00:00
julian	d7dd18c6b5	Oops accidentally removed #ifdef SCHED_4BSD as part of another commit This function is not yet used in ULE	2004-09-15 03:51:51 +00:00
jmg	ab70754605	unlock global lock in kqueue_scan before msleep'ing to prevent deadlock.. we didn't unlock global lock earlier to prevent just having to reacquire it again.. Found by: peter Reviewed by: ps MFC after: 3 days	2004-09-14 18:38:16 +00:00
julian	2e10eab995	Commit a fix for some panics we've been seeing with preemption. MFC after: 2 days	2004-09-13 23:06:39 +00:00
julian	0b88c839d5	Add some kasserts	2004-09-13 23:02:52 +00:00
julian	29732c6fb7	make some of these conditions apply equally to both threading systems.	2004-09-13 22:10:04 +00:00
phk	a915c8947e	Create struct snapdata which contains the snapshot fields from cdev and the previously malloc'ed snapshot lock. Malloc struct snapdata instead of just the lock. Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes). While here, give the private readblock() function a vnode argument in preparation for moving UFS to access GEOM directly.	2004-09-13 07:29:45 +00:00
phk	2806321da1	Remove the buffercache/vnode side of BIO_DELETE processing in preparation for integration of p4::phk_bufwork. In the future, local filesystems will talk to GEOM directly and they will consequently be able to issue BIO_DELETE directly. Since the removal of the fla driver, BIO_DELETE has effectively been a no-op anyway.	2004-09-13 06:50:42 +00:00
scottl	1e56230631	Revert the previous round of changes to td_pinned. The scheduler isn't fully initialized when the pmap layer tries to call sched_pin() early in the boot, which results in a quick panic. Use ke_pinned instead as was originally done with Tor's patch. Approved by: julian	2004-09-11 10:07:22 +00:00
julian	4b041e0e33	Try committing from the right tree this time MFC after: 2 days	2004-09-11 00:11:09 +00:00
julian	7cae3c9d5b	Make up my mind if cpu pinning is stored in the thread structure or the scheduler specific extension to it. Put it in the extension as the implementation details of how the pinning is done needn't be visible outside the scheduler. Submitted by: tegge (of course!) (with changes) MFC after: 3 days	2004-09-10 22:28:33 +00:00
julian	9993c65718	Add some code to allow threads to nominate a sibling to run if they are going to sleep. MFC after: 1 week	2004-09-10 21:04:38 +00:00
jmg	08f545c4a5	remove giant required from kqueue_close.. Reported by: kuriyama MFC after: 3 days	2004-09-10 03:14:32 +00:00
rwatson	d4e6ebd0c9	Hard code witness lock order for BPF locks.	2004-09-09 05:01:37 +00:00
phk	1912367ebb	Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.	2004-09-07 09:17:05 +00:00
julian	ae5a11b8d4	fix typo MFC after: 2 days	2004-09-07 07:04:47 +00:00
julian	35060cd448	Make debug printf less threatening and make it only print out once. MFC after: 2 days	2004-09-07 06:38:22 +00:00
julian	de0e7f8937	Give libthr a choice (per system) of scope_system or scope_thread scheduling. MFC after: 4 days	2004-09-07 06:33:39 +00:00
jmg	ff58a59f8f	make witness its own sysctl branch instead of using _ to do this. I have left the old tunables in to give people a few days to transition their loader.conf and sysctl.conf's over to the new names.. MFC after: 5 days	2004-09-06 23:27:28 +00:00
jmg	b29998067a	don't call f_detach if the filter has already removed the knote.. This happens when a proc exits, but needs to inform the user that this has happened.. This also means we can remove the check for detached from proc and sig f_detach functions as this is done in kqueue now... MFC after: 5 days	2004-09-06 19:02:42 +00:00
julian	91180c0a8c	Don't do IPIs on behalf of interrupt threads. just punt straight on through to the preemption code. Make a KASSERT out of a condition that can no longer occur. MFC after: 1 week	2004-09-06 07:23:14 +00:00
julian	daf0815c1d	slight code cleanup MFC after: 1 week	2004-09-05 23:23:58 +00:00
alfred	a91f587457	It's too easy to panic the machine when INVARIANTS are turned on and you botch a call to nmount(2). This is because there is an INVARIANTS check that asserts that opt->len must be zero if opt->val is not NULL. The problem is that the code does not actually follow this invariant if there is an error while processing mount options. Fix the code to honor the INVARIANT. Silence on: fs@	2004-09-05 22:24:28 +00:00
rwatson	0d9965ce27	Expand the scope of the socket buffer locks in sopoll() to include the state test as well as set, or we risk a race between a socket wakeup and registering for select() or poll() on the socket. This does increase the cost of the poll operation, but can probably be optimized some in the future. This appears to correct poll() "wedges" experienced with X11 on SMP systems with highly interactive applications, and might affect a plethora of other select() driven applications. RELENG_5 candidate. Problem reported by: Maxim Maximov <mcsi at mcsi dot pp dot ru> Debugged with help of: dwhite	2004-09-05 14:33:21 +00:00
julian	e291fa7714	turn on IPIs for 4bsd scheduler by default. MFC after: 1 week	2004-09-05 02:19:53 +00:00
julian	5813d27029	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure (the per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies the scheduler that no more than N threads from that ksegrp should be allowed to be concurrently scheduled. This is also used to enforce 'fairness' at this time so that a ksegrp with 10000 threads can not swamp the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventually develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as libkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. 
exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
julian	42bfd75cfe	Don't declare a function we are not defining.	2004-09-03 09:19:49 +00:00
julian	3bc1a1327b	fix compile for UP	2004-09-03 09:15:10 +00:00
julian	e2d37a7c26	ooops finish last commit. moved the variables but not the declarations.	2004-09-03 08:19:31 +00:00
julian	373bbfc184	Move 4bsd specific experimental IPI code into the 4bsd file. Move the sysctls into kern.sched	2004-09-03 07:42:31 +00:00
alc	82e55fdf76	Push Giant deep into vm_forkproc(), acquiring it only if the process has mapped System V shared memory segments (see shmfork_myhook()) or requires the allocation of an ldt (see vm_fault_wire()).	2004-09-03 05:11:32 +00:00
rwatson	fd3f91ddf3	Tag AIO as requiring Giant over the network stack using NET_NEEDS_GIANT(). RELENG_5 candidate.	2004-09-03 03:19:14 +00:00
julian	46d0945926	remove unused code MFC after: 2 days	2004-09-02 23:37:41 +00:00
scottl	d9af98161a	Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.	2004-09-02 18:59:15 +00:00
julian	7f91bb5d9a	Blush forgot to test non SMP builds.. oddly enough some UP code (particularly in the acpi code) seems to want this in a UP build. (I guess so you can have a single kernel module that works for both)	2004-09-01 18:05:43 +00:00
julian	8354ba9e3a	Give the 4bsd scheduler the ability to wake up idle processors when there is new work to be done. MFC after: 5 days	2004-09-01 06:42:02 +00:00
julian	e9d9514975	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days	2004-09-01 02:11:28 +00:00
davidxu	21ee614ff9	Remove TDP_USTATCLOCK; we no longer need it because we now always update the tick count for userland in thread_userret. This change also removes a "no upcall owned" panic, which occurred because fuword() schedules an upcall under heavy load while the code assumes that no upcall can occur. Reported and Tested by: Peter Holm <peter@holm.cc>	2004-08-31 11:52:05 +00:00
julian	2782d4b3fc	Remove an unneeded argument.. The removed argument could trivially be derived from the remaining one. That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some systems so leave it as an argument. Having both proc and thread as an argument just gives an opportunity for them to get out of sync. MFC after: 3 days	2004-08-31 07:34:54 +00:00
julian	ee753ed190	Remove sched_free_thread() which was only used in diagnostics. It has outlived its usefulness and has started causing panics for people who turn on DIAGNOSTIC, in what is otherwise good code. MFC after: 2 days	2004-08-31 06:12:13 +00:00
imp	7387033363	Fix BUS_DEBUG case	2004-08-30 05:48:49 +00:00
pjd	676a87c9ac	Add a missing '\n'.	2004-08-30 01:10:20 +00:00
davidxu	650fed99d4	Only test return_instead if P_SINGLE_EXIT is set, otherwise a fork() syscall can interrupt another thread's syscall in sleepq_catch_signals(). Currently, all callers know thread_suspend_check may suspend the thread itself, so we needn't check return_instead for normal suspension flags (no P_SINGLE_EXIT set). Tested by: deischen Reported by: Maarten L. Hekkelman <m.hekkelman@cmbi.kun.nl>	2004-08-29 23:10:02 +00:00
imp	5bc031c873	Initial support (disabled) for rebidding devices. I've been running this in my tree for a while and in its disabled state there are no issues. It isn't enabled yet because some drivers (in acpi) have side effects in their probe routines that need to be resolved in some manner before this can be turned on. The consensus at the last developer's summit was to provide a static method for each driver class that will return characteristics of the driver, one of which is if it can be reprobed idempotently.	2004-08-29 18:25:21 +00:00
imp	890b511bba	MFp4: Merge in the patches, submitted long ago by someone whose email address I've lost, that move the location information to the attach routine as well. While one could use devinfo to get this data, that is difficult and error prone and subject to races for short lived devices. Would make a good MT5 candidate.	2004-08-29 18:11:10 +00:00
des	431e20a6fe	Remove the HW_WDOG option; it serves no purpose. MFC after: 3 days	2004-08-29 11:10:09 +00:00
iedowse	23b0458914	Add support for completing the installation of ELF relocatable object format modules that were read in by the loader. Loading modules via the loader should now work on the amd64 platform.	2004-08-29 01:21:51 +00:00
davidxu	a6ba819750	1. try to use existing mailbox address in thread_update_usr_ticks. 2. remove '\n' in KASSERT.	2004-08-28 04:16:32 +00:00
davidxu	96f0feb1d4	Move TDF_CAN_UNBIND to thread private flags td_pflags, this eliminates need of sched_lock in some places. Also in thread_userret, remove spare thread allocation code, it is already done in thread_user_enter. Reviewed by: julian	2004-08-28 04:08:05 +00:00
peter	9e60f4336e	Backout the previous backout (with scott's ok). sched_ule.c:1.122 is believed to fix the problem with ULE that this change triggered.	2004-08-28 01:04:44 +00:00
obrien	0fe47008f6	s/smp_rv_mtx/smp_ipi_mtx/g Requested by: jhb	2004-08-28 00:49:55 +00:00
peter	587d1d74f3	Commit Jeff's suggested changes for avoiding a bug that is exposed by preemption and/or the rev 1.79 kern_switch.c change that was backed out. The thread was being assigned to a runq without adding in the load, which would cause the counter to hit -1.	2004-08-28 00:49:22 +00:00
andre	bae83b7595	Poll() uses the array smallbits that is big enough to hold 32 struct pollfd's to avoid calling malloc() on small numbers of fd's. Because smalltype's members have type char, its address might be misaligned for a struct pollfd. Change the array of char to an array of struct pollfd. PR: kern/58214 Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at> Reviewed by: bde (a long time ago) MFC after: 3 days	2004-08-27 21:23:50 +00:00
kan	a060608a2d	Reintroduce slightly modified patch from kern/69964. Check for LK_HAVE_EXL in both acquire invocations. MFC after: 5 days	2004-08-27 01:41:28 +00:00
iedowse	1268768c3d	When trying each linker class in turn with a preloaded module, exit the loop if the preload was successful. Previously a successful preload was ignored if the linker class was not the last in the list.	2004-08-27 01:20:26 +00:00
rwatson	4dfef2e0e5	Don't hold the UNIX domain socket subsystem lock over the body of the UNIX domain socket garbage collection implementation, as that risks holding the mutex over potentially sleeping operations (as well as introducing some nasty lock order issues, etc). unp_gc() will hold the lock long enough to do necessary deferral checks and set that it's running, but then release it until it needs to reset the gc state. RELENG_5 candidate. Discussed with: alfred	2004-08-25 21:24:36 +00:00
rwatson	d168fd3606	Conditional acquisition of socket buffer mutexes when testing socket buffers with kqueue filters is no longer required: the kqueue framework will guarantee that the mutex is held on entering the filter, either due to a call from the socket code already holding the mutex, or by explicitly acquiring it. This removes the last of the conditional socket locking.	2004-08-24 05:28:18 +00:00
imp	9b0c1e7ac1	Set the description to NULL in the right detach routine. This should keep dangling pointers to strings in loaded modules from hanging around after the drivers are unloaded.	2004-08-24 05:19:15 +00:00
davidxu	cf0e9470a8	Remove checking of single exit flag in thread_user_enter(), this is generic code for threaded process, should not be here.	2004-08-23 22:54:37 +00:00
peter	326b7f663e	Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock. We were obtaining different spin mutexes (which disable interrupts after acquisition) and spin waiting for delivery. For example, KSE processes do LDT operations which use smp_rendezvous, while other parts of the system are doing things like tlb shootdowns with a different mutex. This patch uses the common smp_rendezvous mutex for all MD home-grown IPIs that spinwait for delivery. Having the single mutex means that the spinloop to acquire it will enable interrupts periodically, thus avoiding the cross-ipi deadlock. Obtained from: dwhite, alc Reviewed by: jhb	2004-08-23 21:39:29 +00:00
kan	edf5a7b07f	Temporarily back out r1.74 as it seems to cause a number of regressions according to numerous reports. It might get reintroduced some time later when an exact failure mode is understood better.	2004-08-23 02:39:45 +00:00
rwatson	b3e3a32317	Make debug.kdb.stop_cpus also a TUNABLE() so it can be set prior to boot to help debug early nasty hangs.	2004-08-22 15:10:52 +00:00
julian	9349236b6f	diff reduction for upcoming patch. Use a macro that masks some of the odd goings on with sub-structures, because they will go away anyhow.	2004-08-22 05:21:41 +00:00
truckman	f36627bd56	Don't bother calling the module event handlers from module_shutdown() in the shutdown_final state if the RB_NOSYNC flag is set. The specific motivation in this case is that a system panic in an interrupt context results in a call to module_shutdown(), which calls g_modevent(), which calls g_malloc(..., M_WAITOK), which results in a second panic. While g_modevent() could be fixed to not call malloc() for MOD_SHUTDOWN events (which it doesn't handle in any case), it is probably also a good idea to entirely skip the execution of the module shutdown handlers after a panic. This may be a MFC candidate for RELENG_5.	2004-08-20 21:47:48 +00:00
truckman	54d23a34f6	Don't attempt to trigger the syncer thread final sync code in the shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the likely cause of hangs after a system panic that are keeping crash dumps from being done. This is a MFC candidate for RELENG_5. MFC after: 3 days	2004-08-20 19:21:47 +00:00
jhb	fc631187fd	Remove some dead code under a straggling APIC_IO #ifdef that I missed back before 5.2.	2004-08-20 17:24:52 +00:00
rwatson	5c80f32b93	Back out uipc_socket.c:1.208, as it incorrectly assumes that all sockets are connection-oriented for the purposes of kqueue registration. Since UDP sockets aren't connection-oriented, this appeared to break a great many things, such as RPC-based applications and services (i.e., NFS). Since jmg isn't around I'm backing this out before too many more feet are shot, but intend to investigate the right solution with him once he's available. Apologies to: jmg Discussed with: imp, scottl	2004-08-20 16:24:23 +00:00
scottl	30583f7adf	Revert the previous change. It works great for 4BSD but causes major problems for ULE. The reason is quite unknown and worrisome.	2004-08-20 05:58:38 +00:00
scottl	b336a56514	In maybe_preempt(), ignore threads that are in an inconsistent state. This is an effective band-aid for at least some of the scheduler corruption seen recently. The real fix will involve protecting threads while they are inconsistent, and will come later. Submitted by: julian	2004-08-20 05:18:50 +00:00
jmg	b0492852c8	make sure that the socket is either accepting connections or is connected when attaching a knote to it... otherwise return EINVAL... Pointed out by: benno	2004-08-20 04:15:30 +00:00
njl	7a83d1fca4	Add a newline.	2004-08-19 20:16:09 +00:00
phk	59d327838d	Add bioq_takefirst(). If the bioq is empty, NULL is returned. Otherwise the front element is removed and returned. This can simplify locking in many drivers from: lock() bp = bioq_first(bq); if (bp == NULL) { unlock() return } bioq_remove(bp, bq) unlock to: lock() bp = bioq_takefirst(bq); unlock() if (bp == NULL) return;	2004-08-19 19:51:51 +00:00
njl	5aee38d321	Add debugging to rman_manage_region() as well. This is useful since we manage subregions in ACPI. MFC after: 3 days	2004-08-19 16:41:12 +00:00
rwatson	477ea1ed67	Remove GIANT_REQUIRED from setugidsafety() as knote_fdclose() no longer requires Giant.	2004-08-19 14:59:51 +00:00
jhb	9e08178eb7	Now that the return value semantics of cv's for multithreaded processes have been unified with that of msleep(9), further refine the sleepq interface and consolidate some duplicated code: - Move the pre-sleep checks for threaded processes into a thread_sleep_check() function in kern_thread.c. - Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c. Specifically, if a thread is awakened by something other than a signal while checking for signals before going to sleep, clear TDF_SINTR in sleepq_catch_signals(). This removes a sched_lock lock/unlock combo in that edge case during an interruptible sleep. Also, fix sleepq_check_signals() to properly handle the condition if TDF_SINTR is clear rather than requiring the callers of the sleepq API to notice this edge case and call a non-_sig variant of sleepq_wait(). - Clarify the flags arguments to sleepq_add(), sleepq_signal() and sleepq_broadcast() by creating an explicit submask for sleepq types. Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of 0. Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and move the setting of TDF_SINTR to sleepq_add() if this flag is set rather than sleepq_catch_signals(). Note that it is the caller's responsibility to ensure that sleepq_catch_signals() is called if and only if this flag is passed to the preceding sleepq_add(). Note that this also removes a sched_lock lock/unlock pair from sleepq_catch_signals(). It also ensures that for an interruptible sleep, TDF_SINTR is always set when TD_ON_SLEEPQ() is true.	2004-08-19 11:31:42 +00:00
jmg	bead871bc0	add options MPROF_BUFFERS and MPROF_HASH_SIZE that adjust the sizes of the mutex profiling buffers. Document them in the man page and in NOTES. Ensure _HASH_SIZE is larger than _BUFFERS with a cpp error.	2004-08-19 06:38:26 +00:00
rwatson	3d9f38d578	Add UNP_UNLOCK_ASSERT() to assert that the UNIX domain socket subsystem lock is not held. Rather than annotating that the lock is released after calls to unp_detach() with a comment, annotate with an assertion. Assert that the UNIX domain socket subsystem lock is not held when unp_externalize() and unp_internalize() are called.	2004-08-19 01:45:16 +00:00
rwatson	6dcf7eb45e	Annotate call to DELAY() in interrupt storm mitigation as being something to revisit. Approved by: re (scottl)	2004-08-17 04:09:09 +00:00
kan	2c6402abdd	Upgrading a lock does not play well together with acquiring an exclusive lock and can lead to two threads being granted exclusive access. Check that no one has the same lock in exclusive mode before proceeding to acquire it. The LK_WANT_EXCL and LK_WANT_UPGRADE bits act as mini-locks and can block other threads. Normally this is not a problem since the mini locks are upgraded to full locks and the release of the locks will unblock the other threads. However if a thread reset the bits without obtaining a full lock other threads are not awoken. Add missing wakeups for these cases. PR: kern/69964 Submitted by: Stephan Uphoff <ups at tree dot com> Very good catch by: Stephan Uphoff <ups at tree dot com>	2004-08-16 15:01:22 +00:00
obrien	8f41b4e870	s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g	2004-08-16 08:33:37 +00:00
rwatson	7ac0169fa1	Always acquire the UNIX domain socket subsystem lock (UNP lock) before dereferencing sotounpcb() and checking its value, as so_pcb is protected by protocol locking, not subsystem locking. This prevents races during close() by one thread and use of the socket in another. unp_bind() now asserts the UNP lock, and uipc_bind() now acquires the lock around calls to unp_bind().	2004-08-16 04:41:03 +00:00
green	1de6d6df05	Add the missing knote_fdclose().	2004-08-16 03:09:01 +00:00
green	99deda206a	Allocate the marker, when scanning a kqueue, from the "heap" instead of the stack. When swapped out, a process's kernel stack would be unavailable, and we could get a page fault when scanning the same kqueue. PR: kern/61849	2004-08-16 03:08:38 +00:00
rwatson	2cd0dbc8de	Annotate the current UNIX domain socket locking strategies, order, strengths, and weaknesses in a comment. Assert a copyright over the changes made as part of the locking work.	2004-08-16 01:52:04 +00:00
silby	e3f2e32958	Major enhancements to pipe memory usage: - pipespace is now able to resize non-empty pipes; this allows for many more resizing opportunities - Backing is no longer pre-allocated for the reverse direction of pipes. This direction is rarely (if ever) used, so this cuts the amount of map space allocated to a pipe in half. - Pipe growth is now much more dynamic; a pipe will now grow when the total amount of data it contains and the size of the write are larger than the size of pipe. Previously, only individual writes greater than the size of the pipe would cause growth. - In low memory situations, pipes will now shrink during both read and write operations, where possible. Once the memory shortage ends, the growth code will cause these pipes to grow back to an appropriate size. - If the full PIPE_SIZE allocation fails when a new pipe is created, the allocation will be retried with SMALL_PIPE_SIZE. This helps to deal with the situation of a fragmented map after a low memory period has ended. - Minor documentation + code changes to support the above. In total, these changes increase the total number of pipes that can be allocated simultaneously, drastically reducing the chances that pipe allocation will fail. Performance appears unchanged due to dynamic resizing.	2004-08-16 01:27:24 +00:00
truckman	5facb5bcc2	Yet another tweak to the shutdown messages in boot(): Don't count busy buffers before the initial call to sync() and don't skip the initial sync() if no busy buffers were called. Always call sync() at least once if syncing is requested. This defers the "Syncing disks, buffers remaining..." message until after the initial sync() call and the first count of busy buffers. This backs out changes in kern_shutdown 1.162. Print a different message when there are no busy buffers after the initial sync(), which is now the expected situation. Print an additional message when syncing has completed successfully in the unusual situation where the work of syncing was done by boot(). Uppercase one message to make it consistent with all of the other kernel shutdown messages. Discussed with: bde (in a much earlier form, prior to 1.162) Reviewed by: njl (in an earlier form)	2004-08-15 19:17:23 +00:00
jmg	bc1805c6e8	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
rwatson	b3113cfdfe	Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we attempt to IPI other cpus when entering the debugger in order to stop them while in the debugger. The default remains to issue the stop; however, that can result in a hang if another cpu has interrupts disabled and is spinning, since the IPI won't be received and the KDB will wait indefinitely. We probably need to add a timeout, but this is a useful stopgap in the mean time. Reviewed by: marcel	2004-08-15 02:06:27 +00:00
rwatson	c7e2313e86	Cause pfind() not to return processes in the PRS_NEW state. As a result, threads consuming the result of pfind() will not need to check for a NULL credential pointer or other signs of an incompletely created process. However, this also means that pfind() cannot be used to test for the existence of, or to find, such a process. Annotate pfind() to indicate that this is the case. A review of current consumers seems to indicate that this is not a problem for any of them. This closes a number of race conditions that could result in NULL pointer dereferences and related failure modes. Other related races continue to exist, especially during iteration of the allproc list without due caution. Discussed with: tjr, green	2004-08-14 17:15:16 +00:00
phk	9595df2db1	Add some KASSERTS.	2004-08-14 08:33:49 +00:00
julian	ae4d7bb6b9	Whitespace nit.	2004-08-14 07:21:20 +00:00
rwatson	136013f29f	After completing a name lookup for a target UNIX domain socket to connect to, re-check that the local UNIX domain socket hasn't been closed while we slept, and if so, return EINVAL. This affects the system running both with and without Giant over the network stack, and recent ULE changes appear to cause it to trigger more frequently than previously under load. While here, improve catching of possibly closed UNIX domain sockets in one or two additional circumstances. I have a much larger set of related changes in Perforce, but they require more testing before they can be merged. One debugging printf is left in place to indicate when such a race takes place: this is typically triggered by a buggy application that simultaneously connect()'s and close()'s a UNIX domain socket file descriptor. I'll remove this at some point in the future, but am interested in seeing how frequently this is reported. In the case of Martin's reported problem, it appears to be a result of a non-thread safe syslog() implementation in the C library, which does not synchronize access to its logging file descriptor. Reported by: mbr	2004-08-14 03:43:49 +00:00
jmg	bea28d4a04	clean up whitespace...	2004-08-13 17:43:53 +00:00
jmg	d2ff11056b	looks like rwatson forgot tabs... :)	2004-08-13 07:38:58 +00:00
julian	9ab7967d3c	Don't keep evaluating our own cpu mask.. it's not likely to have changed....	2004-08-13 00:57:43 +00:00
rwatson	74889f1a20	Trim trailing white space.	2004-08-12 18:06:21 +00:00
imp	482740a238	Minor formatting fixes for lines > 80 characters	2004-08-12 17:26:22 +00:00
jeff	8745e98dd0	- Introduce a new flag KEF_HOLD that prevents sched_add() from doing a migration. Use this in sched_prio() and sched_switch() to stop us from migrating threads that are in short term sleeps or are runnable. These extra migrations were added in the patches to support KSE. - Only set NEEDRESCHED if the thread we're adding in sched_add() is a lower priority and is being placed on the current queue. - Fix some minor whitespace problems.	2004-08-12 07:56:33 +00:00
julian	765ec5c83b	Properly keep track of how many kses are on the system run queue(s).	2004-08-11 20:54:48 +00:00
rwatson	eed836416f	Replace a reference to splnet() with a reference to locking in a comment.	2004-08-11 03:43:10 +00:00
marcel	fbbaea5f90	Add __elfN(dump_thread). This function is called from __elfN(coredump) to allow dumping per-thread machine specific notes. On ia64 we use this function to flush the dirty registers onto the backingstore before we write out the PRSTATUS notes. Tested on: alpha, amd64, i386, ia64 & sparc64 Not tested on: arm, powerpc	2004-08-11 02:35:06 +00:00
rwatson	371cf09cf7	In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However, we may sleep when doing so; check that we didn't race with another thread allocating storage for the vnode after allocation is made to a local pointer, and only update the vnode pointer if it's still NULL. Otherwise, accept that another thread got there first, and release the local storage. Discussed with: jmg	2004-08-11 01:27:53 +00:00
alc	7210ecc993	Eliminate the acquisition and release of Giant within physio(). Remove the spl calls. Reviewed by: phk@ Discussed with: scottl@	2004-08-10 21:47:11 +00:00
jhb	15d4b7d989	Synchronize the extra SA threading checks and return value handling of condition variables with that of msleep(). Reviewed by: davidxu	2004-08-10 17:42:59 +00:00
jeff	b109ddffbc	- Use a new flag, KEF_XFERABLE, to record with certainty that this kse had contributed to the transferable load count. This prevents any potential problems with sched_pin() being used around calls to setrunqueue(). - Change the sched_add() load balancing algorithm to try to migrate on wakeup. This attempts to place threads that communicate with each other on the same CPU. - Don't clear the idle counts in kseq_transfer(), let the cpus do that when they call sched_add() from kseq_assign(). - Correct a few out of date comments. - Make sure the ke_cpu field is correct when we preempt. - Call kseq_assign() from sched_clock() to catch any assignments that were done without IPI. Presently all assignments are done with an IPI, but I'm trying a patch that limits that. - Don't migrate a thread if it is still runnable in sched_add(). Previously, this could only happen for KSE threads, but due to changes to sched_switch() all threads went through this path. - Remove some code that was added with preemption but is not necessary.	2004-08-10 07:52:21 +00:00
njl	7e21ce666c	Skip the syncing disks loop if there are no dirty buffers. Remove a variable used to flag the initial printf. Submitted by: truckman (earlier version)	2004-08-10 01:32:05 +00:00
scottl	ab3ce7c4d9	Add a temporary debugging hack to detect a deadlock in setrunqueue(). This is here so that we can gather stats on the nature of the recent rash of hard lockups, and in this particular case panic the machine instead of letting it deadlock forever.	2004-08-10 00:26:25 +00:00
julian	00a6534a31	Slight changes to comments and some whitespace changes.	2004-08-09 21:57:30 +00:00
julian	38d3d854fe	Make kg->kg_runnable actually count runnable threads in the ksegrp run queue instead of only doing it sometimes.. This is not used outside of debugging code in the current code, but that will probably change.	2004-08-09 20:36:03 +00:00
julian	ecbe8aa287	Remove typos on KASSERT messages.	2004-08-09 20:13:07 +00:00
green	fbabec2d12	Normalize the VM wiring done with SPARSE_MAPPING: check for errors, and unmap when done. For whatever reason, SPARSE_MAPPING is not even a config option, so this is dead code.	2004-08-09 18:46:13 +00:00
julian	61fada7840	Increase the amount of data exported by KTR in the KTR_RUNQ setting. This extra data is needed to really follow what is going on in the threaded case.	2004-08-09 18:21:12 +00:00
jmg	2c2b6c4ef7	add option to automatically mark core dumps with the nodump flag PR: 57065 Submitted by: Walter C. Pelissero	2004-08-09 05:46:46 +00:00
davidxu	634d20a05e	1. Add KSE_INTR_DBSUSPEND command for kse_thr_interrupt to suspend a bound thread; after the bound thread leaves the critical region, it should check the debug flag and may suspend itself by using the command. 2. Schedule upcall after thread is suspended by debugger. 3. Wake up upcall thread after process suspension. Reviewed by: deischen	2004-08-08 22:32:20 +00:00
davidxu	f8c21c52ad	Call thread_user_enter for M:N thread, ast() should be treated as another entrance of kernel.	2004-08-08 22:28:33 +00:00
davidxu	6412ad5b2e	Add pl_flags to ptrace_lwpinfo, two flags PL_FLAG_SA and PL_FLAG_BOUND indicate that a thread is in UTS critical region. Reviewed by: deischen Approved by: marcel	2004-08-08 22:26:11 +00:00
dfr	6a047f3d1e	Make sure that AT_PHDR has a useful value even for static programs.	2004-08-08 09:48:10 +00:00
jmg	6967b9b093	rearrange some code that handles the thread taskqueue so that it is more generic. Introduce a new define TASKQUEUE_DEFINE_THREAD that takes a single arg, which is the name of the queue. Document these changes.	2004-08-08 02:37:22 +00:00
rwatson	656f433813	We're not yet ready to assert !Giant in kern_fcntl(), as it's called with Giant from ABI wrappers such as Linux emulation. Foot shoot off: phk	2004-08-07 14:09:02 +00:00
rwatson	37eebe5058	Flag a broad range of VFS operations as GIANT_REQUIRED in order to catch leaking into VFS without Giant. Inch Giant a little lower in several file descriptor operations on vnodes to cover only VFS operations that need it, rather than file flag reading, etc.	2004-08-06 22:25:35 +00:00
rwatson	ee17f9503f	In thread_exit(), include more information about the thread/process context in the KTR trace record. In particular, include the same information as passed for mi_switch() and fork_exit() KTR trace records.	2004-08-06 22:06:14 +00:00
rwatson	d6384e3daf	Push UIDINFO_UNLOCK() slightly earlier in chgsbsize(), as it's not needed if we print the local variable version of the limit rather than the shared version.	2004-08-06 22:04:33 +00:00
rwatson	8de3afda37	Avoid acquiring Giant for some common light-weight or already MPSAFE fcntl() operations, including: F_DUPFD (dup() alias), F_GETFD (retrieve close-on-exec flag), F_SETFD (set close-on-exec flag), F_GETFL (retrieve file descriptor flags). For the remaining fcntl() operations, do acquire Giant, especially where we call into fo_ioctl() as a result. We're not yet ready to push Giant into fo_ioctl(). Once we do, this can all become quite a bit prettier.	2004-08-06 22:00:55 +00:00
rwatson	36a8fef8a8	Cut a KTR record whenever a callout is invoked. Mark whether it runs with Giant or not, and include the function pointer so it can be looked up against the kernel symbol table during trace analysis.	2004-08-06 21:49:00 +00:00
jhb	d3254af40d	Don't scare users with a warning about preemption being off when it isn't yet safe to have on by default.	2004-08-06 15:49:44 +00:00
rwatson	6680706c2b	In ithread_schedule(), when we plan to go harvest some entropy as a result of scheduling an ithread, cut a KTR_INTR trace record so that it's clear in tracing interrupt activity where and when the entropy harvesting code is invoked.	2004-08-06 03:39:28 +00:00
cperciva	b4bae139fd	When resetting a pending callout, perform the deregistration in callout_reset rather than calling callout_stop. This results in a few lines of code duplication, but it provides a significant performance improvement because it avoids recursing on callout_lock. Requested by: rwatson	2004-08-06 02:44:58 +00:00
jhb	73d1afd6fd	Fix the code in rman that merges adjacent unallocated resources to use a better check for 'adjacent'. The old code assumed that if two resources were adjacent in the linked list that they were also adjacent range wise. This is not true when a resource manager has to manage disparate regions. For example, the current interrupt code on i386/amd64 will instruct irq_rman to manage two disjoint regions: 0-1 and 3-15 for the non-APIC case. If IRQs 1 and 3 were allocated and then released, the old code would coalesce across the 1 to 3 boundary because the resources were adjacent in the linked list thus adding 2 to the area of resources that irq_rman managed as a side effect. The fix adds extra checks so that adjacent unallocated resources are only merged with the resource being freed if the start and end values of the resources also match up. The patch also consolidates the checks for adjacent resources being allocated.	2004-08-05 15:48:18 +00:00
jhb	fb7bd65f3f	Remove a potential deadlock on i386 SMP by changing the lazypmap ipi and spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi and spin-wait code snippets so that you can't get into the situation of one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap shootdown each of which are waiting on each other. With this change, only one of the CPUs would do an IPI and spin-wait at a time.	2004-08-04 20:31:19 +00:00
jhb	f513ad537c	Workaround a possible deadlock on SMP due to a spin lock LOR by disabling the immediate awakening of proc0 (scheduler kproc, controls swapping processes in and out). The scheduler process periodically awakens already, so this will not result in processes not being swapped in, there will just be more latency in between a thread being made runnable and the scheduler waking up to swap the affected process back in.	2004-08-04 20:24:40 +00:00
jhb	c75eeac1df	Cache the value of curthread in the _get_sleep_lock() and _get_spin_lock() macros and pass the value to the associated _mtx_*() functions to avoid more curthread dereferences in the function implementations. This provided a very modest perf improvement in some benchmarks. Suggested by: rwatson Tested by: scottl	2004-08-04 20:18:45 +00:00
rwatson	5d6fea3b71	Assert Giant in namei(). Bugs have been reported in which, following a sleep() call waking up in namei(), a later assertion triggers that Giant is not held. By asserting Giant at the start of namei(), we can know that if that assertion triggers, Giant is lost during the call to namei(), and not before.	2004-08-04 18:39:07 +00:00
rwatson	243f24944e	Assert Giant in the following file descriptor-related functions (function: reason): fdfree(): VFS; setugidsafety(): KQueue; fdcheckstd(): VFS; _fgetvp(): VFS; fgetsock(): conditional assertion based on debug.mpsafenet.	2004-08-04 18:35:33 +00:00
rwatson	76535adbaa	Remove spl's from kern_resource.c.	2004-08-04 18:19:09 +00:00
mux	35780dc21a	Instead of calling ia32_pause() conditionally on __i386__ or __amd64__ being defined, define and use a new MD macro, cpu_spinwait(). It only expands to something on i386 and amd64, so the compiled code should be identical. Name of the macro found by: jhb Reviewed by: jhb	2004-08-03 18:44:27 +00:00
pjd	7a05d0a3cd	Don't skip permission checks when sending signals to zombie processes. Pointed out by: bde Reviewed by: rwatson	2004-08-03 15:39:23 +00:00
silby	e327e6bd59	Standardize pipe locking, ensuring that everything is locked via pipelock(), not via a mixture of mutexes and pipelock(). Additionally, add a few KASSERTS, and change some statements that should have been KASSERTS into KASSERTS. As a result of these cleanups, some segments of code have become significantly shorter and/or easier to read.	2004-08-03 02:59:15 +00:00
davidxu	6f2afa324d	s/TMDF_DONOTRUNUSER/TMDF_SUSPEND/g Discussed with: deischen	2004-08-03 02:23:06 +00:00
julian	6121fa3e4d	Repeat after me: "Do not apply your tested patches to your commit tree by hand"	2004-08-03 01:43:29 +00:00
julian	f1c5d06daf	Remove an argument that is never used.	2004-08-02 23:48:43 +00:00
obrien	47f728c0bc	Put a cap on the auto-tuning of kern.maxvnodes. Cap value chosen by: scottl	2004-08-02 21:52:43 +00:00
rwatson	a21d9ff09b	Add what appears to be a missing '*/' at the end of a comment.	2004-08-02 01:38:27 +00:00
green	9532ab7116	* Add a "how" argument to uma_zone constructors and initialization functions so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialization functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default. Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.	2004-08-02 00:18:36 +00:00
julian	b0892abf37	Comment kse_create() and make a few minor code cleanups Reviewed by: davidxu	2004-08-01 23:02:00 +00:00
phk	2d868d02cf	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activites on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the namiedata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.	2004-07-30 22:08:52 +00:00
alc	6aaed2f8ea	Giant is no longer required by vm_waitproc() and vmspace_exitfree(). Eliminate its acquisition and release around vm_waitproc() in kern_wait().	2004-07-30 20:31:02 +00:00
njl	774b91783e	Minor message cleanup.	2004-07-30 01:30:05 +00:00
pjd	809d561dd5	Syscall kill(2) called for a zombie process should return 0. Obtained from: Darwin	2004-07-29 20:38:19 +00:00
pjd	7e5db42c7a	Fill in some information about zombie processes as well. Before this change every zombie process was reported as an owner of PID 0 in ps(1) output. Reviewed by: julian	2004-07-29 20:27:59 +00:00
phk	075684f5fd	Remove global variable rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroot's in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.	2004-07-28 20:21:04 +00:00
kan	48b2ea77a3	Avoid casts as lvalues.	2004-07-28 06:42:41 +00:00
davidxu	5610a7e068	Use P_SINGLE_EXIT to check single-threading case, P_WEXIT is not for that purpose.	2004-07-28 06:30:52 +00:00
phk	8c9258b82e	Convert the vfsconf list to a TAILQ. Introduce vfs_byname() function to find things on it. Staticize vfs_nmount() function under the name vfs_donmount(). Various cleanups.	2004-07-27 22:32:01 +00:00
rwatson	4ab080249a	Pass a thread argument into cpu_critical_{enter,exit}() rather than dereference curthread. It is called only from critical_{enter,exit}(), which already dereferences curthread. This doesn't seem to affect SMP performance in my benchmarks, but improves MySQL transaction throughput by about 1% on UP on my Xeon. Head nodding: jhb, bmilekic	2004-07-27 16:41:01 +00:00
rwatson	1fea905f48	Add "options ADAPTIVE_GIANT" which causes Giant to also be treated in an adaptive fashion when adaptive mutexes are enabled. The theory behind non-adaptive Giant is that Giant will be held for long periods of time, and therefore spinning waiting on it is wasteful. However, in MySQL benchmarks which are relatively Giant-free, running Giant adaptive makes an observable difference on SMP (5% transaction rate improvement). As such, make adaptive behavior on Giant an option so it can be more widely benchmarked.	2004-07-27 16:34:48 +00:00
alc	8a38bc6b2c	- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.	2004-07-27 03:53:41 +00:00
bmilekic	1c3958ce88	Move the schedlock owner state update following the context switch in fork_exit() to before anything else is done (but keep schedlock for the deadthread check). This means one less nasty bug if ever in the future whatever might have been called before the update played with schedlock or critical sections. Discussed with: tjr	2004-07-27 03:46:31 +00:00
cperciva	c009fddfd6	In revision 1.228, I accidentally broke the "total number of processes in the system" resource limit code: When checking if the caller has superuser privileges, we should be checking the real user, not the effective user. (In general, resource limiting is done based on the real user, in order to avoid resource-exhaustion-by-setuid-program attacks.) Now that a SUSER_RUID flag to suser_cred exists, use it here to return this code to its correct behaviour. Pointed out by: rwatson	2004-07-26 07:54:39 +00:00
cperciva	d9fecc83c8	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
rwatson	ec34d4330f	Revert modification of subr_turnstile.c accidentally included in the last commit; this assertion was provided by jhb for local debugging and not intended for broader consumption.	2004-07-25 23:32:32 +00:00
rwatson	4c9acdbfaf	In uipc_connect(), assert that the passed thread is curthread, and pass td into unp_connect() instead of reading curthread.	2004-07-25 23:30:43 +00:00
rwatson	0e43e3b1b4	Do some initial locking on accept filter registration and attach. While here, close some races that existed in the pre-locking world during low memory conditions. This locking isn't perfect, but it's closer than before.	2004-07-25 23:29:47 +00:00

7937 Commits