Commit Graph

10431 Commits

Author SHA1 Message Date
Julian Elischer
9ef95d0105 rename the process to 'idle' and 'intr' as per jhb. 2007-10-27 00:52:26 +00:00
Julian Elischer
fbf7046447 Initialise the initial process pointer to NULL so that we know we don't
have an idle process yet.
I'm guessing that on my system this was always 0 already.

found by: Ed Schouten
2007-10-27 00:42:40 +00:00
Julian Elischer
6bc3d1dc09 If kthread_exit() is called on the last kthread in a kproc, then
all the work in kproc_exit must be done.
We don't actually have a user of this yet, but why leave it to chance.
2007-10-26 22:18:20 +00:00
Julian Elischer
ca9a0ddf31 if one changes a function's arguments, one must also change the callers. 2007-10-26 22:03:19 +00:00
Julian Elischer
5f66cfca51 oops, over optimised and broke non-SMP builds 2007-10-26 20:32:33 +00:00
Julian Elischer
dd1b3ff97e kthread_exit needs no stinkin argument. 2007-10-26 17:03:22 +00:00
David E. O'Brien
ef44c8d2a3 style(9) 2007-10-26 16:33:47 +00:00
Julian Elischer
7ab24ea3b9 Introduce a way to make pure kernel threads.
kthread_add() takes the same parameters as the old kthread_create()
plus a pointer to a process structure, and adds a kernel thread
to that process.

kproc_kthread_add() takes the parameters for kthread_add,
plus a process name and a pointer to a pointer to a process instead of just
a pointer, and if the proc * is NULL, it creates the process to the
specifications required, before adding the thread to it.

All other old kthread_xxx() calls remain, but act on (struct thread *)
instead of (struct proc *). One reason to change the name is so that
any old kernel modules that are lying around and expect kthread_create()
to make a process will not just accidentally link.

fix top to show kernel threads by their thread name in -SH mode.
add a tdnam formatting option to ps to show thread names.

make all idle threads actual kthreads and put them into their own idled process.
make all interrupt threads kthreads and put them in an interd process
(mainly for aesthetic and accounting reasons)
rename proc 0 to be 'kernel' and its swapper thread is now 'swapper'

man page fixes to follow.
2007-10-26 08:00:41 +00:00
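
A minimal sketch of the kproc_kthread_add() pattern this introduces, with
hypothetical worker_main/workerd names; on the first call it creates the
process, on later calls it just adds threads to it:

    #include <sys/param.h>
    #include <sys/kthread.h>
    #include <sys/proc.h>

    static struct proc *workerproc = NULL;	/* filled in on first call */

    static void
    worker_main(void *arg)
    {
        /* ... do work ... */
        kthread_exit();		/* now takes no argument */
    }

    static void
    start_worker(int idx)
    {
        struct thread *td;

        /* Creates the "workerd" kproc if needed, then adds a thread. */
        kproc_kthread_add(worker_main, NULL, &workerproc, &td,
            0, 0, "workerd", "worker%d", idx);
    }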
Christian S.J. Peron
57274c513c Implement AUE_CORE, which adds process core dump support into the kernel.
This change introduces audit_proc_coredump() which is called by coredump(9)
to create an audit record for the coredump event.  When a process
dumps a core, it could be security relevant.  It could be an indicator that
a stack within the process has been overflowed with an incorrectly constructed
malicious payload or a number of other events.

The record that is generated looks like this:

header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec
argument,0,0xb,signal
path,/usr/home/csjp/test.core
subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2
return,success,1
trailer,111

- We allocate a completely new record to make sure we aren't clobbering
  the audit data associated with the syscall that produced the core
  (assuming the core is being generated in response to SIGABRT  and not
  an invalid memory access).
- Shuffle around expand_name() so we can use the coredump name at the very
  beginning of the coredump call.  Make sure we free the storage referenced
  by "name" if we need to bail out early.
- Audit both successful and failed coredump creation efforts

Obtained from:	TrustedBSD Project
Reviewed by:	rwatson
MFC after:	1 month
2007-10-26 01:23:07 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
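
Illustrative examples of the rename pattern (these follow the stated forms;
the specific pairs below are not taken from the commit itself):

    /* old naming               ->  new naming               */
    /* mac_check_vnode_read()   ->  mac_vnode_check_read()   */
    /* mac_check_socket_bind()  ->  mac_socket_check_bind()  */
    /* object type "posix_sem"  ->  "posixsem"               */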
Christian S.J. Peron
5ff3816d82 Move where we audit the PID argument such that we unconditionally
audit it at the beginning of the syscall.  This fixes a problem
where the user supplies an invalid process ID which is > 0 which
results in the PID argument not being audited.

Obtained from:	TrustedBSD Project
MFC after:	1 week
2007-10-24 00:14:19 +00:00
Julian Elischer
e9271f5376 Take out the single-threading code in fork.
After discussions with jeff, alc, (various Ironport people), David Xu,
and mostly Alfred (who found the problem) it has been demonstrated that this
is not needed for our implementations of threads and represents a real
(as in we've seen it happen a lot) deadlock danger.

Several points:
 Since forking multiple threads is not allowed, and POSIX states that
 any mutexes owned by other threads will be owned in the child by
 phantom threads, and threads shouldn't be accessing shared structures without
 protection, it can be proved that if this leads to the child process accessing
 inconsistent data, it's a programming error.

 The mode of thread_single() being used in fork() is the wrong one.
 It is using SINGLE_NO_EXIT when it should be using SINGLE_BOUNDARY.

 Even if this were used, system processes have no need to do it as they have
 no userland to get inconsistent.

  This commit first fixes the above bugs to get them correct in CVS,
  then removes the code with #ifdef.
  This is so that history contains the corrected version should it
  be needed in the future.
  This code may be needed if we implement the forkall() syscall from
  Solaris. It may be needed for other non-POSIX thread libraries
  at some time in the future, so let the code sit for a short while
  while I do some work on it anyhow.

This removes a reproducible lockup in NFS.
It may be argued that maybe doing a fork while holding a vnode lock may
not be the best idea in the first place, but it shouldn't cause a deadlock.
The removal has been running under soak test for several days now.

This removal should be seriously considered for 7.0 and RELENG_6.

Note: there is code in the core-dumping path that may have a similar problem
with coredumping threaded processes.

MFC After: 4 days
2007-10-23 17:54:15 +00:00
Peter Grehan
cbdd62ad04 Cut over to ULE on PowerPC
kern/sched_ule.c - Add __powerpc__ to the list of supported architectures

powerpc/conf/GENERIC - Swap SCHED_4BSD with SCHED_ULE

powerpc/powerpc/genassym.c - Export TD_LOCK field of thread struct

powerpc/powerpc/swtch.S - Handle new 3rd parameter to cpu_switch() by
 updating the old thread's lock. Note: uniprocessor-only, will require
 modification for MP support.

powerpc/powerpc/vm_machdep.c - Set 3rd param of cpu_switch to mutex of
old thread's lock, making the call a no-op.

Reviewed by:	marcel, jeffr (slightly older version)
2007-10-23 00:52:25 +00:00
John Birrell
1676805c18 Add the full module path name to the kld_file_stat structure
for kldstat(2).

This allows libdtrace to determine the exact file from which
a kernel module was loaded without having to guess.

The kldstat(2) API is versioned with the size of the
kld_file_stat structure, so this change creates version 2.

Add the pathname to the verbose output of kldstat(8) too.

MFC: 3 days
2007-10-22 04:12:57 +00:00
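
A sketch of how a version-2 caller might retrieve the new field; the helper
name is hypothetical, but struct kld_file_stat and kldstat(2) are the
interfaces described above:

    #include <sys/param.h>
    #include <sys/linker.h>
    #include <stdio.h>

    int
    print_kld_path(int fileid)
    {
        struct kld_file_stat stat;

        /* Version is the structure size, so old binaries keep working. */
        stat.version = sizeof(stat);
        if (kldstat(fileid, &stat) != 0)
            return (-1);
        printf("%s loaded from %s\n", stat.name, stat.pathname);
        return (0);
    }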
Robert Watson
e41966dc35 Add PRIV_VFS_STAT privilege, which will allow overriding policy limits on
the right to stat() a file, such as in mac_bsdextended.

Obtained from:	TrustedBSD Project
MFC after:	3 months
2007-10-21 22:50:11 +00:00
Julian Elischer
3745c395ec Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
This makes way for us to add a REAL kthread_create() and friends
that actually make threads. It turns out that most of these
calls actually end up being moved back to the thread version
when it's added, but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
Ed Maste
7188e3c834 Put comments about syscalls by the correct ones, and use the correct syscall
number in the comment.
2007-10-19 19:17:53 +00:00
Sam Leffler
58590eb06b ULE works fine on arm; allow it to be used
Reviewed by:	jeff, cognet, imp
MFC after:	1 week
2007-10-16 19:25:26 +00:00
Alfred Perlstein
7c45a9c446 Export maxswzone, maxbcache, maxtsiz, dfldsiz, maxdsiz, dflssiz, maxssiz,
and sgrowsiz via sysctl.

MFC after: 1 week
2007-10-16 10:40:53 +00:00
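
With these exported, userland can read them directly; a sketch assuming the
sysctl names follow the kernel variable names (e.g. kern.maxssiz, a u_long):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        unsigned long maxssiz;
        size_t len = sizeof(maxssiz);

        if (sysctlbyname("kern.maxssiz", &maxssiz, &len, NULL, 0) == 0)
            printf("maxssiz = %lu\n", maxssiz);
        return (0);
    }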
Alexander Leidinger
9f05d312b3 Backout sensors framework.
Requested by:	phk
Discussed on:	cvs-all
2007-10-15 20:00:24 +00:00
Alexander Leidinger
99f6b270e3 Import OpenBSD's sysctl hardware sensors framework.
This commit includes the following core components:

 * sample configuration file for sensorsd
 * rc(8) script and glue code for sensorsd(8)
 * sysctl(3) doc fixes for CTL_HW tree
 * sysctl(3) documentation for hardware sensors
 * sysctl(8) documentation for hardware sensors
 * support for the sensor structure for sysctl(8)
 * rc.conf(5) documentation for starting sensorsd(8)
 * sensor_attach(9) et al documentation
 * /sys/kern/kern_sensors.c
   o sensor_attach(9) API for drivers to register ksensors
   o sensor_task_register(9) API for the update task
   o sysctl(3) glue code
   o hw.sensors shadow tree for sysctl(8) internal magic
 * <sys/sensors.h>
 * HW_SENSORS definition for <sys/sysctl.h>
 * sensors display for systat(1), including documentation
 * sensorsd(8) and all applicable documentation

The userland part of the framework is entirely source-code
compatible with OpenBSD 4.1, 4.2 and -current as of today.

All sensor readings can be viewed with `sysctl hw.sensors`,
monitored in semi-realtime with `systat -sensors` and also
logged with `sensorsd`.

Submitted by:	Constantine A. Murenin <cnst@FreeBSD.org>
Sponsored by:	Google Summer of Code 2007 (GSoC2007/cnst-sensors)
Mentored by:	syrinx
Tested by:	many
OKed by:	kensmith
Obtained from:	OpenBSD (parts)
2007-10-14 10:45:31 +00:00
Dag-Erling Smørgrav
d302c56d9b I don't know what I was smoking when I wrote these three years ago; the
return value is an error code, hence always an int.

While I'm here, add getenv_uint() for completeness.
2007-10-13 11:30:19 +00:00
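
A sketch of the new helper in use; the hw.mydriver.depth tunable name is
hypothetical, and getenv_uint() is assumed to follow the getenv_int()
convention of returning non-zero when the variable is found:

    #include <sys/param.h>
    #include <sys/systm.h>

    static unsigned int mydriver_depth = 16;	/* default */

    static void
    mydriver_fetch_tunables(void)
    {
        unsigned int val;

        if (getenv_uint("hw.mydriver.depth", &val) != 0)
            mydriver_depth = val;
    }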
Mohan Srinivasan
58d14dae6d Set the NFS server sockbuf high watermarks to the system defaults
(up from 32KB). The low high-watermark setting caused UDP fullsock
request drops, throttling throughput greatly.
Reported by: Kris Kennaway
Approved by: re@ (Ken Smith)
2007-10-12 03:56:27 +00:00
Jeff Roberson
8753688f03 - Fix from pr kern/115469; Don't redeliver a signal once it has been
handled by the target process.

Contributed by:	Tijl Coosemans <tijl@ulyssis.org>
Approved by:	re
2007-10-09 00:03:39 +00:00
Jeff Roberson
88f530cc25 - Bail out of tdq_idled if !smp_started or idle stealing is disabled. This
fixes a bug on UP machines with SMP kernels where the idle thread
   constantly switches after trying to steal work from the local cpu.
 - Make the idle stealing code more robust against self selection.
 - Prefer to steal from the cpu with the highest load that has at least one
   transferable thread.  Before we selected the cpu with the highest
   transferable count which excludes bound threads.

Collaborated with:	csjp
Approved by:		re
2007-10-08 23:50:39 +00:00
Jeff Roberson
05dc0eb204 - Restore historical sched_yield() behavior by changing sched_relinquish()
to simply switch rather than lowering priority and switching.  This allows
   threads of equal priority to run but not lesser priority.

Discussed with:	davidxu
Reported by:	NIIMI Satoshi <sa2c@sa2c.net>
Approved by:	re
2007-10-08 23:45:24 +00:00
Jeff Roberson
40a940af86 - Restore historical yield() behavior by manually lowering priority and
switching.

Approved by:	re
2007-10-08 23:40:40 +00:00
Jeff Roberson
5bce4ae3be - Fix ULE in kernels without PREEMPTION compiled in by always enabling the
critical_exit() owepreempt check.  ULE will always use owepreempt to
   preempt the idle thread.  This change does not affect 4BSD since it will
   never set owepreempt without PREEMPTION enabled.
 - Remove some unused code from choosethread().

Discussed with:	jhb
Approved by:	re
2007-10-08 23:37:28 +00:00
Kip Macy
457869b973 This patch adds an M_NOFREE flag which allows one to mark an mbuf as
not being independently freeable. This allows one to embed an mbuf in
the cluster itself. This confers the benefits of the packet zone on
all cluster sizes. Embedded mbufs currently suffer from the same
limitation that packet zone mbufs do in that one cannot disconnect
them and pass them around independently of the cluster. It would
likely be possible to eliminate this limitation in the future by
adding a second reference for the mbuf itself.

Approved by: re(gnn)
2007-10-06 21:42:39 +00:00
Kip Macy
629b9e0853 Allow drivers to free an mbuf without having the mbuf be touched if
the driver has already freed any attached tags

Approved by: re(gnn)
2007-10-06 21:13:55 +00:00
Pawel Jakub Dawidek
764a938b11 Fix sx_try_slock(), so it only fails when there is an exclusive owner.
Before this fix, it was possible for the function to fail if the number
of sharers changed between the 'x = sx->sx_lock' step and the
atomic_cmpset_acq_ptr() call.

This fixes ZFS problem when ZFS returns strange EIO errors under load.
In ZFS there is a code that depends on the fact that sx_try_slock() can
only fail if there is an exclusive owner.

Discussed with:	attilio
Reviewed by:	jhb
Approved by:	re (kensmith)
2007-10-02 14:48:48 +00:00
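
A sketch of the guarantee the fix restores: a failure now reliably means an
exclusive owner holds the lock, so callers like the ZFS code mentioned above
can treat it as "writer active" (try_read_path is a hypothetical caller):

    #include <sys/param.h>
    #include <sys/errno.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    static int
    try_read_path(struct sx *lock)
    {
        if (sx_try_slock(lock) == 0)
            return (EBUSY);		/* exclusive owner present */
        /* ... read shared state ... */
        sx_sunlock(lock);
        return (0);
    }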
Jeff Roberson
59c6813475 - Reassign the thread queue lock to newtd prior to switching. Assigning
after the switch leads to a race where the outgoing thread still owns
   the local queue lock while another cpu may switch it in.  This race
   is only possible on machines where cpu_switch can take significantly
   longer on different cpus which in practice means HTT machines with
   unfair thread scheduling algorithms.

Found by:	kris (of course)
Approved by:	re
2007-10-02 01:30:18 +00:00
Jeff Roberson
7fcf154aef - Move the rebalancer back into hardclock to prevent potential softclock
starvation caused by unbalanced interrupt loads.
 - Change the rebalancer to work on stathz ticks but retain randomization.
 - Simplify locking in tdq_idled() to use the tdq_lock_pair() rather than
   complex sequences of locks to avoid deadlock.

Reported by:	kris
Approved by:	re
2007-10-02 00:36:06 +00:00
Jeff Roberson
02e2d6b445 - Honor the PREEMPTION and FULL_PREEMPTION flags by setting the default
   value for kern.sched.preempt_thresh appropriately.  It can still be
   adjusted at runtime.  ULE will still use IPI_PREEMPT in certain
   migration situations.
 - Assert that we're not trying to compile ULE on an unsupported
   architecture.  To date, I believe only i386 and amd64 have implemented
   the third cpu switch argument required.

Approved by:	re
2007-09-27 16:39:27 +00:00
Ruslan Ermilov
718a600b20 Fix the description of the formula used to autosize the number of
buffers in the buffer cache.

Approved by:	re (kensmith)
2007-09-26 11:22:23 +00:00
Alan Cox
7bfda801a8 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
Jeff Roberson
e270652ba3 - Bound the interactivity score so that it cannot become negative.
Approved by:	re
2007-09-24 00:28:54 +00:00
Jeff Roberson
a5423ea313 - Improve grammar. s/it's/its/.
- Improve the long-term load balancer by always IPIing exactly once.
   Previously the delay after rebalancing could cause problems with
   uneven workloads.
 - Allow nice to have a linear effect on the interactivity score.  This
   allows negatively niced programs to stay interactive longer.  It may be
   useful with very expensive Xorg servers under high loads.  In general
   it should not be necessary to alter the nice level to improve interactive
   response.  We may also want to consider never allowing positively niced
   processes to become interactive at all.
 - Initialize ccpu to 0 rather than 0.0.  The decimal point was leftover
   from when the code was copied from 4bsd.  ccpu is 0 in ULE because ULE
   only exports weighted cpu values.

Reported by:	Steve Kargl (Load balancing problem)
Approved by:	re
2007-09-22 02:20:14 +00:00
Pawel Jakub Dawidek
b4d7e2983c Fix some locking cases where we ask for an exclusively locked vnode but get
a shared-locked vnode instead when vfs.lookup_shared is set to 1.

Discussed with:	kib, kris
Tested by:	kris
Approved by:	re (kensmith)
2007-09-21 10:16:56 +00:00
Jeff Roberson
54b0e65f84 - Redefine p_swtime and td_slptime as p_swtick and td_slptick. This
changes the units from seconds to the value of 'ticks' when swapped
   in/out.  ULE does not have a periodic timer that scans all threads in
   the system and as such maintaining a per-second counter is difficult.
 - Change computations requiring the unit in seconds to subtract ticks
   and divide by hz.  This does make the wraparound condition hz times
   more frequent but this is still in the range of several months to
   years and the adverse effects are minimal.

Approved by:	re
2007-09-21 04:10:23 +00:00
Jeff Roberson
f462501739 - Call sched_sleep() before we suspend threads. sched_wakeup() is already
called via setrunnable().  This allows time slept while suspended to
   be accounted for swap.

Approved by:	re
2007-09-21 04:04:22 +00:00
Attilio Rao
c8790f5d09 Fix some entries in the locks static table of witness.
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
  the table, however, has been the source of a false positive LOR report
  involving dt_lock.  Under the older locking, the smp rendezvous lock
  would have had sched_lock after it, so it wasn't a leaf lock even then.
- allpmaps is only used in ia32 architecture, so it is inserted in the
  appropriate stub.

Additionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just let it be
  declared static.

Tested by: kris
Approved by: jeff (mentor)
Approved by: re
2007-09-20 20:38:43 +00:00
Jeff Roberson
b61ce5b0e6 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
Robert Watson
dce5df0dfc Remove the definition and implementation of 'CALLOUT_NETGIANT', a now- (and
possibly always-) unused define.

Reported by:	kmacy
Approved by:	re (kensmith)
2007-09-15 12:33:24 +00:00
Attilio Rao
4486adc51f Currently the LO_NOPROFILE flag (which is masked on upper level code by
per-primitive macros like MTX_NOPROFILE, SX_NOPROFILE or RW_NOPROFILE) is
not really honoured. In particular lock_profile_obtain_lock_failure() and
lock_profile_obtain_lock_success() simply ignore this flag.
The bug leads to locks marked no-profiling being profiled anyway.
In the case of the clock_lock, used by the i8254 timer, this leads to
unpredictable behaviour on both amd64 and ia32 (double fault panics,
sudden reboots, etc.). The amd64 clock_lock is also not marked
non-profilable, as it should be.
Fix these bugs by adding proper checks in the lock profiling code and at
clock_lock initialization time.

i8254 bug pointed out by: kris
Tested by: matteo, Giuseppe Cocomazzi <sbudella at libero dot it>
Approved by: jeff (mentor)
Approved by: re
2007-09-14 01:12:39 +00:00
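
A sketch of the flag now being honoured, modelled on the clock_lock case;
fast_lock is a placeholder name:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    static struct mtx fast_lock;

    static void
    fast_lock_init(void)
    {
        /* MTX_NOPROFILE maps to LO_NOPROFILE; this lock is never profiled. */
        mtx_init(&fast_lock, "fast_lock", NULL, MTX_SPIN | MTX_NOPROFILE);
    }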
Attilio Rao
c7fb7ce53a subr_sleepqueue.c is missing a thread lock, which leads to dangerous
races on some struct thread members.
More specifically, this bug seems responsible for some memory dumping
problems people were experiencing.

Fix this by adding correct thread locking.

Tested by: rwatson
Submitted by: tegge
Approved by: jeff
Approved by: re
2007-09-13 09:12:36 +00:00
Konstantin Belousov
245b204491 When restoring the mount after umount failed, the MNTK_UNMOUNT flag
prevents insmntque() from placing the reallocated syncer vnode on the
mount list, which causes a panic in vfs_allocate_syncvnode().

Introduce the MNTK_NOINSMNTQ flag, which marks the period when insmntque
is not allowed to succeed, instead of MNTK_UNMOUNT. MNTK_NOINSMNTQ is
set and cleared simultaneously with MNTK_UNMOUNT, except on the umount
error path, where it is cleared just before the syncer vnode is going to
be allocated.

Reported by:	Peter Jeremy <peterjeremy optushome com au>
Suggested by:	tegge
Approved by:	re (rwatson)
2007-09-12 16:31:32 +00:00
Attilio Rao
0b2e598c14 This is a follow-up, cleaning-up commit about recent changes involving
topology foo functions.
Working on the patch for topology problems in ia32/amd64 exposed some
problems regarding function ordering in the SI_SUB_CPU family of
SYSINIT'ed subsystems.
In order to avoid problems with newly modified functions, a correct
ordering is now semantically specified for SI_SUB_CPU functions
(for a larger view of the issue please visit:
http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html )

Discussed with: peter
Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org>
Approved by: jeff
Approved by: re
2007-09-11 22:54:09 +00:00
Robert Watson
45e0f3d63d Rename mac_check_vnode_delete() MAC Framework and MAC Policy entry
point to mac_check_vnode_unlink(), reflecting UNIX naming conventions.

This is the first of several commits to synchronize the MAC Framework
in FreeBSD 7.0 with the MAC Framework as it will appear in Mac OS X
Leopard.

Reviewed by:    csjp, Samy Bahra <sbahra at gwu dot edu>
Submitted by:   Jacques Vidrine <nectar at apple dot com>
Obtained from:  Apple Computer, Inc.
Sponsored by:   SPARTA, SPAWAR
Approved by:    re (bmah)
2007-09-10 00:00:18 +00:00
Robert Watson
70ffc2fb53 In userland_sysctl(), call useracc() with the actual newlen value to be
used, rather than the one passed via 'req', which may not reflect a
rewrite.  This call to useracc() is redundant to validation performed by
later copyin()/copyout() calls, so there isn't a security issue here,
but this could technically lead to excessive validation of addresses if
the length in newlen is shorter than req.newlen.

Approved by:	re (kensmith)
Reviewed by:	jhb
Submitted by:	Constantine A. Murenin <cnst+freebsd@bugmail.mojo.ru>
Sponsored by:	Google Summer of Code 2007
2007-09-02 09:59:33 +00:00
John Baldwin
67b158d888 Close a race that snuck in with the recent changes to fix a LOR between
the callout_lock spin lock and the sleepqueue spin locks.  In the fix,
callout_drain() has to drop the callout_lock so it can acquire the
sleepqueue lock.  The state of the callout can change while the
callout_lock is held however (for example, it can be rescheduled via
callout_reset()).  The previous code assumed that the only state change
that could happen is that the callout could finish executing.  This change
alters callout_drain() to effectively restart and recheck everything
after it acquires the sleepqueue lock thus handling all the possible
states that the callout could be in after any changes while callout_lock
was dropped.

Approved by:	re (kensmith)
Tested by:	kris
2007-08-31 19:01:30 +00:00
Diomidis Spinellis
d5b6981e69 Add missing newline in the log message of the previous commit.
Approved by:	re (kensmith) - implied
2007-08-31 13:56:26 +00:00
Diomidis Spinellis
72de1b3709 Don't panic. When encountering a negative value call log(LOG_NOTICE, ...)
and record LONG_MAX, instead of calling KASSERT(...).

Reported by:	rwatson
Approved by:	re (kensmith)
2007-08-31 13:36:58 +00:00
John Baldwin
57b7fe337e Partially revert the previous change. I failed to notice that where
ktruserret() is invoked, an unlocked check of the per-process queue
is performed inline; thus, we don't lock the ktrace_sx on every userret().

Pointy hat to:	jhb
Approved by:	re (kensmith)
Pointy hat recovered from:	rwatson
2007-08-29 21:17:11 +00:00
John Baldwin
cc479dda4a Rework the routines to convert a 5.x+ statfs structure (with fixed-size
64-bit counters) to a 4.x statfs structure (with long-sized counters).
- For block counters, we scale up the block size sufficiently large so
  that the resulting block counts fit into the long-sized (long for the
  ABI, so 32-bit in freebsd32) counters.  In 4.x the NFS client's statfs
  VOP did this already.  This can lie about the block size to 4.x binaries,
  but it presents a more accurate picture of the ratios of free and
  available space.
- For non-block counters, fix the freebsd32 stats converter to cap the
  values at INT32_MAX rather than losing the upper 32-bits to match the
  behavior of the 4.x statfs conversion routine in vfs_syscalls.c

Approved by:	re (kensmith)
2007-08-28 20:28:12 +00:00
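
A simplified sketch of the block-count scaling described above (the real
conversion handles more fields): doubling the reported block size while
halving the counts keeps the free/available ratios intact and fits the
values into a long:

    #include <stdint.h>
    #include <limits.h>

    static void
    scale_statfs_blocks(uint64_t *bsize, uint64_t *blocks, uint64_t *bfree,
        int64_t *bavail)
    {
        while (*blocks > LONG_MAX) {
            *bsize <<= 1;	/* lie about the block size... */
            *blocks >>= 1;	/* ...but keep the ratios accurate */
            *bfree >>= 1;
            *bavail /= 2;	/* may be negative, so divide */
        }
    }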
Randall Stewart
2afb3e849f - During shutdown pending, when the last sack came in and
the last message on the send stream was "null" but still
  there, a state we allow, we could get hung and not clean
  it up and wait for the shutdown guard timer to clear the
  association without a graceful close. Fix this so that
  we properly clean up.
- Added support for Multiple ASCONF per new RFC. We only
  (so far) accept input of these and cannot yet generate
  a multi-asconf.
- Sysctl'd support for experimental Fast Handover feature. Always
  disabled unless sysctl or socket option changes to enable.
- Error case in add-ip where the peer supports AUTH and ADD-IP
  but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to
  ABORT in this case.
- According to the Kyoto summit of socket API developers
  (Solaris, Linux, BSD), we need to have:
   o non-eeor mode messages be atomic - Fixed
   o Allow implicit setup of an assoc in 1-2-1 model if
     using the sctp_**() send calls - Fixed
   o Get rid of HAVE_XXX declarations - Done
   o add a sctp_pr_policy in hole in sndrcvinfo structure - Done
   o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch!
- Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize
  when we close sending out the data and disabling Nagle.
- Change key concatenation order to match the auth RFC
- When sending OOTB shutdown_complete always do csum.
- Don't send PKT-DROP to a PKT-DROP
- For abort chunks just always checksums same for
  shutdown-complete.
- inpcb_free front state had a bug where in-queue
  data could wedge an assoc. We need to just abandon
  ones in front states (free_assoc).
- If a peer sends us a 64k abort, we would try to
  assemble a response packet which may be larger than
  64k. This then would be dropped by IP. Instead make
  a "minimum" size for us 64k-2k (we want at least
  2k for our initack). If we receive such an init
  discard it early without all the processing.
- When we peel off we must increment the tcb ref count
  to keep it from being freed from underneath us.
- handling fwd-tsn had bugs that caused memory overwrites
  when given faulty data; fixed so this can't happen and we
  also stop at the first bad stream number.
- Fixed so comm-up generates the adaptation indication.
- peeloff did not get the hmac params copied.
- fix it so we lock the addr list when doing src-addr selection
  (in future we need to use a multi-reader/one writer lock here)
- During lowlevel output, we could end up with a _l_addr set
  to null if the iterator is calling the output routine. This
  means we would possibly crash when we gather the MTU info.
  Fix so we only do the gather where we have a src address
  cached.
- we need to be sure to set abort flag on conn state when
  we receive an abort.
- peeloff could leak a socket. Moved code so the close will
  find the socket if the peeloff fails (uipc_syscalls.c)

Approved by:	re@freebsd.org(Ken Smith)
2007-08-27 05:19:48 +00:00
Konstantin Belousov
5114048b63 Destroy the kaio_mtx when freeing the struct kaioinfo in
aio_proc_rundown().

Do not allow for zero-length read to be passed to the fo_read file method
by aio.

Reported and tested by:	Peter Holm
Approved by:	re (kensmith)
2007-08-20 11:53:26 +00:00
Jeff Roberson
67e20930bd - Improve runq_findbit_from() which is used by ULE's circular queue. Mask
off the bits we want to ignore on the first pass rather than doing a
   linear scan.  This puts us within a few instructions of the cost of
   runq_findbit() and removes this function from the top of profiling output
   for context switch heavy workloads.

Approved by:	re
2007-08-20 06:36:12 +00:00
Jeff Roberson
9862717afe - Set steal_thresh to log2(ncpus). This improves idle-time load balancing
on 2cpu machines by reducing it to 1 by default.  This improves loaded
   operation on 8cpu machines by increasing it to 3 where the extra idle
   time is not as critical.

Approved by:	re
2007-08-20 06:34:20 +00:00
Nate Lawson
62db376af3 Always call sched_bind(), even if on the CPU in question. It is wrong to
check if we're already on that cpu and skip the bind since the thread could
be migrated off in the meantime.

Suggested by:	jeff
Approved by:	re
2007-08-20 06:28:26 +00:00
Nate Lawson
2145b9d207 Use a different loop variable for the inner loop. The previous reuse could
have caused a hang, but we got lucky with the available multi-CPU states
on actual hardware.

Submitted by:	Bjorn Koenig <bkoenig / alpha-tierchen.de>
Approved by:	re
MFC after:	3 days
2007-08-19 20:34:13 +00:00
David Xu
6ec46f7aa8 Regenerate.
Approved by: re(kensmith)
2007-08-16 05:32:26 +00:00
David Xu
0b1f0611b4 Add thr_kill2 syscall which sends a signal to a thread in another process.
Submitted by: Tijl Coosemans tijl at ulyssis dot org
Approved by: re (kensmith)
2007-08-16 05:26:42 +00:00
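
A sketch of the new syscall from userland; pid and tid values come from
elsewhere (e.g. the target thread's thr_self()):

    #include <sys/types.h>
    #include <sys/thr.h>
    #include <signal.h>

    /* Deliver SIGUSR1 to one specific thread of another process. */
    int
    signal_remote_thread(pid_t pid, long tid)
    {
        return (thr_kill2(pid, tid, SIGUSR1));
    }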
John Baldwin
1dc5b1cc56 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
Pawel Jakub Dawidek
354eb80141 Improve vn_printf() by:
- adding missing vnode flags,
- printing unknown flags as numbers,
- using strlcat() instead of strcat().

Approved by:	re (bmah)
2007-08-13 21:23:30 +00:00
Konstantin Belousov
004e08be60 Do not call free() while holding vnode interlock.
Reported and tested by:	Peter Holm
Reviewed by:	jeff
Approved by:	re (kensmith)
2007-08-07 09:04:50 +00:00
Robert Watson
0bf686c125 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminating
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
Jeff Roberson
3a78f9658b - Fix one line that erroneously crept in my last commit.
Approved by:	re
2007-08-04 01:21:28 +00:00
Jeff Roberson
c47f202b45 - Share scheduler locks between hyper-threaded cores to protect the
tdq_group structure.  Hyper-threaded cores won't really benefit from
   separate locks anyway.
 - Separate out the migration case from sched_switch to simplify the main
   switch code.  We only migrate here if called via sched_bind().
 - When preempted place the preempted thread back in the same queue at
   the head.
 - Improve the cpu group and topology infrastructure.

Tested by:	many on current@
Approved by:	re
2007-08-03 23:38:46 +00:00
Jeff Roberson
413ea6f543 - Set SW_PREEMPT when we preempt in critical_exit().
Approved by:	re
2007-08-03 23:35:35 +00:00
Robert Watson
33d2bb9ca3 First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and the associated SYSINITs used by it to force
  debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
  debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
  debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
  which keyed off debug_mpsafenet to determine various aspects of their own
  locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
  no-op's, as their entire behavior was determined by the value in
  debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by:	bz, jhb
Approved by:	re (kensmith)
2007-07-27 11:59:57 +00:00
Attilio Rao
34ed040030 Actually, upcalls cannot be freed while destroying the thread because we
would have to call uma_zfree() with various spinlocks held.  Rearranging the
code would not help here because we cannot break atomicity with respect to
the process spinlock, so the only choice we have is to defer the operation.
In order to do this, use a global queue synchronized through the kse_lock
spinlock, which is drained at any thread_alloc() / thread_wait() through a
call to thread_reap().
Note that this approach is not ideal, as we would want a per-process
list of zombie upcalls, but it follows the initial guidelines of the KSE authors.

Tested by: jkim, pav
Approved by: jeff, julian
Approved by: re
2007-07-27 09:21:18 +00:00
Pawel Jakub Dawidek
57fd3d5572 When we do open, we should lock the vnode exclusively. This fixes a few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with:	kib, ups
Approved by:	re (rwatson)
2007-07-26 16:58:09 +00:00
Pawel Jakub Dawidek
68c1a246ae The v_mountedhere field is protected by the vnode lock, not the vnode's
interlock.

Approved by:	re (rwatson)
2007-07-26 16:52:57 +00:00
Attilio Rao
758b17a100 upcall_free() was only used in kse_GC(), which has been removed, so it is
now unused; with gcc's -Werror option this raises a warning that
breaks buildkernel.
Fix this by removing upcall_free().

Reported by: various
Approved by: jeff
Approved by: re
Pointy hat to: attilio
2007-07-23 23:16:53 +00:00
Attilio Rao
ac8094e4e3 Actually, the KSE kernel bits' locking is broken and can likely lead to
dangerous races.
Fix these problems by adding correct locking for the members of 'struct
kse_upcall' and other struct proc/struct thread related members.
For the moment, just leave ku_mflag and ku_flags "lazily" locked.
While here, clean up the code by removing the function kse_GC() (unused)
and merging upcall_link(), upcall_unlink(), and upcall_stash() into their
respective callers (static functions, very short and only called in one
place).

Reported by: pav
Tested by: pav (on some pointyhat cluster nodes)
Approved by: jeff
Approved by: re
Sponsored by: NGX Italy (http://www.ngx.it)
2007-07-23 14:52:22 +00:00
David Malone
6d8617d42a If clock_ct_to_ts fails to convert the time from the real time clock,
print a one-line error message. Add some comments on not being able to
trust the day-of-week field (I'll act on these comments in a follow-up
commit).

Approved by:	re
MFC after:	3 weeks
2007-07-23 09:42:32 +00:00
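
A sketch of the failure handling this describes, assuming the usual
clock_ct_to_ts() convention of returning 0 on success and an errno value on
a bogus date:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/time.h>
    #include <sys/clock.h>

    static void
    rtc_to_ts(struct clocktime *ct, struct timespec *ts)
    {
        if (clock_ct_to_ts(ct, ts) != 0) {
            printf("Invalid time in real time clock.\n");
            return;
        }
        /* ... use *ts ... */
    }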
Konstantin Belousov
e69aee3117 ttyfree() frees the cdev. But if there are pending kevents,
filt_ttyrdetach() etc. would later attempt to dereference cdev->si_tty,
causing a 0xdeadc0de dereference.  Change kn_hook value from cdev to
struct tty to avoid dereferencing freed cdev.

In ttygone(), wake up select(), sigio and kevent() users in addition
to the queue sleepers.

Return EV_EOF from kevent filters if TS_GONE is set.

Submitted by:	peter
Tested by:	Peter Holm
Approved by:	re (kensmith)
MFC after:	2 weeks
2007-07-20 09:41:54 +00:00
Attilio Rao
6aa294be2c Fix some problems with lock profiling in rw locks:
- Adjust the lock_profiling stubs' semantics in the hard functions in order
  to be more accurate and trustworthy
- As for sx locks, disable shared paths for lock_profiling.  Actually,
  lock_profiling has a subtle race which makes results coming from shared
  paths not completely trustworthy. A macro stub (LOCK_PROFILING_SHARED) can
  be used to re-enable these paths, but it is currently intended
  for development use only.
- style(9) fixes

Approved by: jeff, kmacy, jhb[1]
Approved by: re

[1] Had initial reservations not shared by others, conceded
    in the end.
2007-07-20 08:43:42 +00:00
Jeff Roberson
28994a5852 - Refine the load balancer to improve buildkernel times on dual core
machines.
 - Leave the long-term load balancer running by default once per second.
 - Enable stealing load from the idle thread only when the remote processor
   has more than two transferable tasks.  Setting this to one further
   improves buildworld.  Setting it higher improves mysql.
 - Remove the bogus pick_zero option.  I had not intended to commit this.
 - Entirely disallow migration for threads with SRQ_YIELDING set.  This
   balances out the extra migration allowed for with the load balancers.
   It also makes pick_pri perform better as I had anticipated.

Tested by:	Dmitry Morozovsky <marck@rinet.ru>
Approved by:	re
2007-07-19 20:03:15 +00:00
Jeff Roberson
08c9a16c4f - When newtd is specified to sched_switch() it was not being initialized
properly.  We have to temporarily unlock the TDQ lock so we can lock
   the thread and add it to the run queue.  This is used only for KSE.
 - When we add a thread from the tdq_move() via sched_balance() we need to
   IPI the target if it's sitting in the idle thread or it'll never run.

Reported by:	Rene Landan
Approved by:	re
2007-07-19 19:51:45 +00:00
Jeff Roberson
56696bd1ab - Remove explicit references to sched_lock. A simpler assert will do.
Approved by:	re
2007-07-19 08:58:40 +00:00
Jeff Roberson
6eeb364b4c - Calling sched_nice() in tdsigwakeup() is no longer required by ULE and
actually causes LORs and other panics.

Reported by:	mlaier
Approved by:	re
2007-07-19 08:49:16 +00:00
Jeff Roberson
6ea38de8aa - Remove the global definition of sched_lock in mutex.h to break
new code and third party modules which try to depend on it.
 - Initialize sched_lock in sched_4bsd.c.
 - Declare sched_lock in sparc64 pmap.c and assert that we're compiling
   with SCHED_4BSD to prevent accidental crashes from running ULE.  This
   is the sole remaining file outside of the scheduler that uses the
   global sched_lock.

Approved by:	re
2007-07-18 20:46:06 +00:00
Jeff Roberson
773890b9a8 - Add the proper lock profiling calls to _thread_lock().
Obtained from:	kipmacy
Approved by:	re
2007-07-18 20:38:13 +00:00
Jeff Roberson
ae7a6b38d5 ULE 3.0: Fine grain scheduler locking and affinity improvements. This has
been in development for over 6 months as SCHED_SMP.
 - Implement one spin lock per thread-queue.  Threads assigned to a
   run-queue point to this lock via td_lock.
 - Improve the facility for assigning threads to CPUs now that sched_lock
   contention no longer dominates scheduling decisions on larger SMP
   machines.
 - Re-write idle time stealing in an attempt to make it less damaging to
   general performance.  This is still disabled by default. See
   kern.sched.steal_idle.
 - Call the long-term load balancer from a callout rather than sched_clock()
   so there are no locks held.  This is disabled by default.  See
   kern.sched.balance.
 - Parameterize many scheduling decisions via sysctls.  Try to document
   these via sysctl descriptions.
 - General structural and naming cleanups.
 - Document each function with comments.

Tested by:	current@ amd64, x86, UP, SMP.
Approved by:	re
2007-07-17 22:53:23 +00:00
Jeff Roberson
fb62eea266 - Use ruxagg() in calcru() to make sure we have current tick information
from all threads.

Discussed with:	bde, attilio
Approved by:	re
2007-07-17 01:08:09 +00:00
Craig Rodrigues
d7f81adbd4 Revert previous commits which I committed by mistake.
Approved by:	re (implicit)
Pointy hat to:	me
2007-07-14 21:23:31 +00:00
Craig Rodrigues
d678780e60 The last entry in the ext2_opts array must be NULL,
otherwise the kernel will crash in vfs_filteropt() if an invalid
mount option is passed to ext2fs.

Approved by:	re (kensmith)
2007-07-14 21:18:19 +00:00
John Baldwin
59d8f3ff08 Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
  and use that in preference to maxssiz during exec() rather than always
  using maxssiz for all processses.
- Apply the ABI's limit fixup to the previous stack size when adjusting
  RLIMIT_STACK to determine if the existing mapping for the stack needs to
  be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by:	re (kensmith)
2007-07-12 18:01:31 +00:00
Attilio Rao
c1a6d9fa42 Fix some problems with lock_profiling in sx locks:
- Adjust the lock_profiling stubs' semantics in the hard functions in order
  to be more accurate and trustworthy
- Disable shared paths for lock_profiling.  Actually, lock_profiling has a
  subtle race which makes results coming from shared paths not completely
  trustworthy. A macro stub (LOCK_PROFILING_SHARED) can be used for
  re-enabling these paths, but it is currently intended for development use only.
- Use homogeneous names for automatic variables in hard functions regarding
  lock_profiling
- Style fixes
- Add a CTASSERT for some flags building

Discussed with: kmacy, kris
Approved by: jeff (mentor)
Approved by: re
2007-07-06 13:20:44 +00:00
Konstantin Belousov
196a7385ac Revert destroy_dev() to the state before destroy_dev_sched() was introduced.
Attempting to spawn destroy_dev_sched() from it causes inadmissible races.

Requested by:	tegge
Approved by:	re (kensmith)
2007-07-05 13:04:59 +00:00
Bjoern A. Zeeb
f43455fd89 Remove netkey directory from cscope/TAGs generation and replace
it with netipsec now that KAME IPsec is gone.
While here add missing netinet6 directories.

Add comments about the ports needed to be able to run those targets.

Reviewed by:	philip
Approved by:	re (rwatson)
2007-07-05 08:55:14 +00:00
Peter Wemm
22af4cab91 Fix bad function type passed to destroy_dev_sched_cb().
Approved by:  re (rwatson)
2007-07-05 05:54:47 +00:00
Peter Wemm
c2815ad564 Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate
Approved by: re (kensmith)
2007-07-04 22:57:21 +00:00
Peter Wemm
552fbe752f Regenerate after mmap/lseek/etc syscall changes.
Approved by:  re (kensmith)
2007-07-04 22:49:55 +00:00
Peter Wemm
51504d9ac4 Create new syscalls for mmap(), lseek(), pread(), pwrite(), truncate() and
ftruncate(), but without the pad arg.

There are several reasons for this.  Consider 'mmap()'.  On AMD64, the
function call (and syscall) ABI allow for 6 register arguments.  Additional
arguments go on the stack.  mmap(2) has 6 arguments.  However, the syscall
definition has an extra 'int pad' argument.  This pushes it to 7 arguments,
which means one must spill into the memory stack.  Since the kernel API
doesn't match userland API, we have a hack in libc - libc/sys/mmap.c.
This implements the userland API by calling __syscall() with an extra
argument and the pad argument, for a total of 8 args.  This is all
unnecessary and inconvenient for several things, including the kernel's
syscall handler code which now has to handle merging stack arguments with
register arguments.  It is a big deal for certain 3rd party code.

I'm adding libc glue to make the transition totally painless.  I had
intended to mark the old syscalls as COMPAT6, but the potential to shoot
your feet by building a new kernel without COMPAT_FREEBSD6 but with a
slighly older userland was too great.  For now, they have manual
"freebsd6_" prefixes rather than being COMPAT6.  They will go back to
being marked 'COMPAT6' after 7-stable starts.

Approved by: re (kensmith)
2007-07-04 22:47:37 +00:00
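
The change in a nutshell, as prototypes (a sketch; the freebsd6_ prefix is
per the commit, the exact spelling of the pad parameter is an assumption):

    #include <sys/types.h>

    /* Old syscall, now freebsd6_-prefixed: 7 args, one spills to the stack. */
    void	*freebsd6_mmap(void *addr, size_t len, int prot, int flags,
    	    int fd, int pad, off_t pos);

    /* New syscall matches the userland API: 6 args, all in registers on amd64. */
    void	*mmap(void *addr, size_t len, int prot, int flags,
    	    int fd, off_t offset);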
Peter Wemm
9f0482e515 Add support for COMPAT6 syscalls.
Also, slightly change the visibility of compat syscalls.  Compat
syscalls were missing from 'syscalls.h' entirely.  This additionally adds
them with their compat prefix.  eg: SYS_freebsd6_mmap.

Also, the syscalls.c name strings have different prefixes to differentiate
syscalls. Instead of several "old.mmap" strings, there will now be a
"compat.mmap" and "compat6.mmap" etc.  Before, both would have had the
same "old.mmap" label.

Approved by:  re
2007-07-04 22:38:28 +00:00
Konstantin Belousov
09828ba947 Since the cdev mutex is after the system map mutex in the global lock
order, free() shall not be called while holding the cdev mutex. The
devfs_inos unrhdr uses the cdev mutex as its mutex, thus creating this
LOR situation.

Postpone calling free() in kern/subr_unit.c:alloc_unr() and nested functions
until the unrhdr mutex is dropped. Save the freed items on the ppfree list
instead, and provide the clean_unrhdrl() and clean_unrhdr() functions to
clean the list.
Call clean_unrhdrl() after devfs_create() calls immediately before
dropping cdev mutex. devfs_create() is the only user of the alloc_unrl()
in the tree.

Reviewed by:	phk
Tested by:	Peter Holm
LOR:	80
Approved by:	re (kensmith)
2007-07-04 06:56:58 +00:00
Jeff Roberson
f6c1ecca50 - Use explicit locking in the various fcntl case statements so that we
can acquire shared filedescriptor locks in the appropriate cases.
 - Remove Giant from calls that issue ioctls.  The ioctl path has been
   mpsafe for some time now.
 - Only acquire giant for VOP_ADVLOCK when the filesystem requires giant.
   advlock is now mpsafe.

Reviewed by:	rwatson
Approved by:	re
2007-07-03 21:26:06 +00:00
Jeff Roberson
bc02f1d98d - Remove explicit Giant protection from lockf. Use the vnode interlock
to protect this datastructure instead.
 - Preallocate an extra lockf structure in case we want to split a lock
   on insert or delete.
 - msleep() on the vnode interlock when blocking on a lock.

Reviewed by:	rwatson
Approved by:	re
2007-07-03 21:22:58 +00:00
John Baldwin
fb1faf2082 Tweak the low-level MI SMP code some:
- Use cpu_spinwait() in the spin loops in stop_cpus(), restart_cpus(), and
  smp_rendezvous_action().
- Remove unneeded acq memory barriers in stop_cpus(), restart_cpus(), and
  smp_rendezvous_action().
- Add an additional synch point in smp_rendezvous() to ensure that all the
  CPUs will always see an up-to-date value of smp_rv_setup_func.

Reviewed by:	attilio
Approved by:	re (kensmith)
Tested on:	alpha, amd64, i386, sparc64 SMP (for several years)
2007-07-03 18:37:06 +00:00
Konstantin Belousov
9d53363bc8 Rev. 1.204 and 1.205 got an erroneous version of destroy_dev() that
calls destroy_dev_sched() with cdev mutex locked. Commit the code
that was actually tested.

Pointy hat to:	kib
Approved by:	re (implicit)
2007-07-03 18:18:30 +00:00
Konstantin Belousov
f5baf8d66b Lock Giant and proctree lock around dereferencing p_session->s_ttyvp->v_rdev.
Lock cdev mutex too to close the race with tty being freed.
Relock clone_drain_lock to prevent the LOR with proctree lock, thus
add #include <fs/devfs/devfs_int.h>.

Suggested by:	tegge
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:46:37 +00:00
Konstantin Belousov
8a5d7ef25c Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from pty clone handler.
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:45:52 +00:00
Konstantin Belousov
0a9c2b6db8 Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from the clone handler.
Lock Giant in the clone handler.
Use destroy_dev_sched() explicitly from pty_maybecleanup() and postpone
pty_release() until both master and slave cdevs are destroyed by setting
it as callback for destroy_dev_sched().

Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:44:59 +00:00
Konstantin Belousov
6f0281937b Automatically detect the deadlock condition in destroy_dev(), that is, when
destroy_dev() is called from a csw method and no d_purge driver method is
provided. Transform the direct call to destroy_dev() into destroy_dev_sched().

Reviewed by:	njl (programming interface)
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:43:20 +00:00
Konstantin Belousov
de10ffa527 Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls
destroy_dev() from the d_close() cdev method would self-deadlock:
devfs_close() bumps the device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for a cdev without a
d_purge method.

destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the dev_clone events to make sure no lingering devices are left after
the dev_clone event handler is deregistered.

The make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that the
created device has its reference counter bumped before the cdev mutex is
dropped inside make_dev().

Reviewed by:	tegge (early versions), njl (programming interface)
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:42:37 +00:00
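
A sketch of the rules above for a cloning driver; the mydev_* names are
placeholders and the cdevsw setup is elided:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/conf.h>

    static struct cdevsw mydev_cdevsw;	/* d_open/d_close/etc. elided */

    static void
    mydev_clone(void *arg, struct ucred *cred, char *name, int namelen,
        struct cdev **dev)
    {
        if (*dev != NULL || strcmp(name, "mydev") != 0)
            return;
        /* MAKEDEV_REF: reference is bumped before the cdev mutex is dropped. */
        *dev = make_dev_credf(MAKEDEV_REF, &mydev_cdevsw, 0, cred,
            UID_ROOT, GID_WHEEL, 0600, "mydev");
    }

    static int
    mydev_close(struct cdev *dev, int fflag, int devtype, struct thread *td)
    {
        /* destroy_dev() here would self-deadlock; schedule it instead. */
        destroy_dev_sched(dev);
        return (0);
    }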
Konstantin Belousov
7aee5992a5 Relock the sema_mtxp unconditionally after copyin() for SETALL case in
kern_semctl. Otherwise, a later mtx_unlock() can operate on an unlocked mutex.

Submitted by:	rdivacky
MFC after:	3 days
Approved by:	re (kensmith)
2007-07-03 15:58:47 +00:00
Robert Watson
bc6eca2432 Continue kernel privilege cleanup for 7.0: unstaticize suser_enabled and
stop declaring it in systm.h -- it's used only in kern_priv.c and is not
required elsewhere.

Approved by:	re (kensmith)
2007-07-02 14:03:29 +00:00
Randall Stewart
b8709d23c5 - Add some needed error checking on bad fd passing in the sctp
syscalls.
Approved by:	re@freebsd.org (Ken Smith)
Obtained from:	Weongyo Jeong (weongyo.jeong@gmail.com)
2007-07-02 12:50:53 +00:00
Jeff Roberson
03d03260b2 - Use rufetchcalc() rather than calcru() in ttyinfo so that we get
correct system and user time stats.

Approved by:	re
Reported by:	kris
Discussed with:	Attilio
2007-07-01 00:17:59 +00:00
Robert Watson
dc2e1e3fae Use vm_offset_t for kmembase and kmemlimit rather than char *, avoiding
unnecessary casts, and making it possible to compile kern_malloc.c with
strict aliasing.

Submitted by:	rdivacky
Approved by:	re (kensmith)
2007-06-27 13:39:38 +00:00
Attilio Rao
6a0ce57d10 Fix a long-standing LOR between callout_lock and the sleepqueue chains (which
could lead to a deadlock).
- sleepq_set_timeout acquires callout_lock (via callout_reset()) only
  with the sleepq chain lock held
- msleep_spin in _callout_stop_safe locks the sleepqueue chain with
  callout_lock held

In order to solve this, don't use msleep_spin in _callout_stop_safe(), but
use sleepqueues directly, as inline msleep_spin code. Rearrange the
wakeup path in order to make it consistent too.

Reported by: kris (via stress2 test suite)
Tested by: Timothy Redaelli <drizzt@gufi.org>
Reviewed by: jhb
Approved by: jeff (mentor)
Approved by: re
2007-06-26 21:42:01 +00:00
Attilio Rao
f08945a7d2 Introduce a new rwlocks initialization function: rw_init_flags.
This is very similar to sx_init_flags: it initializes the rwlock using
special flags passed as third argument (RW_DUPOK, RW_NOPROFILE,
RW_NOWITNESS, RW_QUIET, RW_RECURSE).
Among these, the most important new feature is probably that rwlocks
can now be acquired recursively (for both shared and exclusive paths).

Because of the recursion counter, the ABI is changed.

Tested by: Timothy Redaelli <drizzt@gufi.org>
Reviewed by: jhb
Approved by: jeff (mentor)
Approved by: re
2007-06-26 21:31:56 +00:00
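
A minimal sketch of the new initializer; conf_lock is a placeholder:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/rwlock.h>

    static struct rwlock conf_lock;

    static void
    conf_lock_init(void)
    {
        /* A recursable rwlock, for both shared and exclusive paths. */
        rw_init_flags(&conf_lock, "conf lock", RW_RECURSE);
    }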
Rong-En Fan
534046e301 - Remove the UMAP filesystem. It was disconnected from the build three years
ago, and it is seriously broken.

Discussed on:   freebsd-arch@
Approved by:	re (mux)
2007-06-25 05:06:57 +00:00
Konstantin Belousov
9bc911d4a2 devfs_free() calls free_unr(), which may sleep.
Postpone the call to devfs_free() until after the cdev mutex is dropped. Reuse
cdp_list link for queuing devices awaiting deletion in the
cdevp_free_list.

Reported by:	Hans Petter Selasky <hselasky c2i net>
Tested by:	Peter Holm
Approved by:	re (kensmith)
MFC after:	2 weeks
2007-06-19 13:19:23 +00:00
Konstantin Belousov
7550e3eac4 Add the witness warning for free_unr. The function could sleep; thus,
callers shall not hold any non-sleepable locks.

Submitted by:	Hans Petter Selasky <hselasky c2i net>
Approved by:	re (kensmith)
2007-06-19 13:13:17 +00:00
Pawel Jakub Dawidek
dfe97ff4a5 We only flush entries related to the given file system. Currently there are
no 'invalid' cache entries - the file system is responsible for keeping it
that way. The comment should have been updated in rev. 1.25.
2007-06-18 09:28:24 +00:00
Robert Watson
7251b7863c Rather than passing SUSER_RUID into priv_check_cred() to specify when
a privilege is checked against the real uid rather than the effective
uid, instead decide which uid to use in priv_check_cred() based on the
privilege passed in.  We use the real uid for PRIV_MAXFILES,
PRIV_MAXPROC, and PRIV_PROC_LIMIT.  Remove the definition of
SUSER_RUID; there are now no flags defined for priv_check_cred().

Obtained from:	TrustedBSD Project
2007-06-16 23:41:43 +00:00
Marius Strobl
79be8b5082 - Remove zstty spin lock for no longer existing zs(4).
- Move the rtc_mtx spin lock out from under #ifdef SMP as it's just
  not SMP-specific.
- Add a new spin lock pcib_mtx for locking "fast" interrupt handlers
  of host-to-PCI bridge drivers on sparc64.
2007-06-16 23:30:57 +00:00
Jeff Roberson
dda713dfb8 - Fix an off by one error in sched_pri_range.
- In tdq_choose() only assert that a thread does not have too high a
   priority (low value) for the queue we removed it from.  This will catch
   bugs in priority elevation.  It's not a serious error for the thread
   to have too low a priority as we don't change queues in this case as
   an optimization.

Reported by:	kris
2007-06-15 19:33:58 +00:00
Robert Watson
7e273744a6 Remove the restriction that rtprio(2) cannot be used to set the realtime
or idle priority of another process owned by the same user.  This means
that privilege in rtprio(2) (and rtprio_thread(2)) is required indirectly
via p_cansched(9) or directly to set realtime/idle privilege, rather than
directly affecting target process authorization.
2007-06-14 23:31:52 +00:00
Robert Watson
b4be6ef22f Only require privilege to set the current time adjustment, not in order to
query it.
2007-06-14 18:37:58 +00:00
Robert Watson
3805385e3d Spell statistics more correctly in comments. 2007-06-14 03:02:33 +00:00
John Baldwin
34a9edafbc Improve the ktrace locking somewhat to reduce overhead:
- Depessimize userret() in kernels where KTRACE is enabled by doing an
  unlocked check of the per-process queue of pending events before
  acquiring any locks.  Previously ktr_userret() unconditionally acquired
  the global ktrace_sx lock on every return to userland for every thread,
  even if ktrace wasn't enabled for the thread.
- Optimize the locking in exit() to first perform an unlocked read of
  p_traceflag to see if ktrace is enabled and only acquire locks and
  teardown ktrace if the test succeeds.  Also, explicitly disable tracing
  before draining any pending events so the pending events actually get
  written out.  The unlocked read is safe because proc lock is acquired
  earlier after single-threading so p_traceflag can't change between then
  and this check (well, it can currently due to a bug in ktrace I will fix
  next, but that race existed prior to this change as well).
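
The shape of the optimization is the usual unlocked pre-check with a
re-check under the lock; a sketch (the teardown helper is hypothetical):

    if (p->p_traceflag != 0) {		/* unlocked read */
            PROC_LOCK(p);
            if (p->p_traceflag != 0)		/* re-check under the lock */
                    ktrace_exit_teardown(p);	/* hypothetical helper */
            PROC_UNLOCK(p);
    }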

Reviewed by:	rwatson
2007-06-13 20:01:42 +00:00
John Baldwin
ce0be64687 Conditionally acquire Giant when dropping a reference on the ktrace vnode
during execve() when turning off tracing due to executing a setuid binary
as non-root.  Previously this could fail to acquire Giant and fail an
assertion if the ktrace file was on a non-MPSAFE filesystem and the
executable was on an MPSAFE filesystem.

MFC after:	3 days
Reported by:	kris
2007-06-13 19:41:47 +00:00
Jeff Roberson
3036ab79e3 - Include opt_sched.h for SCHED_STATS. 2007-06-12 23:27:31 +00:00
Jeff Roberson
671f2709ae - Garbage collect unused concurrency functions. 2007-06-12 19:50:31 +00:00
Jeff Roberson
e7c8d2e9fe - Garbage collect unused concurrency functions.
- Remove unused kse fields from struct proc.
 - Group remaining fields and #ifdef KSE them.
 - Move some kern_kse.c only prototypes out of proc and into kern_kse.

Discussed with:	Julian
2007-06-12 19:49:39 +00:00
Jeff Roberson
fe54587ffa - Move some common code out of sched_fork_exit() and back into fork_exit(). 2007-06-12 07:47:09 +00:00
Jeff Roberson
ff8fbcffcb Solve a complex exit race introduced with thread_lock:
- Add a count of exiting threads, p_exitthreads, to struct proc.
 - Increment p_exitthreads when we set the deadthread in thread_exit().
 - When we thread_stash() a deadthread use an atomic to drop the count.
 - Spin until the p_exitthreads count reaches 0 in thread_wait().
 - Lock the last exiting thread momentarily to be certain that it has
   exited cpu_throw().
 - Restructure thread_wait().  It does not need a loop as there will only
   ever be one thread.

Tested by:	moose@opera.com
Reported by:	kris, moose@opera.com
2007-06-12 07:24:46 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.
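
A typical caller now looks roughly like this (a sketch; the privilege
shown is just an example):

    int error;

    /* For PRIV_PROC_LIMIT the check is made against the real uid. */
    error = priv_check(td, PRIV_PROC_LIMIT);
    if (error != 0)
            return (error);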

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Jeff Roberson
efe641b939 - Add a missing PROC_SUNLOCK() in tdsignal() 2007-06-11 23:27:03 +00:00
Olivier Houchard
e411ce026a Re-acquire the PROC_SLOCK before calling calcru(), and release it after,
since calcru() expects it to be locked.

Reviewed by:	attilio
2007-06-11 21:05:41 +00:00
Sam Leffler
68e8e04e93 Update 802.11 wireless support:
o major overhaul of the way channels are handled: channels are now
  fully enumerated and uniquely identify the operating characteristics;
  these changes are visible to user applications which require changes
o make scanning support independent of the state machine to enable
  background scanning and roaming
o move scanning support into loadable modules based on the operating
  mode to enable different policies and reduce the memory footprint
  on systems w/ constrained resources
o add background scanning in station mode (no support for adhoc/ibss
  mode yet)
o significantly speedup sta mode scanning with a variety of techniques
o add roaming support when background scanning is supported; for now
  we use a simple algorithm to trigger a roam: we threshold the rssi
  and tx rate, if either drops too low we try to roam to a new ap
o add tx fragmentation support
o add first cut at 802.11n support: this code works with forthcoming
  drivers but is incomplete; it's included now to establish a baseline
  for other drivers to be developed and for user applications
o adjust max_linkhdr et al. to reflect 802.11 requirements; this eliminates
  prepending mbufs for traffic generated locally
o add support for Atheros protocol extensions; mainly the fast frames
  encapsulation (note this can be used with any card that can tx+rx
  large frames correctly)
o add sta support for ap's that beacon both WPA1+2 support
o change all data types from bsd-style to posix-style
o propagate noise floor data from drivers to net80211 and on to user apps
o correct various issues in the sta mode state machine related to handling
  authentication and association failures
o enable the addition of sta mode power save support for drivers that need
  net80211 support (not in this commit)
o remove old WI compatibility ioctls (wicontrol is officially dead)
o change the data structures returned for get sta info and get scan
  results so future additions will not break user apps
o fixed tx rate is now maintained internally as an ieee rate and not an
  index into the rate set; this needs to be extended to deal with
  multi-mode operation
o add extended channel specifications to radiotap to enable 11n sniffing

Drivers:
o ath: add support for bg scanning, tx fragmentation, fast frames,
       dynamic turbo (lightly tested), 11n (sniffing only and needs
       new hal)
o awi: compile tested only
o ndis: lightly tested
o ipw: lightly tested
o iwi: add support for bg scanning (well tested but may have some
       rough edges)
o ral, ural, rum: add support for bg scanning, calibrate rssi data
o wi: lightly tested

This work is based on contributions by Atheros, kmacy, sephe, thompsa,
mlaier, kevlo, and others.  Much of the scanning work was supported by
Atheros.  The 11n work was supported by Marvell.
2007-06-11 03:36:55 +00:00
Attilio Rao
393a081d42 Optimize vmmeter locking.
In particular:
- Add an explanatory table for the locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some useless comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
2007-06-10 21:59:14 +00:00
Matt Jacob
a659386c7e Remove unused variable. 2007-06-10 01:50:05 +00:00
Matt Jacob
26756b7a58 The new compiler can't quite follow the logic of has_stime and
complains about using uninitialized tags in stime.
2007-06-10 01:49:17 +00:00
Matt Jacob
9b73d2396a Initialize ets to zero.  This is arguably a gcc bug, in that ets is always
set to rts when timeout is non-NULL (and then timervalid is set), and ets is
only checked later, when timervalid is set.
2007-06-10 01:43:11 +00:00
Attilio Rao
bdf08be439 Fix a bug that came from committing a pre-merge version of the patch
instead of the post-merge version (with respect to another rusage fix).

Reported by: marcel
Approved by: jeff(mentor)
2007-06-10 00:28:41 +00:00
Marcel Moolenaar
55b5660de4 Work around an integer overflow in the expression `3 * maxbufspace / 4',
when maxbufspace is larger than INT_MAX / 3. The overflow causes a
hard hang on ia64 when physical memory is sufficiently large (8GB).
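
One conventional fix is to divide before multiplying, at the cost of a
little rounding (a sketch; the variable name is illustrative):

    /* Overflows once maxbufspace > INT_MAX / 3: */
    limit = 3 * maxbufspace / 4;
    /* Safe rearrangement: */
    limit = maxbufspace / 4 * 3;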
2007-06-09 23:41:14 +00:00
Attilio Rao
a1fe14bc33 rufetch and calcru sometimes need to be called together atomically.
This patch fixes the places where they should be called atomically, changing
their locking requirements (both now assume the per-proc spinlock is held) and
introducing rufetchcalc(), which wraps both calls so that they are performed
atomically.

Reviewed by: jeff
Approved by: jeff (mentor)
2007-06-09 21:48:44 +00:00
Attilio Rao
86a49dea5b Since the locking in kern/subr_prof.c has changed a bit, the time_lock
spinlock no longer needs to be exported.

Approved by: jeff (mentor)
2007-06-09 19:41:14 +00:00
Attilio Rao
a140976eb4 The current rusage code shows peculiar problems:
- Unsafe use of ruadd() in thread_exit()
- Non-atomicity of thread_exit() with respect to the exit1() operations

This patch addresses these problems by allocating p_ru as part of the
process and modifying the way it is accessed.

A small chunk of this patch resolves a race on p_state in kern_wait(),
since we have to be sure about the zombifying process.

Submitted by: jeff
Approved by: jeff (mentor)
2007-06-09 18:56:11 +00:00
Matt Jacob
65d32cd8fb Propagate volatile qualifier to make gcc4.2 happy. 2007-06-09 18:09:37 +00:00
Attilio Rao
e682569165 Remove the MUTEX_WAKE_ALL option and make it the default behaviour for our
mutexes.
Currently we already force MUTEX_WAKE_ALL because of some problems with the
!MUTEX_WAKE_ALL case (unavoidable priority inversion).
2007-06-08 21:36:52 +00:00
Poul-Henning Kamp
7acfb0af82 Double the WITNESS and DIAGNOSTIC benchmark warnings right before we
go into userland to improve the chances of people noticing them.
2007-06-08 11:47:36 +00:00
Xin LI
7b8c8b858c In getblk(), before gbincore(), use BO_LOCK directly when locking
the bufobj, rather than using VI_LOCK, as was done with
revision 1.453.
2007-06-08 07:05:08 +00:00
Robert Watson
faef53711b Move per-process audit state from a pointer in the proc structure to
embedded storage in struct ucred.  This allows audit state to be cached
with the thread, avoiding locking operations with each system call, and
makes it available in asynchronous execution contexts, such as deep in
the network stack or VFS.

Reviewed by:	csjp
Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2007-06-07 22:27:15 +00:00
John Baldwin
a66fde8d35 - Remove unused variable from create_thread().
- Move kern_thr_*() prototype to <sys/syscallsubr.h> where all the other
  kern_*() prototypes live.
2007-06-07 19:45:19 +00:00
David Xu
42ce445fed Backout experimental adaptive-spin umtx code. 2007-06-06 07:35:08 +00:00
Jeff Roberson
710eacdc5f - Placing the 'volatile' on the right side of the * in the td_lock
declaration removes the need for __DEVOLATILE().
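
For reference, the two placements mean different things:

    volatile struct mtx *td_lock;	/* pointer to volatile data */
    struct mtx * volatile td_lock;	/* volatile pointer: what is wanted here */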

Pointed out by:	tegge
2007-06-06 03:40:47 +00:00
Attilio Rao
d301eb10c7 Fix a problem with non-preemptive kernels that came from mis-merging
existing code with the new thread_lock patch.
This also cleans up the unlock operation for mutexes a bit.

Approved by: jhb, jeff(mentor)
2007-06-05 18:57:09 +00:00
Konstantin Belousov
b95b98b0bd Restore non-SMP build.
Reviewed by:	attilio
2007-06-05 14:20:13 +00:00
Jeff Roberson
95e3a0bca3 - Better fix for the previous error; use __DEVOLATILE() on the td_lock pointer,
   as it can actually sometimes be something other than sched_lock, even on
   schedulers which rely on a global scheduler lock.

Tested by:	kan
2007-06-05 04:12:46 +00:00
Jeff Roberson
c219b097af - Pass &sched_lock as the third argument to cpu_switch() as this will
always be the correct lock and we don't get volatile warnings this
   way.

Pointed out by:	kan
2007-06-05 03:46:54 +00:00
Jeff Roberson
36b369163b - Define TDQ_ID() for the !SMP case.
- Default pick_pri to off.  It is not faster in most cases.
2007-06-05 02:53:51 +00:00
Jeff Roberson
8e0185f604 - Remove sched_core.c. The maintainer has lost interest in pursuing this
and it has been neglected in the recent ksegrp removal as well as
   the thread_lock() changes.

Discussed with:	davidxu
2007-06-05 00:12:37 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   synchronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
Jeff Roberson
bd43e47156 Commit 10/14 of sched_lock decomposition.
- Add new spinlocks to support thread_lock() and adjust ordering.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:55:45 +00:00
Jeff Roberson
07a61420ff Commit 9/14 of sched_lock decomposition.
- Attempt to return the ttyinfo() selection algorithm to something sane
   as it has been broken and disabled for some time.  Adapt this algorithm
   in such a way that it does not conflict with per-cpu scheduler locking.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:55:32 +00:00
Jeff Roberson
3c2e44364e Commit 8/14 of sched_lock decomposition.
- Use a global umtx spinlock to protect the sleep queues now that there
   is no global scheduler lock.
 - Use thread_lock() to protect thread state.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:54:50 +00:00
Jeff Roberson
765b2891e8 Commit 7/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   synchronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.
 - Use a global kse spinlock to protect upcall and thread assignment.  The
   per-process spinlock can not be used because this lock must be acquired
   via mi_switch() where we already hold a thread lock.  The kse spinlock
   is a leaf lock ordered after the process and thread spinlocks.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:54:27 +00:00
Jeff Roberson
11bda9b8d5 Commit 6/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   synchronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.
 - Replace the tail-end of fork_exit() with a scheduler specific routine
   which can do the appropriate lock manipulations.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:53:34 +00:00
Jeff Roberson
40acdeabab Commit 5/14 of sched_lock decomposition.
- Protect the cp_time tick counts with atomics instead of a global lock.
   There will only be one atomic per tick and this allows all processors
   to execute softclock concurrently.
 - In softclock, protect access to rusage and td_*tick data with the
   thread_lock(), expanding the scope of the thread lock over the whole
   function.
 - Do some creative re-arranging in hardclock() to avoid excess locking.
 - Protect the p_timer fields with the per-process spinlock.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:53:06 +00:00
Jeff Roberson
a54e85fdbf Commit 4/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   synchronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.
 - Move some common code into thread_suspend_switch() to handle the
   mechanics of suspending a thread.  The locking here is incredibly
   convoluted and should be simplified.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:52:24 +00:00
Jeff Roberson
2502c107ba Commit 3/14 of sched_lock decomposition.
- Add a per-turnstile spinlock to solve potential priority propagation
   deadlocks that are possible with thread_lock().
 - The turnstile lock order is defined as the exact opposite of the
   lock order used with the sleep locks they represent.  This allows us
   to walk in reverse order in priority_propagate and this is the only
   place we wish to multiply acquire turnstile locks.
 - Use the turnstile_chain lock to protect assigning mutexes to turnstiles.
 - Change the turnstile interface to pass back turnstile pointers to the
   consumers.  This allows us to reduce some locking and makes it easier
   to cancel turnstile assignment while the turnstile chain lock is held.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:51:44 +00:00
Jeff Roberson
d72e80f09a Commit 2/14 of sched_lock decomposition.
- Adapt sleepqueues to the new thread_lock() mechanism.
 - Delay assigning the sleep queue spinlock as the thread lock until after
   we've checked for signals.  It is illegal for a thread to return in
   mi_switch() with any lock assigned to td_lock other than the scheduler
   locks.
 - Change sleepq_catch_signals() to do the switch if necessary to simplify
   the callers.
 - Simplify timeout handling now that locking a sleeping thread has the
   side-effect of locking the sleepqueue.  Some previous races are no
   longer possible.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:50:56 +00:00
Jeff Roberson
7b20fb19fb Commit 1/14 of sched_lock decomposition.
- Move all scheduler locking into the schedulers utilizing a technique
   similar to Solaris's container locking.
 - A per-process spinlock is now used to protect the queue of threads,
   thread count, suspension count, p_sflags, and other process
   related scheduling fields.
 - The new thread lock is actually a pointer to a spinlock for the
   container that the thread is currently owned by.  The container may
   be a turnstile, sleepqueue, or run queue.
 - thread_lock() is now used to protect access to thread related scheduling
   fields.  thread_unlock() unlocks the lock and thread_set_lock()
   implements the transition from one lock to another.
 - A new "blocked_lock" is used in cases where it is not safe to hold the
   actual thread's lock yet we must prevent access to the thread.
 - sched_throw() and sched_fork_exit() are introduced to allow the
   schedulers to fix-up locking at these points.
 - Add some minor infrastructure for optionally exporting scheduler
   statistics that were invaluable in solving performance problems with
   this patch.  Generally these statistics allow you to differentiate
   between different causes of context switches.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:50:30 +00:00
Attilio Rao
b4b7081961 Do proper "locking" for the missing vmmeter parts.
Now, we no longer assume sched_lock protection for some of them and use the
distributed loads method for vmmeter (distributed across CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
Attilio Rao
6759608248 Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussion/work in the coming days.
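
A usage sketch (the counter members are illustrative):

    PCPU_INC(cnt.v_syscall);		/* bump this CPU's counter by 1 */
    PCPU_ADD(cnt.v_intr, count);	/* add a caller-supplied int value */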

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
David Malone
041b706b2f Despite several examples in the kernel, the third argument of
sysctl_handle_int is not the sizeof() of the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed, to stop
people from accidentally cutting and pasting these examples.

In a few places sysctl_handle_int was being used on 64-bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
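
A sketch of the corrected idioms (the variable names are hypothetical):

    static int foo_val;
    static uint64_t foo_count;

    /* arg2 of sysctl_handle_int is not a size; just pass 0. */
    SYSCTL_INT(_kern, OID_AUTO, foo, CTLFLAG_RW, &foo_val, 0, "");

    /* 64-bit values need the quad handler and the "Q" format. */
    SYSCTL_PROC(_kern, OID_AUTO, foo_count, CTLTYPE_QUAD | CTLFLAG_RD,
        &foo_count, 0, sysctl_handle_quad, "Q", "");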
2007-06-04 18:25:08 +00:00
David Malone
df82ff50ed Add a function for exporting 64 bit types. 2007-06-04 18:14:28 +00:00
Kris Kennaway
cdcc788a7e Revert some debugging KTRs that were added during development. 2007-06-03 18:24:31 +00:00
Konstantin Belousov
7a31868ed0 Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being a file descriptor index to a pointer to struct file:
part 2. Convert calls missed in the first big commit.

Noted by:	rwatson
Pointy hat to:	kib
2007-06-01 14:33:11 +00:00
Jeff Roberson
1c4bcd050a - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  Reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to its exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
Attilio Rao
2feb50bf7d Revert the VMCNT_* operations introduction.
Probably a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Paolo Pisati
3401f2c1df In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb
2007-05-31 19:25:35 +00:00
Konstantin Belousov
9e223287c0 Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being a file descriptor index to a pointer to struct file.

Proposed and reviewed by:	jhb
Reviewed by:	daichi (unionfs)
Approved by:	re (kensmith)
2007-05-31 11:51:53 +00:00
Robert Watson
049c3b6cdf Now that sx(9) locks support an interruptible lock acquire primitive,
properly observe the SB_NOINTR flag in sblock.  This restores the
required behavior that lock acquisition be interruptible on the socket
buffer I/O serialization lock to allow threads waiting for I/O to be
signaled even if they aren't the thread currently holding the I/O lock.
With this change, the sblock regression test is again passed.

Reported by:		alfred
sx(9) handiwork:	attilio
2007-05-31 11:51:22 +00:00
Attilio Rao
f9819486e5 Add functions sx_xlock_sig() and sx_slock_sig().
These functions are intended to perform the same actions as sx_xlock() and
sx_slock(), but with the difference that they perform an interruptible sleep,
so that the sleep can be interrupted by external events.
In order to support these new features, some code restructuring is needed,
but the external API won't be affected at all.

Note: use a "void" cast for "int"-returning functions in order to keep tools
like Coverity Prevent from whining.
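
A caller sketch (the lock is hypothetical):

    error = sx_xlock_sig(&foo_sx);
    if (error != 0)
            return (error);	/* the sleep was interrupted by a signal */
    /* ... critical section ... */
    sx_xunlock(&foo_sx);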

Requested by: rwatson
Tested by: rwatson
Reviewed by: jhb
Approved by: jeff (mentor)
2007-05-31 09:14:48 +00:00
Attilio Rao
2c7289cbfa style(9) fixes for sx locks.
Approved by: jeff (mentor)
2007-05-29 19:46:37 +00:00
Attilio Rao
acf840c4bd Add a small fix for lock profiling in sx locks.
"0" cannot be a correct value: when the function is entered at least
one shared holder must be present, and since we want the last one, "1" is
the correct value.
Note that lock profiling for sx locks is far from perfect.
Expect further fixes for that.

Approved by: jeff (mentor)
2007-05-29 19:34:32 +00:00
Attilio Rao
02b0a160dc Fix some problems introduced with the last descriptor table locking
patch:
- Do the correct test for ldt allocation
- Drop dt_lock just before calling kmem_free() (since it acquires blocking
  locks inside)
- Solve a deadlock with smp_rendezvous() where another CPU would wait
  indefinitely for dt_lock acquisition.
- Add dt_lock to the WITNESS list of spinlocks

While applying these changes, alter the contract of user_ldt_free() so
that it returns without dt_lock held.

Tested by: marcus, tegge
Reviewed by: tegge
Approved by: jeff (mentor)
2007-05-29 18:55:41 +00:00
Robert Watson
03c96c3176 Add DDB "show unpcb" command, allowing DDB to print out many pertinent
details from UNIX domain socket protocol layer state.
2007-05-29 12:36:00 +00:00
Ed Maste
911d16b8cd Revert 1.197 and instead avoid calling kdb_enter() if the KDB_UNATTENDED
option is in use.
2007-05-28 21:50:54 +00:00
Warner Losh
cfa7a8beea Simplify the kernel configuration file return code.
Reviewed by: wkoszek
2007-05-28 20:41:10 +00:00
Ed Maste
1e62d77c09 Eliminate explicit kdb_enter in the software watchdog handler (which
produced incorrect behaviour with the KDB_UNATTENDED option) and call
panic in both the KDB and non-KDB cases.  This change is consistent
with rwatson's current kdb/ddb work.
2007-05-28 19:51:12 +00:00
Robert Watson
dede2ab3b2 In kern_kevent(), unconditionally fdrop() fp once fget() has succeeded,
as we never have an opportunity to set it to NULL.

Found with:	Coverity Prevent(tm)
CID:		2161
2007-05-28 17:15:05 +00:00
Robert Watson
e1e8f51b85 Universally adopt most conventional spelling of acquire. 2007-05-27 20:50:23 +00:00
Robert Watson
87066f04c6 Select a more appealing spelling for the word acquire. 2007-05-27 19:24:00 +00:00
Robert Watson
1c293049d9 Add parens around *free in *free++ in mbp_count() so that mbp_count()
actually works.  mbp_count() turns out only to be used in debugging code
in if_patm_intr.c, so this bug did not affect much in practice.
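
For reference:

    *free++;	/* reads *free, then advances the pointer */
    (*free)++;	/* increments the count that free points to */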

Found with:	Coverity Prevent(tm)
CID:		1943
2007-05-27 17:38:36 +00:00
Robert Watson
097e1ea87f Remove amountpipes counter for pipes -- this replicates the function of
existing UMA statistics for pipes, and allows us to get rid of both the
per-pipe dtor and two atomic operations per pipe required to maintain
the counter.
2007-05-27 17:33:10 +00:00
Robert Watson
e4e80aa713 Remove #if 0'd check for 0-size allocations, which if enabled, called
kdb_enter().
2007-05-27 13:13:46 +00:00
Pawel Jakub Dawidek
6e042171bd To avoid a deadlock when handling the .. directory during a lookup, we unlock
the parent vnode and relock it after locking the child vnode. The problem was that
we always relocked it exclusively, even when it was share-locked.

Discussed with:	jeff
2007-05-25 22:23:38 +00:00
Pawel Jakub Dawidek
b4c85af977 We no longer need to put namecache entries onto temporary mplist.
It was useful in revision 1.86, but should have been removed in 1.89.
2007-05-25 22:19:49 +00:00
Pawel Jakub Dawidek
950afe9972 The cache_leaf_test() function seems to be unused, so remove it. 2007-05-25 22:16:17 +00:00
Sam Leffler
3c86b7cdad fix comment typo 2007-05-23 17:28:21 +00:00
Robert Watson
4dec0e67ea Comment that tdsignal() may be entered from the debugger. 2007-05-23 17:27:42 +00:00
Robert Watson
63d69d2592 Initialize time_lock before calling cpu_initclocks(). This corrects a
race condition in which hardclock fires before the mutex is initialized
leading to a "corrupt spinlock" panic.

Submitted by:	attilio
2007-05-23 17:27:01 +00:00
Olivier Houchard
302e130edc Remove duplicate includes.
Submitted by:   Cyril Nguyen Huu <cyril ci0 org>
2007-05-23 13:36:02 +00:00
Pawel Jakub Dawidek
f013ccb768 - Remove redundant initialization.
- Compare pointer with NULL.
2007-05-22 23:05:48 +00:00
Diomidis Spinellis
fdbe5babe4 Increase precision of time values in the process accounting
structure, while maintaining backward compatibility with legacy
file and record formats.
2007-05-22 06:51:38 +00:00
Jeff Roberson
8b98fec903 - Move clock synchronization into a separate clock lock so the global
scheduler lock is not involved.  sched_lock still protects the sched_clock
   call.  Another patch will remedy this.

Contributed by:	Attilio Rao <attilio@FreeBSD.org>
Tested by:	kris, jeff
2007-05-20 22:11:50 +00:00
John Baldwin
7ec137e5b0 Rename the macros for assertion flags passed to sx_assert() from SX_* to
SA_* to match mutexes and rwlocks.  The old flags still exist for
backwards compatibility.
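
So an assertion now reads, for example (hypothetical lock):

    sx_assert(&foo_sx, SA_XLOCKED);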

Requested by:	attilio
2007-05-19 21:26:05 +00:00
Andre Oppermann
1c9dbd1567 In kern_sendfile() adjust byte accounting of the file sending loop to
ignore the size of any headers that were passed with the sendfile(2)
system call.  Otherwise the file sent will be truncated by the header
size if the nbytes parameter was provided.  The bug doesn't show up
when either nbytes is zero, meaning send the whole file, or no header
iovec is provided.

Resolve a potential aliasing of errors from the VM and sf_buf
parts with those from the protocol send parts, where an error from the
latter overwrites one from the former.

Update comments.

The byte accounting bug wasn't seen earlier because none of the popular
sendfile(2) consumers, Apache, lighttpd and our ftpd(8) use it in modes
that trigger it.  The varnish HTTP proxy makes full use of it and exposed
the problem.

Bug found by:	phk
Tested by:	phk
2007-05-19 20:50:59 +00:00
John Baldwin
71a95881d4 Expose sx_xholder() as a public macro. It returns a pointer to the thread
that holds the current exclusive lock, or NULL if no thread holds an
exclusive lock.
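
A sketch (the lock is hypothetical):

    struct thread *owner;

    owner = sx_xholder(&foo_sx);
    if (owner != NULL)
            printf("exclusively held by tid %d\n", owner->td_tid);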

Requested by:	pjd
2007-05-19 20:18:12 +00:00
John Baldwin
46a8b9cbc9 Oops, didn't include SX_ADAPTIVESPIN in the list of valid flags for the
assert in sx_init_flags().

Submitted by:	attilio
2007-05-19 18:34:24 +00:00
John Baldwin
b0d673254d Add a new SX_RECURSE flag to make support for recursive exclusive locks
conditional.  By default, sx(9) locks are back to not supporting recursive
exclusive locks.

Submitted by:	attilio
2007-05-19 16:35:27 +00:00
Alexander Kabaev
ee9f46615e Add kern.arnd sysctl. SSP code uses it to initialize the stack guard
magic value.

Submitted by: Jeremie Le Hen <jeremie@le-hen.org>
2007-05-19 04:53:14 +00:00
Robert Watson
eefc497257 Remove unnecessary assignment.
CID:		2227
Found with:	Coverity Prevent(tm)
2007-05-18 21:10:08 +00:00
John Baldwin
da7d0d1e24 Fix a comment. 2007-05-18 15:05:41 +00:00
John Baldwin
c91fcee75d Move lock_profile_object_{init,destroy}() into lock_{init,destroy}(). 2007-05-18 15:04:59 +00:00
Konstantin Belousov
d413d21071 Since the renaming of vop_lock to _vop_lock, pre- and post-condition
function calls are no longer generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk's assumption
about vop naming conventions. This restores the pre/post-condition calls.
2007-05-18 13:02:13 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details, but it also changes
   all counters to use atomics now.  This means the sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
Jeff Roberson
2b7e2ee7a5 - Convert turnstiles and sleepqueues to use UMA. This provides a modest
speedup and will be more useful after each gains a spinlock in the
   impending thread_lock() commit.
 - Move initialization and asserts into init/fini routines.  fini routines
   are only needed in the INVARIANTS case for now.

Submitted by:	Attilio Rao <attilio@FreeBSD.org>
Tested by:	kris, jeff
2007-05-18 06:32:24 +00:00
Peter Wemm
c6b342f820 Eliminate a micro-optimization that hasn't had any effect for 15+ years. 2007-05-17 15:31:14 +00:00
Warner Losh
3627f73782 Don't export a kern.conftxt sysctl, except when INCLUDE_CONFIG_FILE is
defined.  This restores the old behavior, and eliminates the
dependency on kernconf.tmpl when INCLUDE_CONFIG_FILE isn't
included in the kernel config.  This helps the many people in the
terminal room who had almost, but not quite, up-to-date config files.
I don't know if this is the result of skew among the cvsup
servers, or some other more subtle problem.  However, this fix should
work for any config of recent vintage (I tested with the latest, and
one before the recent changes, and eye-balled the intermediate
versions).

Reviewed by: the terminal room crew
2007-05-17 05:05:12 +00:00
Robert Watson
d19e16a72c Generally migrate to ANSI function headers, and remove 'register' use. 2007-05-16 20:41:08 +00:00
Wojciech A. Koszek
5f9974ae57 Handle !INCLUDE_CONFIG_FILE entirely in the kernel. This should make some
developers happy, since it will let them use old config(8) with newer
kernels.

Reviewed by:	imp
Approved by:	imp
2007-05-16 16:08:04 +00:00
John Baldwin
19059a13ed Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels).  Previously, each 32-bit process overwrote
its resource limits at exec() time.  The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process.  To fix this, don't
overwrite the resource limits during exec().  Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit.  We then use this when querying and
setting resource limits.  Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children.  However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after:	1 week
2007-05-14 22:40:04 +00:00
John Baldwin
1bba2a940b Move cpu_exit() earlier in exit1() to close a race between
SIGCHLD/kevent(2) notification of process termination and wait().  Now
we no longer drop locks between sending the notification and marking
the process as a zombie.  Previously, if another process attempted to do
a wait() with W_NOHANG after receiving a SIGCHLD or kevent and locked
the process while the exiting thread was in cpu_exit(), then wait() would
fail to find the process, which is quite astonishing to the process
calling wait().

MFC after:	3 days
2007-05-14 22:21:58 +00:00
Kirk McKusick
2f79c90982 Update entries for building tags. 2007-05-13 18:21:54 +00:00
Wojciech A. Koszek
744b947ef8 Improve INCLUDE_CONFIG_FILE support.
This change lets us have the full configuration of a running kernel
available in sysctl:

	sysctl -b kern.conftxt

The same configuration is also contained within the kernel image. It can be
obtained with:

	config -x <kernelfile>

The current functionality lets you quickly recover the kernel configuration by
simply redirecting the output of the commands presented above and starting the
kernel build procedure. "include" statements are also honored, which means options
and devices from included files are also included.

Please note that comments from configuration files are not preserved by
default. In order to preserve them, you can use the -C flag for config(8). This
will bring in the configuration file and included files literally; however,
redirection to a file no longer works directly.

This commit was followed by discussion, that took place on freebsd-current@.
For more details, look here:

	http://lists.freebsd.org/pipermail/freebsd-current/2007-March/069994.html
	http://lists.freebsd.org/pipermail/freebsd-current/2007-May/071844.html

Development of this patch took place in Perforce, hierarchy:

	//depot/user/wkoszek/wkoszek_kconftxt/

Support from:	freebsd-current@ (links above)
Reviewed by:	imp@
Approved by:	imp@
2007-05-12 19:38:18 +00:00
Andre Oppermann
0489b64c5e Make the TCP timer callout obtain Giant if the network stack is marked
as non-mpsafe.

This change is to be removed when all protocols are mp-safe.
2007-05-11 20:52:47 +00:00
Robert Watson
08d73f1370 Remove one more stale comment regarding unpcb type-safety. 2007-05-11 12:28:45 +00:00
Robert Watson
d7924b7086 Clarify and update quite a few comments to reflect locking optimizations,
the addition of unpcb refcounts, and bug fixes.  Some of these fixes are
appropriate for MFC.

MFC after:	3 days
2007-05-11 12:10:45 +00:00
John Baldwin
0026c92c3e Add destroyed cookie values for sx locks and rwlocks as well as extra
KASSERTs so that any lock operations on a destroyed lock will panic or
hang.
2007-05-08 21:51:37 +00:00
John Baldwin
c0bfd70306 Teach 'show lock' to properly handle a destroyed mutex. 2007-05-08 21:50:46 +00:00
John Baldwin
9fa7ce0f23 Fix a potential LOR with sx_sleep() and cv_wait() with sx locks by
1) adding the thread to the sleepq via sleepq_add() before dropping the
lock, and 2) dropping the sleepq lock around calls to lc_unlock() for
sleepable locks (i.e. locks that use sleepq's in their implementation).
2007-05-08 21:49:59 +00:00
Pyun YongHyeon
ccd8d954f3 Add missing socket buffer unlock before returning to userland.
Reviewed by:    rwatson
2007-05-08 12:34:14 +00:00
Paolo Pisati
bafe5a3118 Bring in the remaining bits to make interrupt filtering work:
o push much of the i386 and amd64 MD interrupt handling code
  (intr_machdep.c::intr_execute_handlers()) into MI code
  (kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
  (intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
  and make them part of 'struct intr_event', passing them as arguments to
  kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
  with filter and ithread functions.

Approved by: re (implicit?)
2007-05-06 17:02:50 +00:00
Wojciech A. Koszek
9e2894466a Don't acquire Giant unconditionally.
Reviewed by:	rwatson
2007-05-06 12:00:38 +00:00
Konstantin Belousov
5c76452f8f Mark filedescriptor table entries that have VOP_OPEN being performed on them
as UF_OPENING, and disable closing of those entries. This should fix the crashes
caused by devfs_open() (and fifo_open()) dereferencing struct file * by
index while the filedescriptor is closed by a parallel thread.

Idea by:	tegge
Reviewed by:	tegge (previous version of patch)
Tested by:	Peter Holm
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-05-04 14:23:29 +00:00
Robert Watson
7abab91135 sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags
on each socket buffer with the socket buffer's mutex.  This sleep lock is
used to serialize I/O on sockets in order to prevent I/O interlacing.

This change replaces the custom sleep lock with an sx(9) lock, which
results in marginally better performance, better handling of contention
during simultaneous socket I/O across multiple threads, and a cleaner
separation between the different layers of locking in socket buffers.
Specifically, the socket buffer mutex is now solely responsible for
serializing simultaneous operation on the socket buffer data structure,
and not for I/O serialization.

While here, fix two historic bugs:

(1) a bug allowing I/O to be occasionally interlaced during long I/O
    operations (discovered by Isilon).

(2) a bug in which failed non-blocking acquisition of the socket buffer
    I/O serialization lock might be ignored (discovered by sam).

SCTP portion of this patch submitted by rrs.
2007-05-03 14:42:42 +00:00
Alan Cox
fa75abb0d2 Remove unneeded include files. 2007-05-01 06:35:54 +00:00
John-Mark Gurney
ebf750a9fd Complete the removal of the restriction on overlaps in rman_manage_region():
remove the comment and man page verbiage...

Document the return values of rman_init() and rman_manage_region().

MFC after:	1 week
2007-04-28 07:37:49 +00:00
John Baldwin
06e043fb20 Avoid a lot of code duplication by using kern_open() to open /dev/null
in fdcheckstd() instead of a stripped down version of kern_open()'s code.

MFC after:	1 week
Reviewed by:	cperciva
2007-04-26 18:01:19 +00:00
Konstantin Belousov
e5ea32c290 Allow the dounmount() to proceed even for doomed coveredvp.
In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
  for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, the next check for a changed v_mountedhere or mnt_gen counter
would be successful. In the second case, the unmount shall be allowed.

Submitted by:	sobomax
MFC after:	2 weeks
2007-04-26 08:56:56 +00:00
Konstantin Belousov
8e68f804a7 Disable nesting of BOP_BDFLUSH().  A VOP_FSYNC() call in bdwrite() could
result in bdwrite() being reentered, thus causing infinite recursion.

Reported and tested by:	Peter Holm
Reviewed by:	tegge
MFC after:	2 weeks
2007-04-24 10:59:21 +00:00
Pawel Jakub Dawidek
8c804c7c98 Correct typo. 2007-04-23 12:53:00 +00:00
Robert Watson
c14d15ae3e Remove MAC Framework access control check entry points made redundant with
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting.  These entry points exactly aligned with privileges and
provided no additional security context:

- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()

Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies.  These mostly, but not
entirely, align with the set of privileges granted in jails.

Obtained from:	TrustedBSD Project
2007-04-22 15:31:22 +00:00
Stephane E. Potvin
0e5179e441 Add support for specifying a minimum size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem_size will
be at least 256MB (for example) without forcing a particular value via vm.kmem_size.
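
For example, in /boot/loader.conf (a sketch):

    vm.kmem_size_min="256M"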

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
Pawel Jakub Dawidek
eed20b37f5 Don't reinvent vm_page_grab().
Reviewed by:	ups
2007-04-20 19:49:20 +00:00
Kip Macy
fb1e3ccd7e Schedule the ithread on the same cpu as the interrupt
Tested by: kmacy
Submitted by: jeffr
2007-04-20 05:45:46 +00:00
Joseph Koshy
382d30cdd8 Fix witness(4) warnings about mutex use.
Group mutexes used in hwpmc(4) into 3 "types" in the sense of
witness(4):

 - leaf spin mutexes---only one of these should be held at a time,
   so these mutexes are specified as belonging to a single witness
   type "pmc-leaf".

 - `struct pmc_owner' descriptors are protected by a spin mutex of
   witness type "pmc-owner-proc".  Since we call wakeup_one() while
   holding these mutexes, the witness type of these mutexes needs
   to dominate that of "sleepq chain" mutexes.

 - logger threads use a sleep mutex, of type "pmc-sleep".

Submitted by:	wkoszek (earlier patch)
2007-04-19 08:02:51 +00:00
Pawel Jakub Dawidek
fb1daf8164 Fix a bug in sendfile(2) when the file is larger than the page size and nbytes=0.
When nbytes=0, sendfile(2) should use the file size. Because of the bug, it
was sending half of the file. The bug is that the 'off' variable can't be used
for the size calculation, because it changes inside the loop, so we should
use uap->offset instead.
2007-04-19 05:54:45 +00:00
Nate Lawson
0ae62c18a0 Bump the interrupt storm detection counter to 1000. My slow fileserver
gets a bogus irq storm detected when periodic daily kicks off at 3 am
and disconnects the disk.  Change the print logic to print once per second
when the storm is occurring instead of only once.  Otherwise, it appeared
that something else was causing the errors each night at 3 am since the
print only occurred the first time.

Reviewed by:	jhb
MFC after:	1 week
2007-04-19 01:24:32 +00:00
Pawel Jakub Dawidek
7760d8409f Export vfs_mount_alloc() as it is used in ZFS. 2007-04-17 21:14:06 +00:00
John Baldwin
2248f68064 - Add a 'show rman <rm>' DDB command to dump the resources in a resource
manager similar to 'devinfo -u'.
- Add a 'show allrman' DDB command that effectively does 'show rman' on all
  resource managers in the system.
2007-04-16 21:09:03 +00:00
Kip Macy
21ee3e7aff remove a now-invalid check from m_sanity;
panic on m_sanity check failure with INVARIANTS
2007-04-14 20:19:16 +00:00
Pawel Jakub Dawidek
24b0502ee0 Fix jails and jail-friendly file systems handling:
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
  for jailed root from different jails, etc.

OK'ed by:	rwatson
2007-04-13 23:54:22 +00:00
Pawel Jakub Dawidek
6bc3ab2574 When we are running low on vnodes, there is currently no way to ask other
subsystems to release some vnodes. Implement backpressure based on
vfs_lowvnodes event (similar to vm_lowmem for memory).
2007-04-13 08:38:48 +00:00
Robert Watson
94b94b2b49 Remove now-obsolete comment regarding mqueue privileges in jail. 2007-04-11 16:22:59 +00:00
Robert Watson
4b08405682 Allow PRIV_NETINET_REUSEPORT in jail. 2007-04-10 15:59:49 +00:00
Robert Watson
9956b3f5e4 Do allow the POSIX mqueue unlink privilege inside a jail, as we allow all
other POSIX mqueue privileges inside a jail.
2007-04-10 15:40:27 +00:00
Pawel Jakub Dawidek
08be819487 Minor style cleanups (mostly removal of trailing whitespaces). 2007-04-10 15:29:37 +00:00
Pawel Jakub Dawidek
21ff8c6715 Correct typos. 2007-04-10 15:22:40 +00:00
Nate Lawson
a363f67a81 Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec
if a race was lost.  We're still single-threaded at this point, but just
be safe for the future.
2007-04-09 21:10:04 +00:00
Nate Lawson
6b1e469ea5 Clean up the root mount and mount wait code. No mutexes are needed here
since a spurious wakeup() is the only possible outcome and this is fine in
the BSD programming model.
2007-04-09 19:23:52 +00:00
Pawel Jakub Dawidek
82068fe7a9 Add kern.hostuuid sysctl, which will be used to keep host's UUID.
Reviewed by:	mlaier, rink, brooks, rwatson
2007-04-09 19:18:09 +00:00
Pawel Jakub Dawidek
2eb68d493f Add root_mounted() function that returns true if the root file system is
already mounted.
2007-04-08 23:54:01 +00:00
Pawel Jakub Dawidek
c2cda60911 prison_free() can be called with a mutex held. This wasn't a problem until
I converted the allprison_mtx mutex to the allprison_lock sx lock. To fix this LOR,
move prison removal to prison_complete() entirely. To ensure that no one
will reference this prison while it is being removed from the list, skip
prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref has to be
greater than 0 in prison_hold().

Reported by:	kris
OK'ed by:	rwatson
2007-04-08 10:46:23 +00:00
Pawel Jakub Dawidek
b63b0c6529 Only use prison mutex to protect the fields that need to be protected by it. 2007-04-08 10:21:38 +00:00
Pawel Jakub Dawidek
264de85e73 pr_list is protected by the allprison_lock. 2007-04-08 02:13:32 +00:00
Robert Watson
7b20aa9ca6 Remove XXX comment that changes to file fields should be protected with
the file lock rather than the filedesc lock: I fixed this in the last
revision.

Spotted by:	kris
2007-04-06 23:31:30 +00:00
Pawel Jakub Dawidek
028e84c68b allprison mutex was converted to sx(9) lock. 2007-04-05 23:32:32 +00:00
Pawel Jakub Dawidek
dc68a63332 Implement functionality I called 'jail services'.
It may be used by external modules to attach some data to a jail's in-kernel
structure.

- Change allprison_mtx mutex to allprison_sx sx(9) lock.
  We will need to call external functions while holding this lock, which may
  want to allocate memory.
  Make use of the fact that this is a shared-exclusive lock and use the shared
  version when possible.
- Implement the following functions:
  prison_service_register() - registers a service that wants to be notified
	when a jail is created or destroyed
  prison_service_deregister() - deregisters service
  prison_service_data_add() - adds service-specific data to the jail structure
  prison_service_data_get() - takes service-specific data from the jail
	structure
  prison_service_data_del() - removes service-specific data from the jail
	structure

Reviewed by:	rwatson
2007-04-05 23:19:13 +00:00
Pawel Jakub Dawidek
54b369c1ae Make prison_find() globally accessible. 2007-04-05 21:34:54 +00:00
Pawel Jakub Dawidek
f6521d1c31 Implement SEEK_DATA and SEEK_HOLE extensions to lseek(2) as found in
OpenSolaris. For more information please refer to:

	http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data
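
A userland sketch of the extension:

    off_t data, hole;

    data = lseek(fd, offset, SEEK_DATA);	/* first data at or after offset */
    hole = lseek(fd, offset, SEEK_HOLE);	/* first hole at or after offset */
    /* ENXIO from either call means nothing of that kind remains past offset. */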
2007-04-05 21:10:53 +00:00
Pawel Jakub Dawidek
f3a8d2f93c Add a security.jail.mount_allowed sysctl, which allows mounting and
unmounting of jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.

A jail-friendly file system is a file system whose driver registers
itself with the VFCF_JAIL flag via the VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.

There are currently no jail-friendly file systems; ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.

Reviewed by:	rwatson
2007-04-05 21:03:05 +00:00
Kip Macy
0f4d9d04ea Fix mb_ctor_clust and mb_dtor_clust to reference the appropriate zone,
simplify setting refcnt

Reviewed by: andre, rwatson, and glebius
MFC after: 3 days
2007-04-04 21:27:01 +00:00
Robert Watson
5e3f7694b1 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are important for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
Kip Macy
59a31e6acf fix typo 2007-04-04 00:11:22 +00:00
Kip Macy
e2bc106690 style fixes; make sure that the lock is treated as released in the sharers == 0 case.
Note that this is somewhat racy because a new sharer can come in while we're updating stats.
2007-04-04 00:01:05 +00:00
Kip Macy
afc0bfbd90 Fixes to sx for newsx - fix recursed case and move out of inline
Submitted by: Attilio Rao <attilio@freebsd.org>
2007-04-03 22:58:21 +00:00
Kip Macy
70fe8436c8 move lock_profile calls out of the macros and into kern_mutex.c
add check for mtx_recurse == 0 when releasing sleep lock
2007-04-03 22:52:31 +00:00
Kip Macy
8289600ce7 skip call to _lock_profile_obtain_lock_success entirely if acquisition time is non-zero
(i.e. recursing or adding sharers)
2007-04-03 18:36:27 +00:00
Pawel Jakub Dawidek
afd894bb12 Add a root_mount_wait() function which can be used to wait until the root
file system is mounted. This is useful for kernel modules loaded from
/boot/loader.conf that have to access the file system.
2007-04-03 11:45:28 +00:00
John Baldwin
1ce2bc9187 Fix a fd leak in socketpair():
- Close the new file objects created during socketpair() if the copyout of
  the new file descriptors fails.
- Add a test to the socketpair regression test for this edge case.
2007-04-02 19:15:47 +00:00
John Baldwin
ebb3c22c16 Don't go to a whole lot of extra work to handle the race where the new
file descriptor is closed out from under us in kern_open().  This race
is already handled and the file will be closed when kern_open() does an
fdrop just before returning.
2007-04-02 13:40:38 +00:00
Wojciech A. Koszek
4850546f51 ng_node and ng_worklist locks both migrated from being spinning locks to
adaptive mutexes. Let witness(4) calm down and bring proper types of those
locks to the lock order database.

Glanced at by:	rwatson
2007-04-01 15:48:10 +00:00
Pawel Jakub Dawidek
5c1c2e82e2 I think the code I'm removing here is completely bogus.
The vfs_flags field is used for VFCF_* flags, which are given at file system
driver creation time (via the VFS_SET(9) macro).

What this code did was basically this:

If file system registers itself with VFCF_UNICODE flag (stores file names
as Unicode), it will gain MNT_SOFTDEP flag (UFS soft-updates).

If file system registers itself with VFCF_LOOPBACK flag (aliases some other
mounted FS), it will gain MNT_SUIDDIR flag (special handling of SUID on
dirs).

The latter will be quite dangerous, but those flags are reset later in
vfs_domount().

MFC after:	1 month
2007-04-01 13:08:05 +00:00
Pawel Jakub Dawidek
def72fbba1 Now that the vdropl() function is public, assert that the vnode interlock
is held.
2007-04-01 10:45:32 +00:00
Dag-Erling Smørgrav
e6534b36d8 Make vdropl() public; zfs needs it. There is also plenty of existing
file system code (mostly *_reclaim()) which looks like this:

    VOP_LOCK(vp);
    /* examine vp */
    VOP_UNLOCK(vp);
    vdrop(vp);

This can now be rewritten to:

    VOP_LOCK(vp);
    /* examine vp */
    vdropl(vp); /* will unlock vp */

MFC after:	1 week
2007-03-31 23:57:17 +00:00
John Baldwin
4e7f640dfb Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks.  The algorithms for
manipulating the lock cookie are very similar to those used for rwlocks.  This
patch also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given lock's behavior.  The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET, which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled via the
ADAPTIVE_SX kernel option.  Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels.  As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.
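A short initialization sketch using the new interface (the lock name and the
choice of flags are for illustration only):

    #include <sys/param.h>
    #include <sys/lock.h>   /* must now come before <sys/sx.h> */
    #include <sys/sx.h>

    static struct sx example_sx;

    static void
    example_init(void)
    {
            /*
             * SX_QUIET suppresses operation logging, as MTX_QUIET does
             * for mutexes; SX_ADAPTIVESPIN opts this lock into adaptive
             * spinning when the kernel is built with ADAPTIVE_SX.
             */
            sx_init_flags(&example_sx, "example",
                SX_QUIET | SX_ADAPTIVESPIN);
    }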

MFC after:	1 month
Submitted by:	attilio
Tested by:	kris, pjd
2007-03-31 23:23:42 +00:00
Pawel Jakub Dawidek
695919ad9a Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them. 2007-03-31 22:44:45 +00:00
Robert Watson
e92d773fbc Rather than ignoring any error return from getnewvnode() in nameiinit(),
explicitly test and panic.  This should not ever happen, but if it does,
this is a preferred failure mode to a NULL pointer dereference in the kernel.

Coverity CID:	1716
Found with:	Coverity Prevent(tm)
2007-03-31 16:08:50 +00:00
John Baldwin
b80ad3eea1 - Drop memory barriers in rw_try_upgrade(). We don't need an 'acq' memory
barrier here as the earlier rw_rlock() already contained one.
- Comment fix.
2007-03-30 18:08:55 +00:00
John Baldwin
ab2dab1680 - Use lock_init/lock_destroy() to setup the lock_object inside of lockmgr.
We can now use LOCK_CLASS() as a stronger check in lockmgr_chain() as a
  result.  This required putting back lk_flags as lockmgr's use of flags
  conflicted with other flags in lo_flags otherwise.
- Tweak 'show lock' output for lockmgr to match sx, rw, and mtx.
2007-03-30 18:07:24 +00:00
Wojciech A. Koszek
2404c938e6 vm_map_delete should be used only internally, by the VM subsystem. Replace
it with vm_map_remove, which not only embeds an additional check, but also
takes care of locking.

Reviewed by:	alc
Approved by:	alc, cognet (mentor)
2007-03-29 13:26:13 +00:00
John Baldwin
4649e92b4e Align 'struct thread' on 16 byte boundaries so that the lower 4 bits are
always 0.  Previously we aligned threads on a minimum of 8-byte boundaries.

Note: This changes the uma zone to no longer cache-align threads.  We
really want the uma zone to align threads to MAX(16, cache line size),
but there currently isn't a good way to express that to uma.
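Roughly what the zone setup looks like with the new alignment (treat this as
a sketch, not the literal diff; the ctor/dtor names follow kern_thread.c):

    #include <vm/uma.h>

    /*
     * uma_zcreate() takes the alignment as a byte mask, so 16-byte
     * alignment is spelled "16 - 1".  The previous UMA_ALIGN_CACHE
     * value asked for cache-line alignment instead.
     */
    thread_zone = uma_zcreate("THREAD", sched_sizeof_thread(),
        thread_ctor, thread_dtor, thread_init, thread_fini,
        16 - 1, 0);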

Submitted by:	attilio
2007-03-27 16:51:34 +00:00
Marcel Moolenaar
f3ea971bf0 PowerPC is the only architecture with mpsafe_vfs=0. This is now
broken. Rudimentary tests show that PowerPC can run with
mpsafe_vfs=1. Make it so...
2007-03-27 05:29:41 +00:00
Nate Lawson
0d4ac62a35 Add an interface for drivers to be notified of changes to CPU frequency.
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change.  cpufreq_post_change provides the results of the
change (success or failure).  cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed.  Hook
in all the drivers I could find that needed it.

* TSC: update TSC frequency value.  When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq.  This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.
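Assuming these notifications are eventhandler(9) hooks with signatures along
the following lines (a sketch, not the committed interface verbatim;
mydrv_can_handle() and sc are hypothetical), a driver might consume them so:

    #include <sys/param.h>
    #include <sys/eventhandler.h>
    #include <sys/cpu.h>

    static void
    mydrv_pre_change(void *arg, const struct cf_level *level, int *status)
    {
            /* Set *status to an errno value to veto the transition. */
            if (!mydrv_can_handle(level))
                    *status = EINVAL;
    }

    static void
    mydrv_levels_changed(void *arg, int unit)
    {
            /* Re-query the cpufreq device's level list here. */
    }

    /* Registered once at attach time: */
    EVENTHANDLER_REGISTER(cpufreq_pre_change, mydrv_pre_change, sc,
        EVENTHANDLER_PRI_ANY);
    EVENTHANDLER_REGISTER(cpufreq_levels_changed, mydrv_levels_changed, sc,
        EVENTHANDLER_PRI_ANY);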

Reviewed by:	bde, phk
MFC after:	1 month
2007-03-26 18:03:29 +00:00
Ed Maste
caa8943810 Avoid manipulating semu_list outside of the scope of SEMUNDO_LOCK(). This
would lead to an occasional hang with a cycle in semu_list.

X-Discussed-On: hackers@
2007-03-26 17:41:14 +00:00
Robert Watson
8c799760e1 Following movement of functions from uipc_socket2.c to uipc_socket.c and
uipc_sockbuf.c, clean up and update comments.
2007-03-26 17:05:09 +00:00
Robert Watson
20d9e5e87c Complete removal of uipc_socket2.c by moving the last few functions to
other C files:

- Move sbcreatecontrol() and sbtoxsockbuf() to uipc_sockbuf.c.  While
  sbcreatecontrol() is really an mbuf allocation routine, it does its work
  with awareness of the layout of socket buffer memory.

- Move pru_*() protocol switch stubs to uipc_socket.c where the non-stub
  versions of several of these functions live.  Likewise, move socket state
  transition calls (soisconnecting(), etc) to uipc_socket.c.  Move
  sodupsockaddr() and sotoxsocket().
2007-03-26 08:59:03 +00:00
Kris Kennaway
0c9c08dd9c Correct a comment typo 2007-03-25 10:07:23 +00:00
Kris Kennaway
bd37fd7220 Update a comment: we usually call exec_vmspace_new with Giant not held,
but sometimes it is.
2007-03-25 10:05:44 +00:00
Ed Maste
13b762a304 Stop setting ki_ocomm (thread name) to the proc name by default, as nothing
in the base system relies on this any longer.
2007-03-23 04:01:08 +00:00
John Baldwin
cd6e6e4e11 - Simplify the #ifdef's for adaptive mutexes and rwlocks by conditionally
defining a macro earlier in the file.
- Add NO_ADAPTIVE_RWLOCKS option to disable adaptive spinning for rwlocks.
2007-03-22 16:09:23 +00:00
Gleb Smirnoff
cd68a3f706 Move the dom_dispose and pru_detach calls in sofree() earlier. Only after
calling pru_detach can we be absolutely sure that we don't have any
references to the socket in the stack.

This closes race between lockless sbdestroy() and data arriving on socket.

Reviewed by:	rwatson
2007-03-22 13:21:24 +00:00
John Baldwin
8f27b08e87 Rename the cv_*wait*() functions to _cv_*wait*() and change their second
argument from a mutex to a lock_object.  Add cv_*wait*() wrapper macros
that accept either a mutex, rwlock, or sx lock as the second argument and
convert it to a lock_object and then call _cv_*wait*().  Basically, the
visible difference is that you can now use rwlocks and sx locks with
condition variables using the same API as with mutexes.
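A sketch of the new usage with an sx lock (names illustrative; the cv and
lock are assumed already initialized via cv_init() and sx_init()):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>
    #include <sys/condvar.h>

    static struct sx data_lock;
    static struct cv data_cv;
    static int data_ready;

    static void
    consumer(void)
    {
            sx_xlock(&data_lock);
            while (data_ready == 0)
                    cv_wait(&data_cv, &data_lock);  /* sx now accepted */
            /* ... consume the data ... */
            data_ready = 0;
            sx_xunlock(&data_lock);
    }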
2007-03-21 22:22:13 +00:00
John Baldwin
aa89d8cd52 Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.
2007-03-21 21:20:51 +00:00
John Baldwin
503916a7c1 Don't use cv_wait_unlock() to implement cv_wait(). Instead, implement
cv_wait() fully and add missing KTRACE context switch traces.
2007-03-21 20:46:26 +00:00
John Baldwin
ecd8246189 If vn_open() fails during kern_open(), don't fdrop() the new file object
until after the call to fdclose().  This closes an obscure race: if another
thread close()'s the file descriptor being opened before fdrop() is called,
the fdrop() in kern_open() frees the file object.  A second (or third) thread
can then create a new file descriptor that reuses both the same index and
the same file pointer, tricking fdclose() in the first thread into thinking
that the original file was still open, so it would close an unrelated
descriptor.

MFC after:	1 week
2007-03-21 19:32:08 +00:00
John Baldwin
6d257b6e70 Handle the case when a thread is blocked on a lockmgr lock with LK_DRAIN
in DDB's 'show sleepchain'.

MFC after:	3 days
2007-03-21 19:28:20 +00:00
Andre Oppermann
4e02375908 Maintain a pointer and offset pair into the socket buffer mbuf chain to
avoid traversal of the entire socket buffer for larger offsets on stream
sockets.

Adjust tcp_output() to make use of it.
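The idea, sketched with illustrative names (the committed field and function
names may differ): cache the last <mbuf, offset> position reached and resume
later scans from it instead of rewalking the chain from the head.

    #include <sys/param.h>
    #include <sys/mbuf.h>

    struct sndcache {
            struct mbuf *sc_ptr;    /* last mbuf reached */
            u_int        sc_off;    /* chain offset of sc_ptr */
    };

    static struct mbuf *
    sndcache_resolve(struct sndcache *sc, struct mbuf *head, u_int off)
    {
            struct mbuf *m = head;
            u_int moff = 0;

            /* Resume from the cached position when it still applies. */
            if (sc->sc_ptr != NULL && off >= sc->sc_off) {
                    m = sc->sc_ptr;
                    moff = sc->sc_off;
            }
            while (m != NULL && off >= moff + m->m_len) {
                    moff += m->m_len;
                    m = m->m_next;
            }
            sc->sc_ptr = m;
            sc->sc_off = moff;
            return (m);
    }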

Tested by:	gallatin
2007-03-19 18:35:13 +00:00
Pawel Jakub Dawidek
9a2fd584b4 Don't deny unmounting file systems for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is supposed not to change current (security) behaviour
in any way.

This change is similar to the change of PRIV_VFS_MOUNT in the previous revision.
2007-03-18 02:39:19 +00:00
Jeff Roberson
52bc574cc7 - Handle the case where slptime == runtime.
Submitted by:	Antoine Brodin
2007-03-17 23:32:48 +00:00
Jeff Roberson
4499aff6ec - Cast the intermediate value in priority computation back down to
unsigned char.  Weirdly, casting the 1 constant to u_char still produces
   a signed integer result that is then used in the % computation.  This
   avoids that mess altogether and causes a 0 pri to turn into 255 % 64
   as we expect.

Reported by:	kkenn (about 4 times, thanks)
2007-03-17 18:13:32 +00:00
John Baldwin
3076ca6720 Just use 'fdrop()' instead of 'FILE_LOCK(); fdrop_locked()' in
dupfdopen().  While I'm at it, move the second fdrop() out from under the
filedesc lock.
2007-03-15 21:19:21 +00:00
Pawel Jakub Dawidek
7533652025 Don't deny mounting for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is supposed not to change current (security) behaviour
in any way.

Reviewed by:	rwatson
2007-03-14 13:09:59 +00:00
Pawel Jakub Dawidek
f7d4e990c7 White space nits. 2007-03-14 12:54:10 +00:00
Konstantin Belousov
71d49316cc Busy filesystem around call of VFS_QUOTACTL() vfs op.
Tested by:	Peter Holm
Reviewed by:	tegge
Approved by:	re (kensmith)
2007-03-14 08:45:55 +00:00
John Baldwin
c1f2a5334d Print readers count as unsigned in ddb 'show lock'.
Submitted by:	attilio
2007-03-13 16:51:27 +00:00
Tor Egge
61b9d89ff0 Make insmntque() externally visible and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnodes belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
John Baldwin
4b493b1a6d Fix a typo. 2007-03-12 20:10:29 +00:00
John Baldwin
7568503421 - Use m_gethdr(), m_get(), and m_clget() instead of the macros in
sosend_copyin().
- Use M_WAITOK instead of M_TRYWAIT in sosend_copyin().
- Don't check for NULL from M_WAITOK and return ENOBUFS.
  M_WAITOK/M_TRYWAIT allocations don't fail with NULL.

Reviewed by:	andre
Requested by:	andre (2)
2007-03-12 19:27:36 +00:00
Robert Watson
6e2faa2444 In uipc_close(), we no longer always free the unpcb, as the last reference
may be dropped later.  In this case, always unlock the unpcb so as not to
leak the lock.

Found by:	kris (BugMagnet)
2007-03-12 14:52:00 +00:00
John Baldwin
6caa5f40a2 Use sx_sleep() in the main loop of the accounting kthread. 2007-03-09 23:29:31 +00:00
John Baldwin
e7573e7ad7 Allow threads to atomically release rw and sx locks while waiting for an
event.  Locking primitives that support this (mtx, rw, and sx) now each
include their own foo_sleep() routine.
- Rename msleep() to _sleep() and change its 'struct mtx' object to a
  'struct lock_object' pointer.  _sleep() uses the recently added
  lc_unlock() and lc_lock() function pointers for the lock class of the
  specified lock to release the lock while the thread is suspended.
- Add wrappers around _sleep() for mutexes (mtx_sleep()), rw locks
  (rw_sleep()), and sx locks (sx_sleep()).  msleep() still exists and
  is now identical to mtx_sleep(), but it is deprecated.
- Rename SLEEPQ_MSLEEP to SLEEPQ_SLEEP.
- Rewrite much of sleep.9 to not be msleep(9) centric.
- Flesh out the 'RETURN VALUES' section in sleep.9 and add an 'ERRORS'
  section.
- Add __nonnull(1) to _sleep() and msleep_spin() so that the compiler will
  warn if you try to pass a NULL wait channel.  The functions already have
  a KASSERT to that effect.
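A sketch of sleeping while holding an sx lock with the new wrapper (names
illustrative; the lock is assumed initialized):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    static struct sx state_lock;
    static int state_flag;

    static void
    wait_for_flag(void)
    {
            sx_xlock(&state_lock);
            while (state_flag == 0) {
                    /* Drops and reacquires state_lock around the sleep. */
                    (void)sx_sleep(&state_flag, &state_lock, PWAIT,
                        "flagwt", 0);
            }
            sx_xunlock(&state_lock);
    }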
2007-03-09 22:41:01 +00:00
John Baldwin
6e21afd40c Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes.
These functions are intended to be used to drop a lock and then reacquire
it when doing an sleep such as msleep(9).  Both functions accept a
'struct lock_object *' as their first parameter.  The 'lc_unlock' function
returns an integer that is then passed as the second parameter to the
subsequent 'lc_lock' function.  This can be used to communicate state.
For example, sx locks and rwlocks use this to indicate if the lock was
share/read locked vs exclusive/write locked.

Currently, spin mutexes and lockmgr locks do not provide working lc_lock
and lc_unlock functions.
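The shape of the two methods, illustrated with a simplified sx-style
implementation (a sketch, not the committed code):

    static int
    unlock_example(struct lock_object *lock)
    {
            struct sx *sx = (struct sx *)lock;

            if (sx_xlocked(sx)) {
                    sx_xunlock(sx);
                    return (1);     /* state: was held exclusive */
            }
            sx_sunlock(sx);
            return (0);             /* state: was held shared */
    }

    static void
    lock_example(struct lock_object *lock, int how)
    {
            struct sx *sx = (struct sx *)lock;

            if (how)
                    sx_xlock(sx);
            else
                    sx_slock(sx);
    }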
2007-03-09 16:27:11 +00:00
John Baldwin
3ff6d22988 Use C99-style struct member initialization for lock classes. 2007-03-09 16:19:34 +00:00
John Baldwin
ae8dde30c2 Use C99-style struct member initialization for lock classes. 2007-03-09 16:04:44 +00:00
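Both of the above commits switch the lock class definitions over to
designated initializers; roughly, a class now reads:

    struct lock_class lock_class_mtx_sleep = {
            .lc_name = "sleep mutex",
            .lc_flags = LC_SLEEPLOCK | LC_RECURSABLE,
            .lc_lock = lock_mtx,
            .lc_unlock = unlock_mtx,
    };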
Pawel Jakub Dawidek
2709e8904f Minor simplification. 2007-03-09 05:22:10 +00:00
Mohan Srinivasan
f9bb753844 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
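An illustrative shape of that timestamp (all names here are hypothetical,
not the committed ones):

    struct attrstamp {
            pid_t   as_pid;         /* process id */
            lwpid_t as_tid;         /* thread id */
            int     as_syscalls;    /* thread-local syscall count */
    };

    /* Did the cached attributes come from this very syscall? */
    static int
    attrstamp_current(const struct attrstamp *as, struct thread *td,
        int syscall_count)
    {
            return (as->as_pid == td->td_proc->p_pid &&
                as->as_tid == td->td_tid &&
                as->as_syscalls == syscall_count);
    }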
2007-03-09 04:02:38 +00:00
Julian Elischer
486a941418 Instead of doing comparisons using the pcpu area to see if
a thread is an idle thread, just see if it has the IDLETD
flag set. That flag will probably move to the pflags word
as it's permanent and never changes for the life of the
system, so it doesn't need locking.
2007-03-08 06:44:34 +00:00
Pawel Jakub Dawidek
9e5dcf7b21 White space nits. 2007-03-07 21:24:51 +00:00
John Baldwin
ddb38a1f3d Fix some nits in lock profiling for rwlocks:
- Properly note when a read lock is released.
- Always note when we contest on a read lock.
- Only note success of obtaining read locks for the first reader to match
  the behavior of sx(9).

Reviewed by:	kmacy
2007-03-07 20:48:48 +00:00
Julian Elischer
1d820a15b8 After the last change to KSE threading a bug was introduced where
all threads were counted against the count of upcall-capable threads.
This changes the way we do this accounting.
2007-03-07 20:17:41 +00:00
Olivier Houchard
aed12d5ff8 Backout rev 1.17, msleep() can't be used with a spinlock.
Pointy hat to:	cognet
2007-03-06 12:08:38 +00:00
Robert Watson
b5368498b5 Replay minor system call comment cleanup applied to kern_acl.c in a race
with repo-copy of kern_acl.c to vfs_acl.c.
2007-03-05 13:26:07 +00:00
Robert Watson
e6f5470468 Recognize repo-copy of kern_acl.c to vfs_acl.c, remove kern_acl.c,
remove kern_acl.c from the build, connect vfs_acl.c to the build.

Thanks to:	joe
2007-03-05 13:24:01 +00:00
Robert Watson
873fbcd776 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
Wojciech A. Koszek
59f65a4ba6 Change these descriptions of memory types used in malloc(9), as their
current, rather long strings make output from vmstat -m look unpleasant.

Approved by:	cognet (mentor)
2007-03-05 00:21:40 +00:00
Wojciech A. Koszek
d348f4d384 Use msleep(9) instead of tsleep(9) surrounded by lock acquisition and
release.
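The general shape of the change (sc is a hypothetical softc):

    /* Before: a window exists between the unlock and the sleep. */
    mtx_unlock(&sc->sc_mtx);
    tsleep(&sc->sc_state, PWAIT, "scwait", 0);
    mtx_lock(&sc->sc_mtx);

    /* After: the unlock and the sleep happen atomically. */
    msleep(&sc->sc_state, &sc->sc_mtx, PWAIT, "scwait", 0);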

Approved by:	cognet (mentor)
2007-03-04 23:40:35 +00:00
Robert Watson
0c14ff0eb5 Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
Robert Watson
1a5d072b76 Move to ANSI C function headers. Re-wrap some comments. 2007-03-04 17:50:46 +00:00
John Baldwin
e41bcf3cfc - Don't do the interrupt storm protection stuff for software interrupt
handlers.
- Use pause() when throttling during an interrupt storm.

Reported by:	kris (1)
2007-03-02 17:01:45 +00:00
Kip Macy
c66d760608 lock stats updates need to be protected by the lock 2007-03-02 07:21:20 +00:00
Pawel Jakub Dawidek
bb531912ff Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better
describe the privilege.

OK'ed by:	rwatson
2007-03-01 20:47:42 +00:00
Bruce M Simpson
6f7ca813c4 Do not dispatch SIGPIPE from the generic write path for a socket; with
this patch the code behaves according to the comment on the line above.

Without this patch, a socket could cause SIGPIPE to be delivered to its
process, once with SO_NOSIGPIPE set, and twice without.

With this patch, the kernel now passes the sigpipe regression test.
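For reference, a userland program opts a socket out of SIGPIPE delivery like
this (error handling reduced to the essentials):

    #include <sys/types.h>
    #include <sys/socket.h>

    int on = 1;
    if (setsockopt(s, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on)) == -1)
            /* handle error */;

Writes on a socket whose peer has closed then fail with EPIPE instead of
raising SIGPIPE.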

Tested by:	Anton Yuzhaninov
MFC after:	1 week
2007-03-01 19:20:25 +00:00
Kip Macy
a5bceb77f2 Evidently I've overestimated gcc's ability to peek inside inline functions
and optimize away unused stack values. The 48 bytes that the lock_profile_object
adds to the stack evidently have a measurable performance impact on certain workloads.
2007-03-01 09:35:48 +00:00
Robert Watson
ede6e136f8 Remove two simultaneous acquisitions of multiple unpcb locks from
uipc_send in cases where only a global read lock is held by breaking
them out and avoiding the unpcb lock acquire in the common case.  This
avoids deadlocks which manifested with X11, and should also marginally
further improve performance.

Reported by:	sepotvin, brooks
2007-03-01 09:00:42 +00:00
Robert Watson
3592fd4de5 Lock unp2 after checking for a non-NULL unp2 pointer in uipc_send() on
datagram UNIX domain sockets, not before.
2007-02-28 08:08:50 +00:00
John Baldwin
1a4435ee0e Print tid's rather than thread pointers in KTR_PROC traces. 2007-02-27 18:46:07 +00:00
John Baldwin
4d70511ac3 Use pause() rather than tsleep() on stack variables and function pointers. 2007-02-27 17:23:29 +00:00
John Baldwin
84d37a463a Use pause() rather than tsleep() on explicit global dummy variables. 2007-02-27 17:22:30 +00:00
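Both of these pause() conversions follow the same pattern (wait message and
timeout chosen for illustration):

    /* Before: a dummy wait channel just to get a timed sleep. */
    tsleep(&dummy, PWAIT, "delay", hz / 10);

    /* After: pause(9) needs no wait channel for a pure timed sleep. */
    pause("delay", hz / 10);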