freebsd-nq

Author	SHA1	Message	Date
David Xu	3d5c30f7c2	Inherit signal mask for child process in fork1(), RELENG_4 and other *BSD have this behaviour, also it is required by POSIX. PR: kern/80130 Submitted by: Kostik Belousov konstantin.belousov at zoral dot com dot ua	2005-04-20 13:14:52 +00:00
Matthew N. Dodd	96a041b533	Check sopt_level in uipc_ctloutput() and return early if it is non-zero. This prevents unintended consequences when an application calls things like setsockopt(x, SOL_SOCKET, SO_REUSEADDR, ...) on a Unix domain socket.	2005-04-20 02:57:56 +00:00
Pawel Jakub Dawidek	f163441e7e	Call g_waitidle() before every check that the list of holds is empty. Suggested by: phk	2005-04-19 21:44:44 +00:00
David Xu	902c0d8297	Clear P_STATCHILD earlier to avoid unnecessary retrying.	2005-04-19 12:31:15 +00:00
David Xu	407948a530	Oops, forgot to update this file. Fix a race condition between kern_wait() and thread_stopped(). Problem is in kern_wait(), parent process steps through children list, once a child process is skipped, and later even if the child is stopped, parent process still sleeps in msleep(), the race happens if parent masked SIGCHLD. Submitted by : Peter Edwards peadar.edwards at gmail dot com MFC after : 4 days	2005-04-19 08:11:28 +00:00
David Xu	95992d56f5	Fix a race condition between kern_wait() and thread_stopped(). Problem is in kern_wait(), parent process steps through children list, once a child process is skipped, and later even if the child is stopped, parent process still sleeps in msleep(), the race happens if parent masked SIGCHLD. Submitted by : Peter Edwards peadar.edwards at gmail dot com MFC after : 4 days	2005-04-19 08:07:28 +00:00
Poul-Henning Kamp	d1c712ede2	Call g_waitidle() instead of GEOM using the root_mount_hold() KPI. GEOM could (and will) get events as a result of drivers coming in late so a one-shot method is not good enough for GEOM.	2005-04-19 06:23:59 +00:00
Joseph Koshy	ebccf1e3a6	Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)	2005-04-19 04:01:25 +00:00
Poul-Henning Kamp	73fbaa74e5	Add a named reference-count KPI to hold off mounting of the root filesystem. While we wait for holds to be released, print a list of who holds us back once per second. Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle(). Use the new KPI also from ata. With ATAmkIII's newbusification, ata could narrowly miss the window and ad0 would not exist when we tried to mount root.	2005-04-18 21:21:26 +00:00
Poul-Henning Kamp	bdb3564638	Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready earlier.	2005-04-18 21:11:47 +00:00
Robert Watson	babe9a2bb3	Introduce p_canwait() and MAC Framework and MAC Policy entry points mac_check_proc_wait(), which control the ability to wait4() specific processes. This permits MAC policies to limit information flow from children that have changed label, although has to be handled carefully due to common programming expectations regarding the behavior of wait4(). The cr_seeotheruids() check in p_canwait() is #if 0'd for this reason. The mac_stub and mac_test policies are updated to reflect these new entry points. Sponsored by: SPAWAR, SPARTA Obtained from: TrustedBSD Project	2005-04-18 13:36:57 +00:00
Robert Watson	8e37dd2bb9	Remove end-of-line tabs. MFC after: 3 days	2005-04-18 11:51:10 +00:00
David Schultz	fe769cdd95	Add a sysctl that returns the full path of a process' text file. This information is needed by things like `gdb -p' and Sun's javac, and previously it could only be obtained via procfs	2005-04-18 02:10:37 +00:00
Robert Watson	7f53207b92	Introduce three additional MAC Framework and MAC Policy entry points to control socket poll() (select()), fstat(), and accept() operations, required for some policies: poll() mac_check_socket_poll() fstat() mac_check_socket_stat() accept() mac_check_socket_accept() Update mac_stub and mac_test policies to be aware of these entry points. While here, add missing entry point implementations for: mac_stub.c stub_check_socket_receive() mac_stub.c stub_check_socket_send() mac_test.c mac_test_check_socket_send() mac_test.c mac_test_check_socket_visible() Obtained from: TrustedBSD Project Sponsored by: SPAWAR, SPARTA	2005-04-16 18:46:29 +00:00
Robert Watson	f0c2044bd9	In mac_get_fd(), remove unconditional acquisition of Giant around copying of the socket label to thread-local storage, and replace it with conditional acquisition based on debug.mpsafenet. Acquire the socket lock around the copy operation. In mac_set_fd(), replace the unconditional acquisition of Giant with the conditional acquisition of Giant based on debug.mpsafenet. The socket lock is acquired in mac_socket_label_set() so doesn't have to be acquired here. Obtained from: TrustedBSD Project Sponsored by: SPAWAR, SPARTA	2005-04-16 18:33:13 +00:00
Marius Strobl	ea35b592d4	Increase default HZ for sparc64 to 1000.	2005-04-16 15:07:41 +00:00
Robert Watson	030a28b3b5	Introduce new MAC Framework and MAC Policy entry points to control the use of system calls to manipulate elements of the process credential, including: setuid() mac_check_proc_setuid() seteuid() mac_check_proc_seteuid() setgid() mac_check_proc_setgid() setegid() mac_check_proc_setegid() setgroups() mac_check_proc_setgroups() setreuid() mac_check_proc_setreuid() setregid() mac_check_proc_setregid() setresuid() mac_check_proc_setresuid() setresgid() mac_check_proc_setresgid() MAC checks are performed before other existing security checks; both current credential and intended modifications are passed as arguments to the entry points. The mac_test and mac_stub policies are updated. Submitted by: Samy Al Bahra <samy@kerneled.org> Obtained from: TrustedBSD Project	2005-04-16 13:29:15 +00:00
Robert Watson	e551d45211	Modify the alq(9) alq_open() API to accept a file creation mode, rather than defaulting the cmode argument to vn_open() to 0. Supply a default argument of ALQ_DEFAULT_CMODE (0600) in current callers. Discussed with/pointed out by: hmp Reviewed by: jeff, hmp MFC after: 3 days	2005-04-16 12:12:27 +00:00
Maxim Konovalov	f305048664	Fix a typo in the comment. Noticed by: Samy Al Bahra	2005-04-15 14:01:43 +00:00
John Baldwin	95b66e9e53	Close a race between sleepq_broadcast() and sleepq_catch_signals(). Specifically, sleepq_broadcast() uses td_slpq for its private pending queue of threads that it is going to wake up after it takes them off the sleep queue. The problem is that if one of the threads is actually not asleep yet, then we can end up with td_slpq being corrupted and/or the thread being made runnable at the wrong time resulting in the td_sleepqueue == NULL assertion failures occasionally reported under heavy load. The fix is to stop being so fancy and ditch the whole pending queue bit. Instead, sleepq_remove_thread() and sleepq_resume_thread() were merged into one function that requires the caller to hold sched_lock. This fixes several places that unlocked sched_lock only to call a function that then locked sched_lock, so even though sched_lock is now held slightly longer, removing the extra lock acquires (1 pair instead of 3 in some cases) probably makes it an overall win if you don't include the fact that it closes a race. This is definitely a 5.4 candidate. PR: kern/79693 Submitted by: Steven Sears stevenjsears at yahoo dot com MFC after: 4 days	2005-04-14 06:30:32 +00:00
Jeff Roberson	74a5123246	- Remove a debugging printf that slipped in. Spotted by: Peter Wemm	2005-04-13 23:36:28 +00:00
Tai-hwa Liang	2d4420789d	According to the comment in struct tty, t_modem is optional; hence we should guard against a NULL t_modem entry. Otherwise, drivers that don't implement the t_modem callback (such as sys/dev/usb/ucycom.c) would panic when someone opens the driver's associated tty device. Reviewed by: phk, sam (mentor)	2005-04-13 13:56:17 +00:00
Jeff Roberson	4585e3ac5a	- Change all filesystems and vfs_cache to relock the dvp once the child is locked in the ISDOTDOT case. See vfs_lookup.c r1.79 for details. Sponsored by: Isilon Systems, Inc.	2005-04-13 10:59:09 +00:00
Jeff Roberson	374df05fd3	- Change vop_lookup_post assertions to reflect recent vfs_lookup changes. Sponsored by: Isilon Systems, Inc.	2005-04-13 10:57:53 +00:00
Jeff Roberson	18ef8344b4	- Further simplify lookup; Force all filesystems to relock in the DOTDOT case. There are bugs in some which didn't unlock in the ISDOTDOT case to begin with that need to be addressed separately. This simplifies things anyway. - Fix relookup() to prevent it from vrele()'ing the dvp while the vp is locked. Catch up to other lookup changes. Sponsored by: Isilon Systems, Inc. Reported by: Peter Wemm	2005-04-13 10:57:13 +00:00
Matthew N. Dodd	6a2989fd54	Implement unix(4) socket options LOCAL_CREDS and LOCAL_CONNWAIT. - Add unp_addsockcred() (for LOCAL_CREDS). - Add an argument to unp_connect2() to differentiate between PRU_CONNECT and PRU_CONNECT2. (for LOCAL_CONNWAIT) Obtained from: NetBSD (with some changes)	2005-04-13 00:01:46 +00:00
Robert Watson	87efd4d58a	Consistently style function declarations in kern_malloc.c. MFC after: 3 days	2005-04-12 23:54:34 +00:00
John Baldwin	aa9aa68d2f	Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic operations in some places and simple non-per CPU math in others.	2005-04-12 23:18:54 +00:00
Vinod Kashyap	f0c1dee27f	The latest release of the FreeBSD driver (twa) for 3ware's 9xxx series controllers. This corresponds to the 9.2 release (for FreeBSD 5.2.1) on the 3ware website. Highlights of this release are: 1. The driver has been re-architected to use a "Common Layer" (all tw_cl* files), which is a consolidation of all OS-independent parts of the driver. The FreeBSD OS specific portions of the driver go into an "OS Layer" (all tw_osl* files). This re-architecture is to achieve better maintainability, consistency of behavior across OS's, and better portability to new OS's (drivers for new OS's can be written by just adding an OS Layer that's specific to the OS and complies with a "Common Layer Programming Interface" API). 2. The driver takes advantage of multiple processors. 3. The driver has a new firmware image bundled, the new features of which include Online Capacity Expansion and multi-lun support, among others. More details about 3ware's 9.2 release can be found here: http://www.3ware.com/download/Escalade9000Series/9.2/9.2_Release_Notes_Web.pdf Since the Common Layer is used across OS's, the FreeBSD specific include path for header files (/sys/dev/twa) is not part of the #include pre-processor directive in any of the source files. For being able to integrate twa into the kernel despite this, Makefile.<arch> has been changed to add the include path to CFLAGS. Reviewed by: scottl	2005-04-12 22:07:11 +00:00
Warner Losh	2bd5d8147a	resource_list_purge: release the resources in this list, and purge the elements of this list (eg, reset it). Man page to follow	2005-04-12 15:20:36 +00:00
Warner Losh	f351862a17	rman_set_device() seems to have been omitted by mistake. Implement it.	2005-04-12 06:21:59 +00:00
Jeff Roberson	0b581232df	- Remove unused include.	2005-04-12 05:45:58 +00:00
Jeff Roberson	436901a86b	- Differentiate two UPGRADE panics so I have a better idea of what's going on here.	2005-04-12 05:43:03 +00:00
Warner Losh	cdf7c848cf	Return the resource created/found in resource_list_add to avoid an extra resource_list_find in some places. Suggested by: sam Found by: Coverity Analysis tool.	2005-04-12 04:22:17 +00:00
Jeff Roberson	17c916e321	- Mark the VOPs that require exclusive locks. Those that aren't marked with E may be called with a shared lock held. This list really could be made per filesystem if we had any filesystems which differed from ffs in locking guarantees. VFS itself is not sensitive to this except where vgone() etc. are concerned. Sponsored by: Isilon Systems, Inc.	2005-04-11 15:19:29 +00:00
Jeff Roberson	539de9eda0	- Enable ASSERT_VOP_ELOCKED and assert_vop_elocked() now that vnode_if.awk uses it. Sponsored by: Isilon Systems, Inc.	2005-04-11 15:17:06 +00:00
Jeff Roberson	070898b1b3	- Change the VOP_LOCK UPGRADE in vput() to do a LK_NOWAIT to avoid a potential lock order reversal. Also, don't unlock the vnode if this fails, lockmgr has already unlocked it for us. - Restructure vget() now that vn_lock() does all of VI_DOOMED checking for us and also handles the case where there is no real lock type. - If VI_OWEINACT is set, we need to upgrade the lock request to EXCLUSIVE so that we can call inactive. It's not legal to vget a vnode that hasn't had INACTIVE called yet. Sponsored by: Isilon Systems, Inc.	2005-04-11 09:28:32 +00:00
Jeff Roberson	1b19c74d73	- Assert that we're no longer doing recursive vn_locks in inactive/reclaim as I'd like to get rid of the vxthread. - Handle lock requests which don't actually want a lock as this is a much more convenient place to handle this condition than in vget(). These requests simply want to know that VI_DOOMED isn't set. - Correct a test at the end of vn_lock, if error !=0 should be if error == 0, this has been broken since I committed the VI_DOOMED changes, but no one ran into it because vget() duplicated this functionality. Sponsored by: Isilon Systems, Inc.	2005-04-11 09:23:56 +00:00
Jeff Roberson	836c5b4149	- vput(tvp) before vrele(tdvp) in kern_rename() to avoid lock order issues.	2005-04-11 09:19:08 +00:00
Nate Lawson	8d9134815e	Add debugging prints to all the methods in case there are problems with managing levels. This can be enabled with the debug.cpufreq.verbose tunable and sysctl.	2005-04-10 19:11:23 +00:00
David Schultz	f97c3df18d	Suspend all other threads in the process while generating a core dump. The main reason for doing this is that the ELF dump handler expects the thread list to be fixed while the dump header is generated, so an upcall that occurs at the wrong time can lead to buffer overruns and other Bad Things. Another solution would be to grab sched_lock in the ELF dump handler, but we might as well single-thread, since the process is about to die. Furthermore, I think this should ensure that the register sets in the core file are sequentially consistent.	2005-04-10 02:31:24 +00:00
Pawel Jakub Dawidek	c19618dd7d	CDEV lock should be before 'system map' lock. Hardcode this order to help track down reported LOR. LOR reported by: Thierry Herbelot <thierry@herbelot.com> LOR info: http://sources.zabbadoz.net/freebsd/lor.html#080	2005-04-09 13:32:01 +00:00
Jeff Roberson	5ef9827cea	- Remove the namei NOOBJ flag. It is meaningless now. Sponsored by: Isilon Systems, Inc.	2005-04-09 12:04:36 +00:00
Jeff Roberson	d3b78f7337	- If we vrele() a dvp while the child is locked we can potentially deadlock when vrele() acquires the directory lock in the wrong order. Fix this via the following changes: - Keep the directory locked after VOP_LOOKUP() until we've determined what we're going to do with the child. This allows us to remove the complicated post LOOKUP code which determines whether we should lock or unlock the parent. This means we may have to vput() in the appropriate cases later, rather than doing an unsafe vrele. - in NDFREE() keep two flags to indicate whether we need to unlock vp or dvp. This allows us to vput rather than vrele in the appropriate cases without rechecking the flags. Move the code to handle dvp after we handle vp. - Remove some dead code from namei() that was the result of changes to VFS_LOCK_GIANT(). Sponsored by: Isilon Systems, Inc.	2005-04-09 11:53:16 +00:00
Pawel Jakub Dawidek	cd104dd3c1	Add a missing terminator. Confirmed by: rwatson	2005-04-09 11:31:31 +00:00
Gleb Smirnoff	4f20185860	Add additional newline to debug.mutex.prof.stats header, so that column names are printed exactly above the columns.	2005-04-08 14:14:09 +00:00
Stephan Uphoff	779186434a	Sprinkle some volatile magic and rearrange things a bit to avoid race conditions in critical_exit now that it no longer blocks interrupts. Reviewed by: jhb	2005-04-08 03:37:53 +00:00
Poul-Henning Kamp	2e0b9b22f0	Fix bug in vfs_hash_rehash(): use correct bucket. This only affected msdosfs which is broken in other ways too.	2005-04-07 07:54:08 +00:00
Poul-Henning Kamp	30a1695b11	Constify hexdump() harder.	2005-04-06 10:14:13 +00:00
Jeff Roberson	a96ab77002	- Remove dead code.	2005-04-06 10:11:14 +00:00
Jeff Roberson	d78e0ee9fd	- Assert that the bufobj matches in flushbuflists. I still haven't gotten to root cause on exactly how this happens. - If the assert is disabled, we presently try to handle this case, but the BUF_UNLOCK was missing. Thus, if this condition ever hit we would leak a buf lock. Many thanks to Peter Holm for all his help in finding this bug. He really put more effort into it than I did.	2005-04-06 06:49:46 +00:00
Jeff Roberson	2bbd6c9818	- Move NDFREE() from vfs_subr to vfs_lookup where namei() is.	2005-04-05 08:58:49 +00:00
Jeff Roberson	22fdc83f93	- Use taskqueue_thread rather than taskqueue_swi since our task is going to vrele, which may vop lock. This is not safe in a software interrupt context.	2005-04-05 08:51:45 +00:00
Christian S.J. Peron	f3e89267c0	Assert that the vnode is locked. This is meant to catch bugs or mis-use of the vnode API in conditions where IO_NODELOCKED has been used without the vnode actually being locked.	2005-04-05 01:11:43 +00:00
John Baldwin	c6a37e8413	Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any effect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case. Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch. This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example). Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more	2005-04-04 21:53:56 +00:00
Nate Lawson	4abfd70c87	Document that devclass_get_maxunit(9) returns one greater than the current highest unit. Reviewed by: dfr MFC after: 2 weeks	2005-04-04 15:37:59 +00:00
Nate Lawson	fada20b989	Add devclass_get_drivers(9) which provides an array of pointers to driver instances in a given devclass. This is useful for systems that want to call code in driver static methods, similar to device_identify(). Reviewed by: dfr MFC after: 2 weeks	2005-04-04 15:26:51 +00:00
Jeff Roberson	d1cc6041e6	- Add a missing unlock of the vnode_free_list_mtx. Spotted by: Antoine Brodin	2005-04-04 12:07:16 +00:00
Jeff Roberson	92b8231d4f	- Instead of waiting forever to get a vnode in getnewvnode() wait for one to become available for one second and then return ENFILE. We can run out of vnodes, and there must be a hard limit because without one we can quickly run out of KVA on x86. Presently the system can deadlock if there are maxvnodes directories in the namecache. The original 4.x BSD behavior was to return ENFILE if we reached the max, but 4.x BSD did not have the vnlru proc so it was less profitable to wait.	2005-04-04 11:43:44 +00:00
Jeff Roberson	9a6bb8ad8f	- Include opt_vfs.h for LOOKUP_SHARED. - Control the behavior of shared lookups with the lookup_shared sysctl which has its default behavior set via the LOOKUP_SHARED option.	2005-04-03 23:50:20 +00:00
Nate Lawson	a44732323f	maxunit is actually one higher than the greatest currently-allocated unit in a devclass. All the other uses of maxunit are correct and this one was safe since it checks the return value of devclass_get_device(), which would always say that the highest unit device doesn't exist. Reviewed by: dfr MFC after: 3 days	2005-04-03 22:23:18 +00:00
Jeff Roberson	20728d8f63	- Slightly restructure acquire() so I can add more ktr information and an assert to help find two strange bugs. - Remove some nearby spls.	2005-04-03 11:49:02 +00:00
Jeff Roberson	f0ddc75ed0	- Now that writes to character devices supporting softupdates can generate dirty bufs even with a locked vnode, 100 retries is not that many. This should probably change from a retry count to an abort when we are no longer cleaning any buffers. - Don't call vprint() while we still hold the vnode locked. Move the call to later in the function. - Clean up a comment.	2005-04-03 10:24:03 +00:00
Alan Cox	9f65fb13aa	Remove GIANT_REQUIRED from elfN_load_section().	2005-04-03 07:57:47 +00:00
John Baldwin	98df9218da	- Change the vm_mmap() function to accept an objtype_t parameter specifying the type of object represented by the handle argument. - Allow vm_mmap() to map device memory via cdev objects in addition to vnodes and anonymous memory. Note that mmaping a cdev directly does not currently perform any MAC checks like mapping a vnode does. - Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the cdev the ioctl is acting on rather than trying to find a suitable vnode to map from. Reviewed by: alc, arch@	2005-04-01 20:00:11 +00:00
John Baldwin	fe24ab5fc5	Actually commit the code for kern_sched_get_rr_interval().	2005-03-31 22:54:48 +00:00
John Baldwin	b88ec951e1	Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(), kern_settimeofday(), and kern_writev() to allow for further stackgap reduction in the compat ABIs.	2005-03-31 22:51:18 +00:00
John Baldwin	ea2b9b3e36	- Denote a few places where kobj class references are manipulated without holding the appropriate lock. - Add a comment explaining why we bump a driver's kobj class reference when loading a module.	2005-03-31 22:49:31 +00:00
John Baldwin	2945387fee	Drop a bogus mp_fixme(). Adding a lock would do nothing to reduce userland races regarding changing of jail-related sysctls.	2005-03-31 22:47:57 +00:00
John Baldwin	b80ed61487	Don't recursively panic when we call mi_switch() in a critical section, even though calling mi_switch() after a panic is likely a bug anyway as the recursive panic only serves to make things worse.	2005-03-31 20:36:44 +00:00
Nate Lawson	71ab130c9b	Add a check for cpufreq_unregister() being called with no cpufreq device active. Note that the logic indicates this should not be possible so generate a warning if this ever happens. Found by: Coverity Prevent (via sam)	2005-03-31 18:56:54 +00:00
Poul-Henning Kamp	f4f6abcb4e	Explicitly hold a reference to the cdev we have just cloned. This closes the race where the cdev was reclaimed before it ever made it back to devfs lookup.	2005-03-31 12:19:44 +00:00
Poul-Henning Kamp	9477d73e32	cdev (still) needs per instance uid/gid/mode Add unlocked version of dev_ref() Clean up various stuff in sys/conf.h	2005-03-31 10:29:57 +00:00
Poul-Henning Kamp	eb151cb989	Rename dev_ref() to dev_refl()	2005-03-31 06:51:54 +00:00
Jeff Roberson	e451d879a1	- Disable vfs shared locks by default. They must be specifically enabled on filesystems which safely support them. It appears that many network filesystems specifically are not shared lock safe. Sponsored by: Isilon Systems, Inc.	2005-03-31 05:22:45 +00:00
Jeff Roberson	c4c0ec5ba7	- Add a LK_NOSHARE flag which forces all shared lock requests to be treated as exclusive lock requests. Sponsored by: Isilon Systems, Inc.	2005-03-31 05:18:19 +00:00
Jeff Roberson	f247a5240d	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:37:09 +00:00
Jeff Roberson	b641353e3a	- Remove apause(). It makes no sense with our present mutex implementation since simply unlocking a mutex does not ensure that one of the waiters will run and acquire it. We're more likely to reacquire the mutex before anyone else has a chance. It has also bit me three times now, as it's not safe to drop the interlock before sleeping in many cases. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:25:59 +00:00
David Schultz	7ce7f713ee	Eliminate v_id and v_ddid. The name cache now holds references to vnodes whose names it caches, so we no longer need a `generation number' to tell us if a referenced vnode is invalid. Replace the use of the parent's v_id in the hash function with the address of the parent vnode. Tested by: Peter Holm Glanced at by: jeff, phk	2005-03-30 03:01:36 +00:00
David Schultz	dd33f0d92f	Merge kern___cwd() and vn_fullpath(), which were virtually identical, except for places where people forget to update one of them. We now collect only one set of stats for both of these routines. Other changes in this commit include: - Start acquiring Giant again in vn_fullpath(), since it is required when crossing a mount point. - Expand the scope of the cache lock to avoid dropping it and picking it up again for every pathname component. This also makes it trivial to avoid races in stats collection. - Assert that nc_dvp == v_dd for directories instead of returning an error to userland when this is not true. AFAIK, it should always be true when v_dd is non-null. - For vn_fullpath(), handle the first (non-directory) vnode separately. Glanced at by: jeff, phk	2005-03-30 02:59:32 +00:00
Jeff Roberson	5280e61f2f	- Move the logic that locks and refs the new vnode from vfs_cache_lookup() to cache_lookup(). This allows us to acquire the vnode interlock before dropping the cache lock. This protects the vnodes identity until we have locked it. Sponsored by: Isilon Systems, Inc.	2005-03-29 12:59:06 +00:00
Poul-Henning Kamp	b3d82c03fc	Remove the global cdev hash and use the cdevsw list instead. Don't remove the now unused element from cdev yet, wait until we have a better reason to bump the version. There is now no longer any upper limit on how many device drivers a FreeBSD kernel can have.	2005-03-29 11:15:54 +00:00
Jeff Roberson	571211c454	- Get rid of the old LOOKUP_SHARED code. namei() now supplies the proper lock flags via cn_lkflag. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:08:23 +00:00
Jeff Roberson	99f3c87034	- Set cn_lkflags to LK_SHARED in the LOOKUP_SHARED case so that we only acquire shared locks on intermediate directories. - For the LASTCN, we may have to LK_UPGRADE the parent directory before we lookup the last component. - Acquire VFS_ROOT and dp locks based on the cn_lkflag. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:07:15 +00:00
Jeff Roberson	0fbc3b7df0	- Don't clear OWEINACT in vbusy(), we still owe an inactive call if someone vhold()s us. - Avoid an extra mutex acquire and release in the common case of vgonel() by checking for OWEINACT at the start of the function. - Fix the case where we set OWEINACT in vput(). LK_EXCLUPGRADE drops our shared lock if it fails. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:02:48 +00:00
Jeff Roberson	cb34b95ba4	- Don't initialize v_dd here, let cache_purge() do it for us. Sponsored by: Isilon Systems, Inc.	2005-03-29 09:59:34 +00:00
Jeff Roberson	b75719afea	- Invalidate the children's v_dd pointers when we cache_purge() a directory. Otherwise the stale pointer may be accessed after a vnode is freed. Sponsored by: Isilon Systems, Inc.	2005-03-29 09:58:41 +00:00
Poul-Henning Kamp	ff7284eeb4	Remove the global cdev hash and use the cdevsw list instead. Don't remove the now unused element from cdev yet, wait until we have a better reason to bump the version.	2005-03-29 09:56:21 +00:00
Poul-Henning Kamp	fd5f6f4cf2	Privatize major().	2005-03-29 08:13:17 +00:00
Poul-Henning Kamp	97eb8cfae0	Print name of device instead of useless major/minor numbers.	2005-03-29 08:13:01 +00:00
Jeff Roberson	ea9aa09dd1	- Remove an unused variable from relookup(). - Assert that REMOVE, CREATE, and RENAME callers have WANTPARENT or LOCKPARENT set. You can't complete any of these operations without at least a reference to the parent. Many filesystems check for this case even though it isn't possible in the current system.	2005-03-28 13:56:56 +00:00
Jeff Roberson	f7b404d88f	- Remove an unused variable. Sponsored by: Isilon Systems, Inc.	2005-03-28 13:29:48 +00:00
Jeff Roberson	9d65cdf6ff	- Rev 1.83 of kern_lock.c fixes the td_locks assert, reenable it here. Sponsored by: Isilon Systems, Inc.	2005-03-28 12:52:46 +00:00
Jeff Roberson	bf5c2a1940	- Don't bump the count twice in the LK_DRAIN case. Sponsored by: Isilon Systems, Inc.	2005-03-28 12:52:10 +00:00
Jeff Roberson	9dcc5da318	- Move code that should probably be an assert above the main body of vrele so that we can decrease the indentation of the real work and make things slightly more clear. Sponsored by: Isilon Systems, Inc.	2005-03-28 11:18:47 +00:00
Jeff Roberson	ee5a0a2d7c	- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:26:17 +00:00
Jeff Roberson	d36f0a4ff8	- Adjust asserts in vop_lookup_post() to match the new post PDIRUNLOCK vfs. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:25:25 +00:00
Jeff Roberson	1e38e08e76	- Get rid of PDIRUNLOCK, instead, we fixup the lock state immediately after calling VOP_LOOKUP(). Rather than having each filesystem check the LOCKPARENT flag, we simply check it once here and unlock as required. The only unusual case is ISDOTDOT, where we require an unlocked vnode on return. Relocking this vnode with the child locked is allowed since the child is actually its parent. - Add a few asserts for some unusual conditions that I do not believe can happen. These will later go away and turn into implementations for these conditions. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:24:50 +00:00
Poul-Henning Kamp	3b73a3c079	Remove another ';' after if(). Also spotted by: bz	2005-03-27 07:53:13 +00:00
Poul-Henning Kamp	2d8dfb2836	Remove extra ; at end of if(). Found by: bz	2005-03-27 07:52:12 +00:00
Poul-Henning Kamp	4a650cc291	Make (some) serial ports implement the PPS-API again. This change apparently fell out during the tty code cleanup.	2005-03-26 20:12:39 +00:00
Poul-Henning Kamp	f83856243d	s/ENOTTY/ENOIOCTL/	2005-03-26 20:04:28 +00:00
Jeff Roberson	eb8d0e01c0	- The td_locks check is currently broken with snapshots and possibly some case in unmount. Disable the KASSERT until these problems can be diagnosed. Sponsored by: Isilon Systems, Inc.	2005-03-25 09:56:56 +00:00
Jeff Roberson	228ea9d212	- Don't recycle vnodes anymore. Free them once they are dead. getnewvnode now always allocates a new vnode. - Define a new function, vnlru_free, which frees vnodes from the free list. It takes as a parameter the number of vnodes to free, which is wantfreevnodes - freevnodes when called from vnlru_proc or 1 when called from getnewvnode(). For now, getnewvnode() still tries to reclaim a free vnode before creating a new one when we are near the limit. - Define a function, vdestroy, which handles the actual release of memory and teardown of locks, etc. This could become a uma_dtor() routine. - Get rid of minvnodes. Now wantfreevnodes is 1/4th the max vnodes. This keeps more unreferenced vnodes around so that files which have only been stat'd are less likely to be kicked out of the system before we have a chance to read them, etc. These vnodes may still be freed via the normal vnlru_proc() routines which may some day become a real lru.	2005-03-25 05:34:39 +00:00
Marcel Moolenaar	379ba85322	Fix inittodr() invocation. Now that devfs is mounted before the actual root file system is mounted, the first entry on the mountlist is not the root file system and the timestamp for that entry is typically 0. Passing that to inittodr() caused annoying errors on alpha and ia64. So, call inittodr() for all file systems on mountlist, but only when the timestamp (mnt_time) is non-zero.	2005-03-25 01:56:12 +00:00
Jeff Roberson	6c759f3558	- Add information about the buf lock to db_show_buffer. - Add a 'show lockedbufs' command that is similar to show lockedvnods. Sponsored by: Isilon Systems, Inc.	2005-03-25 00:20:37 +00:00
Jeff Roberson	f158df07ab	- Restore COUNT() in all of its original glory. Don't make it dependent on DEBUG as ufs will soon grow a dependency on this count. Discussed with: bde Sponsored by: Isilon Systems, Inc.	2005-03-25 00:00:44 +00:00
John Baldwin	85c36f3d25	Don't set ret_namelen and ret_resnamelen in res_find() unless the corresponding buffer pointer (ret_name and ret_resname, respectively) is non-NULL, to avoid possible NULL pointer derefs. Reported by: Coverity via sam	2005-03-24 21:20:25 +00:00
Poul-Henning Kamp	eb0d6cde00	Move implementation of hw.bus.rman sysctl to subr_rman.c so that subr_bus.c doesn't need to peek inside struct resource. OK from: imp	2005-03-24 18:13:11 +00:00
Jeff Roberson	61ef09d118	- Fail an assert if we attempt to return with any lockmgr locks held in userret(). Sponsored by: Isilon Systems, Inc.	2005-03-24 09:35:38 +00:00
Jeff Roberson	92e251caf7	- Complete the implementation of td_locks. Track the number of outstanding lockmgr locks that this thread owns. This is complicated due to LK_KERNPROC and because lockmgr tolerates unlocking an unlocked lock. Sponsored by: Isilon Systems, Inc.	2005-03-24 09:35:06 +00:00
Jeff Roberson	d830f82824	- Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For now, all calls to VFS_ROOT() should still acquire exclusive locks. Sponsored by: Isilon Systems, Inc.	2005-03-24 07:31:38 +00:00
Jeff Roberson	aabb175391	- Fixup the default vfs_root function to match the new prototype. Sponsored by: Isilon Systems, Inc.	2005-03-24 07:30:00 +00:00
Jeff Roberson	5d14d29912	- Grab the lock type that the caller requests in vfs_hash_insert(). Sponsored by: Isilon Systems, Inc.	2005-03-24 06:16:27 +00:00
Jeff Roberson	c167961e27	- If vput() is called with a shared lock it must upgrade to an exclusive before it can call VOP_INACTIVE(). This must use the EXCLUPGRADE path because we may violate some lock order with another locked vnode if we drop and reacquire the lock. If EXCLUPGRADE fails, we mark the vnode with VI_OWEINACT. This case should be very rare. - Clear VI_OWEINACT in vinactive() and vbusy(). - If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well. Sponsored by: Isilon Systems, Inc.	2005-03-24 06:08:58 +00:00
Jeff Roberson	3e6bcad375	- Remove some long dead LOOKUP_SHARED code that tracked the lock state. - Always pass LOCKSHARED and rely on namei() to ignore it when LOOKUP_SHARED is not set. Sponsored by: Isilon Systems, Inc.	2005-03-24 06:04:35 +00:00
Jeff Roberson	ae88db8a72	- Remove the #ifdef LOOKUP_SHARED from some calls to NDINIT. The LOCKSHARED flag is simply ignored in namei() if LOOKUP_SHARED is not enabled. Sponsored by: Isilon Systems, Inc.	2005-03-24 06:03:31 +00:00
Jeff Roberson	ad09e57f41	- Clear LOCKSHARED if LOOKUP_SHARED is not enabled. This is not strictly necessary since we disable the shared locks in vfs_cache, but it is preferred that the option not leak out into filesystems when it is disabled. Sponsored by: Isilon Systems, Inc.	2005-03-24 06:02:37 +00:00
Jeff Roberson	fdd6a3ff3c	- All of the bugs which led to the complication of the LOOKUP_SHARED config option have now been fixed. All filesystems are properly locked and checked via DEBUG_VFS_LOCKS. Remove the workaround code. Sponsored by: Isilon Systems, Inc.	2005-03-24 06:00:45 +00:00
Julian Elischer	b75b03116f	Fix code freeing wrong cred pointer. Submitted by: das Noticed by: Coverity tool MFC after: 3 days Note: usually the two pointers point to the same thing but it was still a bug.	2005-03-21 22:55:38 +00:00
Robert Watson	6220dcba84	Add a read-only kern.sched.preemption sysctl so that user space can tell if "options PREEMPTION" is compiled into the kernel.	2005-03-20 17:05:12 +00:00
Pawel Jakub Dawidek	c78941e69e	Add ki_jid field to the kinfo_proc structure and store jail ID there. Reviewed by: gad MFC after: 3 days	2005-03-20 10:35:23 +00:00
Poul-Henning Kamp	773eff9d97	Sleeping is not allowed in uma->fini	2005-03-19 08:22:13 +00:00
Sam Leffler	b53d6ac575	check copyin return value Noticed by: Coverity Prevent analysis tool	2005-03-19 04:34:23 +00:00
David Schultz	f7fdcd45f0	Add missing cases for PT_SYSCALL. Found by: Coverity Prevent analysis tool	2005-03-18 21:22:28 +00:00
Maxim Sobolev	2322a0a77d	Impose the upper limit on signals that are allowed between kernel threads in set[ug]id program for compatibility with Linux. Linuxthreads uses 4 signals from SIGRTMIN to SIGRTMIN+3. Pointed out by: rwatson	2005-03-18 13:33:18 +00:00
Poul-Henning Kamp	1ea7a6f806	Use subr_unit to allocate thread ID's with. Tested by: davidxu	2005-03-18 12:34:14 +00:00
Maxim Sobolev	f9cd63d436	Linuxthreads uses not only signal 32 but several signals >= 32. PR: kern/72922 Submitted by: Andriy Gapon <avg@icyb.net.ua>	2005-03-18 11:08:55 +00:00
Poul-Henning Kamp	a1e1d551d8	Fix a bad copy&paste mistake I made. Spotted by: truckman	2005-03-18 06:01:21 +00:00
Warner Losh	36fed96550	Use STAILQ in preference to SLIST for the resources. Insert new resources last in the list rather than first. This makes the resources print in the 4.x order rather than the 5.x order (eg fdc0 at 0x3f0-0x3f5,0x3f7 is 4.x, but 0x3f7,0x3f0-0x3f5 is 5.x). This also means that the pci code will once again print the resources in BAR ascending order.	2005-03-18 05:19:50 +00:00
John-Mark Gurney	c4c44d2935	fix aio+kq... I've been running ambrisko's test program for much longer w/o problems than I was before... This simply brings back the knote_delete as knlist_delete which will also drop the knote's, instead of just clearing the list and seeing _ONESHOT... Fix a race where if a note was _INFLUX and _DETACHED, it could end up being modified... whoopse.. MFC after: 1 week Prodded by: ambrisko and dwhite	2005-03-18 01:11:39 +00:00
John-Mark Gurney	7ac139a904	add m_copyup function.. This can be used to help make our ip stack less alignment restrictive, and help performance on some ethernet cards which currently copy the entire packet a couple bytes to get the packet aligned properly... Wordsmithing by: dwhite Obtained from: NetBSD (code only) I'll clean it up later: rwatson	2005-03-17 19:34:57 +00:00
Robert Watson	bc60830675	A further step on the journey of making panics and debugging more reliable: in the window between the beginning of panic() and entering the debugger, it's possible to receive interrupts. If we receive an interrupt, don't preempt if panicstr != NULL, as the system is in the process of failing, and the preempting thread is likely to stumble over the failure. The typical scenario is during the printf() in panic() prior to entering the debugger, but when running with a slower console type such as serial console. It could be that the panic string should be passed to the debugger to print, so that it can run from the debugger's environment rather than a regular kernel printf. Glanced at by: jhb	2005-03-17 15:18:01 +00:00
Poul-Henning Kamp	bde1a9c98b	Kill MAJOR_AUTO	2005-03-17 13:37:28 +00:00
Poul-Henning Kamp	800b42bde0	Prepare for the final onslaught on devices: Move uid/gid/mode from cdev to cdevsw. Add kind field to use for devd(8) later. Bump both D_VERSION and __FreeBSD_version	2005-03-17 12:07:00 +00:00
Poul-Henning Kamp	572b4402d1	In strange circumstances we may end up being the last reference to a session in tprintf(). SESSRELE() needs to properly dispose of the session's mutex. Add sessrele() which does the proper cleanup and have SESSRELE() call it. Use SESSRELE also in pgdelete(). Found by: Coverity (ID:526)	2005-03-17 08:44:41 +00:00
Poul-Henning Kamp	51f5ce0c8c	Add two arguments to the vfs_hash() KPI so that filesystems which do not have unique hashes (NFS) can also use it.	2005-03-16 11:20:51 +00:00
Poul-Henning Kamp	9068e77689	Fix a memoryleak in case of failed root filesystem mount. Spotted by: Coverity via sam	2005-03-16 11:06:49 +00:00
John-Mark Gurney	2a77000b75	MFp4: print a more useful error when we don't have a /dev to mount devfs on..	2005-03-16 08:04:39 +00:00
Poul-Henning Kamp	78bb3c21ed	Add mnt_hashseed to struct mount and initialize it with PRNG bits, use it to get better hashing in vfs_hash. In case of an insert collision in vfs_hash_insert(), put the losing vnode on a special list so that vfs_hash_remove() can just assume that it is on a list. Drop the VI_HASHED flag.	2005-03-16 07:35:06 +00:00
Warner Losh	358fef538f	Sometimes, when asked to return region A..C, we'd return A+N..C+N instead of failing. When looking for a region to allocate, we used to check to see if the start address was < end. In the case where A..B is allocated already, and one wants to allocate A..C (B < C), then this test would improperly fail (which means we'd examine that region as a possible one), and we'd return the region B+1..C+(B-A+1) rather than NULL. Since C+(B-A+1) is necessarily larger than C (end argument), this is incorrect behavior for rman_reserve_resource_bound(). The fix is to exclude those regions where r->r_start + count - 1 > end rather than r->r_start > end. This bug has been in this code for a very long time. I believe that all other tests against end are correctly done. This is why sio0 generated a message about interrupts not being enabled properly for the device. When fdc had a bug that allocated from 0x3f7 to 0x3fb, sio0 was then given 0x3fc-0x404 rather than the 0x3f8-0x3ff that it wanted. Now when fdc has the same bug, sio0 fails to allocate its ports, which is the proper behavior. Since the probe failed, we never saw the messed up resources reported. I suspect that there are other places in the tree that have weird looping or other odd work arounds to try to cope with the observed weirdness this bug can introduce. These workarounds should be located and eliminated. Minor debug write fix to match the above test done as well. 'nice' by: mdodd Sponsored by: timing solutions (http://www.timing.com/)	2005-03-15 20:28:51 +00:00
Warner Losh	a33ab77447	Fix a debugging printf. The order of start/end was inconsistent with all the other start/end debugs, causing momentary confusion when the output was examined.	2005-03-15 20:15:15 +00:00
Poul-Henning Kamp	45c26fa2b6	Improve the vfs_hash() API: vput() the unneeded vnode centrally to avoid replicating the vput in all the filesystems.	2005-03-15 20:00:03 +00:00
Jeff Roberson	b172f6c5f9	- Now that there are no external users of vfree() make it static. - Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so no one else attempts to grow a dependency on them. - Now that objects with pages hold the vnode we don't have to do unlocked checks for the page count in the vm object in VSHOULDFREE. These three macros could simply check for holdcnt state transitions to determine whether the vnode is on the free list already, but the extra safety the flag affords us is probably worth the minimal cost. - The leafonly sysctl and code have been dead for several years now, remove the sysctl and the code that employed it from vtryrecycle(). - vtryrecycle() also no longer has to check the object's page count as the object holds the vnode until it reaches 0. Sponsored by: Isilon Systems, Inc.	2005-03-15 14:38:16 +00:00
Poul-Henning Kamp	7933351a28	Fix a debug message to print a usable device name rather than a useless major+minor tuple.	2005-03-15 14:08:10 +00:00
Jeff Roberson	c178628d6e	- Expose vholdl() so it may be used outside of vfs_subr.c	2005-03-15 13:43:10 +00:00
Poul-Henning Kamp	4ba679d6d0	Remove findcdev().	2005-03-15 12:58:08 +00:00
Poul-Henning Kamp	0a2e49f1f8	Rename cdev->si_udev to cdev->si_drv0 to reflect the new nature of the field.	2005-03-15 11:33:28 +00:00
Jeff Roberson	f5f0da0a0e	- transferlockers() requires the interlock to be SMP safe. Sponsored by: Isilon Systems, Inc.	2005-03-15 09:27:45 +00:00
Poul-Henning Kamp	e82ef95c11	Simplify the vfs_hash calling convention.	2005-03-15 08:07:07 +00:00
Poul-Henning Kamp	ee148e2606	Clean up accidentally included #if 0 section.	2005-03-14 10:25:09 +00:00
Poul-Henning Kamp	6c325a2a21	Currently (almost) all filesystems maintain a local inode hash table to get from (mount + inode) to vnode. These tables are mostly copy&pasted from UFS, sized based on desiredvnodes and therefore quite large (128K-512K). Several filesystems are buggy enough that they allocate the hash table even before they know if they will ever be used or not. Add "vfs_hash", a system wide hash table, which will replace all the per-filesystem hash-tables. The fields we add to struct vnode will more or less be saved in the respective filesystems inodes. Having one central implementation will save code and will allow us to justify the complexity of code to dynamically (re)size the hash at a later point.	2005-03-14 10:01:29 +00:00
Jeff Roberson	8045557f2b	- Increment the holdcnt once for each usecount reference. This allows us to use only the holdcnt to determine whether a vnode may be recycled, simplifying the V* macros as well as vtryrecycle(), etc. Sponsored by: Isilon Systems, Inc.	2005-03-14 09:25:19 +00:00
Jeff Roberson	159b454819	- We do not have to check the object's ref_count in VSHOULDFREE or vtryrecycle(). All obj refs also ref the vnode. - Consistently use v_incr_usecount() to increment the usecount. This will be more important later. Sponsored by: Isilon Systems, Inc.	2005-03-14 08:30:31 +00:00
Jeff Roberson	8f13a540ed	- Slightly rearrange vrele() to move the common case in one indentation level. Sponsored by: Isilon Systems, Inc.	2005-03-14 07:16:55 +00:00
Jeff Roberson	6fc16a838c	- Rework vget() so we drop the usecount in two failure cases that were missed by my last commit. Sponsored by: Isilon Systems, Inc.	2005-03-14 07:11:19 +00:00
Poul-Henning Kamp	93f6c81e25	Remove debugging printfs.	2005-03-14 06:51:29 +00:00
Jeff Roberson	0463dc9ef1	- Do a vn_start_write in vn_close, we may write if this is the last ref on an unlinked file. We can't know if this is the case until after we have the lock. - Lock the vnode in vn_close, many filesystems had code which was unsafe without the lock held, and holding it greatly simplifies vgone(). - Adjust vn_lock() to check for the VI_DOOMED flag where appropriate. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:56:28 +00:00
Jeff Roberson	6703c30bb5	- Remove vx_lock, vx_unlock, vx_wait, etc. - Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do writing ops without suspending. This could suspend the vlruproc which should not be a problem under normal circumstances. - Manually implement VMIGHTFREE in vlrureclaim as this was the only instance where it was used. - Acquire a lock before calling vgone() as it now requires it. - Move the acquisition of the vnode interlock from vtryrecycle() to getnewvnode() so that if it fails we don't drop and reacquire the vnode_free_list_mtx. - Check for a usecount or holdcount at the end of vtryrecycle() in case someone grabbed a ref while we were recycling. Abort the recycle, and on the final ref drop this vnode will be placed on the head of the free list. - Move the redundant VOP_INACTIVE protection code into the local vinactive() routine to avoid code bloat. - Keep the vnode lock held across calls to vgone() in several places. - vgonel() no longer uses XLOCK, instead callers must hold an exclusive vnode lock. The VI_DOOMED flag is set to allow other threads to detect a vnode which is no longer valid. This flag is set until the last reference is gone, and there are no chances for a new ref. vgonel() holds this lock across the entire function, which greatly simplifies logic. - Only vfree() in one place in vgone() not three. - Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock in the LK_NOWAIT case. In other cases, check after we have slept and acquired an exclusive lock. This will simulate the old vx_wait() behavior. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:54:28 +00:00
Jeff Roberson	2b3183a8b7	- A lock is required before calling VOP_REVOKE. Our reference protects us from accessing another vnode so a naked VOP_LOCK is sufficient. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:47:04 +00:00
Jeff Roberson	9331fd135b	- Don't VOP_UNLOCK prior to VOP_REVOKE. The lock is required now. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:45:51 +00:00
Jeff Roberson	23f2513a4e	- Don't drop the lock in the default inactive handler anymore, VOP_NULL will do for vop_stdinactive now. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:45:01 +00:00
Jeff Roberson	4e6746965e	- CLOSE, REVOKE, INACTIVE, and RECLAIM are now L L L, that's a locked vnode on enter, exit, error. This allows for the removal of the XLOCK. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:42:16 +00:00
Pawel Jakub Dawidek	cefcecbefd	Function jailed() looks into the ucred structure, so be sure ucred is not NULL. Reviewed by: rwatson MFC after: 1 week	2005-03-12 14:31:04 +00:00
Pawel Jakub Dawidek	d079d0a0d2	Clean up a bit. Reviewed by: rwatson MFC after: 1 week	2005-03-12 14:28:34 +00:00
Robert Watson	59f21d5ab1	Extend the coverage of the accept and socket mutexes in soisconnected() so that the socket lock is held over the test-and-set removal of the accept filter option during connect, and the two socket mutex regions (transition to connected, perform accept filter) are combined.	2005-03-12 13:39:39 +00:00
Robert Watson	a59f81d263	Move the logic implementing retrieval of the SO_ACCEPTFILTER socket option from uipc_socket.c to uipc_accf.c in do_getopt_accept_filter(), so that it now matches do_setopt_accept_filter(). Slightly reformulate the logic to match the optimistic allocation of storage for the argument in advance, and slightly expand the coverage of the socket lock.	2005-03-12 12:57:18 +00:00
Robert Watson	92081a8344	Part two of post-SMPng cleanup of accept filter registration: perform all allocation up front before grabbing the socket mutex and doing the registration work. The result is a lot cleaner.	2005-03-12 12:27:47 +00:00
Peter Wemm	f71692e9be	Replace my previous change for 32 bit systems with hz > 169 with Bruce's simpler one.	2005-03-12 00:13:45 +00:00
Peter Wemm	2afec87508	Make the tty vmin/vtime timeouts work for hz > 169 on 32 bit machines.	2005-03-12 00:10:23 +00:00
Robert Watson	64c238075f	First step in simplifying accept filter socket option logic in the post-SMPng world order. Centralize handling of the socket option clear case in do_setopt_accept_filter().	2005-03-11 21:37:45 +00:00
Robert Watson	56856fbfb4	Remove an additional commented out reference to a possible future sx lock.	2005-03-11 19:16:02 +00:00
Robert Watson	2b37548a71	When setting up a socket in socreate(), there's no need to lock the socket lock around knlist_init(), so don't. Hard code the setting of the socket reference count to 1 rather than using soref() to avoid asserting the socket lock, since we've not yet exposed the socket to other threads. This removes two mutex operations from each socket allocation.	2005-03-11 16:30:02 +00:00
Robert Watson	5fab68b19e	Remove suggestive sx_init() comment in soalloc(). We will have something like this at some point, but for now it clutters the source.	2005-03-11 16:26:33 +00:00
Robert Watson	35a196154f	The SO_NOSIGPIPE socket option allows a user process to mark a socket so that the socket does not generate SIGPIPE, only EPIPE, when a write is attempted after socket shutdown. When the option was introduced in 2002, this required the logic for determining whether SIGPIPE was generated to be pushed down from dofilewrite() to the socket layer so that the socket options could be considered. However, the change in 2002 omitted modification to soo_write() required to add that logic, resulting in SIGPIPE not being generated even without SO_NOSIGPIPE when the socket was written to using write() or related generic system calls. This change adds the EPIPE logic to soo_write(), generating a SIGPIPE signal to the process associated with the passed uio in the event that the SO_NOSIGPIPE option is not set. Notes: - There are upsides and downsides to placing this logic in the socket layer as opposed to the file descriptor layer. This is really fd layer logic, but because we need so_options, we have a choice of layering violations and pick this one. - SIGPIPE possibly should be delivered to the thread performing the write, not the process performing the write. - uio->uio_td and the td argument to soo_write() might potentially differ; we use the thread in the uio argument. - The "sigpipe" regression test in src/tools/regression/sockets/sigpipe tests for the bug. Submitted by: Mikko Tyolajarvi <mbsd at pacbell dot net> Talked with: glebius, alfred PR: 78478 MFC after: 1 week	2005-03-11 15:06:16 +00:00
John-Mark Gurney	74e620476c	fix spelling of match in comment... MFC after: 3 days	2005-03-10 21:23:06 +00:00
Poul-Henning Kamp	b43ab0e378	Try to fix the mess I made of devname, with the minimal subset of the larger minor/major patch which was posted for testing.	2005-03-10 18:21:34 +00:00
Robert Watson	53358cc907	Document, via WITNESS, that the NFS server mutex falls ahead of the socket buffer mutexes.	2005-03-09 21:38:53 +00:00
Dag-Erling Smørgrav	628b83cd08	My addled brains didn't realize that since vtp points into value, we can't freeenv(value) before we're done inspecting vtp[0]. Tested by: Anish Mistry <mistry.7@osu.edu>	2005-03-09 12:16:45 +00:00
Stefan Farfeleder	b26244446b	Fix typo in comment.	2005-03-09 11:50:55 +00:00
Sam Leffler	a4e714295a	allow the destination of m_move_pkthdr to have external storage (e.g. a cluster) Glanced at by: rwatson, silby	2005-03-08 17:52:01 +00:00
Giorgos Keramidas	0a11e99990	Remove redundant initialization that is repeated in the for() loop right below it. Approved by: jhb	2005-03-08 16:57:20 +00:00
Maxim Sobolev	8d6e40c3f1	Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to suppress SIGPIPE signal for the duration of the sendto-family syscalls. Use it to replace previously added hack in Linux layer based on temporarily setting SO_NOSIGPIPE flag. Suggested by: alfred	2005-03-08 16:11:41 +00:00
Poul-Henning Kamp	d9a54d5c23	Reengineer subr_unit Add support for passing in a mutex. If NULL is passed a global subr_unit mutex is used. Add alloc_unrl() which expects the mutex to be held. Allocating a unit will never sleep as it does not need to allocate memory. Cut possible range in half so we can use -1 to mean "out of number". Collapse first and last runs into the head by means of counters. This saves memory in the common case(s).	2005-03-08 10:40:48 +00:00
Poul-Henning Kamp	3238ec33e1	Fix signedness of minor2unit().	2005-03-08 10:40:03 +00:00
Jeff Roberson	ec346d1040	- Lock access to the buffer_map with the vm_map lock. In 4.x this was done with splbio, in 5.x this was done with Giant. Discussed with: alc Reported by: julian, pho	2005-03-08 09:34:54 +00:00
Giorgos Keramidas	46da8bf8fb	Typo & grammar fixes in comments.	2005-03-08 00:58:50 +00:00
Robert Watson	9bfb7389bc	When upcalling from a socket in soisconnected() for an accept filter, call with flag M_DONTWAIT rather than M_TRYWAIT, as we don't want to do blocking memory allocation (etc) in the netisr. MFC after: 3 days	2005-03-07 13:50:16 +00:00
Poul-Henning Kamp	3b3f38ed7d	Add placeholder mutex argument to new_unrhdr().	2005-03-07 11:05:47 +00:00
Bill Paul	58a6edd121	When you call MiniportInitialize() for an 802.11 driver, it will at some point result in a status event being triggered (it should be a link down event: the Microsoft driver design guide says you should generate one when the NIC is initialized). Some drivers generate the event during MiniportInitialize(), such that by the time MiniportInitialize() completes, the NIC is ready to go. But some drivers, in particular the ones for Atheros wireless NICs, don't generate the event until after a device interrupt occurs at some point after MiniportInitialize() has completed. The gotcha is that you have to wait until the link status event occurs one way or the other before you try to fiddle with any settings (ssid, channel, etc...). For the drivers that set the event synchronously this isn't a problem, but for the others we have to pause after calling ndis_init_nic() and wait for the event to arrive before continuing. Failing to wait can cause big trouble: on my SMP system, calling ndis_setstate_80211() after ndis_init_nic() completes, but _before_ the link event arrives, will lock up or reset the system. What we do now is check to see if a link event arrived while ndis_init_nic() was running, and if it didn't we msleep() until it does. Along the way, I discovered a few other problems: - Deferred procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL. ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation wrong.) - Similarly, the NDIS interrupt handler, which is essentially a DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask() has been fixed accordingly. - MiniportQueryInformation() and MiniportSetInformation() run at DISPATCH_LEVEL, and each request must complete before another can be submitted. ndis_get_info() and ndis_set_info() have been fixed accordingly. - Turned the sleep lock that guards the NDIS thread job list into a spin lock. We never do anything with this lock held except manage the job list (no other locks are held), so it's safe to do this, and it's possible that ndis_sched() and ndis_unsched() can be called from DISPATCH_LEVEL, so using a sleep lock here is semantically incorrect. Also updated subr_witness.c to add the lock to the order list.	2005-03-07 03:05:31 +00:00
Alan Cox	2b2c7a6b40	The m_ext reference counts are potentially shared and modified asynchronously by different threads. Thus, declare as volatile the reference count that is accessed through m_ext's pointer, ref_cnt. Revert the previous change, revision 1.144, that casts as volatile a single dereference of ref_cnt. Reviewed by: bmilekic, dwhite Problem reported by: kris MFC after: 3 days	2005-03-06 20:09:00 +00:00
Dag-Erling Smørgrav	f3301d15f1	Teach getenv_quad() to recognize k/m/g/t suffixes in both lower- and upper-case. This means (almost) all tunables now support those suffixes.	2005-03-05 15:52:12 +00:00
David Xu	bc8e6d817d	Allocate umtx_q from heap instead of stack, this avoids page fault panic in kernel under heavy swapping.	2005-03-05 09:15:03 +00:00
David Xu	627451c1d9	The td_waitset is pointing to a stack address when thread is waiting for a signal, because kernel stack is swappable, this causes page fault in kernel under heavy swapping case. Fix this bug by eliminating unneeded code.	2005-03-04 22:46:31 +00:00
Maxim Sobolev	4b1783363f	In linux emulation layer try to detect attempt to use linux_clone() to create kernel threads and call rfork(2) with RFTHREAD flag set in this case, which puts parent and child into the same threading group. As a result all threads that belong to the same program end up in the same threading group. This is similar to what linuxthreads port does, though in this case we don't have a luxury of having access to the source code and there is no definite way to differentiate linux_clone() called for threading purposes from other uses, so that we have to resort to heuristics. Allow SIGTHR to be delivered between all processes in the same threading group; previously it had been blocked for s[ug]id processes. This also should improve locking of the same file descriptor from different threads in programs running under linux compat layer. PR: kern/72922 Reported by: Andriy Gapon <avg@icyb.net.ua> Idea suggested by: rwatson	2005-03-03 16:57:55 +00:00
Doug White	a1d0c3f203	Insert volatile cast to discourage gcc from optimizing the read outside of the while loop. Suggested by: alc MFC after: 1 day	2005-03-03 02:41:37 +00:00
Joerg Wunsch	a5f50ef9e4	netchild's mega-patch to isolate compiler dependencies into a central place. This moves the dependency on GCC's and other compiler's features into the central sys/cdefs.h file, while the individual source files can then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42. By now, GCC and ICC (the Intel compiler) have been actively tested on IA32 platforms by netchild. Extension to other compilers is supposed to be possible, of course. Submitted by: netchild Reviewed by: various developers on arch@, some time ago	2005-03-02 21:33:29 +00:00
David Xu	6675b36ec5	In kern_sigtimedwait, remove waitset bits for td_sigmask before sleeping, so in do_tdsignal, we no longer need to test td_waitset. Now td_waitset is only used to give a thread higher priority when delivering a signal to a multithreaded process. This also fixes a bug: when a thread in sigwait state was suspended and later resumed by SIGCONT, it could no longer receive signals belonging to its waitset.	2005-03-02 13:43:51 +00:00
Paul Saab	b8a4edc17e	Use kern_kevent instead of the stackgap for 32bit syscall wrapping. Submitted by: jhb Tested on: amd64	2005-03-01 17:45:55 +00:00
Paul Saab	c1aa81b6d9	regen	2005-03-01 17:44:34 +00:00

8569 Commits