freebsd-dev

Author	SHA1	Message	Date
John Baldwin	8aa9e82e67	Move INTR_FILTER from opt_global.h to its own header.	2008-04-05 20:13:15 +00:00
John Baldwin	1ee1b68792	Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt code. - Rename the intr_event 'eoi', 'disable', and 'enable' hooks to 'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric. Also, add a comment describe what the MI code expects them to do. - On amd64, i386, and powerpc this is effectively a NOP. - On arm, don't bother masking the interrupt unless the ithread is scheduled in the non-INTR_FILTER case to match what INTR_FILTER did. Also, don't bother unmasking the interrupt in the post_filter case if we never masked it. The INTR_FILTER case had been doing this by having arm_unmask_irq for the post_filter (formerly 'eoi') hook. - On ia64, stray interrupts are now masked for the non-INTR_FILTER case. They were already masked in the INTR_FILTER case. - On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for both the 'post_filter' and 'post_ithread' hooks to match what the non-INTR_FILTER code did. - On sun4v, retire the ithread wrapper hack by using an appropriate 'post_ithread' hook instead (it's what 'post_ithread'/'enable' was designed to do even in 5.x). Glanced at by: piso Reviewed by: marius Requested by: marius [1], [5] Tested on: amd64, i386, arm, sparc64	2008-04-05 19:58:30 +00:00
Alan Cox	7630c26507	Reintroduce UMA_SLAB_KMAP; however, change its spelling to UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM. (UMA_SLAB_KMAP met its original demise in revision 1.30 of vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame allocators. Without it, UMA cannot correctly return pages from the jumbo frame zones to the VM system because it resets the pages' object field to NULL instead of the kernel object. In more detail, the jumbo frame zones are created with the option UMA_ZONE_REFCNT. This causes UMA to overwrite the pages' object field with the address of the slab. However, when UMA wants to release these pages, it doesn't know how to restore the object field, so it sets it to NULL. This change teaches UMA how to reset the object field to the kernel object. Crashes reported by: kris Fix tested by: kris Fix discussed with: jeff MFC after: 6 weeks	2008-04-04 18:41:12 +00:00
Jeff Roberson	00ca09449d	- Add sysctls at debug.rwlock to control the behavior of the speculative spinning when readers hold a lock. This spinning is speculative because, unlike the write case, we can not test whether the owners are running. - Add speculative read spinning for readers who are blocked by pending writers while a read lock is still held. This allows the thread to spin until the write lock succeeds after which it may spin until the writer has released the lock. This prevents excessive context switches when readers and writers both hold the lock for brief periods. Sponsored by: Nokia	2008-04-04 10:00:46 +00:00
Jeff Roberson	3bc8c68d9f	- Add a Nokia copyright to cpuset to reflect their generous contribution to this work.	2008-04-04 01:22:04 +00:00
Jeff Roberson	0502fe2e43	- Allow static_boost to specify no boost with '0', traditional kernel fixed pri boost with '1' or any priority less than the current thread's priority with a value greater than two. Default the boost to PRI_MIN_TIMESHARE to prevent regular user-space threads from starving threads in the kernel. This prevents these user-threads from also being scheduled as if they are high fixed-priority kernel threads. - Restore the setting of lowpri in tdq_choose(). It has to be either here or in sched_switch(). I accidentally removed it from both places. Tested by: kris	2008-04-04 01:16:18 +00:00
Jeff Roberson	03d17db7d5	- Don't check for the ITHD pri class in tdq_load_add and rem. 4BSD doesn't do this either. Simply check P_NOLOAD. It'd be nice if this was in a thread flag so we didn't have an extra cache miss every time we add and remove a thread from the run-queue.	2008-04-04 01:04:43 +00:00
Jeff Roberson	e4b1aa6210	- Fix a mis-merge that crept in during the softclock changes. Spotted by: jhb	2008-04-04 01:03:23 +00:00
David Xu	44253336b6	let umtxq_busy() only spin on mp machine. make function name do_rwlock_unlock to be consistent with others.	2008-04-03 11:49:20 +00:00
Jeff Roberson	e8245292a7	- Convert two timeout users to the new callout_reset_curcpu() api. Sponsored by: Nokia	2008-04-02 11:21:42 +00:00
Jeff Roberson	8d809d5061	Implement per-cpu callout threads, wheels, and locks. - Move callout thread creation from kern_intr.c to kern_timeout.c - Call callout_tick() on every processor via hardclock_cpu() rather than inspecting callout internal details in kern_clock.c. - Remove callout implementation details from callout.h - Package up all of the global variables into a per-cpu callout structure. - Start one thread per-cpu. Threads are not strictly bound. They prefer to execute on the native cpu but may migrate temporarily if interrupts are starving callout processing. - Run all callouts by default in the thread for cpu0 to maintain current ordering and concurrency guarantees. Many consumers may not properly handle concurrent execution. - The new callout_reset_on() api allows specifying a particular cpu to execute the callout on. This may migrate a callout to a new cpu. callout_reset() schedules on the last assigned cpu while callout_reset_curcpu() schedules on the current cpu. Reviewed by: phk Sponsored by: Nokia	2008-04-02 11:20:30 +00:00
Konstantin Belousov	35b450291a	Add two missed chunks from the rev. 1.210, for the giant_read() and giant_ioctl(). PR: kern/122287 MFC after: 3 days	2008-04-02 11:11:58 +00:00
Jeff Roberson	1fd9b6a577	- Destroy the bo mtx when the vnode is destroyed.	2008-04-02 10:40:03 +00:00
David Xu	fadd84c58f	Fix compiling problem for amd64.	2008-04-02 05:54:41 +00:00
David Xu	11b1023b7d	Er, don't restart a timeout version.	2008-04-02 04:26:59 +00:00
David Xu	1a30511c61	Introduce kernel based userland rwlock. Each umtx chain now has two lists, one for readers and one for writers, other types of synchronization object just use first list. Asked by: jeff	2008-04-02 04:08:37 +00:00
Attilio Rao	b31a149bbb	Add rw_try_rlock() and rw_try_wlock() to rwlocks. These functions try the specified operation (rlocking and wlocking) and true is returned if the operation completes, false otherwise. The KPI is enriched by this commit, so __FreeBSD_version bumping and manpage updating will happen soon. Requested by: jeff, kris	2008-04-01 20:31:55 +00:00
Doug Rabson	60cdfde09f	Don't try to use an SX lock while holding the vnode interlock. Sponsored by: Isilon Systems	2008-04-01 16:07:01 +00:00
Konstantin Belousov	f2296b585e	Regen	2008-03-31 12:12:27 +00:00
Konstantin Belousov	7104518b07	Add the openat(), fexecve() and other *at() syscalls to the table. Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:06:55 +00:00
Konstantin Belousov	632dbc19e2	Implement the fexecve(2) syscall. Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:05:52 +00:00
Konstantin Belousov	e4193f25cb	Implement the openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2), futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2), readlinkat(2), renameat(2), symlinkat(2) syscalls. Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:04:20 +00:00
Konstantin Belousov	57b4252e45	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
Konstantin Belousov	e314f69fff	Add the support for the O_EXEC open(2) mode, as specified by the POSIX Extended API Set Part 2 extension specification. Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 11:57:18 +00:00
Konstantin Belousov	0a3af16a75	Add the utility function vn_commname() to retrieve the command name from the vfs namecache, when available. Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 11:53:03 +00:00
Jeff Roberson	a03ee0000e	- Consistently return EDEADLK when presented with a new set that is incompatible with existing bindings. - Try to copyout the setid in cpuset() before migrating the proc to the setid in case the user has supplied a bad buffer. - Rename cpuset_root() and cpuset_base() to cpuset_ref{root,base} to be more descriptive and free cpuset_root to be used as a different type of symbol. - Make cpuset_root the cpuset_t set of all cpus in the system. This should contain the same bitmask as all_cpus presently. - Add a CPU_CMP() macro to compare two sets.	2008-03-30 11:31:14 +00:00
Jeff Roberson	5634d48667	- Don't allow calls to vn_lock() with no lock type requested. Callers which simply want a reference should use vref(). Callers which want to check validity need to hold a lock while performing any action based on that validity. vn_lock() would always release the interlock before returning making any action synchronous with the validity check impossible.	2008-03-29 23:36:26 +00:00
Jeff Roberson	069c6953a0	- Use vget() to lock the vnode rather than refing without a lock and locking in separate steps.	2008-03-29 23:30:40 +00:00
Attilio Rao	71072af500	b_waiters cannot be adequately protected by the interlock because it is dropped after the call to lockmgr() so just revert this approach using something similar to the precedent one: BUF_LOCKWAITERS() just checks if there are waiters (not the actual number of them) and it is based on newly introduced lockmgr_waiters() which returns if the lockmgr has waiters or not. The name has been choosen differently by old lockwaiters() in order to not confuse them. KPI results enriched by this commit so __FreeBSD_version bumping and manpage update will be happening soon. 'struct buf' also changes, so kernel ABI is disturbed. Bug found by: jeff Approved by: jeff, kib	2008-03-28 12:30:12 +00:00
John Birrell	fc70c0bdd8	Regen after makesyscalls.sh change.	2008-03-27 01:55:06 +00:00
John Birrell	e994ea8e55	Generate another function for the DTrace syscall provider to specify the syscall argument types. This code is only compiled into the systrace kernel modul and has no effect otherwise.	2008-03-27 01:53:44 +00:00
Poul-Henning Kamp	e465985885	The "free-lance" timer in the i8254 is only used for the speaker these days, so de-generalize the acquire_timer/release_timer api to just deal with speakers. The new (optional) MD functions are: timer_spkr_acquire() timer_spkr_release() and timer_spkr_setfreq() the last of which configures the timer to generate a tone of a given frequency, in Hz instead of 1/1193182th of seconds. Drop entirely timer2 on pc98, it is not used anywhere at all. Move sysbeep() to kern/tty_cons.c and use the timer_spkr() if they exist, and do nothing otherwise. Remove prototypes and empty acquire-/release-timer() and sysbeep() functions from the non-beeping archs. This eliminate the need for the speaker driver to know about i8254frequency at all. In theory this makes the speaker driver MI, contingent on the timer_spkr_() functions existing but the driver does not know this yet and still attaches to the ISA bus. Syscons is more tricky, in one function, sc_tone(), it knows the hz and things are just fine. In the other function, sc_bell() it seems to get the period from the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode the 1193182 and leave it at that. It's probably not important. Change a few other sysbeep() uses which obviously knew that the argument was in terms of i8254 frequency, and leave alone those that look like people thought sysbeep() took frequency in hertz. This eliminates the knowledge of i8254_freq from all but the actual clock.c code and the prof_machdep.c on amd64 and i386, where I think it would be smart to ask for help from the timecounters anyway [TBD].	2008-03-26 20:09:21 +00:00
Doug Rabson	a7ac0db6cb	Regen.	2008-03-26 15:24:02 +00:00
Doug Rabson	dfdcada31e	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. * Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks	2008-03-26 15:23:12 +00:00
Scott Long	478cfc7300	Implement taskqueue_block() and taskqueue_unblock(). These functions allow the owner of a queue to block and unblock execution of the tasks in the queue while allowing tasks to continue to be added queue. Combining this with taskqueue_drain() allows a queue to be safely disabled. The unblock function may run (or schedule to run) the queue when it is called, just as calling taskqueue_enqueue() would. Reviewed by: jhb, sam	2008-03-25 22:38:45 +00:00
Ruslan Ermilov	ea26d58729	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.	2008-03-25 09:39:02 +00:00
Ruslan Ermilov	b2798e2573	Regen after changing prototypes of cpuset_{get,set}affinity().	2008-03-25 09:14:17 +00:00
Ruslan Ermilov	7f64829a5e	Fixed type of the fourth argument of cpuset_{get,set}affinity(2) to be size_t. Prodded by: davidxu	2008-03-25 09:11:53 +00:00
Jeff Roberson	0ee6cecc9d	- Greatly simplify vget() by removing the guarantee that any new references to a vnode with VI_OWEINACT set will force the vinactive() call. The kernel makes no guarantees about which reference was the last to close a file or when the actual inactive processing will happen. The previous code was designed to preserve existing semantics in the face of shared locks, however, this was unnecessary. Discussed with: mckusick	2008-03-24 04:22:58 +00:00
Jeff Roberson	804e60d4cf	- Don't acquire the vnode interlock in _vn_lock() unless no lock type is requested. Handle this case specially before the while loop. - Use the held vnode lock to check for VI_DOOMED. The vnode lock and interlock must both be held to set VI_DOOMED so either one held, even shared, is sufficient to check it. No objection by: kib	2008-03-24 04:17:35 +00:00
Konstantin Belousov	1be222e9df	Yield the cpu in the kernel while iterating the list of the vnodes belonging to the mountpoint. Also, yield when in the softdep_process_worklist() even when we are not going to sleep due to buffer drain. It is believed that the ULE fixed the problem [1], but the yielding seems to be needed at least for the 4BSD case. Discussed: on stable@, with bde Reviewed by: tegge, jeff [1] MFC after: 2 weeks	2008-03-23 13:45:24 +00:00
David Xu	34d05d83f6	Remove commented out code, thread suspension is done in thread library.	2008-03-23 02:03:06 +00:00
Jeff Roberson	e6b2545b3b	- Only return 1 from sync_vnode() in cases where the vnode is still at the head of the sync list. This prevents sched_sync() from re-queueing a vnode which may have been freed already. Discussed with: kib	2008-03-23 01:44:28 +00:00
Jeff Roberson	f6a8cecfc6	- Pass BO_MTX(bo) to lockmgr in vtruncbuf, we don't own the vnode interlock here anymore. Reported by: kris	2008-03-23 01:42:19 +00:00
Poul-Henning Kamp	4218a7310b	In abort2(2): Accept a NULL arg pointer if nargs == 0	2008-03-22 16:32:52 +00:00
Jeff Roberson	698b1a6643	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)	2008-03-22 09:15:16 +00:00
Alfred Perlstein	435cdf88ea	Fix a race where timeout/untimeout could cause crashes for Giant locked code. The bug: There exists a race condition for timeout/untimeout(9) due to the way that the softclock thread dequeues timeouts. The softclock thread sets the c_func and c_arg of the callout to NULL while holding the callout lock but not Giant. It then drops the callout lock and acquires Giant. It is at this point where untimeout(9) on another cpu/thread could be called. Since c_arg and c_func are cleared, untimeout(9) does not touch the callout and returns as if the callout is canceled. The softclock then tries to acquire Giant and likely blocks due to the other cpu/thread holding it. The other cpu/thread then likely deallocates the backing store that c_arg points to and finishes working and hence drops Giant. Softclock resumes and acquires giant and calls the function with the now free'd c_arg and we have corruption/crash. The fix: We need to track curr_callout even for timeout(9) (LOCAL_ALLOC) callouts. We need to free the callout after the softclock processes it to deal with the race here. Obtained from: Juniper Networks, iedowse Reviewed by: jhb, iedowse MFC After: 2 weeks.	2008-03-22 07:29:45 +00:00
Konstantin Belousov	e7ffdf423a	Reduce contention on the vnode interlock by not acquiring the BO_LOCK around the check for the BV_BKGRDINPROG in the brelse() and bqrelse(). See the comment for the explanation why it is safe. Tested by: pho Submitted by: jeff	2008-03-21 12:38:44 +00:00
Jeff Roberson	0169d126a6	- Reduce contention on the global bdonelock and bpinlock by using a pool mutex to protect these sleep/wakeup/counter races. This still is preferable to bloating each bio with a mtx.	2008-03-21 10:00:05 +00:00
Jeff Roberson	b7edba7704	- Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs to enter thread_suspend_check(). - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the thread_suspend_check() to ast() rather than userret(). - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so that we don't miss a suspend request. If this is set use the expensive signal path. - Set NEEDSUSPCHK when creating a new thread in thr in case the creating thread is due to be suspended as well but has not yet. Reviewed by: davidxu (Authored original patch)	2008-03-21 08:23:25 +00:00

1 2 3 4 5 ...

10411 Commits