freebsd-nq

Author	SHA1	Message	Date
John Baldwin	57b7fe337e	Partially revert the previous change. I failed to notice that where ktruserret() is invoked, an unlocked check of the per-process queue is performed inline, thus, we don't lock the ktrace_sx on every userret(). Pointy hat to: jhb Approved by: re (kensmith) Pointy hat recovered from: rwatson	2007-08-29 21:17:11 +00:00
John Baldwin	cc479dda4a	Rework the routines to convert a 5.x+ statfs structure (with fixed-size 64-bit counters) to a 4.x statfs structure (with long-sized counters). - For block counters, we scale up the block size sufficiently large so that the resulting block counts fit into a the long-sized (long for the ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs VOP did this already. This can lie about the block size to 4.x binaries, but it presents a more accurate picture of the ratios of free and available space. - For non-block counters, fix the freebsd32 stats converter to cap the values at INT32_MAX rather than losing the upper 32-bits to match the behavior of the 4.x statfs conversion routine in vfs_syscalls.c Approved by: re (kensmith)	2007-08-28 20:28:12 +00:00
Randall Stewart	2afb3e849f	- During shutdown pending, when the last sack came in and the last message on the send stream was "null" but still there, a state we allow, we could get hung and not clean it up and wait for the shutdown guard timer to clear the association without a graceful close. Fix this so that that we properly clean up. - Added support for Multiple ASCONF per new RFC. We only (so far) accept input of these and cannot yet generate a multi-asconf. - Sysctl'd support for experimental Fast Handover feature. Always disabled unless sysctl or socket option changes to enable. - Error case in add-ip where the peer supports AUTH and ADD-IP but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to ABORT in this case. - According to the Kyoto summit of socket api developers (Solaris, Linux, BSD). We need to have: o non-eeor mode messages be atomic - Fixed o Allow implicit setup of an assoc in 1-2-1 model if using the sctp_**() send calls - Fixed o Get rid of HAVE_XXX declarations - Done o add a sctp_pr_policy in hole in sndrcvinfo structure - Done o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch! - Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize when we close sending out the data and disabling Nagle. - Change key concatenation order to match the auth RFC - When sending OOTB shutdown_complete always do csum. - Don't send PKT-DROP to a PKT-DROP - For abort chunks just always checksums same for shutdown-complete. - inpcb_free front state had a bug where in queue data could wedge an assoc. We need to just abandon ones in front states (free_assoc). - If a peer sends us a 64k abort, we would try to assemble a response packet which may be larger than 64k. This then would be dropped by IP. Instead make a "minimum" size for us 64k-2k (we want at least 2k for our initack). If we receive such an init discard it early without all the processing. - When we peel off we must increment the tcb ref count to keep it from being freed from underneath us. - handling fwd-tsn had bugs that caused memory overwrites when given faulty data, fixed so can't happen and we also stop at the first bad stream no. - Fixed so comm-up generates the adaption indication. - peeloff did not get the hmac params copied. - fix it so we lock the addr list when doing src-addr selection (in future we need to use a multi-reader/one writer lock here) - During lowlevel output, we could end up with a _l_addr set to null if the iterator is calling the output routine. This means we would possibly crash when we gather the MTU info. Fix so we only do the gather where we have a src address cached. - we need to be sure to set abort flag on conn state when we receive an abort. - peeloff could leak a socket. Moved code so the close will find the socket if the peeloff fails (uipc_syscalls.c) Approved by: re@freebsd.org(Ken Smith)	2007-08-27 05:19:48 +00:00
Konstantin Belousov	5114048b63	Destroy the kaio_mtx on the freeing the struct kaioinfo in the aio_proc_rundown. Do not allow for zero-length read to be passed to the fo_read file method by aio. Reported and tested by: Peter Holm Approved by: re (kensmith)	2007-08-20 11:53:26 +00:00
Jeff Roberson	67e20930bd	- Improve runq_findbit_from() which is used by ULE's circular queue. Mask of the bits we want to ignore on the first pass rather than doing a linear scan. This puts us within a few instructions of the cost of runq_findbit() and removes this function from the top of profiling output for context switch heavy workloads. Approved by: re	2007-08-20 06:36:12 +00:00
Jeff Roberson	9862717afe	- Set steal_thresh to log2(ncpus). This improves idle-time load balancing on 2cpu machines by reducing it to 1 by default. This improves loaded operation on 8cpu machines by increasing it to 3 where the extra idle time is not as critical. Approved by: re	2007-08-20 06:34:20 +00:00
Nate Lawson	62db376af3	Always call sched_bind(), even if on the CPU in question. It is wrong to check if we're already on that cpu and skip the bind since the thread could be migrated off in the meantime. Suggested by: jeff Approved by: re	2007-08-20 06:28:26 +00:00
Nate Lawson	2145b9d207	Use a different loop variable for the inner loop. This previous reuse could have caused a hang, but we got lucky with the available multi-CPU states on actual hardware. Submitted by: Bjorn Koenig <bkoenig / alpha-tierchen.de> Approved by: re MFC after: 3 days	2007-08-19 20:34:13 +00:00
David Xu	6ec46f7aa8	Regenerate. Approved by: re(kensmith)	2007-08-16 05:32:26 +00:00
David Xu	0b1f0611b4	Add thr_kill2 syscall which sends a signal to a thread in another process. Submitted by: Tijl Coosemans tijl at ulyssis dot org Approved by: re (kensmith)	2007-08-16 05:26:42 +00:00
John Baldwin	1dc5b1cc56	On 6.x this works: % mount \| grep home /dev/ad4s1e on /home (ufs, local, noatime, soft-updates) % mount -u -o atime /home % mount \| grep home /dev/ad4s1e on /home (ufs, local, soft-updates) Restore this behavior for on 7.x for the following mount options: noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow In addition, on 7.x, the following are equivalent: mount -u -o atime /home mount -u -o nonoatime /home Ideally, when we introduce new mount options, we should avoid options starting with "no". :) Requested by: jhb Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com> Approved by: re (bmah) Proxy commit for: rodrigc	2007-08-15 17:40:09 +00:00
Pawel Jakub Dawidek	354eb80141	Improve vn_printf() by: - adding missing vnode flags, - printing unknown flags as numbers, - using strlcat() instead of strcat(). Approved by: re (bmah)	2007-08-13 21:23:30 +00:00
Konstantin Belousov	004e08be60	Do not call free() while holding vnode interlock. Reported and tested by: Peter Holm Reviewed by: jeff Approved by: re (kensmith)	2007-08-07 09:04:50 +00:00
Robert Watson	0bf686c125	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
Jeff Roberson	3a78f9658b	- Fix one line that erroneously crept in my last commit. Approved by: re	2007-08-04 01:21:28 +00:00
Jeff Roberson	c47f202b45	- Share scheduler locks between hyper-threaded cores to protect the tdq_group structure. Hyper-threaded cores won't really benefit from seperate locks anyway. - Seperate out the migration case from sched_switch to simplify the main switch code. We only migrate here if called via sched_bind(). - When preempted place the preempted thread back in the same queue at the head. - Improve the cpu group and topology infrastructure. Tested by: many on current@ Approved by: re	2007-08-03 23:38:46 +00:00
Jeff Roberson	413ea6f543	- Set SW_PREEMPT when we preempt in critical_exit(). Approved by: re	2007-08-03 23:35:35 +00:00
Robert Watson	33d2bb9ca3	First in a series of changes to remove the now-unused Giant compatibility framework for non-MPSAFE network protocols: - Remove debug_mpsafenet variable, sysctl, and tunable. - Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel. - Remove logic to automatically flag interrupt handlers as non-MPSAFE if debug.mpsafenet is set for an INTR_TYPE_NET handler. - Remove logic to automatically flag netisr handlers as non-MPSAFE if debug.mpsafenet is set. - Remove references in a few subsystems, including NFS and Cronyx drivers, which keyed off debug_mpsafenet to determine various aspects of their own locking behavior. - Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into no-op's, as their entire behavior was determined by the value in debug_mpsafenet. - Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE. Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still present in subsystems, and will be removed in followup commits. Reviewed by: bz, jhb Approved by: re (kensmith)	2007-07-27 11:59:57 +00:00
Attilio Rao	34ed040030	Actually, upcalls cannot be freed while destroying the thread because we should call uma_zfree() with various spinlock helds. Rearranging the code would not help here because we cannot break atomicity respect prcess spinlock, so the only one choice we have is to defer the operation. In order to do this use a global queue synchronized through the kse_lock spinlock which is freed at any thread_alloc() / thread_wait() through a call to thread_reap(). Note that this approach is not ideal as we should want a per-process list of zombie upcalls, but it follows initial guidelines of KSE authors. Tested by: jkim, pav Approved by: jeff, julian Approved by: re	2007-07-27 09:21:18 +00:00
Pawel Jakub Dawidek	57fd3d5572	When we do open, we should lock the vnode exclusively. This fixes few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more... Discussed with: kib, ups Approved by: re (rwatson)	2007-07-26 16:58:09 +00:00
Pawel Jakub Dawidek	68c1a246ae	The v_mountedhere field is protected by the vnode lock, not vnode's internal lock. Approved by: re (rwatson)	2007-07-26 16:52:57 +00:00
Attilio Rao	758b17a100	upcall_free() was only used in kse_GC() which has been removed so it now results unused; this, with -Werror option of gcc, rise a warning for gcc which let the buildkernel to be busted. Fix this removing upcall_free(). Reported by: various Approved by: jeff Approved by: re Pointy hat to: attilio	2007-07-23 23:16:53 +00:00
Attilio Rao	ac8094e4e3	Actually, KSE kernel bits locking is broken and can lead likely to dangerous races. Fix this problems adding correct locking for the members of 'struct kse_upcall' and other struct proc/struct thread related members. For the moment, just leave ku_mflag and ku_flags "lazy" locked. While here, cleanup the code removing the function kse_GC() (unused), and merging upcall_link(), upcall_unlink(), upcall_stash() in their respective callers (static functions, very short and only called in one place). Reported by: pav Tested by: pav (on some pointyhat cluster nodes) Approved by: jeff Approved by: re Sponsorized by: NGX Italy (http://www.ngx.it)	2007-07-23 14:52:22 +00:00
David Malone	6d8617d42a	If clock_ct_to_ts fails to convert time time from the real time clock, print a one line error message. Add some comments on not being able to trust the day of week field (I'll act on these comments in a follow up commit). Approved by: re MFC after: 3 weeks	2007-07-23 09:42:32 +00:00
Konstantin Belousov	e69aee3117	ttyfree() frees the cdev(). But if there are pending kevents, filt_ttyrdetach() etc would later attempt to dereference cdev->si_tty, causing a 0xdeadc0de dereference. Change kn_hook value from cdev to struct tty to avoid dereferencing freed cdev. In ttygone(), wake up select(), sigio and kevent() users in addition to the queue sleepers. Return EV_EOF from kevent filters if TS_GONE is set. Submitted by: peter Tested by: Peter Holm Approved by: re (kensmith) MFC after: 2 weeks	2007-07-20 09:41:54 +00:00
Attilio Rao	6aa294be2c	Fix some problems with lock profiling in rw locks: - Adjust lock_profiling stubs semantic in the hard functions in order to be more accurate and trustable - As for sx locks, disable shared paths for lock_profiling. Actually, lock_profiling has a subtle race which makes results caming from shared paths not completely trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for re-enabling this paths, but is currently intended for developing use only. - style(9) fixes Approved by: jeff, kmacy, jhb[1] Approved by: re [1] Had initial reservations not shared by others, conceded in the end.	2007-07-20 08:43:42 +00:00
Jeff Roberson	28994a5852	- Refine the load balancer to improve buildkernel times on dual core machines. - Leave the long-term load balancer running by default once per second. - Enable stealing load from the idle thread only when the remote processor has more than two transferable tasks. Setting this to one further improves buildworld. Setting it higher improves mysql. - Remove the bogus pick_zero option. I had not intended to commit this. - Entirely disallow migration for threads with SRQ_YIELDING set. This balances out the extra migration allowed for with the load balancers. It also makes pick_pri perform better as I had anticipated. Tested by: Dmitry Morozovsky <marck@rinet.ru> Approved by: re	2007-07-19 20:03:15 +00:00
Jeff Roberson	08c9a16c4f	- When newtd is specified to sched_switch() it was not being initialized properly. We have to temporarily unlock the TDQ lock so we can lock the thread and add it to the run queue. This is used only for KSE. - When we add a thread from the tdq_move() via sched_balance() we need to ipi the target if it's sitting in the idle thread or it'll never run. Reported by: Rene Landan Approved by: re	2007-07-19 19:51:45 +00:00
Jeff Roberson	56696bd1ab	- Remove explicit references to sched_lock. A simpler assert will do. Approved by: re	2007-07-19 08:58:40 +00:00
Jeff Roberson	6eeb364b4c	- Calling sched_nice() in tdsigwakeup() is no longer required by ULE and actually causes LORs and other panics. Reported by: mlaier Approved by: re	2007-07-19 08:49:16 +00:00
Jeff Roberson	6ea38de8aa	- Remove the global definition of sched_lock in mutex.h to break new code and third party modules which try to depend on it. - Initialize sched_lock in sched_4bsd.c. - Declare sched_lock in sparc64 pmap.c and assert that we're compiling with SCHED_4BSD to prevent accidental crashes from running ULE. This is the sole remaining file outside of the scheduler that uses the global sched_lock. Approved by: re	2007-07-18 20:46:06 +00:00
Jeff Roberson	773890b9a8	- Add the proper lock profiling calls to _thread_lock(). Obtained from: kipmacy Approved by: re	2007-07-18 20:38:13 +00:00
Jeff Roberson	ae7a6b38d5	ULE 3.0: Fine grain scheduler locking and affinity improvements. This has been in development for over 6 months as SCHED_SMP. - Implement one spin lock per thread-queue. Threads assigned to a run-queue point to this lock via td_lock. - Improve the facility for assigning threads to CPUs now that sched_lock contention no longer dominates scheduling decisions on larger SMP machines. - Re-write idle time stealing in an attempt to make it less damaging to general performance. This is still disabled by default. See kern.sched.steal_idle. - Call the long-term load balancer from a callout rather than sched_clock() so there are no locks held. This is disabled by default. See kern.sched.balance. - Parameterize many scheduling decisions via sysctls. Try to document these via sysctl descriptions. - General structural and naming cleanups. - Document each function with comments. Tested by: current@ amd64, x86, UP, SMP. Approved by: re	2007-07-17 22:53:23 +00:00
Jeff Roberson	fb62eea266	- Use ruxagg() in calcru() to make sure we have current tick information from all threads. Discussed with: bde, attilio Approved by: re	2007-07-17 01:08:09 +00:00
Craig Rodrigues	d7f81adbd4	Revert previous commits which I committed by mistake. Approved by: re (implicit) Pointy hat to: me	2007-07-14 21:23:31 +00:00
Craig Rodrigues	d678780e60	The last entry in the ext2_opts array must be NULL, otherwise the kernel with crash in vfs_filteropt() if an invalid mount option is passed to ext2fs. Approved by: re (kensmith)	2007-07-14 21:18:19 +00:00
John Baldwin	59d8f3ff08	Fix a couple of issues with the stack limit for 32-bit processes on 64-bit kernels exposed by the recent fixes to resource limits for 32-bit processes on 64-bit kernels: - Let ABIs expose their maximum stack size via a new pointer in sysentvec and use that in preference to maxssiz during exec() rather than always using maxssiz for all processses. - Apply the ABI's limit fixup to the previous stack size when adjusting RLIMIT_STACK to determine if the existing mapping for the stack needs to be grown or shrunk (as well as how much it should be grown or shrunk). Approved by: re (kensmith)	2007-07-12 18:01:31 +00:00
Attilio Rao	c1a6d9fa42	Fix some problems with lock_profiling in sx locks: - Adjust lock_profiling stubs semantic in the hard functions in order to be more accurate and trustable - Disable shared paths for lock_profiling. Actually, lock_profiling has a subtle race which makes results caming from shared paths not completely trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for re-enabling this paths, but is currently intended for developing use only. - Use homogeneous names for automatic variables in hard functions regarding lock_profiling - Style fixes - Add a CTASSERT for some flags building Discussed with: kmacy, kris Approved by: jeff (mentor) Approved by: re	2007-07-06 13:20:44 +00:00
Konstantin Belousov	196a7385ac	Revert destroy_dev() to the state before destroy_dev_sched() was introduced. Attempt to spawn destroy_dev_sched() from it causes inadmissible races. Requested by: tegge Approved by: re (kensmith)	2007-07-05 13:04:59 +00:00
Bjoern A. Zeeb	f43455fd89	Remove netkey directory from cscope/TAGs generation and replace it with netipsec now that KAME IPsec is gone. While here add missing netinet6 directories. Add comments about the ports needed to be able to run those targets. Reviewed by: philip Approved by: re (rwatson)	2007-07-05 08:55:14 +00:00
Peter Wemm	22af4cab91	Fix bad function type passed to destroy_dev_sched_cb(). Approved by: re (rwatson)	2007-07-05 05:54:47 +00:00
Peter Wemm	c2815ad564	Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate Approved by: re (kensmith)	2007-07-04 22:57:21 +00:00
Peter Wemm	552fbe752f	Regenerate after mmap/lseek/etc syscall changes. Approved by: re (kensmith)	2007-07-04 22:49:55 +00:00
Peter Wemm	51504d9ac4	Create new syscalls for mmap(), lseek(), pread(), pwrite(), truncate() and ftruncate(), but without the pad arg. There are several reasons for this. Consider 'mmap()'. On AMD64, the function call (and syscall) ABI allow for 6 register arguments. Additional arguments go on the stack. mmap(2) has 6 arguments. However, the syscall definition has an extra 'int pad' argument. This pushes it to 7 arguments, which means one must spill into the memory stack. Since the kernel API doesn't match userland API, we have a hack in libc - libc/sys/mmap.c. This implements the userland API by calling __syscall() with an extra argument and the pad argument, for a total of 8 args. This is all unnecessary and inconvenient for several things, including the kernel's syscall handler code which now has to handle merging stack arguments with register arguments. It is a big deal for certain 3rd party code. I'm adding libc glue to make the transition totally painless. I had intended to mark the old syscalls as COMPAT6, but the potential to shoot your feet by building a new kernel without COMPAT_FREEBSD6 but with a slighly older userland was too great. For now, they have manual "freebsd6_" prefixes rather than being COMPAT6. They will go back to being marked 'COMPAT6' after 7-stable starts. Approved by: re (kensmith)	2007-07-04 22:47:37 +00:00
Peter Wemm	9f0482e515	Add support for COMPAT6 syscalls. Also, change the visibility of compat syscalls a slightly. Compat syscalls were missing from 'syscalls.h' entirely. This additionally adds them with their compat prefix. eg: SYS_freebsd6_mmap. Also, the syscalls.c names strings have different prefixes to differentiate syscalls. Instead of several "old.mmap" strings, there will now be a "compat.mmap" and "compat6.mmap" etc. Before, both would have had the same "old.mmap" label. Approved by: re	2007-07-04 22:38:28 +00:00
Konstantin Belousov	09828ba947	Since cdev mutex is after system map mutex in global lock order, free() shall not be called while holding cdev mutex. devfs_inos unrhdr has cdev as mutex, thus creating this LOR situation. Postpone calling free() in kern/subr_unit.c:alloc_unr() and nested functions until the unrhdr mutex is dropped. Save the freed items on the ppfree list instead, and provide the clean_unrhdrl() and clean_unrhdr() functions to clean the list. Call clean_unrhdrl() after devfs_create() calls immediately before dropping cdev mutex. devfs_create() is the only user of the alloc_unrl() in the tree. Reviewed by: phk Tested by: Peter Holm LOR: 80 Approved by: re (kensmith)	2007-07-04 06:56:58 +00:00
Jeff Roberson	f6c1ecca50	- Use explicit locking in the various fcntl case statements so that we can acquire shared filedescriptor locks in the appropriate cases. - Remove Giant from calls that issue ioctls. The ioctl path has been mpsafe for some time now. - Only acquire giant for VOP_ADVLOCK when the filesystem requires giant. advlock is now mpsafe. Reviewed by: rwatson Approved by: re	2007-07-03 21:26:06 +00:00
Jeff Roberson	bc02f1d98d	- Remove explicit Giant protection from lockf. Use the vnode interlock to protect this datastructure instead. - Preallocate an extra lockf structure in case we want to split a lock on insert or delete. - msleep() on the vnode interlock when blocking on a lock. Reviewed by: rwatson Approved by: re	2007-07-03 21:22:58 +00:00
John Baldwin	fb1faf2082	Tweak the low-level MI SMP code some: - Use cpu_spinwait() in the spin loops in stop_cpus(), restart_cpus(), and smp_rendezvous_action(). - Remove unneeded acq memory barriers in stop_cpus(), restart_cpus(), and smp_rendezvous_action(). - Add an additional synch point in smp_rendezvous() to ensure that all the CPUs will always see an up-to-date value of smp_rv_setup_func. Reviewed by: attilio Approved by: re (kensmith) Tested on: alpha, amd64, i386, sparc64 SMP (for several years)	2007-07-03 18:37:06 +00:00
Konstantin Belousov	9d53363bc8	Rev. 1.204 and 1.205 got an erronous version of destroy_dev() that calls destroy_dev_sched() with cdev mutex locked. Commit the code that was actually tested. Pointy hat to: kib Approved by: re (implicit)	2007-07-03 18:18:30 +00:00
Konstantin Belousov	f5baf8d66b	Lock Giant and proctree lock around dereferencing p_session->s_ttyvp->v_rdev. Lock cdev mutex too to close the race with tty being freed. Relock clone_drain_lock to prevent the LOR with proctree lock, thus add #include <fs/devfs/devfs_int.h>. Suggested by: tegge Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:46:37 +00:00
Konstantin Belousov	8a5d7ef25c	Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from pty clone handler. Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:45:52 +00:00
Konstantin Belousov	0a9c2b6db8	Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from the clone handler. Lock Giant in the clone handler. Use destroy_dev_sched() explicitely from pty_maybecleanup() and postpone pty_release() until both master and slave cdevs are destroyed by setting it as callback for destroy_dev_sched(). Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:44:59 +00:00
Konstantin Belousov	6f0281937b	Automatically detect deadlock condition in destroy_dev(), that is, if destroy_dev() is called from csw method, and no d_purge driver method is provided. Transform the direct call to destroy_dev() into destroy_dev_sched(). Reviewed by: njl (programming interface) Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:43:20 +00:00
Konstantin Belousov	de10ffa527	Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls destroy_dev() from d_close() cdev method would self-deadlock. devfs_close() bump device thread reference counter, and destroy_dev() sleeps, waiting for si_threadcount to reach zero for cdev without d_purge method. destroy_dev_sched() could be used instead from d_close(), to schedule execution of destroy_dev() in another context. The destroy_dev_sched_drain() function can be used to drain the scheduled calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains the events clone to make sure no lingering devices are left after dev_clone event handler deregistered. make_dev_credf(MAKEDEV_REF) function should be used from dev_clone event handlers instead of make_dev()/make_dev_cred() to ensure that created device has reference counter bumped before cdev mutex is dropped inside make_dev(). Reviewed by: tegge (early versions), njl (programming interface) Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:42:37 +00:00
Konstantin Belousov	7aee5992a5	Relock the sema_mtxp unconditionally after copyin() for SETALL case in kern_semctl. Otherwise, later mtx_unlock() can operate on unlocked mutex. Submitted by: rdivacky MFC after: 3 days Approved by: re (kensmith)	2007-07-03 15:58:47 +00:00
Robert Watson	bc6eca2432	Continue kernel privilege cleanup for 7.0: unstaticize suser_enabled and stop declaring it in systm.h -- it's used only in kern_priv.c and is not required elsewhere. Approved by: re (kensmith)	2007-07-02 14:03:29 +00:00
Randall Stewart	b8709d23c5	- Add some needed error checking on bad fd passing in the sctp syscalls. Approved by: re@freebsd.org (Ken Smith) Obtained from: Weongyo Jeong (weongyo.jeong@gmail.com)	2007-07-02 12:50:53 +00:00
Jeff Roberson	03d03260b2	- Use rufetchcalc() rather than calcru() in ttyinfo so that we get correct system and user time stats. Approved by: re Reported by: kris Discussed with: Attilio	2007-07-01 00:17:59 +00:00
Robert Watson	dc2e1e3fae	Use vm_offset_t for kmembase and kmemlimit rather than char *, avoiding unnecessary casts, and making it possible to compile kern_malloc.c with strict aliasing. Submitted by: rdivacky Approved by: re (kensmith)	2007-06-27 13:39:38 +00:00
Attilio Rao	6a0ce57d10	Fix an old standing LOR between callout_lock and sleepqueues chain (which could lead to a deadlock). - sleepq_set_timeout acquires callout_lock (via callout_reset()) only with sleepq chain lock held - msleep_spin in _callout_stop_safe lock the sleepqueue chain with callout_lock held In order to solve this don't use msleep_spin in _callout_stop_safe() but use directly sleepqueues as inline msleep_spin code. Rearrange the wakeup path in order to have it consistent too. Reported by: kris (via stress2 test suite) Tested by: Timothy Redaelli <drizzt@gufi.org> Reviewed by: jhb Approved by: jeff (mentor) Approved by: re	2007-06-26 21:42:01 +00:00
Attilio Rao	f08945a7d2	Introduce a new rwlocks initialization function: rw_init_flags. This is very similar to sx_init_flags: it initializes the rwlock using special flags passed as third argument (RW_DUPOK, RW_NOPROFILE, RW_NOWITNESS, RW_QUIET, RW_RECURSE). Among these, the most important new feature is probabilly that rwlocks can be acquired recursively now (for both shared and exclusive paths). Because of the recursion counter, the ABI is changed. Tested by: Timothy Redaelli <drizzt@gufi.org> Reviewed by: jhb Approved by: jeff (mentor) Approved by: re	2007-06-26 21:31:56 +00:00
Rong-En Fan	534046e301	- Remove UMAP filesystem. It was disconnected from build three years ago, and it is seriously broken. Discussed on: freebsd-arch@ Approved by: re (mux)	2007-06-25 05:06:57 +00:00
Konstantin Belousov	9bc911d4a2	devfs_free() calls free_unr(), that may sleep. Postpone call to devfs_free() after cdev mutex is dropped. Reuse cdp_list link for queuing devices awaiting deletion in the cdevp_free_list. Reported by: Hans Petter Selasky <hselasky c2i net> Tested by: Peter Holm Approved by: re (kensmith) MFC after: 2 weeks	2007-06-19 13:19:23 +00:00
Konstantin Belousov	7550e3eac4	Add the witness warning for free_unr. Function could sleep, thus callers shall not have any non-sleepable locks held. Submitted by: Hans Petter Selasky <hselasky c2i net> Approved by: re (kensmith)	2007-06-19 13:13:17 +00:00
Pawel Jakub Dawidek	dfe97ff4a5	We only flush entries related to the given file system. Currently there are no 'invalid' cache entires - file system is responsible for keeping it that way. The comment should have been updated in rev.1.25.	2007-06-18 09:28:24 +00:00
Robert Watson	7251b7863c	Rather than passing SUSER_RUID into priv_check_cred() to specify when a privilege is checked against the real uid rather than the effective uid, instead decide which uid to use in priv_check_cred() based on the privilege passed in. We use the real uid for PRIV_MAXFILES, PRIV_MAXPROC, and PRIV_PROC_LIMIT. Remove the definition of SUSER_RUID; there are now no flags defined for priv_check_cred(). Obtained from: TrustedBSD Project	2007-06-16 23:41:43 +00:00
Marius Strobl	79be8b5082	- Remove zstty spin lock for no longer existing zs(4). - Move the rtc_mtx spin lock out from under #ifdef SMP as it's just not SMP-specific. - Add a new spin lock pcib_mtx for locking "fast" interrupt handlers of host-to-PCI bridge drivers on sparc64.	2007-06-16 23:30:57 +00:00
Jeff Roberson	dda713dfb8	- Fix an off by one error in sched_pri_range. - In tdq_choose() only assert that a thread does not have too high a priority (low value) for the queue we removed it from. This will catch bugs in priority elevation. It's not a serious error for the thread to have too low a priority as we don't change queues in this case as an optimization. Reported by: kris	2007-06-15 19:33:58 +00:00
Robert Watson	7e273744a6	Remove the restriction that rtprio(2) cannot be used to set the realtime or idle priority of another process owned by the same user. This means that privilege in rtprio(2) (and rtprio_thread(2)) is required indirectly via p_cansched(9) or directly to set realtime/idle privilege, rather than directly affecting target process authorization.	2007-06-14 23:31:52 +00:00
Robert Watson	b4be6ef22f	Only require privilege to set the current time adjustment, not in order to query it.	2007-06-14 18:37:58 +00:00
Robert Watson	3805385e3d	Spell statistics more correctly in comments.	2007-06-14 03:02:33 +00:00
John Baldwin	34a9edafbc	Improve the ktrace locking somewhat to reduce overhead: - Depessimize userret() in kernels where KTRACE is enabled by doing an unlocked check of the per-process queue of pending events before acquiring any locks. Previously ktr_userret() unconditionally acquired the global ktrace_sx lock on every return to userland for every thread, even if ktrace wasn't enabled for the thread. - Optimize the locking in exit() to first perform an unlocked read of p_traceflag to see if ktrace is enabled and only acquire locks and teardown ktrace if the test succeeds. Also, explicitly disable tracing before draining any pending events so the pending events actually get written out. The unlocked read is safe because proc lock is acquired earlier after single-threading so p_traceflag can't change between then and this check (well, it can currently due to a bug in ktrace I will fix next, but that race existed prior to this change as well). Reviewed by: rwatson	2007-06-13 20:01:42 +00:00
John Baldwin	ce0be64687	Conditionally acquire Giant when dropping a reference on the ktrace vnode during execve() when turning off tracing due to executing a setuid binary as non-root. Previously this could fail to acquire Giant and fail an assertion if the ktrace file was on a non-MPSAFE filesystem and the executable was on an MPSAFE filesystem. MFC after: 3 days Reported by: kris	2007-06-13 19:41:47 +00:00
Jeff Roberson	3036ab79e3	- Include opt_sched.h for SCHED_STATS.	2007-06-12 23:27:31 +00:00
Jeff Roberson	671f2709ae	- Garbage collect unused concurrency functions.	2007-06-12 19:50:31 +00:00
Jeff Roberson	e7c8d2e9fe	- Garbage collect unused concurrency functions. - Remove unused kse fields from struct proc. - Group remaining fields and #ifdef KSE them. - Move some kern_kse.c only prototypes out of proc and into kern_kse. Discussed with: Julian	2007-06-12 19:49:39 +00:00
Jeff Roberson	fe54587ffa	- Move some common code out of sched_fork_exit() and back into fork_exit().	2007-06-12 07:47:09 +00:00
Jeff Roberson	ff8fbcffcb	Solve a complex exit race introduced with thread_lock: - Add a count of exiting threads, p_exitthreads, to struct proc. - Increment p_exithreads when we set the deadthread in thread_exit(). - When we thread_stash() a deadthread use an atomic to drop the count. - Spin until the p_exithreads count reaches 0 in thread_wait(). - Lock the last exiting thread momentarily to be certain that it has exited cpu_throw(). - Restructure thread_wait(). It does not need a loop as there will only ever be one thread. Tested by: moose@opera.com Reported by: kris, moose@opera.com	2007-06-12 07:24:46 +00:00
Robert Watson	32f9753cfb	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
Jeff Roberson	efe641b939	- Add a missing PROC_SUNLOCK() in tdsignal()	2007-06-11 23:27:03 +00:00
Olivier Houchard	e411ce026a	Re-acquire the PROC_SLOCK before calling calcru(), and release it after, since calcru() expects it to be locked. Reviewed by: attilio	2007-06-11 21:05:41 +00:00
Sam Leffler	68e8e04e93	Update 802.11 wireless support: o major overhaul of the way channels are handled: channels are now fully enumerated and uniquely identify the operating characteristics; these changes are visible to user applications which require changes o make scanning support independent of the state machine to enable background scanning and roaming o move scanning support into loadable modules based on the operating mode to enable different policies and reduce the memory footprint on systems w/ constrained resources o add background scanning in station mode (no support for adhoc/ibss mode yet) o significantly speedup sta mode scanning with a variety of techniques o add roaming support when background scanning is supported; for now we use a simple algorithm to trigger a roam: we threshold the rssi and tx rate, if either drops too low we try to roam to a new ap o add tx fragmentation support o add first cut at 802.11n support: this code works with forthcoming drivers but is incomplete; it's included now to establish a baseline for other drivers to be developed and for user applications o adjust max_linkhdr et. al. to reflect 802.11 requirements; this eliminates prepending mbufs for traffic generated locally o add support for Atheros protocol extensions; mainly the fast frames encapsulation (note this can be used with any card that can tx+rx large frames correctly) o add sta support for ap's that beacon both WPA1+2 support o change all data types from bsd-style to posix-style o propagate noise floor data from drivers to net80211 and on to user apps o correct various issues in the sta mode state machine related to handling authentication and association failures o enable the addition of sta mode power save support for drivers that need net80211 support (not in this commit) o remove old WI compatibility ioctls (wicontrol is officially dead) o change the data structures returned for get sta info and get scan results so future additions will not break user apps o fixed tx rate is now maintained internally as an ieee rate and not an index into the rate set; this needs to be extended to deal with multi-mode operation o add extended channel specifications to radiotap to enable 11n sniffing Drivers: o ath: add support for bg scanning, tx fragmentation, fast frames, dynamic turbo (lightly tested), 11n (sniffing only and needs new hal) o awi: compile tested only o ndis: lightly tested o ipw: lightly tested o iwi: add support for bg scanning (well tested but may have some rough edges) o ral, ural, rum: add suppoort for bg scanning, calibrate rssi data o wi: lightly tested This work is based on contributions by Atheros, kmacy, sephe, thompsa, mlaier, kevlo, and others. Much of the scanning work was supported by Atheros. The 11n work was supported by Marvell.	2007-06-11 03:36:55 +00:00
Attilio Rao	393a081d42	Optimize vmmeter locking. In particular: - Add an explicative table for locking of struct vmmeter members - Apply new rules for some of those members - Remove some unuseful comments Heavily reviewed by: alc, bde, jeff Approved by: jeff (mentor)	2007-06-10 21:59:14 +00:00
Matt Jacob	a659386c7e	Remove unused variable.	2007-06-10 01:50:05 +00:00
Matt Jacob	26756b7a58	The new compiler can't quite follow the logic of has_stime and complains about using uninitialized tags in stime.	2007-06-10 01:49:17 +00:00
Matt Jacob	9b73d2396a	Initialized ets to zero. This is arguably a gcc bug in that ets is always set to rts when timeout is non-NULL and then timevalid is set and ets is only checked later when timervalid is set.	2007-06-10 01:43:11 +00:00
Attilio Rao	bdf08be439	Fix a bug caming from the committing a pre-merge version of the patch instead than a post-merge version (respect to another rusage fix). Reported by: marcel Approved by: jeff(mentor)	2007-06-10 00:28:41 +00:00
Marcel Moolenaar	55b5660de4	Work around an integer overflow in expression `3 * maxbufspace / 4', when maxbufspace is larger than INT_MAX / 3. The overflow causes a hard hang on ia64 when physical memory is sufficiently large (8GB).	2007-06-09 23:41:14 +00:00
Attilio Rao	a1fe14bc33	rufetch and calcru sometimes should be called atomically together. This patch fixes places where they should be called atomically changing their locking requirements (both assume per-proc spinlock held) and introducing rufetchcalc which wrappers both calls to be performed in atomic way. Reviewed by: jeff Approved by: jeff (mentor)	2007-06-09 21:48:44 +00:00
Attilio Rao	86a49dea5b	Since locking in kern/subr_prof.c is changed a bit, we need nomore of time_lock spinlock exported. Approved by: jeff (mentor)	2007-06-09 19:41:14 +00:00
Attilio Rao	a140976eb4	The current rusage code show peculiar problems: - Unsafeness on ruadd() in thread_exit() - Unatomicity of thread_exiit() in the exit1() operations This patch addresses these problems allocating p_fd as part of the process and modifying the way it is accessed. A small chunk of this patch, resolves a race about p_state in kern_wait(), since we have to be sure about the zombif-ing process. Submitted by: jeff Approved by: jeff (mentor)	2007-06-09 18:56:11 +00:00
Matt Jacob	65d32cd8fb	Propagate volatile qualifier to make gcc4.2 happy.	2007-06-09 18:09:37 +00:00
Attilio Rao	e682569165	Remove the MUTEX_WAKE_ALL option and make it the default behaviour for our mutexes. Currently we alredy force MUTEX_WAKE_ALL beacause of some problems with the !MUTEX_WAKE_ALL case (unavioidable priority inversion).	2007-06-08 21:36:52 +00:00
Poul-Henning Kamp	7acfb0af82	Double the WITNESS and DIAGNOSTIC benchmark warnings right before we go into userland to improve the chances of people noticing them.	2007-06-08 11:47:36 +00:00
Xin LI	7b8c8b858c	In getblk(), before gbincore(), use BO_LOCK directly when locking the bufobj, rather than using VI_LOCK, like what was done with revision 1.453.	2007-06-08 07:05:08 +00:00
Robert Watson	faef53711b	Move per-process audit state from a pointer in the proc structure to embedded storage in struct ucred. This allows audit state to be cached with the thread, avoiding locking operations with each system call, and makes it available in asynchronous execution contexts, such as deep in the network stack or VFS. Reviewed by: csjp Approved by: re (kensmith) Obtained from: TrustedBSD Project	2007-06-07 22:27:15 +00:00
John Baldwin	a66fde8d35	- Remove unused variable from create_thread(). - Move kern_thr_() prototype to <sys/syscallsubr.h> where all the other kern_() prototypes live.	2007-06-07 19:45:19 +00:00
David Xu	42ce445fed	Backout experimental adaptive-spin umtx code.	2007-06-06 07:35:08 +00:00
Jeff Roberson	710eacdc5f	- Placing the 'volatile' on the right side of the * in the td_lock declaration removes the need for __DEVOLATILE(). Pointed out by: tegge	2007-06-06 03:40:47 +00:00
Attilio Rao	d301eb10c7	Fix a problem with not-preemptive kernels caming from mis-merging of existing code with the new thread_lock patch. This also cleans up a bit unlock operation for mutexes. Approved by: jhb, jeff(mentor)	2007-06-05 18:57:09 +00:00
Konstantin Belousov	b95b98b0bd	Restore non-SMP build. Reviewed by: attilio	2007-06-05 14:20:13 +00:00
Jeff Roberson	95e3a0bca3	- Better fix for previous error; use DEVOLATILE on the td_lock pointer it can actually sometimes be something other than sched_lock even on schedulers which rely on a global scheduler lock. Tested by: kan	2007-06-05 04:12:46 +00:00
Jeff Roberson	c219b097af	- Pass &sched_lock as the third argument to cpu_switch() as this will always be the correct lock and we don't get volatile warnings this way. Pointed out by: kan	2007-06-05 03:46:54 +00:00
Jeff Roberson	36b369163b	- Define TDQ_ID() for the !SMP case. - Default pick_pri to off. It is not faster in most cases.	2007-06-05 02:53:51 +00:00
Jeff Roberson	8e0185f604	- Remove sched_core.c. The maintainer has lost interest in pursuing this and it has been neglected in the recent ksegrp removal as well as the thread_lock() changes. Discussed with: davidxu	2007-06-05 00:12:37 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Jeff Roberson	bd43e47156	Commit 10/14 of sched_lock decomposition. - Add new spinlocks to support thread_lock() and adjust ordering. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:55:45 +00:00
Jeff Roberson	07a61420ff	Commit 9/14 of sched_lock decomposition. - Attempt to return the ttyinfo() selection algorithm to something sane as it has been broken and disabled for some time. Adapt this algorithm in such a way that it does not conflict with per-cpu scheduler locking. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:55:32 +00:00
Jeff Roberson	3c2e44364e	Commit 8/14 of sched_lock decomposition. - Use a global umtx spinlock to protect the sleep queues now that there is no global scheduler lock. - Use thread_lock() to protect thread state. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:54:50 +00:00
Jeff Roberson	765b2891e8	Commit 7/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. - Use a global kse spinlock to protect upcall and thread assignment. The per-process spinlock can not be used because this lock must be acquired via mi_switch() where we already hold a thread lock. The kse spinlock is a leaf lock ordered after the process and thread spinlocks. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:54:27 +00:00
Jeff Roberson	11bda9b8d5	Commit 6/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. - Replace the tail-end of fork_exit() with a scheduler specific routine which can do the appropriate lock manipulations. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:53:34 +00:00
Jeff Roberson	40acdeabab	Commit 5/14 of sched_lock decomposition. - Protect the cp_time tick counts with atomics instead of a global lock. There will only be one atomic per tick and this allows all processors to execute softclock concurrently. - In softclock, protect access to rusage and td_*tick data with the thread_lock(), expanding the scope of the thread lock over the whole function. - Do some creative re-arranging in hardclock() to avoid excess locking. - Protect the p_timer fields with the per-process spinlock. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:53:06 +00:00
Jeff Roberson	a54e85fdbf	Commit 4/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. - Move some common code into thread_suspend_switch() to handle the mechanics of suspending a thread. The locking here is incredibly convoluted and should be simplified. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:52:24 +00:00
Jeff Roberson	2502c107ba	Commit 3/14 of sched_lock decomposition. - Add a per-turnstile spinlock to solve potential priority propagation deadlocks that are possible with thread_lock(). - The turnstile lock order is defined as the exact opposite of the lock order used with the sleep locks they represent. This allows us to walk in reverse order in priority_propagate and this is the only place we wish to multiply acquire turnstile locks. - Use the turnstile_chain lock to protect assigning mutexes to turnstiles. - Change the turnstile interface to pass back turnstile pointers to the consumers. This allows us to reduce some locking and makes it easier to cancel turnstile assignment while the turnstile chain lock is held. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:51:44 +00:00
Jeff Roberson	d72e80f09a	Commit 2/14 of sched_lock decomposition. - Adapt sleepqueues to the new thread_lock() mechanism. - Delay assigning the sleep queue spinlock as the thread lock until after we've checked for signals. It is illegal for a thread to return in mi_switch() with any lock assigned to td_lock other than the scheduler locks. - Change sleepq_catch_signals() to do the switch if necessary to simplify the callers. - Simplify timeout handling now that locking a sleeping thread has the side-effect of locking the sleepqueue. Some previous races are no longer possible. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:50:56 +00:00
Jeff Roberson	7b20fb19fb	Commit 1/14 of sched_lock decomposition. - Move all scheduler locking into the schedulers utilizing a technique similar to solaris's container locking. - A per-process spinlock is now used to protect the queue of threads, thread count, suspension count, p_sflags, and other process related scheduling fields. - The new thread lock is actually a pointer to a spinlock for the container that the thread is currently owned by. The container may be a turnstile, sleepqueue, or run queue. - thread_lock() is now used to protect access to thread related scheduling fields. thread_unlock() unlocks the lock and thread_set_lock() implements the transition from one lock to another. - A new "blocked_lock" is used in cases where it is not safe to hold the actual thread's lock yet we must prevent access to the thread. - sched_throw() and sched_fork_exit() are introduced to allow the schedulers to fix-up locking at these points. - Add some minor infrastructure for optionally exporting scheduler statistics that were invaluable in solving performance problems with this patch. Generally these statistics allow you to differentiate between different causes of context switches. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:50:30 +00:00
Attilio Rao	b4b7081961	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:45:18 +00:00
Attilio Rao	6759608248	Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value. Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussions/work in the next days. Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:38:48 +00:00
David Malone	041b706b2f	Despite several examples in the kernel, the third argument of sysctl_handle_int is not sizeof the int type you want to export. The type must always be an int or an unsigned int. Remove the instances where a sizeof(variable) is passed to stop people accidently cut and pasting these examples. In a few places this was sysctl_handle_int was being used on 64 bit types, which would truncate the value to be exported. In these cases use sysctl_handle_quad to export them and change the format to Q so that sysctl(1) can still print them.	2007-06-04 18:25:08 +00:00
David Malone	df82ff50ed	Add a function for exporting 64 bit types.	2007-06-04 18:14:28 +00:00
Kris Kennaway	cdcc788a7e	Revert some debugging KTRs that were added during development.	2007-06-03 18:24:31 +00:00
Konstantin Belousov	7a31868ed0	Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file: part 2. Convert calls missed in the first big commit. Noted by: rwatson Pointy hat to: kib	2007-06-01 14:33:11 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Paolo Pisati	3401f2c1df	In some particular cases (like in pccard and pccbb), the real device handler is wrapped in a couple of functions - a filter wrapper and an ithread wrapper. In this case (and just in this case), the filter wrapper could ask the system to schedule the ithread and mask the interrupt source if the wrapped handler is composed of just an ithread handler: modify the "old" interrupt code to make it support this situation, while the "new" interrupt code is already ok. Discussed with: jhb	2007-05-31 19:25:35 +00:00
Konstantin Belousov	9e223287c0	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
Robert Watson	049c3b6cdf	Now that sx(9) locks support an interruptible lock acquire primitive, properly observe the SB_NOINTR flag in sblock. This restores the required behavior that lock acquisition be interruptible on the socket buffer I/O serialization lock to allow threads waiting for I/O to be signaled even if they aren't the thread currently holding the I/O lock. With this change, the sblock regression test is again passed. Reported by: alfred sx(9) handiwork: attilio	2007-05-31 11:51:22 +00:00
Attilio Rao	f9819486e5	Add functions sx_xlock_sig() and sx_slock_sig(). These functions are intended to do the same actions of sx_xlock() and sx_slock() but with the difference to perform an interruptible sleep, so that sleep can be interrupted by external events. In order to support these new featueres, some code renstruction is needed, but external API won't be affected at all. Note: use "void" cast for "int" returning functions in order to avoid tools like Coverity prevents to whine. Requested by: rwatson Tested by: rwatson Reviewed by: jhb Approved by: jeff (mentor)	2007-05-31 09:14:48 +00:00
Attilio Rao	2c7289cbfa	style(9) fixes for sx locks. Approved by: jeff (mentor)	2007-05-29 19:46:37 +00:00
Attilio Rao	acf840c4bd	Add a small fix for lock profiling in sx locks. "0" cannot be a correct value since when the function is entered at least one shared holder must be present and since we want the last one "1" is the correct value. Note that lock_profiling for sx locks is far from being perfect. Expect further fixes for that. Approved by: jeff (mentor)	2007-05-29 19:34:32 +00:00
Attilio Rao	02b0a160dc	Fix some problems introduced with the last descriptors tables locking patch: - Do the correct test for ldt allocation - Drop dt_lock just before to call kmem_free (since it acquires blocking locks inside) - Solve a deadlock with smp_rendezvous() where other CPU will wait undefinitively for dt_lock acquisition. - Add dt_lock in the WITNESS list of spinlocks While applying these modifies, change the requirement for user_ldt_free() making that returning without dt_lock held. Tested by: marcus, tegge Reviewed by: tegge Approved by: jeff (mentor)	2007-05-29 18:55:41 +00:00
Robert Watson	03c96c3176	Add DDB "show unpcb" command, allowing DDB to print out many pertinent details from UNIX domain socket protocol layer state.	2007-05-29 12:36:00 +00:00
Ed Maste	911d16b8cd	Revert 1.197 and instead avoid calling kdb_enter() if the KDB_UNATTENDED option is in use.	2007-05-28 21:50:54 +00:00
Warner Losh	cfa7a8beea	Simplify the kernel configuration file return code. Reviewed by: wkoszek	2007-05-28 20:41:10 +00:00
Ed Maste	1e62d77c09	Eliminate explicit kdb_enter in the software watchdog handler (which produced incorrect behaviour with the KDB_UNATTENDED option) and call panic in both the KDB and non-KDB cases. This change is consistent with rwatson's current kdb/ddb work.	2007-05-28 19:51:12 +00:00
Robert Watson	dede2ab3b2	In kern_kevent(), unconditionally fdrop() fp once fget() has succeeded, as we never have an opportunity to set it to NULL. Found with: Coverity Prevent(tm) CID: 2161	2007-05-28 17:15:05 +00:00
Robert Watson	e1e8f51b85	Universally adopt most conventional spelling of acquire.	2007-05-27 20:50:23 +00:00
Robert Watson	87066f04c6	Select a more appealing spelling for the word acquire.	2007-05-27 19:24:00 +00:00
Robert Watson	1c293049d9	Add parens around free in free++ in mbp_count() so that mbp_count() actually works. mbp_count() turns out only to be used in debugging code in if_patm_intr.c, so this bug did not affect much in practice. Found with: Coverity Prevent(tm) CID: 1943	2007-05-27 17:38:36 +00:00
Robert Watson	097e1ea87f	Remove amountpipes counter for pipes -- this replicates the function of existing UMA statistics for pipes, and allows us to get rid of both the per-pipe dtor and two atomic operations per pipe required to maintain the counter.	2007-05-27 17:33:10 +00:00
Robert Watson	e4e80aa713	Remove #if 0'd check for 0-size allocations, which if enabled, called kdb_enter().	2007-05-27 13:13:46 +00:00
Pawel Jakub Dawidek	6e042171bd	To avoid a deadlock when handling .. directory during a lookup, we unlock parent vnode and relock it after locking child vnode. The problem was that we always relock it exclusively, even when it was share-locked. Discussed with: jeff	2007-05-25 22:23:38 +00:00
Pawel Jakub Dawidek	b4c85af977	We no longer need to put namecache entries onto temporary mplist. It was useful in revision 1.86, but should have been removed in 1.89.	2007-05-25 22:19:49 +00:00
Pawel Jakub Dawidek	950afe9972	The cache_leaf_test() function seems to be unused, so remove it.	2007-05-25 22:16:17 +00:00
Sam Leffler	3c86b7cdad	fix comment typo	2007-05-23 17:28:21 +00:00
Robert Watson	4dec0e67ea	Comment that tdsignal() may be entered from the debugger.	2007-05-23 17:27:42 +00:00
Robert Watson	63d69d2592	Initialize time_lock before calling cpu_initclocks(). This corrects a race condition in which hardclock fires before the mutex is initialized leading to a "corrupt spinlock" panic. Submitted by: attilio	2007-05-23 17:27:01 +00:00
Olivier Houchard	302e130edc	Remove duplicate includes. Submitted by: Cyril Nguyen Huu <cyril ci0 org>	2007-05-23 13:36:02 +00:00
Pawel Jakub Dawidek	f013ccb768	- Remove redundant initialization. - Compare pointer with NULL.	2007-05-22 23:05:48 +00:00

1 2 3 4 5 ...

10178 Commits