freebsd-skq

Author	SHA1	Message	Date
Alan Cox	b8831f8d68	Remove some dead code.	2003-04-08 18:24:28 +00:00
Dag-Erling Smørgrav	fe58453891	Introduce an M_ASSERTPKTHDR() macro which performs the very common task of asserting that an mbuf has a packet header. Use it instead of hand- rolled versions wherever applicable. Submitted by: Hiten Pandya <hiten@unixdaemons.com>	2003-04-08 14:25:47 +00:00
Jake Burkholder	a12efae1ea	Merged from kern_thread.c 1.113, avoid a panic in cpu_throw when the first thread of a multithreaded process exits. This unrelated and possibly wrong change was not mentioned in the commit message for kern_thread.c 1.113.	2003-04-08 08:13:47 +00:00
David Xu	36f7b36f8a	Inherit blocked thread's context for upcall thread.	2003-04-08 07:45:56 +00:00
Peter Wemm	67db8b23c3	Search for "elf32 kernel" (and elf64) and "elf32 module" (and elf64) as well as "elf kernel" and "elf module". This is a precursor to x86-64 support in the i386 loader so it can load an elf64 x86-64 kernel.	2003-04-06 05:20:00 +00:00
Alan Cox	0b556837a9	Remove an unnecessary trunc_page() from vmapbuf(). Reviewed by: tegge	2003-04-06 00:40:54 +00:00
Alan Cox	ef38cda165	Don't reinitialize fields that are already initialized by getpbuf().	2003-04-05 23:02:58 +00:00
Alan Cox	cdb06eda66	Sufficient access checks are performed by vmapbuf() that calling useracc() is pointless. Remove the call to useracc() from physio(). Reviewed by: tegge	2003-04-05 21:19:58 +00:00
Alan Cox	06363906bc	o Remove useracc() calls from aio_qphysio(); they are redundant given the checks performed by vmapbuf(). Reviewed by: tegge	2003-04-04 06:26:28 +00:00
Alan Cox	08468b6ad7	o Check the b_bufsize passed to vmapbuf() returning an error if it is invalid. o Remove a debugging printf() from vmapbuf(). Suggested by: tegge	2003-04-04 06:14:54 +00:00
Poul-Henning Kamp	b0fc6220b8	Remove BIO_SETATTR from non-GEOM part of kernel as well.	2003-04-03 19:22:32 +00:00
Jeff Roberson	a8949de20e	- Keep separate statistics and run queues for different scheduling classes. - Treat each class specially in kseq_{choose,add,rem}. Let the rest of the code be less aware of scheduling classes. - Skip the interactivity calculation for non TIMESHARE ksegrps. - Move slice and runq selection into kseq_add(). Uninline it now that it's big.	2003-04-03 00:29:28 +00:00
Peter Wemm	cc66ebe2a9	Commit a partial lazy thread switch mechanism for i386. It isn't as lazy as it could be and can do with some more cleanup. Currently it's under options LAZY_SWITCH. What this does is avoid %cr3 reloads for short context switches that do not involve another user process. ie: we can take an interrupt, switch to a kthread and return to the user without explicitly flushing the tlb. However, this isn't as exciting as it could be, the interrupt overhead is still high and too much blocks on Giant still. There are some debug sysctls, for stats and for an on/off switch. The main problem with doing this has been "what if the process that you're running on exits while we're borrowing its address space?" - in this case we use an IPI to give it a kick when we're about to reclaim the pmap. It's not compiled in unless you add the LAZY_SWITCH option. I want to fix a few more things and get some more feedback before turning it on by default. This is NOT a replacement for Bosko's lazy interrupt stuff. This was more meant for the kthread case, while his was for interrupts. Mine helps a little for interrupts, but his helps a lot more. The stats are enabled with options SWTCH_OPTIM_STATS - this has been a pseudo-option for years, I just added a bunch of stuff to it. One non-trivial change was to select a new thread before calling cpu_switch() in the first place. This allows us to catch the silly case of doing a cpu_switch() to the current process. This happens uncomfortably often. This simplifies a bit of the asm code in cpu_switch (no longer have to call choosethread() in the middle). This has been implemented on i386 and (thanks to jake) sparc64. The others will come soon. This is actually separate from the lazy switch stuff. Glanced at by: jake, jhb	2003-04-02 23:53:30 +00:00
John Baldwin	6751370f6f	Lock the process before sending it a SIGIO. Not doing so is a panic(2) implementation with INVARIANTS.	2003-04-02 21:54:51 +00:00
Jeffrey Hsu	c31548c820	Need to hold the same SMP lock for (knote) list traversal as for list manipulation. This lock also protects read-modify-write operations on the pipe_state field.	2003-04-02 15:24:50 +00:00
Jeff Roberson	5053d272c2	- Make the interactivity calculator decay faster. - Make the pcpu estimator update faster.	2003-04-02 08:22:33 +00:00
Jeff Roberson	98c9b132d1	- I meant divide by two and not shift by two in SCHED_PRI_NHALF.	2003-04-02 08:21:24 +00:00
Jake Burkholder	cef57e7624	- Make casuptr return the old value of the location we're trying to update, and change the umtx code to expect this. Reviewed by: jeff	2003-04-02 08:02:27 +00:00
Jeff Roberson	245f3abfd5	- Add in support for KSEs with 0 slice values on the run queue. If we try to select a KSE with a slice of 0 we will update its slice and insert it onto the next queue. - Pass the KSE instead of the ksegrp into sched_slice(). This more accurately reflects the behavior of the code. Slices are granted to kses. - Add a function kseq_nice_min() which finds the smallest nice value assigned to the kseg of any KSE on the queue. - Rewrite the logic in sched_slice(). Add a large comment describing the new slice selection scheme. To summarize, slices are assigned based on the nice value. Priorities are still calculated based on the nice and interactivity of a process. Slice sizes of 0 may be granted for KSEs whose nice is 20 or further away from the lowest nice on the run queue. Other nice values are scaled across the range [min, min+20]. This fixes ULE's bad behavior with positively niced processes.	2003-04-02 06:46:43 +00:00
Jake Burkholder	fc2fca74d8	- Fix UC_COPY_SIZE. Adding up the size of structure fields doesn't take alignment into account. - Return EJUSTRETURN from set_context on success to avoid clobbering the first 2 out registers with td_retval on sparc64.	2003-04-01 23:25:18 +00:00
Poul-Henning Kamp	817509273e	#include <geom/geom_disk.h>	2003-04-01 19:00:38 +00:00
Poul-Henning Kamp	af6ca7f4a9	Introduce bioq_flush() function.	2003-04-01 12:49:40 +00:00
Jeff Roberson	c9dfa2e08b	- p will be unused in cursig() if INVARIANTS is not defined. Access it through td->td_proc to avoid the unused variable. Spotted by: Maxim Konovalov <maxim@macomnet.ru>	2003-04-01 09:07:36 +00:00
Jeff Roberson	4518589564	- Regen.	2003-04-01 02:34:21 +00:00
Jeff Roberson	8446303e01	- thr_exit() should no longer be called with Giant held.	2003-04-01 02:32:53 +00:00
Jeff Roberson	f27bf63b8a	- Mark the various thr syscalls as MP safe. Previously there was a bug if this was not done since thr_exit() unwinds giant.	2003-04-01 02:32:07 +00:00
Jeff Roberson	2c10d16a4b	- Borrow the KSE single threading code for exec and exit. We use the check if (p->p_numthreads > 1) and not a flag because action is only necessary if there are other threads. The rest of the system has no need to identify thr threaded processes. - In kern_thread.c use thr_exit1() instead of thread_exit() if P_THREADED is not set.	2003-04-01 01:26:20 +00:00
Jeff Roberson	8af830c374	- Regen for umtx.	2003-04-01 01:22:18 +00:00
Jeff Roberson	6eeb9653aa	- Include umtx.h in files generated by makesyscalls.sh - Add system calls for umtx.	2003-04-01 01:12:24 +00:00
Jeff Roberson	69404b5090	- Add an api for doing smp safe locks in userland. - umtx_lock() is defined as an inline in umtx.h. It tries to do an uncontested acquire of a lock which falls back to the _umtx_lock() system-call if that fails. - umtx_unlock() is also an inline which falls back to _umtx_unlock() if the uncontested unlock fails. - Locks are keyed off of the thr_id_t of the currently running thread which is currently just the pointer to the 'struct thread' in kernel. - _umtx_lock() uses the proc pointer to synchronize access to blocked thread queues which are stored in the first blocked thread.	2003-04-01 01:10:42 +00:00
Jeff Roberson	90e38817b7	- We now have to include umtx.h and ucontext.h in the system call related headers.	2003-04-01 00:35:12 +00:00
Jeff Roberson	d4a63cb9c8	- Regen for thr related system calls.	2003-04-01 00:34:29 +00:00
Jeff Roberson	8d5377e538	- Add the four thr related system calls.	2003-04-01 00:31:37 +00:00
Jeff Roberson	89bb1cef1d	- Add two files to support the thr threading interface. - sys/thr.h contains the user space visible api that is intended only for use in threading library packages. - kern/kern_thr.c contains thr system calls and other thr specific code.	2003-04-01 00:30:30 +00:00
Jeff Roberson	722547925e	- Regen for the sigwait system calls.	2003-03-31 23:33:45 +00:00
Jeff Roberson	a447cd8b28	- Define sigwait, sigtimedwait, and sigwaitinfo in terms of kern_sigtimedwait() which is capable of supporting all of their semantics. - These should be POSIX compliant but more careful review is needed before we announce this.	2003-03-31 23:30:41 +00:00
Jeff Roberson	4093529dee	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.	2003-03-31 22:49:17 +00:00
Julian Elischer	0d49bb4b30	Do NOT return from a non-interruptible cv_wait, falsely claiming to have timed out. I don't know what I was thinking..	2003-03-31 22:41:47 +00:00
Jeff Roberson	da33176f39	- Mark signals which may be delivered to any thread in the process with SA_PROC. Signals without this flag should be directed to a particular thread if this is possible.	2003-03-31 22:12:09 +00:00
Jeff Roberson	1bf4700bff	- Change trapsignal() to accept a thread and not a proc. - Change all consumers to pass in a thread. Right now this does not cause any functional changes but it will be important later when signals can be delivered to specific threads.	2003-03-31 22:02:38 +00:00
Alan Cox	7be80f55ba	Recent changes to uipc_cow.c have eliminated the need for some sf_buf- related variables to be global. Make them either local to sf_buf_init() or static.	2003-03-31 06:25:42 +00:00
Poul-Henning Kamp	d2a0822e9d	Retire the "busy" field in bioqueues; it has served its purpose.	2003-03-30 10:16:31 +00:00
Poul-Henning Kamp	d086f85ac4	Preparation commit before I start on the bioqueue lockdown: Collect all the bits of bioqueue handing in subr_disk.c, vfs_bio.c is big enough as it is and disksort already lives in subr_disk.c.	2003-03-30 08:51:23 +00:00
Jeff Roberson	abb0e6da6b	- We are not guaranteed that read ahead blocks are not in memory already. Check for B_DELWRI as well as B_CACHED before issuing io on a buffer. This is especially important since we are changing the b_iocmd.	2003-03-30 02:57:32 +00:00
Alan Cox	9f6d45b1a4	Pass the vm_page's address to sf_buf_alloc(); map the vm_page as part of sf_buf_alloc() instead of expecting sf_buf_alloc()'s caller to map it. The ultimate reason for this change is to enable two optimizations: (1) that there never be more than one sf_buf mapping a vm_page at a time and (2) 64-bit architectures can transparently use their 1-1 virtual to physical mapping (e.g., "K0SEG") avoiding the overhead of pmap_qenter() and pmap_qremove().	2003-03-29 06:14:14 +00:00
Mike Silbersack	55e9f80d76	Add the m_defrag routine, as discussed on committers@. This incarnation should address the concerns of all in the discussion, and keeps statistics which show how much it is used. MFC after: 2 weeks	2003-03-29 05:48:36 +00:00
John Baldwin	16088e4a88	Check for the PS_NEEDSIGCHK flag in the right flags field.	2003-03-28 18:08:57 +00:00
Mike Silbersack	df8c7fc96e	Allow m_dup_pkthdr to accept mbufs with attached clusters as targets. Submitted by: bmilekic	2003-03-28 05:57:48 +00:00
Ian Dowse	6205bf3107	Add a checksum to the kernel message buffer, and update it every time a character is written. Use this at boot time to reject the existing buffer contents if they are corrupt. This fixes a problem seen on some hardware (especially laptops) where the message buffer gets partially corrupted during a short power cycle or reset, but the msgbuf structure is left intact so it gets reused, resulting in random junk and control characters appearing in dmesg and /var/log/messages. PR: kern/28497	2003-03-28 02:50:10 +00:00
Tor Egge	5bbb806004	Add support for reading directly from file to userland buffer when the O_DIRECT descriptor status flag is set and both offset and length are multiples of the physical media sector size.	2003-03-26 23:40:42 +00:00
Tor Egge	6b08046175	Adjust the number of vnodes scanned by vlrureclaim() according to the size of the vnode list.	2003-03-26 22:15:58 +00:00
Robert Watson	f2538508f6	Permit debug.malloc.failure_rate to be specified using a tunable so that the feature can be enabled during the boot process. Note the continued limitation that FreeBSD fails so rapidly with this setting enabled that it's hard to narrow down particular failures for correction; we really need per-malloc type failure rates.	2003-03-26 20:44:29 +00:00
Robert Watson	eae870cdb4	Add a new kernel option, MALLOC_MAKE_FAILURES, which compiles in a debugging feature causing M_NOWAIT allocations to fail at a specified rate. This can be useful for detecting poor handling of M_NOWAIT: the most frequent problems I've bumped into are unconditional dereference of the pointer even though it's NULL, and hangs as a result of a lost event where memory for the event couldn't be allocated. Two sysctls are added: debug.malloc.failure_rate How often to generate a failure: if set to 0 (default), this feature is disabled. Otherwise, the frequency of failures -- I've been using 10 (one in ten mallocs fails), but other popular settings might be much lower or much higher. debug.malloc.failure_count Number of times a coerced malloc failure has occurred as a result of this feature. Useful for tracking what might have happened and whether failures are being generated. Useful possible additions: tying failure rate to malloc type, printfs indicating the thread that experienced the coerced failure. Reviewed by: jeffr, jhb	2003-03-26 20:18:40 +00:00
Tor Egge	128a0bb7e9	fp->f_offset doesn't need any protection when it isn't accessed.	2003-03-26 19:21:12 +00:00
Robert Watson	5e7ce4785f	Modify the mac_init_ipq() MAC Framework entry point to accept an additional flags argument to indicate blocking disposition, and pass in M_NOWAIT from the IP reassembly code to indicate that blocking is not OK when labeling a new IP fragment reassembly queue. This should eliminate some of the WITNESS warnings that have started popping up since fine-grained IP stack locking started going in; if memory allocation fails, the creation of the fragment queue will be aborted. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-03-26 15:12:03 +00:00
John Baldwin	7908a1d477	Remove extraneous check. We are not going to return from copyin/out on the stack of a thread A but actually be thread B instead of thread A.	2003-03-25 20:13:24 +00:00
Matthew N. Dodd	c844066969	Give print_child a default method.	2003-03-25 04:32:52 +00:00
Jake Burkholder	227f9a1c58	- Add vm_paddr_t, a physical address type. This is required for systems where physical addresses are larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long. Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms. Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)	2003-03-25 00:07:06 +00:00
John Baldwin	75b8b3b25c	Replace the at_fork, at_exec, and at_exit functions with the slightly more flexible process_fork, process_exec, and process_exit eventhandlers. This reduces code duplication and also means that I don't have to go duplicate the eventhandler locking three more times for each of at_fork, at_exec, and at_exit. Reviewed by: phk, jake, almost complete silence on arch@	2003-03-24 21:15:35 +00:00
John Baldwin	959d22329a	- Remove witness_dead and just use witness_watch instead. If witness_watch is set to 0, it now has the same effect as setting witness_dead used to have. - Added a sysctl handler that allows root to change witness_watch from a non-zero value to zero to disable witness at runtime. Note that you can't turn witness back on once it is off. You can only turn it off as a one-way switch. - Added a comment describing the possible values of witness_watch.	2003-03-24 21:03:53 +00:00
Maxime Henrion	4974b53e31	Remove a trailing semicolon in SCHED_QUANTUM definition. Luckily this didn't cause any bugs. Spotted by: Samy Al Bahra <samy@kerneled.com>	2003-03-24 15:16:21 +00:00
Olivier Houchard	e2f9a08bb0	s/discriptors/descriptors/	2003-03-23 19:41:34 +00:00
Tim J. Robbins	f949f795aa	Remove unused mtx_lock_giant(), mtx_unlock_giant(), related globals and sysctls.	2003-03-23 11:26:11 +00:00
Yaroslav Tykhiy	17ce5b94d6	We shouldn't assert that a vnode is locked in vop_lock_post() if VOP_LOCK() has failed. Reviewed by: jeff	2003-03-22 13:21:54 +00:00
John Baldwin	b254666064	Use td_ucred of curthread instead of p_ucred of curproc. This required changing sem_perm() and sem_hasopen() to take a thread instead of a proc for the first argument.	2003-03-20 21:12:31 +00:00
Poul-Henning Kamp	cc34e37e5b	Backout the getcwd changes, a more comprehensive effort will be needed.	2003-03-20 10:40:45 +00:00
David Xu	6ce75196ce	Adjust code for userland preemptive. Userland can set a quantum in kse_mailbox to schedule an upcall, this is useful for userland timeout routine, for example pthread_cond_timedwait(). Also extract upcall scheduling code from kse_reassign and create a new function called thread_switchout to contain this code. Reviewed by: julian	2003-03-19 05:49:38 +00:00
Dag-Erling Smørgrav	830c3153c6	Unregisterize, ansify.	2003-03-19 00:49:40 +00:00
Dag-Erling Smørgrav	4e8074eba2	Whitespace cleanup.	2003-03-19 00:33:38 +00:00
Jake Burkholder	84513ca201	long != int. Use SYSCTL_UINT for kern.devstat.generation. Fixes booting on sparc64.	2003-03-18 23:32:27 +00:00
Andrew Gallatin	daa949b66e	Fix a race condition in socow_setup(): The page must be wired before sf_buf_alloc() is called, as sf_buf_alloc() may sleep. If it does sleep, the page might be reclaimed before wiring occurs. Reported by: alc	2003-03-18 18:27:33 +00:00
Poul-Henning Kamp	c967bee755	If devstat_new_entry() is passed a unit number of -1 assume that the devstat is for an "interior" GEOM node and register using the name argument as a geom identity pointer. Do not put these devstat structures on the list returned by the sysctl. This gives us the ability to tell the two kinds of nodes apart and leave the current "strictly physical" view of devstat intact without modifications, yet be able to use devstat for both kinds of devices. It also saves us bloating struct devstat with another 48 bytes of space for the name. At least for now. Reviewed by: ken	2003-03-18 09:30:31 +00:00
Poul-Henning Kamp	224d5539a9	Make devstat fully Giant agnostic: Add a mutex and protect the allocation and traversal of the list with it. When we allocate a page for devstat use we drop the mutex and use M_WAITOK this is not nice, but under the given circumstances the best we can do. In the sysctl handler for returning the devstat entries we do not want to hold the mutex across copyout(9) calls, so we keep a very careful eye on the devstat_generation count, and abandon with EBUSY if it changes under our feet. Specifically test for BIO_WRITE, rather than default non-read,non-deletes as write. Make the default be DEVSTAT_NO_DATA. Add atomic increments of the sequence[01] fields so applications using the mmap'ed view stand a chance of detecting updates in progress. Reviewed by: ken	2003-03-18 09:20:20 +00:00
Poul-Henning Kamp	b4b138c27f	Including <sys/stdint.h> is (almost?) universally only to be able to use %j in printfs, so put a nested include in <sys/systm.h> where the printf prototype lives and save everybody else the trouble.	2003-03-18 08:45:25 +00:00
Poul-Henning Kamp	538aabaad9	Make devstat_new_entry() take a const void * rather than const char * argument, GEOM nodes are not identified by ascii string.	2003-03-18 07:52:59 +00:00
Jeff Roberson	5d952c1b59	- Unlock the target bp and not the pager buf bp in a failure case in cluster_wbuild(). This was causing strange panics that were widely reported on current@. Big Pointy Hat to: jeff	2003-03-17 18:38:49 +00:00
Poul-Henning Kamp	9eaf5abceb	(This commit certainly increases the need for a wash&clean of vfs_cache.c, but I decided that it was important for this patch to not bit-rot, and since it is mainly moving code around, the total amount of entropy is epsilon /phk) This is a patch to move the common parts of linux_getcwd() back into kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use its extended functionality. The linux syscall linux_getcwd() in compat/linux/linux_getcwd.c has been rewritten to use it too. It should be possible to simplify libc's getcwd() after this. No doubt this code needs some cleaning up, since I've left in the sysctl variables I used for debugging. PR: 48169 Submitted by: James Whitwell <abacau@yahoo.com.au>	2003-03-17 12:21:08 +00:00
Poul-Henning Kamp	5fa5746d3d	Add a #define for the device name of the mmap device for devstat. Constify the geom identification pointer.	2003-03-16 23:20:05 +00:00
Alan Cox	42de97a50a	Pass the sf buf to MEXTADD() as the optional argument. This permits the simplification of socow_iodone() and sf_buf_free(); they don't have to reverse engineer the sf buf from the data's address.	2003-03-16 07:19:12 +00:00
Poul-Henning Kamp	d15cd51001	One devstat_start_transaction_bio() is enough.	2003-03-15 22:20:38 +00:00
Poul-Henning Kamp	7194d335cf	Run a revision of the devstat interface: Kernel: Change statistics to use the uptime() timescale (ie: relative to boottime) rather than the UTC aligned timescale. This makes the device statistics code oblivious to clock steps. Change timestamps to bintime format, they are cheaper. Remove the "busy_count", and replace it with two counter fields: "start_count" and "end_count", which are updated in the down and up paths respectively. This removes the locking constraint on devstat. Add a timestamp argument to devstat_start_transaction(), this will normally be a timestamp set by the _bio() function in bp->bio_t0. Use this field to calculate duration of I/O operations. Add two timestamp arguments to devstat_end_transaction(), one is the current time, a NULL pointer means "take timestamp yourself", the other is the timestamp of when this transaction started (see above). Change calculation of busy_time to operate on "the salami principle": Only when we are idle, which we can determine by the start+end counts being identical, do we update the "busy_from" field in the down path. In the up path we accumulate the timeslice in busy_time and update busy_from. Change the byte_* and num_* fields into two arrays: bytes[] and operations[]. Userland: Change the misleading "busy_time" name to be called "snap_time" and make the time long double since that is what most users need anyway, fill it using clock_gettime(CLOCK_MONOTONIC) to put it on the same timescale as the kernel fields. Change devstat_compute_etime() to operate on struct bintime. Remove the version 2 legacy interface: the change to bintime makes compatibility far too expensive. Fix a bug in systat's "vm" page where boot relative busy times would be bogus. Bump __FreeBSD_version to 500107 Review & Collaboration by: ken	2003-03-15 21:59:06 +00:00
Poul-Henning Kamp	9fa85de269	Add a devstat_start_transaction_bio() to match the devstat_end_transaction_bio() we already have. For now it just calls devstat_start_transaction(), but that will change shortly.	2003-03-15 10:33:32 +00:00
David Xu	9a4b78c9da	Export current time when returning from never blocked syscall.	2003-03-14 03:52:16 +00:00
John Baldwin	2d055ab20f	Trim some trailing whitespace.	2003-03-13 23:07:09 +00:00
John Baldwin	75768576cc	Add a new userland-visible ktrace flag KTR_DROP and an internal ktrace flag KTRFAC_DROP to track instances when ktrace events are dropped due to the request pool being exhausted. When a thread tries to post a ktrace event and is unable to due to no available ktrace request objects, it sets KTRFAC_DROP in its process' p_traceflag field. The next trace event to successfully post from that process will set the KTR_DROP flag in the header of the request going out and clear KTRFAC_DROP. The KTR_DROP flag is the high bit in the type field of the ktr_header structure. Older kdump binaries will simply complain about an unknown type when seeing an entry with KTR_DROP set. Note that KTR_DROP being set on a record in a ktrace file does not tell you anything except that at least one event from this process was dropped prior to this event. The user has no way of knowing what types of events were dropped nor how many were dropped. Requested by: phk	2003-03-13 18:31:15 +00:00
John Baldwin	a5881ea55a	- Cache a reference to the credential of the thread that starts a ktrace in struct proc as p_tracecred alongside the current cache of the vnode in p_tracep. This credential is then used for all later ktrace operations on this file rather than using the credential of the current thread at the time of each ktrace event. - Now that we have multiple ktrace-related items in struct proc that are pointers, rename p_tracep to p_tracevp to make it less ambiguous. Requested by: rwatson (1)	2003-03-13 18:24:22 +00:00
Ian Dowse	a80cc4e104	In m_dup_pkthdr(), convert the supplied `how' argument into malloc flags when passing it into m_tag_copy_chain(), as m_tag* functions use malloc, not mbuf flags.	2003-03-13 09:02:19 +00:00
Jeff Roberson	749ffa4ecd	- Add a lock for protecting against msleep(bp, ...) wakeup(bp) races. - Create a new function bdone() which sets B_DONE and calls wakeup(bp). This is suitable for use as b_iodone for buf consumers who are not going through the buf cache. - Create a new function bwait() which waits for the buf to be done at a set priority and with a specific wmesg. - Replace several cases where the above functionality was implemented without locking with the new functions.	2003-03-13 07:31:45 +00:00
Jeff Roberson	e99215a614	- Remove a dead check for bp->b_vp == vp in vtruncbuf(). This has not been possible for some time. - Lock the buf before accessing fields. This should very rarely be locked. - Assert that B_DELWRI is set after we acquire the buf. This should always be the case now.	2003-03-13 07:22:53 +00:00
Jeff Roberson	09f11da5a3	- Remove a race between fsync like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for giant. The cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not. - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps. - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.	2003-03-13 07:19:23 +00:00
Alfred Perlstein	569d3c4bf0	Make sure we actually have a dev before dereferencing in case someone botches and sends us a NULL pointer. The other code in this file seems to expect it to be able to handle it behaving this way.	2003-03-13 06:29:44 +00:00
Jeff Roberson	de950c003c	- Tune down read_max. For single disks we get no gain out of reading more than a MAXPHYS size block ahead. Having this set too high just leaves other processes starved for IO and screws up interactive response. Let the users with RAID set it higher when they need it.	2003-03-13 06:17:59 +00:00
Tim J. Robbins	6ec62361c8	Tidy up previous change: move comment about obtaining an exclusive reference where it belongs, and remove a blank line to make it more obvious what the comment applies to.	2003-03-13 00:57:47 +00:00
Tim J. Robbins	262c27b846	Back out previous. The locking here needs a rethink.	2003-03-13 00:54:53 +00:00
John Baldwin	8510f2a833	- Various little style fixes. - If SYSCTL_OUT() fails in sysctl_kern_proc_args(), return the error instead of ignoring it if we have new arguments for the process. - If the new arguments for a process are too long, return ENOMEM instead of returning success but not doing the actual copy. Submitted by: bde	2003-03-12 20:17:40 +00:00
John Baldwin	4bc6471b53	- Avoid dropping the proc lock around a simple permissions check and just hold it across the check to avoid extra lock operations in the common case. - Copy in the new args to a temporary pargs structure before we drop the reference to the old one. Thus, if the copyin() fails, the process arguments are unchanged rather than being deleted. Also, p_args is no longer NULL during the sysctl operation.	2003-03-12 16:14:55 +00:00
Tim J. Robbins	a7cbe87a5e	Acquire sched_lock around use of FOREACH_KSEGRP_IN_PROC, accesses to kg_nice and calls to sched_nice() in getpriority() and setpriority() (really donice()).	2003-03-12 11:24:41 +00:00
Tim J. Robbins	3890793e9c	In wait1(), remove the zombie process from zombproc before removing it from its pgrp to avoid leaving zombies around with p_pgrp == NULL. This bug was apparent as a NULL-dereference in the pid selection code in fork1().	2003-03-12 11:10:04 +00:00
John Baldwin	2ca9461a05	Trim an extra blank line that snuck into the last commit.	2003-03-11 22:33:42 +00:00
Alexander Kabaev	c162e9c2eb	Rename vfs_stdsync function to vfs_stdnosync which matches more closely what function is really doing. Update all existing consumers to use the new name. Introduce a new vfs_stdsync function, which iterates over mount point's vnodes and call FSYNC on each one of them in turn. Make nwfs and smbfs use this new function instead of rolling their own identical sync implementations. Reviewed by: jeff	2003-03-11 22:15:10 +00:00
John Baldwin	427b3a6549	- Change witness_displaydescendants() to accept the indentation level as a parameter instead of using the level of a given witness. When recursing, pass an indent level of indent + 1. - Make use of the information witness_levelall() provides in witness_display_list() to use an O(n) algorithm instead of an O(n^2) algo to decide which witnesses to display hierarchies from. Basically, we only display a hierarchy for witnesses with a level of 0. - Add a new per-witness flag that is reset at the start of witness_display() for all witnesses and is set the first time a witness is displayed in witness_displaydescendants(). If a witness is encountered more than once in the lock order tree (which happens often), witness_displaydescendants() marks the later occurrences with the string "(already displayed)" and doesn't display the subtree under that witness. This avoids duplicating large amounts of the lock order tree in the 'show witness' output in DDB. All these changes serve to make 'show witness' a lot more readable and useful than it was previously.	2003-03-11 22:14:21 +00:00
John Baldwin	f82c6950be	- Split the itismychild() function into two functions: insertchild() adds a witness to the child list of a parent witness. rebalancetree() runs through the entire tree removing direct descendants of witnesses who already have said child witness as an indirect descendant through another direct descendant. itismychild() now calls insertchild() followed by rebalancetree() and no longer needs the evil hack of having a static recursed variable. - Add a function reparentchildren() that adds all the direct descendants of one witness as direct descendants of another witness. - Change the return value of itismychild() and similar functions so that they return 0 in the case of failure due to lack of resources instead of 1. This makes the return value more intuitive. - Check the return value of itismychild() when defining the static lock order in witness_initialize(). - Don't try to setup a lock instance in witness_lock() if itismychild() fails. Witness is hosed anyways so no need to do any more witness related activity at that point. It also makes the code flow easier to understand. - Add a new depart() function as the opposite of enroll(). When the reference count of a witness drops to 0 in witness_destroy(), this function is called on that witness. First, it runs through the lock order tree using reparentchildren() to reparent direct descendants of the departing witness to each of the witness' parents in the tree. Next, it releases its own child list and other associated resources. Finally it calls rebalancetree() to rebalance the lock order tree. - Sort function prototypes into something closer to alphabetical order. As a result of these changes, there should no longer be 'dead' witnesses in the order tree, and repeatedly loading and unloading a module should no longer exhaust witness of its internal resources. Inspired by: gallatin	2003-03-11 22:07:35 +00:00
John Baldwin	d5b13ee082	Trim useless "../" leading strings from filenames passed into witness.	2003-03-11 21:53:12 +00:00
John Baldwin	28e4d137a2	Adjust style of #ifdef's and #endif's to be more consistent and in line with recent additions to style(9).	2003-03-11 21:38:49 +00:00
John Baldwin	d278a7f9ba	Do the lock order check skip for the LOP_TRYLOCK case after the check for recursing on a lock instead of before. This fixes a bug where WITNESS could get a little confused if you did an sx_tryslock() on a sx lock that you already had an slock on. WITNESS would still function correctly but it could result in weirdness in the output of 'show locks'. This also makes it possible for mtx_trylock() to recurse on a lock.	2003-03-11 20:54:37 +00:00
John Baldwin	ecdf4409f9	Rework the eventhandler locking for hopefully the last time. The scheme used popped into my head during my morning commute a few weeks ago, but it is also very similar (though a bit simpler) to a patch that mini@ developed a while ago. Basically, each eventhandler list has a mutex and a run count. During an eventhandler invocation, the mutex is held while we traverse the list but is dropped while we execute actual handlers. Also, a runcount counter is incremented at the start of an invocation and decremented at the end of an invocation. Adding to the list is not a big deal since the reference of a thread currently executing the handlers remains valid across an add operation. Whether or not new handlers are executed by threads currently executing the handlers for a given list is indeterminate however. The harder case is when a handler is removed from the list. If the runcount is zero, the handler is simply removed from the list directly. If the runcount is not zero, then another thread is currently executing the handlers of this list, so the priority of this handler is set to a magic value (currently -1) to mark it as dead. Dead handlers are not executed during an invocation. If the runcount is zero after it is decremented at the end of an invocation, then a new eventhandler_prune_list() function is called to remove dead handlers from the list. Additional minor notes: - All the common parts of EVENTHANDLER_INVOKE() and EVENTHANDLER_FAST_INVOKE() have been merged into a common _EVENTHANDLER_INVOKE() macro to reduce duplication and ease maintenance. - KTR logging for eventhandlers is now available via the KTR_EVH mask. - The global eventhandler_mutex is no longer recursive. Tested by: scottl (SMP i386)	2003-03-11 20:17:00 +00:00
John Baldwin	75d468ee12	Axe the useless MTX_SLEEPABLE flag. mutexes are not sleepable locks. Nothing used this flag and WITNESS would have panic'd during mtx_init() if anything had.	2003-03-11 20:02:57 +00:00
John Baldwin	740190593a	Use a shorter and less redundant name for the sysctl tree lock.	2003-03-11 20:01:51 +00:00
John Baldwin	c06394f53f	Use the KTR_LOCK mask for logging events via KTR in lockmgr() rather than KTR_LOCKMGR. lockmgr locks are locks just like other locks.	2003-03-11 20:00:37 +00:00
John Baldwin	4c6ffc94c0	Trim leading "../" sequences from filenames.	2003-03-11 19:56:16 +00:00
Jeff Roberson	9ec559555b	- Regularize variable usage in cluster_read(). - Issue the io that we will later block on prior to doing cluster read ahead so that it is more likely to be ready when we block. - Loop issuing clustered reads until we've exhausted the seq count supplied by the file system. - Use a sysctl tunable "vfs.read_max" to determine the maximum number of blocks that we'll read ahead.	2003-03-11 06:14:03 +00:00
David Xu	661db6da35	Lock proc lock before changing p_flag.	2003-03-11 03:16:02 +00:00
David Xu	21e0492ab1	Fix signal delivering bug for threaded process.	2003-03-11 02:59:50 +00:00
David Xu	e574e444e0	Fix threaded process job control bug. SMP tested. Reviewed by: julian	2003-03-11 00:07:53 +00:00
Alexander Kabaev	72f0679cfa	Remove trailing whitespace.	2003-03-10 21:55:00 +00:00
Poul-Henning Kamp	194a0abf73	PHCC[1]: I had commented the #ifdef INVARIANTS checks out to make sure I ran this code in all kernels and forgot to comment the #ifdefs back in before I committed. Spotted by: bmilekic [1] PHCC = Pointy Hat Correction Commit	2003-03-10 20:24:54 +00:00
Poul-Henning Kamp	d3c11994e1	Make malloc and mbuf allocation mode flags nonoverlapping. Under INVARIANTS whine if we get incompatible flags. Submitted by: imp	2003-03-10 19:39:53 +00:00
John Baldwin	0e8677f68b	Now that we have WITNESS_WARN(), we only call witness_list() from the ddb 'show locks' command. Thus, move witness_list() to the #ifdef DDB section and remove extra checks for calling this function outside of DDB. Also, witness_list() now returns void instead of returning an int. Reported by: Steve Ames <steve@energistic.com> Prodded by: davidxu	2003-03-10 17:03:57 +00:00
Poul-Henning Kamp	45901e280b	Don't call make_dev() before we are ready for it.	2003-03-09 20:42:49 +00:00
Alan Cox	167b972088	Remove some unnecessary actions by the zero-copy setup and teardown code. Remove an incorrect comment. (Incrementing an object's reference count does not prevent a process from exiting. The real concern here is that the physical page must not be deleted until transmission is complete. That is already handled by the VM system and sf_buf_free().) Tested by: ken	2003-03-09 20:38:56 +00:00
Poul-Henning Kamp	d42ee4e410	Note that MAJOR_AUTO is now the default if d_maj is not initialized. This is more robust and prevents the hijacking of /dev/console for the typical mistake. Remove unneeded MAJOR_AUTO uses, it is only needed explicitly now if the driver source has cross-branch compatibility to old releases.	2003-03-09 11:03:45 +00:00
Poul-Henning Kamp	06a8bb906c	Add one little hack to allow us to make MAJOR_AUTO be zero: Let the console driver ask for major 256 and magically change this to mean zero.	2003-03-09 10:28:05 +00:00
David Xu	d03c79eea1	Cosmetic change, make it QUEUE_MACRO_DEBUG friendly	2003-03-09 04:27:46 +00:00
Tim J. Robbins	ef3dab76bf	Hold the proc lock while accessing p_procsig in trapsignal().	2003-03-09 01:40:55 +00:00
Poul-Henning Kamp	f37de12275	Retire devstat_add_entry() as a public function and bump __FreeBSD_version to mark this act.	2003-03-08 21:46:43 +00:00
Poul-Henning Kamp	c7e73d59c4	Introduce a device driver for /dev/devstat, this will allow us to mmap the device statistics structures into userland instead of using sysctl. Introduce new devstat_new_entry() function which allocates the devstat structure and calls devstat_add_entry() on it.	2003-03-08 19:58:57 +00:00
Kenneth D. Merry	9b80d344ec	Zero copy send and receive fixes: - On receive, vm_map_lookup() needs to trigger the creation of a shadow object. To make that happen, call vm_map_lookup() with PROT_WRITE instead of PROT_READ in vm_pgmoveco(). - On send, a shadow object will be created by the vm_map_lookup() in vm_fault(), but vm_page_cowfault() will delete the original page from the backing object rather than simply letting the legacy COW mechanism take over. In other words, the new page should be added to the shadow object rather than replacing the old page in the backing object. (i.e. vm_page_cowfault() should not be called in this case.) We accomplish this by making sure fs.object == fs.first_object before calling vm_page_cowfault() in vm_fault(). Submitted by: gallatin, alc Tested by: ken	2003-03-08 06:58:18 +00:00
David Xu	b4508d7d3f	Lock sched_lock before modifying td_flags.	2003-03-08 04:09:04 +00:00
Rob Braun	d132c84f07	Fix a spelling error. Submitted by: jkh Reviewed by: zarzycki	2003-03-07 22:47:32 +00:00
John Baldwin	9722121a3c	Respect any passed in external lockmgr flags such as LK_NOWAIT in the default implementations of VOP_LOCK() and VOP_UNLOCK(). Tested by: jlemon, phk Glanced at by: jeffr	2003-03-07 20:45:07 +00:00
John Baldwin	9da590b49b	Oops, fix the double faults people were seeing with the recent changes to witness. Sleepable locks such as sx locks always come before all mutexes including Giant. However, the static lock order list placed Giant before the proctree and allproc sx locks. This resulted in witness creating a cycle in its lock order "tree" (real trees don't have cycles) leading to infinite recursion and eventually a double fault. To fix, put Giant after sx locks in the lock order list.	2003-03-06 17:25:06 +00:00
Alan Cox	7c4351aabd	Remove GIANT_REQUIRED from sf_buf_free().	2003-03-06 04:48:19 +00:00
Robert Watson	9283578946	Instrument sysarch() MD privileged I/O access interfaces with a MAC check, mac_check_sysarch_ioperm(), permitting MAC security policy modules to control access to these interfaces. Currently, they protect access to IOPL on i386, and setting HAE on Alpha. Additional checks might be required on other platforms to prevent bypass of kernel security protections by unauthorized processes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-03-06 04:47:47 +00:00
Alan Cox	09c80124a3	Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress. Discussed on: arch@	2003-03-06 03:41:02 +00:00
Robert Watson	1b2c2ab29a	Provide a mac_check_system_swapoff() entry point, which permits MAC modules to authorize disabling of swap against a particular vnode. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-03-05 23:50:15 +00:00
Robert Watson	a184d471e2	Move the initialization of the vattr flags field in setfflags() to before the MAC check so that we pass the flags field into the MAC check properly initialized. This didn't affect any current MAC modules since they didn't care what the flags argument was (as they were primarily interested in the fact that it was a meta-data write, not the contents of the write), but would be relevant to future modules relying on that field. Submitted by: Mike Halderman <mrh@spawar.navy.mil> Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-03-05 23:15:23 +00:00
Peter Wemm	3c6b084e96	Finish driving a stake through the heart of netns and the associated ifdefs scattered around the place - it's dead Jim! The SMB stuff had stolen AF_NS, make it official.	2003-03-05 19:24:24 +00:00
David Schultz	9c62b3ee7c	Make TTYHOG tunable. Reviewed by: mike (mentor)	2003-03-05 08:16:29 +00:00
Jonathan Lemon	1cafed3941	Update netisr handling; Each SWI now registers its queue, and all queue drain routines are done by swi_net, which allows for better queue control at some future point. Packets may also be directly dispatched to a netisr instead of queued, this may be of interest at some installations, but currently defaults to off. Reviewed by: hsu, silby, jayanth, sam Sponsored by: DARPA, NAI Labs	2003-03-04 23:19:55 +00:00
John Baldwin	c141c242ac	Bah, fix a bogon in the last commit: get the sense of a compare test right so that we allow a sleepable lock to be acquired with Giant held rather than allowing a sleepable lock to be acquired with anything but Giant held.	2003-03-04 22:34:07 +00:00
Jeff Roberson	24deed1aaa	- Hold the buf lock while manipulating and inspecting its fields. - Use gbincore() and not incore() so that we can drop the vnode interlock as we acquire the buflock. - Use GB_LOCK_NOWAIT when getting bufs for read ahead clusters so that we don't block on locked bufs. - Convert a while loop to a howmany() that will most likely be faster on modern processors. There is another while loop divide that was left nearby because it is operating on a 64bit int and is most likely faster. - Cleanup the cluster_read() code a little to get rid of a goto and make the logic clearer. Tested on: x86, alpha Tested by: Steve Kargl <sgk@troutmask.apl.washington.edu> Reviewed by: arch	2003-03-04 21:35:28 +00:00
John Baldwin	1106937d99	Remove safety belt: it is now ok to do a mtx_trylock() on a mutex you already own. The mtx_trylock() will fail however. Enhance the comment at the top of the try lock function to explain this. Requested by: jlemon and his evil netisr locking	2003-03-04 21:32:25 +00:00
John Baldwin	263067951a	Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().	2003-03-04 21:03:05 +00:00
John Baldwin	9b4982bfed	Add a WITNESS_WARN() call to verify that we hold no locks after running a handler from an interrupt thread.	2003-03-04 21:01:42 +00:00
John Baldwin	35580ede37	A small overhaul of witness: - Add a comment about special lock order rules and Giant near the top of subr_witness.c. Specifically, this documents and explains the real lock order relationship between Giant and sleepable locks (i.e. lockmgr locks and sx locks). Basically, Giant can be safely acquired either before or after sleepable locks and the case of Giant before a sleepable lock is exempted as a special case. - Add a new static function 'witness_list_lock()' that displays a single line of information about a struct lock_instance. This is used to make the output of witness messages more consistent and reduce some code duplication. - Fixup a few comments in witness_lock(). - Properly handle the Giant-before-sleepable-lock lock order exception in a more general fashion and remove the no longer needed LI_SLEPT flag. - Break up the last condition before assuming a reversal a bit to try and make the logic less confusing in witness_lock(). - Axe WITNESS_SLEEP() now that LI_SLEPT is no longer needed and replace it with a more general WITNESS_WARN() macro/function combination. WITNESS_WARN() allows you to output a customized message out to the console along with a list of held locks. It will optionally drop into the debugger as well. You can exempt a single lock from the check by passing it in as the second argument. You can also use flags to specify if Giant should be exempt from the check, if all sleepable locks should be exempt from the check, and if witness should panic if any non-exempt locks are found. - Make the witness_list() function static. Other areas of the kernel should use the new WITNESS_WARN() instead.	2003-03-04 20:56:39 +00:00
John Baldwin	5fa8dd90f9	Miscellaneous cleanups to _mtx_lock_sleep(): - Declare some local variables at the top of the function instead of in a nested block. - Use mtx_owned() instead of masking off bits from mtx_lock manually. - Read the value of mtx_lock into 'v' as a separate line rather than inside an if statement for clarity. This code is hairy enough as it is.	2003-03-04 20:32:41 +00:00
John Baldwin	6b869595c5	Properly assert that mtx_trylock() is not called on a mutex we already owned. Previously the KASSERT would only trigger if we successfully acquired a lock that we already held. However, _obtain_lock() fails to acquire locks that we already hold, so the KASSERT was never checked in the case it was supposed to fail.	2003-03-04 20:30:30 +00:00
Jeff Roberson	e1f89c222b	- Create a function sched_interact_score() which decides on the interactivity of a kseg and assigns it a value of 0 through 100. - Use sched_interact_score() to determine the dynamic priority. - Define SCHED_CURR() in terms of sched_interact_score(). - Adjust the maximum slice back down to 100ms. - Remove redundant clearing of ke_runq in sched_wakeup() - Clean up #defines and comment them.	2003-03-04 02:45:59 +00:00
Jeff Roberson	7261f5f68e	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases where we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviewed by: arch Not objected to by: mckusick	2003-03-04 00:04:44 +00:00
Jeff Roberson	f727171140	- Correct the wchan in vop_stdfsync() This is almost what bde asked for. There is some desire to have per fs wchans still but that is difficult given the current arrangement of the code.	2003-03-03 23:37:50 +00:00
Ruslan Ermilov	6de61153e8	FreeBSD 5.0 stopped shipping /modules 2.5 years ago. Catch up with this further by excluding /modules from the (default) kern.module_path.	2003-03-03 22:53:35 +00:00
Nate Lawson	7dc9111650	Pick up one file missed in the previous vprint() cleanup	2003-03-03 19:50:36 +00:00
Nate Lawson	99648386d3	Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead defering to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.	2003-03-03 19:15:40 +00:00
Poul-Henning Kamp	182a9f7455	Make nokqfilter() return the correct return value. Ditch the D_KQFILTER flag which was used to prevent calling NULL pointers.	2003-03-03 16:24:47 +00:00
Poul-Henning Kamp	7ac40f5f59	Gigacommit to improve device-driver source compatibility between branches: Initialize struct cdevsw using C99 sparse initialization and remove all initializations to default values. This patch is automatically generated and has been tested by compiling LINT with all the fields in struct cdevsw in reverse order on alpha, sparc64 and i386. Approved by: re(scottl)	2003-03-03 12:15:54 +00:00
Poul-Henning Kamp	a9463ba804	Don't pick up a name from the dev_t if it is not there.	2003-03-03 11:14:36 +00:00
Jeff Roberson	65c8760dbf	- Shift the tick count by 10 and back around sched_pctcpu_update() calculations. Keep this change local to the function so the tick count is in its natural form otherwise. Previously 1000 was added each time a tick fired and we divided by 1000 when it was reported. This is done to reduce rounding errors.	2003-03-03 05:29:09 +00:00
Jeff Roberson	a6ed41865b	- In sched_add() special case PRI_TIMESHARE and PRI_ITHD|PRI_REALTIME. We always place ITHD & REALTIME threads on the current queue of the current cpu. Prior to this change an interrupt thread would only ever run on one cpu.	2003-03-03 04:28:07 +00:00
Jeff Roberson	f1e8dc4a3b	- Refrain from setting the td_priority in sched_wakeup(). It will be reset before we return to user space.	2003-03-03 04:11:40 +00:00
Poul-Henning Kamp	f16304aaf0	Explicitly initialize all cdevsw methods with the relevant nofoo() function if they are NULL.	2003-03-02 19:46:45 +00:00
Dag-Erling Smørgrav	521f364b80	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).	2003-03-02 16:54:40 +00:00
Dag-Erling Smørgrav	8994a245e0	Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.	2003-03-02 15:56:49 +00:00
Dag-Erling Smørgrav	c952458814	uiomove-related caddr_t -> void * (just the low-hanging fruit)	2003-03-02 15:50:23 +00:00
Dag-Erling Smørgrav	d5279f20c5	Convert one of our main caddr_t consumers, uiomove(9), to void *.	2003-03-02 15:29:13 +00:00
Dag-Erling Smørgrav	34ca14c687	Clean up whitespace, unregisterize, ANSIfy, remove prototypes made superfluous by ANSIfication.	2003-03-02 15:08:33 +00:00
Poul-Henning Kamp	9c486c30e2	NO_GEOM cleanup: Remove cdevsw->d_size() implementation. No longer needed.	2003-03-02 14:43:46 +00:00
Poul-Henning Kamp	9285a87efd	NODEVFS cleanup: Replace devfs_{create,destroy} hooks with direct function calls.	2003-03-02 13:35:30 +00:00
Jeff Roberson	491081fabf	- Hold the vnode interlock across calls to bgetvp instead of acquiring it internally. This is required to stop multiple bufs from being associated with a single lblkno.	2003-03-02 06:05:23 +00:00
Tor Egge	c6faf3bf1d	Remove unneeded code added in revision 1.188.	2003-03-01 17:18:28 +00:00
Jeff Roberson	bff5362bf2	- gc USE_BUFHASH. The smp locking of the buf cache renders this useless.	2003-03-01 05:55:03 +00:00
David Xu	9948c47f0e	Check kse group limit before linking new ksegrp.	2003-02-28 15:57:33 +00:00
Poul-Henning Kamp	85f19dccdb	Add the flip-side check: If a driver wants a particular major#, make sure it is marked as allocated in reserved_majors[]. Whine if it wasn't.	2003-02-27 15:17:37 +00:00
Maxime Henrion	bca0668a92	We can now properly return ENODEV in nommap(), so do it. Remove the now wrong comment which says we can't.	2003-02-27 14:48:53 +00:00
Poul-Henning Kamp	beea48b254	Add support for allocating a device driver major number on demand. To do this, initialize the d_maj member of the cdevsw to MAJOR_AUTO. When the cdevsw is first passed to make_dev() a free major number will be assigned. Until we have a bit more experience with this a printf will announce this fact. Major numbers are not reclaimed, so loading/unloading the same device driver which uses MAJOR_AUTO will eventually deplete the pool of free major numbers and the system will panic when it can not allocate one. Still undecided who to inconvenience with the solution to this.	2003-02-27 14:46:51 +00:00
Hartmut Brandt	b89bc9e62b	When a process has been waiting on a condition variable or mutex the td_wmesg field in the thread structure points to the description string of the condition variable or mutex. If the condvar or the mutex had been initialized from a loadable module that was unloaded in the meantime, td_wmesg may now point to invalid memory. Retrieving the process table now may panic the kernel (or access junk). Setting the td_wmesg field to NULL after unblocking on the condvar/mutex prevents this panic. PR: kern/47408 Approved by: jake (mentor)	2003-02-27 08:43:27 +00:00
Poul-Henning Kamp	f477b4fd53	NODEVFS cleanup: Remove cdevsw_add() and cdevsw_remove(), they served us well for a long time. Bump __FreeBSD_version to 500104 to mark this.	2003-02-27 07:40:44 +00:00
David Xu	3b3df40fc4	Release sched_lock before calling upcall_free.	2003-02-27 05:42:01 +00:00
Julian Elischer	ac2e415327	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.	2003-02-27 02:05:19 +00:00
Sam Leffler	893bec8059	o fix ppsratecheck to interpret a maxpps of zero as "ignore everything" o add a comment explaining the significance of using 0 or -1 (actually any negative value) for maxpps	2003-02-26 17:16:38 +00:00
David Xu	426269b2c2	Fix a bug when handling SIGCONT. Reported By: Mike Makonnen <mtm@identd.net>	2003-02-26 12:47:46 +00:00
Scott Long	7874f606d5	Introduce a new taskqueue that runs completely free of Giant, and in turn runs its tasks free of Giant too. It is intended that as drivers become locked down, they will move out of the old, Giant-bound taskqueue and into this new one. The old taskqueue has been renamed to taskqueue_swi_giant, and the new one keeps the name taskqueue_swi.	2003-02-26 03:15:42 +00:00
David Xu	5614648e5e	Add a missing '!'.	2003-02-26 01:56:14 +00:00
David Xu	4b4866ed42	Add a simple facility to allow round robin in userland. Reviewed by: julian	2003-02-26 00:58:23 +00:00
Kirk McKusick	7e734c4149	When doing cleanup of excessive buffers in bdwrite (see kern/vfs_bio.c delta 1.371) we must ensure that we do not get ourselves into a recursive trap endlessly trying to clean up after ourselves. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-25 23:59:09 +00:00
Mike Makonnen	0bd5f7979d	Unbreak mutex profiling (at least for me). o Always check for null when dereferencing the filename component. o Implement a try-and-backoff method for allocating memory to dump stats to avoid a spin-lock -> sleep-lock mutex lock order panic with WITNESS. Approved by: des, markm (mentor) Not objected: jhb	2003-02-25 22:28:46 +00:00
Jeff Roberson	2e3981a70c	- Add the missing NULL interlock argument to a recently added BUF_LOCK.	2003-02-25 08:23:11 +00:00
Kirk McKusick	3a7053cb60	Prevent large files from monopolizing the system buffers. Keep track of the number of dirty buffers held by a vnode. When a bdwrite is done on a buffer, check the existing number of dirty buffers associated with its vnode. If the number rises above vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one of the other (hopefully older) dirty buffers associated with the vnode is written (using bawrite). In the event that this approach fails to curb the growth in the vnode's number of dirty buffers (due to soft updates rollback dependencies), the more drastic approach of doing a VOP_FSYNC on the vnode is used. This code primarily affects very large and actively written files such as snapshots. This change should eliminate hanging when taking snapshots or doing background fsck on very large filesystems. Hopefully, one day it will be possible to cache filesystem metadata in the VM cache as is done with file data. As it stands, only the buffer cache can be used which limits total metadata storage to about 20Mb no matter how much memory is available on the system. This rather small memory gets badly thrashed causing a lot of extra I/O. For example, taking a snapshot of a 1Tb filesystem minimally requires about 35,000 write operations, but because of the cache thrashing (we only have about 350 buffers at our disposal) ends up doing about 237,540 I/O's thus taking twenty-five minutes instead of four if it could run entirely in the cache. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-25 06:44:42 +00:00
David Xu	d4b570f053	Remove a bogus comment.	2003-02-25 05:17:18 +00:00
David Xu	768298d8c4	Remove a never true condition.	2003-02-25 05:14:18 +00:00
Jeff Roberson	17661e5ac4	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick	2003-02-25 03:37:48 +00:00
Maxime Henrion	07159f9c56	Cleanup of the d_mmap_t interface. - Get rid of the useless atop() / pmap_phys_address() detour. The device mmap handlers must now give back the physical address without atop()'ing it. - Don't borrow the physical address of the mapping in the returned int. Now we properly pass a vm_offset_t * and expect it to be filled by the mmap handler when the mapping was successful. The mmap handler must now return 0 when successful, any other value is considered as an error. Previously, returning -1 was the only way to fail. This change thus accidentally fixes some devices which were bogusly returning errno constants which would have been considered as addresses by the device pager. - Garbage collect the poorly named pmap_phys_address() now that it's no longer used. - Convert all the d_mmap_t consumers to the new API. I'm still not sure whether we need a __FreeBSD_version bump for this, since we didn't guarantee API/ABI stability until 5.1-RELEASE. Discussed with: alc, phk, jake Reviewed by: peter Compile-tested on: LINT (i386), GENERIC (alpha and sparc64) Runtime-tested on: i386	2003-02-25 03:21:22 +00:00
Scott Long	3303c14b57	Don't NULL out p_fd until after closefd() has been called. This isn't totally correct, but it has caused breakage for too long. I welcome someone with more fd fu to fix it correctly.	2003-02-24 05:46:55 +00:00
David Xu	0fccb684d1	Remove a XXXKSE. kg_completed now needs proc lock.	2003-02-24 01:28:10 +00:00
David Xu	f5878f69df	Backout last surplus commit. That day just wasn't my day.	2003-02-24 00:49:55 +00:00
Tor Egge	6a07a13944	Sync new socket nonblocking/async state with file flags in accept(). PR: 1775 Reviewed by: mbr	2003-02-23 23:00:28 +00:00
Poul-Henning Kamp	acb18acfec	Bracket the kern.vnode sysctl in #ifdef notyet because it results in massive locking issues on diskless systems. It is also not clear that this sysctl is non-dangerous in its requirements for locked down memory on large RAM systems.	2003-02-23 18:09:05 +00:00
Poul-Henning Kamp	5cb3dc8fa3	OK, I was too sleepy there... Pointy hat over here!	2003-02-23 13:45:55 +00:00
Poul-Henning Kamp	8f5ef1a9fa	Implement CLOCK_MONOTONIC.	2003-02-23 10:18:31 +00:00
Jake Burkholder	fc718df7d0	Add a /a modifier to the show ktr ddb command, which prints the whole trace buffer without stopping. Useful if you just want to capture the output but can't run ktrdump.	2003-02-22 23:30:37 +00:00
Robert Watson	90623e1a9e	Don't panic when enumerating SYSCTL_NODE() nodes without any children nodes. Submitted by: green, Hiten Pandya <hiten@unixdaemons.com>	2003-02-22 17:58:06 +00:00
Mike Makonnen	750a91d8b1	Remove a comment which hasn't been true since rev. 1.158 Approved by: jhb, markm (mentor)(implicit)	2003-02-22 05:59:48 +00:00
Robert Watson	838a6d03e8	Export the name of the device used to mount the root file system as kern.rootdev. If rootdev is undefined (NFS mount, etc), export an empty string. Desired by: peter	2003-02-22 05:01:12 +00:00
Peter Wemm	86bb731626	Missing M_TRYWAIT from so_upcall third argument.	2003-02-21 22:23:40 +00:00
Poul-Henning Kamp	2c6b49f6af	NO_GEOM cleanup: Retire the "d_dump_t" and use the "dumper_t" type instead. Dumper_t takes a void * as first arg which is more general than the dev_t taken by d_dump_t. (Remember: we could have net-dumpers if somebody wrote us one!) Define the convention for GEOM controlled disk devices to be that the first argument to the dumper function is the struct disk pointer. Change device drivers accordingly.	2003-02-21 19:00:48 +00:00
David Xu	34ada4b3bb	If UTS kernel is calling kse_wakeup for itself, do nothing.	2003-02-21 07:11:38 +00:00
Poul-Henning Kamp	263444cfbf	Change the console interface to pass a "struct consdev *" instead of a dev_t to the method functions. The dev_t can still be found at struct consdev *->cn_dev. Add a void *cn_arg element to struct consdev which the drivers can use for retrieving their softc.	2003-02-20 20:54:45 +00:00
Poul-Henning Kamp	02574b19e1	Add a dead_cdevsw which does its best to return ENXIO if at all possible. In devsw() return dead_cdevsw instead of NULL in case the dev_t does not have a si_devsw. This may improve our survival chances with devices which go away unexpectedly.	2003-02-20 15:35:54 +00:00
David Xu	ab7d94f7eb	Forgot to set KU_DOUPCALL in kse_wakeup.	2003-02-20 08:22:04 +00:00
David Xu	eb117d5cb0	Add a timeout parameter to kse_release.	2003-02-20 08:18:15 +00:00
Bosko Milekic	025b4be197	o Allow "buckets" in mb_alloc to be differently sized (according to compile-time constants). That is, a "bucket" now is not necessarily a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ worth of mbufs, clusters. o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce {mbuf,clust}_lowm, which currently has no effect but will be used to set the low watermarks. o Fix netstat so that it can deal with the differently-sized buckets and teach it about the low watermarks too. o Make sure the per-cpu stats for an absent CPU has mb_active set to 0, explicitly. o Get rid of the allocate refcounts from mbuf map mess. Instead, just malloc() the refcounts in one shot from mbuf_init() o Clean up / update comments in subr_mbuf.c	2003-02-20 04:26:58 +00:00
Tim J. Robbins	27e39ae4d8	Remove the PL_SHAREMOD flag from struct plimit, which could have been used to share resource limits between rfork threads, but never was. Removing it makes resource limit locking much simpler -- only the current process can change the contents of the structure that p_limit points to.	2003-02-20 04:18:42 +00:00
Olivier Houchard	d6bf23783f	Remove duplicate includes. Submitted by: Cyril Nguyen-Huu <cyril@ci0.org>	2003-02-20 03:26:11 +00:00
Bosko Milekic	ec73437395	Fix a serious bug when computing the index for the reference counter array for mbuf clusters. I don't know how this got past early testing nor how it survived so long without getting caught. If anyone was seeing really really bizarre memory corruption in a few mbufs this would be why.	2003-02-20 03:01:04 +00:00
David Xu	a87891ee9e	Move thread limits testing code up a bit. This lets an UPCALLING thread take away any accumulated contexts.	2003-02-20 01:11:17 +00:00
Poul-Henning Kamp	0c977c9c53	Add M_WAITOK	2003-02-19 22:51:33 +00:00
David Xu	fc8cdd87d2	Count non-threaded group.	2003-02-19 13:40:24 +00:00
David Xu	4f6cfa4520	Update comments to reflect new KSE code.	2003-02-19 13:36:51 +00:00
Tim J. Robbins	a44a414e11	The "m = m->m_next" that was removed in the revision 1.12 was necessary for the m->m_next != NULL case to avoid looping infinitely when the first mbuf in the chain becomes full.	2003-02-19 10:12:42 +00:00
David Xu	30621e142d	M_WAITOK and remove an useless comment.	2003-02-19 09:59:12 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
David Xu	0252d20369	Optimize the case when max threads number was hit.	2003-02-19 04:01:55 +00:00
Peter Wemm	af3d516f55	Initiate de-orbit burn for USE_PCI_BIOS_FOR_READ_WRITE. This has been #if'ed out for a while. Complete the deed and tidy up some other bits. We need to be able to call this stuff from outer edges of interrupt handlers for devices that have the ISR bits in pci config space. Making the bios code mpsafe was just too hairy. We had also stubbed it out some time ago due to there simply being too much brokenness in too many systems. This adds a leaf lock so that it is safe to use pci_read_config() and pci_write_config() from interrupt handlers. We still will use pcibios to do interrupt routing if there is no acpi.. [yes, I tested this] Briefly glanced at by: imp	2003-02-18 03:36:49 +00:00
David Xu	88aba94cdc	Further fix PS_NEEDSIGCHK	2003-02-17 14:54:57 +00:00
David Xu	02bbffaf3c	Move code for detecting PS_NEEDSIGCHK into thread_schedule_upcall, I think it is a better place to handle it.	2003-02-17 14:41:22 +00:00
Tim J. Robbins	96d7f8ef46	Use the proc lock to protect p_realtimer instead of Giant, and obtain sched_lock around accesses to p_stats->p_timer[] to avoid a potential race with hardclock. getitimer(), setitimer() and the realitexpire() callout are now Giant-free.	2003-02-17 10:03:02 +00:00
Jeff Roberson	58a3c27384	- Add a new function, thread_signal_add(), that is called from postsig to add a signal to a mailbox's pending set. - Add a new function, thread_signal_upcall(), this causes the current thread to upcall so that we can deliver pending signals. Reviewed by: mini	2003-02-17 09:58:11 +00:00
Julian Elischer	4a338afd7a	Move a bunch of flags from the KSE to the thread. I was in two minds as to where to put them in the first case.. I should have listenned to the other mind. Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@	2003-02-17 09:55:10 +00:00
Jeff Roberson	5215b1872f	- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much the KSE code. Submitted by: davidxu	2003-02-17 05:14:26 +00:00
Jeff Roberson	e4625663c9	- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into the proc. These counters are only examined through calcru. Submitted by: davidxu Tested on: x86, alpha, UP/SMP	2003-02-17 02:19:58 +00:00
Alfred Perlstein	9d4156aed3	Fix logic in loop so it actually executes. Pointed out by: fjoe	2003-02-16 16:12:10 +00:00
Poul-Henning Kamp	f341ca9891	Remove #include <sys/dkstat.h>	2003-02-16 14:13:23 +00:00
Poul-Henning Kamp	3abd4ccf87	Move the tty related statistics counters to live with the tty code.	2003-02-16 13:22:15 +00:00
Jeff Roberson	71146186a1	- Introduce a new function bremfreel() that does a bremfree with the buf queue lock already held. - In getblk() and flushbufqueues() use bremfreel() while we still have the buf queue lock held to keep the lists consistent. - Add LK_NOWAIT to two cases where we're essentially asserting that the bufs are not locked while acquiring the locks. This will make sure that we get the appropriate panic() and not another one for sleeping with a lock held.	2003-02-16 10:43:06 +00:00
Jeff Roberson	5e8feb5bed	- Add a WITNESS_SLEEP() for the appropriate cases in lockmgr().	2003-02-16 10:39:49 +00:00
Alfred Perlstein	5015c68a3c	prevent overflow in shminfo.shmmax	2003-02-16 06:08:55 +00:00
Jeffrey Hsu	a44009e07d	Remove extraneous FILEDESC_LOCK around atomic read.	2003-02-16 02:15:15 +00:00
Andrew R. Reiter	1f5a94d5f6	- Update a couple of comments to make sense with what today's code is doing (stale comments make arr something something ;)).	2003-02-15 23:25:12 +00:00
Tor Egge	218a01e062	Avoid file lock leakage when linuxthreads port or rfork is used: - Mark the process leader as having an advisory lock - Check if process leader is marked as having advisory lock when closing file - Check that file is still open after lock has been obtained - Don't allow file descriptor table sharing between processes with different leaders PR: 10265 Reviewed by: alfred	2003-02-15 22:43:05 +00:00
Andrew R. Reiter	da8f0c8429	- Remove old comment for PURGE() as it no longer exists and implied it was a comment to cache_zap(). - Add a comment to quickly state what cache_zap() does. Reviewed by: phk, mux	2003-02-15 18:58:06 +00:00
Tim J. Robbins	4444375710	Acquire Giant around calls to kern_sigaction() in sigaction(), freebsd4_sigaction() and osigaction() instead of around the whole body of those functions. They now no longer hold Giant around calls to copyin() and copyout(), and it is slightly more obvious what Giant is protecting.	2003-02-15 09:56:09 +00:00
Tim J. Robbins	c41c566c4a	osigpending() no longer needs Giant, for the same reason sigpending() does not.	2003-02-15 09:15:30 +00:00
Tim J. Robbins	48e8f774cb	All uses of p_siglist are protected by the proc lock now, so there's no need to acquire Giant in sigpending() anymore.	2003-02-15 08:42:02 +00:00
Alfred Perlstein	e7d6662f1b	Do not allow kqueues to be passed via unix domain sockets.	2003-02-15 06:04:55 +00:00
Alfred Perlstein	edf6699ae6	Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a barrier between free'ing filedesc structures. Basically if you want to access another process's filedesc, you want to hold this mutex over the entire operation.	2003-02-15 05:52:56 +00:00
Bosko Milekic	9e7225808e	Make m_getm() always return the top of the newly allocated chain, as opposed to returning the top of the old chain when there was one and the top of the newly allocated chain if there was no old chain. Actually, it should be noted that prior to this fix, although the comment above m_getm() advertised that m_getm() would return the top of the old chain (if an old chain was being passed in) it actually [wrongly] was returning the tail mbuf in the old chain instead. This is a bug but since the one use of m_getm() in the tree luckily did not depend on the behavior, it happened to work out without notice. Harti Brandt pointed out that the advertised behavior was actually not the real behavior and so this change makes m_getm() ALWAYS return the newly allocated chain (and fixes the comment). This is less confusing and is the best course of action as then the caller is always able to have both a reference to the top of the original chain (because it's passing it in in the call) and a reference to the newly attached chain. Although the API is slightly modified, I don't think that any third-party code uses m_getm() and if it does, it surely can't be working properly because the old behavior was bogus. API bug pointed out by: Harti Brandt <brandt@fokus.fraunhofer.de>	2003-02-14 16:50:13 +00:00
Dag-Erling Smørgrav	af2eed6648	Style nit.	2003-02-14 13:30:25 +00:00
Alfred Perlstein	3dc593c895	KASSERT format string does not need newline termination	2003-02-14 13:28:44 +00:00
Alfred Perlstein	0c5f7aaab5	Add kasserts to catch bad API usage. Submitted by: Hiten Pandya <hiten@unixdaemons.com>	2003-02-14 13:18:51 +00:00
Alfred Perlstein	c11110eabe	Fix crash dumps on ata and scsi. To fix scsi, don't wait for ithreads if we're dumping, it makes the debugger sad. To fix ata, use what appears to be a polling method if we're dumping, I stole this from tmm but added code to ensure that this change is only in effect while dumping. Tested by: des	2003-02-14 13:10:40 +00:00
Alfred Perlstein	e95499bd4c	style.	2003-02-14 12:44:48 +00:00

6382 Commits