freebsd-skq

Author	SHA1	Message	Date
Peter Wemm	cde6302bf0	MNAMELEN is back to an int again after Kirk's statfs commit kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4) *** Error code 1	2003-11-12 17:09:12 +00:00
John Baldwin	861a7db56f	Fix a typo in a comment. Submitted by: das	2003-11-12 14:55:45 +00:00
Poul-Henning Kamp	1415a09d42	Replace B_PHYS conditional assignment to bio_offset with KASSERT check to see that the originating code already did it right.	2003-11-12 10:27:06 +00:00
Kirk McKusick	1977597b34	Update the five files derived from /sys/kern/syscalls.master after the additions made for the new statfs structure (version 1.157). These must be updated in a separate checkin after syscalls.master has been checked in so that they reflect its new CVS identity. As these are purely derived files, it is not clear to me why they are under CVS at all. I presume that it has something to do with having `make world' operate properly.	2003-11-12 08:09:19 +00:00
Kirk McKusick	fde81c7d8e	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hordes of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
Robert Watson	eca8a663d4	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case (e.g., struct vnode, struct socket), and should have a small (but measurable) performance benefit due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 03:14:31 +00:00
Alexander Kabaev	5c957adbf1	1. Consolidate mount struct allocation/destruction into common code in the vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. The mount struct has grown in complexity recently, and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate the largely identical vfs_mount and vfs_unmount code paths by moving the code that handles both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson	2003-11-12 02:54:47 +00:00
John Baldwin	961a7b244d	Add an implementation of turnstiles and change the sleep mutex code to use turnstiles to implement blocking instead of implementing a thread queue directly. These turnstiles are somewhat similar to those used in Solaris 7 as described in Solaris Internals but are also different. Turnstiles do not come out of a fixed-sized pool. Rather, each thread is assigned a turnstile when it is created that it frees when it is destroyed. When a thread blocks on a lock, it donates its turnstile to that lock to serve as a queue of blocked threads. The queue associated with a given lock is found by a lookup in a simple hash table. The turnstile itself is protected by a lock associated with its entry in the hash table. This means that sched_lock is no longer needed to contest on a mutex. Instead, sched_lock is only used when manipulating run queues or thread priorities. Turnstiles also implement priority propagation inherently. Currently turnstiles only support mutexes. Eventually, however, turnstiles may grow two queues to support a non-sleepable reader/writer lock implementation. For more details, see the comments in sys/turnstile.h and kern/subr_turnstile.c. The two primary advantages from the turnstile code include: 1) the size of struct mutex shrinks by four pointers as it no longer stores the thread queue linkages directly, and 2) less contention on sched_lock in SMP systems including the ability for multiple CPUs to contend on different locks simultaneously (not that this last detail is necessarily that much of a big win). Note that 1) means that this commit is a kernel ABI breaker, so don't mix old modules with a new kernel and vice versa. Tested on: i386 SMP, sparc64 SMP, alpha SMP	2003-11-11 22:07:29 +00:00
Joseph Koshy	a5896914f0	Bound the number of iterations a thread can perform inside ktr_resize_pool(); this eliminates a potential livelock. Return ENOSPC only if we encountered an out-of-memory condition when trying to increase the pool size. Reviewed by: jhb, bde (style)	2003-11-11 09:09:26 +00:00
Joseph Koshy	b10221ffd9	Have utrace(2) return ENOMEM if malloc() fails. Document this error return in its manual page. Reviewed by: jhb	2003-11-11 04:54:11 +00:00
Alan Cox	e35e0182c3	- Revision 1.469 of vfs_subr.c resulted in the buf's b_object field being consistently initialized. Consequently, a number of conditionals that checked the validity of b_object before passing it to VM_OBJECT_LOCK() and VM_OBJECT_UNLOCK() are no longer needed.	2003-11-11 04:45:37 +00:00
Robert Watson	c8e7bf92ad	Whitespace sync to MAC branch, expand comment at the head of the file.	2003-11-11 03:40:04 +00:00
Alfred Perlstein	cd3c61b93d	Fix a bug where the taskqueue kproc was being parented by init because RFNOWAIT was being passed to kproc_create. The result was that shutdown took quite a bit longer because this errant "child" would not respond to termination signals from init at system shutdown. RFNOWAIT disassociates the new process from the caller by attaching it to init as its parent proc. We could have had the taskqueue proc listen for SIGKILL, but being able to SIGKILL a potentially critical system process doesn't seem like a good idea.	2003-11-10 20:39:44 +00:00
Tim J. Robbins	541c3b66b5	When there are no free sem_undo structs available in semu_alloc(), only free one sem_undo with un_cnt == 0 instead of all of them. This is a temporary workaround until the SLIST_FOREACH_PREVPTR loop gets fixed so that it doesn't cause cycles in semu_list when removing multiple adjacent items. It might be easier to just use (doubly-linked) LISTs here instead of complicated SLIST code to achieve O(1) removals. This bug manifested itself as a complete lockup under heavy semaphore use by multiple processes with the SEM_UNDO flag set. PR: 58984	2003-11-10 07:22:41 +00:00
Marcel Moolenaar	fcaa2925a9	Change the clear_ret argument of get_mcontext() to be a flags argument. Since all callers either passed 0 or 1 for clear_ret, define bit 0 in the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI code for possible (but unlikely) future use. The remaining bits are for use by MD code. This change is triggered by a need on ia64 to have another knob for get_mcontext().	2003-11-09 20:31:04 +00:00
Bruce Evans	b698380f33	Quick fix for scaling of statclock ticks in the SMP case. As explained in the log message for kern_sched.c 1.83 (which should have been repo-copied to preserve history for this file), the (4BSD) scheduler algorithm only works right if stathz is nearly 128 Hz. The old commit log said 64 Hz; the scheduler actually wants nearly 16 Hz but there was a scale factor of 4 to give the requirement of 64 Hz, and rev.1.83 changed the scale factor so that the requirement became 128 Hz. The change of the scale factor was incomplete in the SMP case. In that case, scheduling ticks are provided by smp_ncpu CPUs, and the scheduler cannot tell the difference between this and 1 CPU providing scheduling ticks smp_ncpu times faster, so we need another scale factor of smp_ncpu or an algorithm change. This quick fix uses the scale factor without even trying to optimize the runtime divisions required for this as is done for the other scale factor. The main algorithmic problem is the clamp on the scheduling tick counts. This was 295; it is now approximately 295 * smp_ncpu. When the limit is reached, threads get free timeslices and scheduling becomes very unfair to the threads that don't hit the limit. The limit can be reached and maintained in the worst case if the load average is larger than (limit / effective_stathz - 1) / 2 = 0.65 now (was just 0.08 with 2 CPUs before this change), so there are algorithmic problems even for a load average of 1. Fortunately, the worst case isn't common enough for the problem to be very noticeable (it is mainly for niced CPU hogs competing with less nice CPU hogs).	2003-11-09 13:45:54 +00:00
Seigo Tanimura	512824f8f7	- Implement selwakeuppri() which allows raising the priority of a thread being woken up. The woken thread can run at a priority as high as it would after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current	2003-11-09 09:17:26 +00:00
Sam Leffler	7902224c6b	o add a flags parameter to netisr_register that is used to specify whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive Supported by: FreeBSD Foundation	2003-11-08 22:28:40 +00:00
David Xu	685a6c448a	Return a reasonable %CPU number for top or ps to display for M:N threads. Since there is no direct association between an M:N thread and a kse, a thread sometimes does not have a kse; in that case, return the pctcpu from its last kse. It is not perfect, but gives a good number to be displayed.	2003-11-08 03:03:17 +00:00
John Baldwin	dac33f12cc	Regen.	2003-11-07 20:30:30 +00:00
John Baldwin	c055e5d412	Mark ptrace(), ktrace(), utrace(), sysarch(), and issetugid() as MP safe. The parts of these calls that are not yet MP safe acquire Giant explicitly.	2003-11-07 20:23:23 +00:00
Robert Watson	a2f88a8b7c	Slight whitespace consistency improvement: Trim trailing whitespace. Remove unmatched " " before ")".	2003-11-07 04:47:14 +00:00
Jeff Roberson	f28b3340c1	- Somehow I botched my last commit. Add an extra ( to fix things up. I'm still not sure how this happened. Reported by: ps	2003-11-06 07:56:01 +00:00
Alan Cox	3b2c54e7bc	- Delay the allocation of memory for the pipe mutex until we need it. This avoids the need to free said memory in various error cases along the way.	2003-11-06 05:58:26 +00:00
Alan Cox	fc17df5264	- Simplify pipespace() by eliminating the explicit creation of vm objects. Instead, let the vm objects be lazily instantiated at fault time. This results in the allocation of fewer vm objects and vm map entries due to aggregation in the vm system.	2003-11-06 05:08:12 +00:00
Robert Watson	83b7b0edca	Remove the flags argument from mac_externalize_*_label(), as it's not passed into policies or used internally to the MAC Framework. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-06 03:42:43 +00:00
Jeff Roberson	a70d729bff	- Remove the local definition of sched_pin and unpin. They are provided in sched.h now. - Respect the td pin count.	2003-11-06 03:09:51 +00:00
Sam Leffler	d3be1471c7	o make debug_mpsafenet globally visible o move it from subr_bus.c to netisr.c where it more properly belongs o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to grab Giant as needed when MPSAFE operation is enabled Supported by: FreeBSD Foundation	2003-11-05 23:42:51 +00:00
Warner Losh	252af39a96	Minor style(9) nit	2003-11-05 06:14:48 +00:00
Jeff Roberson	46f8b26550	- It's ok if sched_runnable() has races in it, we don't need the sched_lock here unless we have something on the assigned queue.	2003-11-05 05:30:12 +00:00
Alexander Kabaev	ca430f2e92	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
Max Khon	2332251c6a	Back out the following revisions: 1.36 +73 -60 src/sys/compat/linux/linux_ipc.c 1.83 +102 -48 src/sys/kern/sysv_shm.c 1.8 +4 -0 src/sys/sys/syscallsubr.h That change was intended to support vmware3, but the wantrem parameter is useless because vmware3 uses SYSV shared memory to talk with the X server and the X server is a native application. The patch worked because the check for wantrem was not valid (wantrem and SHMSEG_REMOVED were never checked for SHMSEG_ALLOCATED segments). Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which, when set to 1, allows removed segments to be returned by shm_find_segment_by_shmid() and shm_find_segment_by_shmidx(). MFC after: 1 week	2003-11-05 01:53:10 +00:00
Kirk McKusick	b932dd9b28	Get rid of DIAGNOSTIC that gives false positives on slow CPUs.	2003-11-04 08:03:11 +00:00
Jeff Roberson	9bacd788a1	- Add initial support for pinning and binding.	2003-11-04 07:45:41 +00:00
Kirk McKusick	15a93fcc31	Allow the bufdaemon and update daemon processes to skip the waitrunningbufspace() calls so that they are always able to proceed and clean up buffer space. Submitted by: Brian Fundakowski Feldman <green@freebsd.org>	2003-11-04 06:30:00 +00:00
Sam Leffler	3465702f13	disable MPSAFE network drivers; we aren't ready yet	2003-11-04 02:01:42 +00:00
Olivier Houchard	7922cdc855	I believe kbyanc@ really meant this in rev 1.58. Use zpfind() to see if the process became a zombie if pfind() doesn't find it and if the caller wants to know about process death, so that the caller knows the process died even if it happened before the kevent was actually registered. MFC after: 1 week	2003-11-04 01:41:47 +00:00
Olivier Houchard	f44004690c	Do not attempt to report proc event if NOTE_EXIT has already been received. This fixes a race condition (specifically with signal events) that could lead to the kn being re-inserted into the list after it has been destroyed, which is not something we want to happen. PR: kern/58258	2003-11-04 01:14:58 +00:00
John Baldwin	8bc0846476	Don't require INTR_FAST handlers to be exclusive in the MI layer. Instead, let the MD code choose whether or not to implement such a policy. The new i386 interrupt code allows multiple FAST handlers for a given source for example. However, the code does not allow FAST and non-FAST handlers to be mixed.	2003-11-03 22:42:58 +00:00
John Baldwin	b95bb3e62b	Update spin lock order list for new i386 interrupt and SMP code.	2003-11-03 22:38:30 +00:00
Robert Watson	730ecf8254	Unlock pipe mutex when failing MAC pipe ioctl access control check. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-03 17:58:23 +00:00
Jeff Roberson	112b6d3aa9	- Remove kseq_find(), we no longer scan other CPUs' run queues when we go idle. They figure out that we're idle fast enough that the cache pollution introduced by scanning their run queue is more expensive than waiting a little longer. - Add kseq_setidle() to mark us as being idle. Use this in place of kseq_find(). - Remove kseq_load_highest(), kseq_find() was the only consumer of this interface. kseq_balance() has its own customized version that finds the lowest and highest loads simultaneously. Continuously told that this would be faster by: terry	2003-11-03 03:27:22 +00:00
Jeff Roberson	ef1134c9ad	- Remove the ksq_loads[] array. We are only interested in three counts, the total load, the timeshare load, and the number of threads that can be migrated to another cpu. Account for these separately. - Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE can be migrated to another CPU. Currently, this only checks to see if we're an interrupt handler. Eventually this will also be used to support CPU binding.	2003-11-02 10:56:48 +00:00
Alexander Kabaev	cb9ddc80ae	Take care not to call vput if the thread used in the corresponding vget wasn't curthread, i.e. when we receive a thread pointer as a function argument. Use VOP_UNLOCK/vrele in these cases. The only known case where td != curthread at the moment is boot() calling sync with the thread0 pointer. This fixes the panic on shutdown people have reported.	2003-11-02 04:52:53 +00:00
Jeff Roberson	769a363537	- In sched_prio() only force us onto the current queue if our priority is being elevated (numerically smaller).	2003-11-02 04:25:59 +00:00
Jeff Roberson	7d1a81b4dc	- Rename SCHED_PRI_NTHRESH to SCHED_SLICE_NTHRESH since it is only used in slice assignment. Add a comment describing what it does. - Remove a stale XXX comment, the nice should not impact the interactivity, nice adjustments only affect non-interactive tasks in ULE. - Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them at least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve nice +20 tasks as intended.	2003-11-02 04:10:15 +00:00
Jeff Roberson	a0a931cec7	- Remove uses of PRIO_TOTAL and replace them with SCHED_PRI_NRESV - SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we do not have to account for it in the few places that we use it. Requested by: bde	2003-11-02 03:49:32 +00:00
Jeff Roberson	d322132c62	- Change sched_interact_update() to only accept slp+runtime values between 0 and SCHED_SLP_RUN_MAX * 2. This allows us to simplify the algorithm quite a bit. Before, it dealt with arbitrary values which required us to do nasty integer division tricks that didn't quite work out correctly. - Change sched_wakeup() to detect conditions where the slp+runtime could exceed SCHED_SLP_RUN_MAX * 2. This can happen if we go to sleep for longer than 6 seconds. In this case, we'll just clear the runtime and set the sleep time to the max. - Define a new function, sched_interact_fork() which updates the slp+runtime of a newly forked thread. We want to limit the amount of history retained from the parent so that we learn the child's behavior quickly. We don't, however, want to decay it to nothing. Previously, we would simply divide each parameter by 100 whenever we forked. After a few forks the values would reach 0 and tasks would not be considered interactive. - Add another KTR entry, cleanup some existing entries. - Remove a useless sched_interact_update() from sched_priority(). This is already done by the callers that require it.	2003-11-02 03:36:33 +00:00
Alexander Kabaev	492c1e68fb	Temporarily undo parts of the struct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff	2003-11-01 05:51:54 +00:00
Jeff Roberson	22bf7d9a0e	- Add static to local functions and data where it was missing. - Add an IPI based mechanism for migrating kses. This mechanism is broken down into several components. This is intended to reduce cache thrashing by eliminating most cases where one cpu touches another's run queues. - kseq_notify() appends a kse to a lockless singly linked list and conditionally sends an IPI to the target processor. Right now this is protected by sched_lock but at some point I'd like to get rid of the global lock. This is why I used something more complicated than a standard queue. - kseq_assign() processes our list of kses that have been assigned to us by other processors. This simply calls sched_add() for each item on the list after clearing the new KEF_ASSIGNED flag. This flag is used to indicate that we have been appended to the assigned queue but not added to the run queue yet. - In sched_add(), instead of adding a KSE to another processor's queue we use kse_notify() so that we don't touch their queue. Also in sched_add(), if KEF_ASSIGNED is already set return immediately. This can happen if a thread is removed and readded so that the priority is recorded properly. - In sched_rem() return immediately if KEF_ASSIGNED is set. All callers immediately readd simply to adjust priorities etc. - In sched_choose(), if we're running an IDLE task or the per cpu idle thread set our cpumask bit in 'kseq_idle' so that other processors may know that we are idle. Before this, make a single pass through the run queues of other processors so that we may find work more immediately if it is available. - In sched_runnable(), don't scan each processor's run queue, they will IPI us if they have work for us to do. - In sched_add(), if we're adding a thread that can be migrated and we have plenty of work to do, try to migrate the thread to an idle kseq. - Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into consideration. - No longer use kseq_choose() to steal threads, it can lose its last argument. - Create a new function runq_steal() which operates like runq_choose() but skips threads based on some criteria. Currently it will not steal PRI_ITHD threads. In the future this will be used for CPU binding. - Create a kseq_steal() that checks each run queue with runq_steal(), use kseq_steal() in the places where we used kseq_choose() to steal with before.	2003-10-31 11:16:04 +00:00
John Baldwin	e57ea233d9	Ensure that mp_ncpus is set to 1 if mp_cpu_probe() fails.	2003-10-30 21:44:01 +00:00
Alexander Kabaev	0823d2996c	Relock mntvnode_mtx if vget fails in vfs_stdsync. The loop should always be entered with the mutex locked.	2003-10-30 16:22:51 +00:00
David Xu	7eeaaf9b97	Try to fetch the thread mailbox address in the page fault trap, so that when a thread blocks in the page fault handler, an upcall thread can be scheduled. It is useful if a process is doing lots of mmap based I/O.	2003-10-30 02:55:43 +00:00
Sam Leffler	90fc7b7cb8	Add a temporary mechanism to disable INTR_MPSAFE from network interface drivers. This is preparatory to running more parts of the network system w/o Giant.	2003-10-29 18:29:50 +00:00
Bruce Evans	b3aeaf2ed1	Removed mostly-dead code for setting switchtime after the idle loop clobbers this variable. Long ago, when the idle loop wasn't in a process, it set switchtime.tv_sec to zero to indicate that the time needs to be read after the idle loop finishes. The special case for this isn't needed now that there is an idle process (for each CPU). The time is read in the normal way when the idle process is switched away from. The seconds component of the time is only zero for the first second after the uptime is set, and the mostly-dead code was only executed during this time. (This was slightly broken by using uptimes instead of times relative to the Epoch -- in the original version the seconds component of the time was only 0 for the first second after the Epoch.) In mi_switch(), moved the setting of switchticks to just after the first (and now only) setting of switchtime. This setting used to be delayed since a late setting was needed for the idle case and an early setting was not needed. Now the early setting is needed so that fork_exit() doesn't need to set either switchtime or switchticks. Removed now-completely-rotted comment attached to this. Most of the code described by the comment had already moved to sched_switch().	2003-10-29 15:23:09 +00:00
Bruce Evans	89674a9f77	Removed sched_nest variable in sched_switch(). Context switches always begin with sched_lock held but not recursed, so this variable was always 0. Removed fixup of sched_lock.mtx_recurse after context switches in sched_switch(). Context switches always end with this variable in the same state that it began in, so there is no need to fix it up. Only sched_lock.mtx_lock really needs a fixup. Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion that sched_lock is owned and not recursed after it is fixed up. This assertion must match the one in mi_switch(), and if sched_lock were recursed then a non-null fixup of sched_lock.mtx_recurse would probably be needed again, unlike in sched_switch(), since fork_exit() doesn't return to its caller in the normal way.	2003-10-29 14:40:41 +00:00
Sam Leffler	9c855a36c1	Introduce the notion of "persistent mbuf tags"; these are tags that stay with an mbuf until it is reclaimed. This is in contrast to tags that vanish when an mbuf chain passes through an interface. Persistent tags are used, for example, by MAC labels. Add an m_tag_delete_nonpersistent function to strip non-persistent tags from mbufs and use it to strip such tags from packets as they pass through the loopback interface and when turned around by icmp. This fixes problems with "tag leakage". Pointed out by: Jonathan Stone Reviewed by: Robert Watson	2003-10-29 05:40:07 +00:00
Sam Leffler	395bb18680	speedup stream socket recv handling by tracking the tail of the mbuf chain instead of walking the list for each append Submitted by: ps/jayanth Obtained from: netbsd (jason thorpe)	2003-10-28 05:47:40 +00:00
Jeff Roberson	1aca9909e5	- Only change the run queue in sched_prio() if the kse is non null. threads can be in the TD_ON_RUNQ state and not have an associated kse. - Remove the PRI_IDLE special case from sched_clock(), it was not actually necessary.	2003-10-28 03:28:48 +00:00
Jeff Roberson	eab9cabf34	- Don't set td_priority directly here, use sched_prio().	2003-10-27 07:15:47 +00:00
Jeff Roberson	3f741ca117	- Use a better algorithm in sched_pctcpu_update() Contributed by: Thomaswuerfl@gmx.de - In sched_prio(), adjust the run queue for threads which may need to move to the current queue due to priority propagation. - In sched_switch(), fix style bug introduced when the KSE support went in. Columns are 80 chars wide, not 90. - In sched_switch(), fix the comparison in the idle case and explicitly re-initialize the runq in the not propagated case. - Remove dead code in sched_clock(). - In sched_clock(), if we're an IDLE class td, set NEEDRESCHED so that threads that have become runnable will get a chance to. - In sched_runnable(), if we're not the IDLETD, we should not consider curthread when examining the load. This mimics the 4BSD behavior of returning 0 when the only runnable thread is running. - In sched_userret(), remove the code for setting NEEDRESCHED entirely. This is not necessary and is not implemented in 4BSD. - Use the correct comparison in sched_add() when checking to see if an idle prio task has had its priority temporarily elevated.	2003-10-27 06:47:05 +00:00
Alfred Perlstein	6ff7636ea5	constify the second args to timevaladd() and timevalsub().	2003-10-26 02:19:00 +00:00
Robert Watson	36bbf86ba6	Check (locked) before performing an advisory unlock following a failure of vn_start_write(). Otherwise, we may inconsistently attempt to release the advisory lock. Pointed out by: teggej	2003-10-25 16:43:50 +00:00
Robert Watson	c447f5b2f4	When generating a core dump, use advisory locking in an advisory way: if we do acquire an advisory lock, great! We'll release it later. However, if we fail to acquire a lock, we perform the coredump anyway. This problem became particularly visible with NFS after the introduction of rpc.lockd: if the lock manager isn't running, then locking calls will fail, aborting the core dump (resulting in a zero-byte dump file). Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>	2003-10-25 16:14:09 +00:00
Robert Watson	67536f038c	Allow MAC policies to block/revoke kern_alq write access to a file. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories Reviewed by: jeff	2003-10-25 16:10:41 +00:00
Warner Losh	17e02bb39b	Convenience functions to generate notifications from the kernel. The ACPI code will start using these shortly. Reviewed by: njl	2003-10-24 22:41:54 +00:00
John-Mark Gurney	0eb3b7bb7f	don't allow reading from files that haven't been opened for reading.	2003-10-24 21:07:53 +00:00
John Baldwin	8b201c42c6	- Add a DDB command 'show intrcnt' to show the non-zero interrupt counts. - Add a DDB function to dump the contents of an ithread and optionally details about each handler in that ithread. This function can be used by MD code to implement DDB commands that display information about interrupt sources and their registered handlers.	2003-10-24 21:05:30 +00:00
John Baldwin	e07c897e61	Writes to p_flag in __setugid() no longer need Giant.	2003-10-23 21:20:34 +00:00
John Baldwin	787f162df6	Move the P_COWINPROGRESS flag from being a per-process p_flag to being a per-thread td_pflag which doesn't require any locks to read or write as it is only read or written by curthread on itself. Glanced at by: mckusick	2003-10-23 21:14:08 +00:00
Garrett Wollman	06cb76bde3	Add appropriate const poisoning to the assert_*locked() family so that I can call ASSERT_VOP_LOCKED(vp, __func__) without a diagnostic. Inspired by: the evil and rude OpenAFS cache manager code	2003-10-23 18:17:36 +00:00
Robert Watson	6fa0475d95	Finish break-out of kern_mac.c into parts: Include src/sys/security/mac/mac_internal.h in kern_mac.c. Remove redundant defines from the include: SYSCTL_DECL(), debug macros, composition macros. Unstaticize various bits now exposed to the remainder of the kernel: mac_init_label(), mac_destroy_label(). Remove all the functions now implemented in mac_process/mac_vfs/mac_net/ mac_pipe. Also remove debug counters, sysctls exporting debug counters, enforcement flags, sysctls exporting enforcement flags. Leave module declaration, sysctl nodes, mactemp malloc type, system calls. This should conclude MAC/LINT/NOTES breakage from the break-out process, but I'm running builds now to make sure I caught everything. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-10-22 20:59:31 +00:00
Robert Watson	089c1bdac9	Variable cleanup following break-out of kern_mac.c into sys/security/mac: Unstaticize mac_late. Remove ea_warn_once, now in mac_vfs.c. Unstaticize mac_policy_list, mac_static_policy_list, use struct mac_policy_list_head instead of LIST_HEAD() directly. Unstaticize and un-inline MAC policy locking functions so they can be referenced from mac_*.c. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-10-22 20:47:41 +00:00
Robert Watson	9e7bf51ca8	Rename error_select() to mac_error_select(), and unstaticize so it can be used from src/sys/security/mac/mac_*.c. Obtained from: TrustedBSD Project Sponosred by: DARPA, Network Associates Laboratories	2003-10-22 20:42:22 +00:00
Mike Silbersack	184dcdc7c8	Change all SYSCTLS which are readonly and have a related TUNABLE from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful error messages.	2003-10-21 18:28:36 +00:00
Hidetoshi Shimokawa	a44ca4f05f	We need to initialize bp->b_offset and bp->b_iooffset because bp->b_blkno is ignored now.	2003-10-21 13:18:19 +00:00
Scott Long	bd781a1ed6	Don peril-sensitive sunglasses and mark pipe(2) as MPSAFE. I've beaten up on it for the last 15 hours with no signs of problems. It gives a small (1%) gain on buildworld since pipe_read/pipe_write are already free of Giant.	2003-10-21 07:03:27 +00:00
Poul-Henning Kamp	68b00bf648	Remove KASSERTS on B_PHYS for vmapbuf() and vunmapbuf(), B_PHYS is going away.	2003-10-21 06:53:10 +00:00
Marcel Moolenaar	9ee99eb496	Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.	2003-10-21 01:13:49 +00:00
Sam Leffler	6c24056459	revert default for idle polling to zero until we can resolve the livelock problem	2003-10-20 21:14:24 +00:00
Jeff Roberson	484288de56	- If a thread is not bound to a kse return 0 from sched_pctcpu(). Reported by: pawel.worach@nordea.com	2003-10-20 19:55:21 +00:00
Alan Cox	f2b1200d08	Initialize the buf's b_object in pbgetvp(). Clear it in pbrelvp(). (This facilitates synchronization of the vm page's valid field using the vm object's lock.) Suggested by: tegge	2003-10-20 18:24:38 +00:00
David Malone	111b0d0d29	Mark dup as MPSAFE. Giant was pushed into dup ages ago, but it looks like it was missed in syscalls.master. Spotted by: alc	2003-10-20 16:16:03 +00:00
Alan Cox	9027d603d3	- Synchronize access to a vm page's valid field using the containing vm object's lock.	2003-10-20 05:57:55 +00:00
Marcel Moolenaar	bab1f05277	Put the RSE backing store at a fixed address. This change is triggered by libguile that needs to know the base of the RSE backing store. We currently do not export the fixed address to userland by means of a sysctl so user code needs to hardcode it for now. This will be revisited later. The RSE backing store is now at the bottom of region 4. The memory stack is at the top of region 4. This means that the whole region is usable for the stacks, giving a 61-bit stack space. Port: lang/guile (depended of x11/gnome2)	2003-10-20 05:34:10 +00:00
David Malone	e1419c08e2	falloc allocates a file structure and adds it to the file descriptor table, acquiring the necessary locks as it works. It usually returns two references to the new descriptor: one in the descriptor table and one via a pointer argument. As falloc releases the FILEDESC lock before returning, there is a potential for a process to close the reference in the file descriptor table before falloc's caller gets to use the file. I don't think this can happen in practice at the moment, because Giant indirectly protects closes. To stop the file being completely closed in this situation, this change makes falloc set the refcount to two when both references are returned. This makes life easier for several of falloc's callers, because the first thing they previously did was grab an extra reference on the file. Reviewed by: iedowse Idea run past: jhb	2003-10-19 20:41:07 +00:00
Alan Cox	48ae2dddac	- Add vm object locking to vfs_clean_pages() and vfs_bio_set_validclean(). This is to synchronize access to the vm page's valid field by vm_page_set_validclean().	2003-10-19 20:39:06 +00:00
Peter Wemm	68d86cf1e2	Tidy up loose ends in the idle process. Call the MI cpu_idle() function for all platforms now. XXX alpha/sparc64/powerpc should fill in the function. Submitted by: bde	2003-10-19 02:43:57 +00:00
Poul-Henning Kamp	2d6a9d0747	Initialize b_iooffset before calling VOP_[SPEC]STRATEGY	2003-10-18 19:49:46 +00:00
Poul-Henning Kamp	01758670e9	Initialize b_iooffset before calling strategy	2003-10-18 19:48:21 +00:00
Poul-Henning Kamp	0efedd8864	Don't report b_pblkno, it is going away.	2003-10-18 17:59:02 +00:00
Poul-Henning Kamp	1ad9172f6b	Report bio_pblkbo instead of bio_blkno.	2003-10-18 17:27:10 +00:00
Poul-Henning Kamp	4cb4df483c	Make bioq_disksort() sort on the bio_offset field instead of bio_pblkno.	2003-10-18 15:50:56 +00:00
Poul-Henning Kamp	2c18019f14	DuH! bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)	2003-10-18 14:10:28 +00:00
Poul-Henning Kamp	cc81271eaa	I think rwatson got the sign wrong here...	2003-10-18 12:16:17 +00:00
Poul-Henning Kamp	855c6fcc68	Initialize bp->b_offset before calling VOP_STRATEGY()	2003-10-18 11:13:31 +00:00
Poul-Henning Kamp	583b92e328	Convert some if(bla) panic("foo") to KASSERTS to improve grep-ability.	2003-10-18 09:32:39 +00:00
Poul-Henning Kamp	d986d4580c	The size and contents of the DEV_STRATEGY() macro has progressed to the point where it being a macro is no longer sensible, and it will only be more so in days to come. BIO_STRATEGY() is now only used from DEV_STRATEGY() and should not be used directly anymore. Put the contents of both in the new function dev_strategy() and make DEV_STRATEGY() call that function. In addition, this allows us to make the rather magic bufdonebio() helper function static. This also saves a hundred-and-some bytes of code in a typical kernel.	2003-10-18 09:03:15 +00:00
Robert Watson	dae6d925a2	Wrap db_active check in #ifdef DDB, as db_active is not defined ifndef DDB.	2003-10-18 02:23:57 +00:00
Robert Watson	90e6b5447f	Add a new cn_flags field to struct consdev, the low-level console definition structure. Define one flag, CN_FLAG_NODEBUG, which indicates the console driver cannot be used in the context of the debugger. This may be used, for example, if the console device interacts with kernel services that cannot be used from the debugger context, such as the network stack. These drivers are skipped over for calls to cn_checkc() and cn_putc(), and the calling function simply moves on to the next available console.	2003-10-18 02:13:39 +00:00
Jeff Roberson	94816f6d52	- Remove the correct thread from the run queue in setrunqueue(). This fixes ULE + KSE.	2003-10-17 20:53:04 +00:00
Poul-Henning Kamp	3da2d6a453	Simplify count_dev()	2003-10-17 11:56:48 +00:00
Peter Wemm	c9c373b093	Halt the cpu on amd64 as well. For some strange reason, this makes a fair bit of difference to the power consumption and lets my cpu cool down enough for the temperature sensitive fan controller to completely stop the cpu fan at times.	2003-10-17 03:49:03 +00:00
Marcel Moolenaar	b0f865c1f3	Implement cpu_idle() on ia64. We put the processor in a lightweight halt state that minimizes power consumption while still preserving cache and TLB coherency. Halting the processor is not conditional at this time. Tested with UP and SMP kernels.	2003-10-17 02:24:59 +00:00
Jeff Roberson	55f2099a70	- The kse may be null in sched_pctcpu(). Reported by: kris	2003-10-16 21:13:14 +00:00
Jeff Roberson	0e0f626628	- Only kse_reassign() in the !running case. Reported by: kris	2003-10-16 20:32:57 +00:00
Jeff Roberson	0c7da3a43d	- Call sched_add() with the correct argument on SMP. Reported by: Valentin Chopov <valentin@valcho.net>	2003-10-16 20:06:19 +00:00
Jeff Roberson	b72f347bdb	- Fix a minor problem with my last commit, we don't want to return from sched_switch if the thread is running, we want to fall through and pick a new thread because we have been preempted.	2003-10-16 10:04:54 +00:00
Doug Rabson	46ba7a35f2	* Add multiple inheritance to kobj. Each class can have zero or more base classes and if a method is not found in a given class, its base classes are searched (in the order they were declared). This search is recursive, i.e. a method may be defined in a base class of a base class. * Change the kobj method lookup algorithm to one which is SMP-safe. This relies only on the constraint that an observer of a sequence of writes of pointer-sized values will see exactly one of those values, not a mixture of two or more values. This assumption holds for all processors which FreeBSD supports. * Add locking to kobj class initialisation. * Add a simpler form of 'inheritance' for devclasses. Each devclass can have a parent devclass. Searches for drivers continue up the chain of devclasses until either a matching driver is found or a devclass is reached which has no parent. This can allow, for instance, pci drivers to match cardbus devices (assuming that cardbus declares pci as its parent devclass). * Increment __FreeBSD_version. This preserves the driver API entirely except for one minor feature used by the ISA compatibility shims. A workaround for ISA compatibility will be committed separately. The kobj and newbus ABI has changed - all modules must be recompiled.	2003-10-16 09:16:28 +00:00
Jeff Roberson	ae53b483cc	- Collapse sched_switchin() and sched_switchout() into sched_switch(). Now mi_switch() calls sched_switch() which calls cpu_switch(). This is actually one less function call than it had been.	2003-10-16 08:53:46 +00:00
Jeff Roberson	7cf90fb376	- Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td argument rather than a kse.	2003-10-16 08:39:15 +00:00
Jeff Roberson	4c9612c622	- The non iterative algorithm for interact_update was broken due to rounding errors. This was the source of the majority of the interactivity problems. Reintroduce the old algorithm and its XXX. - Up the interactivity threshold to 30. It really could stand to be even a tiny bit higher. - Let the sleep and run time accumulate up to 5 seconds of history rather than two. This helps stop XFree86 from becoming non-interactive during bursts of activity.	2003-10-16 08:17:43 +00:00
Jeff Roberson	08fd6713b2	- If our user_pri doesn't match our actual priority, our priority has been elevated either due to priority propagation or because we're in the kernel. In either case, put us on the current queue so that we don't stop others from using important resources. At some point the priority elevations from sleeping in the kernel should go away. - Remove an optimization in sched_userret(). Before, we would only set NEEDRESCHED if there was something of a higher priority available. This is a trivial optimization and it breaks priority propagation because it doesn't take threads which we may be blocking into account. Notice that the thread which is blocking others gets up to one tick of cpu time before we honor this NEEDRESCHED in sched_clock().	2003-10-15 07:47:06 +00:00
Peter Wemm	25e247af44	The KERN_PROC_PROC sysctl took 4 args in 5.0-REL and 5.1-REL. We need to accept this for a bit longer. Requiring the new order of 3 args only was not very helpful.	2003-10-15 03:11:46 +00:00
Sam Leffler	bd19669855	Change default for kern.polling.idle_poll back to 1. This was set to 0 because Luigi observed livelock but in recent testing it did not occur so I'm re-enabling it by default. Reviewed by: luigi	2003-10-14 18:39:36 +00:00
Poul-Henning Kamp	b84044731d	Made use of 'error' argument, which was unused (by mistake) before. Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>	2003-10-14 08:09:43 +00:00
Warner Losh	d29516dd82	With DIAGNOSTICS, sometimes we get weird crashes when some driver accesses softc after it is freed. Use a different malloc type for softc than the rest of the bus code to make it more clear when these things happen that it is the driver that's at fault, not the bus code. Suggested by: sam and/or phk (I think)	2003-10-14 06:22:07 +00:00
Jeff Roberson	85b9831dfa	- Add a missing vn_finished_write() Pointy hat: jeff Found by: robert Obtained from: kirk	2003-10-14 00:38:34 +00:00
David Xu	3a2e2a0ec8	Don't clear signal mask in execsig(). RELENG_4 does not clear it and POSIX asks to inherit signal mask for execv.	2003-10-13 14:03:08 +00:00
Jeff Roberson	736c97c7b3	- In SCHED_CURR() add holding Giant to the list of criteria that will keep you on the current queue. In the future, it would be nice if priority propagation could deterministically pluck a thread off of the next queue and put it on the current queue. Until then this hack stops us from holding up our entire current queue, including interrupt handlers, while a thread on the next queue is blocked while holding Giant. - Inherit our pctcpu information from our parent.	2003-10-12 21:07:31 +00:00
Alan Cox	d58e70a08d	In vfs_bio_clrbuf(), ignore the state of the object lock if the page is the "bogus" page. Found by: tegge	2003-10-12 18:26:48 +00:00
Poul-Henning Kamp	5108cd3652	Simplify vn_isdisk() a bit.	2003-10-12 14:04:39 +00:00
John-Mark Gurney	9e5de980c6	fix a problem referencing free'd memory. This is only a problem for kqueue write events on a socket and you regularly create tons of pipes which overwrites the structure causing a panic when removing the knote from the list. If the peer has gone away (and it's a write knote), then don't bother trying to remove the knote from the list. Submitted by: Brian Buchanan and myself Obtained from: nCircle	2003-10-12 07:06:02 +00:00
Jeff Roberson	7dd1328c13	- Fix a typo, I meant & and not |. This was causing lockups from the syncer looping forever due to list corruption. Solved by: tegge	2003-10-11 21:50:45 +00:00
Alan Cox	08814d66d5	- Synchronize access to a page's valid field in vfs_bio_clrbuf() by using the lock from its containing object. - Remove GIANT_REQUIRED from vm_hold_load_pages().	2003-10-10 07:26:21 +00:00
Robert Drehmel	ea924c4cd3	Implement preliminary support for the PT_SYSCALL command to ptrace(2).	2003-10-09 10:17:16 +00:00
Tim J. Robbins	a50f62fd9f	Remove support for the unused 4th component of the KERN_PROC_PROC sysctl.	2003-10-06 01:26:11 +00:00
Jeff Roberson	d1cf0fc7fc	- Add a missing vn_start_write() to flushbufqueues(). This could have caused snapshot related problems. - The vp can not be NULL here or we would panic in vfs_bio_awrite(). Stop confusing the logic by checking for it in several places. Submitted by: kirk and then rototilled by me to remove vp == NULL checks.	2003-10-05 22:16:08 +00:00
Bruce M Simpson	f05970242b	Bring back sysctl_wire_old_buffer(). Fix a bug in sysctl_handle_opaque() whereby the pointers would not get reset on a retried SYSCTL_OUT() call. Noticed by: bde	2003-10-05 13:31:33 +00:00
Bruce M Simpson	dcf59a59fc	Fix a security problem in sysctl() the long way round. Use pre-emption detection to avoid the need for wiring a userland buffer when copying opaque data structures. sysctl_wire_old_buffer() is now a no-op. Other consumers of this API should use pre-emption detection to notice update collisions. vslock() and vsunlock() should no longer be called by any code and should be retired in subsequent commits. Discussed with: pete, phk MFC after: 1 week	2003-10-05 09:37:47 +00:00
Bruce M Simpson	0c9601bc6b	Add a pre-emption counter, td_generation, so that threads can notice when they have been pre-empted by other threads. This is bumped from within mi_switch() every time a context switch takes place. Discussed with: pete	2003-10-05 09:35:08 +00:00
Bruce M Simpson	51830edcc5	Fold the vslock() and vsunlock() calls in this file with #if 0's; they will go away in due course. Involuntary pre-emption means that we can't count on wiring of pages alone for consistency when performing a SYSCTL_OUT() bigger than PAGE_SIZE. Discussed with: pete, phk	2003-10-05 08:38:22 +00:00
Jeff Roberson	98d7d155c1	- Apply a big giant lock around the namecache. This has been sitting in my tree since BSDcon.	2003-10-05 07:13:50 +00:00
Jeff Roberson	bdcfcdecea	- Fix an XXX. Check the error of vn_lock() in vflush(). Don't specify LK_RETRY either, we don't want this vnode if it turns into another. - Remove the code that checks the mount point after acquiring the lock we are guaranteed to either fail or get the vnode that we wanted.	2003-10-05 07:12:38 +00:00
Bruce M Simpson	5be99846fc	Remove magic numbers surrounding locking state in the sysctl module, and replace them with more meaningful defines.	2003-10-05 05:38:30 +00:00
Jeff Roberson	45503a37dd	- Rename vcanrecycle() to vtryrecycle() to reflect its new role. - In vtryrecycle() try to vgonel the vnode if all of the previous checks passed. We won't vgonel if someone has either acquired a hold or usecount or started the vgone process elsewhere. This is because we may have been removed from the free list while we were inspecting the vnode for recycling. - The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling the same vnode. To further reduce the likelihood of this event, requeue the vnode on the tail of the list prior to calling vtryrecycle(). We can not actually remove the vnode from the list until we know that it's going to be recycled because other interlock holders may see the VI_FREE flag and try to remove it from the free list. - Kill a bogus XXX comment. If XLOCK is set we shouldn't wait for it regardless of MNT_WAIT because the vnode does not actually belong to this filesystem.	2003-10-05 05:35:41 +00:00
Jeff Roberson	85311d4b59	- Don't cache_purge() in getnewvnode. It's done in vclean(). With this purge, the purge in vclean, and the filesystems purge, we had 3 purges per vnode. - Move the insmntque(vp, 0) to vclean() so that we may remove it from the two vgone() functions and reduce the number of lock operations required.	2003-10-05 02:48:04 +00:00
Jeff Roberson	ce13b187e7	- Solve a LOR with the sync_mtx by using the VI_ONWORKLST flag to determine whether or not the sync failed. This could potentially get set between the time that we VOP_UNLOCK and VI_LOCK() but the race would harmlessly lead to the sync being delayed by an extra 30 seconds. If we do not move the vnode it could cause an endless loop if it continues to fail to sync. - Use vhold and vdrop to stop the vnode from changing identities while we have it unlocked. Other internal vfs lists are likely to follow this scheme.	2003-10-05 00:35:41 +00:00
Jeff Roberson	894fbf9769	- Move the xlock 'locking' code into vx_lock() and vx_unlock(). - Create a new function, vgonechrl(), which performs vgone for an in-use character device. Move the code from vflush() that did this into vgonechrl(). - Hold the xlock across the entirety of vgonel() and vgonechrl() so that at no point will an invalid vnode exist on any list without XLOCK set. - Move the xlock code out of vclean() now that it is in the vgone*() functions.	2003-10-05 00:02:41 +00:00
Alan Cox	6ec2fca505	Eliminate some unnecessary uses of the vm page queues lock around the vm page's valid field. This field is being synchronized using the containing vm object's lock.	2003-10-04 22:47:20 +00:00
Alan Cox	bf0da100d6	- Extend the scope of the vm object lock to cover calls to vm_page_is_valid(). - Assert that the lock on the containing vm object is held in vm_page_is_valid().	2003-10-04 19:23:29 +00:00
Jeff Roberson	6f4b0863e0	- In sched_sync() test our preconditions prior to dropping the sync_mtx. This is so that we may grab the interlock while still holding the sync_mtx. We have to VI_TRYLOCK() because in all other cases the lock order runs the other way. - If we don't meet any of the preconditions, reinsert the vp into the list for the next second. - We don't need to panic if we fail to sync here because each FSYNC function handles this case. Removing this redundant code also simplifies locking.	2003-10-04 18:03:53 +00:00
Jeff Roberson	8ec82641d8	- Change a lame iterative algorithm to a constant time algorithm. Remove the XXX that complains about it as well. Submitted by: ThomasWuerfl@gmx.de	2003-10-04 17:41:13 +00:00
Jeff Roberson	e4c49d2b50	- In a Giantless world, the vn_lock() in vcanrecycle() could legitimately fail. Remove the panic from that case and document why it might fail. - Document the reason for calling cache_purge() on a newly created vnode. - In insmntque() order the operations so that we can call mtx_unlock() one fewer times. This makes the code somewhat clearer as well. - Add XXX comments in sched_sync() and vflush(). - In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is set. - In vclean() we don't need to acquire a lock around a single TAILQ_FIRST call. It's ok if we race here, the vinvalbuf will just do nothing. - Increase the scope of the lock in vgonel() to reduce the number of lock operations that are performed.	2003-10-04 15:10:40 +00:00
Jeff Roberson	1de1f935f2	- If we are called with LK_NOWAIT in vn_lock() we may be holding a mutex and should not sleep while waiting for XLOCK to clear. Care needs to be taken in functions that use this capability to avoid spinning.	2003-10-04 14:35:22 +00:00
Jacques Vidrine	8b7358ca43	Introduce a uiomove_frombuf helper routine that handles computing and validating the offset within a given memory buffer before handing the real work off to uiomove(9). Use uiomove_frombuf in procfs to correct several issues with integer arithmetic that could result in underflows/overflows. As a side-effect, the code is significantly simplified. Add additional sanity checks when computing a memory allocation size in pfs_read. Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-) Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)	2003-10-02 15:00:55 +00:00
Robert Watson	c142b0fcfe	Remove the global variable 'cmask', which was used to initialize the fd_cmask field in the file descriptor structure for the first process indirectly from CMASK, and when an fd structure is initialized before being filled in, and instead just use CMASK. This appears to be an artifact left over from the initial integration of quotas into BSD. Suggested by: peter	2003-10-02 03:57:59 +00:00
Jeff Roberson	fa3f9daae5	- On my Pentium4-M laptop, invlpg takes ~1100 cycles if the page is found in the TLB and ~1600 if it is not. Therefore, it is more efficient to invalidate the TLB after operations that use CMAP rather than before. - So that the tlb is invalidated prior to switching off of a processor, we must change the switchin functions to switchout functions. - Remove td_switchout from the thread and move it to the x86 pcb. - Move the code that calls switchout into swtch.s. These changes make this optimization truly x86 specific.	2003-09-30 08:11:36 +00:00
Robert Watson	cc7b13bfe0	If the struct mac copied into the kernel has a negative length, return EINVAL rather than failing the following malloc due to the value being too large.	2003-09-29 18:35:17 +00:00
Poul-Henning Kamp	431021789f	Retire revoke_and_destroy_dev() with extreme prejudice.	2003-09-28 20:50:36 +00:00
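Commit 8b7358ca43 describes uiomove_frombuf as a helper that computes and validates the offset within a memory buffer before handing the copy off to uiomove(9). The following is a minimal userland sketch of that idea, not the FreeBSD code. It assumes a hypothetical, simplified `struct toy_uio` (one flat destination buffer) and a hypothetical `toy_uiomove()` standing in for the kernel's `struct uio` and `uiomove(9)`; the real routine's signature and types may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the kernel's struct uio: a single flat
 * destination buffer plus the read offset and the residual count. */
struct toy_uio {
	off_t   offset;   /* position in the logical file being read */
	ssize_t resid;    /* bytes the caller still wants */
	char   *dst;      /* where copied bytes are written */
};

/* Hypothetical stand-in for uiomove(9): copy at most n bytes (capped by
 * resid) and advance the uio. Assumes resid has already been checked. */
static int
toy_uiomove(const char *src, size_t n, struct toy_uio *uio)
{
	if ((size_t)uio->resid < n)
		n = (size_t)uio->resid;
	memcpy(uio->dst, src, n);
	uio->dst += n;
	uio->offset += (off_t)n;
	uio->resid -= (ssize_t)n;
	return (0);
}

/* Validate the offset against the buffer, then delegate the copy. */
static int
toy_uiomove_frombuf(const void *buf, size_t buflen, struct toy_uio *uio)
{
	/* Reject negative values instead of letting them wrap around
	 * when they are later converted to unsigned sizes. */
	if (uio->offset < 0 || uio->resid < 0)
		return (EINVAL);
	/* An offset at or past the end of the buffer means EOF: copy
	 * nothing and report success. */
	if ((uintmax_t)uio->offset >= buflen)
		return (0);
	size_t off = (size_t)uio->offset;   /* safe: 0 <= off < buflen */
	return (toy_uiomove((const char *)buf + off, buflen - off, uio));
}
```

For example, reading the buffer `"hello"` (length 5) at offset 2 copies `"llo"` and leaves the offset at 5; a negative offset is rejected with `EINVAL` instead of becoming a huge unsigned value, which is the class of underflow the commit message mentions.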


6883 Commits