freebsd-dev

Author	SHA1	Message	Date
Tim J. Robbins	541c3b66b5	When there are no free sem_undo structs available in semu_alloc(), only free one sem_undo with un_cnt == 0 instead of all of them. This is a temporary workaround until the SLIST_FOREACH_PREVPTR loop gets fixed so that it doesn't cause cycles in semu_list when removing multiple adjacent items. It might be easier to just use (doubly-linked) LISTs here instead of complicated SLIST code to achieve O(1) removals. This bug manifested itself as a complete lockup under heavy semaphore use by multiple processes with the SEM_UNDO flag set. PR: 58984	2003-11-10 07:22:41 +00:00
Marcel Moolenaar	fcaa2925a9	Change the clear_ret argument of get_mcontext() to be a flags argument. Since all callers either passed 0 or 1 for clear_ret, define bit 0 in the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI code for possible (but unlikely) future use. The remaining bits are for use by MD code. This change is triggered by a need on ia64 to have another knob for get_mcontext().	2003-11-09 20:31:04 +00:00
Bruce Evans	b698380f33	Quick fix for scaling of statclock ticks in the SMP case. As explained in the log message for kern_sched.c 1.83 (which should have been repo-copied to preserve history for this file), the (4BSD) scheduler algorithm only works right if stathz is nearly 128 Hz. The old commit log said 64 Hz; the scheduler actually wants nearly 16 Hz but there was a scale factor of 4 to give the requirement of 64 Hz, and rev.1.83 changed the scale factor so that the requirement became 128 Hz. The change of the scale factor was incomplete in the SMP case. Then scheduling ticks are provided by smp_ncpu CPUs, and the scheduler cannot tell the difference between this and 1 CPU providing scheduling ticks smp_ncpu times faster, so we need another scale factor of smp_ncpu or an algorithm change. This quick fix uses the scale factor without even trying to optimize the runtime divisions required for this as is done for the other scale factor. The main algorithmic problem is the clamp on the scheduling tick counts. This was 295; it is now approximately 295 * smp_ncpu. When the limit is reached, threads get free timeslices and scheduling becomes very unfair to the threads that don't hit the limit. The limit can be reached and maintained in the worst case if the load average is larger than (limit / effective_stathz - 1) / 2 = 0.65 now (was just 0.08 with 2 CPUs before this change), so there are algorithmic problems even for a load average of 1. Fortunately, the worst case isn't common enough for the problem to be very noticeable (it is mainly for niced CPU hogs competing with less nice CPU hogs).	2003-11-09 13:45:54 +00:00
Seigo Tanimura	512824f8f7	- Implement selwakeuppri() which allows raising the priority of a thread being woken up. The thread woken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current	2003-11-09 09:17:26 +00:00
Sam Leffler	7902224c6b	o add a flags parameter to netisr_register that is used to specify whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive Supported by: FreeBSD Foundation	2003-11-08 22:28:40 +00:00
David Xu	685a6c448a	Return a reasonable number for top or ps to display for an M:N thread. Since there is no direct association between an M:N thread and a kse, a thread sometimes does not have a kse; in that case, return the pctcpu from its last kse. It is not perfect, but it gives a good number to display.	2003-11-08 03:03:17 +00:00
John Baldwin	dac33f12cc	Regen.	2003-11-07 20:30:30 +00:00
John Baldwin	c055e5d412	Mark ptrace(), ktrace(), utrace(), sysarch(), and issetugid() as MP safe. The parts of these calls that are not yet MP safe acquire Giant explicitly.	2003-11-07 20:23:23 +00:00
Robert Watson	a2f88a8b7c	Slight whitespace consistency improvement: Trim trailing whitespace. Remove unmatched " " before ")".	2003-11-07 04:47:14 +00:00
Jeff Roberson	f28b3340c1	- Somehow I botched my last commit. Add an extra ( to fix things up. I'm still not sure how this happened. Reported by: ps	2003-11-06 07:56:01 +00:00
Alan Cox	3b2c54e7bc	- Delay the allocation of memory for the pipe mutex until we need it. This avoids the need to free said memory in various error cases along the way.	2003-11-06 05:58:26 +00:00
Alan Cox	fc17df5264	- Simplify pipespace() by eliminating the explicit creation of vm objects. Instead, let the vm objects be lazily instantiated at fault time. This results in the allocation of fewer vm objects and vm map entries due to aggregation in the vm system.	2003-11-06 05:08:12 +00:00
Robert Watson	83b7b0edca	Remove the flags argument from mac_externalize_*_label(), as it's not passed into policies or used internally to the MAC Framework. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-06 03:42:43 +00:00
Jeff Roberson	a70d729bff	- Remove the local definition of sched_pin and unpin. They are provided in sched.h now. - Respect the td pin count.	2003-11-06 03:09:51 +00:00
Sam Leffler	d3be1471c7	o make debug_mpsafenet globally visible o move it from subr_bus.c to netisr.c where it more properly belongs o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to grab Giant as needed when MPSAFE operation is enabled Supported by: FreeBSD Foundation	2003-11-05 23:42:51 +00:00
Warner Losh	252af39a96	Minor style(9) nit	2003-11-05 06:14:48 +00:00
Jeff Roberson	46f8b26550	- It's ok if sched_runnable() has races in it, we don't need the sched_lock here unless we have something on the assigned queue.	2003-11-05 05:30:12 +00:00
Alexander Kabaev	ca430f2e92	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
Max Khon	2332251c6a	Back out the following revisions: 1.36 +73 -60 src/sys/compat/linux/linux_ipc.c 1.83 +102 -48 src/sys/kern/sysv_shm.c 1.8 +4 -0 src/sys/sys/syscallsubr.h That change was intended to support vmware3, but the wantrem parameter is useless because vmware3 uses SYSV shared memory to talk with the X server and the X server is a native application. The patch worked because the check for wantrem was not valid (wantrem and SHMSEG_REMOVED were never checked for SHMSEG_ALLOCATED segments). Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which, when set to 1, allows shm_find_segment_by_shmid() and shm_find_segment_by_shmidx() to return removed segments. MFC after: 1 week	2003-11-05 01:53:10 +00:00
Kirk McKusick	b932dd9b28	Get rid of DIAGNOSTIC that gives false positives on slow CPUs.	2003-11-04 08:03:11 +00:00
Jeff Roberson	9bacd788a1	- Add initial support for pinning and binding.	2003-11-04 07:45:41 +00:00
Kirk McKusick	15a93fcc31	Allow the bufdaemon and update daemon processes to skip the waitrunningbufspace() calls so that they are always able to proceed and clean up buffer space. Submitted by: Brian Fundakowski Feldman <green@freebsd.org>	2003-11-04 06:30:00 +00:00
Sam Leffler	3465702f13	disable MPSAFE network drivers; we aren't ready yet	2003-11-04 02:01:42 +00:00
Olivier Houchard	7922cdc855	I believe kbyanc@ really meant this in rev 1.58. Use zpfind() to see if the process became a zombie if pfind() doesn't find it and if the caller wants to know about process death, so that the caller knows the process died even if it happened before the kevent was actually registered. MFC after: 1 week	2003-11-04 01:41:47 +00:00
Olivier Houchard	f44004690c	Do not attempt to report proc event if NOTE_EXIT has already been received. This fixes a race condition (specifically with signal events) that could lead to the kn being re-inserted into the list after it has been destroyed, which is not something we want to happen. PR: kern/58258	2003-11-04 01:14:58 +00:00
John Baldwin	8bc0846476	Don't require INTR_FAST handlers to be exclusive in the MI layer. Instead, let the MD code choose whether or not to implement such a policy. The new i386 interrupt code allows multiple FAST handlers for a given source for example. However, the code does not allow FAST and non-FAST handlers to be mixed.	2003-11-03 22:42:58 +00:00
John Baldwin	b95bb3e62b	Update spin lock order list for new i386 interrupt and SMP code.	2003-11-03 22:38:30 +00:00
Robert Watson	730ecf8254	Unlock pipe mutex when failing MAC pipe ioctl access control check. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-03 17:58:23 +00:00
Jeff Roberson	112b6d3aa9	- Remove kseq_find(), we no longer scan other cpu's run queues when we go idle. They figure out that we're idle fast enough that the cache pollution introduced by scanning their run queue is more expensive than waiting a little longer. - Add kseq_setidle() to mark us as being idle. Use this in place of kseq_find(). - Remove kseq_load_highest(), kseq_find() was the only consumer of this interface. kseq_balance() has its own customized version that finds the lowest and highest loads simultaneously. Continuously told that this would be faster by: terry	2003-11-03 03:27:22 +00:00
Jeff Roberson	ef1134c9ad	- Remove the ksq_loads[] array. We are only interested in three counts, the total load, the timeshare load, and the number of threads that can be migrated to another cpu. Account for these separately. - Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE can be migrated to another CPU. Currently, this only checks to see if we're an interrupt handler. Eventually this will also be used to support CPU binding.	2003-11-02 10:56:48 +00:00
Alexander Kabaev	cb9ddc80ae	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case where td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.	2003-11-02 04:52:53 +00:00
Jeff Roberson	769a363537	- In sched_prio() only force us onto the current queue if our priority is being elevated (numerically smaller).	2003-11-02 04:25:59 +00:00
Jeff Roberson	7d1a81b4dc	- Rename SCHED_PRI_NTHRESH to SCHED_SLICE_NTHRESH since it is only used in slice assignment. Add a comment describing what it does. - Remove a stale XXX comment, the nice should not impact the interactivity, nice adjustments only affect non-interactive tasks in ULE. - Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them at least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve nice +20 tasks as intended.	2003-11-02 04:10:15 +00:00
Jeff Roberson	a0a931cec7	- Remove uses of PRIO_TOTAL and replace them with SCHED_PRI_NRESV - SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we do not have to account for it in the few places that we use it. Requested by: bde	2003-11-02 03:49:32 +00:00
Jeff Roberson	d322132c62	- Change sched_interact_update() to only accept slp+runtime values between 0 and SCHED_SLP_RUN_MAX * 2. This allows us to simplify the algorithm quite a bit. Before, it dealt with arbitrary values which required us to do nasty integer division tricks that didn't quite work out correctly. - Change sched_wakeup() to detect conditions where the slp+runtime could exceed SCHED_SLP_RUN_MAX * 2. This can happen if we go to sleep for longer than 6 seconds. In this case, we'll just clear the runtime and set the sleep time to the max. - Define a new function, sched_interact_fork() which updates the slp+runtime of a newly forked thread. We want to limit the amount of history retained from the parent so that we learn the child's behavior quickly. We don't, however want to decay it to nothing. Previously, we would simply divide each parameter by 100 whenever we forked. After a few forks the values would reach 0 and tasks would not be considered interactive. - Add another KTR entry, cleanup some existing entries. - Remove a useless sched_interact_update() from sched_priority(). This is already done by the callers that require it.	2003-11-02 03:36:33 +00:00
Alexander Kabaev	492c1e68fb	Temporarily undo parts of the struct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff	2003-11-01 05:51:54 +00:00
Jeff Roberson	22bf7d9a0e	- Add static to local functions and data where it was missing. - Add an IPI based mechanism for migrating kses. This mechanism is broken down into several components. This is intended to reduce cache thrashing by eliminating most cases where one cpu touches another's run queues. - kseq_notify() appends a kse to a lockless singly linked list and conditionally sends an IPI to the target processor. Right now this is protected by sched_lock but at some point I'd like to get rid of the global lock. This is why I used something more complicated than a standard queue. - kseq_assign() processes our list of kses that have been assigned to us by other processors. This simply calls sched_add() for each item on the list after clearing the new KEF_ASSIGNED flag. This flag is used to indicate that we have been appended to the assigned queue but not added to the run queue yet. - In sched_add(), instead of adding a KSE to another processor's queue we use kseq_notify() so that we don't touch their queue. Also in sched_add(), if KEF_ASSIGNED is already set return immediately. This can happen if a thread is removed and readded so that the priority is recorded properly. - In sched_rem() return immediately if KEF_ASSIGNED is set. All callers immediately readd simply to adjust priorities etc. - In sched_choose(), if we're running an IDLE task or the per cpu idle thread set our cpumask bit in 'kseq_idle' so that other processors may know that we are idle. Before this, make a single pass through the run queues of other processors so that we may find work more immediately if it is available. - In sched_runnable(), don't scan each processor's run queue, they will IPI us if they have work for us to do. - In sched_add(), if we're adding a thread that can be migrated and we have plenty of work to do, try to migrate the thread to an idle kseq. - Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into consideration. - No longer use kseq_choose() to steal threads, it can lose its last argument. - Create a new function runq_steal() which operates like runq_choose() but skips threads based on some criteria. Currently it will not steal PRI_ITHD threads. In the future this will be used for CPU binding. - Create a kseq_steal() that checks each run queue with runq_steal(), use kseq_steal() in the places where we used kseq_choose() to steal with before.	2003-10-31 11:16:04 +00:00
John Baldwin	e57ea233d9	Ensure that mp_ncpus is set to 1 if mp_cpu_probe() fails.	2003-10-30 21:44:01 +00:00
Alexander Kabaev	0823d2996c	Relock mntvnode_mtx if vget fails in vfs_stdsync. The loop should always be entered with the mutex locked.	2003-10-30 16:22:51 +00:00
David Xu	7eeaaf9b97	Try to fetch the thread mailbox address in the page fault trap, so that when a thread blocks in the page fault handler, an upcall thread can be scheduled. It is useful if the process is doing lots of mmap based I/O.	2003-10-30 02:55:43 +00:00
Sam Leffler	90fc7b7cb8	Add a temporary mechanism to disable INTR_MPSAFE from network interface drivers. This is preparatory to running more parts of the network system w/o Giant.	2003-10-29 18:29:50 +00:00
Bruce Evans	b3aeaf2ed1	Removed mostly-dead code for setting switchtime after the idle loop clobbers this variable. Long ago, when the idle loop wasn't in a process, it set switchtime.tv_sec to zero to indicate that the time needs to be read after the idle loop finishes. The special case for this isn't needed now that there is an idle process (for each CPU). The time is read in the normal way when the idle process is switched away from. The seconds component of the time is only zero for the first second after the uptime is set, and the mostly-dead code was only executed during this time. (This was slightly broken by using uptimes instead of times relative to the Epoch -- in the original version the seconds component of the time was only 0 for the first second after the Epoch.) In mi_switch(), moved the setting of switchticks to just after the first (and now only) setting of switchtime. This setting used to be delayed since a late setting was needed for the idle case and an early setting was not needed. Now the early setting is needed so that fork_exit() doesn't need to set either switchtime or switchticks. Removed now-completely-rotted comment attached to this. Most of the code described by the comment had already moved to sched_switch().	2003-10-29 15:23:09 +00:00
Bruce Evans	89674a9f77	Removed sched_nest variable in sched_switch(). Context switches always begin with sched_lock held but not recursed, so this variable was always 0. Removed fixup of sched_lock.mtx_recurse after context switches in sched_switch(). Context switches always end with this variable in the same state that it began in, so there is no need to fix it up. Only sched_lock.mtx_lock really needs a fixup. Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion that sched_lock is owned and not recursed after it is fixed up. This assertion must match the one in mi_switch(), and if sched_lock were recursed then a non-null fixup of sched_lock.mtx_recurse would probably be needed again, unlike in sched_switch(), since fork_exit() doesn't return to its caller in the normal way.	2003-10-29 14:40:41 +00:00
Sam Leffler	9c855a36c1	Introduce the notion of "persistent mbuf tags"; these are tags that stay with an mbuf until it is reclaimed. This is in contrast to tags that vanish when an mbuf chain passes through an interface. Persistent tags are used, for example, by MAC labels. Add an m_tag_delete_nonpersistent function to strip non-persistent tags from mbufs and use it to strip such tags from packets as they pass through the loopback interface and when turned around by icmp. This fixes problems with "tag leakage". Pointed out by: Jonathan Stone Reviewed by: Robert Watson	2003-10-29 05:40:07 +00:00
Sam Leffler	395bb18680	speedup stream socket recv handling by tracking the tail of the mbuf chain instead of walking the list for each append Submitted by: ps/jayanth Obtained from: netbsd (jason thorpe)	2003-10-28 05:47:40 +00:00
Jeff Roberson	1aca9909e5	- Only change the run queue in sched_prio() if the kse is non null. threads can be in the TD_ON_RUNQ state and not have an associated kse. - Remove the PRI_IDLE special case from sched_clock(), it was not actually necessary.	2003-10-28 03:28:48 +00:00
Jeff Roberson	eab9cabf34	- Don't set td_priority directly here, use sched_prio().	2003-10-27 07:15:47 +00:00
Jeff Roberson	3f741ca117	- Use a better algorithm in sched_pctcpu_update() Contributed by: Thomaswuerfl@gmx.de - In sched_prio(), adjust the run queue for threads which may need to move to the current queue due to priority propagation. - In sched_switch(), fix style bug introduced when the KSE support went in. Columns are 80 chars wide, not 90. - In sched_switch(), Fix the comparison in the idle case and explicitly re-initialize the runq in the not propagated case. - Remove dead code in sched_clock(). - In sched_clock(), If we're an IDLE class td set NEEDRESCHED so that threads that have become runnable will get a chance to. - In sched_runnable(), if we're not the IDLETD, we should not consider curthread when examining the load. This mimics the 4BSD behavior of returning 0 when the only runnable thread is running. - In sched_userret(), remove the code for setting NEEDRESCHED entirely. This is not necessary and is not implemented in 4BSD. - Use the correct comparison in sched_add() when checking to see if an idle prio task has had its priority temporarily elevated.	2003-10-27 06:47:05 +00:00
Alfred Perlstein	6ff7636ea5	constify the second args to timevaladd() and timevalsub().	2003-10-26 02:19:00 +00:00
Robert Watson	36bbf86ba6	Check (locked) before performing an advisory unlock following a failure of vn_start_write(). Otherwise, we may inconsistently attempt to release the advisory lock. Pointed out by: teggej	2003-10-25 16:43:50 +00:00
Robert Watson	c447f5b2f4	When generating a core dump, use advisory locking in an advisory way: if we do acquire an advisory lock, great! We'll release it later. However, if we fail to acquire a lock, we perform the coredump anyway. This problem became particularly visible with NFS after the introduction of rpc.lockd: if the lock manager isn't running, then locking calls will fail, aborting the core dump (resulting in a zero-byte dump file). Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>	2003-10-25 16:14:09 +00:00
Robert Watson	67536f038c	Allow MAC policies to block/revoke kern_alq write access to a file. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories Reviewed by: jeff	2003-10-25 16:10:41 +00:00
Warner Losh	17e02bb39b	Convenience functions to generate notifications from the kernel. The ACPI code will start using these shortly. Reviewed by: njl	2003-10-24 22:41:54 +00:00
John-Mark Gurney	0eb3b7bb7f	don't allow reading from files that haven't been open'd for reading.	2003-10-24 21:07:53 +00:00
John Baldwin	8b201c42c6	- Add a DDB command 'show intrcnt' to show the non-zero interrupt counts. - Add a DDB function to dump the contents of an ithread and optionally details about each handler in that ithread. This function can be used by MD code to implement DDB commands that display information about interrupt sources and their registered handlers.	2003-10-24 21:05:30 +00:00
John Baldwin	e07c897e61	Writes to p_flag in __setugid() no longer need Giant.	2003-10-23 21:20:34 +00:00
John Baldwin	787f162df6	Move the P_COWINPROGRESS flag from being a per-process p_flag to being a per-thread td_pflag which doesn't require any locks to read or write as it is only read or written by curthread on itself. Glanced at by: mckusick	2003-10-23 21:14:08 +00:00
Garrett Wollman	06cb76bde3	Add appropriate const poisoning to the assert_*locked() family so that I can call ASSERT_VOP_LOCKED(vp, __func__) without a diagnostic. Inspired by: the evil and rude OpenAFS cache manager code	2003-10-23 18:17:36 +00:00
Robert Watson	6fa0475d95	Finish break-out of kern_mac.c into parts: Include src/sys/security/mac/mac_internal.h in kern_mac.c. Remove redundant defines from the include: SYSCTL_DECL(), debug macros, composition macros. Unstaticize various bits now exposed to the remainder of the kernel: mac_init_label(), mac_destroy_label(). Remove all the functions now implemented in mac_process/mac_vfs/mac_net/ mac_pipe. Also remove debug counters, sysctls exporting debug counters, enforcement flags, sysctls exporting enforcement flags. Leave module declaration, sysctl nodes, mactemp malloc type, system calls. This should conclude MAC/LINT/NOTES breakage from the break-out process, but I'm running builds now to make sure I caught everything. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-10-22 20:59:31 +00:00
Robert Watson	089c1bdac9	Variable cleanup following break-out of kern_mac.c into sys/security/mac: Unstaticize mac_late. Remove ea_warn_once, now in mac_vfs.c. Unstaticize mac_policy_list, mac_static_policy_list, use struct mac_policy_list_head instead of LIST_HEAD() directly. Unstaticize and un-inline MAC policy locking functions so they can be referenced from mac_*.c. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-10-22 20:47:41 +00:00
Robert Watson	9e7bf51ca8	Rename error_select() to mac_error_select(), and unstaticize so it can be used from src/sys/security/mac/mac_*.c. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-10-22 20:42:22 +00:00
Mike Silbersack	184dcdc7c8	Change all SYSCTLS which are readonly and have a related TUNABLE from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful error messages.	2003-10-21 18:28:36 +00:00
Hidetoshi Shimokawa	a44ca4f05f	We need to initialize bp->b_offset and bp->b_iooffset because bp->b_blkno is ignored now.	2003-10-21 13:18:19 +00:00
Scott Long	bd781a1ed6	Don peril-sensitive sunglasses and mark pipe(2) as MPSAFE. I've beaten up on it for the last 15 hours with no signs of problems. It gives a small (1%) gain on buildworld since pipe_read/pipe_write are already free of Giant.	2003-10-21 07:03:27 +00:00
Poul-Henning Kamp	68b00bf648	Remove KASSERTS on B_PHYS for vmapbuf() and vunmapbuf(), B_PHYS is going away.	2003-10-21 06:53:10 +00:00
Marcel Moolenaar	9ee99eb496	Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.	2003-10-21 01:13:49 +00:00
Sam Leffler	6c24056459	revert default for idle polling to zero until we can resolve the livelock problem	2003-10-20 21:14:24 +00:00
Jeff Roberson	484288de56	- If a thread is not bound to a kse return 0 from sched_pctcpu(). Reported by: pawel.worach@nordea.com	2003-10-20 19:55:21 +00:00
Alan Cox	f2b1200d08	Initialize the buf's b_object in pbgetvp(). Clear it in pbrelvp(). (This facilitates synchronization of the vm page's valid field using the vm object's lock.) Suggested by: tegge	2003-10-20 18:24:38 +00:00
David Malone	111b0d0d29	Mark dup as MPSAFE. Giant was pushed into dup ages ago, but it looks like it was missed in syscalls.master. Spotted by: alc	2003-10-20 16:16:03 +00:00
Alan Cox	9027d603d3	- Synchronize access to a vm page's valid field using the containing vm object's lock.	2003-10-20 05:57:55 +00:00
Marcel Moolenaar	bab1f05277	Put the RSE backing store at a fixed address. This change is triggered by libguile that needs to know the base of the RSE backing store. We currently do not export the fixed address to userland by means of a sysctl so user code needs to hardcode it for now. This will be revisited later. The RSE backing store is now at the bottom of region 4. The memory stack is at the top of region 4. This means that the whole region is usable for the stacks, giving a 61-bit stack space. Port: lang/guile (dependency of x11/gnome2)	2003-10-20 05:34:10 +00:00
David Malone	e1419c08e2	falloc allocates a file structure and adds it to the file descriptor table, acquiring the necessary locks as it works. It usually returns two references to the new descriptor: one in the descriptor table and one via a pointer argument. As falloc releases the FILEDESC lock before returning, there is a potential for a process to close the reference in the file descriptor table before falloc's caller gets to use the file. I don't think this can happen in practice at the moment, because Giant indirectly protects closes. To stop the file being completely closed in this situation, this change makes falloc set the refcount to two when both references are returned. This makes life easier for several of falloc's callers, because the first thing they previously did was grab an extra reference on the file. Reviewed by: iedowse Idea run past: jhb	2003-10-19 20:41:07 +00:00
Alan Cox	48ae2dddac	- Add vm object locking to vfs_clean_pages() and vfs_bio_set_validclean(). This is to synchronize access to the vm page's valid field by vm_page_set_validclean().	2003-10-19 20:39:06 +00:00
Peter Wemm	68d86cf1e2	Tidy up loose ends in the idle process. Call the MI cpu_idle() function for all platforms now. XXX alpha/sparc64/powerpc should fill in the function. Submitted by: bde	2003-10-19 02:43:57 +00:00
Poul-Henning Kamp	2d6a9d0747	Initialize b_iooffset before calling VOP_[SPEC]STRATEGY	2003-10-18 19:49:46 +00:00
Poul-Henning Kamp	01758670e9	Initialize b_iooffset before calling strategy	2003-10-18 19:48:21 +00:00
Poul-Henning Kamp	0efedd8864	Don't report b_pblkno, it is going away.	2003-10-18 17:59:02 +00:00
Poul-Henning Kamp	1ad9172f6b	Report bio_pblkno instead of bio_blkno.	2003-10-18 17:27:10 +00:00
Poul-Henning Kamp	4cb4df483c	Make bioq_disksort() sort on the bio_offset field instead of bio_pblkno.	2003-10-18 15:50:56 +00:00
Poul-Henning Kamp	2c18019f14	DuH! bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)	2003-10-18 14:10:28 +00:00
Poul-Henning Kamp	cc81271eaa	I think rwatson got the sign wrong here...	2003-10-18 12:16:17 +00:00
Poul-Henning Kamp	855c6fcc68	Initialize bp->b_offset before calling VOP_STRATEGY()	2003-10-18 11:13:31 +00:00
Poul-Henning Kamp	583b92e328	Convert some if(bla) panic("foo") to KASSERTS to improve grep-ability.	2003-10-18 09:32:39 +00:00
Poul-Henning Kamp	d986d4580c	The size and contents of the DEV_STRATEGY() macro have progressed to the point where it being a macro is no longer sensible, and it will only be more so in days to come. BIO_STRATEGY() is now only used from DEV_STRATEGY() and should not be used directly anymore. Put the contents of both in the new function dev_strategy() and make DEV_STRATEGY() call that function. In addition, this allows us to make the rather magic bufdonebio() helper function static. This also saves a hundred and some bytes of code in a typical kernel.	2003-10-18 09:03:15 +00:00
Robert Watson	dae6d925a2	Wrap db_active check in #ifdef DDB, as db_active is not defined ifndef DDB.	2003-10-18 02:23:57 +00:00
Robert Watson	90e6b5447f	Add a new cn_flags field to struct consdev, the low-level console definition structure. Define one flag, CN_FLAG_NODEBUG, which indicates the console driver cannot be used in the context of the debugger. This may be used, for example, if the console device interacts with kernel services that cannot be used from the debugger context, such as the network stack. These drivers are skipped over for calls to cn_checkc() and cn_putc(), and the calling function simply moves on to the next available console.	2003-10-18 02:13:39 +00:00
Jeff Roberson	94816f6d52	- Remove the correct thread from the run queue in setrunqueue(). This fixes ULE + KSE.	2003-10-17 20:53:04 +00:00
Poul-Henning Kamp	3da2d6a453	Simplify count_dev()	2003-10-17 11:56:48 +00:00
Peter Wemm	c9c373b093	Halt the cpu on amd64 as well. For some strange reason, this makes a fair bit of difference to the power consumption and lets my cpu cool down enough for the temperature sensitive fan controller to completely stop the cpu fan at times.	2003-10-17 03:49:03 +00:00
Marcel Moolenaar	b0f865c1f3	Implement cpu_idle() on ia64. We put the processor in a lightweight halt state that minimizes power consumption while still preserving cache and TLB coherency. Halting the processor is not conditional at this time. Tested with UP and SMP kernels.	2003-10-17 02:24:59 +00:00
Jeff Roberson	55f2099a70	- The kse may be null in sched_pctcpu(). Reported by: kris	2003-10-16 21:13:14 +00:00
Jeff Roberson	0e0f626628	- Only kse_reassign() in the !running case. Reported by: kris	2003-10-16 20:32:57 +00:00
Jeff Roberson	0c7da3a43d	- Call sched_add() with the correct argument on SMP. Reported by: Valentin Chopov <valentin@valcho.net>	2003-10-16 20:06:19 +00:00
Jeff Roberson	b72f347bdb	- Fix a minor problem with my last commit, we don't want to return from sched_switch if the thread is running, we want to fall through and pick a new thread because we have been preempted.	2003-10-16 10:04:54 +00:00
Doug Rabson	46ba7a35f2	* Add multiple inheritance to kobj. Each class can have zero or more base classes and if a method is not found in a given class, its base classes are searched (in the order they were declared). This search is recursive, i.e. a method may be defined in a base class of a base class. * Change the kobj method lookup algorithm to one which is SMP-safe. This relies only on the constraint that an observer of a sequence of writes of pointer-sized values will see exactly one of those values, not a mixture of two or more values. This assumption holds for all processors which FreeBSD supports. * Add locking to kobj class initialisation. * Add a simpler form of 'inheritance' for devclasses. Each devclass can have a parent devclass. Searches for drivers continue up the chain of devclasses until either a matching driver is found or a devclass is reached which has no parent. This can allow, for instance, pci drivers to match cardbus devices (assuming that cardbus declares pci as its parent devclass). * Increment __FreeBSD_version. This preserves the driver API entirely except for one minor feature used by the ISA compatibility shims. A workaround for ISA compatibility will be committed separately. The kobj and newbus ABI has changed - all modules must be recompiled.	2003-10-16 09:16:28 +00:00
Jeff Roberson	ae53b483cc	- Collapse sched_switchin() and sched_switchout() into sched_switch(). Now mi_switch() calls sched_switch() which calls cpu_switch(). This is actually one less function call than it had been.	2003-10-16 08:53:46 +00:00
Jeff Roberson	7cf90fb376	- Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td argument rather than a kse.	2003-10-16 08:39:15 +00:00
Jeff Roberson	4c9612c622	- The non iterative algorithm for interact_update was broken due to rounding errors. This was the source of the majority of the interactivity problems. Reintroduce the old algorithm and its XXX. - Up the interactivity threshold to 30. It really could stand to be even a tiny bit higher. - Let the sleep and run time accumulate up to 5 seconds of history rather than two. This helps stop XFree86 from becoming non-interactive during bursts of activity.	2003-10-16 08:17:43 +00:00
Jeff Roberson	08fd6713b2	- If our user_pri doesn't match our actual priority, our priority has been elevated, either due to priority propagation or because we're in the kernel. In either case, put us on the current queue so that we don't stop others from using important resources. At some point the priority elevations from sleeping in the kernel should go away. - Remove an optimization in sched_userret(). Before we would only set NEEDRESCHED if there was something of a higher priority available. This is a trivial optimization and it breaks priority propagation because it doesn't take threads which we may be blocking into account. Notice that the thread which is blocking others gets up to one tick of cpu time before we honor this NEEDRESCHED in sched_clock().	2003-10-15 07:47:06 +00:00
Peter Wemm	25e247af44	The KERN_PROC_PROC sysctl took 4 args in 5.0-REL and 5.1-REL. We need to accept this for a bit longer. Requiring the new order of 3 args only was not very helpful.	2003-10-15 03:11:46 +00:00
Sam Leffler	bd19669855	Change default for kern.polling.idle_poll back to 1. This was set to 0 because Luigi observed livelock but in recent testing it did not occur so I'm re-enabling it by default. Reviewed by: luigi	2003-10-14 18:39:36 +00:00
Poul-Henning Kamp	b84044731d	Made use of 'error' argument, which was unused (by mistake) before. Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>	2003-10-14 08:09:43 +00:00
Warner Losh	d29516dd82	With DIAGNOSTICS, sometimes we get weird crashes when some driver accesses softc after it is freed. Use a different malloc type for softc than the rest of the bus code to make it more clear when these things happen that it is the driver that's at fault, not the bus code. Suggested by: sam and/or phk (I think)	2003-10-14 06:22:07 +00:00
Jeff Roberson	85b9831dfa	- Add a missing vn_finished_write() Pointy hat: jeff Found by: robert Obtained from: kirk	2003-10-14 00:38:34 +00:00
David Xu	3a2e2a0ec8	Don't clear signal mask in execsig(). RELENG_4 does not clear it and POSIX asks to inherit signal mask for execv.	2003-10-13 14:03:08 +00:00
Jeff Roberson	736c97c7b3	- In SCHED_CURR() add holding Giant to the list of criteria that will keep you on the current queue. In the future, it would be nice if priority propagation could deterministically pluck a thread off of the next queue and put it on the current queue. Until then this hack stops us from holding up our entire current queue, including interrupt handlers, while a thread on the next queue is blocked while holding Giant. - Inherit our pctcpu information from our parent.	2003-10-12 21:07:31 +00:00
Alan Cox	d58e70a08d	In vfs_bio_clrbuf(), ignore the state of the object lock if the page is the "bogus" page. Found by: tegge	2003-10-12 18:26:48 +00:00
Poul-Henning Kamp	5108cd3652	Simplify vn_isdisk() a bit.	2003-10-12 14:04:39 +00:00
John-Mark Gurney	9e5de980c6	fix a problem referencing free'd memory. This is only a problem for kqueue write events on a socket and you regularly create tons of pipes which overwrites the structure causing a panic when removing the knote from the list. If the peer has gone away (and it's a write knote), then don't bother trying to remove the knote from the list. Submitted by: Brian Buchanan and myself Obtained from: nCircle	2003-10-12 07:06:02 +00:00
Jeff Roberson	7dd1328c13	- Fix a typo, I meant & and not |. This was causing lockups from the syncer looping forever due to list corruption. Solved by: tegge	2003-10-11 21:50:45 +00:00
Alan Cox	08814d66d5	- Synchronize access to a page's valid field in vfs_bio_clrbuf() by using the lock from its containing object. - Remove GIANT_REQUIRED from vm_hold_load_pages().	2003-10-10 07:26:21 +00:00
Robert Drehmel	ea924c4cd3	Implement preliminary support for the PT_SYSCALL command to ptrace(2).	2003-10-09 10:17:16 +00:00
Tim J. Robbins	a50f62fd9f	Remove support for the unused 4th component of the KERN_PROC_PROC sysctl.	2003-10-06 01:26:11 +00:00
Jeff Roberson	d1cf0fc7fc	- Add a missing vn_start_write() to flushbufqueues(). This could have caused snapshot related problems. - The vp can not be NULL here or we would panic in vfs_bio_awrite(). Stop confusing the logic by checking for it in several places. Submitted by: kirk and then rototilled by me to remove vp == NULL checks.	2003-10-05 22:16:08 +00:00
Bruce M Simpson	f05970242b	Bring back sysctl_wire_old_buffer(). Fix a bug in sysctl_handle_opaque() whereby the pointers would not get reset on a retried SYSCTL_OUT() call. Noticed by: bde	2003-10-05 13:31:33 +00:00
Bruce M Simpson	dcf59a59fc	Fix a security problem in sysctl() the long way round. Use pre-emption detection to avoid the need for wiring a userland buffer when copying opaque data structures. sysctl_wire_old_buffer() is now a no-op. Other consumers of this API should use pre-emption detection to notice update collisions. vslock() and vsunlock() should no longer be called by any code and should be retired in subsequent commits. Discussed with: pete, phk MFC after: 1 week	2003-10-05 09:37:47 +00:00
Bruce M Simpson	0c9601bc6b	Add a pre-emption counter, td_generation, so that threads can notice when they have been pre-empted by other threads. This is bumped from within mi_switch() every time a context switch takes place. Discussed with: pete	2003-10-05 09:35:08 +00:00
Bruce M Simpson	51830edcc5	Fold the vslock() and vsunlock() calls in this file with #if 0's; they will go away in due course. Involuntary pre-emption means that we can't count on wiring of pages alone for consistency when performing a SYSCTL_OUT() bigger than PAGE_SIZE. Discussed with: pete, phk	2003-10-05 08:38:22 +00:00
Jeff Roberson	98d7d155c1	- Apply a big giant lock around the namecache. This has been sitting in my tree since BSDcon.	2003-10-05 07:13:50 +00:00
Jeff Roberson	bdcfcdecea	- Fix an XXX. Check the error of vn_lock() in vflush(). Don't specify LK_RETRY either, we don't want this vnode if it turns into another. - Remove the code that checks the mount point after acquiring the lock we are guaranteed to either fail or get the vnode that we wanted.	2003-10-05 07:12:38 +00:00
Bruce M Simpson	5be99846fc	Remove magic numbers surrounding locking state in the sysctl module, and replace them with more meaningful defines.	2003-10-05 05:38:30 +00:00
Jeff Roberson	45503a37dd	- Rename vcanrecycle() to vtryrecycle() to reflect its new role. - In vtryrecycle() try to vgonel the vnode if all of the previous checks passed. We won't vgonel if someone has either acquired a hold or usecount or started the vgone process elsewhere. This is because we may have been removed from the free list while we were inspecting the vnode for recycling. - The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling the same vnode. To further reduce the likelihood of this event, requeue the vnode on the tail of the list prior to calling vtryrecycle(). We can not actually remove the vnode from the list until we know that it's going to be recycled because other interlock holders may see the VI_FREE flag and try to remove it from the free list. - Kill a bogus XXX comment. If XLOCK is set we shouldn't wait for it regardless of MNT_WAIT because the vnode does not actually belong to this filesystem.	2003-10-05 05:35:41 +00:00
Jeff Roberson	85311d4b59	- Don't cache_purge() in getnewvnode. It's done in vclean(). With this purge, the purge in vclean, and the filesystems purge, we had 3 purges per vnode. - Move the insmntque(vp, 0) to vclean() so that we may remove it from the two vgone() functions and reduce the number of lock operations required.	2003-10-05 02:48:04 +00:00
Jeff Roberson	ce13b187e7	- Solve a LOR with the sync_mtx by using the VI_ONWORKLST flag to determine whether or not the sync failed. This could potentially get set between the time that we VOP_UNLOCK and VI_LOCK() but the race would harmlessly lead to the sync being delayed by an extra 30 seconds. If we do not move the vnode it could cause an endless loop if it continues to fail to sync. - Use vhold and vdrop to stop the vnode from changing identities while we have it unlocked. Other internal vfs lists are likely to follow this scheme.	2003-10-05 00:35:41 +00:00
Jeff Roberson	894fbf9769	- Move the xlock 'locking' code into vx_lock() and vx_unlock(). - Create a new function, vgonechrl(), which performs vgone for an in-use character device. Move the code from vflush() that did this into vgonechrl(). - Hold the xlock across the entirety of vgonel() and vgonechrl() so that at no point will an invalid vnode exist on any list without XLOCK set. - Move the xlock code out of vclean() now that it is in the vgone*() functions.	2003-10-05 00:02:41 +00:00
Alan Cox	6ec2fca505	Eliminate some unnecessary uses of the vm page queues lock around the vm page's valid field. This field is being synchronized using the containing vm object's lock.	2003-10-04 22:47:20 +00:00
Alan Cox	bf0da100d6	- Extend the scope the vm object lock to cover calls to vm_page_is_valid(). - Assert that the lock on the containing vm object is held in vm_page_is_valid().	2003-10-04 19:23:29 +00:00
Jeff Roberson	6f4b0863e0	- In sched_sync() test our preconditions prior to dropping the sync_mtx. This is so that we may grab the interlock while still holding the sync_mtx. We have to VI_TRYLOCK() because in all other cases the lock order runs the other way. - If we don't meet any of the preconditions, reinsert the vp into the list for the next second. - We don't need to panic if we fail to sync here because each FSYNC function handles this case. Removing this redundant code also simplifies locking.	2003-10-04 18:03:53 +00:00
Jeff Roberson	8ec82641d8	- Change a lame iterative algorithm to a constant time algorithm. Remove the XXX that complains about it as well. Submitted by: ThomasWuerfl@gmx.de	2003-10-04 17:41:13 +00:00
Jeff Roberson	e4c49d2b50	- In a Giantless world, the vn_lock() in vcanrecycle() could legitimately fail. Remove the panic from that case and document why it might fail. - Document the reason for calling cache_purge() on a newly created vnode. - In insmntque() order the operations so that we can call mtx_unlock() one fewer times. This makes the code somewhat clearer as well. - Add XXX comments in sched_sync() and vflush(). - In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is set. - In vclean() we don't need to acquire a lock around a single TAILQ_FIRST call. It's ok if we race here, the vinvalbuf will just do nothing. - Increase the scope of the lock in vgonel() to reduce the number of lock operations that are performed.	2003-10-04 15:10:40 +00:00
Jeff Roberson	1de1f935f2	- If we are called with LK_NOWAIT in vn_lock() we may be holding a mutex and should not sleep while waiting for XLOCK to clear. Care needs to be taken in functions that use this capability to avoid spinning.	2003-10-04 14:35:22 +00:00
Jacques Vidrine	8b7358ca43	Introduce a uiomove_frombuf helper routine that handles computing and validating the offset within a given memory buffer before handing the real work off to uiomove(9). Use uiomove_frombuf in procfs to correct several issues with integer arithmetic that could result in underflows/overflows. As a side-effect, the code is significantly simplified. Add additional sanity checks when computing a memory allocation size in pfs_read. Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-) Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)	2003-10-02 15:00:55 +00:00
Robert Watson	c142b0fcfe	Remove the global variable 'cmask', which was used to initialize the fd_cmask field in the file descriptor structure for the first process indirectly from CMASK, and when an fd structure is initialized before being filled in, and instead just use CMASK. This appears to be an artifact left over from the initial integration of quotas into BSD. Suggested by: peter	2003-10-02 03:57:59 +00:00
Jeff Roberson	fa3f9daae5	- On my Pentium4-M laptop, invalpg takes ~1100 cycles if the page is found in the TLB and ~1600 if it is not. Therefore, it is more efficient to invalidate the TLB after operations that use CMAP rather than before. - So that the TLB is invalidated prior to switching off of a processor, we must change the switchin functions to switchout functions. - Remove td_switchout from the thread and move it to the x86 pcb. - Move the code that calls switchout into swtch.s. These changes make this optimization truly x86 specific.	2003-09-30 08:11:36 +00:00
Robert Watson	cc7b13bfe0	If the struct mac copied into the kernel has a negative length, return EINVAL rather than failing the following malloc due to the value being too large.	2003-09-29 18:35:17 +00:00
Poul-Henning Kamp	431021789f	Retire revoke_and_destroy_dev() with extreme prejudice.	2003-09-28 20:50:36 +00:00
Marcel Moolenaar	c31f2280ed	Remove the regstkpages sysctl variable. We have a growable register stack now.	2003-09-27 23:07:47 +00:00
Marcel Moolenaar	fd75d71049	Part 2 of implementing rstacks: add the ability to create rstacks and use the ability on ia64 to map the register stack. The orientation of the stack (i.e. its grow direction) is passed to vm_map_stack() in the overloaded cow argument. Since the grow direction is represented by bits, it is possible and allowed to create bi-directional stacks. This is not an advertised feature, more of a side-effect. Fix a bug in vm_map_growstack() that's specific to rstacks and which we could only find by having the ability to create rstacks: when the mapped stack ends at the faulting address, we have not actually mapped the faulting address. We need to include or cover the faulting address. Note that at this time mmap(2) has not been extended to allow the creation of rstacks by processes. If such a need arises, this can be done. Tested on: alpha, i386, ia64, sparc64	2003-09-27 22:28:14 +00:00
Poul-Henning Kamp	98c469d484	Make life a little bit easier for cloning device drivers.	2003-09-27 21:50:00 +00:00
Poul-Henning Kamp	b294143142	Introduce no_poll() default method for device drivers. Have it do exactly the same as vop_nopoll() for consistency and put a comment in the two pointing at each other. Retire seltrue() in favour of no_poll(). Create private default functions in kern_conf.c instead of public ones. Change default strategy to return the bio with ENODEV instead of doing nothing, which would leave the bio stranded. Retire public nullopen() and nullclose() as well as the entire band of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump} functions, they are the default actions now. Move the final two trivial functions from subr_xxx.c to kern_conf.c and retire the now empty subr_xxx.c	2003-09-27 12:53:33 +00:00
Poul-Henning Kamp	41cbb0b237	Don't use seltrue when that is not really what we mean.	2003-09-27 12:44:06 +00:00
Poul-Henning Kamp	70cd771337	The present defaults for open and close for device drivers which provide no methods do not make any sense, and are not used by any driver. It is pretty hard to come up with even a theoretical concept of a device driver which would always fail open and close with ENODEV. Change the defaults to be nullopen() and nullclose(), which simply do nothing. Remove explicit initializations to these from the drivers which already used them.	2003-09-27 12:01:01 +00:00
Poul-Henning Kamp	3f99f14bf1	OK, I messed up /dev/console with what I had hoped would be compat code. Convert remaining console drivers and hope for the best.	2003-09-26 19:35:50 +00:00
Robert Drehmel	4cc9f52f78	Move some tracing related code into its own function as it will be needed for system call related ptrace functionality I plan to commit soon.	2003-09-26 15:09:46 +00:00
Poul-Henning Kamp	3d4274a52b	Update the list of CDROM device names to try for booting with RB_CDROM flag set.	2003-09-26 09:07:27 +00:00
Poul-Henning Kamp	0d44087987	Remove wrongly sized cnd_name field, we now store the name in the consdev structure. If the consdev name is not set and we have a cn_dev, set the name from there. Try to issue a printf about this, even though it may not have a place to go. Modify the sysctl related code to pick up the name from the consdev instead.	2003-09-26 07:26:54 +00:00
Peter Wemm	c460ac3a00	Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 hierarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.	2003-09-25 01:10:26 +00:00
Max Khon	b15572e3fc	Avoid NULL pointer dereferencing in modlist_lookup2(). PR: 56570 Submitted by: Thomas Wintergerst <Thomas.Wintergerst@nord-com.net>	2003-09-23 14:42:38 +00:00
Alan Cox	c76789caa6	- vm_hold_free_pages() should lock the kernel object. (The pages being freed belong to the kernel object.) - Increase the granularity of the vm object locking in vm_hold_load_pages() in order to reduce the number of times that we acquire and release the same lock.	2003-09-22 04:58:09 +00:00
Doug Rabson	ab7a2646e0	The method link_preload_finish is not static.	2003-09-20 17:39:32 +00:00
Jeff Roberson	81de51bf1d	- Somewhere along the line I stupidly removed critical logic from sched_ptcpu_update(). This caused erroneous cpu times in TOP for processes that were asleep. Replace the code that was removed.	2003-09-20 02:05:58 +00:00
Jeff Roberson	51b575490c	- In reassignbuf() don't unlock vp and lock newvp if they are the same. Doing so creates a race where the buf is on neither list. - Only vfree() in an error case in vclean() if VSHOULDFREE() thinks we should. - Convert the error case in vclean() to INVARIANTS from DIAGNOSTIC as this really should not happen and is fast to check.	2003-09-20 00:21:48 +00:00
Jeff Roberson	6b6c163a37	- Remove spls(). The locking that has replaced them is in place and they no longer serve as guidelines for future work.	2003-09-19 23:52:06 +00:00
Alexander Kabaev	aebbeee812	Eliminate one case of VI_UNLOCK followed by an immediate VI_LOCK.	2003-09-19 19:13:54 +00:00
Tim J. Robbins	3ddaef4034	Allow the KERN_PROC_PROC sysctl to be used without the useless 4th name component, for consistency with KERN_PROC_ALL. Support for the 4-argument form will be removed some time before 5.2-R.	2003-09-19 14:16:50 +00:00
Jeff Roberson	9fb535dec5	- Only use UMA to cache malloc requests up to PAGE_SIZE. Values larger than this are requested very infrequently and waste memory when we cache spares.	2003-09-19 04:39:08 +00:00
Alan Cox	35b86dc8de	Correct a typo in the previous revision.	2003-09-15 02:56:48 +00:00
Robert Watson	62c45ef40a	Add a new sysctl, security.bsd.conservative_signals, to disable special signal-delivery protections for setugid processes. In the event that a system is relying on "unusual" signal delivery to processes that change their credentials, this can be used to work around application problems. Also, add SIGALRM to the set of signals permitted to be delivered to setugid processes by unprivileged subjects. Reported by: Joe Greco <jgreco@ns.sol.net>	2003-09-14 07:22:38 +00:00
Jacques Vidrine	5949ba2136	sched_setscheduler: Return EINVAL when an invalid policy is specified, thus complying with POLA and the man page. (Previously, no error was returned for this case.)	2003-09-13 18:46:24 +00:00
Jacques Vidrine	b5e80ae344	Correct mostly harmless off-by-one error in getdomainname(). Reviewed by: imp	2003-09-13 17:12:22 +00:00
Alan Cox	58abfe0051	Convert vmapbuf() from using pmap_extract() to using pmap_extract_and_hold(). Note, however, that GIANT_REQUIRED should not be removed until all platforms fully implement the "prot" parameter to pmap_extract_and_hold(). Reviewed by: tegge	2003-09-13 04:29:55 +00:00
Alan Cox	27d203eab3	pipe_build_write_buffer() only requires read access of the page that it obtains from pmap_extract_and_hold().	2003-09-12 07:13:15 +00:00
Marcel Moolenaar	da13b8f9fe	Introduce BUS_CONFIG_INTR(). The method allows devices to tell parents about interrupt trigger mode and interrupt polarity. This allows ACPI for example to pass interrupt resource information up the hierarchy. The default implementation of the method therefore is to pass the request to the parent. Reviewed by: jhb, njl	2003-09-10 21:37:10 +00:00
Hidetoshi Shimokawa	8edbaf859d	Fix asynchronous physio breakage introduced in rev 1.163. We cannot use bp->b_caller2 because DEV_STRATEGY will overwrite it.	2003-09-10 15:48:51 +00:00
John Baldwin	2b3c42a9e9	Update the license on this file to be a bit more sane.	2003-09-10 01:09:32 +00:00
Ian Dowse	ffe40c80ea	In the !MNT_BYFSID case, return EINVAL from unmount(2) when the specified directory is not found in the mount list. Before the MNT_BYFSID changes, unmount(2) used to return ENOENT for a nonexistent path and EINVAL for a non-mountpoint, but we can no longer distinguish between these cases. Of the two error codes, EINVAL was more likely to occur in practice, and it was the only one of the two that was documented. Update the manual page to match the current behaviour. Suggested by: tjr Reviewed by: tjr	2003-09-08 16:23:21 +00:00
Alan Cox	03be99d20c	Use pmap_extract_and_hold() in pipe_build_write_buffer(). Consequently, pipe_build_write_buffer() no longer requires Giant on entry. Reviewed by: tegge	2003-09-08 04:58:32 +00:00
Tim J. Robbins	f05a427aa6	Return EINVAL if the contested bit is not set on the umtx passed to _umtx_unlock() instead of firing a KASSERT.	2003-09-07 11:14:52 +00:00
Alan Cox	ffe5125eac	msync(2) should be declared MP-safe.	2003-09-07 05:42:07 +00:00
Sam Leffler	6c024e8ef6	add fast swi taskqueue spinlock to the order_list so witness doesn't complain Submitted by: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>	2003-09-06 21:06:08 +00:00
Sam Leffler	7e2282a5a6	correct fast swi taskqueue spinlock name to be different from the sleep lock Submitted by: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>	2003-09-06 21:05:18 +00:00
Alan Cox	603d3d4a44	Giant is no longer required by pipe_destroy_write_buffer(). Reduce unnecessary white space from pipe_destroy_write_buffer().	2003-09-06 21:02:10 +00:00
Sam Leffler	f82c9e70f9	"fast swi" taskqueue support. This is a taskqueue that uses spinlocks making it useful for dispatching swi tasks from fast interrupt handlers. Sponsored by: FreeBSD Foundation	2003-09-05 23:09:22 +00:00
Sam Leffler	7c00e355a2	Print a message at boot for interrupt handlers created with INTR_MPSAFE and/or INTR_FAST. This belongs elsewhere and perhaps under bootverbose; I'm committing it for now as it's useful to know which drivers have been converted and which have not.	2003-09-05 22:51:18 +00:00
Peter Wemm	917cf8d2a3	Log involuntary context switches correctly.	2003-09-05 22:15:26 +00:00
Poul-Henning Kamp	ce914a08b0	Put the message about msgbuf cksum mismatch under bootverbose and tell people what the consequence is.	2003-09-05 11:12:00 +00:00
Poul-Henning Kamp	c679c73452	Use the quality to disable timecounters for which we deem Hz too low.	2003-09-03 08:14:16 +00:00
Kenneth D. Merry	cb32189e23	Move dynamic sysctl(8) variable creation for the cd(4) and da(4) drivers out of cdregister() and daregister(), which are run from interrupt context. The sysctl code does blocking mallocs (M_WAITOK), which causes problems if malloc(9) actually needs to sleep. The eventual fix for this issue will involve moving the CAM probe process inside a kernel thread. For now, though, I have fixed the issue by moving dynamic sysctl variable creation for these two drivers to a task queue running in a kernel thread. The existing task queues (taskqueue_swi and taskqueue_swi_giant) run in software interrupt handlers, which wouldn't fix the problem at hand. So I have created a new task queue, taskqueue_thread, that runs inside a kernel thread. (It also runs outside of Giant -- clients must explicitly acquire and release Giant in their taskqueue functions.) scsi_cd.c: Remove sysctl variable creation code from cdregister(), and move it to a new function, cdsysctlinit(). Queue cdsysctlinit() to the taskqueue_thread taskqueue once we have fully registered the cd(4) driver instance. scsi_da.c: Remove sysctl variable creation code from daregister(), and move it to a new function, dasysctlinit(). Queue dasysctlinit() to the taskqueue_thread taskqueue once we have fully registered the da(4) instance. taskqueue.h: Declare the new taskqueue_thread taskqueue, update some comments. subr_taskqueue.c: Create the new kernel thread taskqueue. This taskqueue runs outside of Giant, so any functions queued to it would need to explicitly acquire/release Giant if they need it. cd.4: Update the cd(4) man page to talk about the minimum command size sysctl/loader tunable. Also note that the changer variables are available as loader tunables as well. da.4: Update the da(4) man page to cover the retry_count, default_timeout and minimum_cmd_size sysctl variables/loader tunables. Remove references to /dev/r???, they aren't used any longer. cd.9: Update the cd(9) man page to describe the CD_Q_10_BYTE_ONLY quirk. taskqueue.9: Update the taskqueue(9) man page to describe the new thread task queue, and the taskqueue_swi_giant queue. MFC after: 3 days	2003-09-03 04:46:28 +00:00
Sam Leffler	28ace1bf60	move domain list mutex initialization to earlier in the boot sequence so statically configured modules like netgraph can call net_init_domain Noticed by: D.Rock@t-online.de (D. Rock)	2003-09-02 20:59:23 +00:00
Mike Silbersack	3390d47670	Implement MBUF_STRESS_TEST mark II. Changes from the original implementation: - Fragmentation is handled by the function m_fragment, which can be called from wherever fragmentation is needed. Note that this function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing use. - m_fragment works slightly differently from the old fragmentation code in that it allocates a separate mbuf cluster for each fragment. This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent fragments. While that is a nice feature in practice, it nerfed the usefulness of mbuf_stress_test. - Add two modes of random fragmentation. Chains with fragments all of the same random length and chains with fragments that are each uniquely random in length may now be requested.	2003-09-01 05:55:37 +00:00
Sam Leffler	b9651df42c	o interlock domain list when adding domains o remove irrelevant spl Notes: 1. We don't lock domain list traversals as this is safe until we start removing domains. 2. The calculation of max_datalen in net_init_domain appears safe as no one depends on max_hdr and max_datalen having consistent values. 3. Giant is still held for fast and slow timeouts; this must stay until each timeout routine is properly locked (coming soon). Sponsored by: FreeBSD Foundation	2003-09-01 05:01:55 +00:00
Jeff Roberson	d919a11d06	- Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to bail out if the buffer is not already present. - The buffer returned by incore() is not locked and should not be sent to brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the desired semantics.	2003-08-31 08:50:11 +00:00
Jeff Roberson	a7db559087	- If there is no vp assume that BKGRDINPROG is not set and set RELPBUF in brelse().	2003-08-31 01:07:45 +00:00
Jeff Roberson	b5c61abd82	- In some cases bp->b_vp can be NULL in brelse, don't try to lock the interlock in that case. Found by: alc	2003-08-31 00:06:07 +00:00
Alan Cox	411d10a600	Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy sockets into machine-dependent files. The rationale for this migration is illustrated by the modified amd64 allocator. It uses the amd64's direct map to avoid ephemeral mappings in the kernel's address space. On an SMP, the ephemeral mappings result in an IPI for TLB shootdown for each transmitted page. Yuck. Maintainers of other 64-bit platforms with direct maps should be able to use the amd64 allocator as a reference implementation.	2003-08-29 20:04:10 +00:00
Marcel Moolenaar	9e8147f3af	In bufdone(), change the format specifier for m->valid and m->dirty to a long type and explicitly cast m->valid and m->dirty to unsigned long. When PAGE_SIZE is 32K, these fields are in fact unsigned long.	2003-08-28 19:58:11 +00:00
Alexander Kabaev	772a9659d9	Do not return with vnode interlock held. Reviewed by: rwatson	2003-08-28 15:48:15 +00:00
Jeff Roberson	9dbfeb0ae6	- Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field. - Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode interlock. - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write buffers. Check for the BKGRDINPROG flag before recycling or throwing away a buffer. We do this instead because it is not safe for us to move the original buffer to a new queue from the callback on the background write buffer. - Remove the B_LOCKED flag and the locked buffer queue. They are no longer used. - The vnode interlock is used around checks for BKGRDINPROG where it may not be strictly necessary. If we hold the buf lock the a back-ground write will not be started without our knowledge, one may only be completed while we're not looking. Rather than remove the code, Document two of the places where this extra locking is done. A pass should be done to verify and minimize the locking later.	2003-08-28 06:55:18 +00:00
Robert Watson	a6a65b05d5	Fix a mac_policy_list reference to be a mac_static_policy_list reference: this fixes mac_syscall() for static policies when using optimized locking. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-26 17:29:02 +00:00
David Xu	ab2baa7254	Let SA process work under ULE scheduler, originally it would panic kernel. Reviewed by: jeff	2003-08-26 11:33:15 +00:00
Alan Cox	b7ad744dc5	Hold the page queues lock when performing vm_page_clear_dirty() and vm_page_set_invalid().	2003-08-23 18:11:53 +00:00
Tim J. Robbins	c89d555c6c	Fix a logic error in osethostid() that was introduced in rev. 1.34: allow hostid to be set when suser() returns 0, not when it returns an error. This would have allowed non-root users to set the host ID.	2003-08-23 15:45:57 +00:00
Marcel Moolenaar	38bf4e9667	On ia64 time_t is 64 bit. Explicitly cast tv_sec to long and change the corresponding format specifier to %ld in a call to printf() in function softclock(). The printf() is conditional upon DIAGNOSTIC. Found by: LINT	2003-08-23 08:31:32 +00:00
Robert Watson	eb8c7f9992	Introduce two new MAC Framework and MAC policy entry points: mac_reflect_mbuf_icmp() mac_reflect_mbuf_tcp() These entry points permit MAC policies to do "update in place" changes to the labels on ICMP and TCP mbuf headers when an ICMP or TCP response is generated to a packet outside of the context of an existing socket. For example, in response to a ping or a RST packet to a SYN on a closed port. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-21 18:21:22 +00:00
Eivind Eklund	effb9ebd01	Change description of kern.osreldate from "Operating system release date" to "Kernel release date" - userland version is in /usr/include/osreldate.h	2003-08-21 14:47:08 +00:00
Robert Watson	c096756c00	Add mac_check_vnode_deleteextattr() and mac_check_vnode_listextattr(): explicit access control checks to delete and list extended attributes on a vnode, rather than implicitly combining with the setextattr and getextattr checks. This reflects EA API changes in the kernel made recently, including the move to explicit VOP's for both of these operations. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-21 13:53:01 +00:00
Robert Watson	8d8d5ea8f2	Remove about 40 lines of #ifdef/#endif by using new macros MAC_DEBUG_COUNTER_INC() and MAC_DEBUG_COUNTER_DEC() to maintain debugging counter values rather than #ifdef'ing the atomic operations to MAC_DEBUG. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-20 19:16:49 +00:00
Warner Losh	c1cccd1ea6	bde made a number of suggested improvements to the code. This commit represents the purely stylistic changes and should have no net impact on the rest of the code. bde's more substantive changes will follow in a separate commit once we've come to closure on them. Submitted by: bde	2003-08-20 19:12:46 +00:00
Warner Losh	45cc9f5f4f	Fix an extreme edge case in leap second handling. We need to call ntp_update_second twice when we have a large step in case that step goes across a scheduled leap second. The only way this could happen would be if we didn't call tc_windup over the end of day on the day of a leap second, which would only happen if timeouts were delayed for seconds. While it is an edge case, it is an important one to get right for my employer. Sponsored by: Timing Solutions Corporation	2003-08-20 05:34:27 +00:00
Sam Leffler	c06eb4e293	Change instances of callout_init that specify MPSAFE behaviour to use CALLOUT_MPSAFE instead of "1" for the second parameter. This does not change the behaviour; it just makes the intent more clear.	2003-08-19 17:51:11 +00:00
Poul-Henning Kamp	037c3d0fb0	It is not an error to have no devices in the kernel: Return the generation number and start it from one instead of zero.	2003-08-17 12:06:19 +00:00
Bosko Milekic	b618bba486	Use constants less throughout the code and instead use the objsize variable. This makes changing the size of an mbuf or cluster for testing/debugging/whatever purposes easier. Submitted by: sam	2003-08-16 19:48:52 +00:00
Marcel Moolenaar	26502503e5	Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to cpu.h. This affects db_command.c and kern_shutdown.c. ia64: move all MD prototypes from cpu.h to md_var.h. This affects madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory(). It's not used (vm_machdep.c). alpha: the MD prototypes have been left in cpu.h with a comment that they should be there. Moving them is left for later. It was expected that the impact would be significant enough to be done in a separate commit. powerpc: MD prototypes left in cpu.h. Comment added. Suggested by: bde Tested with: make universe (pc98 incomplete)	2003-08-16 16:57:57 +00:00
Poul-Henning Kamp	78a49a45bc	Give timecounters a numeric quality field. A timecounter will be selected when registered if its quality is not negative and no less than the current timecounter's. Add a sysctl to report all available timecounters and their qualities. Give the dummy timecounter a solid negative quality of minus a million. Give the i8254 zero and the ACPI 1000. The TSC gets 800, unless APM or SMP forces it negative. Other timecounters default to zero quality and thereby retain current selection behaviour.	2003-08-16 08:23:53 +00:00
John Baldwin	70fca4277e	- Various style fixes in both code and comments. - Update some stale comments. - Sort a couple of includes. - Only set 'newcpu' in updatepri() if we use it. - No functional changes. Obtained from: bde (via an old diff I got a long time ago)	2003-08-15 21:29:06 +00:00
Marcel Moolenaar	1c843354aa	Add or finish support for machine dependent ptrace requests. When we check for permissions, do it for all requests, not the known requests. Later when we actually service the request we deal with the invalid requests we previously caught earlier. This commit changes the behaviour of the ptrace(2) interface for boundary cases such as an unknown request without proper permissions. Previously we would return EINVAL. Now we return EBUSY or EPERM. Platforms need to define __HAVE_PTRACE_MACHDEP when they have MD requests. This makes the prototype of cpu_ptrace() visible and introduces a call to this function for all requests greater or equal to PT_FIRSTMACH. Silence on: audit	2003-08-15 05:25:06 +00:00
John-Mark Gurney	fc8684cd46	If we got this far, we definitely don't have an EBADF. Return a more sane result of EPIPE. Reported by: nCircle dev team MFC after: 3 days	2003-08-15 04:31:01 +00:00
Cameron Grant	828447e0ca	add a read-only sysctl to display the number of entries in the fixed size kobj global method table; also kassert that the table has not overflowed when defining a new method. there are indications that the table is being overflowed in certain situations as we gain more kobj consumers- this will allow us to check whether kobj is at fault. symptoms would be incorrect methods being called.	2003-08-14 21:16:46 +00:00
Peter Grehan	eac100658a	Update powerpc to use the (old thread,new thread) calling convention for cpu_throw() and cpu_switch().	2003-08-14 03:56:24 +00:00
Alan Cox	77685ea594	- The vm_object pointer in pipe_buffer is unused. Remove it. - Check for successful initialization of pipe_zone in pipeinit() rather than every call to pipe(2).	2003-08-13 20:01:38 +00:00
Warner Losh	06b4bf3e55	Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's copyrighted files. Approved by: Matt Dillon	2003-08-12 23:24:05 +00:00
Maxime Henrion	affd4332fd	Remove extra space.	2003-08-12 20:34:31 +00:00
John Baldwin	e9911cf591	- Convert Alpha over to the new calling conventions for cpu_throw() and cpu_switch() where both the old and new threads are passed in as arguments. Only powerpc uses the old conventions now. - Update comments in the Alpha swtch.s to reflect KSE changes. Tested by: obrien, marcel	2003-08-12 19:33:36 +00:00
Alan Cox	ad8204e3f5	Pipespace() no longer requires Giant.	2003-08-11 22:23:25 +00:00
Alexander Kabaev	660ebf0ef2	Drop Giant in recvit before returning an error to the caller to avoid leaking the Giant on the syscall exit.	2003-08-11 19:37:11 +00:00
Bruce M Simpson	abd498aa71	Add the mlockall() and munlockall() system calls. - All those diffs to syscalls.master for each architecture are necessary. This needed clarification; the stub code generation for mlockall() was disabled, which would prevent applications from linking to this API (suggested by mux) - Giant has been quashed. It is no longer held by the code, as the required locking has been pushed down within vm_map.c. - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES to express their intention explicitly. - Inspected at the vmstat, top and vm pager sysctl stats level. Paging-in activity is occurring correctly, using a test harness. - The RES size for a process may appear to be greater than its SIZE. This is believed to be due to mappings of the same shared library page being wired twice. Further exploration is needed. - Believed to back out of allocations and locks correctly (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC). PR: kern/43426, standards/54223 Reviewed by: jake, alc Approved by: jake (mentor) MFC after: 2 weeks	2003-08-11 07:14:08 +00:00
Mike Silbersack	cebde06978	More pipe changes: From alc: Move pageable pipe memory to a separate kernel submap to avoid awkward vm map interlocking issues. (Bad explanation provided by me.) From me: Rework pipespace accounting code to handle this new layout, and adjust our default values to account for the fact that we now have a solid limit on allocations. Also, remove the "maxpipes" limit, as it no longer has a purpose. (The limit on kva usage solves the problem of having too many pipes.)	2003-08-11 05:51:51 +00:00
Alan Cox	f9999c67be	Use vm_page_hold() instead of vm_page_wire(). Otherwise, a multithreaded application could cause a wired page to be freed. In general, vm_page_hold() should be preferred for ephemeral kernel mappings of pages borrowed from a user-level address space. (vm_page_wire() should really be reserved for indefinite duration pinning by the "owner" of the page.) Discussed with: silby Submitted by: tegge	2003-08-11 00:17:44 +00:00
Jacques Vidrine	41b3077a6c	panic() if we try to handle an out-of-range signal number in psignal()/tdsignal(). The test was historically in psignal(). It was changed into a KASSERT, and then later moved to tdsignal() when the latter was introduced. Reviewed by: iedowse, jhb	2003-08-10 23:05:37 +00:00
Jacques Vidrine	007e25d95a	Add or correct range checking of signal numbers in system calls and ioctls. In the particular case of ptrace(), this commit more-or-less reverts revision 1.53 of sys_process.c, which appears to have been erroneous. Reviewed by: iedowse, jhb	2003-08-10 23:04:55 +00:00
Alan Cox	c6eb850aac	Background: When proc_rwmem() wired and mapped a page, it also added a reference to the containing object. The purpose of the reference was to prevent the destruction of the object and an attempt to free the wired page. (Wired pages can't be freed.) Unfortunately, this approach does not work. Some operations, like fork(2) that call vm_object_split(), can move the wired page to a different object, thereby making the reference pointless and opening the possibility of the wired page being freed. A solution is to use vm_page_hold() in place of vm_page_wire(). Held pages can be freed. They are moved to a special hold queue until the hold is released. Submitted by: tegge	2003-08-09 18:01:19 +00:00
Alan Cox	9c62fce085	- Remove GIANT_REQUIRED from pipespace(). - Remove a duplicate initialization from pipe_create().	2003-08-08 22:38:15 +00:00
Daniel Eischen	ab908f5935	Copyin the thread mailbox flags from the correct location in the mailbox.	2003-08-08 20:23:10 +00:00
John Baldwin	b2db7dc658	td_dupfd just needs to be less than 0, it does not have to hold the negative value of the index of the new file, so just use -1.	2003-08-07 17:08:26 +00:00
Jacques Vidrine	01b9dc96e3	Update some argument-documenting comments to match reality. Add an explicit range check to those same arguments to reduce risk of cardiac arrest in future code readers.	2003-08-07 16:42:27 +00:00
John Baldwin	8b149b5131	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
John Baldwin	277576de43	The ktrace mutex does not need to be locked around the post of the ktrace semaphore and doing so can lead to a possible reversal. WITNESS would have caught this if semaphores were used more often in the kernel. Submitted by: Ted Unangst <tedu@stanford.edu>, Dawson Engler	2003-08-07 13:58:13 +00:00
Alan Cox	f9b1de367e	- Remove GIANT_REQUIRED from pipe_free_kmem(). - Remove the acquisition and release of Giant around pipe_kmem_free() and uma_zfree() in pipeclose().	2003-08-07 04:32:40 +00:00
Yaroslav Tykhiy	b81694ed13	If connect(2) has been interrupted by a signal and therefore the connection is to be established asynchronously, behave as in the case of non-blocking mode: - keep the SS_ISCONNECTING bit set thus indicating that the connection establishment is in progress, which is the case (clearing the bit in this case was just a bug); - return EALREADY, instead of the confusing and unreasonable EADDRINUSE, upon further connect(2) attempts on this socket until the connection is established (this also brings our connect(2) into accord with IEEE Std 1003.1.)	2003-08-06 14:04:47 +00:00
David Xu	75ea65e3a2	kse.h is not needed for these files.	2003-08-05 12:08:49 +00:00
David Xu	d3b5e418bc	Introduce a thread mailbox flag TMF_NOUPCALL. On some architectures other than i386 or AMD64, TP register points to thread mailbox, and they can not atomically clear km_curthread in kse mailbox, in this case, thread retrieves its thread pointer from TP register and sets flag TMF_NOUPCALL in its thread mailbox to indicate a critical region.	2003-08-05 12:00:55 +00:00
Jeffrey Hsu	cc3426866c	Make the second argument to sooptcopyout() constant in order to simplify the upcoming PIM patches. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2003-08-05 00:27:54 +00:00
Ian Dowse	76bd23557b	In the mknod(), mkfifo(), link(), symlink() and undelete() syscalls, use vrele() instead of vput() on the parent directory vnode returned by namei() in the case where it is equal to the target vnode. This handles namei()'s somewhat strange (but documented) behaviour of not locking either vnode when the two vnodes are equal and LOCKPARENT but not LOCKLEAF is specified. Note that since a vnode double-unlock is not currently fatal, these coding errors were effectively harmless. Spotted by: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de> Reviewed by: mckusick	2003-08-05 00:26:51 +00:00
David Malone	d2cce3d6e8	Do some minor Giant pushdown made possible by copyin, fget, fdrop, malloc and mbuf allocation all not requiring Giant. 1) ostat, fstat and nfstat don't need Giant until they call fo_stat. 2) accept can copyin the address length without grabbing Giant. 3) sendit doesn't need Giant, so don't bother grabbing it until kern_sendit. 4) move Giant grabbing from each individual recv* syscall to recvit.	2003-08-04 21:28:57 +00:00
John Baldwin	bfe6598264	Adjust a comment to remove staleness and take slightly less implementation specific perspective.	2003-08-04 20:35:13 +00:00
John Baldwin	139b7550d9	Set td_critnest to 1 when setting up a thread since it is a MI field with MI values. This ensures that td_critnest for a newly fork'd thread is always valid. Requested by: bde (a long time ago)	2003-08-04 20:28:20 +00:00
John Baldwin	b35b737201	Insert cosmetic spaces. Reported by: kris	2003-08-04 19:24:25 +00:00
Robert Watson	60bdc14e90	Move more ACL logic from the UFS code (ufs_acl.c) to the central POSIX.1e support routines in kern_acl.c: - Define ACL_OVERRIDE_MASK and ACL_PRESERVE_MASK centrally in acl.h: the mode bits that are (and aren't) stored in the ACL. - Add acl_posix1e_acl_to_mode(): given a POSIX.1e extended ACL, generate a compatibility mode (only the bits supported by the POSIX.1e ACL). - acl_posix1e_newfilemode(): Given a requested creation mode and default ACL, calculate the mode for the new file system object (only the bits supported by the POSIX.1e ACL). PR: 50148 Reported by: Ritz, Bruno <bruno_ritz@gmx.ch> Obtained from: TrustedBSD Project	2003-08-04 02:13:05 +00:00
John Baldwin	80c09f69b0	Both 'c' and 'lines' are unused, the bogus init of lines was accidentally left behind.	2003-08-02 17:35:00 +00:00
Alan Cox	884962ae4e	Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in proc_rwmem(). See revision 1.140 of kern/sys_pipe.c for a detailed rationale. Submitted by: tegge	2003-08-02 17:08:21 +00:00
Poul-Henning Kamp	4bfd22f25e	Grab Giant in bufdonebio() since drivers may not hold it. This only protects the "struct buf" consumers (ie: DEV_STRATEGY()), but does not protect BIO_STRATEGY() users.	2003-08-02 09:45:10 +00:00
Poul-Henning Kamp	f7e56e489d	Grab Giant in physio() since non-giant drivers are starting to appear.	2003-08-02 09:40:53 +00:00
Alan Cox	105660e8ba	Eliminate an abuse of kmem_alloc_pageable() in bufinit() by using VM_ALLOC_NOOBJ to allocate the bogus page. Reviewed by: tegge	2003-08-02 05:05:34 +00:00
Alan Cox	efd02757c2	Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in sf_buf_init(). (See revision 1.140 of kern/sys_pipe.c for a detailed rationale.) Submitted by: tegge	2003-08-02 04:18:56 +00:00
David E. O'Brien	05a1bfa142	Fix kernel build -- 'c' was the unused var, not 'lines'.	2003-08-01 17:00:49 +00:00
Robert Watson	19c3e120f0	Attempt to simplify #ifdef logic for MAC_ALWAYS_LABEL_MBUF. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-01 15:45:14 +00:00
Alan Cox	882d8469af	Remove Giant from writev(2). Eliminate trivial style differences between writev(2) and readv(2).	2003-08-01 02:21:54 +00:00
John Baldwin	4110951861	If a spin lock is held for too long and WITNESS is enabled, then call witness_display_spinlock() to see if we can find out where the current owner of the spin lock last acquired the lock.	2003-07-31 18:52:18 +00:00
John Baldwin	1beccae67c	Add a new function to look for a spinlock's instance when it is held by another thread. We use the td_oncpu member of the other thread to locate its associated CPU and then search that CPU's list of spin locks contained in its per-CPU data. This is not always safe and may in fact panic or just not work, but it is useful in at least one case.	2003-07-31 18:50:58 +00:00
John Baldwin	3f2a1b0656	Update the 'ps', 'show pci', and 'show ktr' ddb commands to use the new pager callout instead of homerolling their own paging facility.	2003-07-31 17:29:42 +00:00
Peter Wemm	aeaead20b8	When ktracing context switches, make sure we record involuntary switches. Otherwise, when we get evicted from the cpu, there is no record of it. This is not a default ktrace flag.	2003-07-31 01:36:24 +00:00
David Xu	1fc434dc9a	Use correct signal when calling sigexit.	2003-07-30 23:11:37 +00:00
Pierre Beyssac	ae9fcf4c66	Remove test in pipe_write() which causes write(2) to return EAGAIN on a non-blocking pipe in cases where select(2) returns the file descriptor as ready for write. This in turn causes libc_r, for one, to busy wait in such cases. Note: it is a quick performance fix, a more complex fix might be required in case this turns out to have unexpected side effects. Reviewed by: silby MFC after: 3 days	2003-07-30 22:50:37 +00:00
John Baldwin	47b722c1af	When complaining about a sleeping thread owning a mutex, display the thread's pid to make debugging easier for people who don't want to have to use the intended tool for these panics (witness). Indirectly prodded by: kris	2003-07-30 20:42:15 +00:00
Alan Cox	93b4c5b707	The introduction of vm object locking has caused witness to reveal a long-standing mistake in the way a portion of a pipe's KVA is allocated. Specifically, kmem_alloc_pageable() is inappropriate for use in the "direct" case because it allows a preceding vm map entry and vm object to be extended to support the new KVA allocation. However, the direct case KVA allocation should not have a backing vm object. This is corrected by using kmem_alloc_nofault(). Submitted by: tegge (with the above explanation by me)	2003-07-30 18:55:04 +00:00
Alan Cox	fbe1bdddcc	Revision 1.51 of vm/uma_core.c modified uma_large_free() to acquire Giant when needed. So, don't do it here.	2003-07-29 05:23:19 +00:00
Robert Watson	9080ff25cf	Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the kernel ACL interfaces and system call names. Break out UFS2 and FFS extattr delete and list vnode operations from setextattr and getextattr to deleteextattr and listextattr, which cleans up the implementations, and makes the results more readable, and makes the APIs more clear. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-07-28 18:53:29 +00:00
Robert Watson	2e4a71cdb1	When exporting file descriptor data for threads invoking the kern.file sysctl, don't return information about processes that fail p_cansee(td, p). This prevents sockstat and related programs from seeing file descriptors owned by processes not in the same jail as the thread, as well as having implications for MAC, etc. This is a partial solution: it permits an information leak about the number of descriptors in the sizing calculation (but this is not new information, you can also get it from kern.openfiles), and doesn't attempt to mask file descriptors based on the properties of the descriptor, only the process referencing it. However, it provides most of what you want under most circumstances, without complicating the locking. PR: 54211 Based on a patch submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>	2003-07-28 16:03:53 +00:00
Poul-Henning Kamp	cf7742997a	Pass the file descriptor index down to vn_open. If the method vector was replaced and we got the "special return code" smile and trust that whatever happened below DTRT.	2003-07-27 20:09:13 +00:00
Poul-Henning Kamp	3ab6b09c53	Pass the fdidx argument from vn_open{_cred}() onto VOP_OPEN()	2003-07-27 20:05:36 +00:00
Poul-Henning Kamp	7c89f162bc	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.	2003-07-27 17:04:56 +00:00
Poul-Henning Kamp	1b6c609507	Call the new argument "fdidx" that is more precise than "fd".	2003-07-27 17:03:20 +00:00
David Malone	e41cbeba6d	Now that we can call kmem_malloc without Giant it should be safe to do mbuf allocation without Giant, so remove the GIANT_REQUIRED from mb_alloc in the M_TRYWAIT case.	2003-07-27 14:19:23 +00:00
Poul-Henning Kamp	a8d43c90af	Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file " since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/ For now pass -1 all over the place.	2003-07-26 07:32:23 +00:00
Scott Long	c43cad1ac1	Guard against MLEN growing larger than a uint8_t due to MSIZE growing to a value of 512 in LINT. This keeps gcc from complaining.	2003-07-26 07:23:24 +00:00
Alan Cox	18e8d4e79c	revision 1.51 of vm/uma_core.c modified uma_large_malloc() to acquire Giant when needed.	2003-07-25 22:26:43 +00:00
Mike Makonnen	a6ca48085c	The POSIX spec also requires that kern_sigtimedwait return EINVAL if tv_nsec of the timeout is less than zero.	2003-07-24 17:07:17 +00:00
Peter Wemm	80611144e4	Initialize 'blocked' to NULL. I think this was a real problem, but I am not sure about that. The lack of -Werror and the inline noise hid this for a while.	2003-07-23 20:29:13 +00:00
Poul-Henning Kamp	68f2d20b70	Revert stuff which accidentally ended up in the previous commit.	2003-07-22 10:36:36 +00:00
Poul-Henning Kamp	55d1d7034f	Don't attempt to inline large functions mb_alloc() and mb_free(), it more than doubles the text size of this file. GCC has wisely ignored us on this previously	2003-07-22 10:24:41 +00:00
David Xu	432b45de08	Always deliver synchronous signal to UTS for SA threads.	2003-07-21 00:26:52 +00:00
Mike Makonnen	6022ec6737	Turn a KASSERT back into an EINVAL return value. So, next time someone comes across it, it will turn into a core dump in userland instead of a kernel panic. I had also inverted the sense of the test, so Double pointy hat to: mtm	2003-07-19 11:32:48 +00:00
Mike Silbersack	f8bf8e397b	Three fixes: - Make m_prepend use m_gethdr instead of m_get where appropriate - Make m_copym use m_gethdr instead of m_get where appropriate - Add a call to m_fixhdr in m_defrag; m_defrag can't deal with corrupted pkthdr.len counts. MFC after: 3 days	2003-07-19 06:03:48 +00:00
Mike Makonnen	5c6edbec80	Remove a lock held across casuptr() that snuck in last commit.	2003-07-18 21:26:45 +00:00
Mike Makonnen	7df7f5c5ab	Move the decision on whether to unset the contested bit or not from lock to unlock time. Suggested by: jhb	2003-07-18 17:58:37 +00:00
Robert Drehmel	4e19fe1081	To avoid a kernel panic provoked by a NULL pointer dereference, do not clear the `sb_sel' member of the sockbuf structure while invalidating the receive sockbuf in sorflush(), called from soshutdown(). The panic was reproducible from user land by attaching a knote with EVFILT_READ filters to a socket, disabling further reads from it using shutdown(2), and then closing it. knote_remove() was called to remove all knotes from the socket file descriptor by detaching each using its associated filterops' detach callback function, sordetach() in this case, which tried to remove itself from the invalidated sockbuf's klist (sb_sel.si_note). PR: kern/54331	2003-07-17 23:49:10 +00:00
David Xu	3074d1b454	Fix sigwait to conform to POSIX. When a signal is being delivered to a process, first look for a sigwait thread to deliver it to; POSIX's argument is that delivering a signal to a sigwait thread is faster than other ways. A signal in its wait set will cause sigwait to return the signal number. A signal not in its wait set but not blocked by the thread also causes sigwait to return, but sigwait returns EINTR. sigwait is a one-shot operation: only one signal can be delivered to its wait set, and when a signal is delivered to the sigwait thread, the thread's sigwait state is canceled.	2003-07-17 22:52:55 +00:00
David Xu	dd7da9aa28	o Refine kse_thr_interrupt to allow it to handle different commands. o Remove TDF_NOSIGPOST. o Add a member td_waitset to proc structure, it will be used for sigwait. Tested by: deischen	2003-07-17 22:45:33 +00:00
Robert Drehmel	e76bad968c	Correct six return statements which returned zero instead of an appropriate error number after a failure condition. In particular, three of the changed statements return ESRCH for a failed pfind(), and in three other places a non-zero return from p_cansee() will be passed back. Also noticed by: rwatson	2003-07-17 22:44:41 +00:00
Mike Makonnen	994599d782	Fix umtx locking, for libthr, in the kernel. 1. There was a race condition between a thread unlocking a umtx and the thread contesting it. If the unlocking thread won the race it may try to wakeup a thread that was not yet in msleep(). The contesting thread would then go to sleep to await a wakeup that would never come. It's not possible to close the race by using a lock because calls to casuptr() may have to fault a page in from swap. Instead, the race was closed by introducing a flag that the unlocking thread will set when waking up a thread. The contesting thread will check for this flag before going to sleep. For now the flag is kept in td_flags, but it may be better to use some other member or create a new one because of the possible performance/contention issues of having to own sched_lock. Thanks to jhb for pointing me in the right direction on this one. 2. Once a umtx was contested all future locks and unlocks were happening in the kernel, regardless of whether it was contested or not. To prevent this from happening, when a thread locks a umtx it checks the queue for that umtx and unsets the contested bit if there are no other threads waiting on it. Again, this is slightly more complicated than it needs to be because we can't hold a lock across casuptr(). So, the thread has to check the queue again after unsetting the bit, and reset the contested bit if it finds that another thread has put itself on the queue in the meantime. 3. Remove the if... block for unlocking an uncontested umtx, and replace it with a KASSERT. The _only_ time a thread should be unlocking a umtx in the kernel is if it is contested.	2003-07-17 11:06:40 +00:00
Bosko Milekic	48719ca7c8	Change the style of the English used to print accounting enabled and disabled. This means no period at the end and changing "Process accounting <foo>" to "Accounting <foo>". Pointed out by: bde	2003-07-16 13:20:10 +00:00
Bosko Milekic	d2dbf5bc0b	Log process accounting activation/deactivation. Useful for some auditing purposes. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/54529	2003-07-16 03:59:50 +00:00
Don Lewis	6ff1481d5c	Rearrange the SYSINIT order to call lockmgr_init() earlier so that the runtime lockmgr initialization code in lockinit() can be eliminated. Reviewed by: jhb	2003-07-16 01:00:39 +00:00
David Xu	af161f2232	If initial thread is still a bound thread, don't change its signal mask.	2003-07-15 14:04:38 +00:00
Hartmut Brandt	7e9024cdd9	Add a facility for devices, specifically network interfaces, that require large to huge amounts of small or medium sized receive buffers. The problem with these situations is that they eat up the available DMA address space very quickly when using mbufs or even mbuf clusters. Additionally this facility provides a direct mapping between 32-bit integers and these buffers. This is needed for devices originally designed for 32-bit systems. Usually the virtual address of the buffer is used as a handle to find the buffer as soon as it is returned by the card. This does not work for 64-bit machines and hence this mapping is needed.	2003-07-15 08:59:38 +00:00
David Xu	4b7d5d84ee	Rename thread_siginfo to cpu_thread_siginfo	2003-07-15 04:26:26 +00:00
Jeffrey Hsu	330841c763	Rev 1.121 meant to pass the value 1 to soalloc() to indicate waitok. Reported by: arr	2003-07-14 20:39:22 +00:00
Don Lewis	857d9c60d0	Extend the mutex pool implementation to permit the creation and use of multiple mutex pools with different options and sizes. Mutex pools can be created with either the default sleep mutexes or with spin mutexes. A dynamically created mutex pool can now be destroyed if it is no longer needed. Create two pools by default, one that matches the existing pool that uses the MTX_NOWITNESS option that should be used for building higher level locks, and a new pool with witness checking enabled. Modify the users of the existing mutex pool to use the appropriate pool in the new implementation. Reviewed by: jhb	2003-07-13 01:22:21 +00:00
Robert Drehmel	baf731e6ed	Make the system call vector name of a process accessible to user land applications by introducing the KERN_PROC_SV_NAME sysctl node, which is searchable by PID.	2003-07-12 02:00:16 +00:00
David Xu	ffb2e92a98	If a thread is sending signal to its process, if the thread can handle the signal itself, it should get it without looking for other threads.	2003-07-11 13:42:23 +00:00
Mike Silbersack	347194c172	Add init_param3() to subr_param. This function is called immediately after the kernel map has been sized, and is the optimal place for the autosizing of memory allocations which occur within the kernel map to occur. Suggested by: bde	2003-07-11 00:01:03 +00:00
Peter Wemm	e95babf3a8	unifdef -DLAZY_SWITCH and start to tidy up the associated glue.	2003-07-10 01:02:59 +00:00
Mike Silbersack	ff56f15e26	A few minor changes: - Use atomic ops to update the bigpipe count - Make the bigpipe count sysctl readable - Remove a duplicate comparison in an if statement - Comment two SYSCTLs.	2003-07-09 21:59:48 +00:00
Mike Silbersack	41f16f8208	Pull in the entire kmem_map size calculation from kern_malloc, rather than the shortcircuited version I had been using, which only worked properly on i386 & amd64. Also, change an autoscale constant to account for the more correct kmem_map size. Problem noticed by: mux	2003-07-08 18:59:21 +00:00
Jeff Roberson	0c0a98b231	- When stealing a kse in kseq_move() ignore the current kseq's min nice value. We want to steal any thread, even one that is not given a slice on its current queue.	2003-07-08 06:19:40 +00:00
Mike Silbersack	289016f2d1	Put some concrete limits on pipe memory consumption: - Limit the total number of pipes so that we do not exhaust all vm objects in the kernel map. When this limit is reached, a ratelimited message will be printed to the console. - Put a soft limit on the amount of memory consumable by pipes. Once the limit has been reached, all new pipes will be limited to 4K in size, rather than the default of 16K. - Put a limit on the number of pages that may be used for high speed page flipping in order to reduce the amount of wired memory. Pipe writes that occur while this limit is exceeded will fall back to non-page flipping mode. The above values are auto-tuned in subr_param.c and are scaled to take into account both the size of physical memory and the size of the kernel map. These limits help to reduce the "kernel resources exhausted" panics that could be caused by opening a large number of pipes. (Pipes alone are no longer able to exhaust all resources, but other kernel memory hogs in league with pipes may still be able to do so.) PR: 53627 Ideas / comments from: hsu, tjr, dillon@apollo.backplane.com MFC after: 1 week	2003-07-08 04:02:31 +00:00
Jeff Roberson	0ec896fd28	- Clean up an unused variable. Submitted by: Steve Kargl <skg@routmask.apl.washington.edu>	2003-07-07 21:08:28 +00:00
Mike Makonnen	14b5ae1a98	Make the conditional, which decides what siglist to put a signal on, more concise and improve the comment. Submitted by: bde	2003-07-05 08:37:40 +00:00
Mike Makonnen	e55c35c433	I was so happy I found the semi-colon from hell that I didn't notice another typo in the same line. This typo makes libthr unusable, but its effects were counter-balanced by the extra semicolon, which made libthr remarkably usable for the past several months.	2003-07-04 23:28:42 +00:00
Jeff Roberson	749d01b011	- Parse the cpu topology map in sched_setup(). - Associate logical CPUs on the same physical core with the same kseq. - Adjust code that assumed there would only be one running thread in any kseq. - Wrap the HTT code with a ULE_HTT_EXPERIMENTAL ifdef. This is a start towards HyperThreading support but it isn't quite there yet.	2003-07-04 19:59:00 +00:00
Poul-Henning Kamp	1226914c17	Use the f_vnode field to tell which file descriptors have a vnode.	2003-07-04 12:20:27 +00:00
Mike Makonnen	1069e3a6f4	It's unfair how one extraneous semi-colon can cause so much grief.	2003-07-04 11:18:07 +00:00
Mike Makonnen	71cfaac0b0	style(9) o Remove double-spacing, and while I'm here add a couple of braces as well. Requested by: bde	2003-07-04 06:59:28 +00:00
Olivier Houchard	a10d5f02c8	In setpgrp(), don't assume a pgrp won't exist if the provided pgid is the same as the target process' pid, it may exist if the process forked before leaving the pgrp. This fixes a panic that happens when calling setpgid to make a process re-enter the pgrp with the same pgid as its pid if the pgrp still exists.	2003-07-04 02:21:28 +00:00
Mike Makonnen	8689793bfb	kse_thr_interrupt should target the thread, specifically. Requested by: davidxu	2003-07-04 01:41:32 +00:00
Mike Makonnen	c197abc49a	Signals sent specifically to a particular thread must be delivered to that thread, regardless of whether it has it masked or not. Previously, if the targeted thread had the signal masked, it would be put on the process's siglist. If another thread has the signal unmasked or unmasks it before the target, then the thread it was intended for would never receive it. This patch attempts to solve the problem by requiring callers of tdsignal() to say whether the signal is for the thread or for the process. If it is for the process, then normal processing occurs and any thread that has it unmasked can receive it. But if it is destined for a specific thread, it is put on that thread's pending list regardless of whether it is currently masked or not. The new behaviour still needs more work, though. If the signal is reposted for some reason it is always posted back to the thread that handled it because the information regarding the target of the signal has been lost by then. Reviewed by: jdp, jeff, bde (style)	2003-07-03 19:09:59 +00:00
John Baldwin	f7ee15901a	- Add comments about the maintenance of the per-thread list of contested locks held by each thread. - Fix a bug in the original BSD/OS code where a contested lock was not properly handed off from the old thread to the new thread when a contested lock with more than one blocked thread was transferred from one thread to another. - Don't use an atomic operation to write the MTX_CONTESTED value to mtx_lock in the aforementioned special case. The memory barriers and exclusion provided by sched_lock are sufficient. Spotted by: alc (2)	2003-07-02 16:14:09 +00:00
John Baldwin	6591b31040	Add a resource_disabled() helper function that returns true (non-zero) if a specified resource has been disabled via a non-zero 'disabled' hint and false otherwise.	2003-07-02 16:01:38 +00:00
Poul-Henning Kamp	d94e36521e	typo fix in comment.	2003-07-02 08:01:52 +00:00
David Xu	34178711be	Allow an SA process to unblock a thread blocked on a condition variable. Reviewed by: deischen	2003-07-02 01:19:15 +00:00
Ian Dowse	318f2fb4bf	Add a new mount flag MNT_BYFSID that can be used to unmount a file system by specifying the file system ID instead of a path. Use this by default in umount(8). This avoids the need to perform any vnode operations to look up the mount point, so it makes it possible to unmount a file system whose root vnode cannot be looked up (e.g. due to a dead NFS server, or a file system that has become detached from the hierarchy because an underlying file system was unmounted). It also provides an unambiguous way to specify which file system is to be unmounted. Since the ability to unmount using a path name is retained only for compatibility, that case now just uses a simple string comparison of the supplied path against f_mntonname of each mounted file system. Discussed on: freebsd-arch mdoc help from: ru	2003-07-01 17:40:23 +00:00
Scott Long	79501b66a7	Make swi_vm be INTR_MPSAFE. On all platforms, it is only used to activate busdma_swi(). Now that busdma_swi() uses driver-provided locking, this should be safe.	2003-07-01 16:00:38 +00:00
David Xu	df9c6cda37	Fix typo.	2003-06-30 10:04:04 +00:00
Marcel Moolenaar	4e4422d4d4	Don't use fuword() and suword() on struct members of type int. This happens to work on 32-bit platforms as sizeof(long)=sizeof(int), but wreaks all kinds of havoc (garbage reads, corrupting writes and misaligned loads/stores) on 64-bit architectures. The fix for now is to use fuword32() and suword32() and change the type of the applicable int fields to int32. This is to make it explicit that we depend on these fields being 32-bit. We may want to revisit this later. Reviewed by: deischen	2003-06-28 19:45:15 +00:00
Jeff Roberson	7a20304f84	- Don't migrate to stopped cpus.	2003-06-28 09:09:33 +00:00
David Xu	9dde3bc999	o Change kse_thr_interrupt to allow sending a signal to a specified thread, or unblocking a thread in the kernel, and allow the UTS to specify whether the syscall should be restarted. o Add the ability for the UTS to monitor signals coming in and being removed from the process; the flag PS_SIGEVENT is used to indicate the events. o Add a KMF_WAITSIGEVENT KSE mailbox flag; the UTS calls kse_release with this flag set to wait for the above signal event. o For SA based threads, the kernel masks all signals in its signal mask, letting the UTS use kse_thr_interrupt to interrupt a thread and install a signal frame in userland for the thread. o Add a tm_syncsig in the thread mailbox; when a hardware trap occurs, it is used to deliver a synchronous signal to userland, and an upcall is scheduled, so the UTS can process the synchronous signal for the thread. Reviewed by: julian (mentor)	2003-06-28 08:29:05 +00:00
Jeff Roberson	86f8ae9663	- If smp is not started yet don't try to load balance or we'll put threads on cpus that aren't running yet.	2003-06-28 08:24:42 +00:00
David Xu	418228df24	Fix a POSIX compatibility bug in sigwaitinfo and sigtimedwait. POSIX says the siginfo pointer parameter can be NULL, and if the function succeeds, it should return the signal number rather than zero. The waitset passed in should be negated before it can be used as a thread signal mask.	2003-06-28 08:03:28 +00:00
Jeff Roberson	a91172ade1	- Throttle the inherited sleep and run time in sched_fork_kseg(). This allows us to learn the behavior of a thread much more quickly after it starts up.	2003-06-28 06:19:56 +00:00
Jeff Roberson	e493a5d90c	- Adjust the default maximum slice value to ~140ms. This has improved the nice distribution without significantly impacting interactive response. As a side effect it should also allow batch processes to run for a slightly longer period which will positively impact their performance.	2003-06-28 06:04:47 +00:00
Peter Wemm	eabd19726f	Tidy up leftover lazy_switch instrumentation that is no longer needed. This cleans up some #ifdef hell.	2003-06-27 22:39:14 +00:00
Sean Kelly	6cda41555b	Fix this to build on alpha. Build test successful. Suggested fix from: tjr	2003-06-27 08:35:05 +00:00
Sean Kelly	370c3cb57c	- Add a software watchdog facility. This commit has two pieces. One half is the watchdog kernel code which lives primarily in hardclock() in sys/kern/kern_clock.c. The other half is a userland daemon which, when run, will keep the watchdog from firing while the userland is intact and functioning. Approved by: jeff (mentor)	2003-06-26 09:50:52 +00:00
Warner Losh	4f2073fb4c	Fix leap second processing by the kernel time keeping routines. Before, we would add/subtract the leap second when the system had been up for an even multiple of days, rather than at the end of the day, as a leap second is defined (at least wrt ntp). We do this by calculating the notion of UTC earlier in the loop, and passing that to get it adjusted. Any adjustments that ntp_update_second makes to this time are then transferred to boot time. We can't pass it either the boot time or the uptime because their sum is what determines when a leap second is needed. This code adds an extra assignment and two extra compares in the typical case, which is as cheap as I could make it. I have confirmed with this code the kernel time does the correct thing for both positive and negative leap seconds. Since the ntp interface doesn't allow for +2 or -2, those cases can't be tested (and the folks in the know here say there will never be a +2s or -2s leap event, but rather two +1s or -1s leap events). There will very likely be no leap seconds for a while, given how the earth is speeding up and slowing down, so there will be plenty of time for this fix to propagate. UT1-UTC is currently at "about -0.4s" and decrementing by .1s every 8 months or so. 6 * 8 is 48 months, or 4 years. -stable has different code, but a similar bug that was introduced about the time of the last leap second, which is why nobody has noticed until now. MFC After: 3 weeks Reviewed by: phk "Furthermore, leap seconds must die." -- Cato the Elder	2003-06-25 21:23:51 +00:00
Warner Losh	eac3c62b51	During a positive leap second, the tai_time offset should be incremented at the start of the leap second, not after the leap second has been inserted. This is because at the start of the leap second, we set the time back one second. This setting back one second is the moment that the offset changes. The old code set it back after the leap second, but that's one second too late. The negative leap second case is handled correctly. Reviewed by: phk	2003-06-25 20:56:40 +00:00
Olivier Houchard	7f3bfd6651	At this point targp will always be NULL, so remove the useless if.	2003-06-25 13:28:32 +00:00
Warner Losh	4e82e5f6f1	Use UTC rather than GMT to describe the time scale. The latter is obsolete.	2003-06-23 20:14:08 +00:00
Robert Watson	f51e58036e	Redesign the externalization APIs from the MAC Framework to the MAC policy modules to improve robustness against C string bugs and vulnerabilities. Following these revisions, all string construction of labels for export to userspace (or elsewhere) is performed using the sbuf API, which prevents the consumer from having to perform laborious and intricate pointer and buffer checks. This substantially simplifies the externalization logic, both at the MAC Framework level, and in individual policies; this becomes especially useful when policies export more complex label data, such as with compartments in Biba and MLS. Bundled in here are some other minor fixes associated with externalization: including avoiding malloc while holding the process mutex in mac_lomac, and hence avoid a failure mode when printing labels during a downgrade operation due to the removal of the M_NOWAIT case. This has been running in the MAC development tree for about three weeks without problems. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-23 01:26:34 +00:00
Robert Watson	6b42f0a2eb	Prefer the vop_rmextattr() vnode operation for removing extended attributes from objects over vop_setextattr() with a NULL uio; if the file system doesn't support the vop_rmextattr() method, fall back to the vop_setextattr() method. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-22 23:03:07 +00:00
Robert Watson	77533ed2aa	Expose vop_rmextattr as an explicit operation at the vnode operation interface, rather than relying on a NULL uio for the deletion operation. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-22 22:45:24 +00:00
Robert Watson	4b090e41ff	Add an explicit credential argument to alq_open() to allow the caller to specify what credential to use when authorizing vn_open() and later write operations, rather than curthread->td_ucred. When writing KTR traces to an ALQ, specify the credential of the thread generating the sysctl request. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-22 22:28:56 +00:00
Poul-Henning Kamp	3b6d965263	Add a f_vnode field to struct file. Several of the subtypes have an associated vnode which is used for stuff like the f*() functions. By giving the vnode a separate field, a number of checks for the specific subtype can be replaced simply with a check for f_vnode != NULL, and we can later free f_data up to subtype specific use. At this point in time, f_data still points to the vnode, so any code I might have overlooked will still work.	2003-06-22 08:41:43 +00:00
Ian Dowse	adef9265ef	When DDB is active, always send printf() output directly to the console, even if there is a TIOCCONS console tty. We were already doing this after a panic, but it's also useful when entering DDB for some other reason too.	2003-06-22 03:20:24 +00:00
Ian Dowse	d29bf12ff8	Use a new message buffer `consmsgbuf' to forward messages to a TIOCCONS console (e.g. xconsole) via a timeout routine instead of calling into the tty code directly from printf(). This fixes a number of cases where calling printf() at the wrong time (such as with locks held) would cause a panic if xconsole is running. The TIOCCONS message buffer is 8k in size by default, but this can be changed with the kern.consmsgbuf_size sysctl. By default, messages are checked for 5 times per second. The timer runs and the buffer memory remains allocated only at times when a TIOCCONS console is active. Discussed on: freebsd-arch	2003-06-22 02:54:33 +00:00
Ian Dowse	4784a46912	Replace the code for reading and writing the kernel message buffer with a new implementation that has a mostly reentrant "addchar" routine, supports multiple message buffers in the kernel, and hides the implementation details from callers. The new code uses a kind of sequence number to represent the current read and write positions in the buffer. This approach (suggested mainly by bde) permits the read and write pointers to be maintained separately, which reduces the number of atomic operations that are required. The "mostly reentrant" above refers to the way that while it is now always safe to have any number of concurrent writers, readers could see the message buffer after a writer has advanced the pointers but before it has written the new character. Discussed on: freebsd-arch	2003-06-22 02:18:31 +00:00
Jeff Roberson	1a7a9d0ec2	- lticks was erroneously being updated in sched_pctcpu(). This was causing us to skip the pctcpu_update() call which led to inaccurate cpu usage statistics for processes that didn't run often.	2003-06-21 02:31:49 +00:00
Jeff Roberson	665cb285a8	- Don't allow nice to have such a large effect on priority. This was causing poor interactive performance while unnice processes were running. The new scheme still allows nice to have an effect on priority but it is not as dramatic as the effect of the interactivity score.	2003-06-21 02:22:47 +00:00
Bosko Milekic	b2b417bb41	Fix a divide-by-zero on kern.log_wakeups_per_second tunable. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/53557	2003-06-20 22:18:38 +00:00
Stefan Eßer	c2ef4dd48a	Add comment about **vpp being special-cased in vnode_if.awk (1.38)	2003-06-20 12:24:06 +00:00
David Xu	ab78d4d641	cpu_set_upcall_kse needs to access userspace, release schedule lock before calling it for bound thread. To avoid this problem, change thread_schedule_upcall to not put new thread on run queue, let caller do it, so we can tweak the new thread before setting it to run. Reported by: pho	2003-06-20 09:12:12 +00:00
Poul-Henning Kamp	166400b7e6	Don't put callout_lock under #ifdef DIAGNOSTIC despite the fact that it works anyway.	2003-06-20 08:39:04 +00:00
Poul-Henning Kamp	568733688b	Initialize b_saveaddr when we hand out buffers	2003-06-20 08:26:38 +00:00
Poul-Henning Kamp	ce6912c420	Crude but efficient: #ifdef DIAGNOSTIC hold a mutex while calling callout's so that we hear about it if they sleep.	2003-06-20 08:07:15 +00:00
Poul-Henning Kamp	eaaca5deee	Don't (re)initialize f_gcflag to zero. Move initialization of DTYPE_VNODE specific field f_seqcount into the DTYPE_VNODE specific code.	2003-06-20 08:02:30 +00:00
David Xu	062cf543fc	When a STOP signal is being sent to a process, it is possible all threads in the process have already masked the signal, so job control is delayed. But later a thread unmasking the STOP signal should enable job control, so in issignal(), scanning all threads in process to see if we can direct suspend some of them, not just suspend current thread.	2003-06-20 03:36:45 +00:00
David Xu	8b56079e2b	Fix typo. td should be td0.	2003-06-20 01:56:28 +00:00
Alfred Perlstein	bab88630ba	Unlock the struct file lock before acquiring Giant, otherwise we can deadlock because of lock order reversals. This was not caught because Witness ignores pool mutexes right now. Diagnosis and help: truckman Noticed by: pho	2003-06-19 18:13:07 +00:00
Mike Silbersack	b083ea5114	Add a ratelimited message of the form "maxproc limit exceeded by uid %i, please see tuning(7) and login.conf(5)." Which will be triggered whenever a user hits his/her maxproc limit or the systemwide maxproc limit is reached. MFC after: 1 week	2003-06-19 05:57:25 +00:00
Don Lewis	6084b6c9d5	FILE_LOCK() uses a pool mutex, as does the vnode v_vnlock. Since pool mutexes are supposed to only be used as leaf mutexes, and what appear to be separate pool mutexes could be aliased together, it is a bad idea for a thread to attempt to hold two pool mutexes at the same time. Slightly rearrange the code in kern_open() so that FILE_UNLOCK() is called before calling VOP_GETVOBJECT(), which will grab the v_vnlock mutex.	2003-06-19 04:10:56 +00:00


7070 Commits