freebsd-nq

Author	SHA1	Message	Date
Julian Elischer	5c854accc1	Make up my mind if cpu pinning is stored in the thread structure or the scheduler specific extension to it. Put it in the extension as the implementation details of how the pinning is done needn't be visible outside the scheduler. Submitted by: tegge (of course!) (with changes) MFC after: 3 days	2004-09-10 22:28:33 +00:00
Julian Elischer	3389af30e8	Add some code to allow threads to nominate a sibling to run if they are going to sleep. MFC after: 1 week	2004-09-10 21:04:38 +00:00
John-Mark Gurney	ca95b2de43	remove giant required from kqueue_close.. Reported by: kuriyama MFC after: 3 days	2004-09-10 03:14:32 +00:00
Robert Watson	030c6fb156	Hard code witness lock order for BPF locks.	2004-09-09 05:01:37 +00:00
Poul-Henning Kamp	1affa3adc8	Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.	2004-09-07 09:17:05 +00:00
Julian Elischer	246409821c	fix typo MFC after: 2 days	2004-09-07 07:04:47 +00:00
Julian Elischer	5498350529	Make debug printf less threatening and make it only print out once. MFC after: 2 days	2004-09-07 06:38:22 +00:00
Julian Elischer	a8b491c121	Give libthr a choice (per system) of scope_system or scope_thread scheduling. MFC after: 4 days	2004-09-07 06:33:39 +00:00
John-Mark Gurney	80e6bbe95b	make witness its own sysctl branch instead of using _ to do this. I have left the old tunables in to give people a few days to transition their loader.conf and sysctl.conf's over to the new names. MFC after: 5 days	2004-09-06 23:27:28 +00:00
John-Mark Gurney	9b90387dcf	don't call f_detach if the filter has already removed the knote. This happens when a proc exits, but needs to inform the user that this has happened. This also means we can remove the check for detached from proc and sig f_detach functions as this is done in kqueue now. MFC after: 5 days	2004-09-06 19:02:42 +00:00
Julian Elischer	6a574b2afc	Don't do IPIs on behalf of interrupt threads. Just punt straight on through to the preemption code. Make a KASSERT out of a condition that can no longer occur. MFC after: 1 week	2004-09-06 07:23:14 +00:00
Julian Elischer	0fe38d47b7	slight code cleanup MFC after: 1 week	2004-09-05 23:23:58 +00:00
Alfred Perlstein	4c0bef6230	It's too easy to panic the machine when INVARIANTS are turned on and you botch a call to nmount(2). This is because there is an INVARIANTS check that asserts that opt->len must be zero if opt->val is not NULL. The problem is that the code does not actually follow this invariant if there is an error while processing mount options. Fix the code to honor the INVARIANT. Silence on: fs@	2004-09-05 22:24:28 +00:00
Robert Watson	76f6939888	Expand the scope of the socket buffer locks in sopoll() to include the state test as well as set, or we risk a race between a socket wakeup and registering for select() or poll() on the socket. This does increase the cost of the poll operation, but can probably be optimized some in the future. This appears to correct poll() "wedges" experienced with X11 on SMP systems with highly interactive applications, and might affect a plethora of other select() driven applications. RELENG_5 candidate. Problem reported by: Maxim Maximov <mcsi at mcsi dot pp dot ru> Debugged with help of: dwhite	2004-09-05 14:33:21 +00:00
Julian Elischer	bce73aeddb	turn on IPIs for 4bsd scheduler by default. MFC after: 1 week	2004-09-05 02:19:53 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies the scheduler that no more than N threads from that ksegrp should be allowed to be concurrently scheduled. This is also used to enforce 'fairness' at this time so that a ksegrp with 10000 threads can not swamp the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventually develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as libkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. Exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
Julian Elischer	00b0483d5c	Don't declare a function we are not defining.	2004-09-03 09:19:49 +00:00
Julian Elischer	37c28a022b	fix compile for UP	2004-09-03 09:15:10 +00:00
Julian Elischer	293968d8d3	ooops finish last commit. moved the variables but not the declarations.	2004-09-03 08:19:31 +00:00
Julian Elischer	82a1dfc16d	Move 4bsd specific experimental IP code into the 4bsd file. Move the sysctls into kern.sched	2004-09-03 07:42:31 +00:00
Alan Cox	94ddc7076d	Push Giant deep into vm_forkproc(), acquiring it only if the process has mapped System V shared memory segments (see shmfork_myhook()) or requires the allocation of an ldt (see vm_fault_wire()).	2004-09-03 05:11:32 +00:00
Robert Watson	b6ac582880	Tag AIO as requiring Giant over the network stack using NET_NEEDS_GIANT(). RELENG_5 candidate.	2004-09-03 03:19:14 +00:00
Julian Elischer	44692526be	remove unused code MFC after: 2 days	2004-09-02 23:37:41 +00:00
Scott Long	9923b511ed	Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.	2004-09-02 18:59:15 +00:00
Julian Elischer	7e37fb1729	Blush forgot to test non SMP builds.. oddly enough some UP code (particularly in the acpi code) seems to want this in a UP build. (I guess so you can have a single kernel module that works for both)	2004-09-01 18:05:43 +00:00
Julian Elischer	6804a3ab6d	Give the 4bsd scheduler the ability to wake up idle processors when there is new work to be done. MFC after: 5 days	2004-09-01 06:42:02 +00:00
Julian Elischer	2630e4c90c	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days	2004-09-01 02:11:28 +00:00
David Xu	cf1867f932	Remove TDP_USTATCLOCK, we no longer need it because we now always update tick count for userland in thread_userret. This change also removes a "no upcall owned" panic because fuword() schedules an upcall under heavy load, and the code assumes that no upcall can occur. Reported and Tested by: Peter Holm <peter@holm.cc>	2004-08-31 11:52:05 +00:00
Julian Elischer	5995adc206	Remove an unneeded argument. The removed argument could trivially be derived from the remaining one. That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some systems so leave it as an argument. Having both proc and thread as arguments just gives an opportunity for them to get out of sync. MFC after: 3 days	2004-08-31 07:34:54 +00:00
Julian Elischer	99e9dcb817	Remove sched_free_thread() which was only used in diagnostics. It has outlived its usefulness and has started causing panics for people who turn on DIAGNOSTIC, in what is otherwise good code. MFC after: 2 days	2004-08-31 06:12:13 +00:00
Warner Losh	2063e05140	Fix BUS_DEBUG case	2004-08-30 05:48:49 +00:00
Pawel Jakub Dawidek	2e4db7cfd7	Add a missing '\n'.	2004-08-30 01:10:20 +00:00
David Xu	45a4bfa17d	Only test return_instead if P_SINGLE_EXIT is set, otherwise a fork() syscall can interrupt other thread's syscall in sleepq_catch_signals(). Currently, all callers know thread_suspend_check may suspend the thread itself, so we needn't check return_instead for normal suspension flags (no P_SINGLE_EXIT set). Tested by: deischen Reported by: Maarten L. Hekkelman <m.hekkelman@cmbi.kun.nl>	2004-08-29 23:10:02 +00:00
Warner Losh	f52c5866ea	Initial support (disabled) for rebidding devices. I've been running this in my tree for a while and in its disabled state there are no issues. It isn't enabled yet because some drivers (in acpi) have side effects in their probe routines that need to be resolved in some manner before this can be turned on. The consensus at the last developer's summit was to provide a static method for each driver class that will return characteristics of the driver, one of which is whether it can be reprobed idempotently.	2004-08-29 18:25:21 +00:00
Warner Losh	3cdf2a3f20	MFp4: Merge in the patches, submitted long ago by someone whose email address I've lost, that move the location information to the attach routine as well. While one could use devinfo to get this data, that is difficult and error prone and subject to races for short lived devices. Would make a good MT5 candidate.	2004-08-29 18:11:10 +00:00
Dag-Erling Smørgrav	0eac4495db	Remove the HW_WDOG option; it serves no purpose. MFC after: 3 days	2004-08-29 11:10:09 +00:00
Ian Dowse	70b7ffee1b	Add support for completing the installation of ELF relocatable object format modules that were read in by the loader. Loading modules via the loader should now work on the amd64 platform.	2004-08-29 01:21:51 +00:00
David Xu	5897f840f0	1. try to use existing mailbox address in thread_update_usr_ticks. 2. remove '\n' in KASSERT.	2004-08-28 04:16:32 +00:00
David Xu	ad1280b593	Move TDF_CAN_UNBIND to thread private flags td_pflags, this eliminates need of sched_lock in some places. Also in thread_userret, remove spare thread allocation code, it is already done in thread_user_enter. Reviewed by: julian	2004-08-28 04:08:05 +00:00
Peter Wemm	6f96710c60	Backout the previous backout (with scott's ok). sched_ule.c:1.122 is believed to fix the problem with ULE that this change triggered.	2004-08-28 01:04:44 +00:00
David E. O'Brien	dd68efd05b	s/smp_rv_mtx/smp_ipi_mtx/g Requested by: jhb	2004-08-28 00:49:55 +00:00
Peter Wemm	91c1172a5a	Commit Jeff's suggested changes for avoiding a bug that is exposed by preemption and/or the rev 1.79 kern_switch.c change that was backed out. The thread was being assigned to a runq without adding in the load, which would cause the counter to hit -1.	2004-08-28 00:49:22 +00:00
Andre Oppermann	2580f4e584	Poll() uses the array smallbits that is big enough to hold 32 struct pollfd's to avoid calling malloc() on small numbers of fd's. Because smallbits' members have type char, its address might be misaligned for a struct pollfd. Change the array of char to an array of struct pollfd. PR: kern/58214 Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at> Reviewed by: bde (a long time ago) MFC after: 3 days	2004-08-27 21:23:50 +00:00
Alexander Kabaev	4cef6d5a53	Reintroduce slightly modified patch from kern/69964. Check for LK_HAVE_EXL in both acquire invocations. MFC after: 5 days	2004-08-27 01:41:28 +00:00
Ian Dowse	0ca311f6a1	When trying each linker class in turn with a preloaded module, exit the loop if the preload was successful. Previously a successful preload was ignored if the linker class was not the last in the list.	2004-08-27 01:20:26 +00:00
Robert Watson	161a0c7cff	Don't hold the UNIX domain socket subsystem lock over the body of the UNIX domain socket garbage collection implementation, as that risks holding the mutex over potentially sleeping operations (as well as introducing some nasty lock order issues, etc). unp_gc() will hold the lock long enough to do necessary deferral checks and set that it's running, but then release it until it needs to reset the gc state. RELENG_5 candidate. Discussed with: alfred	2004-08-25 21:24:36 +00:00
Robert Watson	fe0f2d4e11	Conditional acquisition of socket buffer mutexes when testing socket buffers with kqueue filters is no longer required: the kqueue framework will guarantee that the mutex is held on entering the filter, either due to a call from the socket code already holding the mutex, or by explicitly acquiring it. This removes the last of the conditional socket locking.	2004-08-24 05:28:18 +00:00
Warner Losh	0160658e84	Set the description to NULL in the right detach routine. This should keep dangling pointers to strings in loaded modules from hanging around after the drivers are unloaded.	2004-08-24 05:19:15 +00:00
David Xu	d30412a8db	Remove checking of single exit flag in thread_user_enter(), this is generic code for threaded process, should not be here.	2004-08-23 22:54:37 +00:00
Peter Wemm	f1009e1e1f	Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock. We were obtaining different spin mutexes (which disable interrupts after acquisition) and spin waiting for delivery. For example, KSE processes do LDT operations which use smp_rendezvous, while other parts of the system are doing things like tlb shootdowns with a different mutex. This patch uses the common smp_rendezvous mutex for all MD home-grown IPIs that spinwait for delivery. Having the single mutex means that the spinloop to acquire it will enable interrupts periodically, thus avoiding the cross-ipi deadlock. Obtained from: dwhite, alc Reviewed by: jhb	2004-08-23 21:39:29 +00:00
Alexander Kabaev	cffdaf2dce	Temporarily back out r1.74 as it seems to cause a number of regressions according to numerous reports. It might get reintroduced some time later when an exact failure mode is understood better.	2004-08-23 02:39:45 +00:00
Robert Watson	d963815baf	Make debug.kdb.stop_cpus also a TUNABLE() so it can be set prior to boot to help debug early nasty hangs.	2004-08-22 15:10:52 +00:00
Julian Elischer	ad59c36ba1	diff reduction for upcoming patch. Use a macro that masks some of the odd goings on with sub-structures, because they will go away anyhow.	2004-08-22 05:21:41 +00:00
Don Lewis	1a1c04b6b3	Don't bother calling the module event handlers from module_shutdown() in the shutdown_final state if the RB_NOSYNC flag is set. The specific motivation in this case is that a system panic in an interrupt context results in a call to module_shutdown(), which calls g_modevent(), which calls g_malloc(..., M_WAITOK), which results in a second panic. While g_modevent() could be fixed to not call malloc() for MOD_SHUTDOWN events (which it doesn't handle in any case), it is probably also a good idea to entirely skip the execution of the module shutdown handlers after a panic. This may be a MFC candidate for RELENG_5.	2004-08-20 21:47:48 +00:00
Don Lewis	8ded654028	Don't attempt to trigger the syncer thread final sync code in the shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the likely cause of hangs after a system panic that are keeping crash dumps from being done. This is a MFC candidate for RELENG_5. MFC after: 3 days	2004-08-20 19:21:47 +00:00
John Baldwin	55c45354ff	Remove some dead code under a straggling APIC_IO #ifdef that I missed back before 5.2.	2004-08-20 17:24:52 +00:00
Robert Watson	7b38f0d3c3	Back out uipc_socket.c:1.208, as it incorrectly assumes that all sockets are connection-oriented for the purposes of kqueue registration. Since UDP sockets aren't connection-oriented, this appeared to break a great many things, such as RPC-based applications and services (i.e., NFS). Since jmg isn't around I'm backing this out before too many more feet are shot, but intend to investigate the right solution with him once he's available. Apologies to: jmg Discussed with: imp, scottl	2004-08-20 16:24:23 +00:00
Scott Long	2384290ced	Revert the previous change. It works great for 4BSD but causes major problems for ULE. The reason is quite unknown and worrisome.	2004-08-20 05:58:38 +00:00
Scott Long	2c86298c6c	In maybe_preempt(), ignore threads that are in an inconsistent state. This is an effective band-aid for at least some of the scheduler corruption seen recently. The real fix will involve protecting threads while they are inconsistent, and will come later. Submitted by: julian	2004-08-20 05:18:50 +00:00
John-Mark Gurney	5d6dd4685a	make sure that the socket is either accepting connections or is connected when attaching a knote to it... otherwise return EINVAL... Pointed out by: benno	2004-08-20 04:15:30 +00:00
Nate Lawson	0b54748fec	Add a newline.	2004-08-19 20:16:09 +00:00
Poul-Henning Kamp	d298f91974	Add bioq_takefirst(). If the bioq is empty, NULL is returned. Otherwise the front element is removed and returned. This can simplify locking in many drivers from: lock() bp = bioq_first(bq); if (bp == NULL) { unlock() return } bioq_remove(bp, bq) unlock to: lock() bp = bioq_takefirst(bq); unlock() if (bp == NULL) return;	2004-08-19 19:51:51 +00:00
Nate Lawson	c003dab8ff	Add debugging to rman_manage_region() as well. This is useful since we manage subregions in ACPI. MFC after: 3 days	2004-08-19 16:41:12 +00:00
Robert Watson	16239786ca	Remove GIANT_REQUIRED from setugidsafety() as knote_fdclose() no longer requires Giant.	2004-08-19 14:59:51 +00:00
John Baldwin	007ddf7e7a	Now that the return value semantics of cv's for multithreaded processes have been unified with that of msleep(9), further refine the sleepq interface and consolidate some duplicated code: - Move the pre-sleep checks for threaded processes into a thread_sleep_check() function in kern_thread.c. - Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c. Specifically, if a thread is awakened by something other than a signal while checking for signals before going to sleep, clear TDF_SINTR in sleepq_catch_signals(). This removes a sched_lock lock/unlock combo in that edge case during an interruptible sleep. Also, fix sleepq_check_signals() to properly handle the condition if TDF_SINTR is clear rather than requiring the callers of the sleepq API to notice this edge case and call a non-_sig variant of sleepq_wait(). - Clarify the flags arguments to sleepq_add(), sleepq_signal() and sleepq_broadcast() by creating an explicit submask for sleepq types. Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of 0. Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and move the setting of TDF_SINTR to sleepq_add() if this flag is set rather than sleepq_catch_signals(). Note that it is the caller's responsibility to ensure that sleepq_catch_signals() is called if and only if this flag is passed to the preceding sleepq_add(). Note that this also removes a sched_lock lock/unlock pair from sleepq_catch_signals(). It also ensures that for an interruptible sleep, TDF_SINTR is always set when TD_ON_SLEEPQ() is true.	2004-08-19 11:31:42 +00:00
John-Mark Gurney	000968010a	add options MPROF_BUFFERS and MPROF_HASH_SIZE that adjust the sizes of the mutex profiling buffers. Document them in the man page and in NOTES. Ensure _HASH_SIZE is larger than _BUFFERS with a cpp error.	2004-08-19 06:38:26 +00:00
Robert Watson	4c5bc1ca39	Add UNP_UNLOCK_ASSERT() to assert that the UNIX domain socket subsystem lock is not held. Rather than annotating that the lock is released after calls to unp_detach() with a comment, annotate with an assertion. Assert that the UNIX domain socket subsystem lock is not held when unp_externalize() and unp_internalize() are called.	2004-08-19 01:45:16 +00:00
Robert Watson	2cfe973b62	Annotate call to DELAY() in interrupt storm mitigation as being something to revisit. Approved by: re (scottl)	2004-08-17 04:09:09 +00:00
Alexander Kabaev	c8b876219f	Upgrading a lock does not play well together with acquiring an exclusive lock and can lead to two threads being granted exclusive access. Check that no one has the same lock in exclusive mode before proceeding to acquire it. The LK_WANT_EXCL and LK_WANT_UPGRADE bits act as mini-locks and can block other threads. Normally this is not a problem since the mini locks are upgraded to full locks and the release of the locks will unblock the other threads. However if a thread reset the bits without obtaining a full lock other threads are not awoken. Add missing wakeups for these cases. PR: kern/69964 Submitted by: Stephan Uphoff <ups at tree dot com> Very good catch by: Stephan Uphoff <ups at tree dot com>	2004-08-16 15:01:22 +00:00
David E. O'Brien	78c37b0de8	s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g	2004-08-16 08:33:37 +00:00
Robert Watson	40f2ac28a0	Always acquire the UNIX domain socket subsystem lock (UNP lock) before dereferencing sotounpcb() and checking its value, as so_pcb is protected by protocol locking, not subsystem locking. This prevents races during close() by one thread and use of this socket in another. unp_bind() now asserts the UNP lock, and uipc_bind() now acquires the lock around calls to unp_bind().	2004-08-16 04:41:03 +00:00
Brian Feldman	8912c44d9f	Add the missing knote_fdclose().	2004-08-16 03:09:01 +00:00
Brian Feldman	1c0f9af5b5	Allocate the marker, when scanning a kqueue, from the "heap" instead of the stack. When swapped out, a process's kernel stack would be unavailable, and we could get a page fault when scanning the same kqueue. PR: kern/61849	2004-08-16 03:08:38 +00:00
Robert Watson	ce5f32de11	Annotate the current UNIX domain socket locking strategies, order, strengths, and weaknesses in a comment. Assert a copyright over the changes made as part of the locking work.	2004-08-16 01:52:04 +00:00
Mike Silbersack	5173e8f567	Major enhancements to pipe memory usage: - pipespace is now able to resize non-empty pipes; this allows for many more resizing opportunities - Backing is no longer pre-allocated for the reverse direction of pipes. This direction is rarely (if ever) used, so this cuts the amount of map space allocated to a pipe in half. - Pipe growth is now much more dynamic; a pipe will now grow when the total amount of data it contains and the size of the write are larger than the size of pipe. Previously, only individual writes greater than the size of the pipe would cause growth. - In low memory situations, pipes will now shrink during both read and write operations, where possible. Once the memory shortage ends, the growth code will cause these pipes to grow back to an appropriate size. - If the full PIPE_SIZE allocation fails when a new pipe is created, the allocation will be retried with SMALL_PIPE_SIZE. This helps to deal with the situation of a fragmented map after a low memory period has ended. - Minor documentation + code changes to support the above. In total, these changes increase the total number of pipes that can be allocated simultaneously, drastically reducing the chances that pipe allocation will fail. Performance appears unchanged due to dynamic resizing.	2004-08-16 01:27:24 +00:00
Don Lewis	b6915bdbe5	Yet another tweak to the shutdown messages in boot(): Don't count busy buffers before the initial call to sync() and don't skip the initial sync() if no busy buffers were found. Always call sync() at least once if syncing is requested. This defers the "Syncing disks, buffers remaining..." message until after the initial sync() call and the first count of busy buffers. This backs out changes in kern_shutdown 1.162. Print a different message when there are no busy buffers after the initial sync(), which is now the expected situation. Print an additional message when syncing has completed successfully in the unusual situation where the work of syncing was done by boot(). Uppercase one message to make it consistent with all of the other kernel shutdown messages. Discussed with: bde (in a much earlier form, prior to 1.162) Reviewed by: njl (in an earlier form)	2004-08-15 19:17:23 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Robert Watson	d8939d82cb	Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we attempt to IPI other cpus when entering the debugger in order to stop them while in the debugger. The default remains to issue the stop; however, that can result in a hang if another cpu has interrupts disabled and is spinning, since the IPI won't be received and the KDB will wait indefinitely. We probably need to add a timeout, but this is a useful stopgap in the mean time. Reviewed by: marcel	2004-08-15 02:06:27 +00:00
Robert Watson	6cbea71c82	Cause pfind() not to return processes in the PRS_NEW state. As a result, threads consuming the result of pfind() will not need to check for a NULL credential pointer or other signs of an incompletely created process. However, this also means that pfind() cannot be used to test for the existence or find such a process. Annotate pfind() to indicate that this is the case. A review of current consumers seems to indicate that this is not a problem for any of them. This closes a number of race conditions that could result in NULL pointer dereferences and related failure modes. Other related races continue to exist, especially during iteration of the allproc list without due caution. Discussed with: tjr, green	2004-08-14 17:15:16 +00:00
Poul-Henning Kamp	d8e8b6755c	Add some KASSERTS.	2004-08-14 08:33:49 +00:00
Julian Elischer	f0017f3321	Whitespace nit.	2004-08-14 07:21:20 +00:00
Robert Watson	b295bdcded	After completing a name lookup for a target UNIX domain socket to connect to, re-check that the local UNIX domain socket hasn't been closed while we slept, and if so, return EINVAL. This affects the system running both with and without Giant over the network stack, and recent ULE changes appear to cause it to trigger more frequently than previously under load. While here, improve catching of possibly closed UNIX domain sockets in one or two additional circumstances. I have a much larger set of related changes in Perforce, but they require more testing before they can be merged. One debugging printf is left in place to indicate when such a race takes place: this is typically triggered by a buggy application that simultaneously connect()'s and close()'s a UNIX domain socket file descriptor. I'll remove this at some point in the future, but am interested in seeing how frequently this is reported. In the case of Martin's reported problem, it appears to be a result of a non-thread safe syslog() implementation in the C library, which does not synchronize access to its logging file descriptor. Reported by: mbr	2004-08-14 03:43:49 +00:00
John-Mark Gurney	ac77164d64	clean up whitespace...	2004-08-13 17:43:53 +00:00
John-Mark Gurney	7d5e45a391	looks like rwatson forgot tabs... :)	2004-08-13 07:38:58 +00:00
Julian Elischer	c00661f83c	Don't keep evaluating our own cpu mask.. it's not likely to have changed....	2004-08-13 00:57:43 +00:00
Robert Watson	44f31f7556	Trim trailing white space.	2004-08-12 18:06:21 +00:00
Warner Losh	9f7f340a0f	Minor formatting fixes for lines > 80 characters	2004-08-12 17:26:22 +00:00
Jeff Roberson	f2b74cbf28	- Introduce a new flag KEF_HOLD that prevents sched_add() from doing a migration. Use this in sched_prio() and sched_switch() to stop us from migrating threads that are in short term sleeps or are runnable. These extra migrations were added in the patches to support KSE. - Only set NEEDRESCHED if the thread we're adding in sched_add() is a lower priority and is being placed on the current queue. - Fix some minor whitespace problems.	2004-08-12 07:56:33 +00:00
Julian Elischer	0f54f48225	Properly keep track of how many kses are on the system run queue(s).	2004-08-11 20:54:48 +00:00
Robert Watson	217a4b6e4e	Replace a reference to splnet() with a reference to locking in a comment.	2004-08-11 03:43:10 +00:00
Marcel Moolenaar	4da47b2fec	Add __elfN(dump_thread). This function is called from __elfN(coredump) to allow dumping per-thread machine specific notes. On ia64 we use this function to flush the dirty registers onto the backingstore before we write out the PRSTATUS notes. Tested on: alpha, amd64, i386, ia64 & sparc64 Not tested on: arm, powerpc	2004-08-11 02:35:06 +00:00
Robert Watson	87e83e7d4c	In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However, we may sleep when doing so; check that we didn't race with another thread allocating storage for the vnode after allocation is made to a local pointer, and only update the vnode pointer if it's still NULL. Otherwise, accept that another thread got there first, and release the local storage. Discussed with: jmg	2004-08-11 01:27:53 +00:00
Alan Cox	fad44deea3	Eliminate the acquisition and release of Giant within physio(). Remove the spl calls. Reviewed by: phk@ Discussed with: scottl@	2004-08-10 21:47:11 +00:00
John Baldwin	274f8f48e8	Synchronize the extra SA threading checks and return value handling of condition variables with that of msleep(). Reviewed by: davidxu	2004-08-10 17:42:59 +00:00
Jeff Roberson	2454aaf51c	- Use a new flag, KEF_XFERABLE, to record with certainty that this kse had contributed to the transferable load count. This prevents any potential problems with sched_pin() being used around calls to setrunqueue(). - Change the sched_add() load balancing algorithm to try to migrate on wakeup. This attempts to place threads that communicate with each other on the same CPU. - Don't clear the idle counts in kseq_transfer(), let the cpus do that when they call sched_add() from kseq_assign(). - Correct a few out of date comments. - Make sure the ke_cpu field is correct when we preempt. - Call kseq_assign() from sched_clock() to catch any assignments that were done without IPI. Presently all assignments are done with an IPI, but I'm trying a patch that limits that. - Don't migrate a thread if it is still runnable in sched_add(). Previously, this could only happen for KSE threads, but due to changes to sched_switch() all threads went through this path. - Remove some code that was added with preemption but is not necessary.	2004-08-10 07:52:21 +00:00
Nate Lawson	c8c216d558	Skip the syncing disks loop if there are no dirty buffers. Remove a variable used to flag the initial printf. Submitted by: truckman (earlier version)	2004-08-10 01:32:05 +00:00
Scott Long	0f4ad91810	Add a temporary debugging hack to detect a deadlock in setrunqueue(). This is here so that we can gather stats on the nature of the recent rash of hard lockups, and in this particular case panic the machine instead of letting it deadlock forever.	2004-08-10 00:26:25 +00:00
Julian Elischer	e2105bce2a	Slight changes to comments and some whitespace changes.	2004-08-09 21:57:30 +00:00
Julian Elischer	1a5cd27b4b	Make kg->kg_runnable actually count runnable threads in the ksegrp run queue instead of only doing it sometimes. This is not used outside of debugging code in the current code, but that will probably change.	2004-08-09 20:36:03 +00:00
Julian Elischer	332e72ddb7	Remove typos on KASSERT messages.	2004-08-09 20:13:07 +00:00
Brian Feldman	83dd6b37e1	Normalize the VM wiring done with SPARSE_MAPPING: check for errors, and unmap when done. For whatever reason, SPARSE_MAPPING is not even a config option, so this is dead code.	2004-08-09 18:46:13 +00:00
Julian Elischer	732d95288a	Increase the amount of data exported by KTR in the KTR_RUNQ setting. This extra data is needed to really follow what is going on in the threaded case.	2004-08-09 18:21:12 +00:00
John-Mark Gurney	6141e04a7e	add option to automatically mark core dumps with the nodump flag PR: 57065 Submitted by: Walter C. Pelissero	2004-08-09 05:46:46 +00:00
David Xu	604be46d1e	1.Add KSE_INTR_DBSUSPEND command for kse_thr_interrupt to suspend a bound thread, after the bound thread leaves critical region, the thread should check debug flag may suspend itself by using the command. 2.Schedule upcall after thread is suspended by debugger 3.Wakeup upcall thread after process suspension. Reviewed by: deischen	2004-08-08 22:32:20 +00:00
David Xu	2b70a83aff	Call thread_user_enter for M:N thread, ast() should be treated as another entrance of kernel.	2004-08-08 22:28:33 +00:00
David Xu	1f2eac6cf3	Add pl_flags to ptrace_lwpinfo, two flags PL_FLAG_SA and PL_FLAG_BOUND indicate that a thread is in UTS critical region. Reviewed by: deischen Approved by: marcel	2004-08-08 22:26:11 +00:00
Doug Rabson	cfaf7e60cc	Make sure that AT_PHDR has a useful value even for static programs.	2004-08-08 09:48:10 +00:00
John-Mark Gurney	227559d11f	Rearrange some code that handles the thread taskqueue so that it is more generic. Introduce a new define TASKQUEUE_DEFINE_THREAD that takes a single arg, which is the name of the queue. Document these changes.	2004-08-08 02:37:22 +00:00
Robert Watson	b223d06425	We're not yet ready to assert !Giant in kern_fcntl(), as it's called with Giant from ABI wrappers such as Linux emulation. Foot shoot off: phk	2004-08-07 14:09:02 +00:00
Robert Watson	db532b63c2	Flag a broad range of VFS operations as GIANT_REQUIRED in order to catch leaking into VFS without Giant. Inch Giant a little lower in several file descriptor operations on vnodes to cover only VFS operations that need it, rather than file flag reading, etc.	2004-08-06 22:25:35 +00:00
Robert Watson	cc701b73b8	In thread_exit(), include more information about the thread/process context in the KTR trace record. In particular, include the same information as passed for mi_switch() and fork_exit() KTR trace records.	2004-08-06 22:06:14 +00:00
Robert Watson	5dd3a4ed6c	Push UIDINFO_UNLOCK() slightly earlier in chgsbsize(), as it's not needed if we print the local variable version of the limit rather than the shared version.	2004-08-06 22:04:33 +00:00
Robert Watson	a0a819747c	Avoid acquiring Giant for some common light-weight or already MPSAFE fcntl() operations, including: F_DUPFD (dup() alias), F_GETFD (retrieve close-on-exec flag), F_SETFD (set close-on-exec flag), and F_GETFL (retrieve file descriptor flags). For the remaining fcntl() operations, do acquire Giant, especially where we call into fo_ioctl() as a result. We're not yet ready to push Giant into fo_ioctl(). Once we do, this can all become quite a bit prettier.	2004-08-06 22:00:55 +00:00
Robert Watson	ff7ec58af8	Cut a KTR record whenever a callout is invoked. Mark whether it runs with Giant or not, and include the function point so it can be looked up against the kernel symbol table during trace analysis.	2004-08-06 21:49:00 +00:00
John Baldwin	44fe3c1ff0	Don't scare users with a warning about preemption being off when it isn't yet safe to have on by default.	2004-08-06 15:49:44 +00:00
Robert Watson	6f40c417ca	In ithread_schedule(), when we plan to go harvest some entropy as a result of scheduling an ithread, cut a KTR_INTR trace record so that it's clear in tracing interrupt activity where and when the entropy harvesting code is invoked.	2004-08-06 03:39:28 +00:00
Colin Percival	0413bacd09	When resetting a pending callout, perform the deregistration in callout_reset rather than calling callout_stop. This results in a few lines of code duplication, but it provides a significant performance improvement because it avoids recursing on callout_lock. Requested by: rwatson	2004-08-06 02:44:58 +00:00
John Baldwin	5cc00cfc67	Fix the code in rman that merges adjacent unallocated resources to use a better check for 'adjacent'. The old code assumed that if two resources were adjacent in the linked list that they were also adjacent range wise. This is not true when a resource manager has to manage disparate regions. For example, the current interrupt code on i386/amd64 will instruct irq_rman to manage two disjoint regions: 0-1 and 3-15 for the non-APIC case. If IRQs 1 and 3 were allocated and then released, the old code would coalesce across the 1 to 3 boundary because the resources were adjacent in the linked list thus adding 2 to the area of resources that irq_rman managed as a side effect. The fix adds extra checks so that adjacent unallocated resources are only merged with the resource being freed if the start and end values of the resources also match up. The patch also consolidates the checks for adjacent resources being allocated.	2004-08-05 15:48:18 +00:00
John Baldwin	0e5a07e533	Remove a potential deadlock on i386 SMP by changing the lazypmap ipi and spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi and spin-wait code snippets so that you can't get into the situation of one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap shootdown each of which are waiting on each other. With this change, only one of the CPUs would do an IPI and spin-wait at a time.	2004-08-04 20:31:19 +00:00
John Baldwin	c950c15c76	Workaround a possible deadlock on SMP due to a spin lock LOR by disabling the immediate awakening of proc0 (scheduler kproc, controls swapping processes in and out). The scheduler process periodically awakens already, so this will not result in processes not being swapped in, there will just be more latency in between a thread being made runnable and the scheduler waking up to swap the affected process back in.	2004-08-04 20:24:40 +00:00
John Baldwin	bdcfcf5bc4	Cache the value of curthread in the _get_sleep_lock() and _get_spin_lock() macros and pass the value to the associated _mtx_*() functions to avoid more curthread dereferences in the function implementations. This provided a very modest perf improvement in some benchmarks. Suggested by: rwatson Tested by: scottl	2004-08-04 20:18:45 +00:00
Robert Watson	7a36e1d6c7	Assert Giant in namei(). Bugs have been reported in which, following a sleep() call waking up in namei(), a later assertion triggers that Giant is not held. By asserting Giant at the start of namei(), we can know that if that assertion triggers, Giant is lost during the call to namei(), and not before.	2004-08-04 18:39:07 +00:00
Robert Watson	0be8ad5fbc	Assert Giant in the following file descriptor-related functions (function: reason): fdfree(): VFS; setugidsafety(): KQueue; fdcheckstd(): VFS; _fgetvp(): VFS; fgetsock(): conditional assertion based on debug.mpsafenet	2004-08-04 18:35:33 +00:00
Robert Watson	1b93405c7c	Remove spl's from kern_resource.c.	2004-08-04 18:19:09 +00:00
Maxime Henrion	9f1b87f106	Instead of calling ia32_pause() conditionally on __i386__ or __amd64__ being defined, define and use a new MD macro, cpu_spinwait(). It only expands to something on i386 and amd64, so the compiled code should be identical. Name of the macro found by: jhb Reviewed by: jhb	2004-08-03 18:44:27 +00:00
Pawel Jakub Dawidek	24b2151f4d	Don't skip permission checks when sending signals to zombie processes. Pointed out by: bde Reviewed by: rwatson	2004-08-03 15:39:23 +00:00
Mike Silbersack	e10ecdea88	Standardize pipe locking, ensuring that everything is locked via pipelock(), not via a mixture of mutexes and pipelock(). Additionally, add a few KASSERTS, and change some statements that should have been KASSERTS into KASSERTS. As a result of these cleanups, some segments of code have become significantly shorter and/or easier to read.	2004-08-03 02:59:15 +00:00
David Xu	4513fb36aa	s/TMDF_DONOTRUNUSER/TMDF_SUSPEND/g Discussed with: deischen	2004-08-03 02:23:06 +00:00
Julian Elischer	4fd54632b0	Repeat after me: "Do not apply your tested patches to your commit tree by hand"	2004-08-03 01:43:29 +00:00
Julian Elischer	c94b38af46	Remove an argument that is never used.	2004-08-02 23:48:43 +00:00
David E. O'Brien	64298d52cc	Put a cap on the auto-tuning of kern.maxvnodes. Cap value chosen by: scottl	2004-08-02 21:52:43 +00:00
Robert Watson	3d3f5f6057	Add what appears to be a missing '*/' at the end of a comment.	2004-08-02 01:38:27 +00:00
Brian Feldman	b23f72e98a	* Add a "how" argument to uma_zone constructors and initialization functions so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialization functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default. Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.	2004-08-02 00:18:36 +00:00
Julian Elischer	6e0fbb01c5	Comment kse_create() and make a few minor code cleanups Reviewed by: davidxu	2004-08-01 23:02:00 +00:00
Poul-Henning Kamp	5e8c582ac2	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activities on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the nameidata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.	2004-07-30 22:08:52 +00:00
Alan Cox	9be60284a6	Giant is no longer required by vm_waitproc() and vmspace_exitfree(). Eliminate its acquisition and release around vm_waitproc() in kern_wait().	2004-07-30 20:31:02 +00:00
Nate Lawson	b1c8139147	Minor message cleanup.	2004-07-30 01:30:05 +00:00
Pawel Jakub Dawidek	0b011ea3da	Syscall kill(2) called for a zombie process should return 0. Obtained from: Darwin	2004-07-29 20:38:19 +00:00
Pawel Jakub Dawidek	cebabef04f	Fill in some information about zombie processes as well. Before this change every zombie process was reported as the owner of PID 0 in ps(1) output. Reviewed by: julian	2004-07-29 20:27:59 +00:00
Poul-Henning Kamp	d634f69316	Remove global variable rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroot's in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.	2004-07-28 20:21:04 +00:00
Alexander Kabaev	00fbcda80d	Avoid casts as lvalues.	2004-07-28 06:42:41 +00:00
David Xu	8bda8a620c	Use P_SINGLE_EXIT to check single-threading case, P_WEXIT is not for that purpose.	2004-07-28 06:30:52 +00:00
Poul-Henning Kamp	3dfe213e61	Convert the vfsconf list to a TAILQ. Introduce vfs_byname() function to find things on it. Staticize vfs_nmount() function under the name vfs_donmount(). Various cleanups.	2004-07-27 22:32:01 +00:00
Robert Watson	1a8cfbc450	Pass a thread argument into cpu_critical_{enter,exit}() rather than dereference curthread. It is called only from critical_{enter,exit}(), which already dereferences curthread. This doesn't seem to affect SMP performance in my benchmarks, but improves MySQL transaction throughput by about 1% on UP on my Xeon. Head nodding: jhb, bmilekic	2004-07-27 16:41:01 +00:00
Robert Watson	a9abdce44a	Add "options ADAPTIVE_GIANT" which causes Giant to also be treated in an adaptive fashion when adaptive mutexes are enabled. The theory behind non-adaptive Giant is that Giant will be held for long periods of time, and therefore spinning waiting on it is wasteful. However, in MySQL benchmarks which are relatively Giant-free, running Giant adaptive makes an observable difference on SMP (5% transaction rate improvement). As such, make adaptive behavior on Giant an option so it can be more widely benchmarked.	2004-07-27 16:34:48 +00:00
Alan Cox	1a276a3f91	- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.	2004-07-27 03:53:41 +00:00
Bosko Milekic	0047b9a96a	Move the schedlock owner state update following the context switch in fork_exit() to before anything else is done (but keep schedlock for the deadthread check). This means one less nasty bug if ever in the future whatever might have been called before the update played with schedlock or critical sections. Discussed with: tjr	2004-07-27 03:46:31 +00:00
Colin Percival	66d5c640fa	In revision 1.228, I accidentally broke the "total number of processes in the system" resource limit code: When checking if the caller has superuser privileges, we should be checking the real user, not the effective user. (In general, resource limiting is done based on the real user, in order to avoid resource-exhaustion-by-setuid-program attacks.) Now that a SUSER_RUID flag to suser_cred exists, use it here to return this code to its correct behaviour. Pointed out by: rwatson	2004-07-26 07:54:39 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Robert Watson	feb9bd18c6	Revert modification of subr_turnstile.c accidentally included in the last commit; this assertion was provided by jhb for local debugging and not intended for broader consumption.	2004-07-25 23:32:32 +00:00
Robert Watson	fd179ee91d	In uipc_connect(), assert that the passed thread is curthread, and pass td into unp_connect() instead of reading curthread.	2004-07-25 23:30:43 +00:00
Robert Watson	99901d0afb	Do some initial locking on accept filter registration and attach. While here, close some races that existed in the pre-locking world during low memory conditions. This locking isn't perfect, but it's closer than before.	2004-07-25 23:29:47 +00:00
Poul-Henning Kamp	cf95b5c381	Eliminate unused second argument to reassignbuf() and simplify it accordingly.	2004-07-25 21:24:23 +00:00
Robert Watson	3ed994c6c3	Add netatalk mutexes to hard-coded WITNESS lock order.	2004-07-25 20:16:51 +00:00
Warner Losh	4411688509	Expand the generic, but bogusly formed, copyright notice to include the license from /usr/src/COPYRIGHT. Since cvs annotate shows that this was written by jasone, julian, jhb, peter, bmilekic and obrien. cvs log shows that many others may have contributed to this file. As such, go ahead and use the author of 'FreeBSD Project' for this file. If this is a problem, please notify me. # this eliminates the last file in the kernel with an indirect reference # to /usr/src/COPYRIGHT in the kernel. A few more in userland remain.	2004-07-25 19:49:01 +00:00
Poul-Henning Kamp	a3d57cfbfd	Neuter this warning for now, I think I know the remaining issues.	2004-07-25 08:09:21 +00:00
Julian Elischer	aa3c8c02ae	White space fix.. diff reduction for upcoming commit.	2004-07-24 04:57:41 +00:00
Scott Long	e038d35422	Clean up whitespace, increase consistency and correctness. Submitted by: bde	2004-07-23 23:09:00 +00:00
Robert Watson	ff381670df	Don't include a "\n" in KTR output, it confuses automatic parsing.	2004-07-23 20:12:56 +00:00
Scott Long	18f480f8f6	Remove the previous hack since it doesn't make a difference and is getting in the way of debugging.	2004-07-23 19:59:16 +00:00
Alan Cox	b332cea583	Use kmem_alloc_nofault() rather than kmem_alloc_pageable() for allocating KVA for explicitly managed mappings, i.e., mappings created with pmap_qenter().	2004-07-23 19:36:18 +00:00
Robert Watson	4da86f8826	Export KTR_COMPILE as a sysctl so you can easily check from user space what event mask has been compiled into the kernel.	2004-07-23 17:41:44 +00:00
Robert Watson	46b25cb5f6	Don't perform pipe endpoint locking during pipe_create(), as the pipe can't yet be referenced by other threads. In microbenchmarks, this appears to reduce the cost of pipe();close();close() on UP by 10%, and SMP by 7%. The vast majority of the cost of allocating a pipe remains VM magic. Suggested by: silby	2004-07-23 14:11:04 +00:00
Robert Watson	71a057bc73	In setpgid(), since td is passed in as a system call argument, use it in preference to curthread, which costs slightly more.	2004-07-23 04:26:49 +00:00
Robert Watson	a6719c82b1	Push Giant acquisition down into fo_stat() from most callers. Acquire Giant conditional on debug.mpsafenet in the socket soo_stat() routine, unconditionally in vn_statfile() for VFS, and otherwise don't acquire Giant. Accept an unlocked read in kqueue_stat(), and cryptof_stat() is a no-op. Don't acquire Giant in fstat() system call. Note: in fdescfs, fo_stat() is called while holding Giant due to the VFS stack sitting on top, and therefore there will still be Giant recursion in this case.	2004-07-22 20:40:23 +00:00
Robert Watson	1c1ce9253f	Push acquisition of Giant from fdrop_closed() into fo_close() so that individual file object implementations can optionally acquire Giant if they require it: - soo_close(): depends on debug.mpsafenet - pipe_close(): Giant not acquired - kqueue_close(): Giant required - vn_close(): Giant required - cryptof_close(): Giant required (conservative) Notes: Giant is still acquired in close() even when closing MPSAFE objects due to kqueue requiring Giant in the calling closef() code. Microbenchmarks indicate that this removal of Giant cuts 3%-3% off of pipe create/destroy pairs from user space with SMP compiled into the kernel. The cryptodev and opencrypto code appears MPSAFE, but I'm unable to test it extensively and so have left Giant over fo_close(). It can probably be removed given some testing and review.	2004-07-22 18:35:43 +00:00
Robert Watson	df04411ac4	suser() accepts a thread argument; as suser() dereferences td_ucred, a thread-local pointer, in practice that thread needs to be curthread. If we're running with INVARIANTS, generate a warning if not. If we have KDB compiled in, generate a stack trace. This doesn't fire at all in my local test environment, but could be irritating if it fires frequently for someone, so there will be motivation to fix things quickly when it does.	2004-07-22 17:05:04 +00:00
Scott Long	9493183e77	Disable the PREEMPTION-enabled code in critical_exit() that encourages switching to a different thread. This is just a hack to try to improve stability some more, but likely points closer to the real culprit.	2004-07-22 14:32:48 +00:00
Bosko Milekic	01e9ccbd9c	Back out just a portion of Alfred's last commit. Remove the MBUF_CHECK (WITNESS) for code paths that always call uma_zalloc_arg() shortly after where the check was, because uma_zalloc_arg() already does a similar check. No objections from Alfred. Thanks Alfred.	2004-07-21 21:03:01 +00:00
Robert Watson	46e38ce826	Don't sync the file system on panic by default. This seems to basically work very infrequently, and often results in a compound panic which confuses debugging; locking/SMP have made the layering violation (and risks) of this more obvious over time. Discussed with: green, bde, et al.	2004-07-21 16:04:46 +00:00
Alfred Perlstein	05656b6e2b	put several of the options for DEBUG_VFS_LOCKS under control of sysctls.	2004-07-21 07:13:14 +00:00
Alfred Perlstein	063d811465	Make sure we don't call mbuf allocation functions with mutexes held. Discussed with: rwatson	2004-07-21 07:12:24 +00:00
Marcel Moolenaar	3d4f313695	Add kdb_thr_from_pid(), which given a PID returns the first thread in the process. This is useful when working from or with a process.	2004-07-21 04:49:48 +00:00
Mike Silbersack	eb3d2c61b4	Fix a minor error in pipe_stat - st_size was always reported as 0 when direct writes kicked in. Whether this affected any applications is unknown.	2004-07-20 07:06:43 +00:00
Peter Wemm	b09cb1027b	#ifdef __i386__ -> __i386__ || __amd64__	2004-07-20 02:15:10 +00:00
Julian Elischer	3a63b92c12	You always spot the typos after you have committed.. Start sentence with a Cap.	2004-07-19 18:06:12 +00:00
Julian Elischer	f6449d9d31	Allow the user who calls doadump() from the kernel debugger to not get a page fault if he has not defined a dump device. Panic can often not do a dump as it can hang forever in some cases. The original PR was for amd64 only. This is a generalised version of that change. PR: amd64/67712 Submitted by: wjw@withagen.nl <Willen Jan Withagen>	2004-07-19 18:03:02 +00:00
Brian Feldman	4362fada8f	Reimplement contigmalloc(9) with an algorithm which stands a greatly-improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time. The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the systems' free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated. The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked. There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes. A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation. When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well. This invalidates the BUGS section of the contigmalloc(9) manpage.	2004-07-19 06:21:27 +00:00
Julian Elischer	55d44f79ea	When calling scheduler entrypoints for creating new threads and processes, specify "us" as the thread not the process/ksegrp/kse. You can always find the others from the thread but the converse is not true. Theoretically this would lead to runtime being allocated to the wrong entity in some cases though it is not clear how often this actually happened. (would only affect threaded processes and would probably be pretty benign, but it WAS a bug..) Reviewed by: peter	2004-07-18 23:36:13 +00:00
Pawel Jakub Dawidek	ece2d9891e	Now we have NO_ADAPTIVE_MUTEXES option, so use it here too. Missed by: scottl	2004-07-18 23:27:14 +00:00
Marcel Moolenaar	1f7a1baa37	After maintaining previous behaviour in writing out the core notes, it's time now to break with the past: do not write the PID in the first note. Rationale: 1. [impact of the breakage] Process IDs in core files serve no immediate purpose to the debugger itself. They are only useful to relate a core file to a process. This can provide context to the person looking at the core file, provided one keeps track of this. Overall, not having the PID in the core file is only in very rare occasions unfortunate. 2. [reason of the breakage] Having one PRSTATUS note contain the PID, while all others contain the LWPID of the corresponding kernel thread creates an irregularity for the debugger that cannot easily be worked around. This is caused by libthread_db correlating user thread IDs to kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs. Update comments accordingly.	2004-07-18 20:28:07 +00:00
David Malone	cdb71f7526	The recent changes to control message passing broke some things that get certain types of control messages (ping6 and rtsol are examples). This gets the new code closer to working: 1) Collect control mbufs for processing in the controlp == NULL case, so that they can be freed by externalize. 2) Loop over the list of control mbufs, as the externalize function may not know how to deal with chains. 3) In the case where there is no externalize function, remember to add the control mbuf to the controlp list so that it will be returned. 4) After adding stuff to the controlp list, walk to the end of the list of stuff that was added, in case we added a chain. This code can be further improved, but this is enough to get most things working again. Reviewed by: rwatson	2004-07-18 19:10:36 +00:00
Doug Rabson	4c4392e791	Add doxygen doc comments for most of newbus and the BUS interface.	2004-07-18 16:30:31 +00:00
Scott Long	701f140800	Enable ADAPTIVE_MUTEXES by default by changing the sense of the option to NO_ADAPTIVE_MUTEXES. This option has been enabled by default on amd64 for quite some time, and has been extensively tested on i386 and sparc64. It shows measurable performance gains in many circumstances, and few negative effects. It would be nice in the future if adaptive mutexes actually went to sleep after a certain amount of spinning, but that will require quite a bit more testing.	2004-07-18 15:59:03 +00:00
Alan Cox	d8582da660	Remove GIANT_REQUIRED from vmapbuf().	2004-07-18 04:57:49 +00:00
Robert Watson	2260c03d77	Drop Giant and acquire the UNIX domain socket subsystem lock a bit earlier in unp_connect() so that vp->v_socket can't change between our copying its value to a local variable and later use of that variable. This may have been responsible for a panic during shutdown that I experienced where simultaneous closing of a listen socket by rpcbind and a new connection being made to rpcbind by mountd.	2004-07-18 01:29:43 +00:00
David Xu	c3d88cbab8	Fix typo.	2004-07-17 23:15:41 +00:00
David Malone	e140eb430c	Add a kern_setsockopt and kern_getsockopt which can read the option values from either user land or from the kernel. Use them for [gs]etsockopt and to clean up some calls to [gs]etsockopt in the Linux emulation code that uses the stackgap.	2004-07-17 21:06:36 +00:00
John Baldwin	52eb84641d	- Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking. - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).	2004-07-16 21:04:55 +00:00
John Baldwin	d3373e371b	Whitespace fix.	2004-07-16 21:01:52 +00:00
John Baldwin	6dbc085016	Improve readability a bit by changing some code at the end of a function that did: if (foo) return else blah to just do the simpler if (!foo) blah instead.	2004-07-16 21:00:50 +00:00
Colin Percival	24283cc01b	Add a SUSER_RUID flag to suser_cred. This flag indicates that we want to check if the real user is the superuser (vs. the normal behaviour, which checks the effective user). Reviewed by: rwatson	2004-07-16 15:57:16 +00:00
Robert Watson	dad7b41a9b	When entering soclose(), assert that SS_NOFDREF is not already set.	2004-07-16 00:37:34 +00:00
Poul-Henning Kamp	672c05d49c	Preparation commit for the tty cleanups that will follow in the near future: rename ttyopen() -> tty_open() and ttyclose() -> tty_close(). We need the ttyopen() and ttyclose() for the new generic cdevsw functions for tty devices in order to have consistent naming.	2004-07-15 20:47:41 +00:00
Poul-Henning Kamp	3e019deaed	Do a pass over all modules in the kernel and make them return EOPNOTSUPP for unknown events. A number of modules return EINVAL in this instance, and I have left those alone for now and instead taught MOD_QUIESCE to accept this as "didn't do anything".	2004-07-15 08:26:07 +00:00
Alfred Perlstein	bb5faea34f	Cleanup shutdown output.	2004-07-15 08:01:00 +00:00
Alfred Perlstein	da6303bacc	Tidy up system shutdown.	2004-07-15 04:29:48 +00:00
Alfred Perlstein	a88295bb83	Disable SIGIO for now, leave a comment as to why it's busted and hard to fix.	2004-07-15 03:49:52 +00:00
Nate Lawson	8916adb1c9	Clean up the output on reboot by keeping completion messages on the same line as the announcement. Someone should probably update the "buffers remaining" message since we now no longer should have any buffers remaining at that point.	2004-07-15 03:20:08 +00:00
Poul-Henning Kamp	e2ad640e13	A module with no modevent function gets modevent_nop() as default. Until now the function has just returned zero for any event, but that is downright wrong for MOD_UNLOAD and not very useful for any future events we add where it may be crucial to be able to tell if the event was unhandled or successful. Change the function to return as follows: MOD_LOAD -> 0, MOD_UNLOAD -> EBUSY, anything else -> EOPNOTSUPP	2004-07-14 22:37:36 +00:00
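
The return mapping described in commit e2ad640e13 can be pictured with a small user-space sketch. This is not the kernel source: the enum below is an illustrative stand-in for the kernel's module event type, and the name modevent_nop_sketch() is hypothetical. Only the three return rules come from the commit message.

```c
#include <errno.h>

/* Illustrative stand-in for the kernel's module event type (assumption). */
enum mod_event { MOD_LOAD, MOD_UNLOAD, MOD_SHUTDOWN, MOD_QUIESCE };

/*
 * Sketch of a default handler for modules that supply no modevent
 * function, following the mapping stated in the commit message.
 */
static int
modevent_nop_sketch(enum mod_event what)
{
	switch (what) {
	case MOD_LOAD:
		return (0);		/* loading always succeeds */
	case MOD_UNLOAD:
		return (EBUSY);		/* refuse to unload */
	default:
		return (EOPNOTSUPP);	/* event was not handled */
	}
}
```

With this shape a caller can tell an unhandled event (EOPNOTSUPP) apart from a successful one (0), which is the distinction the commit message asks for.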


7786 Commits