freebsd-skq

Author	SHA1	Message	Date
phk	e0c89dae13	There is no need to explicitly call the stop function. In all likelihood ->l_close() did it and ttyclose certainly will.	2004-06-01 11:57:15 +00:00
rwatson	5a32935851	Add a global mutex, accept_filter_mtx, to protect the global list of accept filters and prevent read-modify-write races.	2004-06-01 04:08:48 +00:00
rwatson	bddadcf71a	The SS_COMP and SS_INCOMP flags in the so_state field indicate whether the socket is on an accept queue of a listen socket. This change renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new state field on the socket, so_qstate, as the locking for these flags is substantially different from the locking on the remainder of the flags in so_state.	2004-06-01 02:42:56 +00:00
truckman	d503c79cad	Add MSG_NBIO flag option to soreceive() and sosend() that causes them to behave the same as if the SS_NBIO socket flag had been set for this call. The SS_NBIO flag for ordinary sockets is set by fcntl(fd, F_SETFL, O_NONBLOCK). Pass the MSG_NBIO flag to the soreceive() and sosend() calls in fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag on the underlying socket for each I/O operation. The O_NONBLOCK flag is a property of the descriptor, and unlike ordinary sockets, fifos may be referenced by multiple descriptors.	2004-06-01 01:18:51 +00:00
bmilekic	f7574a2276	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but-mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)	2004-05-31 21:46:06 +00:00
rwatson	13656d723e	Assert Giant in vn_start_write() and vn_finished_write().	2004-05-31 20:56:10 +00:00
rwatson	afc098b3e1	Assert Giant in vrele().	2004-05-31 19:06:01 +00:00
phk	30a7ac8468	Add missing #include <sys/module.h>	2004-05-30 20:34:58 +00:00
phk	d6f7d2bde6	Add some missing <sys/module.h> includes which are masked by the one on death-row in <sys/kernel.h>	2004-05-30 17:57:46 +00:00
tjr	2bc3263ac9	Enable MI bits for gcc -ftest-coverage -fprofile-arcs on amd64.	2004-05-29 01:18:14 +00:00
pjd	19d2b54248	Sysctl hw.bus.devctl_disable shouldn't be writable from inside a jail. Approved by: imp	2004-05-26 16:36:32 +00:00
tmm	7b769ce88f	Retire cpu_sched_exit(); it is not used any more.	2004-05-26 12:09:39 +00:00
des	7eb92d1257	As previously threatened, give each device its own sysctl context and subtree (under the new dev top-level node). This should greatly simplify drivers which need per-device sysctl variables (such as ndis).	2004-05-25 12:06:26 +00:00
gad	d284b07886	Implement the new KERN_PROC_RGID option, and also implement the KERN_PROC_SESSION option which had been previously defined but never implemented. PR: bin/65803 (a very tiny piece of the PR) Submitted by: Cyrille Lefevre	2004-05-22 23:11:44 +00:00
davidxu	e7578c3795	Clear KSE thread flags after KSE thread mode is ended. Not clearing the flags across the execv() syscall has the side effect that a new program runs in KSE thread mode without having enabled it. Submitted by: tjr Modified by: davidxu	2004-05-21 14:50:23 +00:00
bde	9ec48f4cab	Fixed some style bugs in tdsigwakeup().	2004-05-21 10:02:24 +00:00
jhb	91621895aa	In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if a thread is on a sleep queue and should have its sleep aborted. Reported by: Thierry Herbelot thierry at herbelot dot com	2004-05-20 20:17:28 +00:00
bde	b3597241df	Fixed printf format errors which helped break GUPROF for arches with 64-bit function pointers.	2004-05-20 16:48:17 +00:00
bde	38ad669603	Initialize the history counter type field in struct gmonparam as threatened in rev.1.10 of usr.sbin/kgmon/kgmon.c more than 2 years ago. kgmon has been recovering from the missing initialization for too long, but the fixup there is ifdefed for i386's and shouldn't be needed for other arches.	2004-05-20 16:42:39 +00:00
bde	36151be4c9	Moved i386 asms to an i386 header. The asms are for calibration of high resolution kernel profiling (options GUPROF. "U" in GUPROF stands for microseconds resolution, but the resolution is now smaller than 1 nanosecond on multi-GHz machines and the accuracy is heading towards 1 nanosecond too). Arches that support GUPROF must now provide certain macros for the calibration. GUPROF is now only supported for i386's, so the absence of the new macros for other arches doesn't break anything that wasn't already broken. amd64's have uncommitted support for GUPROF, and sparc64's have support that seems to be complete except here (there was an #error for non-i386 cases; now there are undefined macros). Changed the asms a little: - declare them as __volatile. They must not be moved, and exporting a label across asms is technically incorrect, so try harder to stop gcc moving them. - don't put the non-clobbered register "bx" in the clobber list. The clobber lists are still more conservative than necessary. - drop the non-support for gcc-1. It just gave a better error message, and this is not useful since compiling with gcc-1 would cause thousands of worse error messages. - drop the support for aout.	2004-05-20 16:12:19 +00:00
pjd	7cbfe4913b	Fix sysctl name: security.jail.getfsstate_getfsstatroot_only -> security.jail.getfsstatroot_only. Approved by: rwatson	2004-05-20 05:28:44 +00:00
bde	eef4a49645	Include <sys/gmon.h> instead of <machine/profile.h> for the declaration of kmupetext(). The declaration is misplaced in <machine/profile.h> since it is not MD and not related to the lowest level of profiling. It will be moved, but getting it via <sys/gmon.h> already works.	2004-05-19 14:36:38 +00:00
ps	b36520446e	syncache broke rev 1.23 which was done to fix the "thundering herd" problem in Apache. Fix it. Reviewed by: peter	2004-05-19 00:22:10 +00:00
peter	235a3d8f64	If a symbol has section+offset definitions provided, always use them instead of doing a name lookup for global symbols. This fixes the snd_pcm module.	2004-05-18 05:15:43 +00:00
peter	a6091b3b08	Remove leftover padding variables. Convert some silent 'ignore programmer error' cases into panics. Remove 'align' field from section table (no longer needed).	2004-05-18 05:14:19 +00:00
peter	867065a3a4	Since we go to the trouble of compiling the kobj ops table for each class, and cannot handle it going away, add an explicit reference to the kobj class inside each linker class. Without this, a class with no modules loaded will sit with an idle refcount of 0. Loading and unloading a module with it causes a 0->1->0 transition which frees the ops table and causes subsequent loads using that class to explode. Normally, the "kernel" module will remain forever loaded and prevent this happening, but if you have more than one linker class active, only one owns the "kernel". This finishes making modules work for kldload(8) on amd64.	2004-05-17 21:24:39 +00:00
peter	b7f2f12793	Clean up the code some more. Unify the text/data (progbits) and bss (nobits) tables to simplify some code. Try and shorten some of the very wide lines. Somewhere along the way, I think I fixed the memory corruption that caused panics after going multiuser.	2004-05-17 21:20:23 +00:00
peter	961476df3c	Oops, use the generic ELF_ST_BIND() macro instead of ELF64_ST_BIND. Submitted by: marks	2004-05-17 00:51:34 +00:00
peter	ea4215c521	Make a small revision to the api between the elf linker core and the elf_reloc() backends for two reasons. First, to support the possibility of there being two elf linkers in the kernel (eg: amd64), and second, to pass the relocbase explicitly (for relocating .o format kld files).	2004-05-16 20:00:28 +00:00
bde	802b835b3d	Fixed some common printf format errors. Don't assume that "struct foo *" is "void *" (it isn't) or that the default promotion of pid_t is int. Instead, assume that casting "struct foo *" to "void *" and printing the result with %p is useful, and that all pid_t's are representable as longs. Fixed some minor style bugs (mainly spelling errors in comments).	2004-05-14 20:51:42 +00:00
jhb	4e9e9bbec8	Split sleepq_wakeup_thread() into two functions. sleepq_remove_thread() removes a specific thread from a sleep queue. sleepq_resume_thread() resumes scheduling of a thread that has been previously removed from a sleep queue. - sleepq_catch_signals() just removes a thread from the queue it was just added to when a pending signal is found. - sleepq_signal() and sleepq_broadcast() remove threads from a queue, drop the queue lock, and then resume all the previously removed threads. This doesn't completely fix the sched_lock <-> sleepq chain LOR, but it makes it a little better as we no longer call setrunnable() with a sleep queue lock held, meaning if setrunnable() tries to wake up the swapper we don't try to lock two sleep queue chains at the same time.	2004-05-13 20:00:43 +00:00
tjr	e167ef630d	Eliminate a memory leak in kern_symlink() that could occur if vn_start_write() failed.	2004-05-11 10:42:02 +00:00
julian	790c941069	Remove misplaced duplicate comment and slightly reformat the version that was in the right place.	2004-05-09 22:29:14 +00:00
sam	c89a947571	set m_len to reflect mbuf contents on return from m_dup1; fixes an obscure m_pullup case that contributed to breaking ipcomp in tunnel mode for kame Submitted by: itojun Obtained from: kame	2004-05-09 05:57:58 +00:00
julian	98d03d5d06	Fix rtprio() to do sensible things when called from threaded processes. It's not quite correct from a POSIX point of view, but it is a lot better than what was there before. This will be revisited later when we decide what form our priority extensions will take. POSIX doesn't specify how a system scope thread can change its priority so you need to add non-standard extensions to be able to do it. For now make this slightly non-standard to allow it to be done. Submitted by: Dan Eischen originally, changed by myself.	2004-05-08 08:56:05 +00:00
alc	b0b9413238	Avoid pointless zeroing of the bogus page in vfs_bio_clrbuf(). Suggested by: tegge@ (from October of last year)	2004-05-08 06:46:40 +00:00
rwatson	7128a3b5cc	Unconditionally lock Giant in do_sendfile(), rather than locking it conditional on debug.mpsafenet. We can try pushing down Giant here later, but we don't want to enter VFS without holding Giant. Bumped into by: kris	2004-05-08 02:24:21 +00:00
cognet	6897229d63	Compare t_brkc against (char)_POSIX_VDISABLE, not against -1. Discussed with: bde	2004-05-07 15:35:38 +00:00
njl	e9ec8dbd49	Move the CPU newbus attachment to i386 legacy. The acpi_cpu device will become just "cpu" and provide attachments in the !legacy case. Tested by: des	2004-05-06 15:54:02 +00:00
alc	b57e5e03fd	Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page. Reviewed by: tegge@	2004-05-06 05:03:23 +00:00
rwatson	9ec8ab1c20	Add /* !MAC */ to final #endif.	2004-05-03 22:54:46 +00:00
rwatson	16bb0e59b9	Bump copyright date for NETA to 2004.	2004-05-03 20:53:27 +00:00
rwatson	a857ce2f0a	Add MAC_STATIC, a kernel option that disables internal MAC Framework synchronization protecting against dynamic load and unload of MAC policies, and instead simply blocks load and unload. In a static configuration, this allows you to avoid the synchronization costs associated with introducing dynamicism. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-05-03 20:53:05 +00:00
cperciva	fe4e3a8b16	Fix a race condition which could result in profprocs being decremented more than once if stopprofclock is called multiple times on the same process.	2004-05-03 00:48:11 +00:00
peter	929c81a05b	Checkpoint commit for an alternative WIP kernel module loader that isn't as dependent on binutils features/quirks as the current one. This one loads plain .o files without having to mess with shared object mode. This happens to be essential on amd64, because binutils hasn't implemented all the quirks/features that we need for producing the hack non-PIC shared objects. As it turned out, .o format isn't all that inconvenient after all. It looks like the ability to use the same .o files for linking directly into a static kernel or loading as a module might be worth it. It is still very much a work-in-progress, but it is almost usable. Other changes are still needed in order to use it though, these have not been committed yet. There is still a memory corruption/overrun bug somewhere. For example, test modules load and work, but the machine explodes a few minutes later in vm_forkproc() or the like. Notable missing things include kldxref support, and loader(8) support. I wanted to figure out a working baseline set of code first.	2004-04-30 16:32:40 +00:00
deischen	122d328ccb	Keep track of threads waiting in kse_release() to avoid a race condition where kse_wakeup() doesn't yet see them in (interruptible) sleep queues. Also add an upcall check to sleepqueue_catch_signals() suggested by jhb. This commit should fix recent mysql hangs. Reviewed by: jhb, davidxu Mysql'd by: Robin P. Blanchard <robin.blanchard at gactr uga edu>	2004-04-28 20:36:53 +00:00
das	9df402daa4	If the buffer supplied to kenv(KENV_DUMP, ...) isn't big enough, return the number of bytes needed instead of 0. The manpage claims that we do this anyway.	2004-04-28 01:27:33 +00:00
bmilekic	6bbcc9da29	Give jail(8) the feature to allow raw sockets from within a jail, which is less restrictive but allows for more flexible jail usage (for those who are willing to make the sacrifice). The default is off, but allowing raw sockets within jails can now be accomplished by tuning security.jail.allow_raw_sockets to 1. Turning this on will allow you to use things like ping(8) or traceroute(8) from within a jail. The patch being committed is not identical to the patch in the PR. The committed version is more friendly to APIs which pjd is working on, so it should integrate into his work quite nicely. This change has also been presented and addressed on the freebsd-hackers mailing list. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/65800	2004-04-26 19:46:52 +00:00
pjd	cf34985420	Always use nd.ni_vp->v_mount as an argument for VFS_QUOTACTL(), just like in RELENG_4. Pointed out by: Alex Lyashkov <umka@sevinter.net>	2004-04-26 15:44:42 +00:00
hmp	fdb8f55130	The paper "Hashed Timers and Hierarchical Wheels: Data Structures for the Efficient Implementation of a Timer Facility" was co-authored by T. Lauk, not A. Lauk. Adjust nearby whitespace.	2004-04-25 04:10:17 +00:00
alc	c8457a17a5	Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes kmem_alloc_wait()) for mapping the image header. On all machines with a direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.	2004-04-23 03:01:40 +00:00
obrien	cbc25a780d	There was a thread on "unusually high load averages" when running under sched_ule, in January 2004. Looking at this, "pagezero" is (one of) the culprit(s). We had no provision for processes with P_NOLOAD set. With pagezero not running at PRI_ITHD, kseq_load_{add,rem} count pagezero as another-normal-process, thus the "expected-plus-one" load reported in the above thread. Submitted by: Nikos Ntarmos <ntarmos@ceid.upatras.gr>	2004-04-22 21:37:46 +00:00
pjd	f8b184242a	Look out! vn_start_write() is able to return 0 and NULL 'mp'. Submitted by: Alex Lyashkov <shadow@psoft.net>	2004-04-22 15:40:27 +00:00
bde	ff09fc7686	Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending on namespace pollution in <sys/vnode.h>. Sorted includes.	2004-04-21 12:10:30 +00:00
cperciva	9e96265f37	1. Remove callout_stop binary compatibility. 2. Document that this means that kernel modules must be rebuilt. 3. While I'm here, fix my sorting error in callout.h Requested by: many [1], scottl [2], bde [3]	2004-04-20 15:49:31 +00:00
mtm	023ebd18f8	If you're trying to find out if a thread is valid and in the same process as the current thread it makes absolutely no sense to lock the parent process through the pointer in said thread. Submitted by: pho (with minor correction) Pointy Hat To: mtm	2004-04-19 14:20:01 +00:00
luigi	4d396f17cf	constify the last argument of m_copyback.	2004-04-18 13:01:28 +00:00
bde	62ea68b046	Fixed some style bugs in previous commit (mainly an insertion sort error for declarations, and poorly worded messages). Fixed some nearby style bugs (unsorted declarations).	2004-04-17 02:46:05 +00:00
jhb	24e7821d24	- Enable (unmask) interrupt sources earlier in the ithread loop. Specifically, we used to enable the source after locking sched_lock and just before we had already decided to do a context switch. This meant that an ithread could never process more than one interrupt per context switch. Enabling earlier in the loop before sched_lock is acquired allows an ithread to handle multiple interrupts per context switch if interrupts fire very rapidly. For the case of heavy interrupt load this can reduce the number of context switches (and thus overhead) as well as reduce interrupt latency. - Now that we can handle multiple interrupts per context switch, add simple interrupt storm protection to threaded interrupts. If X number of consecutive interrupts are triggered before the ithread voluntarily yields to another thread, then the interrupt thread will sleep with the associated interrupt source disabled (masked) for 1/10th of a second. The default value of X is 500, but it can be tweaked via the tunable/ sysctl hw.intr_storm_threshold. If an interrupt storm is detected, then a message is output to the kernel console on the first occurrence per interrupt thread. Interrupt storm protection can be disabled completely by setting this value to 0. There is no scientific reasoning for the 1/10th of a second or 500 interrupts values, so they may require tweaking at some point in the future. Tested by: rwatson (an earlier version w/o the storm protection) Tested by: mux (reportedly made a machine with two PCI interrupts storming usable rather than hard locked) Reviewed by: imp	2004-04-16 20:25:40 +00:00
rwatson	800e19506e	At some point during the history of m_getcl(), MAC support began to unconditionally initialize the mbuf header even if cluster allocation failed, which could result in a NULL pointer dereference in low-memory conditions. PR: kern/65548 Submitted by: Stephan Uphoff <ups@tree.com>	2004-04-16 14:35:11 +00:00
ru	2b44d13e3e	Ensure that the poll_burst <= poll_burst_max constraint really holds. Reviewed by: luigi	2004-04-15 07:38:44 +00:00
imp	b449361466	Fix off by one error, twice. Submitted by: Carlos Velasco (first one), jhb (second one)	2004-04-12 23:02:21 +00:00
cperciva	7eb8531271	stop() no longer needs sched_lock held; in fact, holding sched_lock causes a LOR against sleepq. Fix the comment, and fix ptracestop() to pick up sched_lock after stop() rather than before. Reported by: Scott Sipe <cscotts@mindspring.com> Reviewed by: rwatson, jhb	2004-04-12 15:56:05 +00:00
mux	79217d1505	Put deprecated sysctl code inside BURN_BRIDGES.	2004-04-11 21:09:22 +00:00
alc	643a21e287	Use vm_page_hold() rather than vm_page_wire() for short-duration page wiring. The reason being that vm_page_hold() is cheaper.	2004-04-11 19:57:11 +00:00
mux	74cb325f5e	Remove a comment that complains about the lack of %qd, to justify truncating a rlim_t to a long. We have had %qd for some time now. However, the correct format to use here is %jd and a cast to intmax_t, so do this.	2004-04-10 11:08:16 +00:00
peadar	fd75a2f931	Plug minor memory leak of module_t structures when unloading a file from the kernel. Reviewed By: Doug Rabson (dfr@)	2004-04-09 15:27:38 +00:00
cognet	acc91f284d	Spell "switches" a more conventional way.	2004-04-09 14:31:29 +00:00
rwatson	435aae62de	Compare pointers with NULL rather than using pointers as booleans in if/for statements. Assign pointers to NULL rather than typecast 0. Compare pointers with NULL rather than 0.	2004-04-09 13:23:51 +00:00
silby	d35a5d60b9	Fix a regression in my change which sends headers along with data; a side effect of that change caused headers to not be sent if a 0 byte file was passed to sendfile. This change fixes that behavior, allowing sendfile to send out the headers even with a 0 byte file again. Noticed by: Dirk Engling	2004-04-08 07:14:34 +00:00
marcel	9584da2d1f	Do not assume that the initial thread (i.e. the thread with the ID equal to the process ID) is still present when we dump a core. It already may have been destroyed. In that case we would end up dereferencing a NULL pointer, so specifically test for that as well. Reported & tested by: Dan Nelson <dnelson@allantgroup.com>	2004-04-08 06:37:00 +00:00
cperciva	9c466edcbc	Add whitespace before comment blocks. (reported by njl) Remove spurious whitespace, add indent protection, fix punctuation, remove initialization of static variables to zero, put wakeup_ctr and wakeup_needed in the correct order. (reported by bde) This doesn't fix all the style bugs I introduced, but the remaining style bugs make it easier for me to understand what I did here.	2004-04-08 02:03:49 +00:00
imp	b49b7fe799	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 20:46:16 +00:00
cperciva	b174697b69	Fix filt_timer* races: Finish initializing a knote before we pass it to a callout, and use the new callout_drain API to make sure that a callout has finished before we deallocate memory it is using. PR: kern/64121 Discussed with: gallatin	2004-04-07 05:59:57 +00:00
cperciva	e0793884f3	Introduce a callout_drain() function. This acts in the same manner as callout_stop(), except that if the callout being stopped is currently in progress, it blocks attempts to reset the callout and waits until the callout is completed before it returns. This makes it possible to clean up callout-using code safely, e.g., without potentially freeing memory which is still being used by a callout. Reviewed by: mux, gallatin, rwatson, jhb	2004-04-06 23:08:49 +00:00
jhb	241908535b	Associate a simple count of waiters with each condition variable. The count is protected by the mutex that protects the condition, so the count does not require any extra locking or atomic operations. It serves as an optimization to avoid calling into the sleepqueue code at all if there are no waiters. Note that the count can get temporarily out of sync when threads sleeping on a condition variable time out or are aborted. However, it doesn't hurt to call the sleepqueue code for either a signal or a broadcast when there are no waiters, and the count is never out of sync in the opposite direction unless we have more than INT_MAX sleeping threads.	2004-04-06 19:17:46 +00:00
jhb	7cf9a1d044	Add a new kernel option MUTEX_WAKE_ALL that changes the mutex unlock code to awaken all waiters when a contested mutex is released instead of just the highest priority waiter. If the various threads are awakened in sequence then each thread may acquire and release the lock in question without contention resulting in fewer expensive unlock and lock operations. The old behavior of waking just the highest priority waiter is still used if this option is not specified. Making the algorithm conditional on a kernel option will allow us to benchmark both cases later and determine which one should be used by default. Requested by: tanimura-san	2004-04-06 19:12:24 +00:00
jhb	8ab84688c3	Rename turnstile_wakeup() to turnstile_broadcast() to make the naming more consistent with other APIs. sleepq and cv's use signal/broadcast, and msleep uses wakeup_one/wakeup. Prior to this turnstiles were using a signal/wakeup mixture.	2004-04-06 19:07:21 +00:00
bde	122259ad35	Removed some less than useful comments: - don't say what a small subset of the options includes are for. - don't mark up functions which use all their args with /* ARGSUSED */. The markup should have been removed when the unused retval parameter was removed. - don't comment on what routine suser() checks do. Removed nearby excessive vertical whitespace.	2004-04-06 10:05:02 +00:00
imp	74cf37bd00	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
dfr	b750882696	Try not to crash instantly when signalling a libthr program to death.	2004-04-05 15:06:01 +00:00
dfr	674d9507bb	Regen.	2004-04-05 10:17:23 +00:00
dfr	c76397ed09	Add lgetfh(2) which is like getfh(2) but doesn't follow symlinks.	2004-04-05 10:15:53 +00:00
rwatson	40e717b764	Detatch incorrect spellings of detach.	2004-04-04 19:15:45 +00:00
jeff	3f884ddf6b	- Use the proper constant in sched_interact_update(). Previously, SCHED_INTERACT_MAX was used where SCHED_SLP_RUN_MAX was needed. This was causing the interactivity scaler to lose history at a more dramatic rate than intended.	2004-04-04 19:12:56 +00:00
marcel	f16d24b1ae	Create NT_PRSTATUS and NT_FPREGSET notes for each and every thread in the process. This is required for proper debugging of corefiles created by 1:1 or M:N threaded processes. Add an XXX comment where we should actually call a function that dumps MD specific notes. An example of a MD specific note is the NT_PRXFPREG note for SSE registers. Since BFD creates non-annotated pseudo-sections for the first PRSTATUS and FPREGSET notes (non-annotated in the sense that the name of the section does not contain the pid/tid), make sure those sections describe the initial thread of the process (i.e. the thread whose tid equals the pid). This is not strictly necessary, but makes sure that tools that use the non-annotated section names will not change behaviour due to this change. The practical upshot of this all is that one can see the threads in the debugger when looking at a corefile. For 1:1 threading this means that all threads are visible.	2004-04-03 20:25:41 +00:00
marcel	1d37410c51	Assign thread IDs to kernel threads. The purpose of the thread ID (tid) is twofold: 1. When a 1:1 or M:N threaded process dumps core, we need to put the register state of each of its kernel threads in the core file. This can only be done by differentiating the pid field in the respective note. For this we need the tid. 2. When thread support is present for remote debugging the kernel with gdb(1), threads need to be identified by an integer due to limitations in the remote protocol. This requires having a tid. To minimize the impact of having thread IDs, threads that are created as part of a fork (i.e. the initial thread in a process) will inherit the process ID (i.e. tid=pid). Subsequent threads will have IDs larger than PID_MAX to avoid interference with the pid allocation algorithm. The assignment of tids is handled by thread_new_tid(). The thread ID allocation algorithm has been written with 3 assumptions in mind: 1. IDs need to be created as fast as possible, 2. Reuse of IDs may happen instantaneously, 3. Someone else will write a better algorithm.	2004-04-03 15:59:13 +00:00
alc	1ec4d75266	In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it should not. Add a new parameter so that the caller can specify which is the case. Reported by: dillon	2004-04-03 09:16:27 +00:00
kris	90df1ff5bf	Add missing comment terminator.	2004-04-02 04:57:40 +00:00
julian	89d6572b1f	The comment complained about not having a thread_unlink() and did the work itself, but thread_unlink() has existed for a while... use it.	2004-04-02 01:01:34 +00:00
jhb	03d6afada4	Finish fixing up Alpha to work with an MP safe ptrace(): - ptrace_single_step() is no longer called with the proc lock held, so don't try to unlock it and then relock it. - Push Giant down into proc_rwmem() instead of forcing all the consumers (including Alpha breakpoint support) to explicitly wrap calls to proc_rwmem() with Giant. Tested by: kensmith	2004-04-01 20:56:44 +00:00
scottl	7996fbe9e3	Don't print out 'GIANT-LOCKED' for INTR_FAST drivers.	2004-04-01 07:18:42 +00:00
pjd	cda068b39a	Remove sysctl kern.ps_argsopen, it is not very useful, one should use security.bsd.see_other_uids instead. Discussed with: phk, rwatson	2004-04-01 00:10:45 +00:00
pjd	a8489e86d9	Remove ps_argsopen check. It was bogus in the past and was corrected not quite well by me - if kern.ps_argsopen was set to 0, users weren't permitted to see arguments of even their own processes. But kern.ps_argsopen is going away, so just remove this check and leave security checks for p_cansee() function.	2004-04-01 00:08:20 +00:00
julian	7a48fb22ac	Remove unused variable.	2004-03-31 08:20:44 +00:00
rwatson	6f9005a98d	In sofree(), avoid nested declaration and initialization in declaration. Observe that initialization in declaration is frequently incompatible with locking, not just a bad idea due to style(9). Submitted by: bde	2004-03-31 03:48:35 +00:00
rwatson	8eeaad5c11	Export uipc_connect2() from uipc_usrreq.c instead of unp_connect2(), and consume that interface in portalfs and fifofs instead. In the new world order, unp_connect2() assumes that the unpcb mutex is held, whereas uipc_connect2() validates that the passed sockets are UNIX domain sockets, then grabs the mutex. NB: the portalfs and fifofs code gets down and dirty with UNIX domain sockets. Maybe this is a bad thing.	2004-03-31 01:41:30 +00:00
alc	707630ec9c	White space and wording changes to init_param3(). Mostly submitted by: bde	2004-03-30 08:00:11 +00:00
rwatson	b5754b5316	Prefer NULL to 0 when testing and assigning pointer values.	2004-03-30 02:16:25 +00:00
peter	7957bc47f6	Shorten some XXXKSE commentary	2004-03-29 22:46:54 +00:00
peter	5e91995e52	Kill some XXXKSE's. vnlru/syncer are single threaded.	2004-03-29 22:45:33 +00:00
peter	1f224a3d83	Clean up the stub fake vnode locking implementations. The main reason this stuff was here (NFS) was fixed by Alfred in November. The only remaining consumer of the stub functions was umapfs, which is horribly horribly broken. It has missed out on about the last 5 years worth of maintenance that was done on nullfs (from which umapfs is derived). It needs major work to bring it up to date with the vnode locking protocol. umapfs really needs to find a caretaker to bring it into the 21st century. Functions GC'ed: vop_noislocked, vop_nolock, vop_nounlock, vop_sharedlock.	2004-03-29 22:41:21 +00:00
rwatson	b39ce8f898	Use a common return path for filt_soread() and filt_sowrite() to simplify the impact of locking on these functions. Submitted by: sam Sponsored by: FreeBSD Foundation	2004-03-29 18:06:15 +00:00
rwatson	4edcdd352b	In sofree(), move caching of 'head' from 'so->so_head' to later in the function, once it has been determined to be non-NULL, to simplify locking on an earlier return.	2004-03-29 17:57:43 +00:00
rwatson	f31d099747	If debug.mpsafenet, initialize UNIX domain socket timeouts as MPSAFE; otherwise, assert Giant in the callouts.	2004-03-29 17:00:05 +00:00
rwatson	7360350f89	Conditionally acquire Giant when entering the sockets layer via the socket-specific system calls based on debug.mpsafenet, rather than acquiring Giant unconditionally.	2004-03-29 02:21:56 +00:00
rwatson	3b7dc3c3f7	Conditionally acquire Giant when entering the socket layer via file descriptor operations based on debug.mpsafenet, rather than acquiring Giant unconditionally.	2004-03-29 01:55:32 +00:00
rwatson	52c072609b	When validating that the length sum in recvit(), we fail to release Giant on an error. Add a Giant acquisition. Reviewed by: sam, bms	2004-03-29 01:37:06 +00:00
rwatson	0feec33757	Conditionally assert Giant in fputsock() based on the value of debug.mpsafenet.	2004-03-29 00:33:02 +00:00
alc	521fa57364	Revise the direct or optimized case to use uiomove_fromphys() by the reader instead of ephemeral mappings using pmap_qenter() by the writer. The writer is still, however, responsible for wiring the pages, just not mapping them. Consequently, the allocation of KVA for the direct case is unnecessary. Remove it and the sysctls limiting it, i.e., kern.ipc.maxpipekvawired and kern.ipc.amountpipekvawired. The number of temporarily wired pages is still, however, limited by kern.ipc.maxpipekva. Note: On platforms lacking a direct virtual-to-physical mapping, uiomove_fromphys() uses sf_bufs to cache ephemeral mappings. Thus, the number of available sf_bufs can influence the performance of pipes on platforms such as i386. Surprisingly, I saw the greatest gain from this change on such a machine: lmbench's pipe bandwidth result increased from ~1050MB/s to ~1850MB/s on my 2.4GHz, 400MHz FSB P4 Xeon.	2004-03-27 19:50:23 +00:00
marcel	fb520fa860	Change the type of the various CPU masks to cpumask_t. Note that as long as there are still explicit uses of int, whether in types or in function names (such as atomic_set_int() in sched_ule.c), we can not change cpumask_t to be anything other than u_int. See also the commit log for sys/sys/types.h, revision 1.84.	2004-03-27 18:21:24 +00:00
mtm	02e9e2319a	Regen for libthr thread synchronization syscalls.	2004-03-27 14:34:17 +00:00
mtm	873aa62c96	Use the proc lock to sleep on a libthr umtx.	2004-03-27 14:32:03 +00:00
mtm	adb111ed69	Separate thread synchronization from signals in libthr. Instead use msleep() and wakeup_one(). Discussed with: jhb, peter, tjr	2004-03-27 14:30:43 +00:00
pjd	deaccb5ae7	- Add a description for vfs.usermount sysctl. - Add the vfs_equalopts() function for mount options comparison. Now it looks much more clear. - Style fixed. In co-operation with: bde	2004-03-27 08:39:28 +00:00
pjd	b04fb12275	- Loudly disallow MNT_SUIDDIR mount flag for unprivileged users mounts. - Style fixed. Submitted by: bde	2004-03-27 08:09:00 +00:00
pjd	b05f0288da	We probably shouldn't allow users to mount file systems with MNT_SUIDDIR. There should be no shell access when SUIDDIR is compiled in, but better be sure. Reviewed by: rwatson	2004-03-26 21:12:14 +00:00
alc	cab68d38d2	Use uiomove_fromphys() instead of pmap_qenter() and pmap_qremove() in proc_rwmem().	2004-03-24 23:35:04 +00:00
imp	5d9dc609bb	Conform to local file style and prefer (a && (b & flag)).	2004-03-24 16:49:37 +00:00
obrien	e182cf9fee	Change the !MPSAFE boot string to something that doesn't potentially scare users that the kernel won't run on MP systems.	2004-03-23 01:58:09 +00:00
alfred	fbfae479f4	Emit a traceback when witness_trace is set and witness_warn() is called and triggers (typically caused by sleeping with a non-sleepable lock). Reviewed by: jhb	2004-03-23 00:32:27 +00:00
obrien	71b6e14bc8	Rather than display which interrupts are MPSAFE, display those that aren't. This way we can take stock of the work to be done. boot -v will note those interrupts that are MPSAFE.	2004-03-22 22:36:11 +00:00
ps	e230d44ca7	Remove some netbsd debug code that crept into rev 1.116	2004-03-22 10:17:40 +00:00
obrien	7b9a2bdb17	Give a more reasonable CPU time to the threads which are using scheduler activation (i.e., applications are using libpthread). This is because SCHED_ULE sometimes puts P_SA processes into ksq_next unnecessarily. Which doesn't give fair amount of CPU time to processes which are using scheduler-activation-based threads when other (semi-)CPU-intensive, non-P_SA processes are running. Further work will no doubt be done by jeffr at a later date. Submitted by: Taku YAMAMOTO <taku@cent.saitama-u.ac.jp> Reviewed by: rwatson, freebsd-current@	2004-03-21 18:53:29 +00:00
julian	5e0a5420a9	Massively up the (artificial) limit on system scope threads in a process from 50 to 500 Also up the number of process scope threads allowed to be in the kernel at one time from 150 to 1500 (per process)	2004-03-21 09:22:38 +00:00
green	aae79f39c1	Add the missing Giant when doing anything with VFS -- in this case, releasing the ktrace vnode.	2004-03-18 18:15:58 +00:00
nectar	97b3d4b119	Verify more bits of the ELF header: the program header table entry size and the ELF version. Also, avoid a potential integer overflow when determining whether the ELF header fits entirely within the first page. Reviewed by: jdp A panic when attempting to execute an ELF binary with a bogus program header table entry size was Reported by: Christer Öberg <christer.oberg@texonet.com>	2004-03-18 16:33:05 +00:00
alc	d48f3c1617	Revise socow_iodone() in light of recent sf_buf changes. Specifically, use sf_buf_free() instead of sf_buf_mext() to consolidate all actions that require the page queues lock in one critical section. While I'm here remove unnecessary splvm() and splx() calls.	2004-03-17 23:25:04 +00:00
jhb	275240297d	- Replace wait1() with a kern_wait() function that accepts the pid, options, status pointer and rusage pointer as arguments. It is up to the caller to copyout the status and rusage to userland if needed. This lets us axe the 'compat' argument and hide all that functionality in owait(), by the way. This also cleans up some locking in kern_wait() since it no longer has to drop locks around copyout() since all the copyout()'s are deferred. - Convert owait(), wait4(), and the various ABI compat wait() syscalls to use kern_wait() rather than wait1() or wait4(). This removes a bit more stackgap usage. Tested on: i386 Compiled on: i386, alpha, amd64	2004-03-17 20:00:00 +00:00
pjd	11852bf574	Fix information leakage. Without this fix it is possible to cheat policies like: - sysctl security.bsd.see_other_[gu]ids=0, - mac_seeotheruids(4), - jail(2) and get full processes list with their arguments. This problem exists from revision 1.62 of kern_proc.c when it was introduced. Reviewed by: nectar, rwatson.	2004-03-17 13:19:43 +00:00
cperciva	b9e38dc622	Adjust the number of processes waiting on a semaphore properly if we're woken up in the middle of sleeping. PR: misc/64347 Reviewed by: tjr MFC after: 7 days	2004-03-17 09:37:13 +00:00
alc	a2e820d27b	Refactor the existing machine-dependent sf_buf_free() into a machine-dependent function by the same name and a machine-independent function, sf_buf_mext(). Aside from the virtue of making more of the code machine-independent, this change also makes the interface more logical. Before, sf_buf_free() did more than simply undo an sf_buf_alloc(); it also unwired and if necessary freed the page. That is now the purpose of sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used as a general-purpose ephemeral map cache.	2004-03-16 19:04:28 +00:00
jhb	71c3a1c44c	Remove a bogus assertion and readd it in a more correct location. A thread might be enqueued on a sleep queue but not be asleep when the timeout fires if it is blocked on a lock trying to check for pending signals before going to sleep. In the case of fixing up the TDF_TIMEOUT race, however, the thread must be marked asleep. Reported by: kan (the bogus one)	2004-03-16 18:56:22 +00:00
grehan	84ca086b14	Add powerpc to temporary fix. The new cpu device claims all 'generic' OpenFirmware nexus nodes, since it uses bus_generic_probe. Maybe the cpu device probe should be MD.	2004-03-16 13:34:50 +00:00
dwmalone	116755fef7	Nudge Giant as far as I can into kern_open(). Mark open() as MPSAFE. Use kern_open() to implement creat() rather than taking the long route through open(). Mark creat as MPSAFE. While I'm at it, mark nosys() (syscall 0) as MPSAFE, for all the difference it will make.	2004-03-16 10:46:42 +00:00
dwmalone	bfd66a78ad	Get ready to mark open, creat and nosys as MPSAFE.	2004-03-16 10:41:23 +00:00
tjr	d37f036c18	Make vfs_nmount() public. The Linux emulator needs this in order to mount linprocfs filesystems.	2004-03-16 08:59:37 +00:00
truckman	3c4fad0869	Rename the wiredlen member of struct sysctl_req to validlen and always set it to avoid the need for a bunch of code that tests whether or not the lock member is set to REQ_WIRED in order to determine which length member should be used. Fix another bug in the oldlen return value code. Fix a potential wired memory leak if a sysctl handler uses sysctl_wire_old_buffer() and returns an EAGAIN error to trigger a retry.	2004-03-16 06:53:03 +00:00
truckman	2b0bda9870	Don't bother calling vslock() and vsunlock() if oldlen is zero. If vslock() returns ENOMEM, sysctl_wire_old_buffer() should set wiredlen to zero and return zero (success) so that the handler will operate according to sysctl(3): The size of the buffer is given by the location specified by oldlenp before the call, and that location gives the amount of data copied after a successful call and after a call that returns with the error code ENOMEM. The handler will return an ENOMEM error because the zero length buffer will overflow.	2004-03-16 01:28:45 +00:00
jhb	1ed25bd5c1	Regen for ptrace being safe again.	2004-03-15 18:50:06 +00:00
jhb	d8445e0c8d	Drop the proc lock around calls to the MD functions ptrace_single_step(), ptrace_set_pc(), and cpu_ptrace() so that those functions are free to acquire Giant, sleep, etc. We already do a PHOLD/PRELE around them so that it is safe to sleep inside of these routines if necessary. This allows ptrace() to be marked MP safe again as it no longer triggers lock order reversals on Alpha. Tested by: wilko	2004-03-15 18:48:28 +00:00
pjd	aae1ea0f99	Remove sysctl security.jail.list_allowed. This functionality was a misfeature, sysctl was added and turned off by default just to check if nobody complains. Reviewed by: rwatson	2004-03-15 12:10:34 +00:00
truckman	b7a6af3cc9	Revert to the original vslock() and vsunlock() API with the following exceptions: Retain the recently added vslock() error return. The type of the len argument should be size_t, not u_int. Suggested by: bde	2004-03-15 06:42:40 +00:00
phk	7a8abdd585	Annual NTP kernel code spring-cleaning: Use int64_t rather than long long for the fixpoint type. Don't discard fractional nanosecond frequency correction.	2004-03-14 15:23:05 +00:00
peter	7a96caafd4	Set default HZ to 1024 for amd64. The comment in kern/tty.c doesn't apply here because we have 64 bit longs and don't suffer the hz > 169 overflows.	2004-03-14 05:49:31 +00:00
peter	bd5efd4600	Make the process_exit eventhandler run without Giant. Add Giant hooks in the two consumers that need it.. processes using AIO and netncp. Update docs. Say that process_exec is called with Giant, but not to depend on it. All our consumers can handle it without Giant.	2004-03-14 02:06:28 +00:00
peter	963c36c195	Move the process_fork event out from under Giant. This one is easy, since there are no consumers in the tree. Document this.	2004-03-14 01:48:32 +00:00
peter	c59ea86c9f	Regen for mpsafe kse_create()	2004-03-13 22:32:17 +00:00
peter	1cb95fd2b7	Push Giant down a little further: - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() is not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe	2004-03-13 22:31:39 +00:00
rwatson	6a1193a64c	Add annotations to mtx_lock(&Giant) in kern_select() and poll() that we always grab Giant, even if we're actually only polling objects that don't require giant. Once socket locking is merged, there will be strong motivation to fix this.	2004-03-13 05:58:57 +00:00
bde	58d250bc29	Align the offset in vn_rdwr_inchunks() so that at most the first and the last chunk are misaligned relative to a MAXBSIZE byte boundary. vn_rdwr_inchunks() is used mainly for elf core dumps, and elf sections are usually perfectly misaligned relative to MAXBSIZE, and chunking prevents the file system from doing much realigning. This gives a surprisingly large speedup for core dumps -- from 50 to 13 seconds for a 512MB core dump here. The pessimization was mostly from an interaction of the misalignment with IO_DIRECT. It increased the number of i/o's for each chunk by a factor of 5 (3 writes and 2 read-before-writes instead of 1 write).	2004-03-13 02:56:27 +00:00
trhodes	dfcfecd6e4	These are changes to allow to use the Intel C/C++ compiler (lang/icc) to build the kernel. It doesn't affect the operation if gcc. Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as icc v8 may define __GNUC__ some parts may look strange but are necessary. Additional changes: - in_cksum.[ch]: * use a generic C version instead of the assembly version in the !gcc case (ASM code breaks with the optimizations icc does) -> no bad checksums with an icc compiled kernel Help from: andre, grehan, das Stolen from: alpha version via ppc version The entire checksum code should IMHO be replaced with the DragonFly version (because it isn't guaranteed future revisions of gcc will include similar optimizations) as in: ---snip--- Revision Changes Path 1.12 +1 -0 src/sys/conf/files.i386 1.4 +142 -558 src/sys/i386/i386/in_cksum.c 1.5 +33 -69 src/sys/i386/include/in_cksum.h 1.5 +2 -0 src/sys/netinet/igmp.c 1.6 +0 -1 src/sys/netinet/in.h 1.6 +2 -0 src/sys/netinet/ip_icmp.c 1.4 +3 -4 src/contrib/ipfilter/ip_compat.h 1.3 +1 -2 src/sbin/natd/icmp.c 1.4 +0 -1 src/sbin/natd/natd.c 1.48 +1 -0 src/sys/conf/files 1.2 +0 -1 src/sys/conf/files.amd64 1.13 +0 -1 src/sys/conf/files.i386 1.5 +0 -1 src/sys/conf/files.pc98 1.7 +1 -1 src/sys/contrib/ipfilter/netinet/fil.c 1.10 +2 -3 src/sys/contrib/ipfilter/netinet/ip_compat.h 1.10 +1 -1 src/sys/contrib/ipfilter/netinet/ip_fil.c 1.7 +1 -1 src/sys/dev/netif/txp/if_txp.c 1.7 +1 -1 src/sys/net/ip_mroute/ip_mroute.c 1.7 +1 -2 src/sys/net/ipfw/ip_fw2.c 1.6 +1 -2 src/sys/netinet/igmp.c 1.4 +158 -116 src/sys/netinet/in_cksum.c 1.6 +1 -1 src/sys/netinet/ip_gre.c 1.7 +1 -2 src/sys/netinet/ip_icmp.c 1.10 +1 -1 src/sys/netinet/ip_input.c 1.10 +1 -2 src/sys/netinet/ip_output.c 1.13 +1 -2 src/sys/netinet/tcp_input.c 1.9 +1 -2 src/sys/netinet/tcp_output.c 1.10 +1 -1 src/sys/netinet/tcp_subr.c 1.10 +1 -1 src/sys/netinet/tcp_syncache.c 1.9 +1 -2 src/sys/netinet/udp_usrreq.c 1.5 +1 -2 src/sys/netinet6/ipsec.c 1.5 +1 -2 
src/sys/netproto/ipsec/ipsec.c 1.5 +1 -1 src/sys/netproto/ipsec/ipsec_input.c 1.4 +1 -2 src/sys/netproto/ipsec/ipsec_output.c and finally remove sys/i386/i386 in_cksum.c sys/i386/include in_cksum.h ---snip--- - endian.h: * DTRT in C++ mode - quad.h: * we don't use gcc v1 anymore, remove support for it Suggested by: bde (long ago) - assym.h: * avoid zero-length arrays (remove dependency on a gcc specific feature) This change changes the contents of the object file, but as it's only used to generate some values for a header, and the generator knows how to handle this, there's no impact in the gcc case. Explained by: bde Submitted by: Marius Strobl <marius@alchemy.franken.de> - aicasm.c: * minor change to teach it about the way icc spells "-nostdinc" Not approved by: gibbs (no reply to my mail) - bump __FreeBSD_version (lang/icc needs to know about the changes) Incarnations of this patch survive gcc compiles since a loooong time, I use it on my desktop. An icc compiled kernel works since Nov. 2003 (exceptions: snd_* if used as modules), it survives a build of the entire ports collection with icc. Parts of this commit contains suggestions or submissions from Marius Strobl <marius@alchemy.franken.de>. Reviewed by: -arch Submitted by: netchild	2004-03-12 21:45:33 +00:00
ru	50a11e8dfd	Do what the execve(2) manpage says and enforce what a Strictly Conforming POSIX application should do by disallowing the argv argument to be NULL. PR: kern/33738 Submitted by: Marc Olzheim, Serge van den Boom OK'ed by: nectar	2004-03-12 21:06:20 +00:00
kensmith	2cb0d83a6c	This is a temporary fix to solve a regression issue on sparc64 that is caused by the way sparc64 registers its CPUs. Nate will work on a real fix shortly. Approved by: njl	2004-03-12 20:35:21 +00:00
jhb	c754b5af47	- Remove old sleep queues. - Remove sleepqueue argument from sleepq_set_timeout() since it is not used.	2004-03-12 19:06:18 +00:00
jhb	6103cfbeb5	Fixup a comment.	2004-03-12 19:05:46 +00:00
des	a77c8e8035	Replace a manual check of a VMIO candidate with vn_canvmio(). This silences an annoying warning in getblk() when VMIO'ing on a directory vnode, which can happen when vfs.vmiodirenable is 1. Bring the warning message in line with reality at the same time. Submitted by: hmp	2004-03-12 12:02:12 +00:00
phk	5c532f7fd4	When I was a kid my work table was one cluttered mess and cleaning it up was a rather overwhelming task. I soon learned that if you don't know where you're going to store something, at least try to pile it next to something slightly related in the hope that a pattern emerges. Apply the same principle to the ffs/snapshot/softupdates code which has leaked into specfs: Add yet another buf-quasi-method and call it from the only two places I can see it can make a difference and implement the magic in ffs_softdep.c where it belongs. It's not pretty, but at least it's one less layer violated.	2004-03-11 18:50:33 +00:00
phk	2a5e157787	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
phk	9ba3cede82	Remove unused mnt_reservedvnlist field.	2004-03-11 16:59:57 +00:00
phk	eeb7579130	Remove unused second arg to vfinddev(). Don't call addaliasu() on VBLK nodes.	2004-03-11 16:33:11 +00:00
phk	7ad97e57ad	Correctly account for extra bits in unit numbers when looking for next free unit.	2004-03-11 14:11:02 +00:00
phk	fdd216910f	Add clone_setup() function rather than rely on lazy initialization. Requested by: rwatson	2004-03-11 12:58:55 +00:00
jmg	cf1b8bdb72	make sure we had the filedesc lock when calling fdinit when RFCFDG is set on call to rfork. Submitted by: Brian Buchanan Semi-Reviewed by: rwatson	2004-03-10 00:27:36 +00:00
njl	89565a7301	Hook CPUs up to newbus. CPUs will ultimately be a bus driver so that multiple CPU-specific drivers can attach. This is a work in progress so children aren't supported yet. Help from: jhb	2004-03-09 03:37:21 +00:00
rwatson	8ff4e76430	Mark loadaverage callout as CALLOUT_MPSAFE. Reviewed by: jhb	2004-03-08 22:01:19 +00:00
pjd	b2d1dcd936	Add two new sysctls: - security.bsd.hardlink_check_uid, when set, means, that unprivileged users are not permitted to create hard links to files not owned by them, - security.bsd.hardlink_check_gid, when set, means, that unprivileged users are not permitted to create hard links to files owned by group they don't belong to. OK'ed by: rwatson	2004-03-08 20:37:25 +00:00
peter	836666b0d7	Move a vref call outside of proc locks and Giant. By virtue of the fact that we (p1) are currently running, we hold a reference on p_textvp which means the vnode cannot go away. p2 cannot run yet (and hence cannot exit) so this should be safe to do at this point. As a bonus, it removes a block of under-Giant code that was there to support the vref.	2004-03-08 00:32:34 +00:00
alc	6b2e0639b2	Remove GIANT_REQUIRED from vunmapbuf().	2004-03-07 00:37:18 +00:00
alc	94f567f9bb	Giant is not required for vm_thread_new_altkstack().	2004-03-07 00:06:32 +00:00
kan	e795b7939d	Always call vn_finished_write after vn_start_write was called. All occurrences of 'goto done' after vn_start_write invocation were cleaning up incompletely.	2004-03-06 04:09:54 +00:00
peter	8ac8c686e1	Add a missing part of jhb's previous commit. It looks like he had a patch chunk rejected that he missed. This would manifest as a lock assertion panic at boot (Giant not locked in kern_fork.c). Obtained from: jhb	2004-03-06 00:44:59 +00:00
jhb	2642ed4029	kthread_exit() no longer requires Giant, so don't force callers to acquire Giant just to call kthread_exit(). Requested by: many	2004-03-05 22:42:17 +00:00
jhb	4e1bd1e348	- Push down Giant in exit() and wait(). - Push Giant down a bit in coredump() and call coredump() with the proc lock already held rather than unlocking it only to turn around and relock it. Requested by: peter	2004-03-05 22:39:53 +00:00
jhb	445ca63264	Lock Giant around the single threading code in exec() to satisfy an assertion in the single threading code.	2004-03-05 22:38:26 +00:00
jhb	af72c48e5f	- Grab a shared lock of the proctree lock while looking for a pid due to the process group and session dereferences. Also, check that p_pgrp and p_session are NULL before dereferencing them. - Push down Giant in fork1(). Requested by: peter	2004-03-05 22:37:32 +00:00
truckman	367b608998	Undo the merger of mlock()/vslock and munlock()/vsunlock() and the introduction of kern_mlock() and kern_munlock() in src/sys/kern/kern_sysctl.c 1.150 src/sys/vm/vm_extern.h 1.69 src/sys/vm/vm_glue.c 1.190 src/sys/vm/vm_mmap.c 1.179 because different resource limits are appropriate for transient and "permanent" page wiring requests. Retain the kern_mlock() and kern_munlock() API in the revived vslock() and vsunlock() functions. Combine the best parts of each of the original sets of implementations with further code cleanup. Make the mlock() and vslock() implementations as similar as possible. Retain the RLIMIT_MEMLOCK check in mlock(). Move the most stringent test, which can return EAGAIN, last so that requests that have no hope of ever being satisfied will not be retried unnecessarily. Disable the test that can return EAGAIN in the vslock() implementation because it will cause the sysctl code to wedge. Tested by: Cy Schubert <Cy.Schubert AT komquats.com>	2004-03-05 22:03:11 +00:00
rwatson	702f89fc5d	The roundrobin callout from sched_4bsd is MPSAFE, so set up the callout as MPSAFE to avoid grabbing Giant. Reviewed by: jhb	2004-03-05 19:27:04 +00:00
rwatson	e2aad13d33	Put "failed to set signal flags properly for ast()" check under DIAGNOSTIC instead of INVARIANTS. INVARIANTS is intended for tests that don't substantially change code flow or behavior (passive), but this test required locking both the proc lock and scheduler lock in order to execute. It also appears to be a very advisory diagnostic as opposed to an invariant violation. Following discussion with: bde	2004-03-05 17:35:28 +00:00
phk	e215fa23b4	Just because the timecounter reads the same value on two samples after each other doesn't mean that nothing happened.	2004-03-04 14:14:23 +00:00
bde	9c7937d9cf	Fixed some style bugs (mainly English usage errors in comments).	2004-03-04 09:56:29 +00:00
bde	a3f844af36	Fixed some style bugs (mainly misplaced comments, and totally disordered declarations in acct_process()).	2004-03-04 09:47:09 +00:00
rwatson	48d4fe5ea4	Remove unneeded label 'done2' from socket(). We now grab Giant only around socreate(), and don't need it for file descriptor accesses. Submitted by: sam	2004-03-04 01:57:48 +00:00
des	e6b61c95ad	Use different dummy wait channels to avoid panic in msleep(). Reviewed by: jhb	2004-03-03 23:03:18 +00:00
jhb	286e504b8f	Always assert that the passed in lock is the same as the saved lock in the sleep queue now that the one abnormal case has been fixed.	2004-03-02 15:02:08 +00:00
jhb	93c4123deb	Correct handling of PDROP in msleep() to just skip the mtx_lock() rather than clear the lock pointer so that sleepq_add() still gets the correct lock pointer and doesn't bogusly trip an assertion.	2004-03-02 14:58:33 +00:00
jhb	86e3aa5b6c	Check for TDF_SINTR before calling sleepq_abort() as there is a narrow race in between sleepq_add() and sleepq_catch_signals() in that setting td_wchan and TDF_SINTR is not atomic to sched_lock but only to the sleepq lock. This band-aid will stop assertion failures, but there is perhaps a larger problem with the sleepq_add/sleepq_catch_signals race that I am not sure how to solve. For the signals case the race is harmless because we always call cursig() after setting TDF_SINTR. However, KSE doesn't do anything in sleepq_catch_signals() to check that this race was lost, so I am unsure if this race is harmful for this specific abort.	2004-03-01 23:07:58 +00:00
rwatson	b0b5f961bd	Rename dup_sockaddr() to sodupsockaddr() for consistency with other functions in kern_socket.c. Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT in from the caller context rather than "1" or "0". Correct mflags pass into mac_init_socket() from previous commit to not include M_ZERO. Submitted by: sam	2004-03-01 03:14:23 +00:00
scottl	48b0575f79	Convert the other use of flags to mflags in soalloc().	2004-03-01 01:14:28 +00:00
rwatson	94d29f7426	Modify soalloc() API so that it accepts a malloc flags argument rather than a "waitok" argument. Callers now pass M_WAITOK or M_NOWAIT rather than 0 or 1. This simplifies the soalloc() logic, and also makes the waiting behavior of soalloc() more clear in the calling context. Submitted by: sam	2004-02-29 17:54:05 +00:00
phk	fdda2333fa	Loudly announce WITNESS and DIAGNOSTIC options and warn about reduced performance.	2004-02-29 16:56:54 +00:00
phk	c45bc64148	Make sure to disable the watchdog if we cannot honour the timeout.	2004-02-28 22:01:19 +00:00
phk	1ea4f5b08e	Rename the WATCHDOG option to SW_WATCHDOG and make it use the generic watchdog(9) interface. Make watchdogd(8) perform as watchdog(8) as well, and make it possible to specify a check command to run, timeout and sleep periods. Update watchdog(4) to talk about the generic interface and add new watchdog(8) page.	2004-02-28 20:56:35 +00:00
jhb	d25301c858	Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep queues in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleeps no longer inherently bump the priority. Instead, this is solely a property of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.	2004-02-27 18:52:44 +00:00
jhb	d76d631711	Drop sched_lock around the wakeup of the parent process after setting the process state to zombie when a process exits to avoid a lock order reversal with the sleepqueue locks. This appears to be the only place that we call wakeup() with sched_lock held.	2004-02-27 18:39:09 +00:00
jhb	d07a9130c6	Add an implementation of a generic sleep queue abstraction that is used to queue threads sleeping on a wait channel similar to how turnstiles are used to queue threads waiting for a lock. This subsystem will be used as the backend for sleep/wakeup and condition variables initially. Eventually it will also be used to replace the ithread-specific iwait thread inhibitor. Sleep queues are also not locked by sched_lock, so this splits sched_lock up a bit further increasing concurrency within the scheduler. Sleep queues also natively support timeouts on sleeps and interruptible sleeps allowing for the reduction of a lot of duplicated code between the sleep/wakeup and condition variable implementations. For more details on the sleep queue implementation, check the comments in sys/sleepqueue.h and kern/subr_sleepqueue.c.	2004-02-27 18:33:09 +00:00
des	51a4ce2dff	Add sysctl_move_oid() which reparents an existing OID.	2004-02-27 17:13:23 +00:00
jhb	b23d8371fa	Clarify and tweak some comments.	2004-02-27 16:14:27 +00:00
jhb	b7ab1db7c3	Fix _sx_assert() to panic() rather than printf() when an assertion fails and ignore assertions if we have already panicked.	2004-02-27 16:13:44 +00:00
jhb	b9e0b0f9af	Replace the ktrace queue's semaphore with a condition variable instead as it is slightly more efficient since we already have a mutex to protect the queue. Ktrace originally used a semaphore more as a proof of concept.	2004-02-26 19:30:22 +00:00
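
The chunk alignment described in commit 58d250bc29 (vn_rdwr_inchunks()) can be illustrated with a small userland sketch. This is not the kernel code: CHUNK_MAX stands in for MAXBSIZE and its value here is an assumption, and next_chunk() and count_misaligned() are hypothetical helper names chosen for illustration. The idea shown is that the first chunk is shortened so that every later chunk starts on a boundary, leaving at most the first and last chunks misaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed boundary for illustration; stands in for the kernel's MAXBSIZE. */
#define CHUNK_MAX 65536u

/*
 * Hypothetical helper: size of the next chunk of an I/O starting at
 * `offset` with `len` bytes remaining.  The chunk ends on the next
 * CHUNK_MAX boundary unless fewer bytes than that remain.
 */
static size_t
next_chunk(uint64_t offset, size_t len)
{
	size_t chunk = CHUNK_MAX - (size_t)(offset % CHUNK_MAX);

	/* The distance to the next boundary is always in [1, CHUNK_MAX]. */
	assert(chunk > 0 && chunk <= CHUNK_MAX);
	if (chunk > len)
		chunk = len;
	return (chunk);
}

/*
 * Hypothetical helper: walk a transfer chunk by chunk and count the
 * chunks that do not cover exactly one aligned CHUNK_MAX block.
 */
static unsigned
count_misaligned(uint64_t offset, size_t len)
{
	unsigned misaligned = 0;

	while (len > 0) {
		size_t chunk = next_chunk(offset, len);

		if (offset % CHUNK_MAX != 0 || chunk != CHUNK_MAX)
			misaligned++;
		offset += chunk;
		len -= chunk;
	}
	return (misaligned);
}
```

Under these assumptions, a 200000-byte transfer starting at offset 1000 is split into a short first chunk, two full aligned chunks, and a short last chunk, so only the two end chunks are misaligned.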

7366 Commits