freebsd-dev

Author	SHA1	Message	Date
Tim J. Robbins	fa2a4d0595	Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys(). Reviewed by: jhb	2004-06-03 01:47:37 +00:00
Robert Watson	d97e0534fa	Expand the hard-coded WITNESS lock order to include the following relationships: Sockets: filedesc->accept->sellck; Routing: radix node head->rtentry->ifaddr; UDP: udp->udpinp; TCP: tcp->tcpinp; SLIP: slip_mtx->slip sc_mtx. Drop in a placeholder section for UNIX domain sockets. Various sections to be expanded over the next few days.	2004-06-02 23:28:06 +00:00
Maxime Henrion	2e34ae7a26	As discussed on arch@, flatten the device sysctl tree to make it more convenient to deal with. The notion of hierarchy is however preserved by adding a new %parent node.	2004-06-02 22:43:35 +00:00
Tim J. Robbins	e4e815db72	Remove a redundant "td = curthread" statement from profclock().	2004-06-02 12:05:06 +00:00
Tim J. Robbins	aa0aa7a113	Move TDF_SA from td_flags to td_pflags (and rename it accordingly) so that it is no longer necessary to hold sched_lock while manipulating it. Reviewed by: davidxu	2004-06-02 07:52:36 +00:00
Jeff Roberson	dc03363dd8	- Run sched_balance() and sched_balance_groups() from hardclock via sched_clock() rather than using callouts. This means we no longer have to take the load of the callout thread into consideration while balancing and should make the balancing decisions simpler and more accurate. Tested on: x86/UP, amd64/SMP	2004-06-02 05:46:48 +00:00
Robert Watson	2658b3bb8e	Integrate accept locking from rwatson_netperf, introducing a new global mutex, accept_mtx, which serializes access to the following fields across all sockets: so_qlen, so_incqlen, so_qstate, so_comp, so_incomp, so_list, so_head. While providing only coarse granularity, this approach avoids lock order issues between sockets by avoiding ownership of the fields by a specific socket and its per-socket mutexes. While here, rewrite soclose(), sofree(), soaccept(), and sonewconn() to add assertions, close additional races and address lock order concerns. In particular: - Reorganize the optimistic concurrency behavior in accept1() to always allocate a file descriptor with falloc() so that if we do find a socket, we don't have to encounter the "Oh, there wasn't a socket" race that can occur if falloc() sleeps in the current code, which broke inbound accept() ordering, not to mention requiring backing out socket state changes in a way that raced with the protocol level. We may want to add a lockless read of the queue state if polling of empty queues proves to be important to optimize. - In accept1(), soref() the socket while holding the accept lock so that the socket cannot be freed in a race with the protocol layer. Likewise in netgraph equivalents of the accept1() code. - In sonewconn(), loop waiting for the queue to be small enough to insert our new socket once we've committed to inserting it, or races can occur that cause the incomplete socket queue to overfill. In the previous implementation, it was sufficient to test only once, since calling soabort() didn't release synchronization and so could not permit another thread to insert a socket as we discarded a previous one. - In soclose()/sofree()/et al, it is the responsibility of the caller to remove a socket from the incomplete connection queue before calling soabort(), which prevents soabort() from having to walk into the accept socket to release the socket from its queue, and avoids races when releasing the accept mutex to enter soabort(), permitting soabort() to avoid lock ordering issues with the caller. - Generally cluster accept-queue-related operations together throughout these functions in order to facilitate locking. Annotate new locking in socketvar.h.	2004-06-02 04:15:39 +00:00
Robert Watson	f3d055b6de	Rather than assert f_type==DTYPE_VNODE, conditionally perform the file lock release based on f_type==DTYPE_VNODE. vn_closefile() is used by non-vnode types as well (fifo).	2004-06-01 23:36:47 +00:00
Robert Watson	948a4734ed	Add GIANT_REQUIRED to kqueue_close(), since kqueue currently requires Giant.	2004-06-01 18:05:41 +00:00
Robert Watson	63732dce22	Push the VOP_ADVLOCK() call to release advisory locks on vnode file descriptors out of fdrop_locked() and into vn_closefile(). This removes all knowledge of vnodes from fdrop_locked(), since the lock behavior was specific to vnodes. This also removes the specific requirement for Giant in fdrop_locked(); it's now only required by code that it calls into. Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.	2004-06-01 18:03:20 +00:00
Bosko Milekic	6bc72ab95a	Fix a couple of bugs in the mbuf and packet ctors. In the latter case, nextpkt within the m_hdr was not being initialized to NULL for !M_PKTHDR cases. Maybe this will fix weird socket buffer inconsistency panics, but we'll see.	2004-06-01 16:17:10 +00:00
Poul-Henning Kamp	3a95025ffc	Introduce a ttyioctl() cdevsw default function.	2004-06-01 13:39:02 +00:00
Poul-Henning Kamp	be9bd88238	There is no need to explicitly call the stop function. In all likelihood ->l_close() did it and ttyclose certainly will.	2004-06-01 11:57:15 +00:00
Robert Watson	d087080c1f	Add a global mutex, accept_filter_mtx, to protect the global list of accept filters and prevent read-modify-write races.	2004-06-01 04:08:48 +00:00
Robert Watson	36568179e3	The SS_COMP and SS_INCOMP flags in the so_state field indicate whether the socket is on an accept queue of a listen socket. This change renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new state field on the socket, so_qstate, as the locking for these flags is substantially different from the locking on the remainder of the flags in so_state.	2004-06-01 02:42:56 +00:00
Don Lewis	866046f5a6	Add MSG_NBIO flag option to soreceive() and sosend() that causes them to behave the same as if the SS_NBIO socket flag had been set for this call. The SS_NBIO flag for ordinary sockets is set by fcntl(fd, F_SETFL, O_NONBLOCK). Pass the MSG_NBIO flag to the soreceive() and sosend() calls in fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag on the underlying socket for each I/O operation. The O_NONBLOCK flag is a property of the descriptor, and unlike ordinary sockets, fifos may be referenced by multiple descriptors.	2004-06-01 01:18:51 +00:00
Bosko Milekic	099a0e588c	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and looks up the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but-mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues; it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)	2004-05-31 21:46:06 +00:00
Robert Watson	e79962dbce	Assert Giant in vn_start_write() and vn_finished_write().	2004-05-31 20:56:10 +00:00
Robert Watson	9e6127fe3b	Assert Giant in vrele().	2004-05-31 19:06:01 +00:00
Poul-Henning Kamp	77409fe148	Add missing #include <sys/module.h>	2004-05-30 20:34:58 +00:00
Poul-Henning Kamp	41ee9f1c69	Add some missing <sys/module.h> includes which are masked by the one on death-row in <sys/kernel.h>	2004-05-30 17:57:46 +00:00
Tim J. Robbins	7671b766a6	Enable MI bits for gcc -ftest-coverage -fprofile-arcs on amd64.	2004-05-29 01:18:14 +00:00
Pawel Jakub Dawidek	d860b24150	Sysctl hw.bus.devctl_disable shouldn't be writable from inside a jail. Approved by: imp	2004-05-26 16:36:32 +00:00
Thomas Moestl	65e29c4822	Retire cpu_sched_exit(); it is not used any more.	2004-05-26 12:09:39 +00:00
Dag-Erling Smørgrav	5c1921b779	As previously threatened, give each device its own sysctl context and subtree (under the new dev top-level node). This should greatly simplify drivers which need per-device sysctl variables (such as ndis).	2004-05-25 12:06:26 +00:00
Garance A Drosehn	b8fdc89d79	Implement the new KERN_PROC_RGID option, and also implement the KERN_PROC_SESSION option which had been previously defined but never implemented. PR: bin/65803 (a very tiny piece of the PR) Submitted by: Cyrille Lefevre	2004-05-22 23:11:44 +00:00
David Xu	702ac0f112	Clear KSE thread flags after KSE thread mode is ended. A side effect of not clearing the flags on the execv() syscall is that the new program runs in KSE thread mode without having enabled it. Submitted by: tjr Modified by: davidxu	2004-05-21 14:50:23 +00:00
Bruce Evans	a4c2da1503	Fixed some style bugs in tdsigwakeup().	2004-05-21 10:02:24 +00:00
John Baldwin	80c4433c18	In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if a thread is on a sleep queue and should have its sleep aborted. Reported by: Thierry Herbelot thierry at herbelot dot com	2004-05-20 20:17:28 +00:00
Bruce Evans	372c2e9613	Fixed printf format errors which helped break GUPROF for arches with 64-bit function pointers.	2004-05-20 16:48:17 +00:00
Bruce Evans	c81d4a0396	Initialize the history counter type field in struct gmonparam as threatened in rev.1.10 of usr.sbin/kgmon/kgmon.c more than 2 years ago. kgmon has been recovering from the missing initialization for too long, but the fixup there is ifdefed for i386's and shouldn't be needed for other arches.	2004-05-20 16:42:39 +00:00
Bruce Evans	e77c22bf45	Moved i386 asms to an i386 header. The asms are for calibration of high resolution kernel profiling (options GUPROF. "U" in GUPROF stands for microsecond resolution, but the resolution is now smaller than 1 nanosecond on multi-GHz machines and the accuracy is heading towards 1 nanosecond too). Arches that support GUPROF must now provide certain macros for the calibration. GUPROF is now only supported for i386's, so the absence of the new macros for other arches doesn't break anything that wasn't already broken. amd64's have uncommitted support for GUPROF, and sparc64's have support that seems to be complete except here (there was an #error for non-i386 cases; now there are undefined macros). Changed the asms a little: - declare them as __volatile. They must not be moved, and exporting a label across asms is technically incorrect, so try harder to stop gcc moving them. - don't put the non-clobbered register "bx" in the clobber list. The clobber lists are still more conservative than necessary. - drop the non-support for gcc-1. It just gave a better error message, and this is not useful since compiling with gcc-1 would cause thousands of worse error messages. - drop the support for aout.	2004-05-20 16:12:19 +00:00
Pawel Jakub Dawidek	2ff8a3496f	Fix sysctl name: security.jail.getfsstate_getfsstatroot_only -> security.jail.getfsstatroot_only. Approved by: rwatson	2004-05-20 05:28:44 +00:00
Bruce Evans	5ad6c3b1ea	Include <sys/gmon.h> instead of <machine/profile.h> for the declaration of kmupetext(). The declaration is misplaced in <machine/profile.h> since it is not MD and not related to the lowest level of profiling. It will be moved, but getting it via <sys/gmon.h> already works.	2004-05-19 14:36:38 +00:00
Paul Saab	c2696aaf51	syncache broke rev 1.23 which was done to fix the "thundering herd" problem in Apache. Fix it. Reviewed by: peter	2004-05-19 00:22:10 +00:00
Peter Wemm	4cec6f5d02	If a symbol has section+offset definitions provided, always use them instead of doing a name lookup for global symbols. This fixes the snd_pcm module.	2004-05-18 05:15:43 +00:00
Peter Wemm	82d0d1a01b	Remove leftover padding variables. Convert some silent 'ignore programmer error' cases into panics. Remove 'align' field from section table (no longer needed)	2004-05-18 05:14:19 +00:00
Peter Wemm	23eb3eb66e	Since we go to the trouble of compiling the kobj ops table for each class, and cannot handle it going away, add an explicit reference to the kobj class inside each linker class. Without this, a class with no modules loaded will sit with an idle refcount of 0. Loading and unloading a module with it causes a 0->1->0 transition which frees the ops table and causes subsequent loads using that class to explode. Normally, the "kernel" module will remain forever loaded and prevent this happening, but if you have more than one linker class active, only one owns the "kernel". This finishes making modules work for kldload(8) on amd64.	2004-05-17 21:24:39 +00:00
Peter Wemm	2094780104	Clean up the code some more. Unify the text/data (progbits) and bss (nobits) tables to simplify some code. Try and shorten some of the very wide lines. Somewhere along the way, I think I fixed the memory corruption that caused panics after going multiuser.	2004-05-17 21:20:23 +00:00
Peter Wemm	872e9216d0	Oops, use the generic ELF_ST_BIND() macro instead of ELF64_ST_BIND. Submitted by: marks	2004-05-17 00:51:34 +00:00
Peter Wemm	e8855d4f97	Make a small revision to the api between the elf linker core and the elf_reloc() backends for two reasons. First, to support the possibility of there being two elf linkers in the kernel (eg: amd64), and second, to pass the relocbase explicitly (for relocating .o format kld files).	2004-05-16 20:00:28 +00:00
Bruce Evans	a13ec35b05	Fixed some common printf format errors. Don't assume that "struct foo *" is "void *" (it isn't) or that the default promotion of pid_t is int. Instead, assume that casting "struct foo *" to "void *" and printing the result with %p is useful, and that all pid_t's are representable as longs. Fixed some minor style bugs (mainly spelling errors in comments).	2004-05-14 20:51:42 +00:00
John Baldwin	3335671ddd	Split sleepq_wakeup_thread() into two functions. sleepq_remove_thread() removes a specific thread from a sleep queue. sleepq_resume_thread() resumes scheduling of a thread that has been previously removed from a sleep queue. - sleepq_catch_signals() just removes a thread from the queue it was just added to when a pending signal is found. - sleepq_signal() and sleepq_broadcast() remove threads from a queue, drop the queue lock, and then resume all the previously removed threads. This doesn't completely fix the sched_lock <-> sleepq chain LOR, but it makes it a little better as we no longer call setrunnable() with a sleep queue lock held meaning if setrunnable() tries to wakeup the swapper we don't try to lock two sleep queue chains at the same time.	2004-05-13 20:00:43 +00:00
Tim J. Robbins	f52e2ef29f	Eliminate a memory leak in kern_symlink() that could occur if vn_start_write() failed.	2004-05-11 10:42:02 +00:00
Julian Elischer	b324899838	Remove misplaced duplicate comment and slightly reformat the version that was in the right place.	2004-05-09 22:29:14 +00:00
Sam Leffler	335b8d7e89	set m_len to reflect mbuf contents on return from m_dup1; fixes an obscure m_pullup case that contributed to breaking ipcomp in tunnel mode for kame Submitted by: itojun Obtained from: kame	2004-05-09 05:57:58 +00:00
Julian Elischer	60f798c1c8	Fix rtprio() to do sensible things when called from threaded processes. It's not quite correct from a POSIX point of view, but it is a lot better than what was there before. This will be revisited later when we decide what form our priority extensions will take. POSIX doesn't specify how a system scope thread can change its priority so you need to add non-standard extensions to be able to do it. For now, make this slightly non-standard to allow it to be done. Submitted by: Dan Eischen originally, changed by myself.	2004-05-08 08:56:05 +00:00
Alan Cox	ec1100fc6e	Avoid pointless zeroing of the bogus page in vfs_bio_clrbuf(). Suggested by: tegge@ (from October of last year)	2004-05-08 06:46:40 +00:00
Robert Watson	f7250466a8	Unconditionally lock Giant in do_sendfile(), rather than locking it conditional on debug.mpsafenet. We can try pushing down Giant here later, but we don't want to enter VFS without holding Giant. Bumped into by: kris	2004-05-08 02:24:21 +00:00
Olivier Houchard	a77c37b649	Compare t_brkc against (char)_POSIX_VDISABLE, not against -1. Discussed with: bde	2004-05-07 15:35:38 +00:00
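Commit a13ec35b05 (Bruce Evans) relies on two portable printf rules. %p is defined only for void *, so other object pointers should be cast to void *. pid_t has no conversion specifier of its own, so it is widened to long and printed with %ld. What follows is a minimal userland sketch of that idiom, not code from the commit. struct foo and format_foo_and_pid() are hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical structure, used only for illustration. */
struct foo {
	int value;
};

/*
 * Format a struct pointer and a pid into buf.  %p is defined only for
 * void *, so the struct pointer is converted explicitly.  pid_t has no
 * dedicated conversion, so it is widened to long and printed with %ld.
 * Returns the snprintf() result (characters that would be written).
 */
int
format_foo_and_pid(char *buf, size_t len, struct foo *fp, pid_t pid)
{
	return (snprintf(buf, len, "fp=%p pid=%ld", (void *)fp, (long)pid));
}
```

A caller might write format_foo_and_pid(buf, sizeof(buf), &f, getpid()). The exact text produced for the pointer is implementation-defined.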
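Commit 866046f5a6 (Don Lewis) notes that ordinary sockets get SS_NBIO from fcntl(fd, F_SETFL, O_NONBLOCK), and that O_NONBLOCK belongs to the descriptor rather than to the underlying object. Below is a minimal userland sketch of that behavior on a pipe, assuming a POSIX system. It does not exercise the kernel's MSG_NBIO path. set_nonblock(), is_nonblock(), and read_byte_nb() are hypothetical helpers.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd while preserving its other file status flags. */
int
set_nonblock(int fd)
{
	int flags;

	if ((flags = fcntl(fd, F_GETFL)) == -1)
		return (-1);
	return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0);
}

/* Report whether O_NONBLOCK is set on fd: 1 if set, 0 if not, -1 on error. */
int
is_nonblock(int fd)
{
	int flags;

	if ((flags = fcntl(fd, F_GETFL)) == -1)
		return (-1);
	return ((flags & O_NONBLOCK) != 0);
}

/*
 * Read one byte.  On a non-blocking descriptor with no data available,
 * read() fails with EAGAIN (or EWOULDBLOCK) instead of sleeping.
 * Returns 1 if a byte was read, 0 if the read would have blocked, and
 * -1 on end-of-file or any other error.
 */
int
read_byte_nb(int fd, char *out)
{
	ssize_t n;

	n = read(fd, out, 1);
	if (n == 1)
		return (1);
	if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return (0);
	return (-1);
}
```

Setting the flag on the read end of a pipe leaves the write end's flags unchanged. A read on the empty non-blocking end returns immediately instead of sleeping. Descriptors created with dup() share one open file and therefore also share the flag.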


7225 Commits