freebsd-nq

Author	SHA1	Message	Date
Robert Watson	9260798fd7	Acquire Giant in link() so that the system call can be marked MPSAFE. Don't want to acquire Giant in kern_link() since the Linux compat code performs actions requiring Giant prior to calling kern_link().	2004-06-22 04:34:05 +00:00
Robert Watson	7af72ad7b6	Rebuild following marking link() as MPSAFE.	2004-06-22 04:29:59 +00:00
Robert Watson	61d87ffdc0	Mark link() system call as MPSAFE.	2004-06-22 04:29:27 +00:00
Robert Watson	694b21cf7b	Acquire Giant in link() so that we can mark it as MSTD in syscalls.master. Don't want to do it in kern_link() since the Linux emulation code calls kern_link() after performing other actions requiring Giant.	2004-06-22 04:29:07 +00:00
Robert Watson	fea24c0a71	Remove spl's from uipc_socket to ease in merging.	2004-06-22 03:49:22 +00:00
Scott Long	36c6fd1c0f	Fix another typo in the previous commit.	2004-06-21 23:47:47 +00:00
Poul-Henning Kamp	ec66f15d14	Put the pre FreeBSD-2.x tty compat code under BURN_BRIDGES.	2004-06-21 22:57:16 +00:00
Scott Long	c38dd4b6bd	Fix typo that somehow crept into the previous commit	2004-06-21 22:42:46 +00:00
Kelly Yancey	de0a924120	Update previous commit to: * Obtain/release schedlock around calls to calcru. * Sort switch cases which do not cascade per style(9). * Sort local variables per style(9). * Remove "superfluous" whitespace. * Cleanup handling of NULL uap->tp in clock_getres(). It would probably be better to return EFAULT like clock_gettime() does by passing the pointer to copyout(), but I presume it was written to not fail on purpose in the original code. I'll defer to -standards on this one. Reported by: bde	2004-06-21 22:34:57 +00:00
Scott Long	dc09579417	Add the sysctl node 'kern.sched.name' that has the name of the scheduler currently in use. Move the 4bsd kern.quantum node to kern.sched.quantum for consistency.	2004-06-21 22:05:46 +00:00
Julian Elischer	dcc9954eb9	Mark the thread in an exiting program as inactive. This is not really used by the process but it's confusing to some status readers to see zombie processes with "running" threads. Pointed out by: Don Lewis <truckman@FreeBSD.org>	2004-06-21 20:44:02 +00:00
Bruce Evans	ba39a1c5a4	Turned off the "calcru: negative time" warning for certain SMP cases where it is known to detect a problem but the problem is not very easy to fix. The warning became very common recently after a call to calcru() was added to fill_kinfo_thread(). Another (much older) cause of "negative times" (actually non-monotonic times) was fixed in rev.1.237 of kern_exit.c. Print separate messages for non-monotonic and negative times.	2004-06-21 17:46:27 +00:00
Bruce Evans	40a3fa2d59	(1) Removed the bogus condition "p->p_pid != 1" on calling sched_exit() from exit1(). sched_exit() must be called unconditionally from exit1(). It was called almost unconditionally because init only exits on system shutdown, if at all. (2) Removed the comment that presumed to know what sched_exit() does. sched_exit() does different things for the ULE case. The call became essential when it started doing load average stuff, but its caller should not know that. (3) Didn't fix bugs caused by bitrot in the condition. The condition was last correct in rev.1.208 when it was in wait1(). There p was spelled curthread->td_proc and was for the waiting parent; now p is for the exiting child. The condition was to avoid lowering init's priority. It should be in sched_exit() itself. Lowering of priorities is broken in other ways in at least the 4BSD scheduler, and doing it for init causes less noticeable problems than doing it for shells. Noticed by: julian (1)	2004-06-21 14:49:50 +00:00
Bruce Evans	871684b822	Update p_runtime on exit. This fixes calcru() on zombies, and prepares for not calling calcru() on exit. calcru() on a zombie can happen if ttyinfo() (^T) picks one. PR: 52490	2004-06-21 14:03:38 +00:00
Poul-Henning Kamp	55dbc267cb	New style functions, kill register keyword.	2004-06-21 12:28:56 +00:00
Robert Watson	a34b704666	Merge next step in socket buffer locking: - sowakeup() now asserts the socket buffer lock on entry. Move the call to KNOTE higher in sowakeup() so that it is made with the socket buffer lock held for consistency with other calls. Release the socket buffer lock prior to calling into pgsigio(), so_upcall(), or aio_swake(). Locking for this event management will need revisiting in the future, but this model avoids lock order reversals when upcalls into other subsystems result in socket/socket buffer operations. Assert that the socket buffer lock is not held at the end of the function. - Wrapper macros for sowakeup(), sorwakeup() and sowwakeup(), now have _locked versions which assert the socket buffer lock on entry. If a wakeup is required by sb_notify(), invoke sowakeup(); otherwise, unconditionally release the socket buffer lock. This results in the socket buffer lock being released whether a wakeup is required or not. - Break out socantsendmore() into socantsendmore_locked() that asserts the socket buffer lock. socantsendmore() unconditionally locks the socket buffer before calling socantsendmore_locked(). Note that both functions return with the socket buffer unlocked as socantsendmore_locked() calls sowwakeup_locked() which has the same properties. Assert that the socket buffer is unlocked on return. - Break out socantrcvmore() into socantrcvmore_locked() that asserts the socket buffer lock. socantrcvmore() unconditionally locks the socket buffer before calling socantrcvmore_locked(). Note that both functions return with the socket buffer unlocked as socantrcvmore_locked() calls sorwakeup_locked() which has similar properties. Assert that the socket buffer is unlocked on return. - Break out sbrelease() into a sbrelease_locked() that asserts the socket buffer lock. sbrelease() unconditionally locks the socket buffer before calling sbrelease_locked(). sbrelease_locked() now invokes sbflush_locked() instead of sbflush(). 
- Assert the socket buffer lock in socket buffer sanity check functions sblastrecordchk(), sblastmbufchk(). - Assert the socket buffer lock in SBLINKRECORD(). - Break out various sbappend() functions into sbappend_locked() (and variations on that name) that assert the socket buffer lock. The !_locked() variations unconditionally lock the socket buffer before calling their _locked counterparts. Internally, make sure to call _locked() support routines, etc, if already holding the socket buffer lock. - Break out sbinsertoob() into sbinsertoob_locked() that asserts the socket buffer lock. sbinsertoob() unconditionally locks the socket buffer before calling sbinsertoob_locked(). - Break out sbflush() into sbflush_locked() that asserts the socket buffer lock. sbflush() unconditionally locks the socket buffer before calling sbflush_locked(). Update panic strings for new function names. - Break out sbdrop() into sbdrop_locked() that asserts the socket buffer lock. sbdrop() unconditionally locks the socket buffer before calling sbdrop_locked(). - Break out sbdroprecord() into sbdroprecord_locked() that asserts the socket buffer lock. sbdroprecord() unconditionally locks the socket buffer before calling sbdroprecord_locked(). - sofree() now calls socantsendmore_locked() and re-acquires the socket buffer lock on return. It also now calls sbrelease_locked(). - sorflush() now calls socantrcvmore_locked() and re-acquires the socket buffer lock on return. Clean up/mess up other behavior in sorflush() relating to the temporary stack copy of the socket buffer used with dom_dispose by more properly initializing the temporary copy, and selectively bzeroing/copying more carefully to prevent WITNESS from getting confused by improperly initialized mutexes. Annotate why that's necessary, or at least, needed. - soisconnected() now calls sbdrop_locked() before unlocking the socket buffer to avoid locking overhead. 
Some parts of this change were: Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-21 00:20:43 +00:00
Garance A Drosehn	7638fa19a7	Fill in the values for the ki_tid and ki_numthreads which have been added to kproc_info. PR: bin/65803 (a tiny part...) Submitted by: Cyrille Lefevre	2004-06-20 22:17:22 +00:00
Robert Watson	c9f69064af	In uipc_rcvd(), lock the socket buffers at either end of the UNIX domain socket when updating fields at both ends. Submitted by: sam Sponsored by: FreeBSD Foundation	2004-06-20 21:43:13 +00:00
Robert Watson	1b2e3b4b46	Hold SOCK_LOCK(so) when frobbing so_state when disconnecting a connected UNIX domain datagram socket.	2004-06-20 21:29:56 +00:00
Robert Watson	fa8368a8fe	When retrieving the SO_LINGER socket option for user space, hold the socket lock over pulling so_options and so_linger out of the socket structure in order to retrieve a consistent snapshot. This may be overkill if user space doesn't require a consistent snapshot.	2004-06-20 17:50:42 +00:00
Robert Watson	6f4b1b5578	Convert an if->panic in soclose() into a call to KASSERT().	2004-06-20 17:47:51 +00:00
Robert Watson	ed2f7766b0	Annotate some ordering-related issues in solisten() which are not yet resolved by socket locking: in particular, that we test the connection state at the socket layer without locking, request that the protocol begin listening, and then set the listen state on the socket non-atomically, resulting in a non-atomic cross-layer test-and-set.	2004-06-20 17:38:19 +00:00
Robert Watson	d43c1f67cc	Annotate two intentionally unlocked reads with comments. Annotate a potentially inconsistent result returned to user space when performing fstat() on a socket due to not using socket buffer locking.	2004-06-20 17:35:50 +00:00
Thomas Moestl	3971dcfa4b	Initialize ni_cnd.cn_cred before calling lookup() (this is normally done by namei(), which cannot easily be used here however). This fixes boot time crashes on sparc64 and probably other platforms. Reviewed by: phk	2004-06-20 17:31:01 +00:00
Garance A Drosehn	99d2ecbc7d	Add a call to calcru() to update the kproc_info fields of ki_rusage.ru_utime and ki_rusage.ru_stime. This greatly improves the accuracy of those fields. Suggested by: bde	2004-06-20 02:03:33 +00:00
Marcel Moolenaar	0068114dd5	Define __lwpid_t as an int32_t in <sys/_types.h> and define lwpid_t as an __lwpid_t in <sys/types.h>. Retype td_tid from an int to a lwpid_t and change related definitions accordingly.	2004-06-19 17:58:32 +00:00
Tim J. Robbins	68ba7a1d57	When no fixed address is given in a shmat() request, pass a hint address to vm_map_find() that is less likely to be outside of addressable memory for 32-bit processes: just past the end of the largest possible heap. This is the same hint that mmap() uses.	2004-06-19 14:46:13 +00:00
Garance A Drosehn	078842c5c9	Fill in some new fields in 'struct kinfo_proc', namely ki_childstime, ki_childutime, and ki_emul. Also uses the timevaladd() routine to correct the calculation of ki_childtime. That will correct the value returned when ki_childtime.tv_usec > 1,000,000. This also implements a new KERN_PROC_GID option for kvm_getprocs(). (there will be a similar update to lib/libkvm/kvm_proc.c) Submitted by: Cyrille Lefevre	2004-06-19 14:03:00 +00:00
Poul-Henning Kamp	d7086f313a	Only initialize f_data and f_ops if nobody else did so already.	2004-06-19 11:41:45 +00:00
Poul-Henning Kamp	a769355f9b	Explicitly initialize f_data and f_vnode to NULL. Report f_vnode to userland in struct xfile.	2004-06-19 11:40:08 +00:00
Robert Watson	31f555a1c5	Assert socket buffer lock in sb_lock() to protect socket buffer sleep lock state. Convert tsleep() into msleep() with socket buffer mutex as argument. Hold socket buffer lock over sbunlock() to protect sleep lock state. Assert socket buffer lock in sbwait() to protect the socket buffer wait state. Convert tsleep() into msleep() with socket buffer mutex as argument. Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK() in order to call into these functions with the lock, as well as to start protecting other socket buffer use in their implementation. Drop the socket buffer mutexes around calls into the protocol layer, around potentially blocking operations, for copying to/from user space, and VM operations relating to zero-copy. Assert the socket buffer mutex strategically after code sections or at the beginning of loops. In some cases, modify return code to ensure locks are properly dropped. Convert the potentially blocking allocation of storage for the remote address in soreceive() into a non-blocking allocation; we may wish to move the allocation earlier so that it can block prior to acquisition of the socket buffer lock. Drop some spl use. NOTE: Some races exist in the current structuring of sosend() and soreceive(). This commit only merges basic socket locking in this code; follow-up commits will close additional races. As merged, these changes are not sufficient to run without Giant safely. Reviewed by: juli, tjr	2004-06-19 03:23:14 +00:00
Brian Feldman	8e1b797456	Add a sysctl/tunable, "kern.always_console_output", that lets you set output to permanently (not ephemerally) go to the console. It is also sent to any other console specified by TIOCCONS as normal. While I'm here, document the kern.log_console_output sysctl.	2004-06-18 20:12:42 +00:00
David Xu	b370279ef8	Add comment to reflect that we should retry after thread singling failed.	2004-06-18 11:13:49 +00:00
David Xu	0aabef657e	Remove a bogus panic. It is possible that more than one thread will be suspended in thread_suspend_check; after they are resumed, all threads will call thread_single, but only one can succeed; the others should retry and will exit in thread_suspend_check.	2004-06-18 06:21:09 +00:00
David Xu	ec008e96a8	If thread singler wants to terminate other threads, make sure it includes all threads except itself. Obtained from: julian	2004-06-18 06:15:21 +00:00
Robert Watson	7b574f2e45	Hold SOCK_LOCK(so) while frobbing so_options. Note that while the local race is corrected, there's still a global race in sosend() relating to so_options and the SO_DONTROUTE flag.	2004-06-18 04:02:56 +00:00
Robert Watson	c012260726	Merge some additional leaf node socket buffer locking from rwatson_netperf: Introduce conditional locking of the socket buffer in fifofs kqueue filters; KNOTE() will be called holding the socket buffer locks in fifofs, but sometimes the kqueue() system call will poll using the same entry point without holding the socket buffer lock. Introduce conditional locking of the socket buffer in the socket kqueue filters; KNOTE() will be called holding the socket buffer locks in the socket code, but sometimes the kqueue() system call will poll using the same entry points without holding the socket buffer lock. Simplify the logic in sodisconnect() since we no longer need spls. NOTE: To remove conditional locking in the kqueue filters, it would make sense to use a separate kqueue API entry into the socket/fifo code when calling from the kqueue() system call.	2004-06-18 02:57:55 +00:00
Kelly Yancey	b8817154c3	Implement CLOCK_VIRTUAL and CLOCK_PROF for clock_gettime(2) and clock_getres(2). Reviewed by: phk PR: 23304	2004-06-17 23:12:12 +00:00
Robert Watson	9535efc00d	Merge additional socket buffer locking from rwatson_netperf: - Lock down low hanging fruit use of sb_flags with socket buffer lock. - Lock down low hanging fruit use of so_state with socket lock. - Lock down low hanging fruit use of so_options. - Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with socket buffer lock. - Annotate situations in which we unlock the socket lock and then grab the receive socket buffer lock, which are currently actually the same lock. Depending on how we want to play our cards, we may want to coalesce these lock uses to reduce overhead. - Convert an if()->panic() into a KASSERT relating to so_state in soaccept(). - Remove a number of splnet()/splx() references. More complex merging of socket and socket buffer locking to follow.	2004-06-17 22:48:11 +00:00
Poul-Henning Kamp	b90c855961	Reduce the thaumaturgical level of root filesystem mounts: Instead of using an otherwise redundant clone routine in geom_disk.c, mount a temporary DEVFS and do a proper lookup. Submitted by: thomas	2004-06-17 21:24:13 +00:00
Poul-Henning Kamp	f3732fd15b	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Julian Elischer	fa88511615	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Peter Wemm	a8774e396e	Change strategy based on a suggestion from Ian Dowse. Instead of trying to keep track of different section base addresses at a symbol-by-symbol level, just set the symbol values at load time.	2004-06-15 23:57:02 +00:00
Robert Watson	7721f5d760	Grab the socket buffer send or receive mutex when performing a read-modify-write on the sb_state field. This commit catches only the "easy" ones where it doesn't interact with as yet unmerged locking.	2004-06-15 03:51:44 +00:00
Peter Wemm	1cab0c857e	Fix symbol lookups between modules. This caused modules that depend on other modules to explode. eg: snd_ich->snd_pcm and umass->usb. The problem was that I was using the unified base address of the module instead of finding the start address of the section in question.	2004-06-15 01:35:57 +00:00
Peter Wemm	add21e178f	Insurance: cause a proper symbol lookup failure for symbol entries that reference unknown sections.. rather than returning a small value.	2004-06-15 01:33:39 +00:00
John Polstra	4717d22a7c	Change the return value of sema_timedwait() so it returns 0 on success and a proper errno value on failure. This makes it consistent with cv_timedwait(), and paves the way for the introduction of functions such as sema_timedwait_sig() which can fail in multiple ways. Bump __FreeBSD_version and add a note to UPDATING. Approved by: scottl (ips driver), arch	2004-06-14 18:19:05 +00:00
Robert Watson	c0b99ffa02	The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coalesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.	2004-06-14 18:16:22 +00:00
Poul-Henning Kamp	170593a9b5	Remove a left over from userland buffer-cache access to disks.	2004-06-14 14:25:03 +00:00
Robert Watson	310e7ceb94	Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so): - Hold socket lock over calls to MAC entry points reading or manipulating socket labels. - Assert socket lock in MAC entry point implementations. - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.	2004-06-13 02:50:07 +00:00
Robert Watson	cce9e3f104	Introduce socket and UNIX domain socket locks into hard-coded lock order definition for witness. Send lock before receive lock, and socket locks after accept but before select: filedesc -> accept -> so_snd -> so_rcv -> sellck All routing locks after send lock: so_rcv -> radix node head All protocol locks before socket locks: unp -> so_snd udp -> udpinp -> so_snd tcp -> tcpinp -> so_snd	2004-06-13 00:23:03 +00:00
Robert Watson	3e87b34a25	Correct whitespace errors in merge from rwatson_netperf: tabs instead of spaces, no trailing tab at the end of line. Pointed out by: csjp	2004-06-12 23:36:59 +00:00
Robert Watson	395a08c904	Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 20:47:32 +00:00
Robert Watson	f6c0cce6d9	Introduce a mutex into struct sockbuf, sb_mtx, which will be used to protect fields in the socket buffer. Add accessor macros to use the mutex (SOCKBUF_*()). Initialize the mutex in soalloc(), and destroy it in sodealloc(). In addition, add SOCK_*() access macros which will protect most remaining fields in the socket; for the time being, use the receive socket buffer mutex to implement socket level locking to reduce memory overhead. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 16:08:41 +00:00
Poul-Henning Kamp	2653139fd2	Fix registration of loadable line disciplines. This should make watch(8)/snp(4) work again.	2004-06-12 12:31:42 +00:00
Bosko Milekic	96e124135b	Gah! Plug a mbuf leak I introduced in the last commit. I don the pointy-hat. Problem reported by: Peter Holm <pho@>	2004-06-11 18:17:25 +00:00
Julian Elischer	94e0a4cdf3	Shuffle some code around.	2004-06-11 17:48:20 +00:00
Poul-Henning Kamp	1930e303cf	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.	2004-06-11 11:16:26 +00:00
Brian Feldman	b4adfcf2f4	Make sysctl_wire_old_buffer() respect ENOMEM from vslock() by marking the valid length as 0. This prevents vsunlock() from removing a system wire from memory that was not successfully wired (by us). Submitted by: tegge	2004-06-11 02:20:37 +00:00
Robert Watson	0d9ce3a1ac	Introduce a subsystem lock around UNIX domain sockets in order to protect global and allocated variables. This strategy is derived from work originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam Leffler: - Add unp_mtx, a global mutex which will protect all UNIX domain socket related variables, structures, etc. - Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros. - Acquire unp_mtx on entering most UNIX domain socket code, drop/re-acquire around calls into VFS, and release it on return. - Avoid performing sodupsockaddr() while holding the mutex, so in general move to allocating storage before acquiring the mutex to copy the data. - Make a stack copy of the xucred rather than copying out while holding unp_mtx. Copy the peer credential out after releasing the mutex. - Add additional assertions of vnode locks following VOP_CREATE(). A few notes: - Use of an sx lock for the file list mutex may cause problems with regard to unp_mtx when garbage collecting passed file descriptors. - The locking in unp_pcblist() for sysctl monitoring is correct subject to the unpcb zone not returning memory for reuse by other subsystems (consistent with similar existing concerns). - Sam's version of this change, as with the BSD/OS version, made use of both a global lock and per-unpcb locks. However, in practice, the global lock covered all accesses, so I have simplified out the unpcb locks in the interest of getting this merged faster (reducing the overhead but not sacrificing granularity in most cases). We will want to explore possibilities for improving lock granularity in this code in the future. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS 5 snapshot provided by BSDi	2004-06-10 21:34:38 +00:00
Bosko Milekic	b5b2ea9a46	Plug a race where upon free this scenario could occur (in time order): (1) thread 1 decrements ref_cnt; (2) thread 2 decrements ref_cnt, which is now zero; (3) thread 1 does its cmpset, frees everything, and returns; (4) thread 1 allocates again, reusing the previous ref_cnt; (5) thread 2 does its cmpset and reads the already freed ref_cnt. This should fix that by performing only a single atomic test-and-set that will serve to decrement the ref_cnt, only if it hasn't changed since the earlier read, otherwise it'll loop and re-read. This forces ordering of decrements so that truly the thread which did the LAST decrement is the one that frees. This is how atomic-instruction-based refcnting should probably be handled. Submitted by: Julian Elischer	2004-06-10 00:04:27 +00:00
Maxime Henrion	931f76ab48	Fix a panic happening when m_getm() is called with len < MCLBYTES. Reported by: ale Tested by: ale Reviewed by: bosko	2004-06-09 14:53:35 +00:00
Juli Mallett	6c27c6039b	Add a comment explaining td_critnest's initial state and its life from that point on, as it happens relatively indirectly, and in a codepath the casual reader may not be acquainted with or find obvious. Glanced at by: jhb	2004-06-09 14:06:44 +00:00
Poul-Henning Kamp	b7b4b455b5	Rename struct pt_ioctl to "ptsc" and pointers to it from "pti" to "pt"	2004-06-09 10:21:53 +00:00
Poul-Henning Kamp	b7ffba0afc	Ditch K&R function style	2004-06-09 10:16:14 +00:00
Poul-Henning Kamp	2195e4207a	Reference count struct tty. Add two new functions: ttyref() and ttyrel(). ttymalloc() creates a struct tty with a reference count of one. when ttyrel sees the count go to zero, struct tty is freed. Hold references for open ttys and for ttys which are controlling terminal for sessions. Until drivers start using ttyrel(), this commit will make no difference.	2004-06-09 09:41:30 +00:00
Poul-Henning Kamp	a59df4e1ee	Fix a race in destruction of sessions.	2004-06-09 09:29:08 +00:00
Poul-Henning Kamp	c0afc00670	Move PTY private defines into PTY private files.	2004-06-09 09:09:54 +00:00
Stefan Farfeleder	1a5ff9285a	Avoid assignments to cast expressions. Reviewed by: md5 Approved by: das (mentor)	2004-06-08 13:08:19 +00:00
Tim J. Robbins	f55530b436	Remove remnants of PGINPROF.	2004-06-08 10:37:30 +00:00
Robert Watson	aa57bb0424	Correct a resource leak introduced in recent accept locking changes: when I reordered events in accept1() to allocate a file descriptor earlier, I didn't properly update use of goto on exit to unwind for cases where the file descriptor is now held, but wasn't previously. The result was that, in the event of accept() on a non-blocking socket, or in the event of a socket error, a file descriptor would be leaked. This ended up being non-fatal in many cases, as the file descriptor would be properly GC'd on process exit, so only showed up for processes that do a lot of non-blocking accept() calls, and also live for a long time (such as qmail). This change updates the use of goto targets to do additional unwinding. Eyes provided by: Brian Feldman <green@freebsd.org> Feet, hands provided by: Stefan Ehmann <shoesoft@gmx.net>, Dimitry Andric <dimitry@andric.com> Arjan van Leeuwen <avleeuwen@piwebs.com>	2004-06-07 21:45:44 +00:00
Poul-Henning Kamp	5df76176f7	Make linesw[] an array of pointers to linedesc instead of an array of linedisc.	2004-06-07 20:45:45 +00:00
Julian Elischer	345ad86692	Split kern_thread.c into 2 parts. kern_kse.c and kern_thread.c Kern_kse has already been committed. This separates out the KSE threading ABI from generic thread support.	2004-06-07 19:00:57 +00:00
David Xu	36939a0a5c	According to SUSv3, sigwait differs from sigwaitinfo: sigwait returns the error code in its return value, not in errno.	2004-06-07 13:35:02 +00:00
Pawel Jakub Dawidek	79db0f1cbf	Remove unused code. Submitted by: Bjoern A. Zeeb	2004-06-07 12:19:55 +00:00
Hajimu UMEMOTO	7a1a900c65	allow more than MLEN bytes for ancillary data to meet the requirement of Section 20.1 of RFC3542. Obtained from: KAME MFC after: 1 week	2004-06-07 09:59:50 +00:00
Tim J. Robbins	be5318b2ca	Remove a stale and misleading comment.	2004-06-07 09:35:00 +00:00
Julian Elischer	30276dc9f8	Move the KSE ABI specific code here and separate it from code that is generic to any threading system. This commit does not link this file to the build yet, nor does it remove these functions from their current location in kern_thread.c. (that commit coming up after further review)	2004-06-07 07:25:03 +00:00
Poul-Henning Kamp	9a6dc4b647	Remove filename+line number from panic messages.	2004-06-06 21:26:49 +00:00
Bruce Evans	05b2c96fd3	Detect interrupt storms better. The storm detection didn't work at all with an ASUS A7N8X-E motherboard in APIC mode, since storming interrupts don't repeat immediately. Use DELAY(1) to wait a bit for them to repeat. This affects all systems. Only delay for the first (10 * intr_storm_threshold) interrupts (per interrupt handler) so that this is only a pessimization while warming up. Throttle after calling the sub-handlers instead of before so that the long delay given by throttling can be used instead of the DELAY(1) to detect storms after warming up. Reduced the throttling period from 1/10 second to 1/hz seconds so that throttling doesn't destroy performance so much. Interrupts that are detected as storming are effectively handled by polling at a frequency of hz Hz. On A7N8X-E's there is another hardware or configuration bug that makes the throttled frequency closer to 2*hz Hz.	2004-06-05 18:27:28 +00:00
Maxime Henrion	bd304417e1	When we don't have any meaningful value to print for the device sysctl tree, output an empty string instead of "?". This is already what happened with DEVICE_SYSCTL_LOCATION and DEVICE_SYSCTL_PNPINFO. This makes the output of "sysctl dev" much nicer (it won't display those empty sysctls). Reviewed by: des	2004-06-05 11:39:05 +00:00
Tim J. Robbins	f99619a0dc	Change the types of vn_rdwr_inchunks()'s len and aresid arguments to size_t and size_t *, respectively. Update callers for the new interface. This is a better fix for overflows that occurred when dumping segments larger than 2GB to core files.	2004-06-05 02:18:28 +00:00
Tim J. Robbins	2b471bc616	Back out workaround for vn_rdwr_inchunks()'s INT_MAX length limitation after discussions with bde; vn_rdwr_inchunks() itself should be fixed.	2004-06-05 02:00:12 +00:00
Poul-Henning Kamp	13e84a71e0	Centralize the line discipline optimization determination in a function called ttyldoptim(). Use this function from all the relevant drivers. I believe no drivers finger linesw[] directly anymore, paving the way for locking and refcounting.	2004-06-04 21:55:55 +00:00
Poul-Henning Kamp	fe3ec6224a	Manual edits to change linesw[]-frobbing to ttyld_*() calls.	2004-06-04 20:04:52 +00:00
Poul-Henning Kamp	2140d01b27	Machine generated patch which changes linedisc calls from accessing linesw[] directly to using the ttyld...() functions. The ttyld...() functions are inline so there is no performance hit.	2004-06-04 16:02:56 +00:00
Tim J. Robbins	c4d85674d5	Remove a stale comment.	2004-06-04 11:00:22 +00:00
Dag-Erling Smørgrav	35e32fd8a3	Add a devclass level to the dev sysctl tree, in order to support per-class variables in addition to per-device variables. In plain English, this means that dev.foo0.bar is now called dev.foo.0.bar, and it is possible to have dev.foo.bar as well.	2004-06-04 10:23:00 +00:00
Poul-Henning Kamp	d1afdc6644	Get rid of ttyregister(). All drivers now use ttymalloc() for struct tty, so now we stand a chance of implementing refcounting and getting rid of the damn things again.	2004-06-04 07:17:03 +00:00
Poul-Henning Kamp	214ef22684	Use ttymalloc() instead of ttyregister(). Use ttyioctl() instead of direct calls to the linedisc.	2004-06-04 06:50:35 +00:00
Tim J. Robbins	16e6d16299	Write segments to core dump files in maximally-sized chunks that neither exceed vn_rdwr_inchunks()'s INT_MAX length limitation nor span a block boundary. This fixes dumping segments larger than 2GB. PR: 67546	2004-06-04 06:30:16 +00:00
Robert Watson	e7dd9a1001	Mark sun_noname as const since it's immutable. Update definitions of functions that potentially accept &sun_noname (sbappendaddr(), et al) to accept a const sockaddr pointer.	2004-06-04 04:07:08 +00:00
Alan Cox	62326de742	Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to blist.h, enabling the removal of numerous #includes from subr_blist.c. (subr_blist.c and swap_pager.c are the only users of these definitions.)	2004-06-04 04:03:26 +00:00
John Baldwin	ba8b26f960	- Comment out NULL, NULL barrier for Unix domain sockets section as the double NULL entries signal Witness to stop processing the array of order entries meaning none of the spin locks are added resulting in panics on boot. - Add a missing NULL, NULL terminator to the Slip locks list to keep them separate from the spin locks.	2004-06-03 20:07:44 +00:00
Tim J. Robbins	cc05397ffc	Remove checks for curthread == NULL - it can't happen.	2004-06-03 10:22:47 +00:00
Tim J. Robbins	fa2a4d0595	Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys(). Reviewed by: jhb	2004-06-03 01:47:37 +00:00
Robert Watson	d97e0534fa	Expand the hard-coded WITNESS lock order to include the following relationships: Sockets: filedesc->accept->sellck Routing: radix node head->rtentry->ifaddr UDP: udp->udpinp TCP: tcp->tcpinp SLIP: slip_mtx->slip sc_mtx Drop in a place holder section for UNIX domain sockets. Various sections to be expanded over the next few days.	2004-06-02 23:28:06 +00:00
Maxime Henrion	2e34ae7a26	As discussed on arch@, flatten the device sysctl tree to make it more convenient to deal with. The notion of hierarchy is however preserved by adding a new %parent node.	2004-06-02 22:43:35 +00:00
Tim J. Robbins	e4e815db72	Remove a redundant "td = curthread" statement from profclock().	2004-06-02 12:05:06 +00:00
Tim J. Robbins	aa0aa7a113	Move TDF_SA from td_flags to td_pflags (and rename it accordingly) so that it is no longer necessary to hold sched_lock while manipulating it. Reviewed by: davidxu	2004-06-02 07:52:36 +00:00
Jeff Roberson	dc03363dd8	- Run sched_balance() and sched_balance_groups() from hardclock via sched_clock() rather than using callouts. This means we no longer have to take the load of the callout thread into consideration while balancing and should make the balancing decisions simpler and more accurate. Tested on: x86/UP, amd64/SMP	2004-06-02 05:46:48 +00:00
Robert Watson	2658b3bb8e	Integrate accept locking from rwatson_netperf, introducing a new global mutex, accept_mtx, which serializes access to the following fields across all sockets: so_qlen so_incqlen so_qstate so_comp so_incomp so_list so_head While providing only coarse granularity, this approach avoids lock order issues between sockets by avoiding ownership of the fields by a specific socket and its per-socket mutexes. While here, rewrite soclose(), sofree(), soaccept(), and sonewconn() to add assertions, close additional races and address lock order concerns. In particular: - Reorganize the optimistic concurrency behavior in accept1() to always allocate a file descriptor with falloc() so that if we do find a socket, we don't have to encounter the "Oh, there wasn't a socket" race that can occur if falloc() sleeps in the current code, which broke inbound accept() ordering, not to mention requiring backing out socket state changes in a way that raced with the protocol level. We may want to add a lockless read of the queue state if polling of empty queues proves to be important to optimize. - In accept1(), soref() the socket while holding the accept lock so that the socket cannot be free'd in a race with the protocol layer. Likewise in netgraph equivalents of the accept1() code. - In sonewconn(), loop waiting for the queue to be small enough to insert our new socket once we've committed to inserting it, or races can occur that cause the incomplete socket queue to overfill. In the previous implementation, it was sufficient to simply test once since calling soabort() didn't release synchronization permitting another thread to insert a socket as we discard a previous one. - In soclose()/sofree()/et al, it is the responsibility of the caller to remove a socket from the incomplete connection queue before calling soabort(), which prevents soabort() from having to walk into the accept socket to release the socket from its queue, and avoids races when releasing the accept mutex to enter soabort(), permitting soabort() to avoid lock ordering issues with the caller. - Generally cluster accept queue related operations together throughout these functions in order to facilitate locking. Annotate new locking in socketvar.h.	2004-06-02 04:15:39 +00:00
Robert Watson	f3d055b6de	Rather than assert f_type==DTYPE_VNODE, conditionally perform the file lock release based on f_type==DTYPE_VNODE. vn_closefile() is used by non-vnode types as well (fifo).	2004-06-01 23:36:47 +00:00
Robert Watson	948a4734ed	Add GIANT_REQUIRED to kqueue_close(), since kqueue currently requires Giant.	2004-06-01 18:05:41 +00:00
Robert Watson	63732dce22	Push the VOP_ADVLOCK() call to release advisory locks on vnode file descriptors out of fdrop_locked() and into vn_closefile(). This removes all knowledge of vnodes from fdrop_locked(), since the lock behavior was specific to vnodes. This also removes the specific requirement for Giant in fdrop_locked(), it's now only required by code that it calls into. Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.	2004-06-01 18:03:20 +00:00
Bosko Milekic	6bc72ab95a	Fix a couple of bugs in the mbuf and packet ctors. In the latter case, nextpkt within the m_hdr was not being initialized to NULL for !M_PKTHDR cases. Maybe this will fix weird socket buffer inconsistency panics, but we'll see.	2004-06-01 16:17:10 +00:00
Poul-Henning Kamp	3a95025ffc	Introduce a ttyioctl() cdevsw default function.	2004-06-01 13:39:02 +00:00
Poul-Henning Kamp	be9bd88238	There is no need to explicitly call the stop function. In all likelihood ->l_close() did it and ttyclose certainly will.	2004-06-01 11:57:15 +00:00
Robert Watson	d087080c1f	Add a global mutex, accept_filter_mtx, to protect the global list of accept filters and prevent read-modify-write races.	2004-06-01 04:08:48 +00:00
Robert Watson	36568179e3	The SS_COMP and SS_INCOMP flags in the so_state field indicate whether the socket is on an accept queue of a listen socket. This change renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new state field on the socket, so_qstate, as the locking for these flags is substantially different for the locking on the remainder of the flags in so_state.	2004-06-01 02:42:56 +00:00
Don Lewis	866046f5a6	Add MSG_NBIO flag option to soreceive() and sosend() that causes them to behave the same as if the SS_NBIO socket flag had been set for this call. The SS_NBIO flag for ordinary sockets is set by fcntl(fd, F_SETFL, O_NONBLOCK). Pass the MSG_NBIO flag to the soreceive() and sosend() calls in fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag on the underlying socket for each I/O operation. The O_NONBLOCK flag is a property of the descriptor, and unlike ordinary sockets, fifos may be referenced by multiple descriptors.	2004-06-01 01:18:51 +00:00
Bosko Milekic	099a0e588c	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but-mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)	2004-05-31 21:46:06 +00:00
Robert Watson	e79962dbce	Assert Giant in vn_start_write() and vn_finished_write().	2004-05-31 20:56:10 +00:00
Robert Watson	9e6127fe3b	Assert Giant in vrele().	2004-05-31 19:06:01 +00:00
Poul-Henning Kamp	77409fe148	Add missing #include <sys/module.h>	2004-05-30 20:34:58 +00:00
Poul-Henning Kamp	41ee9f1c69	Add some missing <sys/module.h> includes which are masked by the one on death-row in <sys/kernel.h>	2004-05-30 17:57:46 +00:00
Tim J. Robbins	7671b766a6	Enable MI bits for gcc -ftest-coverage -fprofile-arcs on amd64.	2004-05-29 01:18:14 +00:00
Pawel Jakub Dawidek	d860b24150	Sysctl hw.bus.devctl_disable shouldn't be writable from inside a jail. Approved by: imp	2004-05-26 16:36:32 +00:00
Thomas Moestl	65e29c4822	Retire cpu_sched_exit(); it is not used any more.	2004-05-26 12:09:39 +00:00
Dag-Erling Smørgrav	5c1921b779	As previously threatened, give each device its own sysctl context and subtree (under the new dev top-level node). This should greatly simplify drivers which need per-device sysctl variables (such as ndis).	2004-05-25 12:06:26 +00:00
Garance A Drosehn	b8fdc89d79	Implement the new KERN_PROC_RGID option, and also implement the KERN_PROC_SESSION option which had been previously defined but never implemented. PR: bin/65803 (a very tiny piece of the PR) Submitted by: Cyrille Lefevre	2004-05-22 23:11:44 +00:00
David Xu	702ac0f112	Clear KSE thread flags after KSE thread mode is ended. The side effect of not clearing the flags for execv() syscall will result that a new program runs in KSE thread mode without enabling it. Submitted by: tjr Modified by: davidxu	2004-05-21 14:50:23 +00:00
Bruce Evans	a4c2da1503	Fixed some style bugs in tdsigwakeup().	2004-05-21 10:02:24 +00:00
John Baldwin	80c4433c18	In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if a thread is on a sleep queue and should have its sleep aborted. Reported by: Thierry Herbelot thierry at herbelot dot com	2004-05-20 20:17:28 +00:00
Bruce Evans	372c2e9613	Fixed printf format errors which helped break GUPROF for arches with 64-bit function pointers.	2004-05-20 16:48:17 +00:00
Bruce Evans	c81d4a0396	Initialize the history counter type field in struct gmonparam as threatened in rev.1.10 of usr.sbin/kgmon/kgmon.c more than 2 years ago. kgmon has been recovering from the missing initialization for too long, but the fixup there is ifdefed for i386's and shouldn't be needed for other arches.	2004-05-20 16:42:39 +00:00
Bruce Evans	e77c22bf45	Moved i386 asms to an i386 header. The asms are for calibration of high resolution kernel profiling (options GUPROF. "U" in GUPROF stands for microseconds resolution, but the resolution is now smaller than 1 nanosecond on multi-GHz machines and the accuracy is heading towards 1 nanosecond too). Arches that support GUPROF must now provide certain macros for the calibration. GUPROF is now only supported for i386's, so the absence of the new macros for other arches doesn't break anything that wasn't already broken. amd64's have uncommitted support for GUPROF, and sparc64's have support that seems to be complete except here (there was an #error for non-i386 cases; now there are undefined macros). Changed the asms a little: - declare them as __volatile. They must not be moved, and exporting a label across asms is technically incorrect, so try harder to stop gcc moving them. - don't put the non-clobbered register "bx" in the clobber list. The clobber lists are still more conservative than necessary. - drop the non-support for gcc-1. It just gave a better error message, and this is not useful since compiling with gcc-1 would cause thousands of worse error messages. - drop the support for aout.	2004-05-20 16:12:19 +00:00
Pawel Jakub Dawidek	2ff8a3496f	Fix sysctl name: security.jail.getfsstate_getfsstatroot_only -> security.jail.getfsstatroot_only. Approved by: rwatson	2004-05-20 05:28:44 +00:00
Bruce Evans	5ad6c3b1ea	Include <sys/gmon.h> instead of <machine/profile.h> for the declaration of kmupetext(). The declaration is misplaced in <machine/profile.h> since it is not MD and not related to the lowest level of profiling. It will be moved, but getting it via <sys/gmon.h> already works.	2004-05-19 14:36:38 +00:00
Paul Saab	c2696aaf51	syncache broke rev 1.23 which was done to fix the "thundering herd" problem in Apache. Fix it. Reviewed by: peter	2004-05-19 00:22:10 +00:00
Peter Wemm	4cec6f5d02	If a symbol has section+offset definitions provided, always use them instead of doing a name lookup for global symbols. This fixes the snd_pcm module.	2004-05-18 05:15:43 +00:00
Peter Wemm	82d0d1a01b	Remove leftover padding variables. Convert some silent 'ignore programmer error' cases into panics Remove 'align' field from section table (no longer needed)	2004-05-18 05:14:19 +00:00
Peter Wemm	23eb3eb66e	Since we go to the trouble of compiling the kobj ops table for each class, and cannot handle it going away, add an explicit reference to the kobj class inside each linker class. Without this, a class with no modules loaded will sit with an idle refcount of 0. Loading and unloading a module with it causes a 0->1->0 transition which frees the ops table and causes subsequent loads using that class to explode. Normally, the "kernel" module will remain forever loaded and prevent this happening, but if you have more than one linker class active, only one owns the "kernel". This finishes making modules work for kldload(8) on amd64.	2004-05-17 21:24:39 +00:00
Peter Wemm	2094780104	Clean up the code some more. Unify the text/data (progbits) and bss (nobits) tables to simplify some code. Try and shorten some of the very wide lines. Somewhere along the way, I think I fixed the memory corruption that caused panics after going multiuser.	2004-05-17 21:20:23 +00:00
Peter Wemm	872e9216d0	Oops, use the generic ELF_ST_BIND() macro instead of ELF64_ST_BIND. Submitted by: marks	2004-05-17 00:51:34 +00:00
Peter Wemm	e8855d4f97	Make a small revision to the api between the elf linker core and the elf_reloc() backends for two reasons. First, to support the possibility of there being two elf linkers in the kernel (eg: amd64), and second, to pass the relocbase explicitly (for relocating .o format kld files).	2004-05-16 20:00:28 +00:00
Bruce Evans	a13ec35b05	Fixed some common printf format errors. Don't assume that "struct foo *" is "void *" (it isn't) or that the default promotion of pid_t is int. Instead, assume that casting "struct foo *" to "void *" and printing the result with %p is useful, and that all pid_t's are representable as longs. Fixed some minor style bugs (mainly spelling errors in comments).	2004-05-14 20:51:42 +00:00
John Baldwin	3335671ddd	Split sleepq_wakeup_thread() into two functions. sleepq_remove_thread() removes a specific thread from a sleep queue. sleepq_resume_thread() resumes scheduling of a thread that has been previously removed from a sleep queue. - sleepq_catch_signals() just removes a thread from the queue it was just added to when a pending signal is found. - sleepq_signal() and sleepq_broadcast() remove threads from a queue, drop the queue lock, and then resume all the previously removed threads. This doesn't completely fix the sched_lock <-> sleepq chain LOR, but it makes it a little better as we no longer call setrunnable() with a sleep queue lock held meaning if setrunnable() tries to wakeup the swapper we don't try to lock two sleep queue chains at the same time.	2004-05-13 20:00:43 +00:00
Tim J. Robbins	f52e2ef29f	Eliminate a memory leak in kern_symlink() that could occur if vn_start_write() failed.	2004-05-11 10:42:02 +00:00
Julian Elischer	b324899838	Remove misplaced duplicate comment and slightly reformat the version that was in the right place.	2004-05-09 22:29:14 +00:00
Sam Leffler	335b8d7e89	set m_len to reflect mbuf contents on return from m_dup1; fixes an obscure m_pullup case that contributed to breaking ipcomp in tunnel mode for kame Submitted by: itojun Obtained from: kame	2004-05-09 05:57:58 +00:00
Julian Elischer	60f798c1c8	Fix rtprio() to do sensible things when called from threaded processes. It's not quite correct from a POSIX point of view, but it is a lot better than what was there before. This will be revisited later when we decide what form our priority extensions will take. POSIX doesn't specify how a system scope thread can change its priority so you need to add non-standard extensions to be able to do it. For now make this slightly non standard to allow it to be done. Submitted by: Dan Eischen originally, changed by myself.	2004-05-08 08:56:05 +00:00
Alan Cox	ec1100fc6e	Avoid pointless zeroing of the bogus page in vfs_bio_clrbuf(). Suggested by: tegge@ (from October of last year)	2004-05-08 06:46:40 +00:00
Robert Watson	f7250466a8	Unconditionally lock Giant in do_sendfile(), rather than locking it conditional on debug.mpsafenet. We can try pushing down Giant here later, but we don't want to enter VFS without holding Giant. Bumped into by: kris	2004-05-08 02:24:21 +00:00
Olivier Houchard	a77c37b649	Compare t_brkc against (char)_POSIX_VDISABLE, not against -1. Discussed with: bde	2004-05-07 15:35:38 +00:00
Nate Lawson	88d2c61ee8	Move the CPU newbus attachment to i386 legacy. The acpi_cpu device will become just "cpu" and provide attachments in the !legacy case. Tested by: des	2004-05-06 15:54:02 +00:00
Alan Cox	5a32489377	Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page. Reviewed by: tegge@	2004-05-06 05:03:23 +00:00
Robert Watson	19b7882215	Add /* !MAC */ to final #endif.	2004-05-03 22:54:46 +00:00
Robert Watson	8ad5e19c6b	Bump copyright date for NETA to 2004.	2004-05-03 20:53:27 +00:00
Robert Watson	0a05006dd2	Add MAC_STATIC, a kernel option that disables internal MAC Framework synchronization protecting against dynamic load and unload of MAC policies, and instead simply blocks load and unload. In a static configuration, this allows you to avoid the synchronization costs associated with introducing dynamicism. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-05-03 20:53:05 +00:00
Colin Percival	b62b230461	Fix a race condition which could result in profprocs being decremented more than once if stopprofclock is called multiple times on the same process.	2004-05-03 00:48:11 +00:00
Peter Wemm	e9eabf5983	Checkpoint commit for an alternative WIP kernel module loader that isn't as dependent on binutils features/quirks as the current one. This one loads plain .o files without having to mess with shared object mode. This happens to be essential on amd64, because binutils hasn't implemented all the quirks/features that we need for producing the hack non-PIC shared objects. As it turned out, .o format isn't all that inconvenient after all. It looks like the ability to use the same .o files for linking directly into a static kernel or loading as a module might be worth it. It is still very much a work-in-progress, but it is almost usable. Other changes are still needed in order to use it though, these have not been committed yet. There is still a memory corruption/overrun bug somewhere. For example, test modules load and work, but the machine explodes a few minutes later in vm_forkproc() or the like. Notable missing things include kldxref support, and loader(8) support. I wanted to figure out a working baseline set of code first.	2004-04-30 16:32:40 +00:00
Daniel Eischen	4fc21c0947	Keep track of threads waiting in kse_release() to avoid a race condition where kse_wakeup() doesn't yet see them in (interruptible) sleep queues. Also add an upcall check to sleepqueue_catch_signals() suggested by jhb. This commit should fix recent mysql hangs. Reviewed by: jhb, davidxu Mysql'd by: Robin P. Blanchard <robin.blanchard at gactr uga edu>	2004-04-28 20:36:53 +00:00
David Schultz	06afcd9d10	If the buffer supplied to kenv(KENV_DUMP, ...) isn't big enough, return the number of bytes needed instead of 0. The manpage claims that we do this anyway.	2004-04-28 01:27:33 +00:00
Bosko Milekic	5a59cefcd1	Give jail(8) the feature to allow raw sockets from within a jail, which is less restrictive but allows for more flexible jail usage (for those who are willing to make the sacrifice). The default is off, but allowing raw sockets within jails can now be accomplished by tuning security.jail.allow_raw_sockets to 1. Turning this on will allow you to use things like ping(8) or traceroute(8) from within a jail. The patch being committed is not identical to the patch in the PR. The committed version is more friendly to APIs which pjd is working on, so it should integrate into his work quite nicely. This change has also been presented and addressed on the freebsd-hackers mailing list. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/65800	2004-04-26 19:46:52 +00:00
Pawel Jakub Dawidek	6c0ad4a77a	Always use nd.ni_vp->v_mount as an argument for VFS_QUOTACTL(), just like in RELENG_4. Pointed out by: Alex Lyashkov <umka@sevinter.net>	2004-04-26 15:44:42 +00:00
Hiten Pandya	024035e822	The paper "Hashed Timers and Hierarchical Wheels: Data Structures for the Efficient Implementation of a Timer Facility" was co-author'ed by T. Lauk, not A. Lauk. Adjust nearby whitespace.	2004-04-25 04:10:17 +00:00
Alan Cox	59c8bc40ce	Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes kmem_alloc_wait()) for mapping the image header. On all machines with a direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.	2004-04-23 03:01:40 +00:00
David E. O'Brien	207a6c0dcb	There was a thread on "unusually high load averages" when running under sched_ule, in January 2004. Looking at this, "pagezero" is (one of) the culprit(s). We had no provision for processes with P_NOLOAD set. With pagezero not running at PRI_ITHD, kseq_load_{add,rem} count pagezero as another-normal-process, thus the "expected-plus-one" load reported in the above thread. Submitted by: Nikos Ntarmos <ntarmos@ceid.upatras.gr>	2004-04-22 21:37:46 +00:00
Pawel Jakub Dawidek	0c0c597faa	Look out! vn_start_write() is able to return 0 and NULL 'mp'. Submitted by: Alex Lyashkov <shadow@psoft.net>	2004-04-22 15:40:27 +00:00
Bruce Evans	057e27959f	Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending on namespace pollution in <sys/vnode.h>. Sorted includes.	2004-04-21 12:10:30 +00:00
Colin Percival	05641e82d7	1. Remove callout_stop binary compatibility. 2. Document that this means that kernel modules must be rebuilt. 3. While I'm here, fix my sorting error in callout.h Requested by: many [1], scottl [2], bde [3]	2004-04-20 15:49:31 +00:00
Mike Makonnen	b9fb5d4286	If you're trying to find out if a thread is valid and in the same process as the current thread it makes absolutely no sense to lock the parent process through the pointer in said thread. Submitted by: pho (with minor correction) Pointy Hat To: mtm	2004-04-19 14:20:01 +00:00
Luigi Rizzo	24665342d3	constify the last argument of m_copyback.	2004-04-18 13:01:28 +00:00
Bruce Evans	7b1fe905ef	Fixed some style bugs in previous commit (mainly an insertion sort error for declarations, and poorly worded messages). Fixed some nearby style bugs (unsorted declarations).	2004-04-17 02:46:05 +00:00
John Baldwin	7870c3c61c	- Enable (unmask) interrupt sources earlier in the ithread loop. Specifically, we used to enable the source after locking sched_lock and just before we had already decided to do a context switch. This meant that an ithread could never process more than one interrupt per context switch. Enabling earlier in the loop before sched_lock is acquired allows an ithread to handle multiple interrupts per context switch if interrupts fire very rapidly. For the case of heavy interrupt load this can reduce the number of context switches (and thus overhead) as well as reduce interrupt latency. - Now that we can handle multiple interrupts per context switch, add simple interrupt storm protection to threaded interrupts. If X number of consecutive interrupts are triggered before the ithread voluntarily yields to another thread, then the interrupt thread will sleep with the associated interrupt source disabled (masked) for 1/10th of a second. The default value of X is 500, but it can be tweaked via the tunable/sysctl hw.intr_storm_threshold. If an interrupt storm is detected, then a message is output to the kernel console on the first occurrence per interrupt thread. Interrupt storm protection can be disabled completely by setting this value to 0. There is no scientific reasoning for the 1/10th of a second or 500 interrupts values, so they may require tweaking at some point in the future. Tested by: rwatson (an earlier version w/o the storm protection) Tested by: mux (reportedly made a machine with two PCI interrupts storming usable rather than hard locked) Reviewed by: imp	2004-04-16 20:25:40 +00:00
Robert Watson	d54efd4d31	At some point during the history of m_getcl(), MAC support began to unconditionally initialize the mbuf header even if cluster allocation failed, which could result in a NULL pointer dereference in low-memory conditions. PR: kern/65548 Submitted by: Stephan Uphoff <ups@tree.com>	2004-04-16 14:35:11 +00:00
Ruslan Ermilov	61f7581d08	Ensure that the poll_burst <= poll_burst_max constraint really holds. Reviewed by: luigi	2004-04-15 07:38:44 +00:00
Warner Losh	5e1d0a23bc	Fix off by one error, twice. Submitted by: Carlos Velasco (first one), jhb (second one)	2004-04-12 23:02:21 +00:00
Colin Percival	4a3b3dcb55	stop() no longer needs sched_lock held; in fact, holding sched_lock causes a LOR against sleepq. Fix the comment, and fix ptracestop() to pick up sched_lock after stop() rather than before. Reported by: Scott Sipe <cscotts@mindspring.com> Reviewed by: rwatson, jhb	2004-04-12 15:56:05 +00:00
Maxime Henrion	a0b5a67929	Put deprecated sysctl code inside BURN_BRIDGES.	2004-04-11 21:09:22 +00:00
Alan Cox	148b3f62a9	Use vm_page_hold() rather than vm_page_wire() for short-duration page wiring. The reason being that vm_page_hold() is cheaper.	2004-04-11 19:57:11 +00:00
Maxime Henrion	4ddd1e65d4	Remove a comment that complains about the lack of %qd, to justify truncating a rlim_t to a long. We have had %qd for some time now. However, the correct format to use here is %jd and a cast to intmax_t, so do this.	2004-04-10 11:08:16 +00:00
Peter Edwards	24554d00bc	Plug minor memory leak of module_t structures when unloading a file from the kernel. Reviewed By: Doug Rabson (dfr@)	2004-04-09 15:27:38 +00:00
Olivier Houchard	d50c87decf	Spell "switches" a more conventional way.	2004-04-09 14:31:29 +00:00
Robert Watson	123f024b24	Compare pointers with NULL rather than using pointers as booleans in if/for statements. Assign pointers to NULL rather than typecast 0. Compare pointers with NULL rather than 0.	2004-04-09 13:23:51 +00:00
Mike Silbersack	e8410540b7	Fix a regression in my change which sends headers along with data; a side effect of that change caused headers to not be sent if a 0 byte file was passed to sendfile. This change fixes that behavior, allowing sendfile to send out the headers even with a 0 byte file again. Noticed by: Dirk Engling	2004-04-08 07:14:34 +00:00
Marcel Moolenaar	ece267ba58	Do not assume that the initial thread (i.e. the thread with the ID equal to the process ID) is still present when we dump a core. It already may have been destroyed. In that case we would end up dereferencing a NULL pointer, so specifically test for that as well. Reported & tested by: Dan Nelson <dnelson@allantgroup.com>	2004-04-08 06:37:00 +00:00
Colin Percival	49a74476a6	Add whitespace before comment blocks. (reported by njl) Remove spurious whitespace, add indent protection, fix punctuation, remove initialization of static variables to zero, put wakeup_ctr and wakeup_needed in the correct order. (reported by bde) This doesn't fix all the style bugs I introduced, but the remaining style bugs make it easier for me to understand what I did here.	2004-04-08 02:03:49 +00:00
Warner Losh	f36cfd49ad	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 20:46:16 +00:00
Colin Percival	ec513ff759	Fix filt_timer* races: Finish initializing a knote before we pass it to a callout, and use the new callout_drain API to make sure that a callout has finished before we deallocate memory it is using. PR: kern/64121 Discussed with: gallatin	2004-04-07 05:59:57 +00:00
Colin Percival	2c1bb20746	Introduce a callout_drain() function. This acts in the same manner as callout_stop(), except that if the callout being stopped is currently in progress, it blocks attempts to reset the callout and waits until the callout is completed before it returns. This makes it possible to clean up callout-using code safely, e.g., without potentially freeing memory which is still being used by a callout. Reviewed by: mux, gallatin, rwatson, jhb	2004-04-06 23:08:49 +00:00
John Baldwin	9000d57d57	Associate a simple count of waiters with each condition variable. The count is protected by the mutex that protects the condition, so the count does not require any extra locking or atomic operations. It serves as an optimization to avoid calling into the sleepqueue code at all if there are no waiters. Note that the count can get temporarily out of sync when threads sleeping on a condition variable time out or are aborted. However, it doesn't hurt to call the sleepqueue code for either a signal or a broadcast when there are no waiters, and the count is never out of sync in the opposite direction unless we have more than INT_MAX sleeping threads.	2004-04-06 19:17:46 +00:00
John Baldwin	535eb30962	Add a new kernel option MUTEX_WAKE_ALL that changes the mutex unlock code to awaken all waiters when a contested mutex is released instead of just the highest priority waiter. If the various threads are awakened in sequence then each thread may acquire and release the lock in question without contention, resulting in fewer expensive unlock and lock operations. The old behavior of waking just the highest priority waiter is still used if this option is not specified. Making the algorithm conditional on a kernel option will allow us to benchmark both cases later and determine which one should be used by default. Requested by: tanimura-san	2004-04-06 19:12:24 +00:00
John Baldwin	ef2c0ba7e4	Rename turnstile_wakeup() to turnstile_broadcast() to make the naming more consistent with other APIs. sleepq and cv's use signal/broadcast, and msleep uses wakeup_one/wakeup. Prior to this turnstiles were using a signal/wakeup mixture.	2004-04-06 19:07:21 +00:00
Bruce Evans	295ed75297	Removed some less than useful comments: - don't say what a small subset of the options includes are for. - don't mark up functions which use all their args with /* ARGSUSED */. The markup should have been removed when the unused retval parameter was removed. - don't comment on what routine suser() checks do. Removed nearby excessive vertical whitespace.	2004-04-06 10:05:02 +00:00
Warner Losh	7f8a436ff2	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
Doug Rabson	7d5ea13fcd	Try not to crash instantly when signalling a libthr program to death.	2004-04-05 15:06:01 +00:00
Doug Rabson	e2c8a799c1	Regen.	2004-04-05 10:17:23 +00:00
Doug Rabson	0b0a60fb43	Add lgetfh(2) which is like getfh(2) but doesn't follow symlinks.	2004-04-05 10:15:53 +00:00
Robert Watson	051bbf603a	Detatch incorrect spellings of detach.	2004-04-04 19:15:45 +00:00
Jeff Roberson	37a35e4a60	- Use the proper constant in sched_interact_update(). Previously, SCHED_INTERACT_MAX was used where SCHED_SLP_RUN_MAX was needed. This was causing the interactivity scaler to lose history at a more dramatic rate than intended.	2004-04-04 19:12:56 +00:00
Marcel Moolenaar	8c9b7b2c84	Create NT_PRSTATUS and NT_FPREGSET notes for each and every thread in the process. This is required for proper debugging of corefiles created by 1:1 or M:N threaded processes. Add an XXX comment where we should actually call a function that dumps MD specific notes. An example of an MD specific note is the NT_PRXFPREG note for SSE registers. Since BFD creates non-annotated pseudo-sections for the first PRSTATUS and FPREGSET notes (non-annotated in the sense that the name of the section does not contain the pid/tid), make sure those sections describe the initial thread of the process (i.e. the thread whose tid equals the pid). This is not strictly necessary, but makes sure that tools that use the non-annotated section names will not change behaviour due to this change. The practical upshot of this all is that one can see the threads in the debugger when looking at a corefile. For 1:1 threading this means that all threads are visible.	2004-04-03 20:25:41 +00:00
Marcel Moolenaar	fdcac92868	Assign thread IDs to kernel threads. The purpose of the thread ID (tid) is twofold: 1. When a 1:1 or M:N threaded process dumps core, we need to put the register state of each of its kernel threads in the core file. This can only be done by differentiating the pid field in the respective note. For this we need the tid. 2. When thread support is present for remote debugging the kernel with gdb(1), threads need to be identified by an integer due to limitations in the remote protocol. This requires having a tid. To minimize the impact of having thread IDs, threads that are created as part of a fork (i.e. the initial thread in a process) will inherit the process ID (i.e. tid=pid). Subsequent threads will have IDs larger than PID_MAX to avoid interference with the pid allocation algorithm. The assignment of tids is handled by thread_new_tid(). The thread ID allocation algorithm has been written with 3 assumptions in mind: 1. IDs need to be created as fast as possible, 2. Reuse of IDs may happen instantaneously, 3. Someone else will write a better algorithm.	2004-04-03 15:59:13 +00:00
Alan Cox	121230a40d	In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it should not. Add a new parameter so that the caller can specify which is the case. Reported by: dillon	2004-04-03 09:16:27 +00:00
Kris Kennaway	c5af600675	Add missing comment terminator.	2004-04-02 04:57:40 +00:00
Julian Elischer	4f73277a35	The comment complained about not having a thread_unlink() and did the work itself, but thread_unlink() has existed for a while... use it.	2004-04-02 01:01:34 +00:00
John Baldwin	e43257aa7d	Finish fixing up Alpha to work with an MP safe ptrace(): - ptrace_single_step() is no longer called with the proc lock held, so don't try to unlock it and then relock it. - Push Giant down into proc_rwmem() instead of forcing all the consumers (including Alpha breakpoint support) to explicitly wrap calls to proc_rwmem() with Giant. Tested by: kensmith	2004-04-01 20:56:44 +00:00
Scott Long	cd587b1397	Don't print out 'GIANT-LOCKED' for INTR_FAST drivers.	2004-04-01 07:18:42 +00:00

7471 Commits