When avoiding the zeroing of "bogus_page" when it appears in a buf,
be sure to advance the pointers into the data for successive pages.
The bug caused file corruption when read(2)ing from a "hole" in a
file where a previous page of the read block had already been faulted
in: fsx tripped up on this pretty quickly. The particular access
pattern is probably pretty unusual, so other applications probably
wouldn't have had problems, but you'd never know.
Reviewed By: alc@
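For illustration, a minimal userland sketch of the pattern the fix restores
is below; the structures are simplified stand-ins rather than the kernel's
buf and vm_page, and the bogus page here is just a local placeholder
pointer. The point is that a page exempted from zeroing must still advance
the offset into the buffer's data.

    #include <string.h>

    #define FAKE_PAGE_SIZE 4096

    struct fake_buf {
        char *b_data;       /* start of mapped buffer data */
        void *b_pages[8];   /* backing pages (stand-ins) */
        int   b_npages;
    };

    static void *fake_bogus_page;   /* placeholder page (stand-in) */

    static void
    zero_missing_pages(struct fake_buf *bp)
    {
        char *p = bp->b_data;
        int i;

        for (i = 0; i < bp->b_npages; i++) {
            if (bp->b_pages[i] != fake_bogus_page)
                memset(p, 0, FAKE_PAGE_SIZE);
            /* The fix: advance even when the page was skipped. */
            p += FAKE_PAGE_SIZE;
        }
    }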
Rebind the client socket when we experience a timeout. This fixes
the case where our IP changes for some reason.
Signal a VFS event when NFS transitions from up to down and vice
versa.
Add a placeholder vfs_sysctl where we will put status reporting
shortly.
Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
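A minimal stand-in for that EIO/EINTR choice is sketched below; the struct
and field names are hypothetical, not the real NFS client's.

    #include <errno.h>

    struct fake_nfsmount {
        int nm_soft;            /* mounted soft (hypothetical field) */
        int nm_force_unmount;   /* forced unmount in progress (hypothetical) */
    };

    /* Error to hand back when a request on a down mount times out. */
    static int
    fake_nfs_down_error(const struct fake_nfsmount *nmp)
    {
        if (nmp->nm_soft || nmp->nm_force_unmount)
            return (EIO);       /* new behaviour: fail outright */
        return (EINTR);         /* old behaviour retained otherwise */
    }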
hangs due to recent preemption changes. This change appears to remove
the panic that I was running into, but at the cost of increasing
ithread scheduling latency, and as such is a temporary band-aid until
jhb has a chance to resolve the ule<->preemption interaction that is
the source of the problem. If it doesn't fix the problem for others--
sorry!
so that last_work_seen has a reasonable value at the transition
to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened
to be zero at the time.
Initialize last_work_seen to zero as a safety measure in case the
syncer never ran in the SYNCER_RUNNING state.
Tested by: phk
Speed up the syncer when shutting down by sleeping for a shorter
period of time instead of cranking up rushjob and using the
normal one second sleep.
Skip empty worklist slots when shutting down to avoid lengthy
intervals of inactivity.
Give I/O more time to complete between steps by not speeding the
syncer quite as much.
Terminate the syncer after one full pass through the worklist
plus one second with the worklist containing nothing but syncer
vnodes.
Print an indication of shutdown progress to the console.
Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist
to be monitored.
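The value can be watched from userland with sysctl(8), for example
"sysctl vfs.worklist_len", or programmatically; a minimal sketch using
sysctlbyname(3), assuming the counter is exported as an int:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        int len;
        size_t sz = sizeof(len);

        if (sysctlbyname("vfs.worklist_len", &len, &sz, NULL, 0) == -1) {
            perror("sysctlbyname");
            return (1);
        }
        printf("syncer worklist length: %d\n", len);
        return (0);
    }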
around in the vnode's surroundings when we allocate a block.
Assign a blocksize when we create a vnode, and yell a warning (and ignore it)
if we got the wrong size.
Please email all such warnings to me.
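A stand-in sketch of that warning, with an illustrative struct rather than
the real vnode, just to show the shape of the check:

    #include <stdio.h>

    struct fake_vnode {
        int v_bsize;    /* blocksize recorded when the vnode was created */
    };

    /* Warn about, and otherwise ignore, a mismatched blocksize request. */
    static void
    fake_check_blocksize(const struct fake_vnode *vp, int size)
    {
        if (vp->v_bsize != 0 && vp->v_bsize != size)
            printf("WARNING: blocksize mismatch: vnode has %d, "
                "caller asked for %d\n", vp->v_bsize, size);
    }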
generic filesystem events to userspace. Currently only mount and unmount
of filesystems are signalled. Soon to be added, up/down status of NFS.
Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.
Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.
ffs_mount -> bdevvp -> getnewvnode(..., mp = NULL, ...) ->
insmntqueue(vp, mp = NULL) -> KASSERT -> panic
Make getnewvnode() only call insmntqueue() if the mountpoint parameter
is not NULL.
When we traverse the vnodes on a mountpoint, we need to guard against
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.
Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:
    MNT_ILOCK(mp);
loop:
    for (nvp = TAILQ_FIRST(&mp...);
            (vp = nvp) != NULL;
            nvp = TAILQ_NEXT(vp,...)) {
        /* Did our cached next vnode leave this mountpoint? */
        if (vp->v_mount != mp)
            goto loop;
        MNT_IUNLOCK(mp);
        ...
        MNT_ILOCK(mp);
    }
    MNT_IUNLOCK(mp);
The code which takes vnodes off a mountpoint looks like this:
    MNT_ILOCK(vp->v_mount);
    ...
    TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
    ...
    MNT_IUNLOCK(vp->v_mount);
    ...
    vp->v_mount = something;
(Take a moment and try to spot the locking error before you read on.)
On an SMP system, one CPU could have removed nvp from our mountpoint's
vnode list but not yet gotten around to assigning a new value to
vp->v_mount, while another CPU simultaneously gets to the top of the
traversal loop, where it finds that (vp->v_mount != mp) is not true
despite the fact that the vnode has indeed been removed from our
mountpoint.
Fix:
Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.
Split insmntque(), which potentially moves a vnode from one mount
point to another, into delmntque() and insmntque(), which do just
what their names say.
Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
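To make the traversal pattern concrete, here is a self-contained stand-in
built on sys/queue.h. The macro below is only one plausible shape for
MNT_VNODE_FOREACH(), not necessarily the committed definition, and the
structures are simplified.

    #include <sys/queue.h>
    #include <stddef.h>

    struct fake_mount;

    struct fake_vnode {
        struct fake_mount *v_mount;
        TAILQ_ENTRY(fake_vnode) v_nmntvnodes;
    };

    struct fake_mount {
        TAILQ_HEAD(, fake_vnode) mnt_nvnodelist;
    };

    /*
     * Plausible shape only: keep a second cursor (nvp) so the loop
     * survives removal of the current vnode while the mount interlock
     * was dropped inside the loop body.
     */
    #define FAKE_MNT_VNODE_FOREACH(vp, mp, nvp)                          \
        for ((vp) = TAILQ_FIRST(&(mp)->mnt_nvnodelist);                  \
            (vp) != NULL && ((nvp) = TAILQ_NEXT((vp), v_nmntvnodes), 1); \
            (vp) = (nvp))

    static void
    fake_visit_all(struct fake_mount *mp)
    {
        struct fake_vnode *vp, *nvp;

        /* The caller is assumed to hold the mount interlock here. */
        FAKE_MNT_VNODE_FOREACH(vp, mp, nvp) {
            if (vp->v_mount != mp)  /* recycled while unlocked */
                break;              /* the real code restarts the scan */
            /* ... work on vp, possibly dropping/retaking the lock ... */
        }
    }

With delmntque() now clearing vp->v_mount while holding the mountpoint
lock, the v_mount check above becomes reliable.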
introduced a KSE_CAN_MIGRATE() invocation with one argument
missing (class). Either this is a genuine omission or it crept
in from JHB's repo where he may have modified it. If it's
the latter then it may require more attention. For now fix
the make depend.
Implement preemption of kernel code in the scheduler itself rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switched to directly
(this flow is sketched in the stand-in code after this commit's text).
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.
This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.
Approved by: scottl (with his re@ hat)
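The decision flow from the first item above, as a compilable stand-in;
maybe_preempt(), TDF_OWEPREEMPT and mi_switch() are the real names from
the text, everything else here is a simplified placeholder.

    /* Stand-in thread; only the fields the flow needs. */
    struct fake_thread {
        int td_priority;    /* lower value means higher priority */
        int td_critnest;    /* critical section nesting level */
        int td_flags;
    };

    #define FAKE_TDF_OWEPREEMPT 0x1

    static struct fake_thread *fake_curthread;

    static int
    fake_preemption_safe(void)
    {
        return (1);     /* stand-in for the "is it safe to preempt" check */
    }

    static void
    fake_mi_switch(struct fake_thread *td)
    {
        fake_curthread = td;    /* stand-in for the actual context switch */
    }

    /*
     * Returns nonzero if td was switched to directly and must not be
     * enqueued; zero if sched_add() should put td on the run queue
     * (possibly with a deferred preemption recorded).
     */
    static int
    fake_maybe_preempt(struct fake_thread *td)
    {
        struct fake_thread *ctd = fake_curthread;

        if (!fake_preemption_safe() || td->td_priority >= ctd->td_priority)
            return (0);
        if (ctd->td_critnest > 0) {
            ctd->td_flags |= FAKE_TDF_OWEPREEMPT;   /* preempt on exit */
            return (0);
        }
        fake_mi_switch(td);     /* switch now; td is never enqueued */
        return (1);
    }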
switch to. If a non-NULL thread pointer is passed in, then the CPU will
switch to that thread directly rather than calling choosethread() to
pick a thread to switch to (a stand-in sketch follows this list).
- Make sched_switch() aware of idle threads and have it do
TD_SET_CAN_RUN() instead of putting them on the run queue, rather than
requiring all callers of mi_switch() to know to do this if they can be
called from an idle thread.
- Move constants for arguments to mi_switch() and thread_single() out of
the middle of the function prototypes and up above into their own
section.
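A stand-in for the selection step in the first item (choosethread() is the
real function named above; everything else here is a placeholder):

    struct fake_thread;             /* opaque stand-in */

    static struct fake_thread *fake_run_queue_head;

    static struct fake_thread *
    fake_choosethread(void)
    {
        return (fake_run_queue_head);   /* stand-in for choosethread() */
    }

    /* A non-NULL thread supplied by the mi_switch() caller wins. */
    static struct fake_thread *
    fake_pick_switch_target(struct fake_thread *newtd)
    {
        return (newtd != NULL ? newtd : fake_choosethread());
    }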
lock and sched_lock so they can be read with either lock held. Document
the locking as well. The one remaining bogosity is that pr_addr and
pr_ticks should be per-thread but profiling of multithreaded apps is
currently undefined.
stats structure and a vmspace as this should always be true rather
than checking the always true condition in an if statement.
- Remove never-false check: if ((ru = &pstats->p_ru) != NULL)
- Remove pstats variable that is only used once and inline its one use
instead.
pointer to the corresponding struct thread to the thread ID (lwpid_t)
assigned to that thread. The primary reason for this change is that
libthr now internally uses the same ID as the debugger and the kernel
when referring to a kernel thread. This allows us to implement the
support for debugging without additional translations and/or mappings.
To preserve the ABI, the 1:1 threading syscalls, including the umtx
locking API have not been changed to work on a lwpid_t. Instead the
1:1 threading syscalls operate on long and the umtx locking API has
not been changed except for the contested bit. Previously this was
the least significant bit. Now it's the most significant bit. Since
the contested bit should not be tested by userland, this change is
not expected to be visible. Just to be sure, UMTX_CONTESTED has been
removed from <sys/umtx.h>.
Reviewed by: mtm@
ABI preservation tested on: i386, ia64
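For illustration only, since UMTX_CONTESTED itself has been removed from
<sys/umtx.h>: moving the contested bit to the most significant bit of the
long-sized lock word looks roughly like this (local stand-in macro,
unsigned arithmetic to keep the shift well defined).

    #include <limits.h>

    /* Stand-in only; not the kernel's definition. */
    #define FAKE_UMTX_CONTESTED \
        (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))

    /* Mark a lock word as contested, leaving the owner bits alone. */
    static unsigned long
    fake_mark_contested(unsigned long word)
    {
        return (word | FAKE_UMTX_CONTESTED);
    }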
faster and iterate over its work list a few times in an attempt
to empty the work list before the syncer terminates. This leaves
fewer dirty blocks to be written at the "syncing disks" stage and
keeps the "giving up on N buffers" problem from being triggered
by the presence of a large soft updates work list at system shutdown
time. The downside is that the syncer takes noticeably longer to
terminate.
Tested by: "Arjan van Leeuwen" <avleeuwen AT piwebs DOT com>
Approved by: mckusick
devremoved events. This reduces the races around these events. We
now include the pnp info in both. This lets one do more interesting
things with devd on device insertion.
Submitted by: Bernd Walter
creation of the sysctl tree for the turnstile profiling stats until a
SI_SUB_LOCK sysinit. Doing it in init_turnstiles() is too early as it is
called before mi_startup().
hash tables used in the sleep queue and turnstile code. Each option adds
a sysctl tree under debug containing the maximum depth of any bucket in
the hash table as well as a separate node for each bucket (or chain)
containing the current depth and maximum depth for that bucket.
the queue has been removed from the global taskqueue_queues list. This
removes the need for the draining queue hack.
- Allow taskqueue_run() to be called with the taskqueue mutex held. It
can still be called without the lock for API compatibility. In that case
it will acquire the lock internally.
- Don't lock the individual queue mutex in taskqueue_find() until after the
strcmp as the global queues mutex is sufficient for the strcmp.
- Simplify taskqueue_thread_loop() now that it can hold the lock across
taskqueue_run().
Submitted by: bde (mostly)
so_gencnt, numopensockets, and the per-socket field so_gencnt. Annotate
that this might be better done with atomic operations.
Annotate what accept_mtx protects.
associated with performing a wakeup on the socket buffer:
- When performing an sbappend*() followed by a so[rw]wakeup(), explicitly
acquire the socket buffer lock and use the _locked() variants of both
calls. Note that the _locked() sowakeup() versions unlock the mutex on
return. This is done in uipc_send(), divert_packet(), mroute
socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append().
- When the socket buffer lock is dropped before a sowakeup(), remove the
explicit unlock and use the _locked() sowakeup() variant. This is done
in soisdisconnecting(), soisdisconnected() when setting the can't send/
receive flags and dropping data, and in uipc_rcvd() when adjusting
back-pressure on the sockets.
For UNIX domain sockets running mpsafe with a contention-intensive SMP
mysql benchmark, this results in a 1.6% query rate improvement due to
reduced mutex costs.
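A compilable stand-in for the first bullet's pattern, with a pthread mutex
in place of the socket buffer lock; the point is that the append and the
wakeup happen under a single lock acquisition and that the _locked()
wakeup variant releases the lock on return, as described above.

    #include <pthread.h>

    /* Stand-in socket buffer: its own mutex plus a byte count. */
    struct fake_sockbuf {
        pthread_mutex_t sb_mtx;
        int             sb_cc;
    };

    /* Entered with sb_mtx held; returns with it released, mirroring the
     * _locked() sowakeup() variants. */
    static void
    fake_sowakeup_locked(struct fake_sockbuf *sb)
    {
        /* ... wake any readers or writers here ... */
        pthread_mutex_unlock(&sb->sb_mtx);
    }

    /* Append data and wake the consumer under one lock acquisition. */
    static void
    fake_append_and_wakeup(struct fake_sockbuf *sb, int len)
    {
        pthread_mutex_lock(&sb->sb_mtx);
        sb->sb_cc += len;               /* sbappend*_locked() stand-in */
        fake_sowakeup_locked(sb);       /* drops sb_mtx on return */
    }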
The overhead of unconditionally allocating TIDs (and likewise,
unconditionally deallocating them), is amortized across multiple
thread creations by the way UMA makes it possible to have type-stable
storage.
Previously the cost was kept down by having threads created as part
of a fork operation use the process' PID as the TID. While this had
some nice properties, it also introduced complexity in the way TIDs
were allocated. Most importantly, by using the type-stable storage
that UMA gives us this was also unnecessary.
This change affects how core dumps are created and in particular how
the PRSTATUS notes are dumped. Since we don't have a thread with a
TID equalling the PID, we now need a different way to preserve the
previous behavior. We do this by having the given thread (i.e.
the thread passed to the core dump code in td) dump its state first
and fill in pr_pid with the actual PID. All other threads will have
pr_pid contain their TIDs. The upshot of all this is that the debugger
will now likely select the right LWP (=TID) as the initial thread.
Credits to: julian@ for spotting how we can utilize UMA.
Thanks to: all who provided julian@ with test results.
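A stand-in for the PRSTATUS ordering described above; every name below is
hypothetical and only the ordering and the pr_pid choice mirror the text.

    #include <stdio.h>

    struct fake_note { int pr_pid; };

    struct fake_thread_info {
        int tid;
        int is_initiator;   /* the thread handed to the core dump code */
    };

    static void
    fake_emit_note(const struct fake_note *np)
    {
        printf("PRSTATUS pr_pid=%d\n", np->pr_pid);  /* stand-in writer */
    }

    static void
    fake_dump_prstatus(const struct fake_thread_info *t, int n, int pid)
    {
        struct fake_note note;
        int i;

        /* Initiating thread first, labelled with the process PID. */
        for (i = 0; i < n; i++) {
            if (t[i].is_initiator) {
                note.pr_pid = pid;
                fake_emit_note(&note);
            }
        }
        /* All other threads follow, labelled with their TIDs. */
        for (i = 0; i < n; i++) {
            if (!t[i].is_initiator) {
                note.pr_pid = t[i].tid;
                fake_emit_note(&note);
            }
        }
    }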
copies.
No current line disciplines have a dynamically changing hotchar, and
expecting to receive anything sensible during a change in ldisc is
insane so no locking of the hotchar field is necessary.