Commit Graph

9753 Commits

Pawel Jakub Dawidek
2c7b0f41ec Remove VFS_VPTOFH entirely. The API is already broken and it is a good time to
do it.

Suggested by:	rwatson
2007-02-16 17:32:41 +00:00
Pawel Jakub Dawidek
10bcafe9ab Move vnode-to-file-handle translation from vfs_vptofh to a vop_vptofh method.
This way we can support multiple structures in the v_data vnode field within
one file system without resorting to black magic.

Vnode-to-file-handle should have been a VOP in the first place, but was made a
VFS operation to keep the interface as compatible as possible with Sun's VFS.
BTW, Solaris now also implements vnode-to-file-handle as a VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
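A minimal sketch of what a per-filesystem vop_vptofh implementation can look like
under the new method; the filesystem name and the mynode/myfid structures and
fields are hypothetical illustrations, not taken from this commit:

    /* Hypothetical "myfs": translate a vnode into an opaque file handle. */
    #include <sys/param.h>
    #include <sys/vnode.h>

    struct mynode {                 /* hypothetical per-vnode data kept in v_data */
        uint32_t n_ino;
        uint32_t n_gen;
    };

    struct myfid {                  /* hypothetical file-handle layout */
        u_short  fid_len;           /* length of this structure */
        u_short  fid_pad;
        uint32_t fid_ino;           /* file serial number */
        uint32_t fid_gen;           /* generation number */
    };

    static int
    myfs_vptofh(struct vop_vptofh_args *ap)
    {
        struct mynode *np = ap->a_vp->v_data;
        struct myfid *fhp = (struct myfid *)ap->a_fhp;

        fhp->fid_len = sizeof(*fhp);
        fhp->fid_ino = np->n_ino;
        fhp->fid_gen = np->n_gen;
        return (0);
    }
    /* Hooked into the vnode operations vector: .vop_vptofh = myfs_vptofh */
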
Luigi Rizzo
33d5497079 Clean up and document the implementation of firmware(9), based on
a version that I posted earlier on the -current mailing list
and subsequent feedback received.

The core of the change is just in sys/firmware.h and kern/subr_firmware.c,
while other files are just adaptations of the clients to the ABI change
(const-ification of some parameters and hiding of internal info,
so this is fully compatible at the binary level).

In detail:
- reduce the amount of information exported to clients in struct firmware,
  and constify the pointer;

- internally, document and simplify the implementation of the various
  functions, and make sure error conditions are dealt with properly.

The diffs are large, but the code is really straightforward now (I hope).

Note also that there is a subtle issue with the implementation of
firmware_register(): currently, as in the previous version, we just
store a reference to the 'imagename' argument, but we should rather
copy it because there is no guarantee that this is a static string.
I realised this while testing this code, but I prefer to fix it in
a later commit -- there is no regression with respect to the past.

Note, too, that the version in RELENG_6 has various bugs including
missing locks around the module release calls, mishandling of modules
loaded by /boot/loader, and so on, so an MFC is absolutely necessary
there.  I was just postponing it until this cleanup to avoid doing
things twice.

MFC after: 1 week
2007-02-15 17:21:31 +00:00
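For context, a short sketch of how a driver typically consumes the firmware(9)
interface described above; the image name and the surrounding function are
hypothetical:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/errno.h>
    #include <sys/firmware.h>

    static int
    mydrv_load_ucode(void)          /* hypothetical driver helper */
    {
        const struct firmware *fw;

        /* Look the image up by name; this may pull in a firmware module. */
        fw = firmware_get("mydrv_ucode");
        if (fw == NULL)
            return (ENOENT);

        /* fw->data / fw->datasize describe the read-only image. */
        /* ... upload the image to the device here ... */

        /* Drop the reference; FIRMWARE_UNLOAD lets the module be unloaded. */
        firmware_put(fw, FIRMWARE_UNLOAD);
        return (0);
    }
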
Robert Watson
780a98ad1f Catch up the file descriptor printing function in DDB to the addition of kqueues
and POSIX message queues.
2007-02-15 10:55:43 +00:00
Robert Watson
442f65e958 Break file descriptor printing logic out of db_show_files() into
db_print_file(), and add a new "show file <ptr>" DDB command, which can
be used to print out file descriptors referenced in stack traces.
2007-02-15 10:50:48 +00:00
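A rough sketch of how such a DDB command is wired up with the DB_SHOW_COMMAND
macro; the function name and the fields printed are illustrative assumptions,
the real command uses the db_print_file() helper mentioned above:

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/file.h>
    #include <ddb/ddb.h>

    /* "show file <ptr>" takes the address of a struct file from a stack trace. */
    DB_SHOW_COMMAND(file, db_show_file_sketch)
    {
        struct file *fp;

        if (!have_addr) {
            db_printf("usage: show file <struct file *>\n");
            return;
        }
        fp = (struct file *)addr;
        db_printf("file %p: type %d refs %d data %p\n",
            fp, fp->f_type, fp->f_count, fp->f_data);
    }
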
Robert Watson
f58dd47091 Rename somaxconn_sysctl() to sysctl_somaxconn() so that I will be able to
claim that sofoo() functions all accept a socket as their first argument.
2007-02-15 10:11:00 +00:00
Konstantin Belousov
478a8db4ce If both ISDOTDOT and NOCROSSMOUNT are set then lookup() might break out
of the special handling for ".." and perform an ISDOTDOT VOP_LOOKUP()
for a filesystem root vnode. Handle this case inside lookup().

Submitted by:	tegge
PR:		92785
MFC after:	1 week
2007-02-15 09:53:49 +00:00
Robert Watson
c3b162d54e Teach DDB how to print sockets, socket buffers, protosw's, and domain
structures given pointers to them.
2007-02-15 01:28:22 +00:00
Robert Watson
aea52f1bf8 Minor rearrangement of global variables, comments, etc, in UNIX domain
sockets.
2007-02-14 15:05:40 +00:00
Robert Watson
46a1d9bfe8 Change unp_mtx to support recursion, and do not drop the unp_mtx over
sonewconn() in unp_connect().  This avoids a race that occurs due to
v_socket being an uncounted reference, as the lock was being released in
order to call sonewconn(), which otherwise recurses into the UNIX domain
socket code via pru_attach, as well as holding the lock over a sleeping
memory allocation in uipc_attach().  Switch to a non-sleeping memory
allocation during UNIX domain socket attach.

This fix is non-ideal in that it requires enabling recursion, but is a much
smaller change than moving to using true references for v_socket.  The
reported panic occurs in unp_connect() following the return of
sonewconn().

Update copyright year.

Panic reported by:      jhb
2007-02-14 12:22:11 +00:00
Robert Watson
05102f04d5 Set UNP_CONNECTING when committing to moving ahead in unp_connect().
This logic was lost when merging the remainder of these changes in
1.178.
2007-02-13 21:00:57 +00:00
Olivier Houchard
38cc2a5caa Make vfs_getopts() set *error to ENOENT if the option wasn't found, so that
consumers don't have to check both the error and the return value (some of
them actually don't do it).

MFC After:	1 week
2007-02-13 01:28:48 +00:00
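A sketch of the consumer side this simplifies; the filesystem name and the
"from" option are hypothetical examples:

    #include <sys/param.h>
    #include <sys/mount.h>

    static int
    myfs_mount(struct mount *mp, struct thread *td)   /* hypothetical mount routine */
    {
        char *from;
        int error;

        /* With this change a missing option reliably yields ENOENT, so
         * checking the error value alone is now sufficient. */
        from = vfs_getopts(mp->mnt_optnew, "from", &error);
        if (error != 0)
            return (error);

        /* ... mount using 'from' ... */
        return (0);
    }
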
Mike Pritchard
51fd6380c5 Do not do a vn_close for all references to the ktraced file if we are
doing a CLEARFILE option.  Do a vrele instead.  This prevents
a panic later due to v_writecount being negative when the vnode
is taken off the freelist.

Submitted by:	jhb
2007-02-13 00:20:13 +00:00
Mike Pritchard
87aabdc126 Add a VNASSERT to vn_close to detect if v_writecount is going
to become negative.  This will detect the underflow when it
happens, instead of having it discovered when the vnode is
taken off the freelist, long after the offending process is gone.
2007-02-12 22:53:01 +00:00
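The VNASSERT(9) idiom referred to here looks roughly like the following sketch
(illustrative wrapper and placement, locking elided; not the committed diff):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/vnode.h>

    /* Trip a panic at the offending vn_close() instead of much later when
     * the vnode is pulled off the freelist. */
    static void
    drop_writecount(struct vnode *vp)
    {
        VNASSERT(vp->v_writecount > 0, vp,
            ("v_writecount %d would underflow", vp->v_writecount));
        vp->v_writecount--;
    }
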
Craig Rodrigues
d139ce67c0 Makefile changes to reflect moving sys/isofs/cd9660 to sys/fs/cd9660.
Continue to install userland include files in /usr/include/isofs/cd9660
so as not to break userland applications such as libstand.
2007-02-11 14:01:32 +00:00
Xin LI
d60226bd43 Report which signal the caller has attempted to deliver when panicking. 2007-02-09 17:48:28 +00:00
Jeff Roberson
ed0e8f2fe9 - Change types for recent runq additions to u_char rather than int.
- Fix these types in ULE as well.  This fixes bugs in priority index
   calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64.

Reported by:	kkenn using pho's stress2.
2007-02-08 01:52:25 +00:00
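A tiny userland illustration of the signedness point above (assumed 64-entry
run queue, as in the message):

    #include <stdio.h>

    int
    main(void)
    {
        int si = -1;
        unsigned int ui = -1;   /* wraps to UINT_MAX */

        /* With a 64-entry run queue the two index calculations differ: */
        printf("%d %u\n", si % 64, ui % 64);    /* prints "-1 63" */
        return (0);
    }
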
Alan Cox
0e2056ee7f Remove the vm page queue free mutex from the CDEV order. 2007-02-07 05:43:31 +00:00
Robert Watson
1f837c4753 Push UNIX domain socket locking further into uipc_ctloutput() in order to
avoid holding the UNIX domain socket subsystem lock over sooptcopyin()
and sooptcopyout().  This problem was introduced when LOCAL_CREDS and
LOCAL_CONNWAIT support were added.

Reviewed by:	mdodd
2007-02-06 14:31:37 +00:00
Mike Pritchard
af7a34173d The change to the vm_page_queue_freelist lock from a spin lock to a
sleep lock missed the witness code, and the system will panic
immediately on boot if WITNESS is enabled.

Changed the witness definition to the new type.
2007-02-06 05:51:55 +00:00
Max Laier
38d4db193b Add a small informative printf under bootverbose to firmware_register to
track problems when loading firmware from loader.
2007-02-03 16:01:46 +00:00
Bruce M Simpson
7dc8d021ea Diff reduction with RELENG_6, style(9):
Remove unnecessary brace; && should be on end of line.
No functional changes.
2007-02-03 03:57:45 +00:00
Bruce M Simpson
217f71d80c Use int instead of u_int for the 'extra' argument to the
clone_create() KPI.
This fixes a signedness bug in unit number comparisons.

Submitted by:	imp, Landon Fuller
PR:		kern/105228
MFC after:	2 weeks
2007-02-02 22:27:45 +00:00
Konstantin Belousov
e6a4f4cd40 Record kqueue -> struct mount mtx -> vnode interlock lock order to
catch the places where reverse lock order is instantiated.

OKed by:	jeff
2007-02-02 09:02:18 +00:00
Julian Elischer
c6226eea4c Move the setting of the idle_mask bits to a place where they
can't be wrong.
Also use the IDLETD bit in the thread mask to test if it's an idle thread
rather than doing a PCPU access.
2007-02-02 05:14:22 +00:00
Andre Oppermann
6a37f331d7 Generic socket buffer auto sizing support, header defines, flag inheritance.
MFC after:	1 month
2007-02-01 17:53:41 +00:00
Max Laier
191c2cea1c In case we are supplied with an imagename that matches a module, but not a
firmware in that module (even though this is a programming error), drop the
reference to the module again.

Submitted by:	Benjamin Close
MFC after:	3 days
2007-01-27 19:52:08 +00:00
Jeff Roberson
fc3a97dcb7 - Implement much more intelligent IPI sending. This algorithm tries to
minimize IPIs and rescheduling when scheduling like tasks while keeping
   latency low for important threads.  An IPI is sent when any of the
   following holds:
   1) An idle thread is running.
   2) The current thread is worse than realtime and the new thread is
      better than realtime.  Realtime to realtime doesn't preempt.
   3) The new thread's priority is less than the threshold.
2007-01-25 23:51:59 +00:00
Jeff Roberson
1461899028 - Get rid of the unused DIDRUN flag. This was really only present to
support sched_4bsd.
 - Rename the KTR level for non schedgraph parsed events.  They take event
   space from things we'd like to graph.
 - Reset our slice value after we sleep.  The slice is simply there to
   prevent starvation among equal priorities.  A thread which had almost
   exhausted its slice and then slept doesn't need to be rescheduled a
   tick after it wakes up.
 - Set the maximum slice value to a more conservative 100ms now that it is
   more accurately enforced.
2007-01-25 19:14:11 +00:00
Mohan Srinivasan
6c125b8df6 Fix for problems that occur when all mbuf clusters migrate to the mbuf packet
zone. Cluster allocations fail when this happens. Also, processes that may have
blocked on cluster allocations will never be woken up. Thanks to rwatson for
an overview of the issue and pointers to the mbuma paper and his tool to dump
out UMA zones.

Reviewed by: andre@
2007-01-25 01:05:23 +00:00
Jeff Roberson
9a93305a2e - With a sleep time over 2097 seconds, hzticks and slptime could end up
negative.  Use unsigned integers for sleep and run time so this doesn't
   disturb sched_interact_score().  This should fix the invalid interactive
   priority panics reported by several users.
2007-01-24 18:18:43 +00:00
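Where the 2097-second figure comes from, assuming hz = 1000 and the scheduler's
10-bit fixed-point shift (both are reconstructions for illustration, not stated
in the commit):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* Assumed parameters: hz = 1000, 10-bit fixed-point scaling. */
        int64_t hz = 1000, shift = 10;
        int64_t scaled = 2098 * hz << shift;    /* sleep time in scaled ticks */

        /* 2098 * 1000 * 1024 = 2148352000 > INT32_MAX, so a signed 32-bit
         * accumulator wraps negative just past ~2097 seconds of sleep. */
        printf("%lld > %d\n", (long long)scaled, INT32_MAX);
        return (0);
    }
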
Randall Stewart
6dbde03086 Fix MSG_PEEK for sctp_generic_recvmsg(): the msg_flags
were not being copied in properly, so PEEK and any other
msg_flags input operation were not being performed correctly.
Approved by:	gnn
2007-01-24 12:59:56 +00:00
Konstantin Belousov
2cc7d26f7f Cylinder group bitmaps and blocks containing inode for a snapshot
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then the kernel would panic due to recursive buffer lock
acquisition.

Avoid dealing with buffers in bdwrite() that are from the other side of
the snaplock divisor in the lock order than the buffer being written. Add a
new BOP, bop_bdwrite(), to do dirty buffer flushing for the same vnode in
bdwrite(). The default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, a specialized implementation is
used.

Reviewed by:	tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by:	Peter Holm
X-MFC after:	3 weeks (if ever: it changes ABI)
2007-01-23 10:01:19 +00:00
Jeff Roberson
7a5e5e2a59 - Catch up to setrunqueue/choosethread/etc. api changes.
- Define our own maybe_preempt() as sched_preempt().  We want to be able
   to preempt idlethread in all cases.
 - Define our idlethread to require preemption to exit.
 - Get the cpu estimation tick from sched_tick() so we don't have to worry
   about errors from a sampling interval that differs from the time
   domain.  This was the source of sched_priority prints/panics and
   inaccurate pctcpu display in top.
2007-01-23 08:50:34 +00:00
Jeff Roberson
f0393f063a - Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty.  The few asserts and thread state
   setting were moved to the individual schedulers.  sched_add() was
   chosen to displace it for naming consistency reasons.
 - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
   different on all three schedulers where it was only called in one place
   each.
 - Remove the long ifdef'd out remrunqueue code.
 - Remove the now redundant ts_state.  Inspect the thread state directly.
 - Don't set TSF_* flags from kern_switch.c, we were only doing this to
   support a feature in one scheduler.
 - Change sched_choose() to return a thread rather than a td_sched.  Also,
   rely on the schedulers to return the idlethread.  This simplifies the
   logic in choosethread().  Aside from the run queue links kern_switch.c
   mostly does not care about the contents of td_sched.

Discussed with:	julian

 - Move the idle thread loop into the per scheduler area.  ULE wants to
   do something different from the other schedulers.

Suggested by:	jhb

Tested on:	x86/amd64 sched_{4BSD, ULE, CORE}.
2007-01-23 08:46:51 +00:00
Craig Rodrigues
61e323a2fa When exiting vfs_export(), delete the "export" option from
the mount options list with vfs_deleteopt().  At this point, the export
information is saved in mp->mnt_export, so we can delete
the "export" mount option from mp->mnt_optnew and mp->mnt_opt.

This fixes read-write/read-only update mounts (mount -u -o rw, mount -u -o ro)
of NFS exported directories.

For some reason, I could only reproduce the problem with a configuration
supplied by Andre:
- "options QUOTA" enabled in kernel config
- "/ -maproot=root 10.0.1.105" in /etc/exports

Reported by:	kris, Andre Guibert de Bruet <andy siliconlandmark com>,
            	Andrzej Tobola <ato iem pw edu pl>
Tested by:	Andre Guibert de Bruet
2007-01-23 06:19:16 +00:00
Andre Oppermann
7c32173ba8 Unbreak writes of 0 bytes. Zero byte writes happen when only ancillary
control data but no payload data is passed.

Change m_uiotombuf() to return at least one empty mbuf if the requested
length was zero.  Add comment to sosend_dgram and sosend_generic().

Diagnosed by:		jhb
Regression test by:	rwatson
Pointy hat to:		andre
2007-01-22 14:50:28 +00:00
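A sketch of the zero-length behaviour this describes; the helper and its names
are assumptions for illustration, not the committed diff to m_uiotombuf():

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>

    /* A 0-byte request still yields one empty mbuf, so that writes carrying
     * only ancillary control data keep working. */
    static struct mbuf *
    empty_mbuf_for_zero_write(int how)
    {
        struct mbuf *m;

        m = m_get(how, MT_DATA);
        if (m != NULL)
            m->m_len = 0;           /* no payload, just a valid chain head */
        return (m);
    }
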
Konstantin Belousov
7f92c4ee02 Below is a slightly edited description of the LOR by Tor Egge:
--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode (that is, the parent
directory of the covered vnode) while holding an exclusive vnode lock on
the covering vnode.

A simplified scenario:

root fs					var fs
/    		A			/    (/var)	D
/var		B			/log (/var/log) E
vfs lock	C			vfs lock	F

Within each file system, the lock order is clear: C->A->B and F->D->E

When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:

      L1: C->A->B
		|
	        +->F->D->E

      L2: F->D->E
	     |
             +->C->A->B

The lookup() process for namei("/var") mixes those two lock orders:

    VOP_LOOKUP() obtains B while A is held
    vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
    violates L2)
    vput() releases lock on B
    VOP_UNLOCK() releases lock on A
    VFS_ROOT() obtains lock on D while shared lock on F is held
    vfs_unbusy() releases shared lock on F
    vn_lock() obtains lock on A while D is held (violates L1, follows L2)

dounmount() follows L1 (B is locked while F is drained).

Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).

With unmount, you can get 4 processes in a deadlock:

     p1: holds D, want A (in lookup())
     p2: holds shared lock on F, want D (in VFS_ROOT())
     p3: holds B, want drain lock on F (in dounmount())
     p4: holds A, want B (in VOP_LOOKUP())

You can have more than one instance of p2.

The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.

- Tor Egge

To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, a placeholder deadfs
vnode, vp_crossmp, is introduced and filled into ni_dvp.

Idea by:	ups
Reviewed by:	tegge, ups, jeff, rwatson (mac interaction)
Tested by:	Peter Holm
MFC after:	2 weeks
2007-01-22 11:25:22 +00:00
Jeff Roberson
5cea64d54f - Disable the long-term load balancer. I believe that steal_busy works
better and gives more predictable results.
2007-01-20 21:24:05 +00:00
Jeff Roberson
c95d2db298 - We do need to IPI the idlethread on some systems. It may be stuck in
a power saving mode otherwise.
 - If the thread is already bound in sched_bind() unbind it before
   re-binding it to a new cpu.  I don't like these semantics but they are
   expected by some code in the tree.  Patch by jkoshy.
2007-01-20 17:03:33 +00:00
Jeff Roberson
6b2f763f7c - In tdq_transfer() always set NEEDRESCHED when necessary regardless of
the ipi settings.  If NEEDRESCHED is set and an ipi is later delivered
   it will clear it rather than cause extra context switches.  However, if
   we miss setting it we can have terrible latency.
 - In sched_bind() correctly implement bind.  Also be slightly more
   tolerant of code which calls bind multiple times.  However, we don't
   change binding if another call is made with a different cpu.  This
   does not presently work with hwpmc which I believe should be changed.
2007-01-20 09:03:43 +00:00
Jeff Roberson
7b8bfa0de9 Major revamp of ULE's cpu load balancing:
- Switch back to direct modification of remote CPU run queues.  This added
   a lot of complexity with questionable gain.  It's easy enough to
   reimplement if it's shown to help on huge machines.
 - Re-implement the old tdq_transfer() call as tdq_pickidle().  Change
   sched_add() so we have selectable cpu choosers and simplify the logic
   a bit here.
 - Implement tdq_pickpri() as the new default cpu chooser.  This algorithm
   is similar to Solaris in that it tries to always run the threads with
   the best priorities.  It is actually slightly more complex than
   Solaris's algorithm because we also tend to favor the local cpu over
   other cpus which has a boost in latency but also potentially enables
   cache sharing between the waking thread and the woken thread.
 - Add a bunch of tunables that can be used to measure effects of different
   load balancing strategies.  Most of these will go away once the
   algorithm is more definite.
 - Add a new mechanism to steal threads from busy cpus when we idle.  This
   is enabled with kern.sched.steal_busy and kern.sched.busy_thresh.  The
   threshold is the required length of a tdq's run queue before another
   cpu will be able to steal runnable threads.  This prevents most queue
   imbalances that contribute to long latencies.
2007-01-19 21:56:08 +00:00
Xin LI
4f506694bb Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 14:58:53 +00:00
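For reference, a small illustrative helper showing how the macro is used (the
function itself is hypothetical):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/sx.h>
    #include <sys/proc.h>

    /* Walk every process in the system under the allproc lock. */
    static int
    count_processes(void)
    {
        struct proc *p;
        int n = 0;

        sx_slock(&allproc_lock);
        FOREACH_PROC_IN_SYSTEM(p)
            n++;
        sx_sunlock(&allproc_lock);
        return (n);
    }
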
Suleiman Souhlal
e8ac01c56a Remove hptlock from the static witness table, now that it's a regular sleep
mutex.
2007-01-16 22:56:28 +00:00
Randall Stewart
9b3386570c Remove the useless (flags | ) KASSERT.  The ^ one actually
does what we want.

Submitted by:	Li Xin delphij@delphij.net
Reviewed by:	rrs
Approved by:	gnn
2007-01-16 11:40:55 +00:00
Kip Macy
e440d8fff5 Fix warning by adding extra parentheses 2007-01-16 00:09:58 +00:00
Randall Stewart
b939bb368a Reviewed by: rwatson
Approved by:	gnn

Add a new function hashinit_flags() which allows the caller to choose
whether or not to wait for memory.  The old hashinit() function now
calls hashinit_flags(..., HASH_WAITOK).
2007-01-15 15:06:28 +00:00
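A sketch of the new interface from a caller's perspective; the table element
type, table size and malloc type are hypothetical:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/errno.h>
    #include <sys/malloc.h>
    #include <sys/queue.h>

    struct myentry {                        /* hypothetical hash element */
        LIST_ENTRY(myentry) link;
        int key;
    };

    static LIST_HEAD(myhead, myentry) *mytbl;
    static u_long mymask;

    static int
    mytbl_init(void)
    {
        /* HASH_NOWAIT: fail instead of sleeping, usable where the old
         * hashinit() (now hashinit_flags(..., HASH_WAITOK)) could not be. */
        mytbl = hashinit_flags(128, M_TEMP, &mymask, HASH_NOWAIT);
        if (mytbl == NULL)
            return (ENOMEM);
        return (0);
    }
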
Robert Watson
b0c521e29c Re-wrap comments to wider margins now that they have been relocated from
within functions.
2007-01-12 22:01:03 +00:00
Warner Losh
fe18f3853e When ntp_gettime() was converted from a sysctl + wrapper to a system
call, its semantics were unintentionally changed.  It went from
returning the time state to returning 0 or -1.  Since 0 means time
normal, and non-zero effectively only shows up around leap seconds,
this went unnoticed until now.  At least unnoticed until someone was
trying to run a binary they didn't have source for and it was
misbehaving...

Submitted by: Judah Levine
MFC After: 2 weeks
2007-01-12 07:40:30 +00:00
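From userland, the restored semantics mean the return value carries the clock
state; a minimal example of checking it:

    #include <sys/timex.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct ntptimeval ntv;
        int state;

        /* The return value is the time state (TIME_OK, TIME_INS, ...,
         * TIME_ERROR), not merely 0 or -1. */
        state = ntp_gettime(&ntv);
        if (state == TIME_ERROR)
            printf("clock is not synchronized\n");
        else
            printf("clock state %d\n", state);
        return (0);
    }
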
John Baldwin
19c80b2652 Wrap propagate_priority() in a critical section to prevent unwanted
preemptions when adjusting the priority of a thread that is on a run
queue.  This was only observed when FULL_PREEMPTION was enabled.

Reported by:	kris
Diagnosed by:	ups
MFC after:	1 week
2007-01-11 19:13:27 +00:00