freebsd-nq

Author	SHA1	Message	Date
Poul-Henning Kamp	bde1a9c98b	Kill MAJOR_AUTO	2005-03-17 13:37:28 +00:00
Poul-Henning Kamp	800b42bde0	Prepare for the final onslaught on devices: Move uid/gid/mode from cdev to cdevsw. Add kind field to use for devd(8) later. Bump both D_VERSION and __FreeBSD_version	2005-03-17 12:07:00 +00:00
Poul-Henning Kamp	572b4402d1	In stange circumstances we may end up being the last reference to a session in tprintf(). SESSRELE() needs to properly dispose of the sessions mutex. Add sessrele() which does the proper cleanup and have SESSRELE() call it. Use SESSRELE also in pgdelete(). Found by: Coverity (ID:526)	2005-03-17 08:44:41 +00:00
Poul-Henning Kamp	51f5ce0c8c	Add two arguments to the vfs_hash() KPI so that filesystems which do not have unique hashes (NFS) can also use it.	2005-03-16 11:20:51 +00:00
Poul-Henning Kamp	9068e77689	Fix a memoryleak in case of failed root filesystem mount. Spotted by: Coverity via sam	2005-03-16 11:06:49 +00:00
John-Mark Gurney	2a77000b75	MFp4: print a more useful error when we don't have a /dev to mount devfs on..	2005-03-16 08:04:39 +00:00
Poul-Henning Kamp	78bb3c21ed	Add mnt_hashseed to struct mount and initialize it witn PRNG bits, use it to get better hashing in vfs_hash. In case of an insert collision in vfs_hash_insert(), put the loosing vnode on a special list so that vfs_hash_remove() can just assume that it is on a list. Drop the VI_HASHED flag.	2005-03-16 07:35:06 +00:00
Warner Losh	358fef538f	Sometimes, when asked to return region A..C, we'd return A+N..C+N instead of failing. When looking for a region to allocate, we used to check to see if the start address was < end. In the case where A..B is allocated already, and one wants to allocate A..C (B < C), then this test would improperly fail (which means we'd examine that region as a possible one), and we'd return the region B+1..C+(B-A+1) rather than NULL. Since C+(B-A+1) is necessarily larger than C (end argument), this is incorrect behavior for rman_reserve_resource_bound(). The fix is to exclude those regions where r->r_start + count - 1 > end rather than r->r_start > end. This bug has been in this code for a very long time. I believe that all other tests against end are correctly done. This is why sio0 generated a message about interrupts not being enabled properly for the device. When fdc had a bug that allocated from 0x3f7 to 0x3fb, sio0 was then given 0x3fc-0x404 rather than the 0x3f8-0x3ff that it wanted. Now when fdc has the same bug, sio0 fails to allocate its ports, which is the proper behavior. Since the probe failed, we never saw the messed up resources reported. I suspect that there are other places in the tree that have weird looping or other odd work arounds to try to cope with the observed weirdness this bug can introduce. These workarounds should be located and eliminated. Minor debug write fix to match the above test done as well. 'nice' by: mdodd Sponsored by: timing solutions (http://www.timing.com/)	2005-03-15 20:28:51 +00:00
Warner Losh	a33ab77447	Fix a debugging printf. The order of start/end was inconsistant with all the other start/end debugs, causing momentary confusion when the output was examined.	2005-03-15 20:15:15 +00:00
Poul-Henning Kamp	45c26fa2b6	Improve the vfs_hash() API: vput() the unneeded vnode centrally to avoid replicating the vput in all the filesystems.	2005-03-15 20:00:03 +00:00
Jeff Roberson	b172f6c5f9	- Now that there are no external users of vfree() make it static. - Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so no one else attempts to grow a dependency on them. - Now that objects with pages hold the vnode we don't have to do unlocked checks for the page count in the vm object in VSHOULDFREE. These three macros could simply check for holdcnt state transitions to determine whether the vnode is on the free list already, but the extra safety the flag affords us is probably worth the minimal cost. - The leafonly sysctl and code have been dead for several years now, remove the sysctl and the code that employed it from vtryrecycle(). - vtryrecycle() also no longer has to check the object's page count as the object holds the vnode until it reaches 0. Sponsored by: Isilon Systems, Inc.	2005-03-15 14:38:16 +00:00
Poul-Henning Kamp	7933351a28	Fix a debug message to print a usable device name rather than useless major+minor tupple.	2005-03-15 14:08:10 +00:00
Jeff Roberson	c178628d6e	- Expose vholdl() so it may be used outside of vfs_subr.c	2005-03-15 13:43:10 +00:00
Poul-Henning Kamp	4ba679d6d0	Remove findcdev().	2005-03-15 12:58:08 +00:00
Poul-Henning Kamp	0a2e49f1f8	Rename cdev->si_udev to cdev->si_drv0 to reflect the new nature of the field.	2005-03-15 11:33:28 +00:00
Jeff Roberson	f5f0da0a0e	- transferlockers() requires the interlock to be SMP safe. Sponsored by: Isilon Systems, Inc.	2005-03-15 09:27:45 +00:00
Poul-Henning Kamp	e82ef95c11	Simplify the vfs_hash calling convention.	2005-03-15 08:07:07 +00:00
Poul-Henning Kamp	ee148e2606	Cleanup accidentally include #if 0 section.	2005-03-14 10:25:09 +00:00
Poul-Henning Kamp	6c325a2a21	Currently (almost) all filesystems maintain a local inode hash table to get from (mount + inode) to vnode. These tables are mostly copy&pasted from UFS, sized based on desiredvnodes and therefore quite large (128K-512K). Several filesystems are buggy enough that they allocate the hash table even before they know if they will ever be used or not. Add "vfs_hash", a system wide hash table, which will replace all the per-filesystem hash-tables. The fields we add to struct vnode will more or less be saved in the respective filesystems inodes. Having one central implementation will save code and will allow us to justify the complexity of code to dynamically (re)size the hash at a later point.	2005-03-14 10:01:29 +00:00
Jeff Roberson	8045557f2b	- Increment the holdcnt once for each usecount reference. This allows us to use only the holdcnt to determine whether a vnode may be recycled, simplifying the V* macros as well as vtryrecycle(), etc. Sponsored by: Isilon Systems, Inc.	2005-03-14 09:25:19 +00:00
Jeff Roberson	159b454819	- We do not have to check the object's ref_count in VSHOULDFREE or vtryrecycle(). All obj refs also ref the vnode. - Consistently use v_incr_usecount() to increment the usecount. This will be more important later. Sponsored by: Isilon Systems, Inc.	2005-03-14 08:30:31 +00:00
Jeff Roberson	8f13a540ed	- Slightly rearrange vrele() to move the common case in one indentation level. Sponsored by: Isilon Systems, Inc.	2005-03-14 07:16:55 +00:00
Jeff Roberson	6fc16a838c	- Rework vget() so we drop the usecount in two failure cases that were missed by my last commit. Sponsored by: Isilon Systems, Inc.	2005-03-14 07:11:19 +00:00
Poul-Henning Kamp	93f6c81e25	Remove debugging printfs.	2005-03-14 06:51:29 +00:00
Jeff Roberson	0463dc9ef1	- Do a vn_start_write in vn_close, we may write if this is the last ref on an unlinked file. We can't know if this is the case until after we have the lock. - Lock the vnode in vn_close, many filesystems had code which was unsafe without the lock held, and holding it greatly simplifies vgone(). - Adjust vn_lock() to check for the VI_DOOMED flag where appropriate. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:56:28 +00:00
Jeff Roberson	6703c30bb5	- Remove vx_lock, vx_unlock, vx_wait, etc. - Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do writing ops without suspending. This could suspend the vlruproc which should not be a problem under normal circumstances. - Manually implement VMIGHTFREE in vlrureclaim as this was the only instance where it was used. - Acquire a lock before calling vgone() as it now requires it. - Move the acquisition of the vnode interlock from vtryrecycle() to getnewvnode() so that if it fails we don't drop and reacquire the vnode_free_list_mtx. - Check for a usecount or holdcount at the end of vtryrecycle() in case someone grabbed a ref while we were recycling. Abort the recycle, and on the final ref drop this vnode will be placed on the head of the free list. - Move the redundant VOP_INACTIVE protection code into the local vinactive() routine to avoid code bloat. - Keep the vnode lock held across calls to vgone() in several places. - vgonel() no longer uses XLOCK, instead callers must hold an exclusive vnode lock. The VI_DOOMED flag is set to allow other threads to detect a vnode which is no longer valid. This flag is set until the last reference is gone, and there are no chances for a new ref. vgonel() holds this lock across the entire function, which greatly simplifies logic. _ Only vfree() in one place in vgone() not three. - Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock in the LK_NOWAIT case. In other cases, check after we have slept and acquired an exlusive lock. This will simulate the old vx_wait() behavior. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:54:28 +00:00
Jeff Roberson	2b3183a8b7	- A lock is required before calling VOP_REVOKE. Our reference protects us from accessing another vnode so a naked VOP_LOCK is sufficient. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:47:04 +00:00
Jeff Roberson	9331fd135b	- Don't VOP_UNLOCK prior to VOP_REVOKE. The lock is required now. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:45:51 +00:00
Jeff Roberson	23f2513a4e	- Don't drop the lock in the default inactive handler anymore, VOP_NULL will do for vop_stdinactive now. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:45:01 +00:00
Jeff Roberson	4e6746965e	- CLOSE, REVOKE, INACTIVE, and RECLAIM are not L L L, that's a locked vnode on enter, exit, error. This allows for the removal of the XLOCK. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:42:16 +00:00
Pawel Jakub Dawidek	cefcecbefd	Function jailed() looks into ucred strcture, so be sure ucred is not NULL. Reviewed by: rwatson MFC after: 1 week	2005-03-12 14:31:04 +00:00
Pawel Jakub Dawidek	d079d0a0d2	Clean up a bit. Reviewed by: rwatson MFC after: 1 week	2005-03-12 14:28:34 +00:00
Robert Watson	59f21d5ab1	Extend the coverage of the accept and socket mutexes in soisconnected() so that the socket lock is held over the test-and-set removal of the accept filter option during connect, and the two socket mutex regions (transition to connected, perform accept filter) are combined.	2005-03-12 13:39:39 +00:00
Robert Watson	a59f81d263	Move the logic implementing retrieval of the SO_ACCEPTFILTER socket option from uipc_socket.c to uipc_accf.c in do_getopt_accept_filter(), so that it now matches do_setopt_accept_filter(). Slightly reformulate the logic to match the optimistic allocation of storage for the argument in advance, and slightly expand the coverage of the socket lock.	2005-03-12 12:57:18 +00:00
Robert Watson	92081a8344	Part two of post-SMPng cleanup of accept filter registration: perform all allocation up front before grabbing the socket mutex and doing the registration work. The result is a lot cleaner.	2005-03-12 12:27:47 +00:00
Peter Wemm	f71692e9be	Replace my previous change for 32 bit systems with hz > 169 with Bruce's simpler one.	2005-03-12 00:13:45 +00:00
Peter Wemm	2afec87508	Make the tty vmin/vtime timeouts work for hz > 169 on 32 bit machines.	2005-03-12 00:10:23 +00:00
Robert Watson	64c238075f	First step in simplifying accept filter socket option logic in the post-SMPng world order. Centralize handling of the socket option clear case in do_setopt_accept_filter().	2005-03-11 21:37:45 +00:00
Robert Watson	56856fbfb4	Remove an additional commented out reference to a possible future sx lock.	2005-03-11 19:16:02 +00:00
Robert Watson	2b37548a71	When setting up a socket in socreate(), there's no need to lock the socket lock around knlist_init(), so don't. Hard code the setting of the socket reference count to 1 rather than using soref() to avoid asserting the socket lock, since we've not yet exposed the socket to other threads. This removes two mutex operations from each socket allocation.	2005-03-11 16:30:02 +00:00
Robert Watson	5fab68b19e	Remove suggestive sx_init() comment in soalloc(). We will have something like this at some point, but for now it clutters the source.	2005-03-11 16:26:33 +00:00
Robert Watson	35a196154f	The SO_NOSIGPIPE socket option allows a user process to mark a socket so that the socket does not generate SIGPIPE, only EPIPE, when a write is attempted after socket shutdown. When the option was introduced in 2002, this required the logic for determining whether SIGPIPE was generated to be pushed down from dofilewrite() to the socket layer so that the socket options could be considered. However, the change in 2002 omitted modification to soo_write() required to add that logic, resulting in SIGPIPE not being generated even without SO_NOSIGPIPE when the socket was written to using write() or related generic system calls. This change adds the EPIPE logic to soo_write(), generating a SIGPIPE signal to the process associated with the passed uio in the event that the SO_NOSIGPIPE option is not set. Notes: - The are upsides and downsides to placing this logic in the socket layer as opposed to the file descriptor layer. This is really fd layer logic, but because we need so_options, we have a choice of layering violations and pick this one. - SIGPIPE possibly should be delivered to the thread performing the write, not the process performing the write. - uio->uio_td and the td argument to soo_write() might potentially differ; we use the thread in the uio argument. - The "sigpipe" regression test in src/tools/regression/sockets/sigpipe tests for the bug. Submitted by: Mikko Tyolajarvi <mbsd at pacbell dot net> Talked with: glebius, alfred PR: 78478 MFC after: 1 week	2005-03-11 15:06:16 +00:00
John-Mark Gurney	74e620476c	fix spelling of match in comment... MFC after: 3 days	2005-03-10 21:23:06 +00:00
Poul-Henning Kamp	b43ab0e378	Try to fix the mess I made of devname, with the minimal subset of the larger minor/major patch which was posted for testing.	2005-03-10 18:21:34 +00:00
Robert Watson	53358cc907	Document, via WITNESS, that the NFS server mutex falls ahead of the socket buffer mutexes.	2005-03-09 21:38:53 +00:00
Dag-Erling Smørgrav	628b83cd08	My addled brains didn't realize that since vtp points into value, we can't freeenv(value) before we're done inspecting vtp[0]. Tested by: Anish Mistry <mistry.7@osu.edu>	2005-03-09 12:16:45 +00:00
Stefan Farfeleder	b26244446b	Fix typo in comment.	2005-03-09 11:50:55 +00:00
Sam Leffler	a4e714295a	allow the destination of m_move_pkthdr to have external storage (e.g. a cluster) Glanced at by: rwatson, silby	2005-03-08 17:52:01 +00:00
Giorgos Keramidas	0a11e99990	Remove redundant initialization that is repeated in the for() loop right below it. Approved by: jhb	2005-03-08 16:57:20 +00:00
Maxim Sobolev	8d6e40c3f1	Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress SIGPIPE signal for the duration of the sento-family syscalls. Use it to replace previously added hack in Linux layer based on temporarily setting SO_NOSIGPIPE flag. Suggested by: alfred	2005-03-08 16:11:41 +00:00
Poul-Henning Kamp	d9a54d5c23	Reengineer subr_unit Add support for passing in a mutex. If NULL is passed a global subr_unit mutex is used. Add alloc_unrl() which expects the mutex to be held. Allocating a unit will never sleep as it does not need to allocate memory. Cut possible range in half so we can use -1 to mean "out of number". Collapse first and last runs into the head by means of counters. This saves memory in the common case(s).	2005-03-08 10:40:48 +00:00
Poul-Henning Kamp	3238ec33e1	Fix signedness of minor2unit().	2005-03-08 10:40:03 +00:00
Jeff Roberson	ec346d1040	- Lock access to the buffer_map with the vm_map lock. In 4.x this was done with splbio, in 5.x this was done with Giant. Discussed with: alc Reported by: julian, pho	2005-03-08 09:34:54 +00:00
Giorgos Keramidas	46da8bf8fb	Typo & grammar fixes in comments.	2005-03-08 00:58:50 +00:00
Robert Watson	9bfb7389bc	When upcalling from a socket in soisconnected() for an accept filter, call with flag M_DONTWAIT rather than M_TRYWAIT, as we don't want to do blocking memory allocation (etc) in the netisr. MFC after: 3 days	2005-03-07 13:50:16 +00:00
Poul-Henning Kamp	3b3f38ed7d	Add placeholder mutex argument to new_unrhdr().	2005-03-07 11:05:47 +00:00
Bill Paul	58a6edd121	When you call MiniportInitialize() for an 802.11 driver, it will at some point result in a status event being triggered (it should be a link down event: the Microsoft driver design guide says you should generate one when the NIC is initialized). Some drivers generate the event during MiniportInitialize(), such that by the time MiniportInitialize() completes, the NIC is ready to go. But some drivers, in particular the ones for Atheros wireless NICs, don't generate the event until after a device interrupt occurs at some point after MiniportInitialize() has completed. The gotcha is that you have to wait until the link status event occurs one way or the other before you try to fiddle with any settings (ssid, channel, etc...). For the drivers that set the event sycnhronously this isn't a problem, but for the others we have to pause after calling ndis_init_nic() and wait for the event to arrive before continuing. Failing to wait can cause big trouble: on my SMP system, calling ndis_setstate_80211() after ndis_init_nic() completes, but _before_ the link event arrives, will lock up or reset the system. What we do now is check to see if a link event arrived while ndis_init_nic() was running, and if it didn't we msleep() until it does. Along the way, I discovered a few other problems: - Defered procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL. ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation wrong.) - Similarly, the NDIS interrupt handler, which is essentially a DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask() has been fixed accordingly. - MiniportQueryInformation() and MiniportSetInformation() run at DISPATCH_LEVEL, and each request must complete before another can be submitted. ndis_get_info() and ndis_set_info() have been fixed accordingly. - Turned the sleep lock that guards the NDIS thread job list into a spin lock. We never do anything with this lock held except manage the job list (no other locks are held), so it's safe to do this, and it's possible that ndis_sched() and ndis_unsched() can be called from DISPATCH_LEVEL, so using a sleep lock here is semantically incorrect. Also updated subr_witness.c to add the lock to the order list.	2005-03-07 03:05:31 +00:00
Alan Cox	2b2c7a6b40	The m_ext reference counts are potentially shared and modified asynchronously by different threads. Thus, declare as volatile the reference count that is accessed through m_ext's pointer, ref_cnt. Revert the previous change, revision 1.144, that casts as volatile a single dereference of ref_cnt. Reviewed by: bmilekic, dwhite Problem reported by: kris MFC after: 3 days	2005-03-06 20:09:00 +00:00
Dag-Erling Smørgrav	f3301d15f1	Teach getenv_quad() to recognize k/m/g/t suffixes in both lower- and upper-case. This means (almost) all tunables now support those suffixes.	2005-03-05 15:52:12 +00:00
David Xu	bc8e6d817d	Allocate umtx_q from heap instead of stack, this avoids page fault panic in kernel under heavy swapping.	2005-03-05 09:15:03 +00:00
David Xu	627451c1d9	The td_waitset is pointing to a stack address when thread is waiting for a signal, because kernel stack is swappable, this causes page fault in kernel under heavy swapping case. Fix this bug by eliminating unneeded code.	2005-03-04 22:46:31 +00:00
Maxim Sobolev	4b1783363f	In linux emulation layer try to detect attempt to use linux_clone() to create kernel threads and call rfork(2) with RFTHREAD flag set in this case, which puts parent and child into the same threading group. As a result all threads that belong to the same program end up in the same threading group. This is similar to what linuxthreads port does, though in this case we don't have a luxury of having access to the source code and there is no definite way to differentiate linux_clone() called for threading purposes from other uses, so that we have to resort to heuristics. Allow SIGTHR to be delivered between all processes in the same threading group previously it has been blocked for s[ug]id processes. This also should improve locking of the same file descriptor from different threads in programs running under linux compat layer. PR: kern/72922 Reported by: Andriy Gapon <avg@icyb.net.ua> Idea suggested by: rwatson	2005-03-03 16:57:55 +00:00
Doug White	a1d0c3f203	Insert volatile cast to discourage gcc from optimizing the read outside of the while loop. Suggested by: alc MFC after: 1 day	2005-03-03 02:41:37 +00:00
Joerg Wunsch	a5f50ef9e4	netchild's mega-patch to isolate compiler dependencies into a central place. This moves the dependency on GCC's and other compiler's features into the central sys/cdefs.h file, while the individual source files can then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42. By now, GCC and ICC (the Intel compiler) have been actively tested on IA32 platforms by netchild. Extension to other compilers is supposed to be possible, of course. Submitted by: netchild Reviewed by: various developers on arch@, some time ago	2005-03-02 21:33:29 +00:00
David Xu	6675b36ec5	In kern_sigtimedwait, remove waitset bits for td_sigmask before sleeping, so in do_tdsignal, we no longer need to test td_waitset. now td_waitset is only used to give a thread higher priority when delivering signal to multithreads process. This also fixes a bug: when a thread in sigwait states was suspended and later resumed by SIGCONT, it can no longer receive signals belong to waitset.	2005-03-02 13:43:51 +00:00
Paul Saab	b8a4edc17e	Use kern_kevent instead of the stackgap for 32bit syscall wrapping. Submitted by: jhb Tested on: amd64	2005-03-01 17:45:55 +00:00
Paul Saab	c1aa81b6d9	regen	2005-03-01 17:44:34 +00:00
Paul Saab	96d31285fe	Change the prototype of kevent to remove the const from the changelist. Reviewed by: jhb	2005-03-01 17:43:08 +00:00
Robert Watson	081322613b	When mac_check_system_acct() fails, make sure to unlock as well as close the vnode. Pointed out by: jeff	2005-03-01 08:56:13 +00:00
Wes Peters	a09150446d	Add a sysctl that records the amount of physical memory in the machine. Submitted by: Nicko Dehaine <nicko@stbernard.com> MFC after: 1 day	2005-02-28 21:42:56 +00:00
Poul-Henning Kamp	8045ce213d	Also handle d_maj hints from cloning drivers correctly.	2005-02-27 22:57:32 +00:00
Poul-Henning Kamp	84f580a093	Whine about any drivers which hardcode the device major number.	2005-02-27 22:41:07 +00:00
Poul-Henning Kamp	acd102e64b	Use dynamic major number allocation.	2005-02-27 22:02:03 +00:00
Poul-Henning Kamp	78e253c8d5	Use dynamic major number allocation.	2005-02-27 22:00:45 +00:00
Poul-Henning Kamp	89685e2269	Use dynamic major number allocation for /dev/console, there is no longer any benefit from hard wiring it. Remove special hack used to wire major to zero despite zero having a different magic meaning as well.	2005-02-27 21:52:42 +00:00
Nate Lawson	789f03ceb4	Add locking to handle multiple threads getting/setting frequencies at the same time. We use an sx lock and serialize the cpufreq device's get/set/levels methods.	2005-02-27 01:34:08 +00:00
Nate Lawson	b070969b48	Allow users to reject levels below a given frequency (in MHz) via the debug.cpufreq.lowest tunable and sysctl. Some systems seem to have problems with the lowest frequencies so setting this prevents them from being available or used.	2005-02-26 22:37:49 +00:00
Tom Rhodes	183a16a3ec	Remove recently added note about DEVICE_POLLING not working with SMP. Remove warning from kern_poll.c to allow DEVICE_POLLING to be built with SMP. Discussed with: ru, glebius	2005-02-25 22:07:51 +00:00
Robert Watson	fa6fc5b819	Insert missing increment of (i) when walking the temporary semaphore vector during fork. Fix assertion which contained an off-by-one error. Submitted by: Antoine Brodin < antoine dot brodin at laposte dot net >	2005-02-25 21:00:14 +00:00
Robert Watson	590f242cc0	Add an exit hook, sem_forkhook(), which walks the list of POSIX semaphores owned by a process when it forks, and creates a matching set of references for the child process, as prescribed by POSIX. In order to avoid races with other threads in the parent process during fork(), it is necessary to allocate a temporary reference list while holding the sem_lock, then transfer those references to the new process once the sem_lock is released. The implementation is inefficient but appears functional; in order to improve the efficiency, it will be necessary to modify the existing structures and logic, which generally rely on O(n) operations over the global set of semaphores.	2005-02-25 19:10:51 +00:00
Robert Watson	955ec4156c	Assert sem_lock in id_to_sem() and sem_lookup_byname(), since these functions iterate over the global POSIX semaphore lists. MFC after: 3 days	2005-02-25 17:01:35 +00:00
Maxim Sobolev	90dc539be0	Welcome to the 21st century: increase MAXSHELLCMDLEN from 128 bytes to PAGE_SIZE. Unlike originator of the PR suggests retain MAXSHELLCMDLEN definition (he has been proposing to replace it with PAGE_SIZE everywhere), not only this reduced the diff significantly, but prevents code obfuscation and also allows to increase/decrease this parameter easily if needed. PR: kern/64196 Submitted by: Magnus Bäckström <b@etek.chalmers.se>	2005-02-25 11:49:42 +00:00
Maxim Sobolev	6916a1da50	o Replace two while {} do loops with more appropriate do {} while loops. This doesn't change functionality, but makes code more logical. Obtained from: DrafonFlyBSD o Use VOP_GETATTR() to obtain actual size of file and parse no more than that. Previously, we parsed MAXSHELLCMDLEN characters regardless of the actual file size. This makes the following working: $ printf '#!/bin/echo' > /tmp/test.sh $ chmod 755 /tmp/test.sh $ /tmp/test.sh Previously, attempts to execve() that shell script has been failing with bogus ENAMETOOLONG. PR: kern/64196 Submitted by: Magnus B.ckstr.m <b@etek.chalmers.se>	2005-02-25 10:17:53 +00:00
Maxim Sobolev	b4305f8d91	Try harder to not exceed MAXSHELLCMDLEN when parsing first line of shell script. Otherwise it's possible to panic kernel by constructing a shell script with first line not ending in '\n'. Also, treat '\0' as line terminating character, which may me useful in some situations. Submitted by: gad	2005-02-25 08:42:04 +00:00
Nate Lawson	d269386a24	Bump the maximum number of levels to 64 and add warning messages about what to do to fix reduced functionality if the number of levels is too low.	2005-02-24 20:21:41 +00:00
Sam Leffler	59d8b31002	change m_adj to reclaim unused mbufs instead of zero'ing m_len when trim'ing space off the back of a chain; this is indirect solution to a potential null ptr deref Noticed by: Coverity Prevent analysis tool (null ptr deref) Reviewed by: dg, rwatson	2005-02-24 00:40:33 +00:00
Christian S.J. Peron	cd13819433	Add locking assertions into vn_extattr_set, vn_extattr_get and vn_extattr_rm. This is meant to catch conditions where IO_NODELOCKED has been specified without the vnode being locked. Discussed with: rwatson MFC after: 1 week	2005-02-24 00:13:16 +00:00
Christian S.J. Peron	df579737e5	Drop bzero and shove the responsibility of zeroing the kse upcall object on to the zone allocator. It should be noted that uma_zalloc(9) uses bzero to zero out the object so there probably wont be any real performance benefit. If UMA grows the ability to supply zeroed zones more efficiently in the future, we will not have to modify all the existing consumers. Discussed with: rwatson,julian MFC after: 1 week	2005-02-24 00:05:50 +00:00
Sam Leffler	9d8993bbc5	remove dead code Noticed by: Coverity Prevent analysis tool Reviewed by: silby	2005-02-23 19:34:44 +00:00
Sam Leffler	15ecf3968d	eliminate potential null deref Noticed by: Coverity Prevent analysis tool Reviewed by: jhb	2005-02-23 19:32:29 +00:00
Jeff Roberson	d9a9c2c22c	- Enable SMP VFS by default on current. More users are needed to turn up any remaining bugs. Anyone inconvenienced by this can still disable it in the loader. Sponsored by: Isilon Systems, Inc.	2005-02-23 10:05:43 +00:00
Jeff Roberson	7a9507b60e	- A test in sched_switch() is no longer necessary and it is incorrect when td0 is preempted before it voluntarily switches. Discovered by: Arjan Van Leeuwen <avleeuwen@gmail.com>	2005-02-23 00:50:26 +00:00
Sam Leffler	3e55226c46	kill dead code Noticed by: Coverity Prevent analysis tool	2005-02-23 00:43:00 +00:00
Jeff Roberson	d8a7c99a1c	- Only the xlock holder should be calling VOP_LOCK on a vp once VI_XLOCK has been set. Assert that this is the case so that we catch filesystems who are using naked VOP_LOCKs in illegal cases. Sponsored by: Isilon Systems, Inc.	2005-02-23 00:11:14 +00:00
Jeff Roberson	4c11620bb9	- Add a check for xlock in vop_lock_assert. Presently the xlock is considered to be as good as an exclusive lock, although there is still a possibility of someone acquiring a VOP LOCK while xlock is held. Sponsored by: Isilon Systems, Inc.	2005-02-22 23:59:11 +00:00
Poul-Henning Kamp	767056c0e8	Zero the v_un container field to make sure everything is gone.	2005-02-22 18:56:18 +00:00
Poul-Henning Kamp	aa2f6ddc3f	Reap more benefits from DEVFS: List devfs_dirents rather than vnodes off their shared struct cdev, this saves a pointer field in the vnode at the expense of a field in the devfs_dirent. There are often 100 times more vnodes so this is bargain. In addition it makes it harder for people to try to do stypid things like "finding the vnode from cdev". Since DEVFS handles all VCHR nodes now, we can do the vnode related cleanup in devfs_reclaim() instead of in dev_rel() and vgonel(). Similarly, we can do the struct cdev related cleanup in dev_rel() instead of devfs_reclaim(). rename idestroy_dev() to destroy_devl() for consistency. Add LIST_ENTRY de_alias to struct devfs_dirent. Remove v_specnext from struct vnode. Change si_hlist to si_alist in struct cdev. String new devfs vnodes' devfs_dirent on si_alist when we create them and take them off in devfs_reclaim(). Fix devfs_revoke() accordingly. Also don't clear fields devfs_reclaim() will clear when called from vgone(); Let devfs_reclaim() call dev_rel() instead of vgonel(). Move the usecount tracking from dev_rel() to devfs_reclaim(), and let dev_rel() take a struct cdev argument instead of vnode. Destroy SI_CHEAPCLONE devices in dev_rel() (instead of devfs_reclaim()) when they are no longer used. (This should maybe happen in devfs_close() instead.)	2005-02-22 15:51:07 +00:00
Poul-Henning Kamp	1a1457d427	Make dev_ref() require the dev_lock() to be held and use it from devfs instead of directly frobbing the si_refcount.	2005-02-22 14:41:04 +00:00
Poul-Henning Kamp	7fc940b266	Remove vfinddev(), it is generally bogus when faced with jails and chroot and has no legitimate use(r)s in the tree.	2005-02-22 14:11:47 +00:00
Robert Watson	4f7fd28ee1	When invoking callout_init(), spell '1' as "CALLOUT_MPSAFE". MFC after: 3 days	2005-02-22 13:11:33 +00:00
Robert Watson	0daccb9c94	In the current world order, solisten() implements the state transition of a socket from a regular socket to a listening socket able to accept new connections. As part of this state transition, solisten() calls into the protocol to update protocol-layer state. There were several bugs in this implementation that could result in a race wherein a TCP SYN received in the interval between the protocol state transition and the shortly following socket layer transition would result in a panic in the TCP code, as the socket would be in the TCPS_LISTEN state, but the socket would not have the SO_ACCEPTCONN flag set. This change does the following: - Pushes the socket state transition from the socket layer solisten() to to socket "library" routines called from the protocol. This permits the socket routines to be called while holding the protocol mutexes, preventing a race exposing the incomplete socket state transition to TCP after the TCP state transition has completed. The check for a socket layer state transition is performed by solisten_proto_check(), and the actual transition is performed by solisten_proto(). - Holds the socket lock for the duration of the socket state test and set, and over the protocol layer state transition, which is now possible as the socket lock is acquired by the protocol layer, rather than vice versa. This prevents additional state related races in the socket layer. This permits the dual transition of socket layer and protocol layer state to occur while holding locks for both layers, making the two changes atomic with respect to one another. Similar changes are likely require elsewhere in the socket/protocol code. Reported by: Peter Holm <peter@holm.cc> Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net> Philosophical head nod: gnn	2005-02-21 21:58:17 +00:00
Robert Watson	c364c823d0	When aborting a UNIX domain socket bind() because VOP_CREATE() failed, make sure to call vn_finished_write(mp) before returning. MFC after: 3 days	2005-02-21 14:21:50 +00:00
Robert Watson	892af6b930	style(9)-ize function headers, remove use of 'register'. MFC after: 3 days	2005-02-20 23:22:13 +00:00
David Schultz	e8ed933099	Remove VFS_START(). Its original purpose involved the mfs filesystem, which is long gone. Discussed with: mckusick Reviewed by: phk	2005-02-20 23:02:20 +00:00
Robert Watson	d664e4fa50	In unp_attach(), allow uma_zalloc to zero the new unpcb rather than explicitly using bzero(). Update copyright. MFC after: 3 days	2005-02-20 20:05:11 +00:00
Robert Watson	2b85a170d1	Prefer NULL to returning 0 cast to a pointer type. MFC after: 3 days	2005-02-20 15:56:13 +00:00
Robert Watson	a00428ef92	In soreceive(), when considering delivery to a socket in SS_ISCONFIRMING, only call the protocol's pru_rcvd() if the protocol has the flag PR_WANTRCVD set. This brings that instance of pru_rcvd() into line with the rest, which do check the flag. MFC after: 3 days	2005-02-20 15:54:44 +00:00
Robert Watson	7301cf23ef	Move assignment of UNIX domain socket pcb during unp_attach() outside of the global UNIX domain socket mutex: no protection is needed that early in the setup of the UNIX domain socket and socket structures. MFC after: 3 days	2005-02-20 04:18:22 +00:00
Nate Lawson	e959a70bad	Add the "freq_settings" sysctl to each device that registers with cpufreq so their individual settings can be seen separately for debugging.	2005-02-20 00:59:15 +00:00
Poul-Henning Kamp	dfd4be14bd	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.	2005-02-19 11:44:57 +00:00
David Xu	1089f0319b	Don't restart a timeout wait in kern_sigtimedwait, also allow it to wait longer than a single integer can represent.	2005-02-19 06:05:49 +00:00
Paul Saab	b7820945ac	Swap the arguments for CP so we copy the correct source and destination.	2005-02-18 22:14:40 +00:00
Robert Watson	29bdd01910	Remove now unused 'int s' from spl(). MFC after: 3 days	2005-02-18 21:39:55 +00:00
Robert Watson	d8d716bef5	De-spl kern_connect(). MFC after: 3 days	2005-02-18 19:37:36 +00:00
Robert Watson	a7ae36bc45	Correct a typo in the comment describing soreceive_rcvoob(). MFC after: 3 days	2005-02-18 19:15:22 +00:00
Robert Watson	1b5c4b15b4	In soconnect(), when resetting so->so_error, the socket lock is not required due to a straight integer write in which minor races are not a problem.	2005-02-18 19:13:51 +00:00
Robert Watson	11d06c4b78	Re-style do_setopt_accept_filter() to match uipc_accf.c style, and fix one other style nit in the file. MFC after: 3 days	2005-02-18 19:01:22 +00:00
Robert Watson	78e436448f	Move do_setopt_accept_filter() from uipc_socket.c to uipc_accf.c, where the rest of the accept filter code currently lives. MFC after: 3 days	2005-02-18 18:54:42 +00:00
Robert Watson	1ed716a149	Minor style tweaks: line wrap comments and lines more consistently. MFC after: 3 days	2005-02-18 18:49:44 +00:00
Robert Watson	627de7fa2c	Re-order checks in socheckuid() so that we check all deny cases before returning accept. MFC after: 3 days	2005-02-18 18:43:33 +00:00
Poul-Henning Kamp	900b7e2648	Make sure to drop the VI_LOCK in vgonel(); Spotted by: Taku YAMAMOTO <taku@tackymt.homeip.net>	2005-02-18 11:13:56 +00:00
Robert Watson	0d89301c51	In solisten(), unconditionally set the SO_ACCEPTCONN option in so->so_options when solisten() will succeed, rather than setting it conditionally based on there not being queued sockets in the completed socket queue. Otherwise, if the protocol exposes new sockets via the completed queue before solisten() completes, the listen() system call will succeed, but the socket and protocol state will be out of sync. For TCP, this didn't happen in practice, as the TCP code will panic if a new connection comes in after the tcpcb has been transitioned to a listening state but the socket doesn't have SO_ACCEPTCONN set. This is historical behavior resulting from bitrot since 4.3BSD, in which that line of code was associated with the conditional NULL'ing of the connection queue pointers (one-time initialization to be performed during the transition to a listening socket), which are now initialized separately. Discussed with: fenner, gnn MFC after: 3 days	2005-02-18 00:52:17 +00:00
Nate Lawson	e94a0c1a18	Introduce a new method, cpufreq_drv_type(), that returns the type of the driver. This used to be handled by cpufreq_drv_settings() but it's useful to get the type/flags separately from getting the settings. (For example, you don't have to pass an array of cf_setting just to find the driver type.) Use this new method in our in-tree drivers to detect reliably if acpi_perf is present and owns the hardware. This simplifies logic in drivers as well as fixing a bug introduced in my last commit where too many drivers attached.	2005-02-18 00:23:36 +00:00
Robert Watson	1e8f89541e	In accept1(), extend coverage of the socket lock from just covering soref() to also covering the update of so_state. While no other user threads can update the socket state here as it's not yet hooked up to the file descriptor array yet, the protocol could also frob the socket state here, leading to a lost update to the so_state field. No reported instances of this bug (as yet). MFC after: 3 days	2005-02-17 13:00:23 +00:00
Robert Watson	280249a66a	In sonewconn(), set the new socket's state to show the protocol-provided connection status before inserting the new socket into the listen socket's accept queue, or there might be a race in which another thread wakes up when the accept lock is released, and sees the socket before its state is set correctly. The wakeup still occurs after the accept lock is released. There have been no diagnoses of this bug in real-world systems (as yet). MFC after: 3 days	2005-02-17 12:53:45 +00:00
Poul-Henning Kamp	4d8ac58b05	Introduce vx_wait{l}() and use it instead of home-rolled versions.	2005-02-17 10:49:51 +00:00
Poul-Henning Kamp	58aac12894	Convert KASSERTS to VNASSERTS	2005-02-17 10:28:58 +00:00
Dag-Erling Smørgrav	f3f4baf099	Add /rescue/init to the default init_path, before /stand/sysinstall. MFC after: 2 weeks	2005-02-17 10:00:10 +00:00
Bosko Milekic	8076cb5289	Well, it seems that I pre-maturely removed the "All rights reserved" statement from some files, so re-add it for the moment, until the related legalese is sorted out. This change affects: sys/kern/kern_mbuf.c sys/vm/memguard.c sys/vm/memguard.h sys/vm/uma.h sys/vm/uma_core.c sys/vm/uma_dbg.c sys/vm/uma_dbg.h sys/vm/uma_int.h	2005-02-16 21:45:59 +00:00
Nate Lawson	67c8649f7f	When dealing with systems with no absolute drivers attached, only calibrate the rate for the 100% state once. Afterwards, use that value for deriving states. This should fix the problem where the calibrated frequency was different once a switch was done, giving a different set of levels each time. Also, properly search for the right cpufreqX device when detaching.	2005-02-15 07:43:48 +00:00
Nate Lawson	1196826af5	Bind to the driver's parent cpu before switching, for both absolute and relative drivers. Remove some extraneous KASSERTs since NULL pointers will be found when they're used right afterwards.	2005-02-15 07:22:42 +00:00
Nate Lawson	5f0afa0415	Implement priorities. This allows a driver (say, for cooling purposes) to override the current freq level temporarily and restore it when the higher priority condition is past. Note that only the first overridden value is saved. Callers pass NULL to CPUFREQ_SET to restore the saved level. Priorities are not yet used so this commit should have no effect.	2005-02-14 18:16:35 +00:00
Nate Lawson	e22cd41c01	Add support for the CPUFREQ_FLAG_INFO_ONLY flag. Devices that report this are not added to the list(s) of available settings. However, other drivers can call the CPUFREQ_DRV_SETTINGS() method on those devices directly to get info about available settings. Update the acpi_perf(4) driver to use this flag in the presence of "functional fixed hardware." Thus, future drivers like Powernow can query acpi_perf for platform info but perform frequency transitions themselves.	2005-02-13 18:49:48 +00:00
Maxim Sobolev	f460d05699	Backout addition of SIGTHR into the list of signals allowed to be delivered to the suid/sugid process, since apparently it has security implications. Suggested by: rwatson	2005-02-13 17:51:47 +00:00
Maxim Sobolev	1a88a252fd	Backout previous change (disabling of security checks for signals delivered in emulation layers), since it appears to be too broad. Requested by: rwatson	2005-02-13 17:37:20 +00:00
Nate Lawson	0325089dad	Set levels on all CPUs and attach a cpufreq device to each one. Sysctl on dev.cpu.0 will affect all of the CPUs together. In the future, independent control will be supported but this is good enough for now. Check that the timecounter isn't TSC before switching (from Colin Percival.)	2005-02-13 17:31:56 +00:00
Maxim Sobolev	d8ff44b79f	Split out kill(2) syscall service routine into user-level and kernel part, the former is callable from user space and the latter from the kernel one. Make kernel version take additional argument which tells if the respective call should check for additional restrictions for sending signals to suid/sugid applications or not. Make all emulation layers using non-checked version, since signal numbers in emulation layers can have different meaning that in native mode and such protection can cause misbehaviour. As a result remove LIBTHR from the signals allowed to be delivered to a suid/sugid application. Requested (sorta) by: rwatson MFC after: 2 weeks	2005-02-13 16:42:08 +00:00
Christian S.J. Peron	84f85aedef	Add much needed descriptions for a number of the IPC related sysctl OIDs. This information will be very useful for people who are tuning applications which have a dependence on IPC mechanisms. The following OIDs were documented: Message queues: kern.ipc.msgmax kern.ipc.msgmni kern.ipc.msgmnb kern.ipc.msgtlq kern.ipc.msgssz kern.ipc.msgseg Semaphores: kern.ipc.semmap kern.ipc.semmni kern.ipc.semmns kern.ipc.semmnu kern.ipc.semmsl kern.ipc.semopm kern.ipc.semume kern.ipc.semusz kern.ipc.semvmx kern.ipc.semaem Shared memory: kern.ipc.shmmax kern.ipc.shmmin kern.ipc.shmmni kern.ipc.shmseg kern.ipc.shmall kern.ipc.shm_use_phys kern.ipc.shm_allow_removed kern.ipc.shmsegs These new descriptions can be viewed using sysctl -d PR: kern/65219 Submitted by: Dan Nelson <dnelson at allantgroup dot com> (modified) No objections: developers@ Descriptions reviewed by: gnn MFC after: 1 week	2005-02-12 01:22:39 +00:00
Maxim Sobolev	ac16ff40c5	Add SIGTHR (32) into list of signals permitted to be delivered to the suid application. The problem is that Linux applications using old Linux threads (pre-NPTL) use signal 32 (linux SIGRTMIN) for communication between thread-processes. If such an linux application is installed suid or sgid and security.bsd.conservative_signals=1 (default), then permission will be denied to send such a signal and the application will freeze. I believe the same will be true for native applications that use libthr, since libthr uses SIGTHR for implementing conditional variables. PR: 72922 Submitted by: Andriy Gapon <avg@icyb.net.ua> MFC after: 2 weeks	2005-02-11 14:02:42 +00:00
Ian Dowse	57c037be1c	When processing a timeout() callout and returning it to the free list, set `curr_callout' to NULL. This ensures that we won't attempt to cancel the current callout if the original callout structure gets recycled while we wait to acquire Giant. This is reported to fix an intermittent syscons problem that was introduced by revision 1.96.	2005-02-11 00:14:00 +00:00
Bosko Milekic	3d2a3ff25e	Optimize the way reference counting is performed with Mbufs. We do not need to perform an extra memory fetch in the Packet (Mbuf+Cluster) constructor to initialize the reference counter anymore. The reference counts are located in a separate memory region (in the slab header, because this zone is UMA_ZONE_REFCNT), so the memory fetch resulted very often in a cache miss. Additionally, and perhaps more significantly, optimize the free mbuf+cluster (packet) case, which is very common, to no longer require an atomic operation on free (to verify the reference counter) if the reference on the cluster has never been increased (also very common). Reduces an atomic on mbuf free on average. Original patch submitted by: Gerrit Nagelhout <gnagelhout@sandvine.com>	2005-02-10 22:23:02 +00:00
Colin Percival	e5e6a46460	Declare "cnt" (a number of bytes to read or write) as an "ssize_t", not as a "long" in dofileread() and dofilewrite(). Discussed with: jhb	2005-02-10 20:19:17 +00:00
Poul-Henning Kamp	1ba212823f	Make various vnode related functions static	2005-02-10 12:28:58 +00:00
Poul-Henning Kamp	44dc16a986	Make some file/filedesc related functions static	2005-02-10 12:27:58 +00:00
Poul-Henning Kamp	ebbfc2f82d	Make various mountpoint related functions static.	2005-02-10 12:25:38 +00:00
Poul-Henning Kamp	5ece08f57a	Make a SYSCTL_NODE static	2005-02-10 12:23:29 +00:00
Poul-Henning Kamp	502a35d60f	MD5Pad() should never have been exposed.	2005-02-10 12:20:42 +00:00
Poul-Henning Kamp	502a590bf1	make cluster_callback() static	2005-02-10 12:17:48 +00:00
Poul-Henning Kamp	2adc2b87c7	Make a SYSCTL_NODE and a mutex static	2005-02-10 12:16:42 +00:00
Poul-Henning Kamp	85eb15a2ce	Make another bunch of SYSCTL_NODEs static	2005-02-10 12:16:08 +00:00
Poul-Henning Kamp	0c898376fa	Make a bunch of SYSCTL_NODEs static.	2005-02-10 12:15:49 +00:00
Poul-Henning Kamp	c711aea6ca	Make a bunch of malloc types static. Found by: src/tools/tools/kernxref	2005-02-10 12:02:37 +00:00
Poul-Henning Kamp	fe0198779c	Don't pass NULL to vprint()	2005-02-10 08:55:08 +00:00
Jeff Roberson	5c18d18b1d	- Add more information to the getnewbuf() recycling KTR. Sponsored by: Isilon Systems, Inc.	2005-02-10 02:22:56 +00:00
Jeff Roberson	68f2274d97	- Add a new assert in the getnewvnode(). Assert that the usecount is still 0 to detect getnewvnode() races. - Add the vnode address to a few panics near by to help in debugging. Sponsored by: Isilon Systems, Inc.	2005-02-08 23:27:10 +00:00
Jeff Roberson	b56dc9a785	- Remove an invalid KASSERT added in recent background write reshuffling. Sponsored by: Isilon Systems, Inc.	2005-02-08 23:25:08 +00:00
Colin Percival	79653046d8	Add a new sysctl, "security.jail.chflags_allowed", which controls the behaviour of chflags within a jail. If set to 0 (the default), then a jailed root user is treated as an unprivileged user; if set to 1, then a jailed root user is treated the same as an unjailed root user. This is necessary to allow "make installworld" to work inside a jail, since it attempts to manipulate the system immutable flag on certain files. Discussed with: csjp, rwatson MFC after: 2 weeks	2005-02-08 21:31:11 +00:00
Poul-Henning Kamp	dd19a799b8	Background writes are entirely an FFS/Softupdates thing. Give FFS vnodes a specific bufwrite method which contains all the background write stuff and then calls into the default bufwrite() for the rest of the job. Remove all the background write related stuff from the normal bufwrite. This drags the softdep_move_dependencies() back into FFS. Long term, it is worth looking at simply copying the data into allocated memory and issuing the bio directly and not create the "shadow buf" in the first place (just like copy-on-write is done in snapshots for instance). I don't think we really gain anything but complexity from doing this with a buf.	2005-02-08 20:29:10 +00:00
Poul-Henning Kamp	88e5b12a20	Drag another softupdates tentacle back into FFS: Now that FFS's vop_fsync is separate from the internal use we can do the full job there.	2005-02-08 18:09:11 +00:00
Nate Lawson	7990a18c7b	Maxunit is inclusive so fix off-by-one in previous commit.	2005-02-08 18:03:17 +00:00
Nate Lawson	5b68bf38ab	Update device_find_child(9) to return the first matching child if unit is set to -1. Reviewed by: dfr, imp	2005-02-08 18:00:29 +00:00
John Baldwin	fee4a6af3a	Implement a kern_pathconf() wrapper for pathconf() which can take the filename from either a user space or a kernel space pointer.	2005-02-07 21:46:43 +00:00
John Baldwin	5e85ac176f	If the pointer to the new itimerval is NULL in kern_setitimer(), just read the old value via kern_getitimer().	2005-02-07 21:45:48 +00:00
John Baldwin	76951d21d1	- Tweak kern_msgctl() to return a copy of the requested message queue id structure in the struct pointed to by the 3rd argument for IPC_STAT and get rid of the 4th argument. The old way returned a pointer into the kernel array that the calling function would then access afterwards without holding the appropriate locks and doing non-lock-safe things like copyout() with the data anyways. This change removes that unsafeness and resulting race conditions as well as simplifying the interface. - Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(), fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of code duplication for freebsd4 and netbsd compatability system calls. - Add a new lookup function kern_alternate_path() that looks up a filename under an alternate prefix and determines which filename should be used. This is basically a more general version of linux_emul_convpath() that can be shared by all the ABIs thus allowing for further reduction of code duplication.	2005-02-07 18:44:55 +00:00
John Baldwin	c90110d639	Various and sundry style fixes.	2005-02-07 18:38:29 +00:00
Poul-Henning Kamp	b9489a449c	Access vmobject via the bufobj instead of the vnode	2005-02-07 10:04:06 +00:00
Poul-Henning Kamp	7ee4eb6192	VOP_DESTROYVOBJECT() is no more.	2005-02-07 09:26:58 +00:00
Poul-Henning Kamp	7c5d36fb80	Remove vop_stddestroyvobject()	2005-02-07 09:26:39 +00:00
Poul-Henning Kamp	b348abd6cd	Don't call VOP_DESTROYVOBJECT(), trust that VOP_RECLAIM() did what was necessary.	2005-02-07 07:48:03 +00:00
Poul-Henning Kamp	5937226d51	Add a missing prefix to a struct field for consistency.	2005-02-07 07:40:39 +00:00
Ian Dowse	98c926b20f	Add a mechanism for associating a mutex with a callout when the callout is first initialised, using a new function callout_init_mtx(). The callout system will acquire this mutex before calling the callout function and release it on return. In addition, the callout system uses the mutex to avoid most of the complications and race conditions inherent in asynchronous timer facilities, so mutex-protected callouts have much simpler semantics. As long as the mutex is held when invoking callout_stop() or callout_reset(), then these functions will guarantee that the callout will be stopped, even if softclock() had already begun to process the callout. Existing Giant-locked callouts will automatically pick up the new race-free semantics. This should close a number of race conditions in the USB code and probably other areas of the kernel too. There should be no change in behaviour for "MP-safe" callouts; these still need to use the techniques mentioned in timeout(9) to avoid race conditions.	2005-02-07 02:47:33 +00:00
Nate Lawson	88c9b54c47	Add support for relative cpufreq drivers. Such drivers modulate clock frequency as a percentage of the base rate and do not change the base rate directly. The cpufreq framework combines these with absolute drivers to produce synthesized levels made of one or more settings.	2005-02-06 21:08:35 +00:00
Jeff Roberson	8364446643	- Don't release BKGRDINPROG until after we've bufdone'd the copy. Sponsored by: Isilon Systems, Inc.	2005-02-05 01:26:14 +00:00
Jeff Roberson	42a29039de	- Add ke_runq == NULL to the conditions which will cause us to abort adjusting timeshare loads in sched_class(). This is only important if the thread has never run, otherwise the state checks should work as expected.	2005-02-04 17:22:46 +00:00
Suleiman Souhlal	339a7e7fbb	Set the scheduling class of the idle threads to PRI_IDLE. While there, set their priority with sched_prio() instead of changing it 'by hand'. Reviewed by: jhb Approved by: grehan (mentor)	2005-02-04 06:16:05 +00:00
Nate Lawson	73347b071d	Add the cpufreq framework. This code manages multiple drivers and presents a unified kernel and user interface for controlling cpu frequencies.	2005-02-04 05:39:19 +00:00
Nate Lawson	bfdbeca163	Add an interface for cpufreq. The kernel interface lets other drivers select the CPU frequency level (say for cooling). The driver interface allows hardware drivers to announce themselves as capable of adjusting an individual frequency setting.	2005-02-04 05:38:30 +00:00
Pawel Jakub Dawidek	f627315f1e	- Move gets() function to libkern (I want to use it outside vfs_mount.c). - Add buffer size limitations (overflow will not be possible anymore). - Add 'visible' option, which will allow for passphrase reading in the future. - Remove special treatment of '@' and '#', those two are only confusing. Discussed with: rwatson MFC after: 2 weeks	2005-02-03 15:10:58 +00:00
Jeff Roberson	ff05fd5d77	- Correct a typo in kern_rename. tvfslocked should be initialized from tond and not fromnd. This could lead us to leak Giant, or unlock it twice, depending on the filesystems involved. renames within a single filesystem would not have caused any problems. Sponsored by: Isilon Systems, Inc.	2005-02-02 17:17:15 +00:00
Jeff Roberson	37c15216fc	- Or MPSAFE with the correct set of flags in stat(). This affected only the LOOKUP_SHARED case. Spotted by: jhb	2005-02-01 23:43:46 +00:00
Bosko Milekic	737cd9525b	Update copyright, remove "all rights reserved" (since they are not all reserved, as the lisence makes clear), and strike the third clause (now this is a 2-clause liberal BSDL as are the rest of files I hold copyright over).	2005-02-01 03:17:52 +00:00
Maxim Sobolev	a6886ef173	Extend kern_sendit() to take another enum uio_seg argument, which specifies where the buffer to send lies and use it to eliminate yet another stackgap in linuxlator. MFC after: 2 weeks	2005-01-30 07:20:36 +00:00
Maxim Sobolev	ec217396c4	Fix build on AMD64 (and probably other arches where size_t != int). Submitted by: Tinderbox MFC after: 2 weeks	2005-01-30 06:43:17 +00:00
Robert Watson	78bb1895ab	Fix spelling of integer in a comment. Beady eyes: ceri	2005-01-30 00:31:19 +00:00
Maxim Sobolev	56c2262c0e	Grrr, this committer needs to have a sleep. Remove lines from the previous delta not intended for public consumption. MFC after: 2 weeks	2005-01-29 23:51:05 +00:00
Maxim Sobolev	c30af53213	Fix small non-conformance introduced in the previous commit: execve() is expected to return ENAMETOOLONG, not E2BIG if first argument doesn't fit into {PATH_MAX} bytes. MFC after: 2 weeks	2005-01-29 23:47:36 +00:00
Maxim Sobolev	610ecfe035	o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space; o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386. Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks	2005-01-29 23:12:00 +00:00
Robert Watson	3fcd9325ec	Correct a minr whitespace inconsistency introduced in revision 1.159: add a tab between #define and DF_REBID instead of a space.	2005-01-29 22:04:30 +00:00
Poul-Henning Kamp	d9aaa28f63	Use MAXMINOR	2005-01-29 16:50:04 +00:00
Poul-Henning Kamp	37085a3931	Typo.	2005-01-29 15:10:30 +00:00
Poul-Henning Kamp	3a85fd262c	Add MAXMINOR #define, we should have had this long time ago. Add minor2unit() in addition to dev2unit() and unit2minor(). If it wasn't such a hazzle we should redefine minor numbers in the kernel without the gap for the major number, but it's not worth the bother (yet).	2005-01-29 15:07:13 +00:00
Poul-Henning Kamp	a258707313	In 1.276 of kern/subr_trap.c I introduced a mechanism for delaying a process return to userspace if it had pending GEOM events. We need to have the same check in the exit pass to catch the case where a GEOM related filedescriptor is not explicitly closed by the process. Bumped into by: people using dd(1) to build releases, nanobsd etc.	2005-01-29 14:03:41 +00:00
Jeff Roberson	bd8d684fd7	- Don't drop the wref on the bufobj until after bufdone() has completed. Without this, threads waiting in bufobj_wwait() may wakeup prior to bufdone() completing. Sponsored by: Isilon Systems, Inc.	2005-01-28 17:48:58 +00:00
Poul-Henning Kamp	d4eb29ba71	Remove unused argument to vrecycle()	2005-01-28 13:08:21 +00:00
Poul-Henning Kamp	1fdfaafb08	Integrate vclean() into vgonel(). Various associated polishing.	2005-01-28 13:00:03 +00:00
Poul-Henning Kamp	3fc8dd0653	Remove register keyword	2005-01-28 12:39:10 +00:00
Poul-Henning Kamp	7146d6cb3e	Move the contents of vop_stddestroyvobject() to the new vnode_pager function vnode_destroy_vobject(). Make the new function zero the vp->v_object pointer so we can tell if a call is missing.	2005-01-28 08:56:48 +00:00
Jeff Roberson	37f32177bd	- Regen	2005-01-26 02:29:18 +00:00
Jeff Roberson	810ad5ec4c	- Struct mount is not yet locked well enough to allow mount/nmount/unmount to run without Giant. Mark them as STD here.	2005-01-26 02:28:43 +00:00
Maxim Sobolev	f4b6eb045f	Split out kernel side of msgctl(2) into two parts: the first that pops data from the userland and pushes results back and the second which does actual processing. Use the latter to eliminate stackgap in the linux wrapper of that syscall. MFC after: 2 weeks	2005-01-26 00:46:36 +00:00
Maxim Sobolev	cfa0efe7ab	Split out kernel side of {get,set}itimer(2) into two parts: the first that pops data from the userland and pushes results back and the second which does actual processing. Use the latter to eliminate stackgap in the linux wrappers of those syscalls. MFC after: 2 weeks	2005-01-25 21:28:28 +00:00
Jeff Roberson	04186764a4	- Include LK_INTERLOCK in LK_EXTFLG_MASK so that it makes its way into acquire. - Correct the condition that causes us to skip apause() to only require the presence of LK_INTERLOCK. Sponsored by: Isilon Systems, Inc.	2005-01-25 16:06:05 +00:00
Jeff Roberson	013e6650ca	- Make lf_print static and move its prototype into kern_lockf.c - Protect all of the advlock code with Giant as some filesystems may not be entering with Giant held now. Sponsored by: Isilon Systems, Inc.	2005-01-25 10:15:26 +00:00
Poul-Henning Kamp	4f8d23d662	Previously a read of zero bytes got handled in devfs:vop_read() but I missed that when the vnode bypass was introduced. Deal with zero length transfers before we even get to fo_ops->fo_read(). Found by: Slawa Olhovchenkov <slwzxy.spb.ru@zxy.spb.ru> PR: 75758	2005-01-25 09:15:32 +00:00
Poul-Henning Kamp	729fcf7efb	Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now.	2005-01-25 00:42:16 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Poul-Henning Kamp	69816ea35e	Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem for a given vnode to create a vnode_pager object if one is needed.	2005-01-25 00:12:24 +00:00
Poul-Henning Kamp	dcff5b1440	Don't call VOP_CREATEVOBJECT(), it's the responsibility of the filesystem which owns the vnode.	2005-01-24 23:53:54 +00:00
Poul-Henning Kamp	b5b6ec5faa	Eliminate the constant flags argument to vclean()	2005-01-24 22:22:02 +00:00
Poul-Henning Kamp	d07a6d3f61	Move the body of vop_stdcreatevobject() over to the vnode_pager under the name Sande^H^H^H^H^Hvnode_create_vobject(). Make the new function take a size argument which removes the need for a VOP_STAT() or a very pessimistic guess for disks. Call that new function from vop_stdcreatevobject(). Make vnode_pager_alloc() private now that its only user came home.	2005-01-24 21:21:59 +00:00
Poul-Henning Kamp	f6dc414a5c	Save a line by unlocking before we test.	2005-01-24 14:13:24 +00:00
Poul-Henning Kamp	7c93282e42	Change vprint() to vn_printf() which takes varargs. Add #define for vprint() to call vn_printf().	2005-01-24 13:58:08 +00:00
Poul-Henning Kamp	35764be39e	Kill the VV_OBJBUF and test the v_object for NULL instead.	2005-01-24 13:13:57 +00:00
Poul-Henning Kamp	027b1f716c	Fix a list corruption issue in cloning device management using the western strategy ("allocate first, ask questions later") so we can extend the devmtx coverage to the clone list.	2005-01-24 12:44:56 +00:00
Gleb Smirnoff	90d52f2f21	- Convert so_qlen, so_incqlen, so_qlimit fields of struct socket from short to unsigned short. - Add SYSCTL_PROC() around somaxconn, not accepting values < 1 or > U_SHRTMAX. Before this change setting somaxconn to smth above 32767 and calling listen(fd, -1) lead to a socket, which doesn't accept connections at all. Reviewed by: rwatson Reported by: Igor Sysoev	2005-01-24 12:20:21 +00:00
Jeff Roberson	e1279468ec	- Regen for recent vfs syscall changes. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:50:42 +00:00
Jeff Roberson	29ed48fc6a	- Change all VFS syscalls to MSTD as they all manually deal with giant or the appropriate filesystem locks. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:49:26 +00:00
Jeff Roberson	71ddd673b1	- Add CTR calls to trace the lifecycle of a buffer. - Remove some KASSERTs which are invalid if the appropriate lock is not held. - Slightly restructure bremfree() so that it is more sane. - Change the flush code in bdwrite() to avoid acquiring a mutex whenever possible. - Change the flush code in bdwrite() to avoid holding the bufobj mutex while calling buf_countdeps(). This introduces a lock-order relationship with the softdep lock that can not otherwise be resolved. - Don't set B_DONE until bufdone() is complete, otherwise another processor may believe the buf is done before it is. - Only acquire Giant if the caller has set b_iodone. Don't grab giant around normal bufdone() calls. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:47:04 +00:00
Jeff Roberson	d1fcf3bb31	- Add the tunable and sysctl for the mpsafevfs. It currently defaults to off. - Protect access to mnt_kern_flag with the mointpoint mutex. - Remove some KASSERTs which are not legal checks without the appropriate locks held. - Use VCANRECYCLE() rather than rolling several slightly different checks together. - Return from vtryrecycle() with a recycled vnode rather than a locked vnode. This simplifies some locking. - Remove several GIANT_REQUIRED lines. - Add a few KASSERTs to help with INACT debugging. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:41:01 +00:00
Jeff Roberson	791625d853	- Remove GIANT_REQUIRED where giant is no longer required. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:33:46 +00:00
Jeff Roberson	82d1b24c70	- Remove GIANT_REQUIRED where it is no longer required. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:32:14 +00:00
Jeff Roberson	f50a2d5e2d	- Remove GIANT_REQUIRED where giant is no longer required. - Protect access to mnt_kern_flag with the mountpoint mutex. - Use the appropriate nd flags to deal with giant in vn_open_cred(). We currently determine whether the caller is mpsafe by checking for a valid fdidx. Any caller coming from user-space is now mpsafe and supplies a valid fd. No kenrel callers have been converted to mpsafe, so this check is sufficient for now. - Use VFS_LOCK_GIANT instead of manual giant acquisition where appropriate. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:31:42 +00:00
Jeff Roberson	fc48b760ac	- Protect mnt_kern_flag with the mountpoint's mutex. This is required to make the suspend related functions mpsafe. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:28:41 +00:00
Jeff Roberson	22a960a69c	- Acquire and release Giant as we enter and leave filesystems which require it. - Track the status of Giant with the nd flag HASGIANT. - Release giant on return of namei() callers are not marked MPSAFE as they already own giant. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:27:05 +00:00
Jeff Roberson	94a9458501	- Change all vfs syscalls to use VFS_LOCK_GIANT(), and MPSAFE nds. - Move Giant acquisition into the few vfs syscalls that weren't already directly acquiring it. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:25:44 +00:00
Jeff Roberson	799cc2dcee	- Simplify the cache locking. The lock order relationship with the vnode lock is much simpler than I originally thought it would be. Now, the cache lock is always acquired before the vnode lock. - Provide some gotos in __getcwd() to simplify the unlocking a bit. - Move Giant acquisition down into __getcwd(). Sponsored By: Isilon Systems, Inc.	2005-01-24 10:24:12 +00:00
Jeff Roberson	41bd6c15f2	- Do not use APAUSE if LK_INTERLOCK is set. We lose synchronization if the lockmgr interlock is dropped after the caller's interlock is dropped. - Change some lockmgr KTRs to be slightly more helpful. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:20:59 +00:00
Jeff Roberson	66ca1b4878	- Use VFS_LOCK_GIANT() in place of mtx_lock(&giant), etc. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:19:31 +00:00
Robert Watson	471135a3af	Style cleanup: with removal of mutex operations, we can also remove {}'s from securelevel_gt() and securelevel_ge(). MFC after: 1 week	2005-01-23 21:11:39 +00:00
Robert Watson	0b880542e6	When reading pr_securelevel from a prison, perform a lockless read, as it's an integer read operation and the resulting slight race is acceptable. MFC after: 1 week	2005-01-23 21:01:00 +00:00
Robert Watson	4261ed50fd	When retrieving the current per-jails securelevel for a sysctl read, don't acquire the prison mutex, as it's an integer read and races here don't make a difference. MFC after: 1 week	2005-01-23 20:59:19 +00:00
Robert Watson	5324bda309	When DDB is not defined, don't implement witness_thread_has_locks() and witness_proc_has_locks(), as they are unused, which results in a compiler error. This problem was introduced with the implementation of "show alllocks". Spotted by: Artem Kuchin <matrix at itlegion dot ru>	2005-01-22 21:14:21 +00:00
Robert Watson	14cedfc842	Invoke label initialization, creation, cleanup, and tear-down MAC Framework entry points for System V IPC shared memory. Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net> Obtained from: TrustedBSD Project Sponsored by: DARPA, SPAWAR, McAfee Research	2005-01-22 19:10:25 +00:00
Robert Watson	a6009aa7c1	Invoke label initialization, creation, cleanup, and tear-down MAC Framework entry points for System V IPC semaphores. Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net> Obtained from: TrustedBSD Project Sponsored by: DARPA, SPAWAR, McAfee Research	2005-01-22 19:04:17 +00:00
Robert Watson	e6a543f8db	Invoke label initialization, creation, cleanup, and tear-down MAC Framework entry points for System V IPC message queues. Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net> Obtained from: TrustedBSD Project Sponsored by: DARPA, SPAWAR, McAfee Research	2005-01-22 18:51:43 +00:00
Bosko Milekic	e4eb384b47	Bring in MemGuard, a very simple and small replacement allocator designed to help detect tamper-after-free scenarios, a problem more and more common and likely with multithreaded kernels where race conditions are more prevalent. Currently MemGuard can only take over malloc()/realloc()/free() for particular (a) malloc type(s) and the code brought in with this change manually instruments it to take over M_SUBPROC allocations as an example. If you are planning to use it, for now you must: 1) Put "options DEBUG_MEMGUARD" in your kernel config. 2) Edit src/sys/kern/kern_malloc.c manually, look for "XXX CHANGEME" and replace the M_SUBPROC comparison with the appropriate malloc type (this might require additional but small/simple code modification if, say, the malloc type is declared out of scope). 3) Build and install your kernel. Tune vm.memguard_divisor boot-time tunable which is used to scale how much of kmem_map you want to allott for MemGuard's use. The default is 10, so kmem_size/10. ToDo: 1) Bring in a memguard(9) man page. 2) Better instrumentation (e.g., boot-time) of MemGuard taking over malloc types. 3) Teach UMA about MemGuard to allow MemGuard to override zone allocations too. 4) Improve MemGuard if necessary. This work is partly based on some old patches from Ian Dowse.	2005-01-21 18:09:17 +00:00
Colin Percival	7834081c88	Make "c->c_func = NULL" conditional on CALLOUT_LOCAL_ALLOC in both places where it occurs, not just one. :-) Pointed out by: glebius Pointy had to: cperciva	2005-01-19 21:15:58 +00:00
Colin Percival	0ceba3d69c	Make "c->c_func = NULL" conditional on the CALLOUT_LOCAL_ALLOC flag, i.e., only clear c->c_func if the callout c is being used via the old timeout(9) interface. Requested by: glebius	2005-01-19 20:34:46 +00:00
Colin Percival	86fd19de7b	Clarify the description of the callout_active() macro: It is cleared by callout_stop, callout_drain, and callout_deactivate, but is not automatically cleared when a callout returns.	2005-01-19 19:46:35 +00:00
Paul Saab	efa42cbc93	move kern_nanosleep to sys/syscallsubr.h Requested by: jhb	2005-01-19 18:09:50 +00:00
Paul Saab	0e214fad37	Add a 32bit syscall wrapper for modstat Obtained from: Yahoo!	2005-01-19 17:53:06 +00:00
Paul Saab	7fdf2c856f	- rename nanosleep1 to kern_nanosleep - Add a 32bit syscall entry for nanosleep Reviewed by: peter Obtained from: Yahoo!	2005-01-19 17:44:59 +00:00
Warner Losh	234111d6d0	Introduce bus_free_resource. It is a convenience function which wraps bus_release_resource by grabbing the rid from the resource.	2005-01-19 06:52:19 +00:00
David Xu	a2cc61fa6e	Revert my previous errno hack, that is certainly an issue, and always has been, but the system call itself returns errno in a register so the problem is really a function of libc, not the system call. Discussed with : Matthew Dillion <dillon@apollo.backplane.com>	2005-01-18 13:53:10 +00:00
Poul-Henning Kamp	9fc6aa0618	Detect sign-extension bugs in the ioctl(2) command argument: Truncate to 32 bits and print warning.	2005-01-18 07:37:05 +00:00
Mike Silbersack	6792415119	Rearrange the kninit calls for both directions of a pipe so that they both happen before pipe backing allocation occurs. Previously, a pipe memory shortage would cause a panic due to a KNOTE call on an uninitialized si_note. Reported by: Peter Holm MFC after: 1 week	2005-01-17 07:56:28 +00:00
Poul-Henning Kamp	7bf38aeae7	Fix a bug I introduced in 1.561 which has caused considerable filesystem unhappiness lately. As far as I can tell, no files that have made it safely to disk have been endangered, but stuff in transit has been in peril. Pointy hat: phk	2005-01-16 21:09:39 +00:00
David Xu	b7be40d612	make umtx timeout relative so userland can select different clock type, e.g, CLOCK_REALTIME or CLOCK_MONOTONIC. merge umtx_wait and umtx_timedwait into single function.	2005-01-14 13:38:15 +00:00
Poul-Henning Kamp	7c0745eeae	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
Poul-Henning Kamp	e39db32ab0	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.	2005-01-13 12:25:19 +00:00
Poul-Henning Kamp	63f89abf4a	Change the generated VOP_ macro implementations to improve type checking and KASSERT coverage. After this check there is only one "nasty" cast in this code but there is a KASSERT to protect against the wrong argument structure behind that cast. Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical kernel with no change in performance. We also now run the checking and tracing on VOP's which have been layered by nullfs, umapfs, deadfs or unionfs. Add new (non-inline) VOP_FOO_AP() functions which take a "struct foo_args" argument and does everything the VOP_FOO() macros used to do with checks and debugging code. Add KASSERT to VOP_FOO_AP() check for argument type being correct. Slim down VOP_FOO() inline functions to just stuff arguments into the struct foo_args and call VOP_FOO_AP(). Put function pointer to VOP_FOO_AP() into vop_foo_desc structure and make VCALL() use it instead of the current offsetoff() hack. Retire vcall() which implemented the offsetoff() Make deadfs and unionfs use VOP_FOO_AP() calls instead of VCALL(), we know which specific call we want already. Remove unneeded arguments to VCALL() in nullfs and umapfs bypass functions. Remove unused vdesc_offset and VOFFSET(). Generally improve style/readability of the generated code.	2005-01-13 07:53:01 +00:00
Maxim Sobolev	fdf84ec4c6	When re-connecting already connected datagram socket ensure to clean up its pending error state, which may be set in some rare conditions resulting in connect() syscall returning that bogus error and making application believe that attempt to change association has failed, while it has not in fact. There is sockets/reconnect regression test which excersises this bug. MFC after: 2 weeks	2005-01-12 10:15:23 +00:00
Poul-Henning Kamp	3963baec64	Comment out debugging printf which doesn't compile on amd64.	2005-01-12 10:11:31 +00:00
David Xu	333d4875cd	Let _umtx_op directly return error code rather than from errno because errno can be tampered potentially by nested signal handle. Now all error codes are returned in negative value, positive value are reserved for future expansion.	2005-01-12 05:55:52 +00:00
Poul-Henning Kamp	6ef8480a88	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
Poul-Henning Kamp	6afa350d53	More vnode -> bufobj migration.	2005-01-11 10:16:39 +00:00
Poul-Henning Kamp	8d785753bd	Give flushbuflist() a struct bufv as first argument and avoid home-rolling TAILQ_FOREACH_SAFE(). Loose the error pointer argument and return any errors the normal way. Return EAGAIN for the case where more work needs to be done.	2005-01-11 10:01:54 +00:00
Poul-Henning Kamp	8df6bac4c7	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
David Xu	3e380f0d3d	Break out of loop earlier if it is not timeout.	2005-01-08 06:57:46 +00:00
Robert Watson	2b05b557ff	In acct_process(), do a lockless read of acctvp to see if it's NULL before deciding to do more expensive locking to account for process exit. This acceptable minor race avoids two mutex operations in that highly common case of accounting not being enabled. MFC after: 2 weeks	2005-01-08 04:45:57 +00:00
Robert Watson	fd544ee8f7	In kern_wait(), let the compiler copy the rusage structure rather than an explicit bcopy() -- it probably does a better job.	2005-01-08 04:17:48 +00:00
Colin Percival	e9dec2c41b	Adjust two of my comments to the new world order: Indent protection in the first column is performed using /*, not /-.	2005-01-07 03:25:45 +00:00
Warner Losh	60727d8b86	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Warner Losh	73108a1664	Expand COPYRIGHT inline, per Matthew Dillon's earlier approval.	2005-01-06 23:34:38 +00:00
David Xu	476e1d077e	Return ETIMEDOUT when thread is timeouted since POSIX thread APIs expect ETIMEDOUT not EAGAIN, this simplifies userland code a bit.	2005-01-06 02:08:34 +00:00
John Baldwin	c88379381b	- Move the function prototypes for kern_setrlimit() and kern_wait() to sys/syscallsubr.h where all the other kern_foo() prototypes live. - Resort kern_execve() while I'm there.	2005-01-05 22:19:44 +00:00
John Baldwin	33fb8a386e	Rework the optimization for spinlocks on UP to be slightly less drastic and turn it back on. Specifically, the actual changes are now less intrusive in that the _get_spin_lock() and _rel_spin_lock() macros now have their contents changed for UP vs SMP kernels which centralizes the changes. Also, UP kernels do not use _mtx_lock_spin() and no longer include it. The UP versions of the spin lock functions do not use any atomic operations, but simple compares and stores which allow mtx_owned() to still work for spin locks while removing the overhead of atomic operations. Tested on: i386, alpha	2005-01-05 21:13:27 +00:00
Poul-Henning Kamp	0b3e4fe239	Since we do not support forceful unmount of DEVFS we can do away with the partially implemented vnode-readoption code in vgonechrl().	2005-01-04 08:49:14 +00:00
Marcel Moolenaar	3195113e2a	Regen.	2005-01-03 00:47:23 +00:00
Marcel Moolenaar	fe0ef598b6	uuidgen(2) is MP safe.	2005-01-03 00:45:57 +00:00
Warner Losh	d0d4cc63e3	Implement device_quiesce. This method means 'you are about to be unloaded, cleanup, or return ebusy of that's inconvenient.' The default module hanlder for newbus will now call this when we get a MOD_QUIESCE event, but in the future may call this at other times. This shouldn't change any actual behavior until drivers start to use it.	2004-12-31 20:47:51 +00:00
Pawel Jakub Dawidek	46003fb337	Be consistent and always use form 'return (value);' instead of 'return value;'. We had (before this change) 84 lines where it was style(9)-clean and 15 lines where it was not.	2004-12-31 14:52:53 +00:00
John Baldwin	50aaa791ba	Fix a typo and two whitespace nits.	2004-12-30 22:17:00 +00:00
John Baldwin	f5c157d986	Rework the interface between priority propagation (lending) and the schedulers a bit to ensure more correct handling of priorities and fewer priority inversions: - Add two functions to the sched(9) API to handle priority lending: sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these functions to ask the scheduler to lend a thread a set priority and to tell the scheduler when it thinks it is ok for a thread to stop borrowing priority. The unlend case is slightly complex in that the turnstile code tells the scheduler what the minimum priority of the thread needs to be to satisfy the requirements of any other threads blocked on locks owned by the thread in question. The scheduler then decides where the thread can go back to normal mode (if it's normal priority is high enough to satisfy the pending lock requests) or it it should continue to use the priority specified to the sched_unlend_prio() call. This involves adding a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for priority elevation. - Schedulers now refuse to lower the priority of a thread that is currently borrowing another therad's priority. - If a scheduler changes the priority of a thread that is currently sitting on a turnstile, it will call a new function turnstile_adjust() to inform the turnstile code of the change. This function resorts the thread on the priority list of the turnstile if needed, and if the thread ends up at the head of the list (due to having the highest priority) and its priority was raised, then it will propagate that new priority to the owner of the lock it is blocked on. Some additional fixes specific to the 4BSD scheduler include: - Common code for updating the priority of a thread when the user priority of its associated kse group has been consolidated in a new static function resetpriority_thread(). One change to this function is that it will now only adjust the priority of a thread if it already has a time sharing priority, thus preserving any boosts from a tsleep() until the thread returns to userland. Also, resetpriority() no longer calls maybe_resched() on each thread in the group. Instead, the code calling resetpriority() is responsible for calling resetpriority_thread() on any threads that need to be updated. - schedcpu() now uses resetpriority_thread() instead of just calling sched_prio() directly after it updates a kse group's user priority. - sched_clock() now uses resetpriority_thread() rather than writing directly to td_priority. - sched_nice() now updates all the priorities of the threads after the group priority has been adjusted. Discussed with: bde Reviewed by: ups, jeffr Tested on: 4bsd, ule Tested on: i386, alpha, sparc64	2004-12-30 20:52:44 +00:00
John Baldwin	99b808f461	Whitespace fix.	2004-12-30 20:30:58 +00:00
John Baldwin	63710c4d35	Stop explicitly touching td_base_pri outside of the scheduler and simply set a thread's priority via sched_prio() when that is the desired action. The schedulers will start managing td_base_pri internally shortly.	2004-12-30 20:29:58 +00:00
John Baldwin	9e6c867ccc	Call tty_close() at the very end of ttyclose() since otherwise NULL deferences can occur since tty_close() may end up freeing the tty structure if it drops the last reference to it. Glanced at by: phk	2004-12-30 19:24:49 +00:00
Robert Watson	b36aab857b	Make the sysctls kern.ipc.msgmnb and kern.ipc.msgtql into tunables as is the case for most other sysctls in the System V IPC message queue implementation. PR: 75541 Submitted by: Sergiy Vyshnevetskiy <serg at vostok dot net> MFC after: 2 weeks	2004-12-30 13:56:34 +00:00
David Xu	cc1000ac5b	Make umtx_wait and umtx_wake more like linux futex does, it is more general than previous. It also lets me implement cancelable point in thread library. Also in theory, umtx_lock and umtx_unlock can be implemented by using umtx_wait and umtx_wake, all atomic operations can be done in userland without kernel's casuptr() function.	2004-12-30 02:56:17 +00:00
Alan Cox	956d03da83	Eliminate (now) unnecessary acquisition and release of the global page queues lock.	2004-12-29 04:49:10 +00:00
John Baldwin	83ae089aab	- Up the WITNESS_COUNT macro from 200 to 1024 to support the growing number of lock types in the kernel. This results in an increase of witness data usage from ~145k to ~280k on i386 for kernels with 'options WITNESS'. - Remove the unused witness malloc bucket. Submitted by: Michal Mertl mime at traveller dot cz (1)	2004-12-28 21:21:27 +00:00
Robert Watson	6ce8940626	Attempt to slightly refine the print out from "show alllocks" -- list the process and thread numbers/names on the same line rather than on separate lines, and print the thread pointer not just the tid.	2004-12-27 10:47:08 +00:00
Alexander Kabaev	aa6f98d12f	Do not vput(9) unlocked vnode and do not VREF it with the sole purpose of vputting it back immediately. Complained by: DEBUG_VFS_LOCKS	2004-12-27 05:17:11 +00:00
Jeff Roberson	2ebf8eb132	- Unintentionally checked in a debugging panic. Remove that.	2004-12-26 23:21:48 +00:00
Jeff Roberson	36996b3b7c	- Remove a 4BSD specific hack since this will work on ULE too.	2004-12-26 22:56:51 +00:00
Jeff Roberson	598b368d6c	- Fix a long standing problem where an ithread would not honor sched_pin(). - Remove the sched_add wrapper that used sched_add_internal() as a backend. Its only purpose was to interpret one flag and turn it into an int. Do the right thing and interpret the flag in sched_add() instead. - Pass the flag argument to sched_add() to kseq_runq_add() so that we can get the SRQ_PREEMPT optimization too. - Add a KEF_INTERNAL flag. If KEF_INTERNAL is set we don't adjust the SLOT counts, otherwise the slot counts are adjusted as soon as we enter sched_add() or sched_rem() rather than when the thread is actually placed on the run queue. This greatly simplifies the handling of slots. - Remove the explicit prevention of migration for ithreads on non-x86 platforms. This was never shown to have any real benefit. - Remove the unused class argument to KSE_CAN_MIGRATE(). - Add ktr points for thread migration events. - Fix a long standing bug on platforms which don't initialize the cpu topology. The ksg_maxid variable was never correctly set on these platforms which caused the long term load balancer to never inspect more than the first group or processor. - Fix another bug which prevented the long term load balancer from working properly. If stathz != hz we can't expect sched_clock() to be called on the exact tick count that we're anticipating. - Rearrange sched_switch() a bit to reduce indentation levels.	2004-12-26 22:56:08 +00:00
Robert Watson	b6dd9ef2fe	Add "show alllocks" command to DDB, which dumps a list of processes and threads currently holding sleep mutexes (and spin mutexes for curthread). This can be quite useful in looking for a lock condition summary for a system, as it avoids manually iterating through threads and processes to find all the interesting locks. NB: "alllocks" is up there with "lockedvnods" for a bad argument for show. MFC after: 2 weeks	2004-12-26 22:52:24 +00:00
Jeff Roberson	6a98702001	- Run sched_userret() after thread_userret(). Before, sched_userret() would lower the priority of the returning thread to a user priority before calling into thread_userret() which would call wakeup() which in turn would cause the returning thread to eventually context switch rather than completing its slice. Allowing this thread to complete its slice first yields a 15% performance improvement in super-smack on my dual opteron with 4BSD.	2004-12-26 07:30:35 +00:00
Jeff Roberson	907bdbc288	- Wrap the thread count adjustment in sched_load_add() and sched_load_rem() so that we may place some ktr entries nearby. - Define other KTR_SCHED tracepoints so that we may graph the operation of the scheduler.	2004-12-26 00:16:24 +00:00
Jeff Roberson	81d47d3f4b	- Remove earlier KTR_ULE tracepoints. - Define new KTR_SCHED points so that we can graph the operation of the scheduler.	2004-12-26 00:15:33 +00:00
Jeff Roberson	85da7a569b	- Define KTR points for KTR_SCHED.	2004-12-26 00:14:21 +00:00
David Xu	c180db2bce	Make _umtx_op() as more general interface, the final parameter needn't be timespec pointer, every parameter will be interpreted by its opcode.	2004-12-25 13:02:50 +00:00
David Xu	8b37fbabb4	1. introduce umtx_owner to get an owner of a umtx. 2. add const qualifier to umtx_timedlock and umtx_timedwait. 3. add missing blackets in umtx do_unlock_and_wait.	2004-12-25 12:49:35 +00:00
David Xu	3dd213f160	Add umtxq_lock/unlock around umtx_signal, fix debug kernel compiling, let umtx_lock returns EINTR when it returns ERESTART, this lets userland have chance to back off mtx lock code when needed.	2004-12-24 11:59:20 +00:00
David Xu	a08c214a72	1. Fix race condition between umtx lock and unlock, heavy testing on SMP can explore the bug. 2. Let umtx_wake returns number of threads have been woken.	2004-12-24 11:30:55 +00:00
Robert Watson	0fddf92d72	Assert the sem lock in sem_ref() and sem_rel(), as it is required to safely manipulate the reference count.	2004-12-23 02:22:47 +00:00
Robert Watson	38e6a58c77	Remove temporary debugging printf that was used to detect the presence of a race that had previously caused a panic in order to determine if the fix was for the right problem. It was. MFC after: 2 weeks	2004-12-23 01:19:27 +00:00
Robert Watson	1ef121cf6b	In sonewconn(), the s/if/while/ change to wait for room at the tail of the accept queue is a feature, not a bug/issue, so remove the XXXRW from the comment.	2004-12-23 01:16:21 +00:00
Robert Watson	ba65391172	Remove an XXXRW indicating atomic operations might be used as a substitute for a global mutex protecting the socket count and generation number. The observation that soreceive_rcvoob() can't return an mbuf chain is a property, not a bug, so remove the XXXRW. In sorflush, s/existing/previous/ for code when describing prior behavior. For SO_LINGER socket option retrieval, remove an XXXRW about why we hold the mutex: this is correct and not dubious. MFC after: 2 weeks	2004-12-23 01:07:12 +00:00

... 4 5 6 7 8 ...

8536 Commits