freebsd-skq

Author	SHA1	Message	Date
glebius	19f8b36e66	Merge the //depot/user/yar/vlan branch into CVS. It contains some collective work by yar, thompsa and myself. The checksum offloading part also involves work done by Mihail Balikov. The most important changes: o Instead of global linked list of all vlan softc use a per-trunk hash. The size of hash is dynamically adjusted, depending on number of entries. This changes struct ifnet, replacing counter of vlans with a pointer to trunk structure. This change is an improvement for setups with big number of VLANs, several interfaces and several CPUs. It is a small regression for a setup with a single VLAN interface. An alternative to dynamic hash is a per-trunk static array with 4096 entries, which is a compile time option - VLAN_ARRAY. In my experiments the array is not an improvement, probably because such a big trunk structure doesn't fit into CPU cache. o Introduce an UMA zone for VLAN tags. Since drivers depend on it, the zone is declared in kern_mbuf.c, not in optional vlan(4) driver. This change is a big improvement for any setup utilizing vlan(4). o Use rwlock(9) instead of mutex(9) for locking. We are the first ones to do this! :) o Some drivers can do hardware VLAN tagging + hardware checksum offloading. Add an infrastructure for this. Whenever vlan(4) is attached to a parent or parent configuration is changed, the flags on vlan(4) interface are updated. In collaboration with: yar, thompsa In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>	2006-01-30 13:45:15 +00:00
rwatson	3485717ebc	Move pts master devices into /dev/pty/ instead of littering /dev with them; this is more consistent with the placement of slaves in /dev/pts. The actual name doesn't matter as it's not part of the exposed API or used by libc. In some sense, it would be nice if these device nodes didn't have to have names in devfs at all. Suggested by: Stephen McKay <smckay at internode dot on dot net>	2006-01-30 11:59:19 +00:00
glebius	e8ec4c3a0a	- In pipe() return the error returned by pipe_create(), rather than hardcoded ENFILE, which is incorrect. pipe_create() can fail due to ENOMEM. - Update manual page, describing ENOMEM return code. Reviewed by: arch	2006-01-30 08:25:04 +00:00
jeff	cafb7bd7e0	- Add a comment warning about an anomalous condition where we VOP_UNLOCK and then vrele rather than vput because we would like to VOP_UNLOCK with a specific thread.	2006-01-30 08:21:23 +00:00
jeff	822d3c8355	- Lock access to vrele() with VFS_LOCK_GIANT() rather than mtx_lock(&Giant). Sponsored by: Isilon Systems, Inc.	2006-01-30 08:19:01 +00:00
scottl	66db92c03a	Take a stab at making this compile when WITNESS is not defined. gcc can't figure out the order of operations at line 519, and neither can I, but this is my best guess. Also correct a number of typos and syntax errors.	2006-01-29 20:48:25 +00:00
mlaier	719dd1ebed	firmware(9) is a subsystem to load binary data into the kernel via a specially crafted module. There are several handrolled solutions to this problem in the tree already which will be replaced with this. They include iwi(4), ipw(4), ispfw(4) and digi(4). No objection from: arch MFC after: 2 weeks X-MFC after: some drivers have been converted	2006-01-29 02:52:42 +00:00
mlaier	00e2fdfae2	Unbreak on archs where %d doesn't print uintptr_t arithmetic.	2006-01-29 02:35:22 +00:00
rwatson	252cbc4973	Rename use_old_pty variable to use_pts, as this more accurately reflects the sense of the variable. Suggested by: dwhite	2006-01-28 23:31:19 +00:00
ssouhlal	0ee4f07a23	Don't try to load KLDs if we're mounting the root. We'd otherwise panic. Tested by: kris MFC after: 3 days	2006-01-28 22:58:39 +00:00
kris	a70f9992d4	Back out r1.653; it turns out that the race (or at least the printf) is actually not hard to trigger, and it can cause a lot of console spam. Approved by: kan	2006-01-28 03:06:35 +00:00
imp	7b10eaebae	lock unused when INVARIANTS not defined, so don't declare it then	2006-01-28 00:49:31 +00:00
jhb	bf16c50390	Add a basic reader/writer lock implementation to the kernel. This implementation is by no means perfect as far as some of the algorithms that it uses and the fact that it is missing some functionality (try locks and upgrades/downgrades are not there yet), however it does seem to work in my local testing. There is more detail in the comments in the code, but the short version follows. A reader/writer lock is very much like a regular mutex: it cannot be held across a voluntary sleep; it can be acquired in an interrupt thread; if the lock is held by a writer then the priority of any threads that block on the lock will be lent to the owner; the simple case lock operations all are done in a single atomic op. It also shares some similarities with sx locks: it supports reader/writer semantics (multiple readers, but single writers); readers are allowed to recurse, but writers are not. We can extend this implementation further by either improving algorithms or adding new functionality, but this should at least give us a base to work with now. Reviewed by: arch (in theory) Tested on: i386 (4 cpu box with a kernel module that used 4 threads that randomly chose between read locks and write locks that ran w/o panicking for over a day solid. It usually panic'd within a few seconds when there were bugs during testing. :) The kernel module source is available on request.)	2006-01-27 23:13:26 +00:00
jhb	d9899f5d16	Whitespace.	2006-01-27 23:06:08 +00:00
jhb	752dede518	- Add support for having both a shared and exclusive queue of threads in each turnstile. Also, allow for the owner thread pointer of a turnstile to be NULL. This is needed for the upcoming reader/writer lock implementation. - Add a new ddb command 'show turnstile' that will look up the turnstile associated with the given lock argument and display useful information like the list of threads blocked on each queue, etc. If there isn't an active turnstile for a lock at the specified address, then the function will see if there is an active turnstile at the specified address and display info about it if so. - Adjust the mutex code to handle the turnstile API changes. Tested on: i386 (all), alpha, amd64, sparc64 (1 and 3)	2006-01-27 22:42:12 +00:00
jhb	6160bb7d84	Add a new ddb command 'show sleepq'. It takes a wait channel as an argument and looks for a sleep queue associated with that wait channel. If it finds one it will display information such as the list of threads sleeping on that queue. If it can't find a sleep queue for that wait channel, then it will see if that address matches any of the active sleep queues. If so, it will display information about the sleepq at the specified address.	2006-01-27 22:24:07 +00:00
jhb	a7d556d07e	Add a new sysctl, debug.ktr.clear. If you write a non-zero value to this sysctl then it will clear the KTR buffer. Note that if you have active KTR traces at the same time as a clear operation the behavior is undefined, though it shouldn't panic.	2006-01-27 22:17:31 +00:00
cognet	b1d0cfd746	Merge a bunch of changes that were done in tty_pty.c after tty_pts.c was forked from it, but missed for some reason.	2006-01-27 15:13:40 +00:00
pjd	26c98c76f9	Grr. Backout previous change. vn_open_cred() will call NDFREE() on failure.	2006-01-27 11:25:06 +00:00
pjd	bc66ded70e	Don't forget to call NDFREE(9) in case of vn_open_cred() failure. MFC after: 3 days	2006-01-27 11:19:53 +00:00
davidxu	a8cf0ae648	Just like dofilewrite(), call bwillwrite before fo_write.	2006-01-27 08:02:25 +00:00
davidxu	7a2375b9e4	return final error code in aio_return rather than a hardcoded 0.	2006-01-27 04:14:16 +00:00
cognet	3299c77864	Take into account that bits 0x0000ff00 can't be used for minor.	2006-01-27 00:21:48 +00:00
cognet	998c2ee892	Don't attempt to re-create the /dev entry for the slave part if it already exists when opening the master. This can happen if one opens the master, then opens the slave, then closes and re-opens the master. Reported by: Peter Holm	2006-01-26 20:54:49 +00:00
davidxu	e2e1fa02bc	in aio_aqueue, store same return code into job->_aiocb_private.error. in aio_return, unlock proc lock before suword.	2006-01-26 08:37:02 +00:00
cognet	aff5d6bf80	Bring in a sysv-style pts implementation, as found in the rwatson_pts perforce branch. It works the same as its SysV/Linux counterpart: you obtain a fd to the master pseudo terminal by opening /dev/ptmx, which creates a node for the master as /dev/pty[num] and a node for the slave as /dev/pts/[num]. It should play nicely with the existing BSD ptys. By default, the system will use the BSD ptys, one can set the sysctl kern.pts.enable to 1 to make it use the new pts system. The max number of pty that can be allocated on a system can be changed with the sysctl kern.pts.max. It defaults to 1000, and can be increased, but it is not recommended, as any pty with a number > 999 won't be handled by whatever uses utmp(5).	2006-01-26 01:30:34 +00:00
jhb	39a7d62e79	Axe KTR_ALQ_MASK now that KTR_WITNESS is off unless you hack an #ifdef in subr_witness.c. I did add a comment in subr_witness.c noting that KTR_WITNESS is incompatible with KTR_ALQ.	2006-01-25 14:57:23 +00:00
ups	181ac11e19	Back out changes made in rev. 1.151. They were bogus. Cluebat applied by: jhb@	2006-01-25 02:05:47 +00:00
truckman	9f8a407ccb	Touch all the pages wired by sysctl_wire_old_buffer() to avoid PTE modified bit emulation traps on Alpha while holding locks in the sysctl handler. A better solution would be to pass a hint to the Alpha pmap code to tell it to mark these pages as modified as they are being wired, but that appears to be more difficult to implement. Suggested by: jhb MFC after: 3 days	2006-01-25 01:03:34 +00:00
jhb	4730881b84	Whitespace fix.	2006-01-24 22:24:05 +00:00
jhb	35abe809de	- Add a new KTR_SUBSYS in place of KTR_SPARE1 to serve as a subsystem placeholder similar to KTR_DEV. Explain the use of KTR_DEV and KTR_SUBSYS in a comment as well. - Retire KTR_WITNESS and instead have KTR_WITNESS default to off but use KTR_SUBSYS if it is enabled.	2006-01-24 22:23:45 +00:00
davidxu	1f5105d8a7	Add locking annotation and comments about socket, pipe, fifo problem. Temporarily fix a locking problem for socket I/O.	2006-01-24 07:24:24 +00:00
davidxu	edbd69603d	Er, rescue a deleted comment line.	2006-01-24 02:50:42 +00:00
davidxu	acb3d897fb	More cleanup for aio code: 1) unregister kqueue filter for EVFILT_LIO. 2) free uma_zones. 3) call setsid directly to enter another session rather than implementing by itself. Submitted by: jhb	2006-01-24 02:46:15 +00:00
davidxu	4aec3469f3	Add bracket.	2006-01-23 23:46:30 +00:00
jhb	0c769dbc8b	Fix a vnode reference leak in the ktrace code. We always grab a reference to the vnode at the start of ktr_writerequest() but were missing the corresponding vrele() after we finished the write operation. Reported by: jasone	2006-01-23 21:45:32 +00:00
ups	18ba9270dc	Hopefully fix the "calcru: runtime went backwards from ..." problem by keeping the resource values locked (where needed) while we use them for calculations. MFC after: 3 days	2006-01-23 19:15:13 +00:00
andre	b7dae7ac6c	In mb_zinit_pack() explicitly ignore the return value of uma_zalloc_arg(). The success of the cluster allocation is checked through a field in the mbuf structure. This change is non-functional but properly silences code inspection tools. Found by: Coverity Prevent(tm) Coverity ID: CID807 Sponsored by: TCP/IP Optimization Fundraise 2005	2006-01-23 15:49:01 +00:00
davidxu	14cd6d7f49	Verify all supported notification types.	2006-01-23 10:27:15 +00:00
davidxu	47e7cba205	1) Merge _aio_aqueue and aio_aqueue, check quota in aio_aqueue, so that lio_listio won't exceed the quota. 2) Remove lio_ref_count, it is no longer used.	2006-01-23 02:49:34 +00:00
alc	bd4e907d2a	Remove an unnecessary call to pmap_remove_all(). The given page is not mapped because its contents are invalid. Reviewed by: tegge	2006-01-23 00:00:45 +00:00
truckman	7ef6769a30	Tweak previous vfs_lookup.c commit to return an EINVAL error from lookup() instead of EPERM when a DELETE or RENAME operation is attempted on "..". In kern_unlink(), remap EINVAL errors returned from namei() to EPERM to match existing (and POSIX required) behaviour. Discussed with: bde MFC after: 3 days	2006-01-22 19:37:02 +00:00
davidxu	7c1b5ba95e	Fix a bogus panic.	2006-01-22 09:39:59 +00:00
davidxu	362cb85fb6	Decrease kaio_active_count first, because user process may go away after we notified it.	2006-01-22 09:25:52 +00:00
davidxu	ea6572d219	Regen.	2006-01-22 06:01:48 +00:00
davidxu	72c8645faa	Make aio code MP safe.	2006-01-22 05:59:27 +00:00
njl	94f070e5b3	Add a devd(8) event that is sent after the system resumes. This can be used by utilities to reset moused(8), for example. The syntax is: !system=kern subsystem=power type=resume Note that it would be nice to have notification of suspend, but it's more difficult since there would have to be a method of doing request/ack to userland and automatically timing out if no response. apm(4) has a similar mechanism. MFC after: 2 weeks	2006-01-22 01:06:25 +00:00
rwatson	36caf43985	Convert remaining functions to ANSI C function declarations. MFC after: 1 week	2006-01-22 00:30:46 +00:00
alc	6650221a11	Avoid a vm object reference leak in a rarely used code path. An executable contains at most one PT_INTERP program header. Therefore, the loop that searches for it can terminate after it is found rather than iterating over the entire set of program headers. Eliminate an unneeded initialization. Reviewed by: tegge	2006-01-21 20:11:49 +00:00
truckman	60d9e55a8a	Return EPERM from lookup() if cn_nameiop is DELETE or RENAME and the last component of the path name is "..". This keeps VOP_LOOKUP() from locking vnodes in reverse order. Tested by: Denis Shaposhnikov <dsh AT vlink DOT ru> MFC after: 3 days	2006-01-21 19:57:56 +00:00
rwatson	f04c2fbb7d	Convert remaining functions in vfs_subr.c from K&R prototypes to ANSI C prototypes, as the majority of new functions added have been in this style. Changing prototype style now results in gcc noticing that the implementation of vn_pollrecord() has a 'short' argument instead of 'int' as prototyped in vnode.h, so correct that definition. In practice this didn't matter as only poll flags in the lower 16 bits are used. MFC after: 1 week	2006-01-21 19:42:10 +00:00
jhb	76788b459e	When loading a driver that is a subclass of another driver don't set the devclass's parent pointer if the two drivers share the same devclass. This can happen if the drivers use the same new-bus name. For example, we currently have 3 drivers that use the name "pci": the generic PCI bus driver, the ACPI PCI bus driver, and the OpenFirmware PCI bus driver. If the ACPI PCI bus driver was defined as a subclass of the generic PCI bus driver, then without this check the "pci" devclass would point to itself as its parent and device_probe_child() would spin forever when it encountered the first PCI device that did have a matching driver. Reviewed by: dfr, imp, new-bus@	2006-01-20 21:59:13 +00:00
julian	7e4664b0c3	Return the thread name in the kinfo_proc structure. Also correct the comment describing what the value is.	2006-01-18 20:27:43 +00:00
jhb	59fe0d8fe8	Always include the lock_classes[] array in the kernel. The "is it a spinlock" test in mtx_destroy() needs it even in non-debug kernels. Reported by: danfe	2006-01-18 18:02:50 +00:00
jmallett	e38f514d90	Since p_cansee will end up dereferencing p_ucred, don't check for p_ucred equal to NULL several times later. p_ucred "should probably not" be NULL if the process isn't PRS_NEW anyway. This is strongly reinforced by the fact that we don't see frequent crashes here. Remove the checks after p_cansee and add a KASSERT right before it. Found by: Coverity Prevent (tm) Also trim one nearby trailing space.	2006-01-17 20:25:01 +00:00
jhb	fefbd8d12e	Bah. Fix 'show lock' to actually be compiled in. I had just fixed this in p4 but had an older subr_lock.c on the machine I committed to CVS from.	2006-01-17 16:58:32 +00:00
jhb	c0cf4870f4	Add a new file (kern/subr_lock.c) for holding code related to struct lock_obj objects: - Add new lock_init() and lock_destroy() functions to setup and teardown lock_object objects including KTR logging and registering with WITNESS. - Move all the handling of LO_INITIALIZED out of witness and the various lock init functions into lock_init() and lock_destroy(). - Remove the constants for static indices into the lock_classes[] array and change the code outside of subr_lock.c to use LOCK_CLASS to compare against a known lock class. - Move the 'show lock' ddb function and lock_classes[] array out of kern_mutex.c over to subr_lock.c.	2006-01-17 16:55:17 +00:00
jhb	b24626498e	Initialize thread0.td_contested in init_turnstiles() rather than mutex_init() as it is used by the turnstile code and is not mutex-specific.	2006-01-17 16:47:42 +00:00
jhb	a8d64eb19c	Garbage collect turnstile_empty() since it is unused.	2006-01-17 16:40:20 +00:00
phk	7a469c93bd	Fix an 11 year old mistake: Let the hash functions take a void* instead of unsigned char* argument.	2006-01-17 15:35:57 +00:00
tegge	98fde94067	Set flag in needsbuffer while still holding bqlock to avoid lost wakeup.	2006-01-16 22:09:47 +00:00
csjp	4f56714639	vfs_busy can only return something useful if MNTK_UNMOUNT has been set. Since we are using vfs_busy() on a freshly allocated mount structure, use (void) to show that we do not care about the return value. Found with: Coverity Prevent (tm) MFC after: 2 weeks	2006-01-15 20:14:11 +00:00
rwatson	04dec30982	Cast VFS_STATFS() in vfs_domount() to (void) to indicate that ignoring the return value is intentional: this is simply an attempt to pre-cache the statfs state. Found with: Coverity Prevent (tm) MFC after: 3 days	2006-01-15 20:01:05 +00:00
csjp	ef078951d6	Initialize ki to p->p_aioinfo after we know it's going to be referencing a valid kaioinfo structure. This avoids a potential NULL pointer dereference. Found with: Coverity Prevent(tm) MFC after: 2 weeks	2006-01-15 01:55:45 +00:00
ru	af650f8533	AMD64 also supports disk slices.	2006-01-14 20:47:11 +00:00
phk	7168d9e051	Correct STAILQ usage in purge of resourcelist. Found with: Coverity Prevent(tm)	2006-01-14 09:41:35 +00:00
scottl	645eb22044	Add the following to the taskqueue api: taskqueue_start_threads(struct taskqueue **, int count, int pri, const char *name, ...); This allows the creation of 1 or more threads that will service a single taskqueue. Also rework the taskqueue_create() API to remove the API change that was introduced a while back. Creating a taskqueue doesn't rely on the presence of a process structure, and the proc mechanics are much better encapsulated in taskqueue_start_threads(). Also clean up the taskqueue_terminate() and taskqueue_free() functions to safely drain pending tasks and remove all associated threads. The TASKQUEUE_DEFINE and TASKQUEUE_DEFINE_THREAD macros have been changed to use the new API, but drivers compiled against the old definitions will still work. Thus, recompiling drivers is not a strict requirement.	2006-01-14 01:55:24 +00:00
rwatson	bcba87c522	When calling bioq_first() to see if a queue is empty in bioq_disksort(), don't save the return value as we won't use it. Noticed by: Coverity Prevent analysis tool MFC after: 3 days	2006-01-13 23:27:12 +00:00
rwatson	6bf4e0e89f	Add sosend_dgram(), a greatly reduced and simplified version of sosend() intended for use solely with atomic datagram socket types, and relies on the previous break-out of sosend_copyin(). Changes to allow UDP to optionally use this instead of sosend() will be committed as a follow-up.	2006-01-13 10:22:01 +00:00
rwatson	a59d345748	XXX a comment in uipc_usrreq.c that requires updating.	2006-01-13 00:00:32 +00:00
alfred	941cdcd1a0	Novel idea, don't print a string if it is NULL! This protects people from loading _really_ old modules, like say from 5.x to a 6.x or 7.x system, like for instance right after an upgrade.	2006-01-12 19:15:14 +00:00
scottl	a1e420856f	The interlock in taskqueue_terminate() is completely wrong for taskqueues that use spinlocks. Remove it for now.	2006-01-11 00:37:13 +00:00
phk	57be8af642	Move the old BSD4.3 tty compatibility from (!BURN_BRIDGES && COMPAT_43) to COMPAT_43TTY. Add COMPAT_43TTY to NOTES and */conf/GENERIC Compile tty_compat.c only under the new option. Spit out #warning "Old BSD tty API used, please upgrade." if ioctl_compat.h gets #included from userland.	2006-01-10 09:19:10 +00:00
scottl	706bc421be	Add functions and macros and refactor code to make it easier to manage fast taskqueues. The following have been added: TASKQUEUE_FAST_DEFINE() - create a global task queue. an arbitrary execution context. TASKQUEUE_FAST_DEFINE_THREAD() - create a global taskqueue that uses a dedicated kthread. taskqueue_create_fast() - create a local/private taskqueue. These are all complementary to the standard taskqueue functions. They are primarily useful for fast interrupt handlers that can only use spinlock for synchronization. I personally think that the taskqueue API is starting to get too narrow and hairy, but fixing it will require a major redesign on the API. Such a redesign would be good but would break compatibility with FreeBSD 6.x, so it really isn't desirable at this time. Submitted by: sam	2006-01-10 06:31:12 +00:00
tegge	d344c11861	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman	2006-01-09 20:42:19 +00:00
scottl	5e21ed37ae	If destroying a spinlock, make sure that it is exited properly. Submitted by: jhb MFC After: 3 days	2006-01-08 00:18:34 +00:00
jhb	02a9d26edc	Revert an untested local change that crept in with the lo_class changes and subsequently broke the build. This change is supposed to fix the case where doing a mtx_destroy() off a spin mutex while you hold it fails. If it had been tested I would just leave it in, but it hasn't been tested yet, so it will have to wait until later.	2006-01-07 14:03:15 +00:00
davidxu	a4e63b41b7	Add a new feature to thr_kill, if thread ID argument is -1, send signals to all threads except current sender.	2006-01-07 03:15:21 +00:00
avatar	6ad90eb0a7	Trying to fix compilation bustage introduced in rev1.160 by converting a missing lo_class to LO_CLASSINDEX().	2006-01-07 02:07:08 +00:00
jhb	8f18f21de1	Trim another pointer from struct lock_object (and thus from struct mtx and struct sx). Instead of storing a direct pointer to our lock_class struct in lock_object, reserve 4 bits in the lo_flags field to serve as an index into a global lock_classes array that contains pointers to the lock classes. Only debugging code such as WITNESS or INVARIANTS checks and KTR logging need to access the lock_class member, so this shouldn't add any overhead to production kernels. It might add some slight overhead to kernels using those debug options however. As with the previous set of changes to lock_object, this is going to completely obliterate the kernel ABI, so be sure to recompile all your modules.	2006-01-06 18:07:32 +00:00
jhb	0beb1bf77b	Return error from fget_write() rather than hardcoding EBADF now that fget_write() DTRT. Requested by: bde	2006-01-06 16:34:22 +00:00
jhb	e851a1b52a	Return EBADF rather than EINVAL for FWRITE failure as per POSIX. MFC after: 1 week	2006-01-06 16:30:30 +00:00
jhb	6d056f7e81	Remove XXX comments complaining that write(2) on a read-only descriptor returns EBADF. That errno is correct and is mandated by POSIX. It also goes back to revision 1.1 of our CVS history (i.e. 4.4BSD). The _fget() function should probably also be updated as it currently returns EINVAL in that case rather than EBADF. (It does return EBADF for reads on a write-only descriptor without any XXX comments oddly enough.) Discussed with: scottl, grog, mjacob, bde	2006-01-05 22:20:31 +00:00
bz	326e376458	Minor whitespace cleanup.	2006-01-04 17:40:54 +00:00
phk	3dc6c75d0c	Deorbit ttymalloc() in preference for ttyalloc()	2006-01-04 09:59:07 +00:00
phk	3bbf36cf3f	Use ttyalloc() instead of ttymalloc()	2006-01-04 09:09:46 +00:00
phk	04004b89a1	Use MTX_SYSINIT to set up the tty list mutex.	2006-01-04 08:22:39 +00:00
dds	9a6d7bb900	Fix style bug. Prompted by: bde	2006-01-04 07:50:54 +00:00
dds	8943d89662	Replace tv_usec normalization with the return of EINVAL. This addresses two objections to the previous behavior, and unbreaks the alpha tinderbox build. TODO: update the utimes(2) man page.	2006-01-04 00:47:13 +00:00
dds	e209052111	Normalize the tv_usec part of the utimes(2) arguments to ensure that a file's atime and mtime are only set to correct fractional second values (0-999999000ns with the current interface). Prior to this change users could create files with values outside that range. Moreover, on 32-bit machines tv_usec offsets larger than 4.3s would result in an unnormalized AND wrong timestamp value, due to overflow. MFC after: 1 week	2006-01-03 21:58:21 +00:00
netchild	507a9b3e93	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)	2005-12-31 14:39:20 +00:00
pjd	3cc29e6ebf	Improve memguard a bit: - Provide tunable vm.memguard.desc, so one can specify memory type without changing the code and recompiling the kernel. - Allow to use memguard for kernel modules by providing sysctl vm.memguard.desc, which can be changed to short description of memory type before module is loaded. - Move as much memguard code as possible to memguard.c. - Add sysctl node vm.memguard. and move memguard-specific sysctl there. - Add malloc_desc2type() function for finding memory type based on its short description (ks_shortdesc field). - Memory type can be changed (via vm.memguard.desc sysctl) only if it doesn't exist (will be loaded later) or when no memory is allocated yet. If there is allocated memory for the given memory type, return EBUSY. - Implement two ways of memory types comparison and make safer/slower the default.	2005-12-30 11:45:07 +00:00
pjd	2cf01da412	Print a warning when we miss vinactive() call, because of race in vget(). The race is very real, but conditions needed for triggering it are rather hard to meet now. When gjournal will be committed (where it is quite easy to trigger) we need to fix it. For now, verify if it is really hard to trigger. Discussed with: kan	2005-12-29 22:52:09 +00:00
jhb	c0024de329	patch(1) and I aren't friends today. Axe a duplicate copy of the msleep_spin() function definition. Spotted by: pjd	2005-12-29 21:15:32 +00:00
jhb	dc2b7b5f5d	Add a new function msleep_spin() which is a slightly stripped down version of msleep(). msleep_spin() doesn't support changing the priority of the thread while it is asleep nor does it support interruptible sleeps (PCATCH) or the PDROP flag. It does support timeouts however. It differs from msleep() in that the passed in mutex is a spin mutex. This means one can use msleep_spin() and wakeup() with a spin mutex similar to msleep() and wakeup() with a regular mutex. Note that the spin mutex in question needs to come before sched_lock and the sleepq locks in lock order.	2005-12-29 20:57:45 +00:00
jhb	efb6208d84	Teach WITNESS_SAVE() and WITNESS_RESTORE() to work with spin locks instead of only sleep locks.	2005-12-29 20:54:25 +00:00
jhb	e782568056	Fix a deadlock I introduced with the recently added printf to warn about spin locks that are not in the static order list. It is not safe to call printf while holding the witness spin mutex since the console drivers that back printf may need to use their own spin locks which would try to talk to witness when they were locked. Given this, it is possible for one CPU to lock a console driver lock (such as sio) which then tries to lock the witness lock while another CPU is doing the printf while holding the witness lock. Fix this by moving the printf outside of the witness lock. All other printf's in witness are already correct. MFC after: 3 days	2005-12-29 20:53:01 +00:00
jhb	a31913be5e	Increment kobj_lookup_misses on a miss rather than decrementing it. Otherwise, the miss count is actually -kobj_lookup_misses. Mostly a pedantic change as KOBJ_STATS isn't on by default.	2005-12-29 18:00:42 +00:00
davidxu	39b63efe82	Add code to report zombie state. PR: threads/91044 MFC after: 3 days	2005-12-29 13:00:42 +00:00
kan	cac531bbcc	Trim trailing whitespace.	2005-12-28 17:13:31 +00:00
pjd	6617f46a01	In realloc(9), determine size of the original block based on UMA_SLAB_MALLOC flag. In some circumstances (I observed it when I was doing a lot of reallocs) UMA_SLAB_MALLOC can be set even if us_keg != NULL. If this is the case we have wonderful, silent data corruption, because less data is copied to the newly allocated region than should be. I'm not sure when this bug was introduced, it could be there undetected for years now, as we don't have a lot of realloc(9) consumers and it was hard to reproduce it... ...but what I know for sure, is that I don't want to know who introduced the bug:) It took me two/three days to track it down (of course most of the time I was looking for the bug in my own code).	2005-12-28 01:53:13 +00:00
davidxu	50b9056f24	Use variable i instead of variable cpus as an index to get correct kseq.	2005-12-27 12:02:03 +00:00
sobomax	4c47ec5eaa	Fix breakage introduced in the previous commit.	2005-12-26 22:32:52 +00:00
sobomax	34fa5a81a5	Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually allow executing elf dynamic binaries (aka shared libraries). When it is requested to execute ET_DYN elf image check if this flag is on after we know the elf brand allowing execution if so. PR: kern/87615 Submitted by: Marcin Koziej <creep@desk.pl>	2005-12-26 21:23:57 +00:00
alc	8d1c855285	Maintain the lock on the vnode for most of exec_elfN_imgact(). Specifically, it is required for the I/O that may be performed by elfN_load_section(). Avoid an obscure deadlock in the a.out, elf, and gzip image activators. Add a comment describing why the deadlock does not occur in the common case and how it might occur in less usual circumstances. Eliminate an unused variable from exec_aout_imgact(). In collaboration with: tegge	2005-12-24 04:57:50 +00:00
davidxu	4b072f53d2	Avoid kernel panic when attaching a process which may not be stopped by debugger, e.g process is dumping core. Only access p_xthread if P_STOPPED_TRACE is set, this means thread is ready to exchange signal with debugger, print a warning if P_STOPPED_TRACE is not set due to some bugs in other code, if there is. The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and is slightly adjusted.	2005-12-24 02:59:29 +00:00
jeff	e2af894dc5	- Remove an unused include. Submitted by: Antoine Brodin <antoine.brodin@laposte.net>	2005-12-23 21:32:40 +00:00
phk	213233d76c	Regenerate sysent with new abort2 system call. Implement abort2(const char *reason, int narg, void **args); Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>	2005-12-23 11:58:42 +00:00
phk	69832fdcf6	Add abort2() systemcall.	2005-12-23 11:54:11 +00:00
phk	4bbae65b4a	Make sbuf_copyin() return the number of bytes copied on success. Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>	2005-12-23 11:49:53 +00:00
scottl	90a17769ed	Create the taskqueue_fast handler with INTR_MPSAFE so that it doesn't run with Giant. MFC After: 3 days	2005-12-23 06:18:33 +00:00
jhb	cb0d490ebe	Tweak how the MD code calls the fooclock() methods some. Instead of passing a pointer to an opaque clockframe structure and requiring the MD code to supply CLKF_FOO() macros to extract needed values out of the opaque structure, just pass the needed values directly. In practice this means passing the pair (usermode, pc) to hardclock() and profclock() and passing the boolean (usermode) to hardclock_cpu() and hardclock_process(). Other details: - Axe clockframe and CLKF_FOO() macros on all architectures. Basically, all the archs were taking a trapframe and converting it into a clockframe one way or another. Now they can just extract the PC and usermode values directly out of the trapframe and pass it to fooclock(). - Renamed hardclock_process() to hardclock_cpu() as the latter is more accurate. - On Alpha, we now run profclock() at hz (profhz == hz) rather than at the slower stathz. - On Alpha, for the TurboLaser machines that don't have an 8254 timecounter, call hardclock() directly. This removes an extra conditional check from every clock interrupt on Alpha on the BSP. There is probably room for even further pruning here by changing Alpha to use the simplified timecounter we use on x86 with the lapic timer since we don't get interrupts from the 8254 on Alpha anyway. - On x86, clkintr() shouldn't ever be called now unless using_lapic_timer is false, so add a KASSERT() to that effect and remove a condition to slightly optimize the non-lapic case. - Change prototype of arm_handler_execute() so that its first arg is a trapframe pointer rather than a void pointer for clarity. - Use KCOUNT macro in profclock() to look up the kernel profiling bucket. Tested on: alpha, amd64, arm, i386, ia64, sparc64 Reviewed by: bde (mostly)	2005-12-22 22:16:09 +00:00
alc	09b6655974	Maintain the vnode lock throughout elfN_load_file() rather than releasing it and reacquiring it in vrele(). Consequently, there is no reason to increase the reference count on the vm object caching the file's pages. Reviewed by: tegge Eliminate unused parameters to elfN_load_file().	2005-12-21 18:58:40 +00:00
alc	4bc5d218ff	Eliminate an unneeded (vm_prot_t) parameter from two functions. Eliminate unnecessary uses of a local variable. Reviewed by: tegge	2005-12-20 23:42:18 +00:00
pjd	b98bf8f1d5	Reduce Giant scope a bit, as fdrop() is believed to be MPSAFE. The purpose of this change is consistency (not performance improvement:)), as it was hard to tell if fdrop() is MPSAFE or not when I saw it sometimes under the Giant and sometimes without it. Glanced at by: ssouhlal, kan	2005-12-20 00:49:59 +00:00
pjd	d4461845a5	vfs_mount_alloc() always returns 0, but what we really want is newly allocated 'struct mount *' pointer, so simplify code a bit and return the pointer directly. Reviewed by: ssouhlal	2005-12-20 00:43:51 +00:00
pjd	eb0bfcfd9a	Use 'td' instead of 'curthread'.	2005-12-19 16:27:13 +00:00
davidxu	9dda14459d	Fix a bug in slice calculation code, current code uses hz but sched_clock() is called by state clock. Submitted by: taku at tackymt dot homeip dot net	2005-12-19 08:26:09 +00:00
njl	035fda7990	Remove the KTR for hardclock completely. It seems to not be useful. Requested by: jhb	2005-12-18 18:11:55 +00:00
njl	731935d9f8	Restore KTR_CRITICAL but conditionally compile it in as KTR_SCHED. Requested by: scottl, jhb	2005-12-18 18:10:57 +00:00
marcel	0a081d09f4	Make our ELF64 type definitions match standards. In particular this means: o Remove Elf64_Quarter, o Redefine Elf64_Half to be 16-bit, o Redefine Elf64_Word to be 32-bit, o Add Elf64_Xword and Elf64_Sxword for 64-bit entities, o Use Elf_Size in MI code to abstract the difference between Elf32_Word and Elf64_Word. o Add Elf_Ssize as the signed counterpart of Elf_Size. MFC after: 2 weeks	2005-12-18 04:52:37 +00:00
alc	8f7e8790b1	Correct a long-standing problem in elfN_map_insert(): In order to copy a page to user space, the user space mapping must allow write access. In collaboration with: tegge@ MFC after: 3 weeks	2005-12-17 19:40:47 +00:00
njl	4c2aff8681	Clean up unused or poorly utilized KTR values. Remove KTR_FS, KTR_KGDB, and KTR_IO as they were never used. Remove KTR_CLK since it was only used for hardclock firing and use KTR_INTR there instead. Remove KTR_CRITICAL since it was only used for crit enter/exit and use KTR_CONTENTION instead.	2005-12-17 03:57:10 +00:00
jhb	ce80df24ac	- Use uintfptr_t rather than int for the kernel profiling index (though it really should be a fptrdiff_t if we had that) in profclock(). - Don't try to profile kernel pc's that are >= the kernel lowpc to avoid underflows when computing a profiling index. - Use the PC_TO_I() macro to compute the kernel profiling index rather than doing it inline. Discussed with: bde	2005-12-16 22:11:52 +00:00
jhb	60c3b40e9e	Change the addupc_*() functions to use the uintfptr_t type for pc rather than uintptr_t as that is technically more correct.	2005-12-16 22:08:32 +00:00
alc	8df8bb9f23	Style: The second argument to vm_map_find() should be NULL instead of 0.	2005-12-16 19:14:25 +00:00
alc	f69d4d5fa8	Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the ephemeral mappings that are used as the source for three copy operations from kernel space to user space. There are two reasons for making this change: (1) Under heavy load exec_map can fill up causing vm_map_find() to fail. When it fails, the nascent process is aborted (SIGABRT). Whereas, this reimplementation using sf_buf_alloc() sleeps. (2) Although it is possible to sleep on vm_map_find()'s failure until address space becomes available (see kmem_alloc_wait()), using sf_buf_alloc() is faster. Furthermore, the reimplementation uses a CPU private mapping, avoiding a TLB shootdown on multiprocessors. Problem uncovered by: kris@ Reviewed by: tegge@ MFC after: 3 weeks	2005-12-16 18:34:14 +00:00
delphij	4ea00e0984	In pipe_write(): when uiomove() fails, do not spin on it forever. Submitted by: Kostik Belousov <kostikbel at gmail.com> on -current@ Message-ID: <20051216151016.GE84442@deviant.zoral.local> MFC After: 3 weeks	2005-12-16 18:32:39 +00:00
davidxu	2673c91f24	Replace selwakeuppri with selwakeup, let scheduler figure out appropriate thread priority.	2005-12-16 15:01:16 +00:00
emaste	a7aeead21d	When using m_dup(9) to copy more than MHLEN bytes of data, don't create an mbuf chain that starts with a cluster containing just MHLEN bytes. This happened because m_dup called m_get or m_getcl depending on the amount of data to copy, but then always set the size available in the first mbuf to MHLEN. Submitted by: Matt Koivisto <mkoivisto at sandvine dot com> Approved by: jmg Silence from: rwatson (mentor)	2005-12-14 23:34:26 +00:00
mux	b29e3549b8	Fix a bunch of SYSCTL_INT() that should have been SYSCTL_ULONG() to match the type of the variable they are exporting. Spotted by: Thomas Hurst <tom@hur.st> MFC after: 3 days	2005-12-14 22:27:48 +00:00
des	5d3c44687b	Eradicate caddr_t from the VFS API.	2005-12-14 00:49:52 +00:00
jhb	c69212d7ad	Add a new 'show lock' command to ddb. If the argument has a valid lock class, then it displays various information about the lock and calls a new function pointer in lock_class (lc_ddb_show) to dump class-specific information about the lock as well (such as the owner of a mutex or xlock'ed sx lock). This is easier than staring at hex dumps of locks to figure out who owns the lock, etc. Note that extending lock_class doesn't affect the ABI for any kernel modules as the only code that deals with lock_class structures directly is kern_mutex.c, kern_sx.c, and witness. MFC after: 1 week	2005-12-13 23:14:35 +00:00
davidxu	a7ea81a09f	Stop fiddling thread priority with msleep, eliminating unnecessary context switching. This improves performance about 30% on UP machine.	2005-12-12 05:04:56 +00:00
rodrigc	4377e3b906	Contributions from XFS for FreeBSD project: - Implement cv_wait_unlock() method which has semantics compatible with the sv_wait() method in IRIX. For cv_wait_unlock(), the lock must be held before entering the function, but is not held when the function is exited. - Implement the existing cv_wait() function in terms of cv_wait_unlock(). Submitted by: kan Feedback from: jhb, trhodes, Christoph Hellwig <hch at infradead dot org>	2005-12-12 00:02:22 +00:00
alc	a5d0ac5faf	Remove unneeded calls to pmap_remove_all(). The given page is not mapped. Reviewed by: tegge	2005-12-11 22:06:57 +00:00
andre	5751cf9192	Hide the 4k mbuf clusters if the normal clusters are defined to be 4k already. This unbreaks tinderbox. Submitted by: ru	2005-12-10 15:21:04 +00:00
davidxu	c7a54e3a64	Fix a compiler warning on 64-bit systems.	2005-12-09 13:16:48 +00:00
davidxu	52e3a6cedd	Add a sysctl to force a process to sigexit if a trap signal is being held by the current thread or ignored by the current process; otherwise, it is very possible the thread will enter an infinite loop and lead to an administrator's nightmare.	2005-12-09 08:29:29 +00:00
davidxu	1628e16677	Register itimers_event_hook as a kernel event handler, so I don't have to duplicate code to call it in exec() and exit1().	2005-12-09 05:43:26 +00:00
davidxu	459600a8d3	Comment out mqfs_create_link. Inline some small functions.	2005-12-09 02:38:29 +00:00
davidxu	c0c32b144f	Now SIGCHLD is always queued.	2005-12-09 02:27:55 +00:00
davidxu	2ee31f310b	Cleanup sigqueue sysctl.	2005-12-09 02:26:44 +00:00
andre	143b5d29e0	Add an API for jumbo mbuf cluster allocation and also provide 4k clusters in addition to 9k and 16k ones. struct mbuf *m_getjcl(int how, short type, int flags, int size); void *m_cljget(struct mbuf *m, int how, int size); m_getjcl() returns an mbuf with a cluster of the specified size attached like m_getcl() does for 2k clusters. m_cljget() is different from m_clget() as it can allocate clusters without attaching them to an mbuf. In that case the return value is the pointer to the cluster of the requested size. If an mbuf was specified, it gets the cluster attached to it and the return value can be safely ignored. For size both take MCLBYTES, MJUM4BYTES, MJUM9BYTES, MJUM16BYTES. Reviewed by: glebius Tested by: glebius Sponsored by: TCP/IP Optimization Fundraise 2005	2005-12-08 13:13:06 +00:00
rodrigc	89bb16053b	In devfs_first(), set mp->mnt_opt to a valid empty list of mount options instead of leaving it NULL. This eliminates a kernel panic when trying to do a mount -o update of /dev. Noticed by: cjsp Reviewed by: phk	2005-12-08 04:27:53 +00:00
rodrigc	d4df430592	Add "errmsg" to list of global mount options.	2005-12-08 04:09:29 +00:00
rodrigc	5a03a98174	Changes imported from XFS for FreeBSD project: - add fields to struct buf (needed by XFS) - 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3 - b_pin_count, count of pinned buffer - add new B_MANAGED flag - add breada() function to initiate asynchronous I/O on read-ahead blocks. - add bufdone_finish(), bpin(), bunpin_wait() functions Patches provided by: kan Reviewed by: phk Silence on: arch@	2005-12-07 03:39:08 +00:00
alc	71b52e0fd2	Reduce the scope of the page queues lock in exec_map_first_page(). The vm object lock is sufficient for reading a page's PG_BUSY and busy flags. MFC after: 1 week	2005-12-06 07:39:36 +00:00
davidxu	34bbe012ae	o Turn on MPSAFE flag for mqueuefs. o Reuse si_mqd field in siginfo_t, this also gives userland information about which descriptor is notified.	2005-12-06 06:22:12 +00:00
davidxu	2d6ec412df	Fix a lock leak in childproc_continued().	2005-12-06 05:30:13 +00:00
jhb	7e42aad088	Tweak witness handling of lock object to shave 2 pointers off of each lock object (and thus off of each mutex and sx lock): - Rename the all_locks list to pending_locks and only put locks initialized before SI_SUB_WITNESS on the list so that the SI_SUB_WITNESS can add them to witness once it starts up. - Now that pending_locks is only used during early startup, change it from a TAILQ to an STAILQ. This removes a pointer from the STAILQ_ENTRY in struct lock_object. - Since the pending_locks list is only used during the single-threaded early boot it no longer needs to be protected by a mutex, so remove all_mtx. - Since the lo_list member of struct lock_object is now only used during early boot before witness is running, collapse lo_list and lo_witness into a union. This shaves the second pointer off of struct lock_object. - Axe lock_cur_cnt and lock_max_cnt. With these changes, struct mtx shrinks from 36 to 28 bytes on 32-bit platforms and from 72 to 56 bytes on 64-bit platforms. Note that this commit will completely and utterly destroy the kernel ABI, so no MFC. Tested on: alpha, amd64, i386, sparc64	2005-12-05 20:45:24 +00:00
davidxu	1e351478d4	After reading some documents, I realized SIGEV_NONE != NULL, also fix code in mqueue_send_notification to handle SIGEV_NONE.	2005-12-05 04:41:32 +00:00
davidxu	d4b584de18	Handle SIGEV_NONE, if notification is SIGEV_NONE, error status and return status will be set, but no notification will be registered. Increase hard limit of maxmsg to 100, so posixtestsuite ports can run.	2005-12-05 03:23:27 +00:00
ru	522e9c2b7b	Fix -Wundef.	2005-12-04 02:12:43 +00:00
rodrigc	fa8afa1a00	Add "rdonly" to global_opts, and parse it in vfs_donmount(). Requested by: rwatson	2005-12-03 12:04:20 +00:00
rodrigc	16d338ecbb	- Add "rw" mount option to global_opts. - In vfs_donmount(), parse "ro", "noro", and "rw", in order to set or unset the MNT_RDONLY filesystem flag.	2005-12-03 01:26:27 +00:00
davidxu	e184004d54	1. Cleanup including. 2. Set configuration value for CTL_P1003_1B_MESSAGE_PASSING.	2005-12-02 14:09:32 +00:00
davidxu	a8411b92b7	1. Check if message priority is less than MQ_PRIO_MAX. 2. Use getnanotime instead of getnanouptime. 3. Don't free message in _mqueue_send, mqueue_send will free it.	2005-12-02 08:23:49 +00:00
davidxu	739ca77c48	1. Set timer configuration values for sysconf(). 2. Set overrun limit to INT_MAX, report ERANGE error if overrun will be greater than INT_MAX.	2005-12-01 07:56:15 +00:00
davidxu	9208ca9d98	set signal queue values for sysconf().	2005-12-01 00:25:50 +00:00
davidxu	5d50adf57d	Last step to make mq_notify conform to POSIX standard, If the process has successfully attached a notification request to the message queue via a queue descriptor, file closing should remove the attachment.	2005-11-30 05:12:03 +00:00
jhb	4b322c88f2	Fix snderr() to not leak the socket buffer lock if an error occurs in sosend(). Robert accidentally changed the snderr() macro to jump to the out label which assumes the lock is already released rather than the release label which drops the lock in his previous change to sosend(). This should fix the recent panics about returning from write(2) with the socket lock held and the most recent LOR on current@.	2005-11-29 23:07:14 +00:00
rwatson	079403d5b7	Move zero copy statistics structure before sosend_copyin(). MFC after: 1 month Reported by: tinderbox, sam	2005-11-28 21:45:36 +00:00
jhb	76c1ae2002	When checking to see if a process has exceeded its time limit, flag the process as over the limit when its time is >= to the limit rather than > the limit. Technically, if p->p_rux.rux_runtime.sec == p->p_pcpulimit and p->p_rux.rux_runtime.frac == 0, the process hasn't exceeded the limit yet. However, having the fraction exactly equal to 0 is rather rare, and it is not worth the overhead to handle that edge case. With just the > comparison, the process would have to exceed its limit by almost a second before it was killed. PR: kern/83192 Submitted by: Maciej Zawadzinski mzawadzinski at gmail dot com Reviewed by: bde MFC after: 1 week	2005-11-28 19:09:08 +00:00
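The effect of switching from > to >= in the commit above can be illustrated with a small userland sketch. It is not the kernel code: `struct runtime`, `over_limit_old()` and `over_limit_new()` are hypothetical names, and the seconds-plus-fraction layout is only assumed to resemble the rux_runtime value the message mentions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a runtime counter: whole seconds plus a
 * 64-bit binary fraction of a second.
 */
struct runtime {
	int64_t  sec;
	uint64_t frac;
};

/* Old check: fires only once the whole-second count passes the limit. */
static bool
over_limit_old(const struct runtime *rt, int64_t limit)
{
	return (rt->sec > limit);
}

/* New check: fires as soon as the whole-second count reaches the limit. */
static bool
over_limit_new(const struct runtime *rt, int64_t limit)
{
	return (rt->sec >= limit);
}
```

With a limit of 10 seconds and a runtime of roughly 10.5 seconds, only the second form reports the process as over the limit; the first form waits until the seconds field reaches 11.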
rwatson	45b44c73b7	Break out functionality in sosend() responsible for building mbuf chains and copying in mbufs from the body of the send logic, creating a new function sosend_copyin(). This change makes sosend() almost readable, and will allow the same logic to be used by tailored socket send routines. MFC after: 1 month Reviewed by: andre, glebius	2005-11-28 18:09:03 +00:00
davidxu	0b4ce8e3e1	Fix a stupid compiler warning, remove a redundant line.	2005-11-27 22:59:47 +00:00
davidxu	d7421ba6b3	Change filesystem name from mqueue to mqueuefs for style consistency. Suggested by: rwatson	2005-11-27 08:30:12 +00:00
davidxu	d81e111959	Regen.	2005-11-27 01:23:31 +00:00
davidxu	d0fa8c77de	Don't use OpenBSD syscall numbers, instead, use new syscall numbers for POSIX message queue. Suggested by: rwatson	2005-11-27 01:13:00 +00:00
rwatson	76b544b4b3	Add several aliases for existing clockid_t names to indicate that the application wishes to request high precision time stamps be returned (alias, existing name): CLOCK_REALTIME_PRECISE, CLOCK_REALTIME; CLOCK_MONOTONIC_PRECISE, CLOCK_MONOTONIC; CLOCK_UPTIME_PRECISE, CLOCK_UPTIME. Add experimental low-precision clockid_t names corresponding to these clocks, but implemented using cached timestamps in kernel rather than a full time counter query. This offers a minimum update rate of 1/HZ, but in practice will often be more frequent due to the frequency of time stamping in the kernel (new clockid_t name, existing clockid_t it approximates): CLOCK_REALTIME_FAST, CLOCK_REALTIME; CLOCK_MONOTONIC_FAST, CLOCK_MONOTONIC; CLOCK_UPTIME_FAST, CLOCK_UPTIME. Add one additional new clockid_t, CLOCK_SECOND, which returns the current second without performing a full time counter query or cache lookup overhead to make sure the cached timestamp is stable. This is intended to support very low granularity consumers, such as time(3). The names, visibility, and implementation of the above are subject to change, and will not be MFC'd any time soon. The goal is to expose lower quality time measurement to applications willing to sacrifice accuracy in performance critical paths, such as when taking time stamps for the purpose of rescheduling select() and poll() timeouts. Future changes might include retrofitting the time counter infrastructure to allow the "fast" time query mechanisms to use a different time counter, rather than a cached time counter (i.e., TSC). NOTE: With different underlying time mechanisms exposed, using different time query mechanisms in the same application may result in relative non-monotonicity or the appearance of clock stalling for a single clockid_t, as a cached time stamp queried after a precision time stamp lookup may be "before" the time returned by the earlier live time counter query.	2005-11-27 00:55:18 +00:00
davidxu	e674eb31f2	Regen.	2005-11-26 12:45:22 +00:00
davidxu	dac7c81b62	Bring in experimental kernel support for POSIX message queue.	2005-11-26 12:42:35 +00:00
rodrigc	dc0fe47898	In nmount() and vfs_donmount(), do not strcmp() the options in the iovec directly. We need to copyin() the strings in the iovec before we can strcmp() them. Also, when we want to send the errmsg back to userspace, we need to copyout()/copystr() the string. Add a small helper function vfs_getopt_pos() which takes in the name of an option, and returns the array index of the name in the iovec, or -1 if not found. This allows us to locate an option in the iovec without actually manipulating the iovec members directly via strcmp(). Noticed by: kris on sparc64	2005-11-23 20:51:15 +00:00
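The lookup described for vfs_getopt_pos() above (return the array index of an option name, or -1) can be sketched in userland as follows. This is an illustrative analogue only, not the kernel helper: `opt_pos()` is a hypothetical name, the strings are assumed to be NUL-terminated and already local, and names and values are assumed to alternate in the array as in the nmount(2) calling convention.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical userland analogue of an option-position lookup.
 * Even slots of iov[] hold option names and odd slots hold values.
 * Returns the index of the matching name slot, or -1 if not found.
 */
static int
opt_pos(const struct iovec *iov, int iovcnt, const char *name)
{
	assert(iov != NULL && name != NULL);

	for (int i = 0; i + 1 < iovcnt; i += 2) {
		const char *cand = iov[i].iov_base;

		/* Only name slots are compared; value slots are skipped. */
		if (cand != NULL && strcmp(cand, name) == 0)
			return (i);
	}
	return (-1);
}
```

Returning a position rather than a pointer lets the caller find the matching value slot (index + 1) without comparing against the caller-supplied buffers more than once.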
jdp	88e469fc50	Fix a bug in the loop in sonewconn that makes room on the incomplete connection queue for a new connection. It was removing connections from the wrong list. Submitted by: Paul Mikesell Sponsored by: Isilon Systems MFC after: 1 week	2005-11-22 01:55:29 +00:00
marcel	7fe698f697	Fix bug introduced in revision 1.186: When all file systems have a time stamp of zero, which is the case for example when the root file system is on a read-only medium, we ended up not calling inittodr() at all. A potential uncleanliness existed as well. If multiple file systems had a non-zero time stamp, we would call inittodr() multiple times. While this should not be harmful, it's definitely not ideal. Fix both issues by iterating over the mounted file systems to find the largest time stamp and call inittodr() exactly once with that time stamp. This could of course be a zero time stamp if none of the mounted file systems have a non-zero time stamp. In that case the annoying errors mentioned in the commit log for revision 1.186 still haven't been avoided. The bottom line is that inittodr() should not complain when it gets a time base of zero. At the time of this commit only alpha seems to have that problem. Reported by: Dario Freni (saturnero at freesbie dot org) MFC after: 1 week	2005-11-19 21:51:45 +00:00
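The fix described above (scan all mounted file systems for the largest time stamp, then initialize the clock exactly once) might be sketched like this. The function name `latest_timestamp()` and the flat array of time stamps are assumptions made for illustration; the kernel walks its mount list instead.

```c
#include <stddef.h>
#include <time.h>

/*
 * Return the largest time stamp in stamps[0..n-1], or 0 when every
 * entry is zero (for example, a read-only root) or the list is empty.
 */
static time_t
latest_timestamp(const time_t *stamps, size_t n)
{
	time_t best = 0;

	for (size_t i = 0; i < n; i++) {
		if (stamps[i] > best)
			best = stamps[i];
	}
	return (best);
}
```

A caller would then pass the single result to its clock-initialization routine once, instead of calling that routine per file system or not at all.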
rodrigc	9cf0eb5132	Parse more mount options in vfs_donmount(), before vfs_domount() is called. It looks like there are lots of different mount flags checked in vfs_domount(), so we need to do the parsing for these particular mount flags earlier on. The new flags parsed are: async, force, multilabel, noasync, noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow, snapshot, suiddir, sync, union. Existing code which uses mount() to mount UFS filesystems is not affected, but new code which uses nmount() to mount UFS filesystems should behave better.	2005-11-19 21:22:21 +00:00
andre	73d3dcb9b2	Add CLOCK_UPTIME to clock_gettime(2) reporting the current uptime measured in SI seconds. Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-18 16:51:13 +00:00
rodrigc	c677f67c67	In vfs_nmount(), check to see if "update" mount option was passed in, and if so, set MNT_UPDATE filesystem flag. vfs_nmount() calls vfs_domount(), and there is special logic inside vfs_domount() if MNT_UPDATE is set. This is very important when we want to do an update mount of the root filesystem, using nmount().	2005-11-18 01:31:10 +00:00
yongari	8b951cd641	Prefer NULL to 0. Add missing lock/unlock in sysctl handler. Protect against accessing a NULL pointer when resource allocation failed. style(9) Reviewed by: scottl MFC after: 1 week	2005-11-17 08:56:21 +00:00
cognet	48c06903ba	Add a new sysctl, kern.elf[32|64].can_exec_dyn. When set to 1, one can execute a ET_DYN binary (shared object). This does not make much sense, but some linux scripts expect to be able to execute /lib/ld-linux.so.2 (ldd comes to mind). The sysctl defaults to 0. MFC after: 3 days	2005-11-14 22:24:00 +00:00
rwatson	2fab30d9d4	In ktr_getrequest(), acquire ktrace_mtx earlier -- while the race currently present is minor and offers no real semantic issues, it also doesn't make sense since an earlier lockless check has already occurred. Also hold the mutex longer, over a manipulation of per-process ktrace state, which requires synchronization. MFC after: 1 month Pointed out by: jhb	2005-11-14 19:30:09 +00:00
rwatson	2a5785fb21	Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueing and dropped records. Record loss was previously possible when the global pool of records became depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events.
Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>	2005-11-13 13:27:44 +00:00
rodrigc	2630cf9721	style(9) cleanups. Spotted by: njl, bde	2005-11-12 14:41:44 +00:00
rwatson	257af099d1	Significant refactoring of the accounting code to improve locking and VFS happiness, as well as correct other bugs: - Replace notion of current and saved accounting credential/vnode with a single credential/vnode and an acct_suspended flag. This simplifies the accounting logic substantially. - Replace acct_mtx with acct_sx, a sleepable lock held exclusively during reconfiguration and space polling, but shared during log entry generation. This avoids holding a mutex over sleepable VFS operations. - Hold the sx lock over the duration of the I/O so that the vnode I/O cannot occur after vnode close, which could occur previously if accounting was disabled as a process exited. - Write the accounting log entry with Giant conditionally acquired based on the file system where the log is stored. Previously, the accounting code relied on the caller acquiring Giant. - Acquire Giant conditionally in the accounting callout based on the file system where the accounting log is stored. Run the callout MPSAFE. - Expose acct_suspended via a read-only sysctl so it is possible to programmatically determine whether accounting is suspended or not without attempting to parse logs. - Check both acct_vp and acct_suspended lock-free before entering the accounting sx lock in acct(). - When accounting is disabled due to a VBAD vnode (i.e., forceable unmount), generate a log message indicating accounting has been disabled. - Correct a long-standing bug in how free space is calculated and compared to the required space: generate and compare signed results, not unsigned results, or negative free space will cause accounting to not be suspended when required, or worse, incorrectly resumed once negative free space is reached. MFC after: 2 weeks	2005-11-12 10:45:13 +00:00
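The signed-versus-unsigned free space bug in the last item above can be reproduced in a few lines. This is a sketch under stated assumptions rather than the actual accounting code: the function names are hypothetical, and the threshold is assumed to be a percentage of total blocks, with a signed available-block count that can go negative once a file system is filled past its reserve.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Buggy form: the signed free-block count is compared as unsigned,
 * so a negative value turns into a huge positive one and the
 * "suspend accounting" condition is never met.
 */
static bool
should_suspend_unsigned(int64_t bavail, uint64_t blocks, unsigned pct)
{
	return ((uint64_t)bavail <= (pct * blocks) / 100);
}

/* Fixed form: compute the threshold, then compare as signed values. */
static bool
should_suspend_signed(int64_t bavail, uint64_t blocks, unsigned pct)
{
	return (bavail <= (int64_t)((pct * blocks) / 100));
}
```

For a 1000-block file system with a 2% threshold and -5 blocks available, the unsigned comparison reports that there is plenty of space, while the signed comparison correctly asks for suspension.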
davidxu	d5fcf7dfa0	Make sure only remove one signal by debugger.	2005-11-12 04:22:16 +00:00
rwatson	9487c057e2	Correct a number of serious and closely related bugs in the UNIX domain socket file descriptor garbage collection code, which is intended to detect and clear cycles of orphaned file descriptors that are "in-flight" in a socket when that socket is closed before they are received. The algorithm present was both run at poor times (resulting in recursion and reentrance), and also buggy in the presence of parallelism. In order to fix these problems, make the following changes: - When there are in-flight sockets and a UNIX domain socket is destroyed, asynchronously schedule the garbage collector, rather than running it synchronously in the current context. This avoids lock order issues when the garbage collection code reenters the UNIX domain socket code, avoiding lock order reversals, deadlocks, etc. Run the code asynchronously in a task queue. - In the garbage collector, when skipping file descriptors that have entered a closing state (i.e., have f_count == 0), re-test the FDEFER flag, and decrement unp_defer. As file descriptors can now transition to a closed state, while the garbage collector is running, it is no longer the case that unp_defer will remain an accurate count of deferred sockets in the mark portion of the GC algorithm. Otherwise, the garbage collector will loop waiting for unp_defer to reach zero, which it will never do as it is skipping file descriptors that were marked in an earlier pass, but now closed. - Acquire the UNIX domain socket subsystem lock in unp_discard() when modifying the unp_rights counter, or a read/write race is risked with other threads also manipulating the counter. While here: - Remove #if 0'd code regarding acquiring the socket buffer sleep lock in the garbage collector, this is not required as we are able to use the socket buffer receive lock to protect scanning the receive buffer for in-flight file descriptors on the socket buffer.
- Annotate that the description of the garbage collector implementation is increasingly inaccurate and needs to be updated. - Add counters of the number of deferred garbage collections and recycled file descriptors. This will be removed and is here temporarily for debugging purposes. With these changes in place, the unp_passfd regression test now appears to be passed consistently on UP and SMP systems for extended runs, whereas before it hung quickly or panicked, depending on which bug was triggered. Reported by: Philip Kizer <pckizer at nostrum dot com> MFC after: 2 weeks	2005-11-10 16:06:04 +00:00
rwatson	3153d02ada	Add the f_msgcount field to the set of struct file fields printed in show files. MFC after: 1 week	2005-11-10 13:26:29 +00:00
rwatson	dcccc2e254	Expand the set of details printed for each file descriptor to include its garbage collection flags. Reformat generally to make this fit and leave some room for future expansion. MFC after: 1 week	2005-11-10 11:35:59 +00:00
rwatson	20a1214886	Add a DDB "show files" command to list the current open file list, some state about each open file, and identify the first process in the process table that references the file. This is helpful in debugging leaks of file descriptors. MFC after: 1 week	2005-11-10 10:42:50 +00:00
dwhite	0bcdf7c033	This is a workaround for a complicated issue involving VFS cookies and devfs. The PR and patch have the details. The ultimate fix requires architectural changes and clarifications to the VFS API, but this will prevent the system from panicking when someone does "ls /dev" while running in a shell under the linuxulator. This issue affects HEAD and RELENG_6 only. PR: 88249 Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com> MFC after: 3 days	2005-11-09 22:03:50 +00:00
rwatson	fc360a564f	Fix typo in recent comment tweak. Submitted by: jkim MFC after: 1 week	2005-11-09 22:02:02 +00:00
rwatson	6b8f490b77	In closef(), remove the assumption that there is a thread associated with the file descriptor. When a file descriptor is closed as a result of garbage collecting a UNIX domain socket, the file descriptor will not have any associated thread, so the logic to identify advisory locks held by that thread is not appropriate. Check the thread for NULL to avoid this scenario. Expand an existing comment to say a bit more about this. MFC after: 1 week	2005-11-09 20:54:25 +00:00
imp	53b73d2a31	General consensus is that it would be even better to run this in a thread context. While it doesn't matter too much at the moment, in the future we could be back in the same boat if/when more restrictions are placed (or enforced) in a SWI. Suggested by: njl, bde, jhb, scottl	2005-11-09 16:22:56 +00:00
jhb	e53f1ca06b	Use intptr_t casts to convert void * <--> int to make 64-bit archs happy.	2005-11-09 15:15:59 +00:00
ru	dcace5669d	Use sparse initializers for "struct domain" and "struct protosw", so they are easier to follow for the human being.	2005-11-09 13:29:16 +00:00
davidxu	f9da852761	WIFxxx macros require an int type but p_xstat is short, convert it to int before using the macros. Bug reported by : Pyun YongHyeon pyunyh at gmail dot com	2005-11-09 07:58:16 +00:00
imp	a528ef30b2	Kick off the suspend sequence from the keyboard in a SWI rather than in the hardware interrupt context (even if it is likely just an ithread). We don't document that suspend/resume routines are run from such a context and some of the things that happen in those routines aren't interrupt safe. Since there's no real need to run from that context, this restores assumptions that suspend routines have made. This fixes Thierry Herbelot's 'Trying to sleep while sleeping is prohibited' problem.	2005-11-09 07:32:01 +00:00
imp	2f1cffe264	Clarify panic message, I parsed the old one 'trying to sleep while sleeping'	2005-11-09 07:28:52 +00:00
rodrigc	2cbc12617e	For nmount(), allow a text string error message to be propagated back to user-space if a parameter named "errmsg" is passed into the iovec. Used in conjunction with vfs_mount_error(), more useful error messages than errno can be passed back to userspace when mounting a filesystem fails. Discussed with: phk, pjd	2005-11-09 02:26:38 +00:00
davidxu	ce1172e446	In aio_waitcomplete, do not return EAGAIN if no other threads have started aio; instead, initialize the aio management structure if it hasn't been done. The reason to adjust this behavior is to make it a bit friendlier for threaded programs: consider two threads, one submits aio_write and another just calls aio_waitcomplete to wait for any I/O to complete and recycle the aio requests; before the submitter does any I/O, the recycler wants to wait in the kernel. This also fixes an inconsistency with other aio syscalls.	2005-11-08 23:48:32 +00:00

9179 Commits