freebsd-nq

Author	SHA1	Message	Date
Tor Egge	c78226329a	For low memory situations, non-VMIO buffers didn't release pages back to the system when brelse() was called with B_RELBUF set on the buffer. This could be a problem when the system was low on memory, had many buffers on QUEUE_EMPTYKVA and started to traverse directories. For each getnewbuf(), pages were allocated from the system, driving the free reserve downwards. For each brelse(), the system put the buffer on QUEUE_CLEAN, with B_INVAL set. This commit changes the semantics of B_RELBUF to also free pages from non-VMIO buffers. Reviewed by: alc	2006-02-02 21:37:39 +00:00
Olivier Houchard	56db7f4cc6	Don't destroy the slave /dev entry until someone figures out why devfs seems to behave badly when we do so.	2006-02-02 20:35:45 +00:00
John Baldwin	f6b457923d	Whitespace fix. Submitted by: Wojciech A. Koszek <dunstan at zsno ids czest pl>	2006-02-02 20:14:52 +00:00
Jeff Roberson	68ce4375c4	- textvp may have been from a different mountpoint than ndp->ni_vp and we may need to acquire giant to vrele it. Found by: mjacob MFC After: 3 days	2006-02-02 08:39:39 +00:00
Robert Watson	06f2859f6d	Regenerate.	2006-02-02 01:45:01 +00:00
Robert Watson	35d29f5091	Map audit-related system calls to audit event identifiers. Much work by: wsalamon Obtained from: TrustedBSD Project	2006-02-02 01:44:30 +00:00
Robert Watson	fcf7f27a36	Hook up audit to fork() and exit() events. These changes manage the audit state on processes, not auditing of these events. Much work by: wsalamon Obtained from: TrustedBSD Project	2006-02-02 01:32:58 +00:00
Robert Watson	3683665bbd	Hook up audit to the initial process creation events (proc0, proc1). Much help from: wsalamon Obtained from: TrustedBSD Project	2006-02-02 01:16:31 +00:00
Robert Watson	911b84b08d	Add new fields to process-related data structures: - td_ar to struct thread, which holds the in-progress audit record during a system call. - p_au to struct proc, which holds per-process audit state, such as the audit identifier, audit terminal, and process audit masks. In the earlier implementation, td_ar was added to the zero'd section of struct thread. In order to facilitate merging to RELENG_6, it has been moved to the end of the data structure, requiring explicit initialization in the thread constructor. Much help from: wsalamon Obtained from: TrustedBSD Project	2006-02-02 00:37:05 +00:00
Jeff Roberson	9157b485f0	- Solve a problem where a vput could be called on an outgoing directory without Giant held. Do this by tracking the vfslocked state for the directory separately from the child. This is only important in the case where we cross a mountpoint. Sponsored by: Isilon Systems, Inc. MFC After: 3 days	2006-02-01 09:34:32 +00:00
Jeff Roberson	0ac72424f0	- chroot and chdir need to lock giant as appropriate for the outgoing vp as well as the new vp. Sponsored by: Isilon Systems, Inc. MFC After: 3 days	2006-02-01 09:30:44 +00:00
Scott Long	803e980d03	Fix another compile problem. If I find any more, this file is going in the Attic until it is properly fixed.	2006-02-01 04:18:07 +00:00
Jeff Roberson	b099db5881	- Solve a race where we could lose a call to VOP_INACTIVE. If vget() waiting on a lock held the last usecount ref on a vnode and the lock failed we would not call INACTIVE. Solve this by only holding a holdcnt to prevent the vnode from disappearing while we wait on vn_lock. Other callers may now VOP_INACTIVE while we are waiting on the lock, however this race is acceptable, while losing INACTIVE is not. Discussed with: kan, pjd Tested by: kkenn Sponsored by: Isilon Systems, Inc. MFC After: 1 week	2006-02-01 00:30:05 +00:00
Jeff Roberson	89b0e10910	- Reorder calls to vrele() after calls to vput() when the vrele is a directory. vrele() may lock the passed vnode, which in these cases would give an invalid lock order of child -> parent. These situations are deadlock prone although do not typically deadlock because the vrele is typically not releasing the last reference to the vnode. Users of vrele must consider it as a call to vn_lock() and order it appropriately. MFC After: 1 week Sponsored by: Isilon Systems, Inc. Tested by: kkenn	2006-02-01 00:25:26 +00:00
Christian S.J. Peron	b4e12c03e9	Allow root to open prison pts devices too. Pointed out by: rwatson	2006-01-31 22:19:37 +00:00
Christian S.J. Peron	f737c45c91	Allow root in the host environment to open ptys within jailed environments. This logic change was introduced in revision 1.74: Correct an oversight in jail() that allowed processes in jail to access ptys in ways that might be unethical, especially towards processes not in jail, or in other jails. It should be fine to allow root in the host environment to do this. This allows for more effective monitoring of prisons from the host environment. Discussed with: rwatson MFC after: 1 week	2006-01-31 17:17:45 +00:00
Pawel Jakub Dawidek	847a2a1716	Add buffer corruption protection (RedZone) for kernel's malloc(9). It detects both: buffer underflows and buffer overflows bugs at runtime (on free(9) and realloc(9)) and prints backtraces from where memory was allocated and from where it was freed. Tested by: kris	2006-01-31 11:09:21 +00:00
Scott Long	019a2f40ae	Regroup order of operations to better reflect what was probably intended. Submitted by: Peter Jeremy	2006-01-30 19:25:52 +00:00
Gleb Smirnoff	75ee267c22	Merge the //depot/user/yar/vlan branch into CVS. It contains some collective work by yar, thompsa and myself. The checksum offloading part also involves work done by Mihail Balikov. The most important changes: o Instead of global linked list of all vlan softc use a per-trunk hash. The size of hash is dynamically adjusted, depending on number of entries. This changes struct ifnet, replacing counter of vlans with a pointer to trunk structure. This change is an improvement for setups with big number of VLANs, several interfaces and several CPUs. It is a small regression for a setup with a single VLAN interface. An alternative to dynamic hash is a per-trunk static array with 4096 entries, which is a compile time option - VLAN_ARRAY. In my experiments the array is not an improvement, probably because such a big trunk structure doesn't fit into CPU cache. o Introduce an UMA zone for VLAN tags. Since drivers depend on it, the zone is declared in kern_mbuf.c, not in optional vlan(4) driver. This change is a big improvement for any setup utilizing vlan(4). o Use rwlock(9) instead of mutex(9) for locking. We are the first ones to do this! :) o Some drivers can do hardware VLAN tagging + hardware checksum offloading. Add an infrastructure for this. Whenever vlan(4) is attached to a parent or parent configuration is changed, the flags on vlan(4) interface are updated. In collaboration with: yar, thompsa In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>	2006-01-30 13:45:15 +00:00
Robert Watson	4c0b19957f	Move pts master devices into /dev/pty/ instead of littering /dev with them; this is more consistent with the placement of slaves in /dev/pts. The actual name doesn't matter as it's not part of the exposed API or used by libc. In some sense, it would be nice if these device nodes didn't have to have names in devfs at all. Suggested by: Stephen McKay <smckay at internode dot on dot net>	2006-01-30 11:59:19 +00:00
Gleb Smirnoff	61fb9bd80c	- In pipe() return the error returned by pipe_create(), rather than hardcoded ENFILES, which is incorrect. pipe_create() can fail due to ENOMEM. - Update manual page, describing ENOMEM return code. Reviewed by: arch	2006-01-30 08:25:04 +00:00
Jeff Roberson	608c95d341	- Add a comment warning about an anomalous condition where we VOP_UNLOCK and then vrele rather than vput because we would like to VOP_UNLOCK with a specific thread.	2006-01-30 08:21:23 +00:00
Jeff Roberson	033eb86e52	- Lock access to vrele() with VFS_LOCK_GIANT() rather than mtx_lock(&Giant). Sponsored by: Isilon Systems, Inc.	2006-01-30 08:19:01 +00:00
Scott Long	8ad6b7ab7c	Take a stab at making this compile when WITNESS is not defined. gcc can't figure out the order of operations at line 519, and neither can I, but this is my best guess. Also correct a number of typos and syntax errors.	2006-01-29 20:48:25 +00:00
Max Laier	6aec1278dc	firmware(9) is a subsystem to load binary data into the kernel via a specially crafted module. There are several handrolled solutions to this problem in the tree already which will be replaced with this. They include iwi(4), ipw(4), ispfw(4) and digi(4). No objection from: arch MFC after: 2 weeks X-MFC after: some drivers have been converted	2006-01-29 02:52:42 +00:00
Max Laier	69e99c5d4c	Unbreak on archs where %d doesn't print uintptr_t arithmetic.	2006-01-29 02:35:22 +00:00
Robert Watson	5276d7471f	Rename use_old_pty variable to use_pts, as this more accurately reflects the sense of the variable. Suggested by: dwhite	2006-01-28 23:31:19 +00:00
Suleiman Souhlal	c270875f7c	Don't try to load KLDs if we're mounting the root. We'd otherwise panic. Tested by: kris MFC after: 3 days	2006-01-28 22:58:39 +00:00
Kris Kennaway	d5e5528afe	Back out r1.653; it turns out that the race (or at least the printf) is actually not hard to trigger, and it can cause a lot of console spam. Approved by: kan	2006-01-28 03:06:35 +00:00
Warner Losh	6229621e2c	lock unused when INVARIANTS not defined, so don't declare it then	2006-01-28 00:49:31 +00:00
John Baldwin	3f08bd8bce	Add a basic reader/writer lock implementation to the kernel. This implementation is by no means perfect as far as some of the algorithms that it uses and the fact that it is missing some functionality (try locks and upgrades/downgrades are not there yet), however it does seem to work in my local testing. There is more detail in the comments in the code, but the short version follows. A reader/writer lock is very much like a regular mutex: it cannot be held across a voluntary sleep; it can be acquired in an interrupt thread; if the lock is held by a writer then the priority of any threads that block on the lock will be lent to the owner; the simple case lock operations all are done in a single atomic op. It also shares some similarities with sx locks: it supports reader/writer semantics (multiple readers, but single writers); readers are allowed to recurse, but writers are not. We can extend this implementation further by either improving algorithms or adding new functionality, but this should at least give us a base to work with now. Reviewed by: arch (in theory) Tested on: i386 (4 cpu box with a kernel module that used 4 threads that randomly chose between read locks and write locks that ran w/o panicking for over a day solid. It usually panic'd within a few seconds when there were bugs during testing. :) The kernel module source is available on request.)	2006-01-27 23:13:26 +00:00
John Baldwin	135161049e	Whitespace.	2006-01-27 23:06:08 +00:00
John Baldwin	7aa4f6852a	- Add support for having both a shared and exclusive queue of threads in each turnstile. Also, allow for the owner thread pointer of a turnstile to be NULL. This is needed for the upcoming reader/writer lock implementation. - Add a new ddb command 'show turnstile' that will look up the turnstile associated with the given lock argument and display useful information like the list of threads blocked on each queue, etc. If there isn't an active turnstile for a lock at the specified address, then the function will see if there is an active turnstile at the specified address and display info about it if so. - Adjust the mutex code to handle the turnstile API changes. Tested on: i386 (all), alpha, amd64, sparc64 (1 and 3)	2006-01-27 22:42:12 +00:00
John Baldwin	f126e754e0	Add a new ddb command 'show sleepq'. It takes a wait channel as an argument and looks for a sleep queue associated with that wait channel. If it finds one it will display information such as the list of threads sleeping on that queue. If it can't find a sleep queue for that wait channel, then it will see if that address matches any of the active sleep queues. If so, it will display information about the sleepq at the specified address.	2006-01-27 22:24:07 +00:00
John Baldwin	bef4bf1adf	Add a new sysctl, debug.ktr.clear. If you write a non-zero value to this sysctl then it will clear the KTR buffer. Note that if you have active KTR traces at the same time as a clear operation the behavior is undefined, though it shouldn't panic.	2006-01-27 22:17:31 +00:00
Olivier Houchard	23c15e6437	Merge a bunch of changes that were done in tty_pty.c after tty_pts.c was forked from it, but missed for some reason.	2006-01-27 15:13:40 +00:00
Pawel Jakub Dawidek	f220f7afa6	Grr. Backout previous change. vn_open_cred() will call NDFREE() on failure.	2006-01-27 11:25:06 +00:00
Pawel Jakub Dawidek	970c7ca2ef	Don't forget to call NDFREE(9) in case of vn_open_cred() failure. MFC after: 3 days	2006-01-27 11:19:53 +00:00
David Xu	6d53aa6297	Just like dofilewrite(), call bwillwrite before fo_write.	2006-01-27 08:02:25 +00:00
David Xu	03d66b36c7	return final error code in aio_return rather than a hardcoded 0.	2006-01-27 04:14:16 +00:00
Olivier Houchard	f94cf2b10b	Take into account that bits 0x0000ff00 can't be used for minor.	2006-01-27 00:21:48 +00:00
Olivier Houchard	169c44907a	Don't attempt to re-create the /dev entry for the slave part if it already exists when opening the master. This can happen if one opens the master, then opens the slave, then closes and re-opens the master. Reported by: Peter Holm	2006-01-26 20:54:49 +00:00
David Xu	55a122bf28	in aio_aqueue, store same return code into job->_aiocb_private.error. in aio_return, unlock proc lock before suword.	2006-01-26 08:37:02 +00:00
Olivier Houchard	12af2a0f4f	Bring in a sysv-style pts implementation, as found in the rwatson_pts perforce branch. It works the same as its SysV/linux counterpart: You obtain a fd to the master pseudo terminal by opening /dev/ptmx, which creates a node for the master as /dev/pty[num] and a node for the slave as /dev/pts/[num]. It should play nicely with the existing BSD ptys. By default, the system will use the BSD ptys, one can set the sysctl kern.pts.enable to 1 to make it use the new pts system. The max number of pty that can be allocated on a system can be changed with the sysctl kern.pts.max. It defaults to 1000, and can be increased, but it is not recommended, as any pty with a number > 999 won't be handled by whatever uses utmp(5).	2006-01-26 01:30:34 +00:00
John Baldwin	6b81555744	Axe KTR_ALQ_MASK now that KTR_WITNESS is off unless you hack an #ifdef in subr_witness.c. I did add a comment in subr_witness.c noting that KTR_WITNESS is incompatible with KTR_ALQ.	2006-01-25 14:57:23 +00:00
Stephan Uphoff	6807424d19	Back out changes made in rev. 1.151. They were bogus. Cluebat applied by: jhb@	2006-01-25 02:05:47 +00:00
Don Lewis	f4af687a3b	Touch all the pages wired by sysctl_wire_old_buffer() to avoid PTE modified bit emulation traps on Alpha while holding locks in the sysctl handler. A better solution would be to pass a hint to the Alpha pmap code to tell it to mark these pages as modified as they are being wired, but that appears to be more difficult to implement. Suggested by: jhb MFC after: 3 days	2006-01-25 01:03:34 +00:00
John Baldwin	67f7fe8c01	Whitespace fix.	2006-01-24 22:24:05 +00:00
John Baldwin	2b604e82b2	- Add a new KTR_SUBSYS in place of KTR_SPARE1 to serve as a subsystem placeholder similar to KTR_DEV. Explain the use of KTR_DEV and KTR_SUBSYS in a comment as well. - Retire KTR_WITNESS and instead have KTR_WITNESS default to off but use KTR_SUBSYS if it is enabled.	2006-01-24 22:23:45 +00:00
David Xu	1aa4c324ee	Add locking annotation and comments about socket, pipe, fifo problem. Temporarily fix a locking problem for socket I/O.	2006-01-24 07:24:24 +00:00
David Xu	e6bdc05ff7	Er, rescue a deleted comment line.	2006-01-24 02:50:42 +00:00
David Xu	bd793be3c6	More cleanup for aio code: 1) unregister kqueue filter for EVFILT_LIO. 2) free uma_zones. 3) call setsid directly to enter another session rather than implementing by itself. Submitted by: jhb	2006-01-24 02:46:15 +00:00
David Xu	7f34b521c7	Add bracket.	2006-01-23 23:46:30 +00:00
John Baldwin	704c9f00fb	Fix a vnode reference leak in the ktrace code. We always grab a reference to the vnode at the start of ktr_writerequest() but were missing the corresponding vrele() after we finished the write operation. Reported by: jasone	2006-01-23 21:45:32 +00:00
Stephan Uphoff	03001f59c8	Hopefully fix the "calcru: runtime went backwards from ..." problem by keeping the resource values locked (where needed) while we use them for calculations. MFC after: 3 days	2006-01-23 19:15:13 +00:00
Andre Oppermann	fd2413398c	In mb_zinit_pack() explicitly ignore the return value of uma_zalloc_arg(). The success of the cluster allocation is checked through a field in the mbuf structure. This change is non-functional but properly silences code inspection tools. Found by: Coverity Prevent(tm) Coverity ID: CID807 Sponsored by: TCP/IP Optimization Fundraise 2005	2006-01-23 15:49:01 +00:00
David Xu	68d7111884	Verify all supported notification types.	2006-01-23 10:27:15 +00:00
David Xu	a9bf5e37ae	1) Merge _aio_aqueue and aio_aqueue, check quota in aio_aqueue, so that lio_listio won't exceed the quota. 2) Remove lio_ref_count, it is no longer used.	2006-01-23 02:49:34 +00:00
Alan Cox	bb53e2bf27	Remove an unnecessary call to pmap_remove_all(). The given page is not mapped because its contents are invalid. Reviewed by: tegge	2006-01-23 00:00:45 +00:00
Don Lewis	1dd5fc0fde	Tweak previous vfs_lookup.c commit to return an EINVAL error from lookup() instead of EPERM when a DELETE or RENAME operation is attempted on "..". In kern_unlink(), remap EINVAL errors returned from namei() to EPERM to match existing (and POSIX required) behaviour. Discussed with: bde MFC after: 3 days	2006-01-22 19:37:02 +00:00
David Xu	8c0d9af5bf	Fix a bogus panic.	2006-01-22 09:39:59 +00:00
David Xu	9b84335c84	Decrease kaio_active_count first, because user process may go away after we notified it.	2006-01-22 09:25:52 +00:00
David Xu	4ca4c9ee68	Regen.	2006-01-22 06:01:48 +00:00
David Xu	1ce9182407	Make aio code MP safe.	2006-01-22 05:59:27 +00:00
Nate Lawson	a2b31c5b4f	Add a devd(8) event that is sent after the system resumes. This can be used by utilities to reset moused(8), for example. The syntax is: !system=kern subsystem=power type=resume Note that it would be nice to have notification of suspend, but it's more difficult since there would have to be a method of doing request/ack to userland and automatically timing out if no response. apm(4) has a similar mechanism. MFC after: 2 weeks	2006-01-22 01:06:25 +00:00
Robert Watson	c1250af683	Convert remaining functions to ANSI C function declarations. MFC after: 1 week	2006-01-22 00:30:46 +00:00
Alan Cox	e5e6093ba9	Avoid a vm object reference leak in a rarely used code path. An executable contains at most one PT_INTERP program header. Therefore, the loop that searches for it can terminate after it is found rather than iterating over the entire set of program headers. Eliminate an unneeded initialization. Reviewed by: tegge	2006-01-21 20:11:49 +00:00
Don Lewis	bea7a8d75c	Return EPERM from lookup() if cn_nameiop is DELETE or RENAME and the last component of the path name is "..". This keeps VOP_LOOKUP() from locking vnodes in reverse order. Tested by: Denis Shaposhnikov <dsh AT vlink DOT ru> MFC after: 3 days	2006-01-21 19:57:56 +00:00
Robert Watson	6be2c41a22	Convert remaining functions in vfs_subr.c from K&R prototypes to ANSI C prototypes, as the majority of new functions added have been in this style. Changing prototype style now results in gcc noticing that the implementation of vn_pollrecord() has a 'short' argument instead of 'int' as prototyped in vnode.h, so correct that definition. In practice this didn't matter as only poll flags in the lower 16 bits are used. MFC after: 1 week	2006-01-21 19:42:10 +00:00
John Baldwin	267ec43593	When loading a driver that is a subclass of another driver don't set the devclass's parent pointer if the two drivers share the same devclass. This can happen if the drivers use the same new-bus name. For example, we currently have 3 drivers that use the name "pci": the generic PCI bus driver, the ACPI PCI bus driver, and the OpenFirmware PCI bus driver. If the ACPI PCI bus driver was defined as a subclass of the generic PCI bus driver, then without this check the "pci" devclass would point to itself as its parent and device_probe_child() would spin forever when it encountered the first PCI device that did not have a matching driver. Reviewed by: dfr, imp, new-bus@	2006-01-20 21:59:13 +00:00
Julian Elischer	11f4763dd4	Return the thread name in the kinfo_proc structure. Also correct the comment describing what the value is.	2006-01-18 20:27:43 +00:00
John Baldwin	25e498b4b0	Always include the lock_classes[] array in the kernel. The "is it a spinlock" test in mtx_destroy() needs it even in non-debug kernels. Reported by: danfe	2006-01-18 18:02:50 +00:00
Juli Mallett	b241b0a239	Since p_cansee will end up dereferencing p_ucred, don't check for p_ucred equal to NULL several times later. p_ucred "should probably not" be NULL if the process isn't PRS_NEW anyway. This is strongly reinforced by the fact that we don't see frequent crashes here. Remove the checks after p_cansee and add a KASSERT right before it. Found by: Coverity Prevent (tm) Also trim one nearby trailing space.	2006-01-17 20:25:01 +00:00
John Baldwin	6ef970a972	Bah. Fix 'show lock' to actually be compiled in. I had just fixed this in p4 but had an older subr_lock.c on the machine I committed to CVS from.	2006-01-17 16:58:32 +00:00
John Baldwin	83a81bcb14	Add a new file (kern/subr_lock.c) for holding code related to struct lock_obj objects: - Add new lock_init() and lock_destroy() functions to setup and teardown lock_object objects including KTR logging and registering with WITNESS. - Move all the handling of LO_INITIALIZED out of witness and the various lock init functions into lock_init() and lock_destroy(). - Remove the constants for static indices into the lock_classes[] array and change the code outside of subr_lock.c to use LOCK_CLASS to compare against a known lock class. - Move the 'show lock' ddb function and lock_classes[] array out of kern_mutex.c over to subr_lock.c.	2006-01-17 16:55:17 +00:00
John Baldwin	550d1c9392	Initialize thread0.td_contested in init_turnstiles() rather than mutex_init() as it is used by the turnstile code and is not mutex-specific.	2006-01-17 16:47:42 +00:00
John Baldwin	3eb9cab0c6	Garbage collect turnstile_empty() since it is unused.	2006-01-17 16:40:20 +00:00
Poul-Henning Kamp	25a14196dd	Fix an 11 year old mistake: Let the hash functions take a void* instead of unsigned char* argument.	2006-01-17 15:35:57 +00:00
Tor Egge	dffaf91aa3	Set flag in needsbuffer while still holding bqlock to avoid lost wakeup.	2006-01-16 22:09:47 +00:00
Christian S.J. Peron	323203d389	vfs_busy can only return something useful if MNTK_UNMOUNT has been set. Since we are using vfs_busy() on a freshly allocated mount structure, use (void) to show that we do not care about the return value. Found with: Coverity Prevent (tm) MFC after: 2 weeks	2006-01-15 20:14:11 +00:00
Robert Watson	6994eebcab	Cast VFS_STATFS() in vfs_domount() to (void) to indicate that ignoring the return value is intentional: this is simply an attempt to pre-cache the statfs state. Found with: Coverity Prevent (tm) MFC after: 3 days	2006-01-15 20:01:05 +00:00
Christian S.J. Peron	8213baf002	Initialize ki to p->p_aioinfo after we know it's going to be referencing a valid kaioinfo structure. This avoids a potential NULL pointer dereference. Found with: Coverity Prevent(tm) MFC after: 2 weeks	2006-01-15 01:55:45 +00:00
Ruslan Ermilov	6a61c14ee1	AMD64 also supports disk slices.	2006-01-14 20:47:11 +00:00
Poul-Henning Kamp	c9df826b0a	Correct STAILQ usage in purge of resourcelist. Found with: Coverity Prevent(tm)	2006-01-14 09:41:35 +00:00
Scott Long	0f92108d32	Add the following to the taskqueue api: taskqueue_start_threads(struct taskqueue **, int count, int pri, const char *name, ...); This allows the creation of 1 or more threads that will service a single taskqueue. Also rework the taskqueue_create() API to remove the API change that was introduced a while back. Creating a taskqueue doesn't rely on the presence of a process structure, and the proc mechanics are much better encapsulated in taskqueue_start_threads(). Also clean up the taskqueue_terminate() and taskqueue_free() functions to safely drain pending tasks and remove all associated threads. The TASKQUEUE_DEFINE and TASKQUEUE_DEFINE_THREAD macros have been changed to use the new API, but drivers compiled against the old definitions will still work. Thus, recompiling drivers is not a strict requirement.	2006-01-14 01:55:24 +00:00
Robert Watson	bc03ea7f49	When calling bioq_first() to see if a queue is empty in bioq_disksort(), don't save the return value as we won't use it. Noticed by: Coverity Prevent analysis tool MFC after: 3 days	2006-01-13 23:27:12 +00:00
Robert Watson	b8ae1cd619	Add sosend_dgram(), a greatly reduced and simplified version of sosend() intended for use solely with atomic datagram socket types, and relies on the previous break-out of sosend_copyin(). Changes to allow UDP to optionally use this instead of sosend() will be committed as a follow-up.	2006-01-13 10:22:01 +00:00
Robert Watson	d7dca9034c	XXX a comment in uipc_usrreq.c that requires updating.	2006-01-13 00:00:32 +00:00
Alfred Perlstein	7d7e053c21	Novel idea, don't print a string if it is NULL! This protects people from loading _really_ old modules, like say from 5.x to a 6.x or 7.x system, like for instance right after an upgrade.	2006-01-12 19:15:14 +00:00
Scott Long	1c3a3b0bd0	The interlock in taskqueue_terminate() is completely wrong for taskqueues that use spinlocks. Remove it for now.	2006-01-11 00:37:13 +00:00
Poul-Henning Kamp	d3e64681d6	Move the old BSD4.3 tty compatibility from (!BURN_BRIDGES && COMPAT_43) to COMPAT_43TTY. Add COMPAT_43TTY to NOTES and */conf/GENERIC Compile tty_compat.c only under the new option. Spit out #warning "Old BSD tty API used, please upgrade." if ioctl_compat.h gets #included from userland.	2006-01-10 09:19:10 +00:00
Scott Long	9df1a6dd61	Add functions and macros and refactor code to make it easier to manage fast taskqueues. The following have been added: TASKQUEUE_FAST_DEFINE() - create a global task queue. an arbitrary execution context. TASKQUEUE_FAST_DEFINE_THREAD() - create a global taskqueue that uses a dedicated kthread. taskqueue_create_fast() - create a local/private taskqueue. These are all complementary to the standard taskqueue functions. They are primarily useful for fast interrupt handlers that can only use spinlock for synchronization. I personally think that the taskqueue API is starting to get too narrow and hairy, but fixing it will require a major redesign on the API. Such a redesign would be good but would break compatibility with FreeBSD 6.x, so it really isn't desirable at this time. Submitted by: sam	2006-01-10 06:31:12 +00:00
Tor Egge	82be0a5a24	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman	2006-01-09 20:42:19 +00:00
Scott Long	861a23087b	If destroying a spinlock, make sure that it is exited properly. Submitted by: jhb MFC After: 3 days	2006-01-08 00:18:34 +00:00
John Baldwin	3b783acd2a	Revert an untested local change that crept in with the lo_class changes and subsequently broke the build. This change is supposed to fix the case where doing a mtx_destroy() off a spin mutex while you hold it fails. If it had been tested I would just leave it in, but it hasn't been tested yet, so it will have to wait until later.	2006-01-07 14:03:15 +00:00
David Xu	0a5cd498bb	Add a new feature to thr_kill, if thread ID argument is -1, send signals to all threads except current sender.	2006-01-07 03:15:21 +00:00
Tai-hwa Liang	75d6a87fa3	Trying to fix compilation bustage introduced in rev1.160 by converting a missing lo_class to LO_CLASSINDEX().	2006-01-07 02:07:08 +00:00
John Baldwin	3c6decc327	Trim another pointer from struct lock_object (and thus from struct mtx and struct sx). Instead of storing a direct pointer to a our lock_class struct in lock_object, reserve 4 bits in the lo_flags field to serve as an index into a global lock_classes array that contains pointers to the lock classes. Only debugging code such as WITNESS or INVARIANTS checks and KTR logging need to access the lock_class member, so this shouldn't add any overhead to production kernels. It might add some slight overhead to kernels using those debug options however. As with the previous set of changes to lock_object, this is going to completely obliterate the kernel ABI, so be sure to recompile all your modules.	2006-01-06 18:07:32 +00:00
John Baldwin	af56abaab5	Return error from fget_write() rather than hardcoding EBADF now that fget_write() DTRT. Requested by: bde	2006-01-06 16:34:22 +00:00
John Baldwin	38f63f7e47	Return EBADF rather than EINVAL for FWRITE failure as per POSIX. MFC after: 1 week	2006-01-06 16:30:30 +00:00
John Baldwin	e730167f16	Remove XXX comments complaining that write(2) on a read-only descriptor returns EBADF. That errno is correct and is mandated by POSIX. It also goes back to revision 1.1 of our CVS history (i.e. 4.4BSD). The _fget() function should probably also be updated as it currently returns EINVAL in that case rather than EBADF. (It does return EBADF for reads on a write-only descriptor without any XXX comments oddly enough.) Discussed with: scottl, grog, mjacob, bde	2006-01-05 22:20:31 +00:00
Bjoern A. Zeeb	ba0b6851b4	Minor whitespace cleanup.	2006-01-04 17:40:54 +00:00
Poul-Henning Kamp	d5f1e0d1ef	Deorbit ttymalloc() in preference for ttyalloc()	2006-01-04 09:59:07 +00:00
Poul-Henning Kamp	8607f52b66	Use ttyalloc() instead of ttymalloc()	2006-01-04 09:09:46 +00:00
Poul-Henning Kamp	246b8d448a	Use MTX_SYSINIT to set up the tty list mutex.	2006-01-04 08:22:39 +00:00
Diomidis Spinellis	c3d78136c9	Fix style bug. Prompted by: bde	2006-01-04 07:50:54 +00:00
Diomidis Spinellis	f8ccc6ceb9	Replace tv_usec normalization with the return of EINVAL. This addresses two objections to the previous behavior, and unbreaks the alpha tinderbox build. TODO: update the utimes(2) man page.	2006-01-04 00:47:13 +00:00
Diomidis Spinellis	51339e8593	Normalize the tv_usec part of the utimes(2) arguments to ensure that a file's atime and mtime are only set to correct fractional second values (0-999999000ns with the current interface). Prior to this change users could create files with values outside that range. Moreover, on 32-bit machines tv_usec offsets larger than 4.3s would result in an unnormalized AND wrong timestamp value, due to overflow. MFC after: 1 week	2006-01-03 21:58:21 +00:00
Alexander Leidinger	ef39c05baa	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)	2005-12-31 14:39:20 +00:00
Pawel Jakub Dawidek	d362c40d3a	Improve memguard a bit: - Provide tunable vm.memguard.desc, so one can specify memory type without changing the code and recompiling the kernel. - Allow to use memguard for kernel modules by providing sysctl vm.memguard.desc, which can be changed to short description of memory type before module is loaded. - Move as much memguard code as possible to memguard.c. - Add sysctl node vm.memguard. and move memguard-specific sysctl there. - Add malloc_desc2type() function for finding memory type based on its short description (ks_shortdesc field). - Memory type can be changed (via vm.memguard.desc sysctl) only if it doesn't exist (will be loaded later) or when no memory is allocated yet. If there is allocated memory for the given memory type, return EBUSY. - Implement two ways of memory types comparison and make safer/slower the default.	2005-12-30 11:45:07 +00:00
Pawel Jakub Dawidek	e7736557d6	Print a warning when we miss vinactive() call, because of race in vget(). The race is very real, but conditions needed for triggering it are rather hard to meet now. When gjournal will be committed (where it is quite easy to trigger) we need to fix it. For now, verify if it is really hard to trigger. Discussed with: kan	2005-12-29 22:52:09 +00:00
John Baldwin	8963150678	patch(1) and I aren't friends today. Axe a duplicate copy of the msleep_spin() function definition. Spotted by: pjd	2005-12-29 21:15:32 +00:00
John Baldwin	0cb7e6aec8	Add a new function msleep_spin() which is a slightly stripped down version of msleep(). msleep_spin() doesn't support changing the priority of the thread while it is asleep nor does it support interruptible sleeps (PCATCH) or the PDROP flag. It does support timeouts however. It differs from msleep() in that the passed in mutex is a spin mutex. This means one can use msleep_spin() and wakeup() with a spin mutex similar to msleep() and wakeup() with a regular mutex. Note that the spin mutex in question needs to come before sched_lock and the sleepq locks in lock order.	2005-12-29 20:57:45 +00:00
John Baldwin	b0e9883e2f	Teach WITNESS_SAVE() and WITNESS_RESTORE() to work with spin locks instead of only sleep locks.	2005-12-29 20:54:25 +00:00
John Baldwin	0a46ed7d56	Fix a deadlock I introduced with the recently added printf to warn about spin locks that are not in the static order list. It is not safe to call printf while holding the witness spin mutex since the console drivers that back printf may need to use their own spin locks which would try to talk to witness when they were locked. Given this, it is possible for one CPU to lock a console driver lock (such as sio) which then tries to lock the witness lock while another CPU is doing the printf while holding the witness lock. Fix this by moving the printf outside of the witness lock. All other printf's in witness are already correct. MFC after: 3 days	2005-12-29 20:53:01 +00:00
John Baldwin	42b6a681bc	Increment kobj_lookup_misses on a miss rather than decrementing it. Otherwise, the miss count is actually -kobj_lookup_misses. Mostly a pedantic change as KOBJ_STATS isn't on by default.	2005-12-29 18:00:42 +00:00
David Xu	3357835a46	Add code to report zombie state. PR: threads/91044 MFC after: 3 days	2005-12-29 13:00:42 +00:00
Alexander Kabaev	3f34977614	Trim trailing whitespace.	2005-12-28 17:13:31 +00:00
Pawel Jakub Dawidek	619f284195	In realloc(9), determine size of the original block based on UMA_SLAB_MALLOC flag. In some circumstances (I observed it when I was doing a lot of reallocs) UMA_SLAB_MALLOC can be set even if us_keg != NULL. If this is the case we have wonderful, silent data corruption, because less data is copied to the newly allocated region than should be. I'm not sure when this bug was introduced, it could be there undetected for years now, as we don't have a lot of realloc(9) consumers and it was hard to reproduce it... ...but what I know for sure, is that I don't want to know who introduced the bug:) It took me two/three days to track it down (of course most of the time I was looking for the bug in my own code).	2005-12-28 01:53:13 +00:00
David Xu	9f8eb3cb52	Use variable i instead of variable cpus as an index to get correct kseq.	2005-12-27 12:02:03 +00:00
Maxim Sobolev	d49b21093c	Fix breakage introduced in the previous commit.	2005-12-26 22:32:52 +00:00
Maxim Sobolev	900b28f9f6	Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually allow executing elf dynamic binaries (aka shared libraries). When it is requested to execute ET_DYN elf image check if this flag is on after we know the elf brand allowing execution if so. PR: kern/87615 Submitted by: Marcin Koziej <creep@desk.pl>	2005-12-26 21:23:57 +00:00
Alan Cox	60bb39431a	Maintain the lock on the vnode for most of exec_elfN_imgact(). Specifically, it is required for the I/O that may be performed by elfN_load_section(). Avoid an obscure deadlock in the a.out, elf, and gzip image activators. Add a comment describing why the deadlock does not occur in the common case and how it might occur in less usual circumstances. Eliminate an unused variable from exec_aout_imgact(). In collaboration with: tegge	2005-12-24 04:57:50 +00:00
David Xu	d7bc12b096	Avoid kernel panic when attaching a process which may not be stopped by debugger, e.g process is dumping core. Only access p_xthread if P_STOPPED_TRACE is set, this means thread is ready to exchange signal with debugger, print a warning if P_STOPPED_TRACE is not set due to some bugs in other code, if there is. The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and is slightly adjusted.	2005-12-24 02:59:29 +00:00
Jeff Roberson	49bdcff518	- Remove an unused include. Submitted by: Antoine Brodin <antoine.brodin@laposte.net>	2005-12-23 21:32:40 +00:00
Poul-Henning Kamp	25f6e35a05	Regenerate sysent with new abort2 system call. Implement abort2(const char reason, int narg, void *args); Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>	2005-12-23 11:58:42 +00:00
Poul-Henning Kamp	5a56b437ec	Add abort2() systemcall.	2005-12-23 11:54:11 +00:00
Poul-Henning Kamp	49091c48d5	Make sbuf_copyin() return the number of bytes copied on success. Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>	2005-12-23 11:49:53 +00:00
Scott Long	d2a401cb70	Create the taskqueue_fast handler with INTR_MPSAFE so that it doesn't run with Giant. MFC After: 3 days	2005-12-23 06:18:33 +00:00
John Baldwin	b439e431bf	Tweak how the MD code calls the fooclock() methods some. Instead of passing a pointer to an opaque clockframe structure and requiring the MD code to supply CLKF_FOO() macros to extract needed values out of the opaque structure, just pass the needed values directly. In practice this means passing the pair (usermode, pc) to hardclock() and profclock() and passing the boolean (usermode) to hardclock_cpu() and hardclock_process(). Other details: - Axe clockframe and CLKF_FOO() macros on all architectures. Basically, all the archs were taking a trapframe and converting it into a clockframe one way or another. Now they can just extract the PC and usermode values directly out of the trapframe and pass it to fooclock(). - Renamed hardclock_process() to hardclock_cpu() as the latter is more accurate. - On Alpha, we now run profclock() at hz (profhz == hz) rather than at the slower stathz. - On Alpha, for the TurboLaser machines that don't have an 8254 timecounter, call hardclock() directly. This removes an extra conditional check from every clock interrupt on Alpha on the BSP. There is probably room for even further pruning here by changing Alpha to use the simplified timecounter we use on x86 with the lapic timer since we don't get interrupts from the 8254 on Alpha anyway. - On x86, clkintr() shouldn't ever be called now unless using_lapic_timer is false, so add a KASSERT() to that effect and remove a condition to slightly optimize the non-lapic case. - Change prototype of arm_handler_execute() so that its first arg is a trapframe pointer rather than a void pointer for clarity. - Use KCOUNT macro in profclock() to lookup the kernel profiling bucket. Tested on: alpha, amd64, arm, i386, ia64, sparc64 Reviewed by: bde (mostly)	2005-12-22 22:16:09 +00:00
Alan Cox	373d1a3f8c	Maintain the vnode lock throughout elfN_load_file() rather than releasing it and reacquiring it in vrele(). Consequently, there is no reason to increase the reference count on the vm object caching the file's pages. Reviewed by: tegge Eliminate unused parameters to elfN_load_file().	2005-12-21 18:58:40 +00:00
Alan Cox	ff6f03c7cd	Eliminate an unneeded (vm_prot_t) parameter from two functions. Eliminate unnecessary uses of a local variable. Reviewed by: tegge	2005-12-20 23:42:18 +00:00
Pawel Jakub Dawidek	c505fe7a0f	Reduce Giant scope a bit, as fdrop() is believed to be MPSAFE. The purpose of this change is consistency (not performance improvement:)), as it was hard to tell if fdrop() is MPSAFE or not when I saw it sometimes under the Giant and sometimes without it. Glanced at by: ssouhlal, kan	2005-12-20 00:49:59 +00:00
Pawel Jakub Dawidek	ade9b797a0	vfs_mount_alloc() always returns 0, but what we really want is newly allocated 'struct mount *' pointer, so simplify code a bit and return the pointer directly. Reviewed by: ssouhlal	2005-12-20 00:43:51 +00:00
Pawel Jakub Dawidek	003ba8a000	Use 'td' instead of 'curthread'.	2005-12-19 16:27:13 +00:00
David Xu	a1d4fe69d2	Fix a bug in slice calculation code, current code uses hz but sched_clock() is called by state clock. Submitted by: taku at tackymt dot homeip dot net	2005-12-19 08:26:09 +00:00
Nate Lawson	bd6b217753	Remove the KTR for hardclock completely. It seems to not be useful. Requested by: jhb	2005-12-18 18:11:55 +00:00
Nate Lawson	1335c4df32	Restore KTR_CRITICAL but conditionally compile it in as KTR_SCHED. Requested by: scottl, jhb	2005-12-18 18:10:57 +00:00
Marcel Moolenaar	757686b115	Make our ELF64 type definitions match standards. In particular this means: o Remove Elf64_Quarter, o Redefine Elf64_Half to be 16-bit, o Redefine Elf64_Word to be 32-bit, o Add Elf64_Xword and Elf64_Sxword for 64-bit entities, o Use Elf_Size in MI code to abstract the difference between Elf32_Word and Elf64_Word. o Add Elf_Ssize as the signed counterpart of Elf_Size. MFC after: 2 weeks	2005-12-18 04:52:37 +00:00
Alan Cox	044bbbb523	Correct a long-standing problem in elfN_map_insert(): In order to copy a page to user space, the user space mapping must allow write access. In collaboration with: tegge@ MFC after: 3 weeks	2005-12-17 19:40:47 +00:00
Nate Lawson	8615fd8696	Clean up unused or poorly utilized KTR values. Remove KTR_FS, KTR_KGDB, and KTR_IO as they were never used. Remove KTR_CLK since it was only used for hardclock firing and use KTR_INTR there instead. Remove KTR_CRITICAL since it was only used for crit enter/exit and use KTR_CONTENTION instead.	2005-12-17 03:57:10 +00:00
John Baldwin	5c8b444153	- Use uintfptr_t rather than int for the kernel profiling index (though it really should be a fptrdiff_t if we had that) in profclock(). - Don't try to profile kernel pc's that are >= the kernel lowpc to avoid underflows when computing a profiling index. - Use the PC_TO_I() macro to compute the kernel profiling index rather than doing it inline. Discussed with: bde	2005-12-16 22:11:52 +00:00
John Baldwin	cb49fcd145	Change the addupc_*() functions to use the uintfptr_t type for pc rather than uintptr_t as that is technically more correct.	2005-12-16 22:08:32 +00:00
Alan Cox	584716b08a	Style: The second argument to vm_map_find() should be NULL instead of 0.	2005-12-16 19:14:25 +00:00
Alan Cox	da61b9a69e	Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the ephemeral mappings that are used as the source for three copy operations from kernel space to user space. There are two reasons for making this change: (1) Under heavy load exec_map can fill up causing vm_map_find() to fail. When it fails, the nascent process is aborted (SIGABRT). Whereas, this reimplementation using sf_buf_alloc() sleeps. (2) Although it is possible to sleep on vm_map_find()'s failure until address space becomes available (see kmem_alloc_wait()), using sf_buf_alloc() is faster. Furthermore, the reimplementation uses a CPU private mapping, avoiding a TLB shootdown on multiprocessors. Problem uncovered by: kris@ Reviewed by: tegge@ MFC after: 3 weeks	2005-12-16 18:34:14 +00:00
Xin LI	6ba9ec2d09	In pipe_write(): when uiomove() fails, do not spin on it forever. Submitted by: Kostik Belousov <kostikbel at gmail.com> on -current@ Message-ID: <20051216151016.GE84442@deviant.zoral.local> MFC After: 3 weeks	2005-12-16 18:32:39 +00:00
David Xu	03f70aec67	Replace selwakeuppri with selwakeup, let scheduler figure out appropriate thread priority.	2005-12-16 15:01:16 +00:00
Ed Maste	63e6f39011	When using m_dup(9) to copy more than MHLEN bytes of data, don't create an mbuf chain that starts with a cluster containing just MHLEN bytes. This happened because m_dup called m_get or m_getcl depending on the amount of data to copy, but then always set the size available in the first mbuf to MHLEN. Submitted by: Matt Koivisto <mkoivisto at sandvine dot com> Approved by: jmg Silence from: rwatson (mentor)	2005-12-14 23:34:26 +00:00
Maxime Henrion	e59898ff36	Fix a bunch of SYSCTL_INT() that should have been SYSCTL_ULONG() to match the type of the variable they are exporting. Spotted by: Thomas Hurst <tom@hur.st> MFC after: 3 days	2005-12-14 22:27:48 +00:00
Dag-Erling Smørgrav	0430a5e289	Eradicate caddr_t from the VFS API.	2005-12-14 00:49:52 +00:00
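
Commits 51339e8593 and f8ccc6ceb9 above describe two ways of handling an out-of-range tv_usec passed to utimes(2): first normalizing it, then rejecting it with EINVAL. The following is a minimal userland sketch of the second approach, not the code from the FreeBSD tree. The function names check_timeval_usec() and check_utimes_args() are hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not the FreeBSD kernel code): reject a timeval
 * whose microsecond field lies outside [0, 1000000) instead of
 * normalizing it into tv_sec.
 */
int
check_timeval_usec(const struct timeval *tv)
{
	if (tv->tv_usec < 0 || tv->tv_usec >= 1000000)
		return (EINVAL);
	return (0);
}

/*
 * Apply the check to the two-element array that utimes(2) takes
 * (access time first, modification time second).  Returns 0 if both
 * entries are valid, EINVAL otherwise.
 */
int
check_utimes_args(const struct timeval tv[2])
{
	int error;

	error = check_timeval_usec(&tv[0]);
	if (error != 0)
		return (error);
	return (check_timeval_usec(&tv[1]));
}
```

Rejecting the value up front also sidesteps the overflow mentioned in 51339e8593: converting microseconds to nanoseconds multiplies by 1000, and a value bounded by 999999 cannot overflow a 32-bit integer in that conversion.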

9144 Commits