freebsd-dev

Author	SHA1	Message	Date
Robert Watson	4024496496	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the necessary MAC entry points to maintain labels on process credentials. In particular, invoke entry points for the initialization and destruction of struct ucred, the copying of struct ucred, and permit the initial labels to be set for both process 0 (parent of all kernel processes) and process 1 (parent of all user processes). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 00:39:19 +00:00
Robert Watson	47ac133d33	Regen.	2002-07-31 00:16:58 +00:00
Robert Watson	55fb783052	Introduce support for Mandatory Access Control and extensible kernel access control. Replace 'void ' with 'struct mac ' now that mac.h is in the base tree. The current POSIX.1e-derived userland MAC interface is scheduled for replacement, but will act as a functional placeholder until the replacement is done. These system calls allow userland processes to get and set labels on both the current process, as well as file system objects and file descriptor backed objects.	2002-07-30 22:43:20 +00:00
Robert Watson	f8ef020e2e	Begin committing support for Mandatory Access Control and extensible kernel access control. The MAC framework permits loadable kernel modules to link to the kernel at compile-time, boot-time, or run-time, and augment the system security policy. This commit includes the initial kernel implementation, although the interface with the userland components of the operating system is still under work, and not all kernel subsystems are supported. Later in this commit sequence, documentation of which kernel subsystems will not work correctly with a kernel compiled with MAC support will be added. Introduce two new vnode operations required to support MAC. First, VOP_REFRESHLABEL(), which will be invoked by callers requiring that vp->v_label be sufficiently "fresh" for access control purposes. Second, VOP_SETLABEL(), which will be invoked by callers requiring that the passed label contents be updated. The file system is responsible for updating v_label if appropriate in coordination with the MAC framework, as well as committing to disk. File systems that are not MAC-aware need not implement these VOPs, as the MAC framework will default to maintaining a single label for all vnodes based on the label on the file system mount point. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 22:15:09 +00:00
Robert Watson	95fab37ea8	Begin committing support for Mandatory Access Control and extensible kernel access control. The MAC framework permits loadable kernel modules to link to the kernel at compile-time, boot-time, or run-time, and augment the system security policy. This commit includes the initial kernel implementation, although the interface with the userland components of the operating system is still under work, and not all kernel subsystems are supported. Later in this commit sequence, documentation of which kernel subsystems will not work correctly with a kernel compiled with MAC support will be added. kern_mac.c contains the body of the MAC framework. Kernel and user APIs defined in mac.h are implemented here, providing a front end to loaded security modules. This code implements a module registration service, state (label) management, security configuration and policy composition. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 21:36:05 +00:00
Julian Elischer	4d492b4369	Don't need to hold schedlock specifically for stop() as it calls wakeup() that locks it anyhow. Reviewed by: jhb@freebsd.org	2002-07-30 21:13:48 +00:00
Bosko Milekic	c89137ff90	Make reference counting for mbuf clusters [only] work like in RELENG_4. While I don't think this is the best solution, it certainly is the fastest and in trying to find bottlenecks in network related code I want this out of the way, so that I don't have to think about it. What this means, for mbuf clusters anyway is: - one less malloc() to do for every cluster allocation (replaced with a relatively quick calculation + assignment) - no more free() in the cluster free case (replaced with empty space) :-) This can offer a substantial throughput improvement, but it may not for all cases. Particularly noticeable for larger buffer sends/recvs. See http://people.freebsd.org/~bmilekic/code/measure2.txt for a rough idea.	2002-07-30 21:06:27 +00:00
Alan Cox	1812190d09	o Replace vm_page_sleep_busy() with vm_page_sleep_if_busy() in vfs_busy_pages().	2002-07-30 20:41:10 +00:00
Julian Elischer	b8e45df779	Remove code that removes thread from sleep queue before adding it to a condvar wait. We do not have asleep() any more so this can not happen.	2002-07-30 20:34:30 +00:00
Alan Cox	1161b86a15	o In do_sendfile(), replace vm_page_sleep_busy() by vm_page_sleep_if_busy() and extend the scope of the page queues lock to cover all accesses to the page's flags and busy fields.	2002-07-30 18:51:07 +00:00
Robert Watson	e66c87b70e	When referencing nd_cnp after namei(), always pass SAVENAME into NDINIT() operation flags. Submitted by: green Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 18:48:25 +00:00
Robert Watson	e37b1fcdee	Make M_COPY_PKTHDR() macro into a wrapper for a m_copy_pkthdr() function. This permits conditionally compiled extensions to the packet header copying semantic, such as extensions to copy MAC labels. Reviewed by: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 18:28:58 +00:00
Robert Watson	4266d0d0ce	Regen.	2002-07-30 16:52:22 +00:00
Robert Watson	aedbd622fe	Introduce a mac_policy() system call that will provide MAC policies with a general purpose front end entry point for user applications to invoke. The MAC framework will route the system call to the appropriate policy by name. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 16:50:25 +00:00
Jacques Vidrine	89ab930718	For processes which are set-user-ID or set-group-ID, the kernel performs a few special actions for safety. One of these is to make sure that file descriptors 0..2 are in use, by opening /dev/null for those that are not already open. Another is to close any file descriptors 0..2 that reference procfs. However, these checks were made out of order, so that it was still possible for a set-user-ID or set-group-ID process to be started with some of the file descriptors 0..2 unused. Submitted by: Georgi Guninski <guninski@guninski.com>	2002-07-30 15:38:29 +00:00
Seigo Tanimura	133267776c	In endtsleep() and cv_timedwait_end(), a thread marked TDF_TIMEOUT may be swapped out. Do not put such a thread directly back to the run queue. Spotted by: David Xu <davidx@viasoft.com.cn> While I am here, s/PS_TIMEOUT/TDF_TIMEOUT/.	2002-07-30 10:12:11 +00:00
Jeff Roberson	1e4c7a1368	- Acknowledge recursive vnode locks in the vop_unlock specification. The vnode may not be unlocked even if the operation succeeded.	2002-07-30 08:50:52 +00:00
Seigo Tanimura	9eb881f804	- Optimize wakeup() and its friends; if a thread being woken up is being swapped in, we do not have to ask for the scheduler thread to do that. - Assert that a process is not swapped out in runq functions and swapout(). - Introduce thread_safetoswapout() for readability. - In swapout_procs(), perform a test that may block (check of a thread working on its vm map) first. This lets us call swapout() with the sched_lock held, providing better atomicity.	2002-07-30 06:54:05 +00:00
Mike Silbersack	c4441bc769	Update docs to reflect change in count of procs reserved for root from 1 to 10. PR: kern/40515 Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU> MFC after: 1 day	2002-07-30 05:37:00 +00:00
Robert Watson	03a719dcd1	Rebuild of files generated from syscalls.master. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 02:09:24 +00:00
Robert Watson	5d37d00afc	Prototype function arguments, only with MAC-specific structures replaced with void until we bring in the actual structure definitions. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 02:06:34 +00:00
Robert Watson	7bc8250003	Stubs for the TrustedBSD MAC system calls to permit TrustedBSD MAC userland code to operate on kernels from the main tree. Not much in this file yet. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 02:04:05 +00:00
Julian Elischer	1d7b9ed2e6	Create a new thread state to describe threads that would be ready to run except for the fact that they are presently swapped out. Also add a process flag to indicate that the process has started the struggle to swap back in. This will be needed for the case where multiple threads start the swapin action top a collision. Also add code to stop a process from being swapped out if one of the threads in this process is actually off running on another CPU.. that might hurt... Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>	2002-07-29 18:33:32 +00:00
Jeff Roberson	a562685f65	- Backout the patch made in revision 1.75 of vfs_mount.c. The vputs here were hiding the real problem of the missing unlock in sync_inactive. - Add the missing unlock in sync_inactive. Submitted by: iedowse	2002-07-29 06:26:55 +00:00
Don Lewis	9e74cba35a	Make a temporary copy of the output data in the generic sysctl handlers so that the data is less likely to be inconsistent if SYSCTL_OUT() blocks. If the data is large, wire the output buffer instead. This is somewhat less than optimal, since the handler could skip the copy if it knew that the data was static. If the data is dynamic, we are still not guaranteed to get a consistent copy since another processor could change the data while the copy is in progress because the data is not locked. This problem could be solved if the generic handlers had the ability to grab the proper lock before the copy and release it afterwards. This may duplicate work done in other sysctl handlers in the kernel which also copy the data, possibly while a lock is held, before they call a generic handler to output the data. These handlers should probably call SYSCTL_OUT() directly.	2002-07-28 21:06:14 +00:00
Don Lewis	5c38b6dbce	Wire the sysctl output buffer before grabbing any locks to prevent SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.	2002-07-28 19:59:31 +00:00
David Malone	25dec7474c	If a socket is disconnected for some reason (like a TCP connection not responding) then drop any data on the outgoing queue in soisdisconnected because there is no way to get it to its destination any longer. The only objection to this patch I got on -net was from Terry, who wasn't sure that the condition in question could arise, so I provided some example code.	2002-07-27 23:06:52 +00:00
Robert Watson	d06c0d4d40	Slight restructuring of the logic for credential change case identification during execve() to use a 'credential_changing' variable. This makes it easier to have outstanding patchsets against this code, as well as to add conditionally defined clauses. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-27 18:06:49 +00:00
John Baldwin	ce39e722ec	Disable optimization of spinlocks on UP kernels w/o debugging for now since it breaks mtx_owned() on spin mutexes when used outside of mtx_assert(). Unfortunately we currently use it in the i386 MD code and in the sio(4) driver. Reported by: bde	2002-07-27 16:54:23 +00:00
Jeff Roberson	0ea5e55265	- The default for lock, unlock, and islocked is now std* instead of no*.	2002-07-27 05:16:20 +00:00
Robert Drehmel	e25dadb05d	Fix -Werror build for sparc64: Use the appropriate conversion specifier for an 'unsigned int' argument.	2002-07-26 12:57:57 +00:00
Julian Elischer	8625e3721f	get suspension counting right. fix an error message Submitted by: David Xu <bsddiy@yahoo.com>	2002-07-25 03:21:35 +00:00
Julian Elischer	e3b9bf7198	fix some style problems and remove a mis-merged assert.	2002-07-25 00:27:39 +00:00
Julian Elischer	294e6308bf	slight stylisations to take into account recent code changes.	2002-07-24 23:59:15 +00:00
Julian Elischer	b6d5995e5f	Add some locking asserts and some comments	2002-07-24 23:21:05 +00:00
Julian Elischer	cf19bf911d	When single threading a multithreaded program, awaken the 'single threading thread' when the last other thread suspends. I had this code in there before but it seems to have been accidentally deleted somewhere along the way. This would only affect multithreaded processes. Reviewed by: David Xu <bsddiy@yahoo.com>	2002-07-24 19:50:08 +00:00
Maxime Henrion	dae0abedbd	Fix a stupid bug where I wasn't initializing the names of 0-length mount options.	2002-07-24 19:50:00 +00:00
Robert Watson	eeb9251884	Under #ifdef DIAGNOSTIC, NULL out componentname pointers if we free the pnbuf to increase the chances of detecting use of a free'd name buffer if SAVENAME or SAVESTART wasn't passed in. Curiously, running with these changes doesn't panic the kernel, and should.	2002-07-24 15:42:22 +00:00
Bosko Milekic	4151d2e620	Move m_freem() from uipc_mbuf.c to subr_mbuf.c so it can take advantage of the inlines, like its cousin, m_free(). Also, make a small (first step?) optimisation of m_free() to use the MBP_PERSIST{,ENT} interface to hold the lock across frees when possible. The thing is that right now, we can only do this easily for at most across one mbuf + one cluster free, as the comment mentions (it also explains why). Anyway, some basic tests revealed a 5-10% overall improvement. Some of the results can be found here: http://people.freebsd.org/~bmilekic/code/measure.txt	2002-07-24 15:11:23 +00:00
Mike Barcroft	5f0de71223	Catch up to rev 1.87 of sys/sys/socketvar.h (sb_cc changed from u_long to u_int). Noticed by: sparc64 tinderbox	2002-07-24 14:21:41 +00:00
Julian Elischer	205683663f	When suspending a thread, update the appropriate (sic) statistic.	2002-07-24 07:29:16 +00:00
Julian Elischer	38038891e9	revert some of the handling of STOP signals in issignal(). Let thread_suspend_check() actually do the suspension at the user boundary. Submitted by: David Xu <bsddiy@yahoo.com>	2002-07-24 07:23:41 +00:00
John Polstra	f824b5187e	Widen struct sockbuf's sb_timeo member to int from short. With non-default but reasonable values of hz this member overflowed, breaking NFS over UDP. Also, as long as I'm plowing up struct sockbuf ... Change certain members from u_long/long to u_int/int in order to reduce wasted space on 64-bit machines. This change was requested by Andrew Gallatin. Netstat and systat need to be rebuilt. I am incrementing __FreeBSD_version in case any ports need to change.	2002-07-24 03:02:43 +00:00
Alfred Perlstein	b605b54ce3	Attempt to clarify comment in selrecord.	2002-07-24 00:29:22 +00:00
Bosko Milekic	dd4ac026f7	Introduce mb_free() to the MBP_PERSIST{,ENT} interface. What this means is that grouped frees will be done as often as possible without dropping the cache lock in between. So, for the most part, they'll be done without the lock being dropped. This is particularly true if you have something that does a grouped m_getm() or m_getcl() (a cluster and mbuf at the same time) - most likely getting the buffers from the same per-CPU cache - and then frees them with m_free{,m}(). Unless the buffers' underlying buckets were moved, the free will be done without the lock getting dropped in between. So far, only m_free() has been shown how to do this, and m_freem() will shortly follow. Since I'm here, I also fixed a small (but mostly harmless) type-mismatch introduced in the last commit.	2002-07-23 14:55:33 +00:00
Alexander Kabaev	1c4229a6a7	Fix DIOCGMEDIASIZE and DIOCGSECTORSIZE ioctls to work for all disk devices. This fixes the problem with these ioctls returning EINVAL for plain slice devices with no disklabel on them. The patch incorporates improvements and style fixes from BDE. Reviewed by: bde Approved by: obrien (mentor)	2002-07-23 14:30:27 +00:00
Andrew R. Reiter	5d3232048e	- Make use of the VM_ALLOC_WIRED flag in the call to vm_page_alloc() in do_sendfile(). This allows us to rearrange an if statement in order to avoid doing an unnecessary call to vm_page_lock_queues(), and an attempt at re-wiring the pages (which were wired in the vm_page_alloc() call). Reviewed by: alc, jhb	2002-07-23 01:09:34 +00:00
Alfred Perlstein	1a5a641600	Remove unneeded caddr_t casts.	2002-07-22 19:05:44 +00:00
Alfred Perlstein	fd6d9be4f5	Cleanup: Define a debug printf macro rather than wrapping all calls to printf with #ifdefs.	2002-07-22 18:27:54 +00:00
Alfred Perlstein	8209f090f1	Change struct vmspace->vm_shm from void * to struct shmmap_state *, this removes the need for casts in several cases.	2002-07-22 16:22:27 +00:00
Alfred Perlstein	2cc593fd8e	Remove caddr_t.	2002-07-22 16:12:55 +00:00
Alfred Perlstein	d452ec95a9	remove caddr_t from fo_ioctl calls	2002-07-22 15:46:51 +00:00
Alfred Perlstein	0a3e28cf1c	remove caddr_t	2002-07-22 15:44:27 +00:00
Robert Watson	0b1040cb88	Set VAPPEND in open mode when O_APPEND is specified as an argument to open() or fhopen(). Currently this has no actual effect due to the treatment of VAPPEND in vaccess() and vaccess_acl() as a subset of VWRITE, but when MAC comes in, MAC will distinguish the two. Note: if any file systems are cutting their own permission models, they may wish to now take this into account. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-22 12:51:06 +00:00
Don Lewis	dcbe050b29	Pre-wire the output buffer so that sysctl_kern_function_list() doesn't block in SYSCTL_OUT() while holding a lock.	2002-07-22 08:28:09 +00:00
Don Lewis	0600730d73	Provide a way for sysctl handlers to pre-wire their output buffer before they grab a lock so that they don't block in SYSCTL_OUT() with the lock being held.	2002-07-22 08:25:37 +00:00
Robert Watson	b02aac465d	Teach discretionary access control methods for files about VAPPEND and VALLPERM. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-22 03:57:07 +00:00
Alan Cox	c1d5e2741e	o Lock page queue accesses by vm_page_free().	2002-07-21 19:06:46 +00:00
Johan Karlsson	5b60674451	Save flags returned by vn_open and use them when calling vn_close. Reviewed by: bde Approved by: sheldonh (mentor)	2002-07-21 15:22:56 +00:00
Warner Losh	5878eb3fca	Add bus_child_present and the child_present method to bus_if.m	2002-07-21 03:28:43 +00:00
Robert Watson	4f18efe220	Do preserve the error result from calling p_cansee() and use that when failing because of the error. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-20 22:44:39 +00:00
Peter Wemm	3ebc124838	Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable handler in the kernel at the same time. Also, allow for the exec_new_vmspace() code to build a different sized vmspace depending on the executable environment. This is a big help for execing i386 binaries on ia64. The ELF exec code grows the ability to map partial pages when there is a page size difference, eg: emulating 4K pages on 8K or 16K hardware pages. Flesh out the i386 emulation support for ia64. At this point, the only binary that I know of that fails is cvsup, because the cvsup runtime tries to execute code in pages not marked executable. Obtained from: dfr (mostly, many tweaks from me).	2002-07-20 02:56:12 +00:00
Alan Cox	4aca0b1510	o Use vm_page_alloc(... | VM_ALLOC_WIRED) in place of vm_page_wire().	2002-07-19 19:35:06 +00:00
Maxime Henrion	0f3b0aa87c	Wrap a line longer than 80 characters.	2002-07-19 17:44:44 +00:00
Maxime Henrion	72fda5bc50	- Merge the mount options at MNT_UPDATE time with vfs_mergeopts(). - Sanity check the mount options list (remove duplicates) with vfs_sanitizeopts(). - Fix some malloc(0)/free(NULL) bugs. Reviewed by: rwatson (some time ago)	2002-07-19 16:05:31 +00:00
Kirk McKusick	7aca6291e3	Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word are reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags |= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, let's call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended attribute storage. Sponsored by: DARPA & NAI Labs.	2002-07-19 07:29:39 +00:00
Julian Elischer	9f189ade99	Clear up confusion in ugly code. ^T gave wrong results for RSS. I misinterpreted this code when changing it to handle threads. (there are still issues here) Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2002-07-18 21:19:56 +00:00
Peter Wemm	02fb42b0a8	ia64 does not have the same degree of stealth include file nesting, so it needs an explicit #include <machine/frame.h> to get 'struct trapframe'. The fact that it needs this at this level is rather bogus but it will not compile without it.	2002-07-17 23:43:55 +00:00
Peter Wemm	8a2bd34560	Pacify gcc on ia64	2002-07-17 23:32:13 +00:00
Julian Elischer	2d014fd7f8	Fix a reversed test. Fix some style nits. Fix a KASSERT message. Add/fix some comments. Submitted by: bde@freebsd.org	2002-07-17 19:20:48 +00:00
Julian Elischer	cad4143a58	Make sure the process state for the idle proc is set correctly from the beginning.	2002-07-17 19:18:45 +00:00
John Baldwin	3d3f20cbe6	Preallocate a struct file as the first thing in falloc() before we lock the filelist_lock and check nfiles. This closes a race where we had to unlock the filedesc to re-lock the filelist_lock. Reported by: David Xu Reviewed by: bde (mostly)	2002-07-17 02:48:43 +00:00
John Baldwin	627ed43ba7	Add a KASSERT() to assert that td_critnest is == 1 when mi_switch() is called.	2002-07-17 02:46:13 +00:00
Andrew Gallatin	fe79953325	Allow alphas to do crashdumps: Refuse to run anything in choosethread() after a panic which is not an interrupt thread, or the thread which caused the panic. Also, remove panicstr checks from msleep() and from cv_wait() in order to allow threads to go to sleep and yield the cpu to the panicking thread, or to an interrupt thread which might be doing the crashdump. Reviewed by: jhb (and it was mostly his idea too)	2002-07-17 02:23:44 +00:00
Kirk McKusick	fb36a3d847	Change utimes to set the file creation time (for filesystems that support creation times such as UFS2) to the value of the modification time if the value of the modification time is older than the current creation time. See utimes(2) for further details. Sponsored by: DARPA & NAI Labs.	2002-07-17 02:03:19 +00:00
Kirk McKusick	faab4e2722	Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.	2002-07-16 22:36:00 +00:00
Mark Murray	f0d2d03884	Fix a bazillion lint and WARNS warnings. One major fix is the removal of semicolons from the end of macros: #define FOO() bar(a,b,c); becomes #define FOO() bar(a,b,c) Thus requiring the semicolon in the invocation of FOO. This is much cleaner syntax and more consistent with expectations when writing function-like things in source. With both peril-sensitive sunglasses and flame-proof undies on, tighten up some types, and work around some warnings generated by this. There are some _horrible_ const/non-const issues in this code.	2002-07-15 17:28:34 +00:00
Mark Murray	b90cce95e0	Use ISO 9X variadic macro format; arguments are not optional, just variable.	2002-07-15 17:17:56 +00:00
Bosko Milekic	185c2244ce	o Introduce new m_getcl() interface routine that allocates an mbuf and a cluster in one shot. o Introduce MBP_PERSIST and MBP_PERSISTENT control bits to mb_alloc(); MBP_PERSIST means "if you can allocate, then keep the cache lock held on exit," and MBP_PERSISTENT means "a cache lock is already held on entry, so allocate from the specified (already locked) cache." They may be used in combination. o m_getcl() uses the MBP_PERSIST/MBP_PERSISTENT interface so that it doesn't drop the cache lock in between the mbuf and cluster allocations. o m_getm(), which takes a size and allocates an mbuf + cluster "best fit" chain, has been moved from uipc_mbuf.c to subr_mbuf.c and shown how to use MBP_PERSIST/MBP_PERSISTENT to attempt to do a grouped allocation without dropping the cache lock in between. Why this is good: much less bus-locked lock acquires/drops when they're not needed. Also, prototype for m_getcl(): struct mbuf * m_getcl(int how, short type, int flags); "how" and "type" are self-explanatory. "flags" may be M_PKTHDR, in which case m_getcl() will make the mbuf a pkthdr-mbuf. While I'm in subr_mbuf.c: o Every exported routine now has a nice comment with a description of the expected arguments. Eventually, mbuf(9) needs to be re-vamped but there's still more code to write/finalize before I get to that. o internal macros have been changed a bit. o consistently use 'short' for "type." This somehow slipped through before (that 'type' was sometimes declared as int). Alfred has been pushing for the MBP_PERSIST{,ENT} thing for almost a year now. Luigi asked for m_getcl(), and will probably MFC that part of this commit. TODO [Related]: teach mb_free() about MBP_PERSIST{, ENT}.	2002-07-15 15:32:59 +00:00
Mark Murray	1cc6a5356c	Consistently use semicolons to terminate macro invocations. Cleaner style and fixes later warnings.	2002-07-15 13:17:23 +00:00
Mark Murray	8deedb62c1	Convert GNU-styled variadic macros to ISO(9x) style.	2002-07-15 13:15:31 +00:00
Mark Murray	4f8cb019ea	Use a semicolon at the end of a function-like macro invocation. Kills warnings and makes the visual style easier.	2002-07-15 13:13:04 +00:00
Mark Peek	11a78c514f	Silence compiler warnings when DDB is not defined. PR: 36002 Submitted by: Yoshikazu GOTO <goto@snowy.to>	2002-07-15 02:03:17 +00:00
Alan Cox	9175709532	o Lock page queue accesses by vm_page_wire().	2002-07-14 19:45:46 +00:00
Alan Cox	b3afd20d9a	In execve(), delay the acquisition of Giant until after kmem_alloc_wait(). (Operations on the exec_map don't require Giant.)	2002-07-14 17:58:35 +00:00
Julian Elischer	66d593142d	part of a greater patch set.. 1/ don't need to set td_state to TDS_RUNNING in fork_return. it's already set in choosethread(). 2/ Set a child process state to "normal" as opposed to "new" when we allow it to be put on the run queue. Allows child to receive signals from the parent if the parent runs first and tries to immediately signal the child. Submitted by: (part 2) Thomas Moestl <tmoestl@gmx.net>	2002-07-14 08:29:15 +00:00
Julian Elischer	c3b98db091	Thinking about it I came to the conclusion that the KSE states were incorrectly formulated. The correct states should be: IDLE: On the idle KSE list for that KSEG RUNQ: Linked onto the system run queue. THREAD: Attached to a thread and slaved to whatever state the thread is in. This means that most places where we were adjusting kse state can go away as it is just moving around because the thread is.. The only places we need to adjust the KSE state is in transition to and from the idle and run queues. Reviewed by: jhb@freebsd.org	2002-07-14 03:43:33 +00:00
Julian Elischer	ac8bcbb700	oops, state cannot be two different values at once.. use || instead of &&	2002-07-14 01:36:48 +00:00
Alan Cox	5123aaef42	o Lock some page queue accesses, in particular, those by vm_page_unwire().	2002-07-13 20:13:34 +00:00
Alfred Perlstein	8a32e0c96f	Remove incorrect comment about now corrected manpage.	2002-07-13 17:11:17 +00:00
Alan Cox	9d1291be8f	Lock accesses to the page queues.	2002-07-13 04:37:22 +00:00
Alan Cox	b416fa1041	o Lock accesses to the page queues. o Add a comment explaining why hoisting the page queue lock outside of a particular loop is not possible.	2002-07-13 04:09:45 +00:00
John Baldwin	03d7a9fffb	- Change chroot_refuse_vdir_fds() to require that the passed in struct filedesc is already locked rather than having chroot() unlock the filedesc so chroot_refuse_vdir_fds() can immediately relock it. - Reorder chroot() a bit so that we do the namei lookup before checking the process's struct filedesc. This closes at least one potential race and allows us to only acquire the filedesc lock once in chroot(). - Push down Giant slightly into chroot().	2002-07-13 04:07:12 +00:00
John Baldwin	63c9e754e0	We don't need to clear oldcred here since newcred is not NULL yet.	2002-07-13 03:13:15 +00:00
Alan Cox	ae0ffa73cc	Lock accesses to the page queues by sendfile() and friends.	2002-07-13 03:10:55 +00:00
Matthew Dillon	fbcf77c2ea	Re-enable the idle page-zeroing code. Remove all IPIs from the idle page-zeroing code as well as from the general page-zeroing code and use a lazy tlb page invalidation scheme based on a callback made at the end of mi_switch. A number of people came up with this idea at the same time so credit belongs to Peter, John, and Jake as well. Two-way SMP buildworld -j 5 tests (second run, after stabilization) 2282.76 real 2515.17 user 704.22 sys before peter's IPI commit 2266.69 real 2467.50 user 633.77 sys after peter's commit 2232.80 real 2468.99 user 615.89 sys after this commit Reviewed by: peter, jhb Approved by: peter	2002-07-12 20:17:06 +00:00
Julian Elischer	40e550266d	also set the KSE state for the idle KSE/thread case.	2002-07-12 20:16:46 +00:00
John Baldwin	33d7ad1abe	Set the thread state of the newly chosen to run thread to TDS_RUNNING in choosethread() in MI C code instead of doing it in in assembly in all the various cpu_switch() functions. This fixes problems on ia64 and sparc64. Reviewed by: julian, peter, benno Tested on: i386, alpha, sparc64	2002-07-12 18:34:22 +00:00
Alan Cox	a4e80b6b64	Lock accesses to the page queues.	2002-07-12 17:21:22 +00:00
Thomas Moestl	5c85966098	Fix ptrace(PT_READ_*, ...) for non-little-endian architectures where sizeof(register_t) != sizeof(int).	2002-07-12 16:48:05 +00:00
Peter Wemm	f1b665c8fe	Revive backed out pmap related changes from Feb 2002. The highlights are: - It actually works this time, honest! - Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive, so try and optimize things where possible. - Introduce ranged shootdowns that can be done as a single IPI. - PG_G support for i386 - Specific-cpu targeted shootdowns. For example, there is no sense in globally purging the TLB cache for where we are stealing a page from the local unshared process on the local cpu. Use pm_active to track this. - Add some instrumentation for the tlb shootdown code. - Rip out SMP code from <machine/cpufunc.h> - Try and fix some very bogus PG_G and PG_PS interactions that were bad enough to cause vm86 bios calls to break. vm86 depended on our existing bugs and this was the cause of the VESA panics last time. - Fix the silly one-line error that caused the 'panic: bad pte' last time. - Fix a couple of other silly one-line errors that should have caused more pain than they did. Some more work is needed: - pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we have a hook in cpu_switch. - The IPI handlers need some cleanup. I have a bogus %ds load that can be avoided. - APTD handling is rather bogus and appears to be a large source of global TLB IPI shootdowns for no really good reason. I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop. I expect to see a bigger difference when there is significant pageout activity or the system otherwise has memory shortages. I have backed out a few optimizations that I had been using over the last few days in order to be a little more conservative. I'll revisit these again over the next few days as the dust settles. New option: DISABLE_PG_G - In case I missed something.	2002-07-12 07:56:11 +00:00
Alfred Perlstein	d11a56617d	regen for freebsd4_sendfile(2) compat.	2002-07-12 06:52:44 +00:00
Alfred Perlstein	9c34129662	Create a bug-for-bug FreeBSD4 compatible version of sendfile and move the fixed sendfile over. This is needed to preserve binary compatibility from 4.x to 5.x.	2002-07-12 06:51:57 +00:00
Alfred Perlstein	074453c230	Introduce syscall.master option 'COMPAT4' which allows one to wrap syscalls for FreeBSD 4 compatibility. Add kernel option COMPAT_FREEBSD4 to enable these syscalls.	2002-07-12 06:38:34 +00:00
Kenneth D. Merry	a720097d24	Fix compilation with ENABLE_VFS_IOOPT turned on and ZERO_COPY_SOCKETS turned off. Clean up #ifdefs, and remove a bunch of unnecessary includes. Reviewed by: bde Tested by: netchild	2002-07-12 02:23:55 +00:00
Julian Elischer	5e3da64ee9	Remove debugging code that I originally only wanted to be there for a couple of days after merge. Reminded with pointy stick by: jhb	2002-07-11 22:47:58 +00:00
John Baldwin	eb80408cff	Add a missing newline during panic printf's for SMP systems that don't have APICS. (Like all the !i386 archs).	2002-07-11 21:56:37 +00:00
Alan Cox	97646f567d	o Lock accesses to the page queues.	2002-07-11 18:48:05 +00:00
Jonathan Mini	aaa1c7715b	Revert removal of cred_free_thread(): It is used to ensure that a thread's credentials are not improperly borrowed when the thread is not current in the kernel. Requested by: jhb, alfred	2002-07-11 02:18:33 +00:00
Johan Karlsson	92da2e7671	Open accounting file for appending, not general writing. This allows accton(1) to be used with an append-only file. PR: 7169 Reported by: Joao Carlos Mendes Luis <jonny@jonny.eng.br> Reviewed by: bde Approved by: sheldonh (mentor) MFC after: 2 weeks	2002-07-10 17:31:58 +00:00
Matthew Dillon	d331c5d43f	Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex then other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc	2002-07-10 17:02:32 +00:00
Julian Elischer	ad22735e3f	Don't slow every syscall and trap by doing locks and stuff if the 'stop' bits are not set. This is a temporary thing.. I think this code probably needs to be rewritten anyhow.	2002-07-10 06:40:22 +00:00
Don Lewis	832dafad3d	Rearrange the code so that it checks whether the file is something valid to write a core dump to before doing the preparations to actually write to the file. Call VOP_GETATTR() before dropping the initial vnode lock.	2002-07-10 06:31:35 +00:00
Maxime Henrion	fbedc80bd1	Remove vfs_stdmount() and vfs_stdunmount(). They are not really useful and are incompatible with nmount.	2002-07-09 22:50:29 +00:00
Jeff Roberson	50bfcee1cb	- Use the new vop_lookup_{pre,post} instead of simpler locking specification.	2002-07-09 19:55:06 +00:00
Jeff Roberson	25b286d6db	- Use standard locking functions in syncer's opv - vput instead of vrele syncer vnodes in vfs_mount - Add vop_lookup_{pre,post} to verify locking in VOP_LOOKUP	2002-07-09 19:54:20 +00:00
John Baldwin	54a033896d	Fix a minor whitespace style nit that broke 'grep ^uuidgen'.	2002-07-09 19:36:50 +00:00
Maxime Henrion	43088e9868	Add a VFS_START() call in vfs_mountroot_try() for the sake of being correct. None of the root mountable filesystems do something at VFS_START(). Shorten a comment to fix a style bug while I'm here. PR: kern/18505	2002-07-08 19:10:15 +00:00
Bruce Evans	f20c8dde15	Fixed some printf format errors (one new one reported by gcc and 3 nearby old ones not reported by gcc). This helps unbreak LINT.	2002-07-08 12:21:11 +00:00
Peter Wemm	a136efe9b6	Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/ pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code and use a single equivalent MI version. There are other cleanups needed still. While here, use the UMA zone hooks to keep a cache of preinitialized proc structures handy, just like the thread system does. This eliminates one dependency on 'struct proc' being persistent even after being freed. There are some comments about things that can be factored out into ctor/dtor functions if it is worth it. For now they are mostly just doing statistics to get a feel of how it is working.	2002-07-07 23:05:27 +00:00
Jeff Roberson	41a5470d03	- Require locks for getattr. At some point this could only require shared locks.	2002-07-07 22:37:45 +00:00
Jeff Roberson	31965a72c9	- Delay unlocking a vnode in linker_hints_lookup until we're actually done with it. - Remove a now stale comment about improper vnode locking.	2002-07-07 22:35:47 +00:00
Peter Wemm	b799f5a475	Make this compile on 64 bit platforms	2002-07-07 22:27:40 +00:00
Jeff Roberson	18c48f437f	- Don't hold the vn lock while calling VOP_CLOSE in vclean().	2002-07-07 06:38:22 +00:00
Jeff Roberson	bed75d4627	- BUF_REFCNT() seems to be the preferred method for verifying a locked buf. Tell vop_strategy_pre() to use this instead. - Ignore B_CLUSTER bufs. Their components are locked but they don't really exist so they don't have to be. This isn't ideal but it is safe.	2002-07-07 05:29:45 +00:00
Jeff Roberson	49244e35ff	Add two asserts that prove & document getblk and geteblk's behavior of returning locked bufs.	2002-07-07 05:27:08 +00:00
Jeff Roberson	c031d11bb4	Fix a mistake in my last commit. Don't grab an extra reference to the object in bp->b_object.	2002-07-06 21:27:20 +00:00
Jeff Roberson	9a236af3ad	Fixup uses of GETVOBJECT. - Cache a pointer to the vnode's object in the buf. - Hold a reference to that object in addition to the vnode's reference just to be consistent. - Cleanup code that got the object indirectly through the vp and VOP calls. This fixes at least one case where we were calling GETVOBJECT without a lock. It also avoids an expensive layered call at the cost of another pointer in struct buf.	2002-07-06 08:59:52 +00:00
Julian Elischer	fe0f1bf4b1	make this respect ps_sigintr if there is a pre-existing signal or suspension request. Submitted by: David Xu	2002-07-06 08:47:24 +00:00
Jeff Roberson	0b2ed1aef7	Clean up execve locking: - Grab the vnode object early in exec when we still have the vnode lock. - Cache the object in the image_params. - Make use of the cached object in imgact_*.c	2002-07-06 07:00:01 +00:00
Jeff Roberson	e818064e98	- Disable original vop_strategy lock specification. - Switch to the new vop_strategy_pre for lock validation. VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need to be translated. There may be other reasons, but as long as the underlying layer uses a VOP to perform the operations they will be caught later.	2002-07-06 05:23:17 +00:00
Jeff Roberson	302c7aaab9	- Add vop_strategy_pre to validate VOP_STRATEGY locking. - Disable original vop_strategy lock specification. - Switch to the new vop_strategy_pre for lock validation. VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need to be translated. There may be other reasons, but as long as the underlying layer uses a VOP to perform the operations they will be caught later.	2002-07-06 05:21:12 +00:00
Jeff Roberson	13e407efee	Use the new #! directive for vop_rename. Leave the old lock specification intact but disabled.	2002-07-06 04:41:27 +00:00
Jeff Roberson	cc8662b0f9	Add "vop_rename_pre" to do pre rename lock verification. This is enabled only with DEBUG_VFS_LOCKS.	2002-07-06 04:39:48 +00:00
Julian Elischer	55fb7ca894	Fix at least one of the things wrong with signals. ^Z should work a lot better now. Submitted by: peter@freebsd.org	2002-07-06 02:45:11 +00:00
Andrew Gallatin	a83560d677	Remove the advertising clause from the Duke BSD copyright on the zero-copy files Requested by: rwatson Approved by: Jeff Chase (my old boss at Duke)	2002-07-06 02:44:15 +00:00
Warner Losh	e0b7446484	Add %i as an alias for %d for greater compatibility with our *BSD brethren Obtained from: NetBSD Reviewed by: jake, rwatson, bosko	2002-07-05 18:36:49 +00:00
Jeff Roberson	2efc89d4dc	Include systm.h before vnode.h so Debugger() and printf() are available when full vnode lock debugging is enabled.	2002-07-05 05:15:30 +00:00
Alan Cox	70c1763634	o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free queue lock (revision 1.33 of vm/vm_page.c removed them). o Make the free queue lock a spin lock because it's sometimes acquired inside of a critical section.	2002-07-04 22:07:37 +00:00
Maxime Henrion	d7f9ecc86b	Move vfs_rootmountalloc() in vfs_mount.c and remove lite2_vfs_mountroot() which was #if 0'd and is not likely to be used now.	2002-07-03 09:27:24 +00:00
Julian Elischer	aa0fa33464	Try to clean up some of the mess that resulted from layers and layers of p4 merges from -current as things started getting different. Corroborated by: Similar patches just mailed by BDE.	2002-07-03 09:15:20 +00:00
Maxime Henrion	563af2ec15	Remove an unused argument in vfs_mountroot().	2002-07-03 08:52:37 +00:00
Julian Elischer	ee9919b024	White space commit. I'm working on this file but I wanted to make the whitespace commit separately.	2002-07-03 06:15:26 +00:00
Andrew Gallatin	0ac3b6364f	Hold the sched lock across call to forward_signal() in tdsignal() to keep SMP systems from panic'ing when ^C'ing an app suggested by julian	2002-07-03 02:55:48 +00:00
Dag-Erling Smørgrav	b61860ad2d	Add mtx_ prefixes to the fields used for mutex profiling, and fix a bug where the profiling code would report the release point instead of the acquisition point. Requested by: bde	2002-07-03 01:50:27 +00:00
Maxime Henrion	534ab2e108	I didn't pay enough attention when copy/pasting disclaimers. The disclaimer in vfs_conf.c was slightly different. Fix this.	2002-07-02 18:33:32 +00:00
Maxime Henrion	2b4edb69f1	Move all code related to mount(2) into a new file, vfs_mount.c. The file vfs_conf.c, which was dealing with root mounting, has been repo-copied into vfs_mount.c to preserve history. This makes nmount related development easier, and helps reduce the size of vfs_syscalls.c, which is still an enormous file. Reviewed by: rwatson Repo-copy by: peter	2002-07-02 17:09:22 +00:00
Julian Elischer	8b768fc82b	When going back to SLEEP state, make sure our State is correctly marked so.	2002-07-02 05:40:51 +00:00
Julian Elischer	d5cb7e14f6	Fix failure to correctly transition back to sleep mode.	2002-07-02 05:33:46 +00:00
Peter Wemm	c781aea8ba	#include <sys/ktrace.h> would be useful too. (for ktrace_mtx)	2002-07-01 23:18:08 +00:00
Ian Dowse	f2f2285a6a	The jail syscall calls chroot, which is not mpsafe, so put back a mtx_lock(&Giant) around that call. Reviewed by: arr	2002-07-01 20:46:01 +00:00
Peter Wemm	1e9b3d9142	Add #include "opt_ktrace.h"	2002-07-01 19:49:04 +00:00
Ian Dowse	6bd521df93	Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick	2002-07-01 17:59:40 +00:00
Andrew R. Reiter	c0854cd341	- In thread_userret(), remove the Giant locking and unlocking around the call to thread_alloc(). Approved by: julian Reviewed by: jake, jeff	2002-07-01 03:15:16 +00:00
Julian Elischer	7c7a6f22ca	If the process is a zombie, then you must not try to dereference the thread because there isn't one. Of course this code only possibly works for single threaded processes anyhow..	2002-06-30 07:50:22 +00:00
Alfred Perlstein	37a6b453c4	Partial backout of 1.318, remove error handling added because it may be incorrect. Requested by: bde	2002-06-30 05:23:58 +00:00
Ian Dowse	37777f4d1f	Add a hashdestroy() function to undo the actions of hashinit().	2002-06-30 02:07:26 +00:00
Alfred Perlstein	97bb78ace2	Fix several style bugs: close up the continued line after removing the cast made the line. space before parentheses in indirect function call. Add an additional error handler case for the results of callback. Submitted by: bde	2002-06-29 17:58:44 +00:00
Alfred Perlstein	c5e3ef7e1f	Unbreak computation of 'smask' that I broke when removing caddr_t. Submitted by: bde	2002-06-29 17:56:34 +00:00
Julian Elischer	e602ba25fd	Part 1 of KSE-III The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..	2002-06-29 17:26:22 +00:00
Julian Elischer	44990b8cb8	Add files that are new for KSE.	2002-06-29 07:04:59 +00:00
David E. O'Brien	87e1503e2c	Rename the db command lockedvnodes to lockedvnods so that it fits on the help screen and one doesn't think we have a lockedvnodesmap command.	2002-06-29 04:45:09 +00:00
Alfred Perlstein	016091145e	more caddr_t removal.	2002-06-29 02:00:02 +00:00
Alfred Perlstein	7f05b0353a	More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.	2002-06-29 01:50:25 +00:00
Alfred Perlstein	69a3693f3e	catch up with mextadd callback taking a void argument instead of a caddr_t.	2002-06-29 01:49:22 +00:00
Alfred Perlstein	802082390b	More caddr_t removal. Change struct knote's kn_hook from caddr_t to void *.	2002-06-29 00:29:12 +00:00
Alfred Perlstein	a551e20e27	nuke more instances of caddr_t	2002-06-29 00:02:01 +00:00
Alfred Perlstein	337f75e11c	m_extadd takes a void (*freef)(void *, void *) now, not a void (*freef)(caddr_t, void *).	2002-06-29 00:01:46 +00:00
Alfred Perlstein	64f0b9d749	remove or replace caddr_t with void. make the mbuf external free function take a void * rather than caddr_t.	2002-06-28 23:48:23 +00:00
Alfred Perlstein	210a5a7169	nuke caddr_t.	2002-06-28 23:17:36 +00:00
Alfred Perlstein	a788442584	Remove unneeded casts to caddr_t.	2002-06-28 23:02:38 +00:00
Alfred Perlstein	52545a237b	document that the pipe fo_stat routine doesn't need locks because it's a read operation. Requested by: rwatson	2002-06-28 22:35:12 +00:00
Jeff Roberson	90769c9ed0	Improve the VOP locking asserts - Add vfs_badlock_print to control whether or not we print lock violations - Add vfs_badlock_panic to control whether we panic on lock violations Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.	2002-06-28 20:58:14 +00:00
Ian Dowse	84b2995b2f	In vn_mkdir(), use vrele() instead of vput() on the parent directory vnode in the case that the target exists and is the same vnode as the parent (i.e. "mkdir ."). The namei() call does not leave the vnode locked in this case even though you might expect it to. This bug was mostly harmless in practice because unlocking an already unlocked vnode currently does not trigger any panics or warnings. Reviewed by: jeff	2002-06-28 20:06:47 +00:00
Jeff Roberson	5c71bc6cf2	Clean up vn_rdwr locking. - Do shared locks on read. - Only do vn_{start,finished}_write when writing.	2002-06-28 17:51:11 +00:00
Brian Feldman	aac12bcfbc	Fix a case where a vnode got explicitly unlocked after the pointer to it got set to NULL. Revision 1.355: in the box	2002-06-28 16:17:47 +00:00
Luigi Rizzo	d26d355f0e	Remove a printf and add a comment on an assumption that could be occasionally violated by device drivers.	2002-06-27 23:23:04 +00:00
Robert Watson	600c1a5a8e	Fix a bug that prevented the deletion of non-default ACLs from being passed down the VFS stack. While I'm here, replace a '0' with a 'NULL' to make the code more readable. Sponsored by: DARPA, NAI Labs Obtained from: TrustedBSD Project	2002-06-27 19:31:15 +00:00
Robert Watson	cbeb840245	A bit of whitespace magic.	2002-06-27 19:30:11 +00:00
Kenneth D. Merry	98cb733c67	At long last, commit the zero copy sockets code. MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. 
(uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. 
tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object allocate does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structre. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.	2002-06-26 03:37:47 +00:00
Andrew R. Reiter	e024f583de	- Remove Giant acquisition from modevent(), modfnext(), modstat() and modfind(). Giant is no longer needed by these functions for safe execution. Reviewed by: jhb	2002-06-26 00:31:44 +00:00
Andrew R. Reiter	4e77f68011	- Alleviate jail() from having the burden of acquiring Giant by simply removing it. We can do this since we no longer need Giant to safely execute jail(). Reviewed by: rwatson, jhb	2002-06-26 00:29:01 +00:00
Alan Cox	366838ddfe	o Eliminate vmspace::vm_minsaddr. It's initialized but never used. o Replace stale comments in vmspace by "const until freed" annotations on some fields.	2002-06-25 18:14:38 +00:00
Jake Burkholder	8ba3d077ff	Add an MD callout like cpu_exit, but which is called after sched_lock is obtained, when all other scheduling activity is suspended. This is needed on sparc64 to deactivate the vmspace of the exiting process on all cpus. Otherwise if another unrelated process gets the exact same vmspace structure allocated to it (same address), its address space will not be activated properly. This seems to fix some spontaneous signal 11 problems with smp on sparc64.	2002-06-24 15:48:02 +00:00
Maxime Henrion	0ae037eb45	Bring sys/kern/md5c.c in sync with the userland version. Add a comment so that people don't forget to keep the version in src/lib/libmd/md5c.c in sync with this one. This fixes a warning on sparc64. Reviewed by: phk	2002-06-24 14:15:25 +00:00
Kirk McKusick	d374764fd3	Use proper size in bzero of stat structure. Submitted by: Jake Burkholder <jake@locore.ca> Sponsored by: DARPA & NAI Labs.	2002-06-24 07:14:44 +00:00
Jonathan Mini	01ad8a53db	Remove unused diagnostic function cred_free_thread(). Approved by: alfred	2002-06-24 06:22:00 +00:00
Matthew Dillon	727300861d	I Noticed a defect in the way wakeup() scans the tailq. Tor noticed an even worse defect in wakeup_one(). This patch cleans up both. Submitted by: tegge MFC after: 3 days	2002-06-24 00:14:36 +00:00
Maxime Henrion	0a84e7c8f7	More 64 bits platforms warning fixes. Reviewed by: rwatson	2002-06-23 18:32:39 +00:00
Kirk McKusick	6524dddcd5	This patch fixes a size problem with the stat structure for 64-bit architectures that was introduced in the UFS2 code merge two days ago. The stat structure change that caused the problem was the addition of the file create time. Submitted by: Bruce Evans <bde@zeta.org.au> Sponsored by: DARPA & NAI Labs.	2002-06-22 22:01:13 +00:00
Maxime Henrion	2853bfa0df	We don't need to check the return value of malloc() against NULL when the M_WAITOK flag is specified.	2002-06-22 21:44:11 +00:00
Matthew Dillon	ed22d6e948	Fix a bug in vfs_bio_clrbuf(). The single-page-clrbuf optimization was improperly clearing more than just the invalid portions of the page. (This bug is not known to have been triggered by anything). Submitted by: tegge MFC after: 7 days	2002-06-22 19:09:35 +00:00
Maxime Henrion	cacd1c9b49	o Remove the initialization of unused fields in the struct uio now that we don't use uiomove() anymore. o Enforce stricter checks on the length of the iov's in nmount(2) since we now malloc() them individually and corrupted iov's could make the kernel crash in malloc() with "kmem_map too small". Reviewed by: phk	2002-06-22 18:07:05 +00:00
Jonathan Mini	9718382d85	Always drop the p_args reference we held for copyout, even if we're about to change it. This fixes a leak triggered by setproctitle(3). Approved by: alfred Noticed by: Peter Jeremy <peter.jeremy@alcatel.com.au>	2002-06-22 10:05:50 +00:00
Kirk McKusick	1c85e6a35d	This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>	2002-06-21 06:18:05 +00:00
Maxime Henrion	7d2d440991	Change the way we internally store the mount options to a linked list. This is to allow the merging of the mount options in the MNT_UPDATE case, as the current data structure is unsuitable for this. There are no functional differences in this commit. Reviewed by: phk	2002-06-20 20:03:42 +00:00
Alfred Perlstein	c33c825169	Implement SO_NOSIGPIPE option for sockets. This allows one to request that an EPIPE error return not generate SIGPIPE on sockets. Submitted by: lioux Inspired by: Darwin	2002-06-20 18:52:54 +00:00
Alfred Perlstein	69be5db96f	Don't leak resources if fdcheckstd() fails during exec. Submitted by: Mike Makonnen <makonnen@pacbell.net>	2002-06-20 17:27:28 +00:00
Ian Dowse	99568bcaf7	Display the mutex name in the ^T status line if the selected thread is blocked on a mutex. Prepend a '*' to distinguish this case as is done in top(1).	2002-06-20 14:03:36 +00:00
Peter Wemm	9d04103b7c	Remove UIO_USERISPACE - we do not support any split instruction/data address space machines (eg: pdp-11) and are not likely to ever do so. Nothing in our kernel sets this.	2002-06-20 07:08:43 +00:00
Peter Wemm	2f9267ec23	Move the "- 1" into the RQB_FFS(mask) macro itself so that implementations can provide a base zero ffs function if they wish. This changes #define RQB_FFS(mask) (ffs64(mask)) foo = RQB_FFS(mask) - 1; to #define RQB_FFS(mask) (ffs64(mask) - 1) foo = RQB_FFS(mask); On some platforms we can get the "- 1" for free, eg: those that use the C code for ffs64(). Reviewed by: jake (in principle)	2002-06-20 06:21:20 +00:00
Andrew R. Reiter	2eb7b21b00	- Remove the lock(9) protecting the kernel linker system. - Added a mutex, kld_mtx, to protect the kernel_linker system. Note that while ``classes'' is global (to that file), it is only read only after SI_SUB_KLD, SI_ORDER_ANY. - Add a SYSINIT to flip a flag that disallows class registration after SI_SUB_KLD, SI_ORDER_ANY. Idea for ``classes'' read only by: jake Reviewed by: jake	2002-06-19 21:25:59 +00:00
Poul-Henning Kamp	c4bacc1871	Remove the compat bits for the mis-aligned struct disklabel on alpha, people got three times longer than I promised. Sponsored by: DARPA & NAI Labs.	2002-06-19 08:37:02 +00:00
Alfred Perlstein	1419eacb86	Squish the "could sleep with process lock" messages caused by calling uifind() with a proc lock held. change_ruid() and change_euid() have been modified to take a uidinfo structure which will be pre-allocated by callers, they will then call uihold() on the uidinfo structure so that the caller's logic is simplified. This allows one to call uifind() before locking the proc struct and thereby avoid a potential blocking allocation with the proc lock held. This may need revisiting, perhaps keeping a spare uidinfo allocated per process to handle this situation or re-examining if the proc lock needs to be held over the entire operation of changing real or effective user id. Submitted by: Don Lewis <dl-freebsd@catspoiler.org>	2002-06-19 06:39:25 +00:00
Alfred Perlstein	f2102dadf9	setsugid() touches p->p_flag so assert that the proc is locked.	2002-06-18 22:41:35 +00:00
Seigo Tanimura	03e4918190	Remove so*_locked(), which were backed out by mistake.	2002-06-18 07:42:02 +00:00
Maxime Henrion	fe93750656	Change vfs_copyopt() so that the length argument passed to it must be the exact same size as the mount option. This makes vfs_copyopt() much more useful.	2002-06-14 20:04:21 +00:00
Bosko Milekic	ad1026f912	Set system_map for both mbuf_map and clust_map to 1, in mbuf_init(). Submitted by: Tor Egge (tegge) Pointed out to me by: hsu	2002-06-13 23:53:42 +00:00
Robert Watson	6480dc743e	Regen.	2002-06-13 23:44:50 +00:00
Robert Watson	65772a1a0a	Keep POSIX.1e capabilities system call placeholders, but remove definitions.	2002-06-13 23:43:53 +00:00
Robert Watson	a3cce19f7d	kern_cap.c no longer needed.	2002-06-13 23:19:34 +00:00
Robert Watson	4aaae52d99	opt_cap.c no longer needed	2002-06-13 23:17:39 +00:00
Kelly Yancey	9ae6d334da	Make nselcol, the number of select collisions since boot, unsigned as negative collisions simply don't make sense. PR: (one small part of) 19720 Approved by: alfred	2002-06-12 02:08:18 +00:00
Kelly Yancey	e3f0c5755c	Time counter stats are unsigned, advertise them to sysctl(8) that way. PR: (one small part of) 19720 Approved by: phk	2002-06-11 19:47:44 +00:00
Kelly Yancey	3316a80bd1	Convert hit and miss counters to unsigned values. Surely negative values for either do not make sense. PR: (one small part of) 19720	2002-06-10 22:40:26 +00:00
John Baldwin	d0c149fce8	We no longer need to acquire Giant in ast() for ktrpsig() in postsig() now that ktrace no longer needs Giant.	2002-06-07 05:43:40 +00:00
John Baldwin	374a15aa55	- trapsignal() no longer needs to acquire Giant for ktrpsig(). - Catch up to new ktrace API.	2002-06-07 05:43:02 +00:00
John Baldwin	af300f2367	- Proper locking for p_tracep and p_traceflag. - Catch up to new ktrace API.	2002-06-07 05:42:25 +00:00
John Baldwin	6c84de02e0	Properly lock accesses to p_tracep and p_traceflag. Also make a few ktrace-only things #ifdef KTRACE that were not before.	2002-06-07 05:41:27 +00:00
John Baldwin	9ba7fe1b76	- Catch up to new ktrace API. - ktrace trace points in msleep() and cv_wait() no longer need Giant.	2002-06-07 05:39:16 +00:00
John Baldwin	60a9bb197d	Catch up to changes in ktrace API.	2002-06-07 05:37:18 +00:00
John Baldwin	ea3fc8e4cd	Overhaul the ktrace subsystem a bit. For the most part, the actual vnode operations to dump a ktrace event out to an output file are now handled asynchronously by a ktrace worker thread. This enables most ktrace events to not need Giant once p_tracep and p_traceflag are suitably protected by the new ktrace_lock. There is a single todo list of pending ktrace requests. The various ktrace tracepoints allocate a ktrace request object and tack it onto the end of the queue. The ktrace kernel thread grabs requests off the head of the queue and processes them using the trace vnode and credentials of the thread triggering the event. Since we cannot assume that the user memory referenced when doing a ktrgenio() will be valid and since we can't access it from the ktrace worker thread without a bit of hassle anyways, ktrgenio() requests are still handled synchronously. However, in order to ensure that the requests from a given thread still maintain relative order to one another, when a synchronous ktrace event (such as a genio event) is triggered, we still put the request object on the todo list to synchronize with the worker thread. The original thread blocks atomically with putting the item on the queue. When the worker thread comes across a synchronous request, it wakes up the original thread and then blocks to ensure it doesn't manage to write a later event before the original thread has a chance to write out the synchronous event. When the original thread wakes up, it writes out the synchronous event using its own context and then finally wakes the worker thread back up. Yuck. The synchronous events aren't pretty but they do work. Since ktrace events can be triggered in fairly low-level areas (msleep() and cv_wait() for example) the ktrace code is designed to use very few locks when posting an event (currently just the ktrace_mtx lock and the vnode interlock to bump the refcount on the trace vnode). This also means that we can't allocate a ktrace request object when an event is triggered. Instead, ktrace request objects are allocated from a pre-allocated pool and returned to the pool after a request is serviced. The size of this pool defaults to 100 objects, which is about 13k on an i386 kernel. The size of the pool can be adjusted at compile time via the KTRACE_REQUEST_POOL kernel option, at boot time via the kern.ktrace_request_pool loader tunable, or at runtime via the kern.ktrace_request_pool sysctl. If the pool of request objects is exhausted, then a warning message is printed to the console. The message is rate-limited in that it is only printed once until the size of the pool is adjusted via the sysctl. I have tested all kernel traces but have not tested user traces submitted by utrace(2), though they should work fine in theory. Since a ktrace request has several properties (content of event, trace vnode, details of originating process, credentials for I/O, etc.), I chose to drop the first argument to the various ktrfoo() functions. Currently the functions just assume the event is posted from curthread. If there is a great desire to do so, I suppose I could instead put back the first argument but this time make it a thread pointer instead of a vnode pointer. Also, KTRPOINT() now takes a thread as its first argument instead of a process. This is because the check for a recursive ktrace event is now per-thread instead of process-wide. Tested on: i386 Compiles on: sparc64, alpha	2002-06-07 05:32:59 +00:00
John Baldwin	48849938e8	Change the all locks list from a STAILQ to a TAILQ. This bloats struct lock_object by another pointer (though all of lock_object should be conditional on LOCK_DEBUG anyways) in exchange for an O(1) TAILQ_REMOVE() in witness_destroy() (called for every mtx_destroy() and sx_destroy()) instead of an O(n) STAILQ_REMOVE. Since WITNESS is so dog slow as it is, the speed-up is worth the space cost. Suggested by: iedowse	2002-06-06 20:51:04 +00:00
Chad David	ca18d53eae	s/!SIGNOTEMPTY/SIGISEMPTY/ Reviewed by: marcel, jhb, alfred	2002-06-06 19:12:41 +00:00
John Baldwin	8dcb900b62	Handle "dead" witnesses better in the situation of several short term locks being created and destroyed without a single long-term one around to ensure the witness associated with that group of locks stays alive. The pipe mutexes are an example of this group. For a dead witness we no longer clear the witness name. Instead, when looking up the witness for a lock, if a dead witness' (a witness with a refcount of 0) w_name pointer is identical to the witness name of the lock then we revive that witness instead of using a new witness for the lock. This results in far fewer dead witness objects and also better preserves locking orders over the long term resulting in more correct lock order checking. Note that we can't ever dereference w_name of a dead witness since we don't know if the string it is pointing to has been free()'d or kldunload()'d out from under us.	2002-06-06 19:04:38 +00:00
Dag-Erling Smørgrav	edad3af28d	Move some sysctls from the debug tree to the vfs tree.	2002-06-06 15:50:22 +00:00
Dag-Erling Smørgrav	4a357a32e0	Gratuitous whitespace cleanup.	2002-06-06 15:46:38 +00:00
Poul-Henning Kamp	53e645d90e	Use "bwrbg" as description when we sleep for background writing, "biord" was misleading in every possible way.	2002-06-06 08:56:10 +00:00
Bruce Evans	6438c894da	Fixed overflow in the bounds checking in dscheck(). It assumed that daddr_t is no larger than a long, and some other relatively harmless things (blush). Overflow for subtracting a daddr_t from a u_long caused "truncation" of the i/o for attempts to access blocks beyond the end of the actually cause expansion of the i/o to a preposterous size.	2002-06-06 00:35:07 +00:00
John Baldwin	6a95e08f2f	Replace thread_runnable() with thread_running() as the latter is more accurate. Suggested by: julian	2002-06-04 22:36:24 +00:00
John Baldwin	7fcca6096f	Optimize the adaptive mutex spin a bit. Use a simple while loop with simple reads (and on IA32, a "pause" instruction for each iteration of the loop) to spin until either the mutex owner field changes, or the lock owner stops executing. Suggested by: tanimura Tested on: i386	2002-06-04 21:53:48 +00:00
John Baldwin	5853d37d3b	Add a private thread_runnable() macro to make the code more readable and make the KSE diff easier to maintain.	2002-06-04 21:50:02 +00:00
Dag-Erling Smørgrav	f16a5176ad	ANSIfy the one remaining K&R function.	2002-06-02 21:57:28 +00:00
Dag-Erling Smørgrav	e89efc02e0	Whitespace nits.	2002-06-02 21:55:58 +00:00
Dag-Erling Smørgrav	3b1f7e7de0	Add support for 'j' flag. Simplify the size modifier code and reduce code duplication. Also add support for 'n' specifier. Reviewed by: bde	2002-06-02 21:54:55 +00:00
Jens Schweikhardt	21dc7d4f57	Fix typo in the BSD copyright: s/withough/without/ Spotted and suggested by: des MFC after: 3 weeks	2002-06-02 20:05:59 +00:00
Mike Barcroft	6ee093fb8f	Add POSIX.1-2001 WCONTINUED option for waitpid(2). A proc flag (P_CONTINUED) is set when a stopped process receives a SIGCONT and cleared after it has notified a parent process that has requested notification via waitpid(2) with WCONTINUED specified in its options operand. The status value can be checked with the new WIFCONTINUED() macro. Reviewed by: jake	2002-06-01 18:37:46 +00:00
Archie Cobbs	48d183faca	Fix a bug in m_split(): the "m->m_ext.ext_size" field of an mbuf was being set to zero. This field indicates the total space in the external buffer and therefore should not be modified after the external buffer is added. Add a comment warning that the mbufs returned by m_split() might be read-only. Fix M_TRAILINGSPACE() to return zero if !M_WRITABLE(m). Reviewed by: freebsd-net Obtained from: Vernier Networks, Inc. MFC after: 1 week	2002-05-31 22:09:57 +00:00
Dag-Erling Smørgrav	7aa57dca57	Nit: kern.ttys is of type S,xtty, not S,tty.	2002-05-31 16:11:49 +00:00
Seigo Tanimura	4cc20ab1f0	Back out my last commit of locking down a socket, it conflicts with hsu's work. Requested by: hsu	2002-05-31 11:52:35 +00:00
Robert Drehmel	280759e75e	- Replace the bandaid introduced in revision 1.110 with a better solution. - Add braces for a ``for'' statement containing a single multi-line statement.	2002-05-31 09:41:09 +00:00
Poul-Henning Kamp	eef633a71f	Mistyped and lost a '&' in previous commit.	2002-05-30 16:26:39 +00:00
Poul-Henning Kamp	fe71224650	Don't forget to factor in the boottime when we calculate PPS timestamps. Submitted by: Akira Watanabe <akira@myaw.ei.meisei-u.ac.jp>	2002-05-30 10:34:01 +00:00
Jeff Roberson	7181624aaa	Record the file, line, and pid of the last successful shared lock holder. This is useful as a last effort in debugging file system deadlocks. This is enabled via 'options DEBUG_LOCKS'	2002-05-30 05:55:22 +00:00
Julian Elischer	628855e758	CURSIG() is not a macro so rename it cursig(). Obtained from: KSE tree	2002-05-29 23:44:32 +00:00
Julian Elischer	2d0231f5da	diff reduction from KSE to keep WW-III from happening on -current	2002-05-29 20:40:50 +00:00
Dag-Erling Smørgrav	6b658142fd	Add some checks to prevent NULL dereferences. Submitted by: jhay	2002-05-28 14:29:56 +00:00
Maxime Henrion	8eb0098f4c	Remove a duplicated vfs_freeopts() that I introduced in last revision.	2002-05-28 13:27:55 +00:00
Dag-Erling Smørgrav	6c533ac713	Add NAI copyright.	2002-05-28 06:53:41 +00:00
Marcel Moolenaar	52183d0145	Add uuidgen(2) and uuidgen(1). The uuidgen command, by means of the uuidgen syscall, generates one or more Universally Unique Identifiers compatible with OSF/DCE 1.1 version 1 UUIDs. From the Perforce logs (change 11995): Round of cleanups: o Give uuidgen() the correct prototype in syscalls.master o Define struct uuid according to DCE 1.1 in sys/uuid.h o Use struct uuid instead of uuid_t. The latter is defined in sys/uuid.h but should not be used in kernel land. o Add snprintf_uuid(), printf_uuid() and sbuf_printf_uuid() to kern_uuid.c for use in the kernel (currently geom_gpt.c). o Rename the non-standard struct uuid in kern/kern_uuid.c to struct uuid_private and give it a slightly better definition for better byte-order handling. See below. o In sys/gpt.h, fix the broken uuid definitions to match the now compliant struct uuid definition. See below. o In usr.bin/uuidgen/uuidgen.c catch up with struct uuid change. A note about byte-order: The standard failed to provide a non-conflicting and unambiguous definition for the binary representation. My initial implementation always wrote the timestamp as a 64-bit little-endian (2s-complement) integral. The clock sequence was always written as a 16-bit big-endian (2s-complement) integral. After a good night's sleep and couple of Pan Galactic Gargle Blasters (not necessarily in that order :-) I reread the spec and came to the conclusion that the time fields are always written in the native byte order, provided the low, mid and hi chopping still occurs. The spec mentions that you "might need to swap bytes if you talk to a machine that has a different byte-order". The clock sequence is always written in big-endian order (as is the IEEE 802 address) because its division is resulting in bytes, making the ordering unambiguous.	2002-05-28 06:16:08 +00:00
Marcel Moolenaar	494eefd86b	Add syscall uuidgen() for generating Universally Unique Identifiers (UUIDs). On ia64 UUIDs, aka GUIDs, are used by EFI and the firmware among others. To create GUID Partition Tables (GPTs), we need to be able to generate UUIDs.	2002-05-28 05:58:06 +00:00
Dag-Erling Smørgrav	1a149fcd67	Introduce struct xtty, used when exporting tty information to userland. Make kern.ttys export a struct xtty rather than struct tty. Since struct tty is no longer exposed to userland, remove the dev_t / udev_t hack. Sponsored by: DARPA, NAI Labs	2002-05-28 05:40:53 +00:00
Alan Cox	a739e09c6e	o Remove some unnecessary casting from and add some necessary casting to aio_suspend() and lio_listio(). Submitted by: bde	2002-05-25 18:39:42 +00:00
Dag-Erling Smørgrav	4b4c18f861	ANSIfy (significant portions were already partly ANSIfied)	2002-05-25 15:52:53 +00:00
Dag-Erling Smørgrav	b7457aabf6	Remove register.	2002-05-25 15:44:38 +00:00
Dag-Erling Smørgrav	dedf14f521	Automated whitespace cleanup.	2002-05-25 15:43:06 +00:00
Jake Burkholder	d2ac231616	Make the run queue parameters machine dependent. Optimize 64 bit architectures by using a 64 bit word for the bit array which keeps track of non-empty queues. Reviewed by: peter	2002-05-25 01:12:23 +00:00
Peter Wemm	34e3110c70	Fix warnings. Also, removed an unused variable that I found that was just initialized and never used afterwards.	2002-05-24 06:06:18 +00:00
Maxime Henrion	2274ec995c	Style nit, no functional changes.	2002-05-23 23:22:22 +00:00
Maxime Henrion	9ee6bf717f	Slightly change the way we pass mount options to the filesystem VFS_NMOUNT operations. Reviewed by: phk	2002-05-23 23:02:19 +00:00
Hajimu UMEMOTO	4b562eede1	In m_aux_delete, no need to chase beyond victim. Submitted by: archie Obtained from: KAME	2002-05-23 15:59:48 +00:00
John Baldwin	cc5d39f81e	Minor nit: get p pointer in msleep() from td->td_proc (where td == curthread) rather than from curproc.	2002-05-23 04:14:18 +00:00
John Baldwin	a79c98fa98	Whitespace: trim a trailing tab.	2002-05-23 04:12:28 +00:00
Dag-Erling Smørgrav	db586c8b7c	Make the counters uintmax_ts, and use %ju rather than %llu.	2002-05-23 03:08:42 +00:00
John Baldwin	6b8c698908	Rename pause() to ia32_pause() so it doesn't conflict with the pause() function defined in <unistd.h>. I didn't #ifdef _KERNEL it because the mutex implementation in libpthread will probably need this.	2002-05-22 20:32:39 +00:00
John Baldwin	0228ea4e0b	Rename cpu_pause() to pause(). Originally I was going to make this an MI API with empty cpu_pause() functions on other arch's, but this functionality is definitely unique to IA-32, so I decided to leave it as i386-only and wrap it in #ifdef's. I should have dropped the cpu_ prefix when I made that decision. Requested by: bde	2002-05-22 13:19:22 +00:00
John Baldwin	703fc290fb	Add appropriate IA32 "pause" instructions to improve performance on Pentium 4's and newer IA32 processors. The "pause" instruction has been verified by Intel to be a NOP on all currently existing IA32 processors prior to the Pentium 4.	2002-05-21 22:26:35 +00:00
Andrew R. Reiter	ec41816009	- td will never be NULL, so the call to soalloc() in socreate() will always be passed a 1; we can, however, use M_NOWAIT to indicate this. - Check so against NULL since it's a pointer to a structure.	2002-05-21 21:30:44 +00:00
John Baldwin	0e54ddadd9	Fix an old cut 'n' paste bug inherited from BSD/OS: don't increment 'i' twice once we are in the long wait stage of spinning on a spin mutex.	2002-05-21 21:27:05 +00:00
Andrew R. Reiter	1515cd22e1	- OR the flag variable with M_ZERO so that the uma_zalloc() handles the zero'ing out of the allocated memory. Also removed the logical bzero that followed.	2002-05-21 21:18:41 +00:00
John Baldwin	e6302957fe	Whitespace fixup, properly indent the body of an else clause.	2002-05-21 21:13:27 +00:00
John Baldwin	2498cf8c42	Add code to make default mutexes adaptive if the ADAPTIVE_MUTEXES kernel option is used (not on by default). - In the case of trying to lock a mutex, if the MTX_CONTESTED flag is set, then we can safely read the thread pointer from the mtx_lock member while holding sched_lock. We then examine the thread to see if it is currently executing on another CPU. If it is, then we keep looping instead of blocking. - In the case of trying to unlock a mutex, it is now possible for a mutex to have MTX_CONTESTED set in mtx_lock but to not have any threads actually blocked on it, so we need to handle that case. In that case, we just release the lock as if MTX_CONTESTED was not set and return. - We do not adaptively spin on Giant as Giant is held for long times and it slows SMP systems down to a crawl (it was taking several minutes, like 5-10 or so for my test alpha and sparc64 SMP boxes to boot up when they adaptively spun on Giant). - We only compile in the code to do this for SMP kernels, it doesn't make sense for UP kernels. Tested on: i386, alpha, sparc64	2002-05-21 20:47:11 +00:00
John Baldwin	e8fdcfb57a	Optimize spin mutexes for UP kernels without debugging to just enter and exit critical sections. We only contest on a spin mutex on an SMP kernel running on an SMP machine.	2002-05-21 20:34:28 +00:00
John Baldwin	525c135972	In witness_unlock(), when updating a lock list entry bucket, decrement the count of lock list entries after we fixup the bucket of lock list entries. In theory we can remove the intr_disable/intr_restore() calls now.	2002-05-20 19:16:22 +00:00
Jake Burkholder	45eefe7176	Add a bandaid so that sysctl kern.malloc works on sparc64.	2002-05-20 18:29:37 +00:00
John Baldwin	bbd296aba6	- Allow witness_sleep() to be called when witness hasn't been initialized yet. We just return without performing any checks. - Don't explicitly enter and exit critical sections when walking lock lists. We don't need a critical section to walk the list of sleep locks for a thread. We check to see if a spin lock list is empty before we walk it. If the list is empty we don't need to walk it. If it isn't then we already hold at least one spin lock and are already in a critical section and thus don't need our own explicit critical section.	2002-05-20 17:49:46 +00:00
John Baldwin	42e498655d	Fix the td_intr_nesting_level check to work ok if a flag like M_ZERO is passed in with M_WAITOK to malloc().	2002-05-20 17:46:57 +00:00
Mike Silbersack	184fec1a09	Subtle fix to the accept filter LRU code. In some cases, a newly initialized socket with no qlimit was being passed in. In order to handle this case properly, we must not use >= when comparing queue sizes to qlimit. As a result of this improper handling, a panic could result in certain cases. PR: 38325 MFC after: 3 days	2002-05-20 17:34:31 +00:00
Maxime Henrion	e9e705b0df	Change two vput() that should have been vrele(). Submitted by: iedowse	2002-05-20 14:59:43 +00:00
Seigo Tanimura	243917fe3b	Lock down a socket, milestone 1. o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each member in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred	2002-05-20 05:41:09 +00:00
Marcel Moolenaar	a9b4acea06	All signals can be sent to the inferior process when it's restarted, not just the legacy ones. PR: 33299 Submitted by: Alexander N. Kabaev <ak03@gte.com>	2002-05-19 01:37:43 +00:00
John Baldwin	f44d9e24fb	Change p_can{debug,see,sched,signal}()'s first argument to be a thread pointer instead of a proc pointer and require the process pointed to by the second argument to be locked. We now use the thread ucred reference for the credential checks in p_can*() as a result. p_canfoo() should now no longer need Giant.	2002-05-19 00:14:50 +00:00
John Baldwin	bdc9a8d01b	Now that daddr_t has grown up, use %lld to printf it and cast it to long long.	2002-05-18 23:46:04 +00:00
Poul-Henning Kamp	e96d018d92	Use btodb() macro. Sponsored by: DARPA & NAI Labs.	2002-05-18 09:34:09 +00:00
Eric Melville	096a727e41	Separate "seperate" from kernel source.	2002-05-16 22:43:20 +00:00
Tom Rhodes	d394511de3	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
Maxime Henrion	34e53231d0	o Fix vfs_copyopt(), the first argument to bcopy() is the source, not the destination. o Remove some code from vfs_getopt() which was making the interface more complicated to use for a very slight gain.	2002-05-16 17:09:41 +00:00
Robert Watson	661016419c	p_cansignal() returns an errno value; at some point, the check for inter-process signalling ceased to preserve and return that value, instead always returning EPERM. This meant that it was possible to "probe" the pid space for processes that were not otherwise visible. This change reverts that reversion. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-05-14 23:07:15 +00:00
Jeff Roberson	0e2d6cc899	Disable the shared locking namei() code for now. It breaks several stacking filesystems. This is on hold until the rest of VFS Locking is reviewed and deemed safe. It can be enabled with 'options LOOKUP_SHARED'.	2002-05-14 21:59:49 +00:00
Dag-Erling Smørgrav	733c328439	Remove a printf(3) argument with no corresponding format specifier.	2002-05-14 18:28:06 +00:00
Poul-Henning Kamp	98b0c78978	Make daddr_t and u_daddr_t 64bits wide. Retire daddr64_t and use daddr_t instead. Sponsored by: DARPA & NAI Labs.	2002-05-14 11:09:43 +00:00
Poul-Henning Kamp	77068a7fe2	Retire the bogus uses of the disklabel field d_sbsize and begin to initialize it to zero so we don't have to have everybody and their aunt including FFS specific header files. Sponsored by: DARPA & NAI Labs.	2002-05-12 20:49:41 +00:00
Marcel Moolenaar	882c6b1e5a	Fix alpha build. The alpha has dumpsys implemented. While here, revert the condition to list the machines for which dumpsys has not been implemented. Reported by: wilko	2002-05-12 18:27:28 +00:00
Mike Silbersack	a9caffba47	Change the mbuf exhaustion warning message to match the message in -stable.	2002-05-09 20:21:07 +00:00
Jonathan Mini	d8f4f6a404	Remove trace_req(). Reviewed by: alfred, jhb, peter	2002-05-09 04:13:41 +00:00
Alan Cox	82641acd17	o Correct an error made in revision 1.65: In readv(), if uap->iovcnt is out-of-range, drop the file reference before returning. (This error also exists in the RELENG_4 branch.) o Eliminate the acquisition and release of Giant in readv() now that malloc() and free() are callable without Giant.	2002-05-09 02:30:41 +00:00
Alfred Perlstein	8b43b53530	expand_name fixes: .) don't use MAXPATHLEN + 1, fix logic to compensate. .) style(9) function parameters. .) fix line wrapping. .) remove duplicated error and string handling code. .) don't NUL terminate already NUL terminated string. .) all string length variables changed from int to size_t. .) constify variables. .) catch when corename would be truncated. .) cast pid_t and uid_t args for format string. .) add parens around return arguments. Help and suggestions from: bde	2002-05-08 09:06:47 +00:00
Jake Burkholder	0cce52f8eb	Remove runq_findproc. This never worked right in the first place and can be prohibitively expensive.	2002-05-08 04:39:49 +00:00
Alfred Perlstein	b2bc3101a8	M_ZERO the temp buffer in expand_name() otherwise if an error occurs while logging we may pass a non NUL terminated string to log(9) for a %s format arg.	2002-05-07 23:37:07 +00:00
Peter Wemm	0d93809e04	Re-remove kern_random.c and svr4_signal.c. Somehow dillon managed to keep on committing to these while they were in the Attic after they had been removed. I think this was because he had the file checked out and already 'modified' while markm cvs rm'ed them, and cvs screws up when trying to "merge" the modifications with the "rm". And after that the client state was sufficiently hosed to keep it messed up. Yay CVS! (CVS is very fragile for adding and removing files remotely) The existence of these files was pointed out by: ru	2002-05-07 21:54:47 +00:00

5396 Commits