freebsd-nq

Author	SHA1	Message	Date
Poul-Henning Kamp	7c5d36fb80	Remove vop_stddestroyvobject()	2005-02-07 09:26:39 +00:00
Poul-Henning Kamp	b348abd6cd	Don't call VOP_DESTROYVOBJECT(), trust that VOP_RECLAIM() did what was necessary.	2005-02-07 07:48:03 +00:00
Poul-Henning Kamp	5937226d51	Add a missing prefix to a struct field for consistency.	2005-02-07 07:40:39 +00:00
Ian Dowse	98c926b20f	Add a mechanism for associating a mutex with a callout when the callout is first initialised, using a new function callout_init_mtx(). The callout system will acquire this mutex before calling the callout function and release it on return. In addition, the callout system uses the mutex to avoid most of the complications and race conditions inherent in asynchronous timer facilities, so mutex-protected callouts have much simpler semantics. As long as the mutex is held when invoking callout_stop() or callout_reset(), then these functions will guarantee that the callout will be stopped, even if softclock() had already begun to process the callout. Existing Giant-locked callouts will automatically pick up the new race-free semantics. This should close a number of race conditions in the USB code and probably other areas of the kernel too. There should be no change in behaviour for "MP-safe" callouts; these still need to use the techniques mentioned in timeout(9) to avoid race conditions.	2005-02-07 02:47:33 +00:00
Nate Lawson	88c9b54c47	Add support for relative cpufreq drivers. Such drivers modulate clock frequency as a percentage of the base rate and do not change the base rate directly. The cpufreq framework combines these with absolute drivers to produce synthesized levels made of one or more settings.	2005-02-06 21:08:35 +00:00
Jeff Roberson	8364446643	- Don't release BKGRDINPROG until after we've bufdone'd the copy. Sponsored by: Isilon Systems, Inc.	2005-02-05 01:26:14 +00:00
Jeff Roberson	42a29039de	- Add ke_runq == NULL to the conditions which will cause us to abort adjusting timeshare loads in sched_class(). This is only important if the thread has never run, otherwise the state checks should work as expected.	2005-02-04 17:22:46 +00:00
Suleiman Souhlal	339a7e7fbb	Set the scheduling class of the idle threads to PRI_IDLE. While there, set their priority with sched_prio() instead of changing it 'by hand'. Reviewed by: jhb Approved by: grehan (mentor)	2005-02-04 06:16:05 +00:00
Nate Lawson	73347b071d	Add the cpufreq framework. This code manages multiple drivers and presents a unified kernel and user interface for controlling cpu frequencies.	2005-02-04 05:39:19 +00:00
Nate Lawson	bfdbeca163	Add an interface for cpufreq. The kernel interface lets other drivers select the CPU frequency level (say for cooling). The driver interface allows hardware drivers to announce themselves as capable of adjusting an individual frequency setting.	2005-02-04 05:38:30 +00:00
Pawel Jakub Dawidek	f627315f1e	- Move gets() function to libkern (I want to use it outside vfs_mount.c). - Add buffer size limitations (overflow will not be possible anymore). - Add 'visible' option, which will allow for passphrase reading in the future. - Remove special treatment of '@' and '#', those two are only confusing. Discussed with: rwatson MFC after: 2 weeks	2005-02-03 15:10:58 +00:00
Jeff Roberson	ff05fd5d77	- Correct a typo in kern_rename. tvfslocked should be initialized from tond and not fromnd. This could lead us to leak Giant, or unlock it twice, depending on the filesystems involved. renames within a single filesystem would not have caused any problems. Sponsored by: Isilon Systems, Inc.	2005-02-02 17:17:15 +00:00
Jeff Roberson	37c15216fc	- Or MPSAFE with the correct set of flags in stat(). This affected only the LOOKUP_SHARED case. Spotted by: jhb	2005-02-01 23:43:46 +00:00
Bosko Milekic	737cd9525b	Update copyright, remove "all rights reserved" (since they are not all reserved, as the license makes clear), and strike the third clause (now this is a 2-clause liberal BSDL, as are the rest of the files I hold copyright over).	2005-02-01 03:17:52 +00:00
Maxim Sobolev	a6886ef173	Extend kern_sendit() to take another enum uio_seg argument, which specifies where the buffer to send lies and use it to eliminate yet another stackgap in linuxlator. MFC after: 2 weeks	2005-01-30 07:20:36 +00:00
Maxim Sobolev	ec217396c4	Fix build on AMD64 (and probably other arches where size_t != int). Submitted by: Tinderbox MFC after: 2 weeks	2005-01-30 06:43:17 +00:00
Robert Watson	78bb1895ab	Fix spelling of integer in a comment. Beady eyes: ceri	2005-01-30 00:31:19 +00:00
Maxim Sobolev	56c2262c0e	Grrr, this committer needs to have a sleep. Remove lines from the previous delta not intended for public consumption. MFC after: 2 weeks	2005-01-29 23:51:05 +00:00
Maxim Sobolev	c30af53213	Fix small non-conformance introduced in the previous commit: execve() is expected to return ENAMETOOLONG, not E2BIG if first argument doesn't fit into {PATH_MAX} bytes. MFC after: 2 weeks	2005-01-29 23:47:36 +00:00
Maxim Sobolev	610ecfe035	o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space; o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386. Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks	2005-01-29 23:12:00 +00:00
Robert Watson	3fcd9325ec	Correct a minor whitespace inconsistency introduced in revision 1.159: add a tab between #define and DF_REBID instead of a space.	2005-01-29 22:04:30 +00:00
Poul-Henning Kamp	d9aaa28f63	Use MAXMINOR	2005-01-29 16:50:04 +00:00
Poul-Henning Kamp	37085a3931	Typo.	2005-01-29 15:10:30 +00:00
Poul-Henning Kamp	3a85fd262c	Add MAXMINOR #define, we should have had this a long time ago. Add minor2unit() in addition to dev2unit() and unit2minor(). If it wasn't such a hassle we should redefine minor numbers in the kernel without the gap for the major number, but it's not worth the bother (yet).	2005-01-29 15:07:13 +00:00
Poul-Henning Kamp	a258707313	In 1.276 of kern/subr_trap.c I introduced a mechanism for delaying a process return to userspace if it had pending GEOM events. We need to have the same check in the exit pass to catch the case where a GEOM related filedescriptor is not explicitly closed by the process. Bumped into by: people using dd(1) to build releases, nanobsd etc.	2005-01-29 14:03:41 +00:00
Jeff Roberson	bd8d684fd7	- Don't drop the wref on the bufobj until after bufdone() has completed. Without this, threads waiting in bufobj_wwait() may wakeup prior to bufdone() completing. Sponsored by: Isilon Systems, Inc.	2005-01-28 17:48:58 +00:00
Poul-Henning Kamp	d4eb29ba71	Remove unused argument to vrecycle()	2005-01-28 13:08:21 +00:00
Poul-Henning Kamp	1fdfaafb08	Integrate vclean() into vgonel(). Various associated polishing.	2005-01-28 13:00:03 +00:00
Poul-Henning Kamp	3fc8dd0653	Remove register keyword	2005-01-28 12:39:10 +00:00
Poul-Henning Kamp	7146d6cb3e	Move the contents of vop_stddestroyvobject() to the new vnode_pager function vnode_destroy_vobject(). Make the new function zero the vp->v_object pointer so we can tell if a call is missing.	2005-01-28 08:56:48 +00:00
Jeff Roberson	37f32177bd	- Regen	2005-01-26 02:29:18 +00:00
Jeff Roberson	810ad5ec4c	- Struct mount is not yet locked well enough to allow mount/nmount/unmount to run without Giant. Mark them as STD here.	2005-01-26 02:28:43 +00:00
Maxim Sobolev	f4b6eb045f	Split out kernel side of msgctl(2) into two parts: the first that pops data from the userland and pushes results back and the second which does actual processing. Use the latter to eliminate stackgap in the linux wrapper of that syscall. MFC after: 2 weeks	2005-01-26 00:46:36 +00:00
Maxim Sobolev	cfa0efe7ab	Split out kernel side of {get,set}itimer(2) into two parts: the first that pops data from the userland and pushes results back and the second which does actual processing. Use the latter to eliminate stackgap in the linux wrappers of those syscalls. MFC after: 2 weeks	2005-01-25 21:28:28 +00:00
Jeff Roberson	04186764a4	- Include LK_INTERLOCK in LK_EXTFLG_MASK so that it makes its way into acquire. - Correct the condition that causes us to skip apause() to only require the presence of LK_INTERLOCK. Sponsored by: Isilon Systems, Inc.	2005-01-25 16:06:05 +00:00
Jeff Roberson	013e6650ca	- Make lf_print static and move its prototype into kern_lockf.c - Protect all of the advlock code with Giant as some filesystems may not be entering with Giant held now. Sponsored by: Isilon Systems, Inc.	2005-01-25 10:15:26 +00:00
Poul-Henning Kamp	4f8d23d662	Previously a read of zero bytes got handled in devfs:vop_read() but I missed that when the vnode bypass was introduced. Deal with zero length transfers before we even get to fo_ops->fo_read(). Found by: Slawa Olhovchenkov <slwzxy.spb.ru@zxy.spb.ru> PR: 75758	2005-01-25 09:15:32 +00:00
Poul-Henning Kamp	729fcf7efb	Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now.	2005-01-25 00:42:16 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Poul-Henning Kamp	69816ea35e	Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem for a given vnode to create a vnode_pager object if one is needed.	2005-01-25 00:12:24 +00:00
Poul-Henning Kamp	dcff5b1440	Don't call VOP_CREATEVOBJECT(), it's the responsibility of the filesystem which owns the vnode.	2005-01-24 23:53:54 +00:00
Poul-Henning Kamp	b5b6ec5faa	Eliminate the constant flags argument to vclean()	2005-01-24 22:22:02 +00:00
Poul-Henning Kamp	d07a6d3f61	Move the body of vop_stdcreatevobject() over to the vnode_pager under the name Sande^H^H^H^H^Hvnode_create_vobject(). Make the new function take a size argument which removes the need for a VOP_STAT() or a very pessimistic guess for disks. Call that new function from vop_stdcreatevobject(). Make vnode_pager_alloc() private now that its only user came home.	2005-01-24 21:21:59 +00:00
Poul-Henning Kamp	f6dc414a5c	Save a line by unlocking before we test.	2005-01-24 14:13:24 +00:00
Poul-Henning Kamp	7c93282e42	Change vprint() to vn_printf() which takes varargs. Add #define for vprint() to call vn_printf().	2005-01-24 13:58:08 +00:00
Poul-Henning Kamp	35764be39e	Kill the VV_OBJBUF and test the v_object for NULL instead.	2005-01-24 13:13:57 +00:00
Poul-Henning Kamp	027b1f716c	Fix a list corruption issue in cloning device management using the western strategy ("allocate first, ask questions later") so we can extend the devmtx coverage to the clone list.	2005-01-24 12:44:56 +00:00
Gleb Smirnoff	90d52f2f21	- Convert so_qlen, so_incqlen, so_qlimit fields of struct socket from short to unsigned short. - Add SYSCTL_PROC() around somaxconn, not accepting values < 1 or > U_SHRTMAX. Before this change, setting somaxconn to something above 32767 and calling listen(fd, -1) led to a socket which doesn't accept connections at all. Reviewed by: rwatson Reported by: Igor Sysoev	2005-01-24 12:20:21 +00:00
Jeff Roberson	e1279468ec	- Regen for recent vfs syscall changes. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:50:42 +00:00
Jeff Roberson	29ed48fc6a	- Change all VFS syscalls to MSTD as they all manually deal with giant or the appropriate filesystem locks. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:49:26 +00:00
Jeff Roberson	71ddd673b1	- Add CTR calls to trace the lifecycle of a buffer. - Remove some KASSERTs which are invalid if the appropriate lock is not held. - Slightly restructure bremfree() so that it is more sane. - Change the flush code in bdwrite() to avoid acquiring a mutex whenever possible. - Change the flush code in bdwrite() to avoid holding the bufobj mutex while calling buf_countdeps(). This introduces a lock-order relationship with the softdep lock that can not otherwise be resolved. - Don't set B_DONE until bufdone() is complete, otherwise another processor may believe the buf is done before it is. - Only acquire Giant if the caller has set b_iodone. Don't grab giant around normal bufdone() calls. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:47:04 +00:00
Jeff Roberson	d1fcf3bb31	- Add the tunable and sysctl for the mpsafevfs. It currently defaults to off. - Protect access to mnt_kern_flag with the mountpoint mutex. - Remove some KASSERTs which are not legal checks without the appropriate locks held. - Use VCANRECYCLE() rather than rolling several slightly different checks together. - Return from vtryrecycle() with a recycled vnode rather than a locked vnode. This simplifies some locking. - Remove several GIANT_REQUIRED lines. - Add a few KASSERTs to help with INACT debugging. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:41:01 +00:00
Jeff Roberson	791625d853	- Remove GIANT_REQUIRED where giant is no longer required. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:33:46 +00:00
Jeff Roberson	82d1b24c70	- Remove GIANT_REQUIRED where it is no longer required. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:32:14 +00:00
Jeff Roberson	f50a2d5e2d	- Remove GIANT_REQUIRED where giant is no longer required. - Protect access to mnt_kern_flag with the mountpoint mutex. - Use the appropriate nd flags to deal with giant in vn_open_cred(). We currently determine whether the caller is mpsafe by checking for a valid fdidx. Any caller coming from user-space is now mpsafe and supplies a valid fd. No kernel callers have been converted to mpsafe, so this check is sufficient for now. - Use VFS_LOCK_GIANT instead of manual giant acquisition where appropriate. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:31:42 +00:00
Jeff Roberson	fc48b760ac	- Protect mnt_kern_flag with the mountpoint's mutex. This is required to make the suspend related functions mpsafe. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:28:41 +00:00
Jeff Roberson	22a960a69c	- Acquire and release Giant as we enter and leave filesystems which require it. - Track the status of Giant with the nd flag HASGIANT. - Release giant on return of namei() callers are not marked MPSAFE as they already own giant. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:27:05 +00:00
Jeff Roberson	94a9458501	- Change all vfs syscalls to use VFS_LOCK_GIANT(), and MPSAFE nds. - Move Giant acquisition into the few vfs syscalls that weren't already directly acquiring it. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:25:44 +00:00
Jeff Roberson	799cc2dcee	- Simplify the cache locking. The lock order relationship with the vnode lock is much simpler than I originally thought it would be. Now, the cache lock is always acquired before the vnode lock. - Provide some gotos in __getcwd() to simplify the unlocking a bit. - Move Giant acquisition down into __getcwd(). Sponsored By: Isilon Systems, Inc.	2005-01-24 10:24:12 +00:00
Jeff Roberson	41bd6c15f2	- Do not use APAUSE if LK_INTERLOCK is set. We lose synchronization if the lockmgr interlock is dropped after the caller's interlock is dropped. - Change some lockmgr KTRs to be slightly more helpful. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:20:59 +00:00
Jeff Roberson	66ca1b4878	- Use VFS_LOCK_GIANT() in place of mtx_lock(&giant), etc. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:19:31 +00:00
Robert Watson	471135a3af	Style cleanup: with removal of mutex operations, we can also remove {}'s from securelevel_gt() and securelevel_ge(). MFC after: 1 week	2005-01-23 21:11:39 +00:00
Robert Watson	0b880542e6	When reading pr_securelevel from a prison, perform a lockless read, as it's an integer read operation and the resulting slight race is acceptable. MFC after: 1 week	2005-01-23 21:01:00 +00:00
Robert Watson	4261ed50fd	When retrieving the current per-jails securelevel for a sysctl read, don't acquire the prison mutex, as it's an integer read and races here don't make a difference. MFC after: 1 week	2005-01-23 20:59:19 +00:00
Robert Watson	5324bda309	When DDB is not defined, don't implement witness_thread_has_locks() and witness_proc_has_locks(), as they are unused, which results in a compiler error. This problem was introduced with the implementation of "show alllocks". Spotted by: Artem Kuchin <matrix at itlegion dot ru>	2005-01-22 21:14:21 +00:00
Robert Watson	14cedfc842	Invoke label initialization, creation, cleanup, and tear-down MAC Framework entry points for System V IPC shared memory. Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net> Obtained from: TrustedBSD Project Sponsored by: DARPA, SPAWAR, McAfee Research	2005-01-22 19:10:25 +00:00
Robert Watson	a6009aa7c1	Invoke label initialization, creation, cleanup, and tear-down MAC Framework entry points for System V IPC semaphores. Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net> Obtained from: TrustedBSD Project Sponsored by: DARPA, SPAWAR, McAfee Research	2005-01-22 19:04:17 +00:00
Robert Watson	e6a543f8db	Invoke label initialization, creation, cleanup, and tear-down MAC Framework entry points for System V IPC message queues. Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net> Obtained from: TrustedBSD Project Sponsored by: DARPA, SPAWAR, McAfee Research	2005-01-22 18:51:43 +00:00
Bosko Milekic	e4eb384b47	Bring in MemGuard, a very simple and small replacement allocator designed to help detect tamper-after-free scenarios, a problem more and more common and likely with multithreaded kernels where race conditions are more prevalent. Currently MemGuard can only take over malloc()/realloc()/free() for particular (a) malloc type(s) and the code brought in with this change manually instruments it to take over M_SUBPROC allocations as an example. If you are planning to use it, for now you must: 1) Put "options DEBUG_MEMGUARD" in your kernel config. 2) Edit src/sys/kern/kern_malloc.c manually, look for "XXX CHANGEME" and replace the M_SUBPROC comparison with the appropriate malloc type (this might require additional but small/simple code modification if, say, the malloc type is declared out of scope). 3) Build and install your kernel. Tune vm.memguard_divisor boot-time tunable which is used to scale how much of kmem_map you want to allot for MemGuard's use. The default is 10, so kmem_size/10. ToDo: 1) Bring in a memguard(9) man page. 2) Better instrumentation (e.g., boot-time) of MemGuard taking over malloc types. 3) Teach UMA about MemGuard to allow MemGuard to override zone allocations too. 4) Improve MemGuard if necessary. This work is partly based on some old patches from Ian Dowse.	2005-01-21 18:09:17 +00:00
Colin Percival	7834081c88	Make "c->c_func = NULL" conditional on CALLOUT_LOCAL_ALLOC in both places where it occurs, not just one. :-) Pointed out by: glebius Pointy had to: cperciva	2005-01-19 21:15:58 +00:00
Colin Percival	0ceba3d69c	Make "c->c_func = NULL" conditional on the CALLOUT_LOCAL_ALLOC flag, i.e., only clear c->c_func if the callout c is being used via the old timeout(9) interface. Requested by: glebius	2005-01-19 20:34:46 +00:00
Colin Percival	86fd19de7b	Clarify the description of the callout_active() macro: It is cleared by callout_stop, callout_drain, and callout_deactivate, but is not automatically cleared when a callout returns.	2005-01-19 19:46:35 +00:00
Paul Saab	efa42cbc93	move kern_nanosleep to sys/syscallsubr.h Requested by: jhb	2005-01-19 18:09:50 +00:00
Paul Saab	0e214fad37	Add a 32bit syscall wrapper for modstat Obtained from: Yahoo!	2005-01-19 17:53:06 +00:00
Paul Saab	7fdf2c856f	- rename nanosleep1 to kern_nanosleep - Add a 32bit syscall entry for nanosleep Reviewed by: peter Obtained from: Yahoo!	2005-01-19 17:44:59 +00:00
Warner Losh	234111d6d0	Introduce bus_free_resource. It is a convenience function which wraps bus_release_resource by grabbing the rid from the resource.	2005-01-19 06:52:19 +00:00
David Xu	a2cc61fa6e	Revert my previous errno hack, that is certainly an issue, and always has been, but the system call itself returns errno in a register so the problem is really a function of libc, not the system call. Discussed with: Matthew Dillon <dillon@apollo.backplane.com>	2005-01-18 13:53:10 +00:00
Poul-Henning Kamp	9fc6aa0618	Detect sign-extension bugs in the ioctl(2) command argument: Truncate to 32 bits and print warning.	2005-01-18 07:37:05 +00:00
Mike Silbersack	6792415119	Rearrange the kninit calls for both directions of a pipe so that they both happen before pipe backing allocation occurs. Previously, a pipe memory shortage would cause a panic due to a KNOTE call on an uninitialized si_note. Reported by: Peter Holm MFC after: 1 week	2005-01-17 07:56:28 +00:00
Poul-Henning Kamp	7bf38aeae7	Fix a bug I introduced in 1.561 which has caused considerable filesystem unhappiness lately. As far as I can tell, no files that have made it safely to disk have been endangered, but stuff in transit has been in peril. Pointy hat: phk	2005-01-16 21:09:39 +00:00
David Xu	b7be40d612	make umtx timeout relative so userland can select different clock type, e.g, CLOCK_REALTIME or CLOCK_MONOTONIC. merge umtx_wait and umtx_timedwait into single function.	2005-01-14 13:38:15 +00:00
Poul-Henning Kamp	7c0745eeae	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
Poul-Henning Kamp	e39db32ab0	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.	2005-01-13 12:25:19 +00:00
Poul-Henning Kamp	63f89abf4a	Change the generated VOP_ macro implementations to improve type checking and KASSERT coverage. After this check there is only one "nasty" cast in this code but there is a KASSERT to protect against the wrong argument structure behind that cast. Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical kernel with no change in performance. We also now run the checking and tracing on VOP's which have been layered by nullfs, umapfs, deadfs or unionfs. Add new (non-inline) VOP_FOO_AP() functions which take a "struct foo_args" argument and does everything the VOP_FOO() macros used to do with checks and debugging code. Add KASSERT to VOP_FOO_AP() check for argument type being correct. Slim down VOP_FOO() inline functions to just stuff arguments into the struct foo_args and call VOP_FOO_AP(). Put function pointer to VOP_FOO_AP() into vop_foo_desc structure and make VCALL() use it instead of the current offsetoff() hack. Retire vcall() which implemented the offsetoff() Make deadfs and unionfs use VOP_FOO_AP() calls instead of VCALL(), we know which specific call we want already. Remove unneeded arguments to VCALL() in nullfs and umapfs bypass functions. Remove unused vdesc_offset and VOFFSET(). Generally improve style/readability of the generated code.	2005-01-13 07:53:01 +00:00
Maxim Sobolev	fdf84ec4c6	When re-connecting an already connected datagram socket, make sure to clean up its pending error state, which may be set in some rare conditions, resulting in the connect() syscall returning that bogus error and making the application believe that the attempt to change association has failed, while in fact it has not. There is a sockets/reconnect regression test which exercises this bug. MFC after: 2 weeks	2005-01-12 10:15:23 +00:00
Poul-Henning Kamp	3963baec64	Comment out debugging printf which doesn't compile on amd64.	2005-01-12 10:11:31 +00:00
David Xu	333d4875cd	Let _umtx_op directly return the error code rather than using errno, because errno can potentially be tampered with by a nested signal handler. Now all error codes are returned as negative values; positive values are reserved for future expansion.	2005-01-12 05:55:52 +00:00
Poul-Henning Kamp	6ef8480a88	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
Poul-Henning Kamp	6afa350d53	More vnode -> bufobj migration.	2005-01-11 10:16:39 +00:00
Poul-Henning Kamp	8d785753bd	Give flushbuflist() a struct bufv as first argument and avoid home-rolling TAILQ_FOREACH_SAFE(). Lose the error pointer argument and return any errors the normal way. Return EAGAIN for the case where more work needs to be done.	2005-01-11 10:01:54 +00:00
Poul-Henning Kamp	8df6bac4c7	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to be the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
David Xu	3e380f0d3d	Break out of loop earlier if it is not timeout.	2005-01-08 06:57:46 +00:00
Robert Watson	2b05b557ff	In acct_process(), do a lockless read of acctvp to see if it's NULL before deciding to do more expensive locking to account for process exit. This acceptable minor race avoids two mutex operations in that highly common case of accounting not being enabled. MFC after: 2 weeks	2005-01-08 04:45:57 +00:00
Robert Watson	fd544ee8f7	In kern_wait(), let the compiler copy the rusage structure rather than an explicit bcopy() -- it probably does a better job.	2005-01-08 04:17:48 +00:00
Colin Percival	e9dec2c41b	Adjust two of my comments to the new world order: Indent protection in the first column is performed using /*, not /-.	2005-01-07 03:25:45 +00:00
Warner Losh	60727d8b86	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Warner Losh	73108a1664	Expand COPYRIGHT inline, per Matthew Dillon's earlier approval.	2005-01-06 23:34:38 +00:00
David Xu	476e1d077e	Return ETIMEDOUT when a thread times out, since POSIX thread APIs expect ETIMEDOUT, not EAGAIN; this simplifies userland code a bit.	2005-01-06 02:08:34 +00:00
John Baldwin	c88379381b	- Move the function prototypes for kern_setrlimit() and kern_wait() to sys/syscallsubr.h where all the other kern_foo() prototypes live. - Resort kern_execve() while I'm there.	2005-01-05 22:19:44 +00:00
John Baldwin	33fb8a386e	Rework the optimization for spinlocks on UP to be slightly less drastic and turn it back on. Specifically, the actual changes are now less intrusive in that the _get_spin_lock() and _rel_spin_lock() macros now have their contents changed for UP vs SMP kernels which centralizes the changes. Also, UP kernels do not use _mtx_lock_spin() and no longer include it. The UP versions of the spin lock functions do not use any atomic operations, but simple compares and stores which allow mtx_owned() to still work for spin locks while removing the overhead of atomic operations. Tested on: i386, alpha	2005-01-05 21:13:27 +00:00
Poul-Henning Kamp	0b3e4fe239	Since we do not support forceful unmount of DEVFS we can do away with the partially implemented vnode-readoption code in vgonechrl().	2005-01-04 08:49:14 +00:00
Marcel Moolenaar	3195113e2a	Regen.	2005-01-03 00:47:23 +00:00
Marcel Moolenaar	fe0ef598b6	uuidgen(2) is MP safe.	2005-01-03 00:45:57 +00:00
Warner Losh	d0d4cc63e3	Implement device_quiesce. This method means 'you are about to be unloaded, cleanup, or return ebusy if that's inconvenient.' The default module handler for newbus will now call this when we get a MOD_QUIESCE event, but in the future may call this at other times. This shouldn't change any actual behavior until drivers start to use it.	2004-12-31 20:47:51 +00:00
Pawel Jakub Dawidek	46003fb337	Be consistent and always use form 'return (value);' instead of 'return value;'. We had (before this change) 84 lines where it was style(9)-clean and 15 lines where it was not.	2004-12-31 14:52:53 +00:00
John Baldwin	50aaa791ba	Fix a typo and two whitespace nits.	2004-12-30 22:17:00 +00:00
John Baldwin	f5c157d986	Rework the interface between priority propagation (lending) and the schedulers a bit to ensure more correct handling of priorities and fewer priority inversions: - Add two functions to the sched(9) API to handle priority lending: sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these functions to ask the scheduler to lend a thread a set priority and to tell the scheduler when it thinks it is ok for a thread to stop borrowing priority. The unlend case is slightly complex in that the turnstile code tells the scheduler what the minimum priority of the thread needs to be to satisfy the requirements of any other threads blocked on locks owned by the thread in question. The scheduler then decides whether the thread can go back to normal mode (if its normal priority is high enough to satisfy the pending lock requests) or if it should continue to use the priority specified to the sched_unlend_prio() call. This involves adding a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for priority elevation. - Schedulers now refuse to lower the priority of a thread that is currently borrowing another thread's priority. - If a scheduler changes the priority of a thread that is currently sitting on a turnstile, it will call a new function turnstile_adjust() to inform the turnstile code of the change. This function resorts the thread on the priority list of the turnstile if needed, and if the thread ends up at the head of the list (due to having the highest priority) and its priority was raised, then it will propagate that new priority to the owner of the lock it is blocked on. Some additional fixes specific to the 4BSD scheduler include: - Common code for updating the priority of a thread when the user priority of its associated kse group has been consolidated in a new static function resetpriority_thread(). One change to this function is that it will now only adjust the priority of a thread if it already has a time sharing priority, thus preserving any boosts from a tsleep() until the thread returns to userland. Also, resetpriority() no longer calls maybe_resched() on each thread in the group. Instead, the code calling resetpriority() is responsible for calling resetpriority_thread() on any threads that need to be updated. - schedcpu() now uses resetpriority_thread() instead of just calling sched_prio() directly after it updates a kse group's user priority. - sched_clock() now uses resetpriority_thread() rather than writing directly to td_priority. - sched_nice() now updates all the priorities of the threads after the group priority has been adjusted. Discussed with: bde Reviewed by: ups, jeffr Tested on: 4bsd, ule Tested on: i386, alpha, sparc64	2004-12-30 20:52:44 +00:00
John Baldwin	99b808f461	Whitespace fix.	2004-12-30 20:30:58 +00:00
John Baldwin	63710c4d35	Stop explicitly touching td_base_pri outside of the scheduler and simply set a thread's priority via sched_prio() when that is the desired action. The schedulers will start managing td_base_pri internally shortly.	2004-12-30 20:29:58 +00:00
John Baldwin	9e6c867ccc	Call tty_close() at the very end of ttyclose() since otherwise NULL dereferences can occur since tty_close() may end up freeing the tty structure if it drops the last reference to it. Glanced at by: phk	2004-12-30 19:24:49 +00:00
Robert Watson	b36aab857b	Make the sysctls kern.ipc.msgmnb and kern.ipc.msgtql into tunables as is the case for most other sysctls in the System V IPC message queue implementation. PR: 75541 Submitted by: Sergiy Vyshnevetskiy <serg at vostok dot net> MFC after: 2 weeks	2004-12-30 13:56:34 +00:00
David Xu	cc1000ac5b	Make umtx_wait and umtx_wake more like linux futex does, it is more general than previous. It also lets me implement cancelable point in thread library. Also in theory, umtx_lock and umtx_unlock can be implemented by using umtx_wait and umtx_wake, all atomic operations can be done in userland without kernel's casuptr() function.	2004-12-30 02:56:17 +00:00
Alan Cox	956d03da83	Eliminate (now) unnecessary acquisition and release of the global page queues lock.	2004-12-29 04:49:10 +00:00
John Baldwin	83ae089aab	- Up the WITNESS_COUNT macro from 200 to 1024 to support the growing number of lock types in the kernel. This results in an increase of witness data usage from ~145k to ~280k on i386 for kernels with 'options WITNESS'. - Remove the unused witness malloc bucket. Submitted by: Michal Mertl mime at traveller dot cz (1)	2004-12-28 21:21:27 +00:00
Robert Watson	6ce8940626	Attempt to slightly refine the print out from "show alllocks" -- list the process and thread numbers/names on the same line rather than on separate lines, and print the thread pointer not just the tid.	2004-12-27 10:47:08 +00:00
Alexander Kabaev	aa6f98d12f	Do not vput(9) unlocked vnode and do not VREF it with the sole purpose of vputting it back immediately. Complained by: DEBUG_VFS_LOCKS	2004-12-27 05:17:11 +00:00
Jeff Roberson	2ebf8eb132	- Unintentionally checked in a debugging panic. Remove that.	2004-12-26 23:21:48 +00:00
Jeff Roberson	36996b3b7c	- Remove a 4BSD specific hack since this will work on ULE too.	2004-12-26 22:56:51 +00:00
Jeff Roberson	598b368d6c	- Fix a long standing problem where an ithread would not honor sched_pin(). - Remove the sched_add wrapper that used sched_add_internal() as a backend. Its only purpose was to interpret one flag and turn it into an int. Do the right thing and interpret the flag in sched_add() instead. - Pass the flag argument to sched_add() to kseq_runq_add() so that we can get the SRQ_PREEMPT optimization too. - Add a KEF_INTERNAL flag. If KEF_INTERNAL is set we don't adjust the SLOT counts, otherwise the slot counts are adjusted as soon as we enter sched_add() or sched_rem() rather than when the thread is actually placed on the run queue. This greatly simplifies the handling of slots. - Remove the explicit prevention of migration for ithreads on non-x86 platforms. This was never shown to have any real benefit. - Remove the unused class argument to KSE_CAN_MIGRATE(). - Add ktr points for thread migration events. - Fix a long standing bug on platforms which don't initialize the cpu topology. The ksg_maxid variable was never correctly set on these platforms which caused the long term load balancer to never inspect more than the first group or processor. - Fix another bug which prevented the long term load balancer from working properly. If stathz != hz we can't expect sched_clock() to be called on the exact tick count that we're anticipating. - Rearrange sched_switch() a bit to reduce indentation levels.	2004-12-26 22:56:08 +00:00
Robert Watson	b6dd9ef2fe	Add "show alllocks" command to DDB, which dumps a list of processes and threads currently holding sleep mutexes (and spin mutexes for curthread). This can be quite useful in looking for a lock condition summary for a system, as it avoids manually iterating through threads and processes to find all the interesting locks. NB: "alllocks" is up there with "lockedvnods" for a bad argument for show. MFC after: 2 weeks	2004-12-26 22:52:24 +00:00
Jeff Roberson	6a98702001	- Run sched_userret() after thread_userret(). Before, sched_userret() would lower the priority of the returning thread to a user priority before calling into thread_userret() which would call wakeup() which in turn would cause the returning thread to eventually context switch rather than completing its slice. Allowing this thread to complete its slice first yields a 15% performance improvement in super-smack on my dual opteron with 4BSD.	2004-12-26 07:30:35 +00:00
Jeff Roberson	907bdbc288	- Wrap the thread count adjustment in sched_load_add() and sched_load_rem() so that we may place some ktr entries nearby. - Define other KTR_SCHED tracepoints so that we may graph the operation of the scheduler.	2004-12-26 00:16:24 +00:00
Jeff Roberson	81d47d3f4b	- Remove earlier KTR_ULE tracepoints. - Define new KTR_SCHED points so that we can graph the operation of the scheduler.	2004-12-26 00:15:33 +00:00
Jeff Roberson	85da7a569b	- Define KTR points for KTR_SCHED.	2004-12-26 00:14:21 +00:00
David Xu	c180db2bce	Make _umtx_op() a more general interface: the final parameter needn't be a timespec pointer, and every parameter is interpreted according to its opcode.	2004-12-25 13:02:50 +00:00
David Xu	8b37fbabb4	1. introduce umtx_owner to get the owner of a umtx. 2. add const qualifier to umtx_timedlock and umtx_timedwait. 3. add missing brackets in umtx do_unlock_and_wait.	2004-12-25 12:49:35 +00:00
David Xu	3dd213f160	Add umtxq_lock/unlock around umtx_signal, fix debug kernel compiling, and let umtx_lock return EINTR where it would otherwise return ERESTART; this gives userland a chance to back off mtx lock code when needed.	2004-12-24 11:59:20 +00:00
David Xu	a08c214a72	1. Fix race condition between umtx lock and unlock; heavy testing on SMP can expose the bug. 2. Let umtx_wake return the number of threads that have been woken.	2004-12-24 11:30:55 +00:00
Robert Watson	0fddf92d72	Assert the sem lock in sem_ref() and sem_rel(), as it is required to safely manipulate the reference count.	2004-12-23 02:22:47 +00:00
Robert Watson	38e6a58c77	Remove temporary debugging printf that was used to detect the presence of a race that had previously caused a panic in order to determine if the fix was for the right problem. It was. MFC after: 2 weeks	2004-12-23 01:19:27 +00:00
Robert Watson	1ef121cf6b	In sonewconn(), the s/if/while/ change to wait for room at the tail of the accept queue is a feature, not a bug/issue, so remove the XXXRW from the comment.	2004-12-23 01:16:21 +00:00
Robert Watson	ba65391172	Remove an XXXRW indicating atomic operations might be used as a substitute for a global mutex protecting the socket count and generation number. The observation that soreceive_rcvoob() can't return an mbuf chain is a property, not a bug, so remove the XXXRW. In sorflush, s/existing/previous/ for code when describing prior behavior. For SO_LINGER socket option retrieval, remove an XXXRW about why we hold the mutex: this is correct and not dubious. MFC after: 2 weeks	2004-12-23 01:07:12 +00:00
Robert Watson	81b5dbecd4	In soalloc(), simplify the mac_init_socket() handling to remove unnecessary use of a global variable and simplify the return case. While here, use ()'s around return values. In sodealloc(), remove a comment about why we bump the gencnt and decrement the socket count separately. It doesn't add substantially to the reading, and clutters the function. MFC after: 2 weeks	2004-12-23 00:59:43 +00:00
Alan Cox	7abe2ac214	Add send buffer locking to uipc_send(). Without this locking a race can occur between a reader and a writer that results in a panic upon close, e.g., "panic: sbflush_locked: cc 4 || mb 0xffffff0052afa400 || mbcnt 0" Reviewed by: rwatson@ MFC after: 2 weeks	2004-12-22 20:28:46 +00:00
Poul-Henning Kamp	40b5a6f2c6	Include uio.h Check O_NONBLOCK instead of IO_NDELAY Don't include vnode.h	2004-12-22 17:37:14 +00:00
Poul-Henning Kamp	72e8dfe5a0	Hide/remove various printfs, now that root mounting doesn't seem to explode on people.	2004-12-20 21:59:25 +00:00
Poul-Henning Kamp	118253ca24	fix a misleading sleep identifier.	2004-12-20 21:38:13 +00:00
Poul-Henning Kamp	e87047b437	We can only ever get to vgonechrl() from a devfs vnode, so we do not need to reassign the vp->v_op to devfs_specops, we know that is the value already. Make devfs_specops private to devfs.	2004-12-20 21:34:29 +00:00
David Xu	839f811c6a	1. msleep returns EWOULDBLOCK not ETIMEDOUT, use EWOULDBLOCK instead. 2. Eliminate a possible lock leak in timed wait loop.	2004-12-18 13:43:16 +00:00
David Xu	50586e8b6b	1. Make umtx sharable between processes: two or more processes call mmap() to create a shared space and then initialize a umtx in it; after that, threads in different processes can use the umtx the same way as threads in the same process. 2. Introduce a new syscall, _umtx_op, to support timed lock and condition variable semantics. Also, the original umtx_lock and umtx_unlock inline functions are now reimplemented using _umtx_op; _umtx_op can use an arbitrary id, not just a thread id.	2004-12-18 12:52:44 +00:00
Sam Leffler	a37c415e66	fix m_append for case where additional mbufs are required	2004-12-15 19:04:07 +00:00
Poul-Henning Kamp	662d80dc23	Fix a deadlock I introduced this morning. Mostly from: tegge	2004-12-14 20:48:40 +00:00
Jeff Roberson	7842f65e7f	- Garbage collect several unused members of struct kse and struct ksegrp. As best as I can tell, some of these were never used.	2004-12-14 10:53:55 +00:00
Jeff Roberson	8ffb8f5558	- In kseq_choose(), don't recalculate slice values for processes with a nice of 0. Doing so can cause an infinite loop because they should be running, but a nice -20 process could prevent them from doing so. - Add a new flag KEF_PRIOELEV to flag a thread that has had its priority elevated due to priority propagation. If a thread has had its priority elevated, we assume that it must go on the current queue and it must get a slice. - In sched_userret() if our priority was elevated and we shouldn't have a timeslice, yield here until we should. Found/Tested by: glebius	2004-12-14 10:34:27 +00:00
Poul-Henning Kamp	d986dbb448	Add a new kind of reference count (fd_holdcnt) to struct filedesc which holds on to just the data structure and the mutex. (The existing refcount (fd_refcnt) holds onto the open files in the descriptor.) The fd_holdcnt is protected by fdesc_mtx, fd_refcnt by FILEDESC_LOCK. Add fdhold(struct proc *) which gets a hold on the filedescriptors of the specified proc. Add fddrop(struct filedesc *) which drops the fd_holdcnt and if zero destroys the mutex and frees the memory. Initialize the fd_holdcnt to one in fdinit(). Normal operations on the filedesc structure will not change it. In fdfree() use fddrop() to dispose of the mutex and structure. Hold the FILEDESC_LOCK() until we have cleaned out the contents and carefully set the fields to null values during cleanup. Use fdhold()/fddrop() in mountcheckdirs() and sysctl_kern_file().	2004-12-14 09:09:51 +00:00
Poul-Henning Kamp	30abaa53df	Make fdesc_mtx private to kern_descrip.c now that the flock has come home.	2004-12-14 08:44:51 +00:00
Poul-Henning Kamp	12b18fdab4	Move the checkdirs() function from vfs_mount.c to kern_descrip.c and call it mountcheckdirs().	2004-12-14 08:23:18 +00:00
Poul-Henning Kamp	c113083c5a	Add new function fdunshare() which encapsulates the necessary light magic for ensuring that a process' filedesc is not shared with anybody. Use it in the two places which previously had private implementations. This collects all fd_refcnt handling in kern_descrip.c	2004-12-14 07:20:03 +00:00
Jeff Roberson	3ef6ac3361	- If delivering a signal will result in killing a process that has a nice value above 0, set it to 0 so that it may proceed with haste. This is especially important on ULE, where adjusting the priority does not guarantee that a thread will be granted a greater time slice.	2004-12-13 16:45:57 +00:00
Jeff Roberson	2d59a44dc0	- Take up a 'slot' while we're on the assigned queue, waiting to be posted to another processor. Otherwise, kern_switch() gets confused and tries to sched_add(NULL).	2004-12-13 13:09:33 +00:00
Pawel Jakub Dawidek	bf4843166f	Add bioq_insert_head() function. OK'd by: phk	2004-12-13 12:57:21 +00:00
Alan Cox	db24060c25	Correct the handling of two unusual cases by the zero-copy receive path, specifically, vm_pgmoveco(): 1. If vm_pgmoveco() sleeps on a busy page, it must redo the look up because the page may have been freed. 2. If the receive buffer is copy-on-write due to, for example, a fork, then although the first vm object in the shadow chain may not contain a page there may still be one from a backing object that is mapped. Thus, a pmap_remove() is required for the new page rather than the backing object's page to be seen by the application. Also, add some comments to vm_pgmoveco() and update some assertions. Tested by: ken@	2004-12-13 06:24:14 +00:00
Poul-Henning Kamp	1ab58cc2df	Copy the entire stats structure. Let compiler decide how.	2004-12-11 22:13:02 +00:00
Poul-Henning Kamp	e40da1f149	Fix whitespace. Spotted by: njl	2004-12-11 20:41:32 +00:00
Poul-Henning Kamp	494ea31a7d	Remove the /dev/dev -> / symlink after we are done with it.	2004-12-11 12:48:37 +00:00
Alan Cox	c73e3e9223	Remove unneeded code from the zero-copy receive path. Discussed with: gallatin@ Tested by: ken@	2004-12-10 04:49:13 +00:00
Max Laier	f8aabcb680	Start the protocol timeouts only after all domains have been initialized completely. For some reason (that I am still curious about) we started to no longer manage to finish the initialization before the timeouts run the first time leading to panics when using uninitialized mutex etc. The root of this problem is that we currently first link a domain to the domains list and only later initialize the domain's protocols. This should be reworked in the future, but with the current API it is not possible in all situations. We settle with this lazy fix for now. Tested by: gnn, ru, myself	2004-12-09 11:47:30 +00:00
Sam Leffler	4873d1754f	add m_append utility function to be used in forthcoming changes	2004-12-08 05:42:02 +00:00
Alan Cox	1c4dbedac4	Tidy up the zero-copy receive path: Remove an unneeded argument to uiomoveco() and userspaceco().	2004-12-08 05:25:08 +00:00
Nate Lawson	8844d5efa6	Add the devclass_get_count(9) function and man page. It gets a count of the number of devices in a devclass and is a subset of devclass_get_devices(9). Reviewed by: imp, dfr	2004-12-08 02:39:56 +00:00
Stephan Uphoff	5656474145	Propagate TDF_NEEDRESCHED to replacement thread in sched_switch(). Reviewed by: julian, jhb (in October) Approved by: sam (mentor) MFC after: 4 weeks	2004-12-07 18:17:24 +00:00
Poul-Henning Kamp	20a92a18f1	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworthy anyway. Correct information is provided by statfs(/).	2004-12-07 08:15:41 +00:00
Poul-Henning Kamp	46d2b4184d	Instead of complaining about it, just silently filter out MNT_ROOTFS. This fixes the "fsck /" problem various people have reported overnight.	2004-12-07 06:58:42 +00:00
Poul-Henning Kamp	8d8883caaf	make "ffs" an alias for "ufs" when it comes to filesystem names.	2004-12-06 22:22:57 +00:00
Poul-Henning Kamp	1e8ca0f0b0	Always call VFS_STATFS() on mp->mnt_stat when we have mounted a filesystem, this way individual filesystems don't have to do it.	2004-12-06 19:53:32 +00:00
Poul-Henning Kamp	53a05b7c3f	Add more functions for handling mount arguments in VFS_MOUNT(): vfs_flagopt() for binary/boolean options. vfs_getopts() for string options vfs_filteropt() to check for unknown options. vfs_scanopt() for scanf() like processing of options. Also add function for setting the stat.f_mntfromname field.	2004-12-06 18:18:35 +00:00
Poul-Henning Kamp	5ddb073996	Change the first argument of vfs_cmount() to a handy struct mntarg* and call it accordingly. (No filesystems implement vfs_cmount() yet, so this is a no-op commit)	2004-12-06 16:39:05 +00:00
Poul-Henning Kamp	49bfeeb848	Add a few convenient functions in the mount_arg() family and collect the entire family at the end of the source file.	2004-12-06 13:01:41 +00:00
Poul-Henning Kamp	f0df036767	Collapse two almost identical license copies, preserving the rights of all listed authors, rightholders and contributors.	2004-12-06 12:44:30 +00:00
Poul-Henning Kamp	def7671ad8	Remove the kern.rootdev sysctl. Root filesystems (like NFS) don't have an associated disk device, and even if they had, the exact semantics would be filesystem dependent and should be implemented there.	2004-12-06 12:40:45 +00:00
Poul-Henning Kamp	a804d99c40	Make struct vfsopt{list} private to vfs_mount.c	2004-12-06 12:36:17 +00:00
Joseph Koshy	fdf20233c7	Use 'const char *' for a few prototypes. Reviewed by: ru	2004-12-06 10:53:40 +00:00
Alan Cox	370abcb3e5	Update the Tigon 1 and 2 driver to use the sf_buf API for implementing zero-copy receive of jumbo frames. This eliminates the need for the jumbo frame allocator implemented in kern/uipc_jumbo.c and sys/jumbo.h. Remove it. Note: Zero-copy receive of jumbo frames did not work without these changes; I believe there was insufficient locking on the jumbo vm object. Tested by: ken@ Discussed with: gallatin@	2004-12-06 00:43:40 +00:00
Poul-Henning Kamp	743312367a	VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases don't. Most of the implementations have grown weeds for this so they copy some fields from mnt_stat if the passed argument isn't that. Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.	2004-12-05 22:41:02 +00:00
David E. O'Brien	405a104ec0	When panicking in device_unbusy(), actually tell what device has the issue.	2004-12-05 20:58:56 +00:00
Warner Losh	891e611130	Start to add GIANT_REQUIRED; macros in places where giant is required and that I've verified things seem to basically work. I was able to boot and hot plug usb devices. Please let me know if this causes problems for anybody. The push down of giant has proceeded to the point that this will start to matter more and more.	2004-12-05 07:55:30 +00:00
Poul-Henning Kamp	6c12df5a19	Implement a function, mount_arg() for accumulating a list of mount parameters to nmount. Make kernel_mount() accept the output from mount_arg() and know how to free the malloc'ed space. Make kernel_vmount() use the new function.	2004-12-03 22:38:06 +00:00
Poul-Henning Kamp	9722743b9a	Sort and wash #includes.	2004-12-03 21:29:25 +00:00
Poul-Henning Kamp	b74f4d8bd1	When omount() is called, check if the filesystem has a cmount method and if so call it. The cmount method will gather and interpret omount() style arguments, and issue a kern_[v]mount() call to execute the corresponding nmount operation.	2004-12-03 21:14:46 +00:00
Poul-Henning Kamp	2a8b79eb6a	Add early checks for MNT_ROOTFS since we need to allow it later on in the code path.	2004-12-03 19:25:44 +00:00
Poul-Henning Kamp	a08805c741	Retire unused vfs_mount() function in the name of nmount migration.	2004-12-03 18:40:58 +00:00
Poul-Henning Kamp	32ba8e9390	Introduce vfs_byname_kld() which will try to load the filesystem as a module if possible. Use it so we don't have linker magic in the middle of the already complex mount code.	2004-12-03 16:11:01 +00:00
Poul-Henning Kamp	082d21222b	Make NAMEI_DIAGNOSTIC compile again and add a strategic vprint()	2004-12-03 12:15:39 +00:00
Poul-Henning Kamp	f76fedd20b	Improve vprint() a little bit: break long lines, reduce indent and tell if the VI_LOCK() is held.	2004-12-03 12:09:34 +00:00
Poul-Henning Kamp	6a0737aef1	Add missing vop_bypass (returning EOPNOTSUPP). Tripped up: marks	2004-12-03 08:56:30 +00:00
Max Laier	83727f0c3a	Am I smoking crack? Correct stupid, wrong ASSERT -> if conversion and make it do what I had in mind. Noticed by: glebius Pointyhat to: me, myself and mlaier	2004-12-02 15:47:15 +00:00
Poul-Henning Kamp	355be4eeda	Drop ffree() as a separate function and incorporate the only place used.	2004-12-02 12:17:27 +00:00
Poul-Henning Kamp	20ddb405f8	Style polishing. Use grepable functions Other minor nitpickings.	2004-12-02 11:56:13 +00:00
Poul-Henning Kamp	aec0fb7b40	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to believe this would ever be feasible in the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and prototype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)	2004-12-01 23:16:38 +00:00
David Xu	c1df5a1a5d	If a thread is resumed by thr_wake, it should return 0; in particular, it should not return ERESTART after it caught a signal, otherwise the thr_wake() call will be lost. Also, a timed-out wait should not be restarted. Finally, use wakeup rather than wakeup_one to be safe.	2004-12-01 13:50:04 +00:00
Poul-Henning Kamp	d672e07541	We already have a lock initialization function, use that for fdesc_mtx also. Polish badfo stuff.	2004-12-01 09:42:35 +00:00
Poul-Henning Kamp	010b1e3fdc	Collect the stuff for the /dev/fd/{%d,std{in,out,err}} pseudo-device driver at the bottom of the file.	2004-12-01 09:29:31 +00:00
Poul-Henning Kamp	e4643c730a	"nfiles" is a bad name for a global variable. Call it "openfiles" instead as this is more correct and matches the sysctl variable.	2004-12-01 09:22:26 +00:00
Poul-Henning Kamp	cc2f51ef32	Style: move data to top of file.	2004-12-01 08:06:27 +00:00
Scott Long	05d0bf79ed	Remove the last vestiges of the userconfig option. None of this actually did anything, so this commit should be considered a NO-OP.	2004-12-01 04:59:33 +00:00
Max Laier	69fb23b73d	Implement the check I was talking about in the previous message already. Introduce domain_init_status to keep track of the init status of the domains list (surprise). 0 = uninitialized, 1 = initialized/unpopulated, 2 = initialized/done. Higher values can be used to support late addition of domains which right now "works", but is potentially dangerous. I choose to only give a warning when doing so. Use domain_init_status with if_attachdomain[1]() to ensure that we have a complete domains list when we init the if_afdata array. Store the current value of domain_init_status in if_afdata_initialized. This way we can update if_afdata after a new protocol has been added (once that is allowed). Submitted by: se (with changes) Reviewed by: julian, glebius, se PR: kern/73321 (partly)	2004-11-30 22:38:37 +00:00
David Xu	d111b34081	Forgot to inline umtxq_unlock.	2004-11-30 12:18:53 +00:00
David Xu	3f76af0f4a	1. use per-chain mutex instead of global mutex to reduce lock collision. 2. Fix two race conditions. One is between _umtx_unlock and signal, also a thread was marked TDF_UMTXWAKEUP by _umtx_unlock, it is possible a signal delivered to the thread will cause msleep returns EINTR, and the thread breaks out of loop, this causes umtx ownership is not transferred to the thread. Another is in _umtx_unlock itself, when the function sets the umtx to UMTX_UNOWNED state, a new thread can come in and lock the umtx, also the function tries to set contested bit flag, but it will fail. Although the function will wake a blocked thread, if that thread breaks out of loop by signal, no contested bit will be set.	2004-11-30 12:02:53 +00:00
Nate Lawson	2fd32b933f	Replace a printf with a KASSERT that we are indeed running on the BSP.	2004-11-30 06:21:38 +00:00
Bruce M Simpson	4ac33532ef	Fix the build.	2004-11-30 03:23:35 +00:00
Peter Wemm	1114f4f9a2	Switch from 1024hz to 1000hz on amd64 to match i386. 1024 is a bad choice because it is so in sync with stathz (128hz or 4096hz etc).	2004-11-30 00:25:26 +00:00
Paul Saab	d297f70246	If soreceive() is called from a socket callback, there's no reason to do a window update to the peer (thru an ACK) from soreceive() itself. TCP will do that upon return from the socket callback. Sending a window update from soreceive() results in a lock reversal. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson	2004-11-29 23:10:59 +00:00
Paul Saab	85d11adf25	Make soreceive(MSG_DONTWAIT) nonblocking. If MSG_DONTWAIT is passed into soreceive(), then pass in M_DONTWAIT to m_copym(). Also fix up error handling for the case where m_copym() returns failure. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson	2004-11-29 23:09:07 +00:00
Paul Saab	d8b8e875a2	When upgrading the shared lock to an exclusive lock, if we discover that the exclusive lock is already held, then we call panic. Don't clobber internal lock state before panic'ing. This change improves debugging if this case were to happen. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson	2004-11-29 22:58:32 +00:00
Colin Percival	40ab7ed988	Sigh. I really need to get an internet connection which is less than 2km away from where I'm living, so that I can fix these typos sooner. s/SA_MAX/AF_MAX/ in previous commit. Reported by: marcus, ups, Yiawei Ye, dwhite	2004-11-29 14:00:08 +00:00
Colin Percival	b96e102ae2	Check that saddr->sa_family is a sensible value before using it. Reported by: Bryan Fulton and Ted Unangst, Coverity, Inc. Found by: The SWAT analysis tool	2004-11-28 19:16:00 +00:00
Robert Watson	1a1238a112	Don't acquire Giant before calling closef() in close() (and elsewhere); instead acquire it conditionally in closef() if it is required for advisory locking. This removes Giant from the close() path of sockets and pipes (and any other objects that don't acquire Giant in their fo_close path, such as kqueues). Giant will still be acquired twice for vnodes -- once for advisory lock teardown, and a second time in the fo_close method. Both Poul-Henning and I believe that the advisory lock teardown code can be moved into the vn_closefile path shortly. This trims a percent or two off the cost of most non-vnode close operations on SMP, but has a fairly minimal impact on UP where the cost of a single mutex operation is pretty low.	2004-11-28 14:37:17 +00:00
Poul-Henning Kamp	a7db6b6ed3	Use FILEDESC_LOCK_FAST in checkdirs()	2004-11-28 11:26:43 +00:00
David Xu	7d2eb68b66	Unlock mutex if PDROP was set by caller.	2004-11-27 11:43:31 +00:00
David Schultz	6004362e66	Don't include sys/user.h merely for its side-effect of recursively including other headers.	2004-11-27 06:51:39 +00:00
David Schultz	1eecfae3e5	Axe a.out core dump support. Neither older gdb binaries nor current bfd sources understand the present format.	2004-11-27 06:46:59 +00:00
Poul-Henning Kamp	6518a5aa8e	Eliminate MNT_NODEV usage, it doesn't have any meaning any more. Keep a #define MNT_NODEV 0 around to avoid dealing with contrib userland like mount_smbfs.	2004-11-26 19:28:39 +00:00
Poul-Henning Kamp	f0775d7c7a	Fix LOR. Solution pointed out by: jhb	2004-11-26 06:14:04 +00:00
Poul-Henning Kamp	1b52747b5f	Allow a filesystem to have both old and new mount methods at the same time. This will be necessary for transitioning.	2004-11-25 12:19:24 +00:00
Poul-Henning Kamp	f5b2f15a0c	Regen.	2004-11-25 12:08:16 +00:00
Poul-Henning Kamp	7fa77ace06	Mark mount, unmount and nmount MPSAFE	2004-11-25 12:07:28 +00:00
Poul-Henning Kamp	19da2efc3e	Assert Giant held in vfs_domount() and vfs_dounmount() Explicitly grab Giant before calling these.	2004-11-25 12:06:43 +00:00
Poul-Henning Kamp	de4cbbf593	Integrate the relevant bits of vfs_rootmountalloc() where it matters.	2004-11-25 09:47:51 +00:00
Robert Watson	436cac68e6	Correct a bug introduced in sys_pipe.c:1.179: in pipe_ioctl(), release the pipe mutex before calling fsetown(), as fsetown() may block. The sigio code protects the pipe sigio data using its own mutex, and the pipe reference count held by the caller will prevent the pipe from being prematurely garbage-collected. Discovered by: imp	2004-11-23 22:15:08 +00:00
David Schultz	c17ff94938	Neither of the arguments to closef() can be NULL anymore, so don't check for that.	2004-11-21 11:06:24 +00:00
David Schultz	6db36923ad	Remove local definitions of RANGEOF() and use __rangeof() instead. Also remove a few bogus casts.	2004-11-20 23:00:59 +00:00
David Schultz	0ef5c36ff1	Maintain the broken state of backwards compatibility for a.out (and PECOFF!) core dumps. None of the old versions of gdb I tried were able to read a.out core dumps before or after this change. Reviewed by: arch@	2004-11-20 02:32:04 +00:00
David Schultz	8b059651ba	Malloc p_stats instead of putting it in the U area. We should consider simply embedding it in struct proc. Reviewed by: arch@	2004-11-20 02:28:48 +00:00
Mark Santcroos	9b7fe7e497	Place function comment above the right function.	2004-11-19 00:58:30 +00:00
Mark Santcroos	2524cfb753	Rebuild from syscalls.master:1.179 Reviewed by: imp, phk, njl, peter Approved by: njl	2004-11-18 23:52:40 +00:00
Mark Santcroos	6b270b4825	Add ntp_gettime(2) system call. Reviewed by: imp, phk, njl, peter Approved by: njl	2004-11-18 23:46:14 +00:00
Mark Santcroos	932cfd418c	Add system call implementation of ntp_gettime(2). Moved most of the work to ntp_gettime1(), which is now called by ntp_gettime() and ntp_sysctl(). Reviewed by: imp, phk, njl, peter Approved by: njl	2004-11-18 23:44:49 +00:00
Poul-Henning Kamp	18dc737317	Ok, first blunder: ioctls are not entirely unused on vnodes anymore :-) Add dropped call to VOP_IOCTL().	2004-11-18 17:15:04 +00:00
Poul-Henning Kamp	7cc9fb79db	Pass path to filesystem when mounting root	2004-11-18 14:31:24 +00:00
Jeff Roberson	b646893f0f	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.	2004-11-18 08:44:09 +00:00
Poul-Henning Kamp	c31e6a8dc8	Make more sense out of vop_stdcreatevobject()	2004-11-18 07:55:05 +00:00
John Baldwin	d0b4135e00	Don't bother exiting storming mode once a second to see if it has gone away, instead only exit storming mode when an interrupt stops firing long enough for the ithread to exit the loop and go back to sleep. Tested by: marcus (cruder version)	2004-11-17 14:39:41 +00:00
Poul-Henning Kamp	a0fbccc9e7	Push Giant down through ioctl. Don't grab Giant in the upper syscall/wrapper code NET_LOCK_GIANT in the socket code (sockets/fifos). mtx_lock(&Giant) in the vnode code. mtx_lock(&Giant) in the opencrypto code. (This may actually not be needed, but better safe than sorry). Devfs grabs Giant if the driver is marked as needing Giant.	2004-11-17 09:09:55 +00:00
Poul-Henning Kamp	db446e30cc	Push Giant down through select and poll. Don't grab Giant in the upper syscall/wrapper code NET_LOCK_GIANT in the socket code (sockets/fifos). mtx_lock(&Giant) in the vnode code. Devfs grabs Giant if the driver is marked as needing Giant.	2004-11-17 08:01:10 +00:00
Diomidis Spinellis	7690a6e4ba	Improvements and fixes in the 1.241 commit: - Have TS_ZOMBIE ttys return POLLHUP instead of POLLERR - Remove unneeded POLLWRNORM (old bug) - TS_ZOMBIE ttys will set POLLIN and POLLRDNORM - Do not call selrecord in TS_ZOMBIE ttys PR: kern/73821 Reviewed by: bde MFC after: 4 weeks	2004-11-16 17:41:16 +00:00
John Baldwin	a51dae09ec	Adjust the interrupt storm handling code to better handle a storm. When a storm is detected, enter "storming" mode which throttles the interrupt source such that the handlers are run once every clock tick. Previously we allowed a full set of storm_threshold iterations through the handler before going back to sleep. Also, this currently will intentionally exit storming mode once a second to see if the storm has passed. Tested by: marcus Discussed with: bde	2004-11-16 16:09:46 +00:00
Poul-Henning Kamp	8ccf264fcc	Polish code to correctly reflect structure.	2004-11-16 14:47:04 +00:00
Poul-Henning Kamp	1b5cd47aa0	Move a FILEDESC_UNLOCK upwards to silence witness.	2004-11-16 14:41:31 +00:00
Poul-Henning Kamp	dc99052535	Move a FILEDESC_UNLOCK up to maintain correct nesting of FILEDESC/FILE locking.	2004-11-16 09:12:03 +00:00
Poul-Henning Kamp	9bb4281603	Eliminate pointless goto.	2004-11-16 08:22:06 +00:00
Poul-Henning Kamp	7f21497282	Add missing break.	2004-11-16 06:57:52 +00:00
Poul-Henning Kamp	f608397595	Give vn_poll single exit point (to make it easier to insert "mtx_unlock(&Giant)" real soon now).	2004-11-15 21:56:42 +00:00
Poul-Henning Kamp	f661e9a0bc	Straighten the ioctl function out to have only one exit point.	2004-11-15 21:51:28 +00:00
Poul-Henning Kamp	48ab5b2d21	Forgot to remove now unused variable in last commit.	2004-11-15 21:28:00 +00:00
Poul-Henning Kamp	136211e58e	It is not necessary to hold vn_start_write/vn_finished_write around VOP_REVOKE.	2004-11-15 21:27:06 +00:00
Poul-Henning Kamp	718fe8e2bf	Nest FILEDESC_LOCK properly around FILE_LOCK	2004-11-15 21:26:13 +00:00
Warner Losh	6d1ab6edac	Fix an off by one error. MAXPATHLEN already has +1.	2004-11-15 20:51:32 +00:00
Poul-Henning Kamp	970d8904d6	Make FILE_LOCK and FILEDESC_LOCK nest properly by postponing the release of FILEDESC_LOCK a few more lines.	2004-11-15 16:10:55 +00:00
Poul-Henning Kamp	9c83534dd8	Make VOP_BMAP return a struct bufobj for the underlying storage device instead of a vnode for it. The vnode_pager does not and should not have any interest in what the filesystem uses for backend. (vfs_cluster doesn't use the backing store argument.)	2004-11-15 09:18:27 +00:00

8319 Commits