modified bit emulation traps on Alpha while holding locks in the
sysctl handler.
A better solution would be to pass a hint to the Alpha pmap code to
tell it to mark these pages as modified as they are being wired,
but that appears to be more difficult to implement.
Suggested by: jhb
MFC after: 3 days
placeholder similar to KTR_DEV. Explain the use of KTR_DEV and
KTR_SUBSYS in a comment as well.
- Retire KTR_WITNESS and instead have witness logging default to off but use
KTR_SUBSYS if it is enabled.
1) unregister kqueue filter for EVFILT_LIO.
2) free uma_zones.
3) call setsid directly to enter another session rather than
implementing it itself.
Submitted by: jhb
The success of the cluster allocation is checked through a field in the
mbuf structure. This change is non-functional but properly silences code
inspection tools.
Found by: Coverity Prevent(tm)
Coverity ID: CID807
Sponsored by: TCP/IP Optimization Fundraise 2005
lookup() instead of EPERM when a DELETE or RENAME operation is
attempted on "..".
In kern_unlink(), remap EINVAL errors returned from namei() to EPERM
to match existing (and POSIX required) behaviour.
Discussed with: bde
MFC after: 3 days
used by utilities to reset moused(8), for example. The syntax is:
!system=kern subsystem=power type=resume
Note that it would be nice to have notification of suspend, but it's more
difficult since there would have to be a method of doing request/ack
to userland and automatically timing out if no response. apm(4) has a
similar mechanism.
MFC after: 2 weeks
An executable contains at most one PT_INTERP program header. Therefore,
the loop that searches for it can terminate after it is found rather than
iterating over the entire set of program headers.
Eliminate an unneeded initialization.
Reviewed by: tegge
the last component of the path name is "..". This keeps VOP_LOOKUP()
from locking vnodes in reverse order.
Tested by: Denis Shaposhnikov <dsh AT vlink DOT ru>
MFC after: 3 days
prototypes, as the majority of new functions added have been in this
style. Changing prototype style now results in gcc noticing that the
implementation of vn_pollrecord() has a 'short' argument instead of
'int' as prototyped in vnode.h, so correct that definition. In practice
this didn't matter as only poll flags in the lower 16 bits are used.
MFC after: 1 week
devclass's parent pointer if the two drivers share the same devclass. This
can happen if the drivers use the same new-bus name. For example, we
currently have 3 drivers that use the name "pci": the generic PCI bus
driver, the ACPI PCI bus driver, and the OpenFirmware PCI bus driver. If
the ACPI PCI bus driver was defined as a subclass of the generic PCI bus
driver, then without this check the "pci" devclass would point to itself
as its parent and device_probe_child() would spin forever when it
encountered the first PCI device that did not have a matching driver.
Reviewed by: dfr, imp, new-bus@
equal to NULL several times later. p_ucred "should probably not" be NULL
if the process isn't PRS_NEW anyway. This is strongly reinforced by the fact
that we don't see frequent crashes here. Remove the checks after p_cansee and
add a KASSERT right before it.
Found by: Coverity Prevent (tm)
Also trim one nearby trailing space.
lock_object objects (see the sketch after this list):
- Add new lock_init() and lock_destroy() functions to setup and teardown
lock_object objects including KTR logging and registering with WITNESS.
- Move all the handling of LO_INITIALIZED out of witness and the various
lock init functions into lock_init() and lock_destroy().
- Remove the constants for static indices into the lock_classes[] array
and change the code outside of subr_lock.c to use LOCK_CLASS to compare
against a known lock class.
- Move the 'show lock' ddb function and lock_classes[] array out of
kern_mutex.c over to subr_lock.c.
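As a rough sketch of the consolidated interface (the prototypes and the
lock_class_mtx_sleep/LO_WITNESS names below are assumed from the
description above, not quoted verbatim from the commit):

    void lock_init(struct lock_object *lock, struct lock_class *class,
        const char *name, const char *type, int flags);
    void lock_destroy(struct lock_object *lock);

    /* An individual lock's init routine now reduces to roughly: */
    struct lock_object lo;

    lock_init(&lo, &lock_class_mtx_sleep, "example", NULL, LO_WITNESS);
    /* ... use the lock ... */
    lock_destroy(&lo);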
Since we are using vfs_busy() on a freshly allocated mount structure, use
(void) to show that we do not care about the return value.
Found with: Coverity Prevent (tm)
MFC after: 2 weeks
taskqueue_start_threads(struct taskqueue **, int count, int pri,
const char *name, ...);
This allows the creation of 1 or more threads that will service a single
taskqueue. Also rework the taskqueue_create() API to remove the API change
that was introduced a while back. Creating a taskqueue doesn't rely on
the presence of a process structure, and the proc mechanics are much better
encapsulated in taskqueue_start_threads(). Also clean up the
taskqueue_terminate() and taskqueue_free() functions to safely drain
pending tasks and remove all associated threads.
The TASKQUEUE_DEFINE and TASKQUEUE_DEFINE_THREAD macros have been changed
to use the new API, but drivers compiled against the old definitions will
still work. Thus, recompiling drivers is not a strict requirement.
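A minimal sketch of the intended usage (queue name, thread count, and
priority are illustrative; taskqueue_thread_enqueue is the stock
thread-based enqueue hook):

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>
    #include <sys/priority.h>
    #include <sys/taskqueue.h>

    static struct taskqueue *my_tq;

    static void
    my_tq_init(void *arg)
    {
            /* Create the queue first, then attach service threads. */
            my_tq = taskqueue_create("my_tq", M_WAITOK,
                taskqueue_thread_enqueue, &my_tq);
            taskqueue_start_threads(&my_tq, 4, PWAIT, "my_tq taskq");
    }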
intended for use solely with atomic datagram socket types, and relies
on the previous break-out of sosend_copyin(). Changes to allow UDP to
optionally use this instead of sosend() will be committed as a
follow-up.
to COMPAT_43TTY.
Add COMPAT_43TTY to NOTES and */conf/GENERIC
Compile tty_compat.c only under the new option.
Spit out
#warning "Old BSD tty API used, please upgrade."
if ioctl_compat.h gets #included from userland.
fast taskqueues. The following have been added:
TASKQUEUE_FAST_DEFINE() - create a global taskqueue whose tasks run in
an arbitrary execution context.
TASKQUEUE_FAST_DEFINE_THREAD() - create a global taskqueue that uses a
dedicated kthread.
taskqueue_create_fast() - create a local/private taskqueue.
These are all complementary to the standard taskqueue functions. They are
primarily useful for fast interrupt handlers that can only use spinlocks for
synchronization.
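For example, a fast interrupt handler might defer its heavy lifting
like this (all names are illustrative; taskqueue_enqueue_fast() is the
fast-queue enqueue entry point):

    /* Global fast taskqueue serviced by a dedicated kthread. */
    TASKQUEUE_FAST_DEFINE_THREAD(my_fast);

    static struct task my_task;    /* TASK_INIT() done at attach time */

    static void
    my_intr_fast(void *arg)
    {
            /* Only spinlocks may be used here; punt the real work. */
            taskqueue_enqueue_fast(taskqueue_my_fast, &my_task);
    }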
I personally think that the taskqueue API is starting to get too narrow and
hairy, but fixing it will require a major redesign on the API. Such a
redesign would be good but would break compatibility with FreeBSD 6.x, so
it really isn't desirable at this time.
Submitted by: sam
and subsequently broke the build. This change is supposed to fix the
case where doing a mtx_destroy() off a spin mutex while you hold it fails.
If it had been tested I would just leave it in, but it hasn't been tested
yet, so it will have to wait until later.
struct sx). Instead of storing a direct pointer to our lock_class
struct in lock_object, reserve 4 bits in the lo_flags field to serve as an
index into a global lock_classes array that contains pointers to the lock
classes. Only debugging code such as WITNESS or INVARIANTS checks and KTR
logging need to access the lock_class member, so this shouldn't add any
overhead to production kernels. It might add some slight overhead to
kernels using those debug options however.
As with the previous set of changes to lock_object, this is going to
completely obliterate the kernel ABI, so be sure to recompile all your
modules.
returns EBADF. That errno is correct and is mandated by POSIX. It also
goes back to revision 1.1 of our CVS history (i.e. 4.4BSD).
The _fget() function should probably also be updated as it currently returns
EINVAL in that case rather than EBADF. (It does return EBADF for reads
on a write-only descriptor without any XXX comments oddly enough.)
Discussed with: scottl, grog, mjacob, bde
that a file's atime and mtime are only set to correct fractional
second values (0-999999000ns with the current interface).
Prior to this change users could create files with values outside
that range. Moreover, on 32-bit machines tv_usec offsets larger than
4.3s would result in an unnormalized AND wrong timestamp value,
due to overflow.
MFC after: 1 week
- provide an interface (macros) to the page coloring part of the VM system,
this allows trying different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)
MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contain
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)
Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.
Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)
- Provide tunable vm.memguard.desc, so one can specify memory type without
changing the code and recompiling the kernel.
- Allow memguard to be used for kernel modules by providing sysctl
vm.memguard.desc, which can be changed to short description of memory
type before module is loaded.
- Move as much memguard code as possible to memguard.c.
- Add sysctl node vm.memguard. and move memguard-specific sysctl there.
- Add malloc_desc2type() function for finding memory type based on its
short description (ks_shortdesc field).
- Memory type can be changed (via vm.memguard.desc sysctl) only if it
doesn't exist (will be loaded later) or when no memory is allocated yet.
If there is allocated memory for the given memory type, return EBUSY.
- Implement two ways of comparing memory types and make the safer/slower
one the default.
The race is very real, but conditions needed for triggering it are rather
hard to meet now.
When gjournal is committed (where it is quite easy to trigger) we will need
to fix it.
For now, we will see whether it really is that hard to trigger.
Discussed with: kan
of msleep(). msleep_spin() doesn't support changing the priority of the
thread while it is asleep nor does it support interruptible sleeps (PCATCH)
or the PDROP flag. It does support timeouts however. It differs from
msleep() in that the passed in mutex is a spin mutex. This means one can
use msleep_spin() and wakeup() with a spin mutex similar to msleep() and
wakeup() with a regular mutex. Note that the spin mutex in question needs
to come before sched_lock and the sleepq locks in lock order.
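A minimal sketch of the pairing, assuming a softc protected by an
MTX_SPIN mutex (field names are made up):

    struct softc {
            struct mtx      sc_mtx;         /* initialized with MTX_SPIN */
            int             sc_done;
    };

    /* Waiter: */
    mtx_lock_spin(&sc->sc_mtx);
    while (!sc->sc_done)
            msleep_spin(&sc->sc_done, &sc->sc_mtx, "scdone", hz);
    mtx_unlock_spin(&sc->sc_mtx);

    /* Waker: */
    mtx_lock_spin(&sc->sc_mtx);
    sc->sc_done = 1;
    wakeup(&sc->sc_done);
    mtx_unlock_spin(&sc->sc_mtx);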
spin locks that are not in the static order list. It is not safe to call
printf while holding the witness spin mutex since the console drivers that
back printf may need to use their own spin locks which would try to talk
to witness when they were locked. Given this, it is possible for one
CPU to lock a console driver lock (such as sio) which then tries to lock
the witness lock while another CPU is doing the printf while holding the
witness lock. Fix this by moving the printf outside of the witness lock.
All other printf's in witness are already correct.
MFC after: 3 days
UMA_SLAB_MALLOC flag.
In some circumstances (I observed it when I was doing a lot of reallocs)
UMA_SLAB_MALLOC can be set even if us_keg != NULL.
If this is the case we have wonderful, silent data corruption, because less
data is copied to the newly allocated region than should be.
I'm not sure when this bug was introduced, it could be there undetected
for years now, as we don't have a lot of realloc(9) consumers and it was
hard to reproduce it...
...but what I know for sure is that I don't want to know who introduced
the bug:) It took me two/three days to track it down (of course most of
the time I was looking for the bug in my own code).
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute an ET_DYN elf image, check if this flag is on after we
know the elf brand, allowing execution if so.
PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>
Specifically, it is required for the I/O that may be performed by
elfN_load_section().
Avoid an obscure deadlock in the a.out, elf, and gzip image
activators. Add a comment describing why the deadlock does not occur
in the common case and how it might occur in less usual circumstances.
Eliminate an unused variable from exec_aout_imgact().
In collaboration with: tegge
by debugger, e.g. the process is dumping core. Only access p_xthread if
P_STOPPED_TRACE is set; this means the thread is ready to exchange signals
with the debugger. Print a warning if P_STOPPED_TRACE is not set due to
bugs in other code, if there are any.
The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and
is slightly adjusted.
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that effect and remove a condition
to slightly optimize the non-lapic case.
- Change the prototype of arm_handler_execute() so that its first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.
Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)
it and reacquiring it in vrele(). Consequently, there is no reason to
increase the reference count on the vm object caching the file's pages.
Reviewed by: tegge
Eliminate unused parameters to elfN_load_file().
The purpose of this change is consistency (not performance improvement:)),
as it was hard to tell if fdrop() is MPSAFE or not when I saw it sometimes
under Giant and sometimes without it.
Glanced at by: ssouhlal, kan
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.
MFC after: 2 weeks
and KTR_IO as they were never used. Remove KTR_CLK since it was only
used for hardclock firing and use KTR_INTR there instead. Remove
KTR_CRITICAL since it was only used for crit enter/exit and use
KTR_CONTENTION instead.
really should be a fptrdiff_t if we had that) in profclock().
- Don't try to profile kernel pc's that are not >= the kernel lowpc to avoid
underflows when computing a profiling index.
- Use the PC_TO_I() macro to compute the kernel profiling index rather than
doing it inline.
Discussed with: bde
ephemeral mappings that are used as the source for three copy
operations from kernel space to user space. There are two reasons for
making this change: (1) Under heavy load exec_map can fill up causing
vm_map_find() to fail. When it fails, the nascent process is aborted
(SIGABRT). Whereas, this reimplementation using sf_buf_alloc()
sleeps. (2) Although it is possible to sleep on vm_map_find()'s
failure until address space becomes available (see kmem_alloc_wait()),
using sf_buf_alloc() is faster. Furthermore, the reimplementation
uses a CPU private mapping, avoiding a TLB shootdown on
multiprocessors.
Problem uncovered by: kris@
Reviewed by: tegge@
MFC after: 3 weeks
mbuf chain that starts with a cluster containing just MHLEN bytes. This
happened because m_dup called m_get or m_getcl depending on the amount of
data to copy, but then always set the size available in the first mbuf to
MHLEN.
Submitted by: Matt Koivisto <mkoivisto at sandvine dot com>
Approved by: jmg
Silence from: rwatson (mentor)
class, then it displays various information about the lock and calls a
new function pointer in lock_class (lc_ddb_show) to dump class-specific
information about the lock as well (such as the owner of a mutex or
xlock'ed sx lock). This is easier than staring at hex dumps of locks to
figure out who owns the lock, etc. Note that extending lock_class doesn't
affect the ABI for any kernel modules as the only code that deals with
lock_class structures directly is kern_mutex.c, kern_sx.c, and witness.
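A hypothetical class hook could look like the following (the
designated-initializer layout and the MTX_FLAGMASK arithmetic are
assumptions for illustration, not the committed code):

    static void
    db_show_mtx(struct lock_object *lock)
    {
            struct mtx *m = (struct mtx *)lock;

            /* Class-specific detail: who owns this mutex? */
            db_printf(" owner: %p\n",
                (void *)(m->mtx_lock & ~MTX_FLAGMASK));
    }

    struct lock_class lock_class_mtx_sleep = {
            .lc_name = "sleep mutex",
            .lc_flags = LC_SLEEPLOCK | LC_RECURSABLE,
            .lc_ddb_show = db_show_mtx,
    };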
MFC after: 1 week
- Implement cv_wait_unlock() method which has semantics compatible
with the sv_wait() method in IRIX. For cv_wait_unlock(), the lock
must be held before entering the function, but is not held when the
function is exited.
- Implement the existing cv_wait() function in terms of cv_wait_unlock().
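A minimal usage sketch, assuming the mutex-taking form of the
interface (the softc fields are made up):

    mtx_lock(&sc->sc_mtx);
    if (!sc->sc_ready) {
            /* Returns with sc_mtx released, like IRIX sv_wait(). */
            cv_wait_unlock(&sc->sc_cv, &sc->sc_mtx);
            /* sc_mtx-protected state may not be touched here
               without relocking. */
    } else
            mtx_unlock(&sc->sc_mtx);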
Submitted by: kan
Feedback from: jhb, trhodes, Christoph Hellwig <hch at infradead dot org>
being held by the current thread or ignored by the current process;
otherwise, it is very possible the thread will enter an infinite loop,
which would lead to an administrator's nightmare.
4k clusters in addition to 9k and 16k ones.
struct mbuf *m_getjcl(int how, short type, int flags, int size)
void *m_cljget(struct mbuf *m, int how, int size)
m_getjcl() returns an mbuf with a cluster of the specified size attached
like m_getcl() does for 2k clusters.
m_cljget() is different from m_clget() as it can allocate clusters
without attaching them to an mbuf. In that case the return value
is the pointer to the cluster of the requested size. If an mbuf was
specified, it gets the cluster attached to it and the return value
can be safely ignored.
For size both take MCLBYTES, MJUM4BYTES, MJUM9BYTES, MJUM16BYTES.
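For instance (M_NOWAIT is used for brevity; the wait-flag spelling of
the era may differ):

    struct mbuf *m;
    void *buf;

    /* mbuf with a 9k jumbo cluster already attached: */
    m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES);
    if (m == NULL)
            return (ENOBUFS);

    /* Bare 16k cluster, to be attached to an mbuf later: */
    buf = m_cljget(NULL, M_NOWAIT, MJUM16BYTES);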
Reviewed by: glebius
Tested by: glebius
Sponsored by: TCP/IP Optimization Fundraise 2005
lock object (and thus off of each mutex and sx lock):
- Rename the all_locks list to pending_locks and only put locks initialized
before SI_SUB_WITNESS on the list so that witness can add them
once it starts up.
- Now that pending_locks is only used during early startup, change it from
a TAILQ to an STAILQ. This removes a pointer from the STAILQ_ENTRY in
struct lock_object.
- Since the pending_locks list is only used during the single-threaded
early boot it no longer needs to be protected by a mutex, so remove
all_mtx.
- Since the lo_list member of struct lock_object is now only used during
early boot before witness is running, collapse lo_list and lo_witness
into a union. This shaves the second pointer off of struct lock_object.
- Axe lock_cur_cnt and lock_max_cnt.
With these changes, struct mtx shrinks from 36 to 28 bytes on 32-bit
platforms and from 72 to 56 bytes on 64-bit platforms. Note that this
commit will completely and utterly destroy the kernel ABI, so no MFC.
Tested on: alpha, amd64, i386, sparc64
sosend(). Robert accidentally changed the snderr() macro to jump to the
out label which assumes the lock is already released rather than the
release label which drops the lock in his previous change to sosend().
This should fix the recent panics about returning from write(2) with the
socket lock held and the most recent LOR on current@.
process as over the limit when its time is >= to the limit rather than >
the limit. Technically, if p->p_rux.rux_runtime.sec == p->p_pcpulimit
and p->p_rux.rux_runtime.frac == 0, the process hasn't exceeded the limit
yet. However, having the fraction exactly equal to 0 is rather rare, and
it is not worth the overhead to handle that edge case. With just the >
comparison, the process would have to exceed its limit by almost a second
before it was killed.
PR: kern/83192
Submitted by: Maciej Zawadzinski mzawadzinski at gmail dot com
Reviewed by: bde
MFC after: 1 week
chains and copying in mbufs from the body of the send logic, creating
a new function sosend_copyin(). This change makes sosend() almost
readable, and will allow the same logic to be used by tailored socket
send routines.
MFC after: 1 month
Reviewed by: andre, glebius
application wishes to request high precision time stamps be returned:
Alias Existing
CLOCK_REALTIME_PRECISE CLOCK_REALTIME
CLOCK_MONOTONIC_PRECISE CLOCK_MONOTONIC
CLOCK_UPTIME_PRECISE CLOCK_UPTIME
Add experimental low-precision clockid_t names corresponding to these
clocks, but implemented using cached timestamps in kernel rather than
a full time counter query. This offers a minimum update rate of 1/HZ,
but in practice will often be more frequent due to the frequency of
time stamping in the kernel:
New clockid_t name Approximates existing clockid_t
CLOCK_REALTIME_FAST CLOCK_REALTIME
CLOCK_MONOTONIC_FAST CLOCK_MONOTONIC
CLOCK_UPTIME_FAST CLOCK_UPTIME
Add one additional new clockid_t, CLOCK_SECOND, which returns the
current second without performing a full time counter query or cache
lookup overhead to make sure the cached timestamp is stable. This is
intended to support very low granularity consumers, such as time(3).
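For example, a consumer that tolerates coarse timestamps could do:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
            struct timespec ts;

            /* Cached timestamp, updated at least once per tick. */
            if (clock_gettime(CLOCK_REALTIME_FAST, &ts) == 0)
                    printf("%jd.%09ld\n", (intmax_t)ts.tv_sec,
                        ts.tv_nsec);
            return (0);
    }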
The names, visibility, and implementation of the above are subject
to change, and will not be MFC'd any time soon. The goal is to
expose lower quality time measurement to applications willing to
sacrifice accuracy in performance critical paths, such as when taking
time stamps for the purpose of rescheduling select() and poll()
timeouts. Future changes might include retrofitting the time counter
infrastructure to allow the "fast" time query mechanisms to use a
different time counter, rather than a cached time counter (i.e.,
TSC).
NOTE: With different underlying time mechanisms exposed, using
different time query mechanisms in the same application may result in
relative non-monotonicity or the appearance of clock stalling for a
single clockid_t, as a cached time stamp queried after a precision
time stamp lookup may be "before" the time returned by the earlier
live time counter query.
directly. We need to copyin() the strings in the iovec before
we can strcmp() them. Also, when we want to send the errmsg back
to userspace, we need to copyout()/copystr() the string.
Add a small helper function vfs_getopt_pos() which takes in the
name of an option, and returns the array index of the name in the iovec,
or -1 if not found. This allows us to locate an option in
the iovec without actually manipulating the iovec members directly via
strcmp().
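A sketch of the intended use inside a filesystem's nmount path
(mnt_optnew is the option list handed to VFS_MOUNT(); error handling
trimmed):

    int idx;

    /* Locate "errmsg" in the iovec without copying or mutating it. */
    idx = vfs_getopt_pos(mp->mnt_optnew, "errmsg");
    if (idx == -1) {
            /* The option was not supplied by userland. */
    }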
Noticed by: kris on sparc64
connection queue for a new connection. It was removing connections
from the wrong list.
Submitted by: Paul Mikesell
Sponsored by: Isilon Systems
MFC after: 1 week
When all file systems have a time stamp of zero, which is the case
for example when the root file system is on a read-only medium, we
ended up not calling inittodr() at all. A potential uncleanliness
existed as well. If multiple file systems had a non-zero time stamp,
we would call inittodr() multiple times. While this should not be
harmful, it's definitely not ideal.
Fix both issues by iterating over the mounted file systems to find
the largest time stamp and call inittodr() exactly once with that
time stamp. This could of course be a zero time stamp if none of the
mounted file systems have a non-zero time stamp. In that case the
annoying errors mentioned in the commit log for revision 1.186 still
haven't been avoided. The bottom line is that inittodr() should not
complain when it gets a time base of zero. At the time of this
commit only alpha seems to have that problem.
Reported by: Dario Freni (saturnero at freesbie dot org)
MFC after: 1 week
is called. It looks like there are lots of different mount flags checked
in vfs_domount(), so we need to do the parsing for these particular
mount flags earlier on. The new flags parsed are:
async, force, multilabel, noasync, noatime, noclusterr, noclusterw,
noexec, nosuid, nosymfollow, snapshot, suiddir, sync, union.
Existing code which uses mount() to mount UFS filesystems is not
affected, but new code which uses nmount() to mount UFS filesystems
should behave better.
in, and if so, set MNT_UPDATE filesystem flag.
vfs_nmount() calls vfs_domount(), and there is special logic
inside vfs_domount() if MNT_UPDATE is set. This is very important
when we want to do an update mount of the root filesystem, using nmount().
execute an ET_DYN binary (shared object).
This does not make much sense, but some linux scripts expect to be able to
execute /lib/ld-linux.so.2 (ldd comes to mind).
The sysctl defaults to 0.
MFC after: 3 days
currently present is minor and offers no real semantic issues, it also
doesn't make sense since an earlier lockless check has already
occurred. Also hold the mutex longer, over a manipulation of
per-process ktrace state, which requires synchronization.
MFC after: 1 month
Pointed out by: jhb
reliability when tracing fast-moving processes or writing traces to
slow file systems by avoiding unbounded queueing and dropped records.
Record loss was previously possible when the global pool of records
became depleted as a result of record generation outstripping record
commit, which occurred quickly in many common situations.
These changes partially restore the 4.x model of committing ktrace
records at the point of trace generation (synchronous), but maintain
the 5.x deferred record commit behavior (asynchronous) for situations
where entering VFS and sleeping is not possible (i.e., in the
scheduler). Records are now queued per-process as opposed to
globally, with processes responsible for committing records from their
own context as required.
- Eliminate the ktrace worker thread and global record queue, as they
are no longer used. Keep the global free record list, as records
are still used.
- Add a per-process record queue, which will hold any asynchronously
generated records, such as from context switches. This replaces the
global queue as the place to submit asynchronous records to.
- When a record is committed asynchronously, simply queue it to the
process.
- When a record is committed synchronously, first drain any pending
per-process records in order to maintain ordering as best we can.
Currently ordering between competing threads is provided via a global
ktrace_sx, but a per-process flag or lock may be desirable in the
future.
- When a process returns to user space following a system call, trap,
signal delivery, etc, flush any pending records.
- When a process exits, flush any pending records.
- Assert on process tear-down that there are no pending records.
- Slightly abstract the notion of being "in ktrace", which is used to
prevent the recursive generation of records, as well as generating
traces for ktrace events.
Future work here might look at changing the set of events marked for
synchronous and asynchronous record generation, re-balancing queue
depth, timeliness of commit to disk, and so on. I.e., performing a
drain every (n) records.
MFC after: 1 month
Discussed with: jhb
Requested by: Marc Olzheim <marcolz at stack dot nl>
happiness, as well as correct other bugs:
- Replace notion of current and saved accounting credential/vnode with a
single credential/vnode and an acct_suspended flag. This simplifies the
accounting logic substantially.
- Replace acct_mtx with acct_sx, a sleepable lock held exclusively during
reconfiguration and space polling, but shared during log entry
generation. This avoids holding a mutex over sleepable VFS operations.
- Hold the sx lock over the duration of the I/O so that the vnode I/O
cannot occur after vnode close, which could occur previously if
accounting was disabled as a process exited.
- Write the accounting log entry with Giant conditionally acquired based
on the file system where the log is stored. Previously, the accounting
code relied on the caller acquiring Giant.
- Acquire Giant conditionally in the accounting callout based on the file
system where the accounting log is stored. Run the callout MPSAFE.
- Expose acct_suspended via a read-only sysctl so it is possible to
programmatically determine whether accounting is suspended or not without
attempting to parse logs.
- Check both acct_vp and acct_suspended lock-free before entering the
accounting sx lock in acct().
- When accounting is disabled due to a VBAD vnode (i.e., forcible unmount),
generate a log message indicating accounting has been disabled.
- Correct a long-standing bug in how free space is calculated and compared
to the required space: generate and compare signed results, not unsigned
results, or negative free space will cause accounting to not be suspended
when required, or worse, incorrectly resumed once negative free space is
reached.
MFC after: 2 weeks
socket file descriptor garbage collection code, which is intended to
detect and clear cycles of orphaned file descriptors that are "in-flight"
in a socket when that socket is closed before they are received. The
algorithm present was both run at poor times (resulting in recursion and
reentrance), and also buggy in the presence of parallelism. In order to
fix these problems, make the following changes:
- When there are in-flight sockets and a UNIX domain socket is destroyed,
asynchronously schedule the garbage collector, rather than running it
synchronously in the current context. This avoids lock order issues
when the garbage collection code reenters the UNIX domain socket code,
avoiding lock order reversals, deadlocks, etc. Run the code
asynchronously in a task queue.
- In the garbage collector, when skipping file descriptors that have
entered a closing state (i.e., have f_count == 0), re-test the FDEFER
flag, and decrement unp_defer. As file descriptors can now transition
to a closed state, while the garbage collector is running, it is no
longer the case that unp_defer will remain an accurate count of
deferred sockets in the mark portion of the GC algorithm. Otherwise,
the garbage collector will loop waiting for unp_defer to reach
zero, which it will never do as it is skipping file descriptors that
were marked in an earlier pass, but now closed.
- Acquire the UNIX domain socket subsystem lock in unp_discard() when
modifying the unp_rights counter, or a read/write race is risked with
other threads also manipulating the counter.
While here:
- Remove #if 0'd code regarding acquiring the socket buffer sleep lock in
the garbage collector, this is not required as we are able to use the
socket buffer receive lock to protect scanning the receive buffer for
in-flight file descriptors on the socket buffer.
- Annotate that the description of the garbage collector implementation
is increasingly inaccurate and needs to be updated.
- Add counters of the number of deferred garbage collections and recycled
file descriptors. This will be removed and is here temporarily for
debugging purposes.
With these changes in place, the unp_passfd regression test now appears
to be passed consistently on UP and SMP systems for extended runs,
whereas before it hung quickly or panicked, depending on which bug was
triggered.
Reported by: Philip Kizer <pckizer at nostrum dot com>
MFC after: 2 weeks
state about each open file, and identify the first process in the process
table that references the file. This is helpful in debugging leaks of
file descriptors.
MFC after: 1 week
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.
This issue affects HEAD and RELENG_6 only.
PR: 88249
Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com>
MFC after: 3 days
with the file descriptor. When a file descriptor is closed as a result
of garbage collecting a UNIX domain socket, the file descriptor will
not have any associated thread, so the logic to identify advisory locks
held by that thread is not appropriate. Check the thread for NULL to
avoid this scenario. Expand an existing comment to say a bit more about
this.
MFC after: 1 week
thread context. While it doesn't matter too much at the moment, in
the future we could be back in the same boat if/when more restrictions
are placed (or enforced) in a SWI.
Suggested by: njl, bde, jhb, scottl
in the hardware interrupt context (even if it is likely just an
ithread). We don't document that suspend/resume routines are run from
such a context and some of the things that happen in those routines
aren't interrupt safe. Since there's no real need to run from that
context, this restores assumptions that suspend routines have made.
This fixes Thierry Herbelot's 'Trying to sleep while sleeping is
prohibited' problem.
to user-space if a parameter named "errmsg" is passed into the iovec.
Used in conjunction with vfs_mount_error(), more useful error messages
than errno can be passed back to userspace when mounting a filesystem
fails.
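From userland the convention looks roughly like this (the fstype and
paths are illustrative):

    #include <sys/param.h>
    #include <sys/mount.h>
    #include <sys/uio.h>
    #include <err.h>
    #include <errno.h>
    #include <string.h>

    static void
    build_iov(struct iovec *iov, const char *name, void *val, size_t len)
    {
            iov[0].iov_base = __DECONST(void *, name);
            iov[0].iov_len = strlen(name) + 1;
            iov[1].iov_base = val;
            iov[1].iov_len = len;
    }

    int
    main(void)
    {
            char errmsg[255] = "";
            struct iovec iov[8];

            build_iov(&iov[0], "fstype", __DECONST(void *, "nullfs"),
                sizeof("nullfs"));
            build_iov(&iov[2], "fspath", __DECONST(void *, "/mnt"),
                sizeof("/mnt"));
            build_iov(&iov[4], "target", __DECONST(void *, "/usr"),
                sizeof("/usr"));
            build_iov(&iov[6], "errmsg", errmsg, sizeof(errmsg));

            if (nmount(iov, 8, 0) < 0)
                    errx(1, "nmount: %s", errmsg[0] != '\0' ?
                        errmsg : strerror(errno));
            return (0);
    }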
Discussed with: phk, pjd
have started aio; instead, initialize the aio management structure
if it hasn't been done yet. The reason to adjust this behavior is
to make it a bit friendlier for threaded programs: consider two
threads, one of which submits an aio_write while the other just calls
aio_waitcomplete to wait for any I/O to be completed and recycle the
aio requests; before the submitter does any I/O, the recycler wants
to wait in the kernel. This also fixes an inconsistency with other aio
syscalls.
- Use curthread for calls to knlist_delete() and add a big comment
explaining why as well as appropriate assertions.
- Use TAILQ_FOREACH and TAILQ_FOREACH_SAFE instead of handrolling them.
- Use fget() family of functions to lookup file objects instead of
grovelling around in file descriptor tables.
- Destroy the aio_freeproc mutex if we are unloaded.
Tested on: i386
-Change unconditional acquisition of Giant to only pickup Giant if the vnode
for the controlling tty resides on a non-mpsafe file system.
-Pickup Giant around executable vnode reference counting operations only if
the executable resides on a non-mpsafe file system.
-If this process is being traced, pickup Giant for trace file reference count
operations only if it resides on a non-mpsafe file system.
Discussed with: jhb
Tested by: kris
For each child process whose status has been changed, a SIGCHLD instance
is queued; if the signal is still pending, and the process changed status
several times, the signal information is updated to reflect the latest
process status. If wait() returns because the status of a child process
is available, the pending SIGCHLD signal associated with that child
process is discarded. Any other pending SIGCHLD signals remain pending.
The signal information is allocated at the same time the proc structure
is allocated; if the process signal queue is fully filled or there is a
memory shortage, the kernel can still send the signal to the process.
There is a boot-time tunable kern.sigqueue.queue_sigchild which
can control the behavior; setting it to zero disables the SIGCHLD queueing
feature. The tunable will be removed if the feature is proved to be
stable enough.
Tested on: i386 (SMP and UP)
from there. All others get broken up and free'd individually to the mbuf
and cluster zones.
The packet zone is a secondary zone to the mbuf zone. There is currently
a limitation in UMA which prevents decreasing the packet zone stock when
the mbuf and cluster zone are drained and all their members are part of
packets. When this is fixed this change may be reverted.
current context in the IPI_STOP handler so that we can get accurate stack
traces of threads on other CPUs on these two archs like we do now on i386
and amd64.
Tested on: alpha, sparc64
both a proc pointer and a thread pointer; if the thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends the signal
to the given thread.
Add utility function psignal_event to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.
based jumbo 9k and jumbo 16k cluster support.
All mbuf's with external storage attached are mandatory reference
counted. For clusters and jumbo clusters UMA provides the refcnt
storage directly. It does not have to be separately allocated. Any
other type of external storage gets its own refcnt allocated from
an UMA mbuf refcnt zone instead of normal kernel malloc.
The refcount API MEXT_ADD_REF() and MEXT_REM_REF() is no longer
publicly accessible. The proper m_* functions have to be used.
mb_ctor_clust() and mb_dtor_clust() both handle normal 2K as well
as 9k and 16k clusters.
Clusters and jumbo clusters may be obtained without attaching them
immediately to an mbuf. This is for high performance cluster
allocation in network drivers where mbufs are attached after the
cluster has been filled.
Tested by: rwatson
Sponsored by: TCP/IP Optimization Fundraise 2005
Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it. It only adds a layer of confusion. The
distinction between header mbuf's and data mbuf's is solely done
through the m->m_flags M_PKTHDR flag.
Non-native code is not changed in this commit. For compatibility
MT_HEADER is mapped to MT_DATA.
Sponsored by: TCP/IP Optimization Fundraise 2005
ktr_tid as part of gathering of ktr header data for new ktrace
records. The continued use of intptr_t is required for file layout
reasons, and cannot be changed to lwpid_t at this point.
MFC after: 1 month
Reviewed by: davidxu
intptr_t. The buffer length needs to be written to disk as part
of the trace log, but the kernel pointer for the buffer does not.
Add a new ktr_buffer pointer to the kernel-only ktrace request
structure to hold that pointer. This frees up an integer in the
ktrace record format that can be used to hold the threadid,
although older ktrace files will have a garbage ktr_buffer field
(or more accurately, a kernel pointer value).
MFC after: 2 weeks
Space requested by: davidxu
before dereferencing it. Certain corrupt kernel modules might not have
a valid hash table, and would cause a kernel panic when they were loaded.
Instead of panic'ing, the kernel now prints out a warning that it is
missing the symbol hash table.
Tested by: Benjamin Close Benjamin dot Close at clearchain dot com
MFC after: 1 week
- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some
memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.
following the protocol pru_listen() call to solisten_proto(), so
that it occurs under the socket lock acquisition that also sets
SO_ACCEPTCONN. This requires passing the new backlog parameter
to the protocol, which also allows the protocol to be aware of
changes in queue limit should it wish to do something about the
new queue limit. This continues a move towards the socket layer
acting as a library for the protocol.
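The resulting pattern in a protocol's pru_listen method looks roughly
like this (error handling trimmed; solisten_proto_check() is the
companion check on the socket-layer side):

    static int
    foo_listen(struct socket *so, int backlog, struct thread *td)
    {
            int error;

            SOCK_LOCK(so);
            error = solisten_proto_check(so);
            if (error == 0) {
                    /* Set SO_ACCEPTCONN and the queue limit together. */
                    solisten_proto(so, backlog);
            }
            SOCK_UNLOCK(so);
            return (error);
    }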
Bump __FreeBSD_version due to a change in the in-kernel protocol
interface. This change has been tested with IPv4 and UNIX domain
sockets, but not other protocols.
convert to or from timeval frequently.
Introduce function itimer_accept() to ack a timer signal in the signal
acceptance code; this allows us to return a fresher overrun counter
than at signal generation time. While POSIX says:
"the value returned by timer_getoverrun() shall apply to the most
recent expiration signal delivery or acceptance for the timer,.."
I prefer returning it at acceptance time.
Introduce SIGEV_THREAD_ID notification mode; it is used by the thread
library to request that the kernel deliver a signal to a specified thread,
and in turn, the thread library may use the mechanism to implement
SIGEV_THREAD, which is required by POSIX.
A timer signal is managed by the timer code, so it can not fail even if
the signal queue has been filled up by the sigqueue syscall.
set. When watchdogd(1) is terminated intentionally it clears the bit,
which should then disable it in the kernel.
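The userland side of the protocol then looks roughly like this (the
timeout constant is illustrative):

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/watchdog.h>
    #include <fcntl.h>

    int
    main(void)
    {
            int fd = open("/dev/" _PATH_WATCHDOG, O_RDWR);
            u_int u;

            u = WD_ACTIVE | WD_TO_16SEC;    /* arm, ~16s timeout */
            ioctl(fd, WDIOCPATPAT, &u);
            /* ... pat periodically while running ... */
            u = 0;                          /* WD_ACTIVE clear: disarm */
            ioctl(fd, WDIOCPATPAT, &u);
            return (0);
    }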
PR: kern/74386
Submitted by: Alex Hoff <ahoff at sandvine dot com>
Approved by: phk, rwatson (mentor)
can't acquire an sx lock in ttyinfo() because ttyinfo() can be called
from interrupt handlers (such as atkbd_intr()). Instead, go back to
locking the process group while we pick a thread to display information for
and hold that lock until after we drop sched_lock to make sure the
process doesn't exit out from under us. sched_lock ensures that the
specific thread from that process doesn't go away. To protect against
the process exiting after we drop the proc lock but before we dereference
it to lookup the pid and p_comm in the call to ttyprintf(), we now copy
the pid and p_comm to local variables while holding the proc lock.
This problem was found by the recently added TD_NO_SLEEPING assertions for
interrupt handlers.
Tested by: emaste
MFC after: 1 week
debug.kdb.panic and debug.kdb.trap alongside the existing debug.kdb.enter
sysctl. 'panic' causes a panic, and 'trap' causes a page fault. We used
these to ensure that crash dumps succeed from those two common failure
modes. This avoids the need for creating a 'panic' kld module.
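For instance, a crash-dump smoke test no longer needs a custom module;
the C equivalent of 'sysctl debug.kdb.panic=1' is simply:

    #include <sys/types.h>
    #include <sys/sysctl.h>

    int
    main(void)
    {
            int one = 1;

            /* Panics the box; run only on a disposable test machine. */
            return (sysctlbyname("debug.kdb.panic", NULL, NULL,
                &one, sizeof(one)));
    }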
threads can wait for a thread to exit, and safely assume that the thread
has left userland and is no longer using its userland stack. This is
necessary for pthread_join when a thread is waiting for another thread
with a user-customized stack to exit; after pthread_join returns,
the userland stack can be reused for other purposes. Without this change,
the joiner thread has to spin at the address to ensure the thread has
really exited.
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)
Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
enabled if an attempt is made to send an IPI_STOP IPI. If the kernel
option is enabled, there is also a sysctl to change the behavior at
runtime (debug.stop_cpus_with_nmi which defaults to enabled). This
includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
that is restarted making use of atomic_readandclear() rather than
assuming that the BSP is always included in the set of restarted CPUs.
Also, the NMI handler didn't clear the function pointer meaning that
subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
of stoppedpcbs[] and always enable it for i386 and amd64 instead of
being dependent on KDB_STOP_NMI. It works fine in both the NMI and
non-NMI cases.
from being reclaimed before it was wired. Use pmap_extract_and_hold()
instead of pmap_extract() and retain the hold on the page until it has been
wired.
clock are supported. I plan to merge the XSI timer ITIMER_REAL and the
other two CPU timers into the new code; currently three slots are available
for the XSI timers.
The SIGEV_THREAD notification type is not supported yet because our
sigevent struct lacks two member fields:
sigev_notify_function
sigev_notify_attributes
I have found that sigevent is used in AIO, so I won't add the two members
unless the AIO code is adjusted.
2. Introduce flags KSI_EXT and KSI_INS. The flag KSI_EXT allows a ksiginfo
to be managed by outside code; the KSI_INS flag indicates sigqueue_add
should directly insert the passed ksiginfo into the queue rather than
copying it.
available kernel malloc types. Quite useful for post-mortem debugging of
memory leaks without a dump device configured on a panicked box.
MFC after: 2 weeks
to unload the usb.ko module after boot if it was originally preloaded
from "/boot/loader.conf". When processing preloaded modules, the
linker erroneously added self-dependencies to each module's reference
count. That prevented usb.ko's reference count from ever going to 0,
so it could not be unloaded.
Sponsored by: Isilon Systems
Reviewed by: pjd, peter
MFC after: 1 week
called during early init before cninit().
Tested on: i386, alpha, sparc64
Reviewed by: phk, imp
Reported by: Divacky Roman xdivac02 at stud dot fit dot vutbr dot cz
MFC after: 1 week
While here, support up to four sections because it was trivial to do
and cheap. (One pointer per section).
For amd64 with "-fpic -shared" format .ko files, using a single PT_LOAD
section is important to avoid wasting about 1MB of KVM and physical ram
for the 'gap' between the two PT_LOAD sections. amd64 normally uses
.o format kld files and isn't affected normally. But -fpic -shared modules
are actually possible to produce and load... (And with a bugfix to
binutils, we can build and use plain -shared .ko files without -fpic)
i386 only wastes 4K per .ko file, so that isn't such a big deal there.
so we are ready for mpsafevfs=1 by default on sparc64 too. I have been
running this on all my sparc64 machines for over 6 months, and have not
encountered MD problems.
MFC after: 1 week
correspond to the commit log. It changed the maxswzone and maxbcache
parameters from int to long, without changing the extern definitions
in <sys/buf.h>.
In fact it's a good thing it did not, because other parts of the system
are not yet ready for this, and on large-memory sparc machines it causes
severe filesystem damage if you try.
The worst effect of the change was that the tunables controlling the
above variables stopped working. These were necessary to allow such
large sparc64 machines (with >12GB RAM) to boot, since sparc64 did not
set a hard-coded upper limit on these parameters and they ended
up overflowing an int, causing an infinite loop at boot in bufinit().
Reviewed by: mlaier
changes in MD code are trivial. Before this change, trapsignal and
sendsig used discrete parameters; now they use member fields of the
ksiginfo_t structure. For sendsig, this change allows us to pass a
POSIX realtime signal value to user code.
2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.
3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.
4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.
5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.
6. In this sigqueue implementation, the pending signal set is kept as
before; an extra siginfo list holds additional siginfo_t data for signals.
Kernel code that uses psignal() still behaves as before: it won't fail
even under memory pressure. The only exception is when deleting a signal,
where we should call sigqueue_delete to remove the signal from the
sigqueue rather than SIGDELSET. Currently no kernel code delivers a signal
with additional data, so the kernel should be as stable as before;
a ksiginfo can carry more information, for example, allowing a signal to
be delivered but throwing away the siginfo data if memory is not enough.
SIGKILL and SIGSTOP have a fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to a target
process; if the resource is unavailable, EAGAIN will be returned as the
specification says.
Just before a thread exits, its signal queue memory is freed by
sigqueue_flush.
Currently, all signals are allowed to be queued, not only realtime signals.
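Userland usage follows the POSIX interface, e.g.:

    #include <sys/types.h>
    #include <errno.h>
    #include <err.h>
    #include <signal.h>

    void
    notify(pid_t pid)
    {
            union sigval sv;

            sv.sival_int = 42;      /* payload delivered in si_value */
            if (sigqueue(pid, SIGRTMIN, sv) == -1 && errno == EAGAIN)
                    warnx("kernel signal queue is full");
    }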
Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
notifications when LIO operations completed. These were the problems
with LIO event complete notification:
- Move all LIO/AIO event notification into one general function
so we don't have bugs in different data paths. This unification
got rid of several notification bugs, one of which could cause a
SIGILL to be sent to the process if kqueue was used.
- Change the LIO event accounting to count all AIO requests that
could have been split across the fast path and daemon mode.
The prior accounting only kept track of AIO op's in that
mode and not the entire list of operations. This could cause
a bogus LIO event complete notification to occur when all of
the fast path AIO op's completed and not the AIO op's that
ended up queued for the daemon.
Suggestions from: alc
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Wrap the include with HAVE_KERNEL_OPTION_HEADERS in the files that
can be compiled as loadable modules.
Reviewed by: bde
the MAC result, as well as avoid losing the DAC check result when MAC
is enabled.
MFC after: 3 days
Reported by: Patrick LeBlanc <Patrick dot LeBlanc at sparta dot com>
framework. This makes Giant protection around MAC operations which
interact with VFS conditional, based on the MPSAFE status of the file system.
Affected the following syscalls:
o __mac_get_fd
o __mac_get_file
o __mac_get_link
o __mac_set_fd
o __mac_set_file
o __mac_set_link
-Drop Giant all together in __mac_set_proc because the
mac_cred_mmapped_drop_perms_recurse routine no longer requires it.
-Move conditional Giant acquisitions to after label allocation routines.
-Move the conditional release of Giant to before label de-allocation
routines.
Discussed with: rwatson
dedicated sysctl handlers. Protect manipulations with
poll_mtx. The affected sysctls are:
- kern.polling.burst_max
- kern.polling.each_burst
- kern.polling.user_frac
- kern.polling.reg_frac
o Use CTLFLAG_RD on MIBs that are supposed to be read-only.
o u_int32t -> uint32_t
o Remove unneeded locking from poll_switch().
possible for do_execve() to call exit1() rather than returning. As a
result, the sequence "allocate memory; call kern_execve; free memory"
can end up leaking memory.
This commit documents this astonishing behaviour and adds a call to
exec_free_args() before the exit1() call in do_execve(). Since all
the users of kern_execve() in the tree use exec_free_args() to free
the command-line arguments after kern_execve() returns, this should
be safe, and it fixes the memory leak which can otherwise occur.
Submitted by: Peter Holm
MFC after: 3 days
Security: Local denial of service
calling sysctl_out_proc(). -- fix from jhb
Move the code in fill_kinfo_thread() that gathers data from struct proc
into the new function fill_kinfo_proc_only().
Change all callers of fill_kinfo_thread() to call both
fill_kinfo_proc_only() and fill_kinfo_thread(). When gathering
data from a multi-threaded process, fill_kinfo_proc_only() only needs
to be called once.
Grab sched_lock before accessing the process thread list or calling
fill_kinfo_thread().
PR: kern/84684
MFC after: 3 days
o Axe poll in trap.
o Axe IFF_POLLING flag from if_flags.
o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.
o Make registration and deregistration from polling happen in a
functional way, instead of on the next tick/interrupt.
o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.
Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. They are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through the interface list and enables/disables polling.
A message that kern.polling.enable is deprecated is printed.
Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In the polling handler the driver obtains its lock and checks the
IFF_DRV_RUNNING flag. If it is not set, it unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then it returns. This is important to protect from spurious
interrupts.
Reviewed by: ru, sam, jhb
to avoid touching pageable memory while holding a mutex.
Simplify argument list replacement logic.
PR: kern/84935
Submitted by: "Antoine Pelisse" apelisse AT gmail.com (in a different form)
MFC after: 3 days
Add a new private thread flag to indicate that the thread should
not sleep if runningbufspace is too large.
Set this flag on the bufdaemon and syncer threads so that they skip
the waitrunningbufspace() call in bufwrite() rather than
checking the proc pointer vs. the known proc pointers for these two
threads. A way of preventing these threads from being starved for
I/O but still placing limits on their outstanding I/O would be
desirable.
Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
blocking on the runningbufspace check while holding snaplk. This
prevents snaplk from being held for an arbitrarily long period of
time if runningbufspace is high and greatly reduces the contention
for snaplk. The disadvantage is that ffs_copyonwrite() can start
a large amount of I/O if there are a large number of snapshots,
which could cause a deadlock in other parts of the code.
Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
before attempting to grab snaplk so that I/O requests waiting on
snaplk are not counted in runningbufspace as being in-progress.
Increment runningbufspace again before actually launching the
original I/O request.
Prior to the above two changes, the system could deadlock if enough
I/O requests were blocked by snaplk to prevent runningbufspace from
falling below lorunningspace and one of the bawrite() calls in
ffs_copyonwrite() blocked in waitrunningbufspace() while holding
snaplk.
See <http://www.holm.cc/stress/log/cons143.html>
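In bufwrite() the change amounts to a flag test instead of proc pointer
comparisons; a sketch, assuming the flag is spelled TDP_NORUNNINGBUF
(the old pointer names below are illustrative):
    /* Before: */
    if (curproc != bufdaemonproc && curproc != updateproc)
        waitrunningbufspace();
    /* After: */
    if ((curthread->td_pflags & TDP_NORUNNINGBUF) == 0)
        waitrunningbufspace();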
bio may have been freed and reassigned by the wakeup before being
tested after releasing the bdonelock.
There's a non-zero chance this is the cause of a few of the crashes
knocking around with biodone() sitting in the stack backtrace.
Reviewed by: phk@
subdrivers to hook up.
It should probably be rewritten to implement a simple bus to which
the subdrivers attach using some kind of hint.
Until then, provide a couple of crutch functions with big warning
signs so it can survive the recent changes to struct resource.
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:
Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer needs to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.
This aligns this code more with the eventual locking of ttys.
Suggested by: bde
of whether or not Giant was picked up by the filesystem. Add VFS_LOCK_GIANT
macros around vrele as it's possible that this can call into the VOP_INACTIVE
filesystem-specific code. Also, while we are here, remove the Giant assertion
from the sysctl handler; we do not actually require Giant there, so we
shouldn't assert it. Doing so would just complicate things when Giant is
removed from the sysctl framework.
sleep lock status while kdb_active, or we risk contending with the
mutex on another CPU, resulting in a panic when using "show
lockedvnods" while in DDB.
MFC after: 3 days
Reviewed by: jhb
Reported by: kris
the per-cpu data for all CPUs. This is easier to ask users to do than
"figure out how many CPUs you have, now run show pcpu, then run it
once for each CPU you have".
MFC after: 3 days
the vast majority of cases, these functions are called without mutexes
held, meaning that in all but two cases, there will be no ordering
issues with doing this, and it will eliminate the need for changes in
the caller. In two cases, mutexes are held, so Giant must be acquired
before those mutexes such that uprintf() and tprintf() recurse Giant
rather than generating a lock order reversal.
Suggested by: bde
remove the unconditional acquisition of Giant for extended attribute related
operations. If the file system is set as being MP safe and debug.mpsafevfs is
1, do not pick up Giant.
Mark the following system calls as being MP safe so we no longer pick up
Giant in the system call handler:
o extattrctl
o extattr_set_file
o extattr_get_file
o extattr_delete_file
o extattr_set_fd
o extattr_get_fd
o extattr_delete_fd
o extattr_set_link
o extattr_get_link
o extattr_delete_link
o extattr_list_file
o extattr_list_link
o extattr_list_fd
-Pass the MPSAFE flag to the namei(9) lookup and introduce a vfslocked
variable which will keep track of any Giant acquisitions (see the sketch
after this list).
-Wrap any fd operations which manipulate vnodes in VFS_{UN}LOCK_GIANT.
-Drop VFS_ASSERT_GIANT into functions which operate on vnodes to ensure
that we are sufficiently protected.
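A condensed sketch of the lookup pattern used by the *_file/_link
variants (error handling trimmed; path and td are the usual syscall
arguments):
    struct nameidata nd;
    int vfslocked, error;
    NDINIT(&nd, LOOKUP, MPSAFE | FOLLOW, UIO_USERSPACE, path, td);
    if ((error = namei(&nd)) != 0)
        return (error);
    vfslocked = NDHASGIANT(&nd);    /* did namei() pick up Giant? */
    NDFREE(&nd, NDF_ONLY_PNBUF);
    /* ... extattr operation on nd.ni_vp ... */
    vrele(nd.ni_vp);
    VFS_UNLOCK_GIANT(vfslocked);    /* no-op if Giant wasn't taken */
    return (error);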
I've tested these changes with various TrustedBSD MAC policies which use
extended attributes heavily on SMP and UP systems (thanks to Scott Long
for making some SMP hardware available to me for testing).
Discussed with: jeff
Requested by: jhb, rwatson
The external part is still called 'struct resource' but its contents
are now visible to drivers etc. This makes it part of the device
driver ABI, so it must not be changed lightly. A comment to this effect
is in place.
The internal part is called 'struct resource_i' and contains its
external counterpart as one field.
Move the bus_space tag+handle into the external struct resource; this
removes the need for device drivers to even know about these fields
in order to use bus_space to access hardware. (More in a following
commit.)
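The visible part now looks roughly like this sketch (compare sys/rman.h
for the authoritative layout):
    struct resource {
        struct resource_i       *__r_i;         /* internal, do not touch */
        bus_space_tag_t         r_bustag;       /* for bus_space access */
        bus_space_handle_t      r_bushandle;
    };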
and bus_free_resources(). These functions take a list of resources
and handle them all in one go. A flag makes it possible to mark
a resource as optional.
A typical device driver can save 10-30 lines of code by using these.
Usage examples will follow RSN.
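Pending those, a sketch of the intended use in a hypothetical foo(4)
driver, following the bus_alloc_resources(9) API as it exists in the
tree, where the free function is spelled bus_release_resources() and
RF_OPTIONAL marks an optional entry:
    static struct resource_spec foo_res_spec[] = {
        { SYS_RES_MEMORY, PCIR_BAR(0), RF_ACTIVE },
        { SYS_RES_IRQ,    0,           RF_ACTIVE | RF_SHAREABLE },
        { SYS_RES_IOPORT, PCIR_BAR(1), RF_ACTIVE | RF_OPTIONAL },
        { -1, 0, 0 }
    };
    static struct resource *foo_res[3];
    /* in foo_attach(): */
    if (bus_alloc_resources(dev, foo_res_spec, foo_res) != 0)
        return (ENXIO);
    /* in foo_detach(): */
    bus_release_resources(dev, foo_res_spec, foo_res);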
MFC: A good idea, eventually.
It confuses the lock manager since in some places thread0 is
then used for vnode locking while curthread is used for vnode unlocking.
Found by: Yahoo!
Reviewed by: ps@, jhb@
MFC after: 3 days
a thread holding a critical resource, e.g. a mutex or other implicit
synchronization flags. Give a thread which exceeds the nice threshold a
minimum time slice.
PR: kern/86087
NULL. The NFS client expects that a thread will always be present for a
VOP so that it can check for signal conditions, and will dereference a
NULL pointer if one isn't present.
MFC after: 3 days
nor uprintf() is believed to perform tsleep() or msleep() as written,
as ttycheckoutq() is called with '0' as its sleep argument.
Remove recently added WITNESS warnings for sleep as the comment was
incorrect. This should silence a warning from the nfs_timer() code.
Discussed with: bde
Give DEVFS a proper inode called struct cdev_priv. It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.
Link the cdev_priv's into a list, protected by the global device
mutex. Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.
Reap the benefits of the much-improved kernel memory allocator and the
generally better-defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables, the atomics
to muck about in them, and all the trouble that resulted.
This makes RAM the only limit on how many devices we can have.
The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.
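Schematically (field names illustrative; the real layout lives in
fs/devfs/devfs_int.h):
    struct cdev_priv {
        struct cdev             cdp_c;      /* public part */
        TAILQ_ENTRY(cdev_priv)  cdp_list;   /* global list, dev mtx */
        u_int                   cdp_inode;
        u_int                   cdp_flags;
        u_int                   cdp_inuse;  /* mountpoint references */
        /* ... */
    };
    /* Recover the private pointer from the public one: */
    #define cdev2priv(c) \
        ((struct cdev_priv *)((char *)(c) - \
        offsetof(struct cdev_priv, cdp_c)))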
The overall responsibility is (to be) split such that kern/kern_conf.c
deals with drivers and struct cdev, fs/devfs handles filesystems and
struct cdev_priv, and their private liaison is exposed only in
devfs_int.h.
Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr. Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.
Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them. A few fields may migrate
from devfs_dirent to cdev_priv as well.
Protect the DEVFS mountpoint with an sx lock instead of lockmgr,
this lock also protects the directory tree of the mountpoint.
Give each mountpoint a unique integer index, allocated with unr.
Use it to index an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.
Retire the cdev alias lists; the cdev_priv now knows about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there. We still spelunk into other mountpoints
and fondle their data without 100% good locking. It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.
Lots of shuffling of stuff and churn of bits for no good reason[2].
XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints. It
is not obvious what the best course of action is here.
XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.
XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.
Much testing provided by: Kris
Much confusion caused by (races in): md(4)
[1] You are not supposed to understand anything past this point.
[2] This line should simplify life for the peanut gallery.
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).
Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.
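In most callers the change is mechanical, e.g. (message text
illustrative):
    mtx_lock(&Giant);
    uprintf("\n%s: write failed, filesystem is full\n", fs->fs_fsmnt);
    mtx_unlock(&Giant);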
With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.
NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.
NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.
NB: uprintf()/tprintf() sleeping is generally a bad idea, as is having
non-MPSAFE tty code.
MFC after: 1 week
provided access to the root file system before the start of the
init process. This was used briefly by SEBSD before it knew about
preloading data in the loader, and using that method to gain
access to data earlier results in fewer inconsistencies in the
approach. Policy modules still have access to the root file system
creation event through the mac_create_mount() entry point.
Removed now, and will be removed from RELENG_6, in order to avoid
third party policy dependencies on the entry point for the lifetime
of the 6.x branch.
MFC after: 3 days
Submitted by: Chris Vance <Christopher dot Vance at SPARTA dot com>
Sponsored by: SPARTA
so that UUIDs can be generated from within the kernel. The uuidgen(2)
syscall now allocates kernel memory, calls the generator, and does a
copyout() for the whole UUID store. This change is in support of GPT.
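Kernel consumers can now generate UUIDs directly; a minimal sketch,
assuming the in-kernel entry point is named kern_uuidgen() as in
today's sys/kern/kern_uuid.c:
    struct uuid uuid;
    kern_uuidgen(&uuid, 1);     /* generate a single UUID in-kernel */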
and other applications to query the state of the stack regarding the
accept queue on a listen socket:
SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog)
SO_LISTENQLEN Return the value of so_qlen (complete sockets)
SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets)
Minor white space tweaks to existing socket options to make them
consistent.
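A userland sketch querying the new options (show_listen_queues() is a
hypothetical helper; s must be a listening socket):
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <err.h>
    #include <stdio.h>
    static void
    show_listen_queues(int s)
    {
        int qlimit, qlen, incqlen;
        socklen_t len;
        len = sizeof(qlimit);
        if (getsockopt(s, SOL_SOCKET, SO_LISTENQLIMIT, &qlimit, &len) == -1)
            err(1, "SO_LISTENQLIMIT");
        len = sizeof(qlen);
        if (getsockopt(s, SOL_SOCKET, SO_LISTENQLEN, &qlen, &len) == -1)
            err(1, "SO_LISTENQLEN");
        len = sizeof(incqlen);
        if (getsockopt(s, SOL_SOCKET, SO_LISTENINCQLEN, &incqlen, &len) == -1)
            err(1, "SO_LISTENINCQLEN");
        printf("backlog %d, complete %d, incomplete %d\n",
            qlimit, qlen, incqlen);
    }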
Discussed with: andre
MFC after: 1 week
quite a bit of reading to figure it out, and I want to avoid figuring
it out again.
Convert an if (foo) else printf("this is almost a panic") into a
KASSERT.
MFC after: 3 days
unconditional acquisition of Giant for ACL related operations. If the file
system is set as being MP safe and debug.mpsafevfs is 1, do not pick up
Giant.
For any operations which require namei(9) lookups:
__acl_get_file
__acl_get_link
__acl_set_file
__acl_set_link
__acl_delete_file
__acl_delete_link
__acl_aclcheck_file
__acl_aclcheck_link
-Set the MPSAFE flag in NDINIT
-Initialize the vfslocked variable using the NDHASGIANT macro
For functions which operate on fds, make sure the operations are locked:
__acl_get_fd
__acl_set_fd
__acl_delete_fd
__acl_aclcheck_fd
-Initialize vfslocked using VFS_LOCK_GIANT before we manipulate the vnode
Discussed with: jeff
any other non-sleepable lock. In plain English: Giant comes before all
other mutexes.
- Add some extra description to the lock order reversal printf's to indicate
when a reversal is triggered by a hard-coded implicit rule.
Requested by: truckman (2)
MFC after: 1 week
state where sleeping on a sleep queue is not allowed. The facility
doesn't support recursion but uses a simple private per-thread flag
(TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is
set and INVARIANTS is enabled.
- Use this new facility to replace the g_xup and g_xdown mutexes that were
(ab)used to achieve similar behavior.
- Disallow sleeping in interrupt threads when invoking interrupt handlers.
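Schematically (the kernel wraps the flag manipulation in helper macros;
the direct flag twiddling below is for illustration only):
    /* Enter the no-sleeping state around an interrupt handler: */
    curthread->td_pflags |= TDP_NOSLEEPING;
    ih->ih_handler(ih->ih_argument);
    curthread->td_pflags &= ~TDP_NOSLEEPING;
    /* And in sleepq_add(), under INVARIANTS: */
    KASSERT((td->td_pflags & TDP_NOSLEEPING) == 0,
        ("trying to sleep while in a no-sleep section"));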
MFC after: 1 week
Reviewed by: phk
links and the execution of ELF binaries. Two problems were found:
1) The link path wasn't tagged as being MP safe and thus was not properly
protected.
2) The ELF interpreter vnode wasn't being locked in namei(9) and thus was
insufficiently protected.
This commit makes the following changes:
-Sets the MPSAFE flag in NDINIT for symbolic link paths
-Sets the MPSAFE flag in NDINIT and introduces a vfslocked variable which
will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been
picked up.
-Drops an assertion into vfs_lookup which ensures that if the MPSAFE
flag is NOT set, we have picked up Giant; if not, panic (if WITNESS is
compiled into the kernel). This should help us find conditions where vnode
operations are insufficiently protected.
This is a RELENG_6 candidate.
Discussed with: jeff
MFC after: 4 days
shutdown procedures (which have a duration of more than 120 seconds).
We have two user-space affecting shutdown timeouts: a "soft" one in
/etc/rc.shutdown and a "hard" one in init(8). The first one can be
configured via /etc/rc.conf variable "rcshutdown_timeout" and defaults
to 30 seconds. The second one was originally (in 1998) intended to be
configured via sysctl(8) variable "kern.shutdown_timeout" and defaults
to 120 seconds.
Unfortunately, the "kern.shutdown_timeout" was declared "unused" in 1999
(as it is indeed not used within the kernel itself) and hence was
intentionally but misleadingly removed in revision 1.107 from
init_main.c. Kernel sysctl(8) variables are certainly the wrong way to
control user-space processes in general, but in this particular case the
sysctl(8) variable should have remained as it supports init(8), which
isn't passed command line flags (which in turn could have been set via
/etc/rc.conf), etc.
As there is already a similar "kern.init_path" sysctl(8) variable which
directly affects init(8), resurrect the init(8) shutdown timeout under
sysctl(8) variable "kern.init_shutdown_timeout". But this time document
it as being intentionally unused within the kernel and used by init(8).
Also document it in the manpages init(8) and rc.conf(5).
Reviewed by: phk
MFC after: 2 weeks
struct bufs that are persistently held by ext2fs. Ignore any buffers
with this flag in the code in boot() that counts "busy" and dirty
buffers and attempts to sync the dirty buffers, which is done before
attempting to unmount all the file systems during shutdown.
This fixes the problem caused by any ext2fs file systems that are
mounted at system shutdown time, which caused boot() to give up on
a non-zero number of buffers and skip the call to vfs_unmountall().
This left all the mounted file systems in a dirty state and caused
them to all require cleanup by fsck on reboot.
Move the two separate copies of the "busy" buffer test in boot()
to a separate function.
Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro.
Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with
this and previous flag changes.
PR: kern/56675, kern/85163
Tested by: "Matthias Andree" matthias.andree at gmx.de
Reviewed by: bde
MFC after: 3 days
Also introduce an aclinit function which will be used to create the UMA zone
for use by file systems at system start up.
MFC after: 1 month
Discussed with: rwatson
instead. Detailed changelist:
o Add a flags field to struct pollrec, to indicate that
a particular entry is being worked on.
o Define a macro PR_VALID() to check that a pollrec
is valid and pollable.
o Mark ISRs as mpsafe.
o ether_poll()
- Acquire poll_mtx while traversing the pollrec array.
- Skip pollrecs that are being worked on.
- Conditionally acquire Giant when entering a handler (see the
sketch after this changelist).
o netisr_pollmore()
- Conditionally assert Giant.
- Acquire poll_mtx while working with statistics.
o netisr_poll()
- Conditionally assert Giant.
- Acquire poll_mtx while working with statistics
and traversing pollrec array.
o ether_poll_register(), ether_poll_deregister()
- Conditionally assert Giant.
- Acquire poll_mtx while working with pollrec array.
o poll_idle()
- Remove all strange manipulations with Giant.
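A condensed sketch of the resulting handler loop (the busy-flag name
PRF_RUNNING is illustrative; NET_LOCK_GIANT() acquires Giant only when
the network stack is not marked MPSAFE):
    mtx_lock(&poll_mtx);
    for (i = 0; i < poll_handlers; i++) {
        if (!PR_VALID(i))
            continue;
        pr[i].flags |= PRF_RUNNING;     /* others will skip this entry */
        mtx_unlock(&poll_mtx);
        NET_LOCK_GIANT();               /* conditional Giant */
        pr[i].handler(pr[i].ifp, POLL_ONLY, count);
        NET_UNLOCK_GIANT();
        mtx_lock(&poll_mtx);
        pr[i].flags &= ~PRF_RUNNING;
    }
    mtx_unlock(&poll_mtx);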
In collaboration with: ru, pjd
In collaboration with: Oleg Bulyzhin <oleg rinet.ru>
In collaboration with: dima <_pppp mail.ru>
remaining % arguments because the varargs are now out of sync and
there is a risk that we might for instance dereference an integer
in a %s argument.
Sponsored by: Napatech.com
link proctree and allproc to Giant since that order is already implicitly
enforced.
- Use a goto to handle the case where we want to enforce a reversal before
calling isitmydescendant() in witness_checkorder() so that the logic is
easier to follow and so that it is easier to add more forced-reversal
cases in the future.
MFC after: 3 days
mutex.
- Don't panic if a spin lock is held too long inside _mtx_lock_spin() if
panicstr is set (meaning that we are already in a panic). Just keep
spinning forever instead.
o for() instead of while() when looping over the mbuf chain
o parens around all flag checks
o more verbose function and purpose description
o some more style changes
Based on feedback from: sam
m_demote(m->m_next) if they wish to start at the second mbuf in the chain.
o Test m_type with == instead of &.
o Check m_nextpkt against NULL instead of implicit 0.
Based on feedback from: sam
1. Walk the absolute list in reverse to prefer duplicated levels that have
a lower absolute setting, i.e. 800 MHz/50% is better than 1600 MHz/25% even
though both have the same actual frequency. This also removes the need to
check for already-modified levels since, by definition, those will be added
later in the sorted list.
2. Compare the absolute settings for derived levels and don't use the new
level if it's higher. For example, a level of 800 MHz/75% is preferable to
1600 MHz/25% even though the latter has a lower total frequency.
This work is based on a patch from the submitter but reworked by myself.
Submitted by: Tijl Coosemans (tijl/ulyssis.org)
int prep, int how).
Copies the data portion of mbuf (chain) n starting from offset off
for length len to mbuf (chain) m. Depending on prep the copied
data will be appended or prepended. The function ensures that the
mbuf (chain) m will be fully writeable by making real (not refcnt)
copies of mbuf clusters. For the prepending the function returns
a pointer to the new start of mbuf chain m and leaves as much
leading space as possible in the new first mbuf.
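A usage sketch; the function name is cut off above, so m_copymdata()
below is an assumption based on the signature shown:
    struct mbuf *m2;
    /* Append the entire data portion of chain n to chain m. */
    m2 = m_copymdata(m, n, 0, n->m_pkthdr.len, 0 /* append */, M_DONTWAIT);
    if (m2 == NULL)
        return (ENOBUFS);
    m = m2;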
Reviewed by: glebius
checking on mbufs and mbuf chains. Set sanitize to 1 to garble
illegal things and have them blow up later when used/accessed.
m_sanity()'s main purpose is for KASSERT()s and debugging of non-
kosher mbuf manipulation (of which we have a number).
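Typical use, per the stated purpose:
    KASSERT(m_sanity(m, 0),     /* sanitize=0: check, don't garble */
        ("%s: mbuf chain failed sanity check", __func__));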
Reviewed by: glebius
any tags and packet headers. If "all" is set then the first mbuf
in the chain will be cleaned too.
This function is used before an mbuf that arrived as a packet with
m->flags & M_PKTHDR set is appended to an mbuf chain using m->m_next
(not m->m_nextpkt).
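A sketch of that use (m_last() fetches the tail of chain m):
    /* n arrived as a packet (M_PKTHDR set); strip its tags and
     * packet header before linking it onto m via m_next. */
    m_demote(n, 1);             /* all=1: clean the first mbuf too */
    m_last(m)->m_next = n;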
Reviewed by: glebius
but vm_map_wire() fails, then a vm object, vm map entries, and kernel_map
free space are leaked and (2) unwiring is handled automatically by
vm_map_remove().
Suggested by: tegge