freebsd-nq

Author	SHA1	Message	Date
Jeff Roberson	4b24e4210e	- Properly check against B_DELWRI and B_NEEDSGIANT. This check was incorrectly written and caused some !NEEDSGIANT buffers to be put in the NEEDSGIANT queue. Sponsored by: Isilon Systems, Inc.	2006-04-04 06:44:21 +00:00
Marcel Moolenaar	39eb1d1263	Increment kdb_active after we stopped the other CPUs and decrement kdb_active before we restart them. This avoids false positives on restarted CPUs when they test for kdb_active while kdb_trap() is still finishing up.	2006-04-04 00:40:20 +00:00
Marcel Moolenaar	bfcdefd8aa	Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the PCB in which the context of stopped CPUs is stored. To access this PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The definition, when present, lives in <machine/kdb.h> and abstracts where MD code saves the context. Define KDB_STOPPEDPCB on i386, amd64, alpha and sparc64 in accordance with previous code.	2006-04-03 22:51:47 +00:00
Peter Wemm	b9eee07e36	Remove the unused sva and eva arguments from pmap_remove_pages().	2006-04-03 21:16:10 +00:00
Marcel Moolenaar	5991a4f811	In kdb_trap(), change the type of the local variable 'intr' from int to register_t, as intr_disable() returns the latter and register_t may be wider than int. Pointed out by: marius@	2006-04-03 20:55:52 +00:00
Marcel Moolenaar	2fae8f5aed	Replace critical_enter() and critical_exit() in kdb_trap() with intr_disable() and intr_restore() resp. Previously, critical regions would have interrupts disabled, but that was changed. Consequently, the debugger could run with interrupts enabled. This could cause problems for the low-level console code where received characters would trigger an interrupt that causes the interrupt handler to read the character instead of the cngetc() function.	2006-04-03 17:48:09 +00:00
John-Mark Gurney	5e6125891f	Mask out any action when copying the flags from the event to the knote. Pointed out by: Václav Haisman Submitted by: Dan Nelson (slightly modified patch) MFC after: 3 days	2006-04-01 20:15:39 +00:00
Robert Watson	bc725eafc7	Change protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they either occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach; this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months	2006-04-01 15:42:02 +00:00
Robert Watson	ac45e92ff2	Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this. This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they are not fully updated by this commit. This will be corrected shortly in followup commits to these components. MFC after: 3 months	2006-04-01 15:15:05 +00:00
Robert Watson	fa4c5373ce	Add comment to accept1() that it should use getsock() instead of fgetsock() to avoid additional mutex operations, and also to avoid use of soref/sorele which are now not preferred. MFC after: 3 months	2006-04-01 11:14:56 +00:00
Robert Watson	197b35d717	Mark fgetsock() and fputsock() as deprecated: callers should rely on the file descriptor reference, rather than paying additional lock operations to acquire a socket reference from the file descriptor. This will also help to ensure that file descriptor based socket requests are not delivered to a socket after close. Most consumers have already been converted to this model. MFC after: 3 months	2006-04-01 11:09:54 +00:00
Robert Watson	7f689de232	Assert so->so_pcb is NULL in sodealloc() -- the protocol state should not be present at this point. We will eventually remove this assert because the socket layer should never look at so_pcb, but for now it's a useful debugging tool. MFC after: 3 months	2006-04-01 10:45:52 +00:00
Robert Watson	220c1357ed	Add a somewhat sizable comment documenting the semantics of various kernel socket calls relating to the creation and destruction of sockets. This will eventually form the foundation of socket(9), but is currently in too much flux to do so. MFC after: 3 months	2006-04-01 10:43:02 +00:00
Jeff Roberson	0af2472199	- Add an assert to vgone. It is illegal to call vgone without a reference to the vnode. Without a reference the vnode will never be vdestroy'd and the memory will never be reclaimed. Sponsored by: Isilon Systems, Inc.	2006-03-31 23:39:26 +00:00
Jeff Roberson	ba5eb429e3	- When there are dangling vnodes at unmount print them before we panic. Sponsored by: Isilon Systems, Inc.	2006-03-31 23:38:15 +00:00
Jeff Roberson	3bbd6d8ae6	- Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs(). Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.	2006-03-31 03:54:20 +00:00
Jeff Roberson	94bc95db3c	- Hold a reference from the time vfs_busy starts until vfs_unbusy is called. - vfs_getvfs has to return a reference to prevent the returned mountpoint from changing identities. - Release references acquired via vfs_getvfs. Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.	2006-03-31 03:53:25 +00:00
Jeff Roberson	c5fcce21c5	- GETWRITEMOUNT now returns a referenced mountpoint to prevent its identity from changing. This is possible now that mounts are not freed. Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.	2006-03-31 03:52:24 +00:00
Jeff Roberson	a218edceb2	- Allocate mounts from a uma zone that uses UMA_ZONE_NOFREE to prevent mount memory from being reclaimed. This resolves a number of race conditions described in vfs_default.c and introduced with the VFS_LOCK_GIANT macros. - Let the mtx and lock remain valid after the mount structure has been freed by using init and fini calls. Technically fini will never be called but is included for completeness. - Consistently use lockmgr directly rather than lockmgr to lock and vfs_unbusy to unlock. Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.	2006-03-31 03:49:51 +00:00
Jeff Roberson	fdf86b2dcd	- LK_RETRY means nothing when passed to VOP_LOCK. Call vn_lock instead. - Move the vn_lock of the dvp until after we've unbusied the filesystem to avoid a LOR with the mount point lock. - In the v_mountedhere while loop we acquire a new instance of giant each time through without releasing the first. This would cause us to leak Giant. Sponsored by: Isilon Systems, Inc.	2006-03-31 02:59:23 +00:00
Jeff Roberson	084d64ac21	- Add the B_NEEDSGIANT flag which is only set if the vnode that owns a buf requires Giant. It is set in bgetvp and cleared in brelvp. - Create QUEUE_DIRTY_GIANT for dirty buffers that require giant. - In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and only if we think there are buffers in that queue. Sponsored by: Isilon Systems, Inc.	2006-03-31 02:56:30 +00:00
Sam Leffler	00537061dd	fixup error handling in taskqueue_start_threads: check for kthread_create failing, print a message when we fail for some reason as most callers do not check the return value (e.g. 'cuz they're called from SYSINIT) Reviewed by: scottl MFC after: 1 week	2006-03-30 23:06:59 +00:00
Pawel Jakub Dawidek	177a987379	Fix a panic on sparc64 related to improper alignment - we cannot assume that the 'unsigned char *' argument is 4 byte aligned. MFC after: 3 days	2006-03-30 18:45:50 +00:00
Marcel Moolenaar	6174e6ed12	Add scc(4), a driver for serial communications controllers. These controllers typically have multiple channels and support a number of serial communications protocols. The scc(4) driver is itself an umbrella driver that delegates the control over each channel and mode to a subordinate driver (like uart(4)). The scc(4) driver supports the Siemens SAB 82532 and the Zilog Z8530 and replaces puc(4) for these devices.	2006-03-30 18:33:22 +00:00
Paul Saab	fbb273bc05	Properly support FreeBSD 4 32bit System V shared memory. Submitted by: peter Obtained from: Yahoo! MFC after: 3 weeks	2006-03-30 07:42:32 +00:00
John Baldwin	4b3b0413d2	Always explicitly panic in propagate_priority() if we try to propagate a lock's priority to a sleeping thread. When we panic, dump a stack trace of the thread that is asleep if DDB is compiled into the kernel just before calling panic(). This is much more informative and useful for debugging than the current behavior of getting a page fault and not having an easy way of determining which thread caused the original problem. MFC after: 1 week	2006-03-29 23:24:55 +00:00
John-Mark Gurney	4e095bc045	hold the list lock over the f_event and KNOTE_ACTIVATE calls... This closes a race where data could come in before we clear the INFLUX flag, and get skipped over by knote (and hence never be activated, though it should have been)... Found by: glebius & co. Reviewed by: glebius MFC after: 3 days	2006-03-29 18:15:30 +00:00
John Baldwin	33f19bee6f	- Conditionalize Giant around VFS operations for ALQ, ktrace, and generating a coredump as the result of a signal. - Fix a bug where we could leak a Giant lock if vn_start_write() failed in coredump(). Reported by: jmg (2)	2006-03-28 21:30:22 +00:00
John Baldwin	11178ee4c1	Conditionalize locking of Giant for VFS in acct(2). We already conditionally acquired Giant in the other parts of the accounting code.	2006-03-28 21:26:59 +00:00
John Baldwin	861dab08e7	Change vn_open() to honor the MPSAFE flag in the passed in nameidata object and use that instead of testing fdidx against -1 to determine if it should release Giant if Giant was locked due to the requested file residing on a non-MPSAFE VFS. Discussed with: jeff	2006-03-28 21:22:08 +00:00
Dag-Erling Smørgrav	867c089bc7	Revert previous commit at davidxu's insistence. Instead, use __DECONST (argh!) and rearrange the prototypes to make it clear that _umtx_op() is not deprecated.	2006-03-28 14:32:38 +00:00
Dag-Erling Smørgrav	b3efbabe87	The undocumented and deprecated system call _umtx_op() takes two pointer arguments. The first one is never used (all callers pass in 0); the second is sometimes used to pass in a struct timespec * which is used as a timeout and never modified. Constify that argument so callers can pass a const struct timespec * without jumping through hoops.	2006-03-28 09:18:34 +00:00
Alan Cox	7c8dcf2def	Use NET_LOCK_GIANT() and VFS_LOCK_GIANT() instead of unconditionally acquiring Giant in kern_sendfile(). Guard against the forced reclamation of a vnode in kern_sendfile(). Discussed with: jeff Reviewed by: tegge MFC after: 3 weeks	2006-03-27 04:23:16 +00:00
Robert Watson	63b01ffd34	Add a sysctl, regression.sonewconn_earlytest, which when options REGRESSION is enabled, allows user space to dictate that sonewconn() should skip its "skip the hard work" check to see if the listen queue is full, and instead proceed with allocation of a socket and trimming of the overflowed queue. This makes it easier to test the queue overflow logic. MFC after: 1 month	2006-03-26 22:44:37 +00:00
Joseph Koshy	49874f6ea3	MFP4: Support for profiling dynamically loaded objects. Kernel changes: Inform hwpmc of executable objects brought into the system by kldload() and mmap(), and of their removal by kldunload() and munmap(). A helper function linker_hwpmc_list_objects() has been added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve the list of currently loaded kernel modules. The unused `MAPPINGCHANGE' event has been deprecated in favour of separate `MAP_IN' and `MAP_OUT' events; this change reduces space wastage in the log. Bump the hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to handle the map change callbacks. Change the default per-cpu sample buffer size to hold 32 samples (up from 16). Increment __FreeBSD_version. libpmc(3) changes: Update libpmc(3) to deal with the new events in the log file; bring the pmclog(3) manual page in sync with the code. pmcstat(8) changes: Introduce new options to pmcstat(8): "-r" (root fs path), "-M" (mapfile name), "-q"/"-v" (verbosity control). Option "-k" now takes a kernel directory as its argument but will also work with the older invocation syntax. Rework string handling in pmcstat(8) to use an opaque type for interned strings. Clean up ELF parsing code and add support for tracking dynamic object mappings reported by a v2.0.00 hwpmc(4). Report statistics at the end of a log conversion run depending on the requested verbosity level. Reviewed by: jhb, dds (kernel parts of an earlier patch) Tested by: gallatin (earlier patch)	2006-03-26 12:20:54 +00:00
David Xu	dbbccfe923	1. Move code for scanning pending I/O from aio_fsync to aio_aqueue, it has less overhead. 2. Avoid scheduling task if maximum number of I/O threads is reached.	2006-03-24 00:50:06 +00:00
David Xu	177e987e63	Regenerate.	2006-03-23 08:48:37 +00:00
David Xu	99eee864ad	Implement aio_fsync() syscall.	2006-03-23 08:46:42 +00:00
Pawel Jakub Dawidek	96c0381f5c	Destroy "bip" bio in error case. Found by: Coverity Prevent analysis tool Coverity ID: 795 MFC after: 3 days	2006-03-22 00:42:41 +00:00
Jeff Roberson	bacb51fb67	- Remove explicit giant acquires and replace it with VFS_LOCK_GIANT. Sponsored by: Isilon Systems, Inc.	2006-03-22 00:00:05 +00:00
Jeff Roberson	77c79550af	- Remove explicit calls to lock and unlock Giant and replace them with VFS_LOCK_GIANT/VFS_UNLOCK_GIANT calls. This completely removes Giant acquisition in the syscall path for ffs. Bug fix to kern_fhstatfs from: Todd Miller <Todd.Miller@sparta.com> Sponsored by: Isilon Systems, Inc.	2006-03-21 23:58:37 +00:00
David Xu	bf1a322061	Rethink it a bit, if there is a STOP flag, don't bother to resume other threads.	2006-03-21 10:05:15 +00:00
David Xu	568b4ebbcc	Because JOB control has higher priority than single threading in thread_suspend_check(), call thread_stopped() to report SIGCHLD if there is JOB control in progress.	2006-03-21 08:41:15 +00:00
Tor Egge	41d7199b9d	Remove unused leaked debug function prototype.	2006-03-21 01:04:24 +00:00
Christian S.J. Peron	2ed4894a26	Restore fd optimization with a few minor tweaks, to quote tegge: "fdinit() fails to initialize newfdp->fd_fd.fd_lastfile to -1. This breaks fdcopy() which will incorrectly set newfdp->fd_freefile to 1 if no files are open and the last file descriptor marked as unused for fdp was 0. This later causes descriptor 0 to be unavailable in newfdp when the optimization is enabled. When the last file descriptor previously marked as used is nonzero and marked as unused, fdunused() incorrectly sets fdp->fd_lastfile to fd - 1 due to fd_last_used() returning (size - 1). This hides the problem that breaks the optimization." This allows us to keep the optimization, while un-breaking it. This is a RELENG_6 candidate. PR: kern/87208 MFC after: 1 week Submitted by: tegge	2006-03-20 00:13:47 +00:00
Tor Egge	7de3839d0d	Let snapshots make a copy of old contents for all buffers taking part in a cluster instead of just the first buffer. Delay buf_start() calls until snapshots have a copy of old content. PR: kern/93942	2006-03-19 21:43:36 +00:00
Tor Egge	d50ef66d03	Don't call vn_finished_write() if vn_start_write() failed.	2006-03-19 20:43:07 +00:00
Jeff Roberson	e44270a781	- Correct an assert in vop_rename_pre. fdvp may be locked if it is either the target directory or file. This case should fail in the filesystem anyway and perhaps kern_rename() should catch it. Sponsored by: Isilon Systems, Inc.	2006-03-19 20:14:46 +00:00
Christian S.J. Peron	30bacc08e0	Back out fd optimization introduced in revision 1.280 as it appears to be really breaking things. Simple "close(0); dup(fd)" does not return descriptor "0" in some cases. Further, this change also breaks some MAC interactions with mac_execve_will_transition(). Under certain circumstances, fdcheckstd() can be called in execve(2) causing an assertion that checks to make sure that stdin, stdout and stderr reside at indexes 0, 1 and 2 in the process fd table to fail, resulting in a kernel panic when INVARIANTS is on. This should also kill the "dup(2) regression on 6.x" show stopper item on the 6.1-RELEASE TODO list. This is a RELENG_6 candidate. PR: kern/87208 Silence from: des MFC after: 1 week	2006-03-18 23:27:21 +00:00
Robert Watson	4d4b555efa	Modify UNIX domain sockets to guarantee, and assume, that so_pcb is always defined for an in-use socket. This allows us to eliminate countless tests of whether so_pcb is non-NULL, eliminating dozens of error cases. For now, retain the call to sotryfree() in the uipc_abort() path, but this will eventually move to soabort(). These new assumptions should be largely correct, and will become more so as the socket/pcb reference model is fixed. Removing the notion that so_pcb can be NULL is a critical step towards further fine-graining of the UNIX domain socket locking, as the so_pcb reference no longer needs to be protected using locks; instead it is a property of the socket life cycle.	2006-03-17 13:52:57 +00:00
Alan Cox	41634e2e8d	Correct two vm object reference leaks in error cases. Submitted by: davidxu	2006-03-16 08:51:59 +00:00
Robert Watson	92c07a345e	Change soabort() from returning int to returning void, since all consumers ignore the return value, soabort() is required to succeed, and protocols produce errors here to report multiple freeing of the pcb, which we hope to eliminate.	2006-03-16 07:03:14 +00:00
David Xu	795a11d049	Fix a race between file operations and rfork(RFCFDG) by parking all other threads at user boundary, the race can crash kernel under stress testing. Reviewed by: jhb MFC after: 3 days	2006-03-15 23:24:14 +00:00
Sam Leffler	47e2996e8b	promote fast ipsec's m_clone routine for public use; it is renamed m_unshare and the caller can now control how mbufs are allocated Reviewed by: andre, luigi, mlaier MFC after: 1 week	2006-03-15 21:11:11 +00:00
Poul-Henning Kamp	590487078f	Disable the "cputick increased..." message now that the dust has settled.	2006-03-15 20:22:32 +00:00
Alexander Leidinger	a8f47039c7	Fix memory leak introduced in previous revision. Discussed with: phk	2006-03-15 19:23:08 +00:00
Robert Watson	93709ad0be	As with socket consumer references (so_count), make sofree() return without GC'ing the socket if a strong protocol reference to the socket is present (SS_PROTOREF).	2006-03-15 12:45:35 +00:00
David Xu	e170bfda56	1. Count last time slice, this intends to fix "calcru: runtime went backwards" bug for threaded process. 2. Add comment about possible logical problem with scheduler. MFC after: 3 days	2006-03-14 04:00:21 +00:00
John-Mark Gurney	45e0d0aa30	spell pdata correctly, we now will only dump maxlen of each mbuf in the chain, instead of the entire mbuf... This should probably be reworked so that it prints at max maxlen bytes for the entire chain...	2006-03-14 00:22:10 +00:00
Ruslan Ermilov	936ddefcd6	The mount(8) manpage says: "In case of conflicting options being specified, the rightmost option takes effect." Fix code to obey this. This makes e.g. "mount -r /usr" or "mount -ar" actually mount file systems read-only.	2006-03-13 14:58:37 +00:00
David Xu	28e989e9ca	Remove unused code.	2006-03-13 10:37:25 +00:00
Christian S.J. Peron	a19fd0e766	Make sure that we are adding a path token to the audit record in open(2). Do this by making sure we are using the AUDITVNODE1 mask in the namei flags. Obtained from: TrustedBSD Project	2006-03-11 17:14:05 +00:00
Poul-Henning Kamp	272601f8f0	Go over calcru and friends once more. Reintroduce the monotonicity for the normal case and make the two special cases behave in what is believed to be the most sensible fashion.	2006-03-11 10:48:19 +00:00
Tor Egge	ca2fa80767	Block secondary writes while expunging active unlinked files. Fix detection of active unlinked files by checking VI_OWEINACT and VI_DOINGINACT in addition to v_usecount. Defer inactive handling for unlinked files if the file system is mostly suspended (secondary writes being blocked). Perform deferred inactive handling after the file system is resumed.	2006-03-11 01:08:37 +00:00
Jung-uk Kim	0d84d9ebb5	Implement printf 'X' conversion for both libstand and kernel.	2006-03-09 22:37:34 +00:00
Poul-Henning Kamp	fef527ee73	Oops, forgot newline.	2006-03-09 09:44:10 +00:00
Poul-Henning Kamp	0f038c05ea	Add slop to "backwards" cpu accounting messages, 3 usec or 1% whichever triggers. This should eliminate all the trivial messages which result from minor increases in cpu_tick frequency. Machines which don't do cpu clock fiddling shouldn't issue "backwards" messages now. Laptops and other machines where the initial estimate of cputicks may be waaaay off will still issue warnings.	2006-03-09 09:33:17 +00:00
Poul-Henning Kamp	6cda760f09	silence cpu_tick calibration and notice only (under bootverbose) when the frequency increases.	2006-03-09 09:30:33 +00:00
Poul-Henning Kamp	c8d7706e75	Ignore kenv strings which overflow the room we have, rather than pretend we have room for them.	2006-03-09 09:29:41 +00:00
David Xu	7b8d5e4865	Remove _STOPEVENT call, it is already called in issignal, simplify code for SIGKILL signal.	2006-03-09 08:31:51 +00:00
Tor Egge	791dd2fade	Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.	2006-03-08 23:43:39 +00:00
Stephan Uphoff	68ff3c2445	Fix exec_map resource leaks. Tested by: kris@	2006-03-08 20:21:54 +00:00
Andre Oppermann	a7bd90ef93	Properly handle the case when the packet secondary zone can't allocate further mbuf clusters to attach to mbufs. Reported by: kris Tested by: kris Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-03-08 14:05:38 +00:00
John Baldwin	88ca07e79a	Style nit.	2006-03-07 22:17:26 +00:00
John Baldwin	67c0796ca3	For consistency's sake, use >= MINCLSIZE rather than > MINCLSIZE to determine whether or not to allocate a full mbuf cluster rather than just a plain mbuf when adding on additional mbufs in m_getm(). In practice, there wasn't any resulting mem trashing since m_getm() doesn't ever allocate an mbuf with a packet header, and MINCLSIZE is the available payload in an mbuf with a header rather than the available payload in a plain mbuf. Discussed with: andre (lightly)	2006-03-07 21:31:20 +00:00
Poul-Henning Kamp	fccfcfba00	Add missing cast.	2006-03-04 06:07:26 +00:00
Poul-Henning Kamp	5b51d1de62	More detailed logging if timestepwarnings are enabled.	2006-03-04 06:06:43 +00:00
Paul Saab	6308f39da8	use strlcpy in cvtstatfs and copy_statfs instead of bcopy to ensure the copied strings are properly terminated. bzero the statfs32 struct in copy_statfs.	2006-03-04 00:09:09 +00:00
Paul Saab	45d48bdad5	Fix bug in malloc_uninit(): Releasing items from the mt_zone can not be done by a simple uma_zfree() call since mt_zone is allocated with the UMA_ZONE_MALLOC flag. Use uma_zfree_arg instead and supply the slab. This bug caused panics in low memory situations on unloading kernel modules containing MALLOC_DEFINE(..) statements. Submitted by: ups	2006-03-03 22:36:52 +00:00
Paul Saab	6815739e00	Don't truncate f_mntfromname & f_mntonname to 16 characters when translating statfs into ostatfs. This allows 4.x binaries making statfs calls to work on 6.x.	2006-03-03 07:20:54 +00:00
Marcus Alves Grando	b4130b8ae0	- Print message about cpufreq and timecounter TSC Approved by: njl MFC after: 1 day	2006-03-03 02:06:04 +00:00
Tor Egge	3b582b4e72	Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.	2006-03-02 22:13:28 +00:00
Tor Egge	b983aac762	Don't try to show marker nodes.	2006-03-02 21:31:15 +00:00
David Xu	3dfcaad667	Add signal set sq_kill to sigqueue structure, the member saves all signals sent by kill() syscall, without this, a signal sent by sigqueue() can cause a signal sent by kill() to be lost.	2006-03-02 14:06:40 +00:00
Poul-Henning Kamp	301af28a06	Suffer a little bit of math every 16 seconds and tighten calibration of cpu_ticks to the low side of PPM.	2006-03-02 08:09:46 +00:00
Jeff Roberson	eb2ea10590	- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it. Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris	2006-03-02 05:50:23 +00:00
David Xu	80452384e6	Regenerate.	2006-03-01 06:49:38 +00:00
David Xu	61d3a4efc2	Let the kernel POSIX timer code and mqueue code use an integer as a resource handle; the timer_t and mqd_t types will be pointers, which userland will define.	2006-03-01 06:29:34 +00:00
Paul Saab	fa545f434c	Fix 32bit sendfile by implementing kern_sendfile so that it takes the header and trailers as iovec arguments instead of copying them in inside of sendfile. Reviewed by: jhb MFC after: 3 weeks	2006-02-28 19:39:18 +00:00
Gleb Smirnoff	73bb09f2d0	One more grammar nit. Submitted by: ru	2006-02-27 07:22:32 +00:00
David Xu	27b8220d12	1. Remove aio entry from lists earlier in aio_free_entry, so other threads can not see it if we unlock the proc lock (this can happen in knlist_delete). Don't do wakeup, it is not necessary. 2. Decrease kaio_buffer_count in biohelper rather than doing it in aio_bio_done_notify. 3. In aio_bio_done_notify, don't send notification if KAIO_RUNDOWN was set, because the process is already in single thread mode. 4. Use assignment to initialize aiothreadflags. 5. AIOCBLIST_RUNDOWN is not useful, axe the code using it. 6. use LIO_NOP instead of zero.	2006-02-26 12:56:23 +00:00
Gleb Smirnoff	fcf9061858	Fix several typos and trim spaces at eol. PR: kern/93759 Submitted by: Antoine Brodin <antoine.brodin laposte.net>	2006-02-26 11:44:28 +00:00
Scott Long	6ec6fb9bc6	Always print a newline char at the end of the line.	2006-02-25 16:20:22 +00:00
John Baldwin	b36f458861	Use the recently added msleep_spin() function to simplify the callout_drain() logic. We no longer need a separate non-spin mutex to do sleep/wakeup with, instead we can now just use the one spin mutex to manage all the callout functionality.	2006-02-23 19:13:12 +00:00
David Xu	7e0221a251	1. Refine kern_sigtimedwait() to remove redundant code. 2. Fix a bug, if thread got a SIGKILL signal, call sigexit() to kill its process. MFC after: 3 days	2006-02-23 09:24:19 +00:00
David Xu	7c9a98f15b	Code cleanup, simply compare with curproc.	2006-02-23 05:50:55 +00:00
Jeff Roberson	8febcfb92f	- Use vfs_ref/rel to protect a mountpoint from going away while VFS_STATFS is being called. Be sure to grab the ref before we unlock the vnode to prevent the mount from disappearing. Tested by: kris	2006-02-23 05:18:07 +00:00
Jeff Roberson	a1db11fc40	- Release the mount ref once the vnode has been recycled rather than once the last reference is dropped. I forgot that vnodes can stick around for a very long time until processes discover that they are dead. This means that a vnode reference is not sufficient to keep the mount referenced and even more code will be required to ref mount points. Discovered by: kris	2006-02-23 05:15:37 +00:00
David Xu	dc94f5e383	Move comments to more accurate place.	2006-02-23 03:42:17 +00:00
David Xu	c008d51784	Fix a sleep queue race for KSE thread. Reviewed by: jhb	2006-02-23 00:13:58 +00:00
John Baldwin	daad1cd74d	Fixup some comments. Mutexes have been locked, not entered, for several years now, and msleep blocks threads rather than processes.	2006-02-22 20:46:10 +00:00
John Baldwin	06ad42b2f7	Close some races between procfs/ptrace and exit(2): - Reorder the events in exit(2) slightly so that we trigger the S_EXIT stop event earlier. After we have signalled that, we set P_WEXIT and then wait for any processes with a hold on the vmspace via PHOLD to release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops to zero. - Change proc_rwmem() to require that the process being read from has its vmspace held via PHOLD by the caller and get rid of all the junk to screw around with the vmspace reference count as we no longer need it. - In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it doesn't exist. - Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem() to clear an earlier single-step simulated via a breakpoint). We only do one to avoid races. Also, by making the EINVAL error for unknown requests be part of the default: case in the switch, the various switch cases can now just break out to return which removes a _lot_ of duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug where a LWP ptrace command could return EINVAL with the proc lock still held. - Changed the locking for ptrace_single_step(), ptrace_set_pc(), and ptrace_clear_single_step() to always be called with the proc lock held (it was a mixed bag previously). Alpha and arm have to drop the lock while they mess around with breakpoints, but other archs avoid extra lock release/acquires in ptrace(). I did have to fix a couple of other consumers in kern_kse and a few other places to hold the proc lock and PHOLD. Tested by: ps (1 mostly, but some bits of 2-4 as well) MFC after: 1 week	2006-02-22 18:57:50 +00:00
John Baldwin	54690b5679	Don't do a PHOLD() in kthread_create() w/o a matching PRELE() in kthread_exit(). Rather than add the missing PRELE() I chose to just axe the PHOLD() since it was redundant with the P_SYSTEM flag. MFC after: 1 week	2006-02-22 17:21:45 +00:00
John Baldwin	8f95fc2481	Various style and comment fixes. Submitted by: bde	2006-02-22 16:58:48 +00:00
Wayne Salamon	bc5504b942	Add pathname and/or vnode argument auditing for the following system calls: quotactl, statfs, fstatfs, fchdir, chdir, chroot, open, mknod, mkfifo, link, symlink, undelete, unlink, access, eaccess, stat, lstat, pathconf, readlink, chflags, lchflags, fchflags, chmod, lchmod, fchmod, chown, lchown, fchown, utimes, lutimes, futimes, truncate, ftruncate, fsync, rename, mkdir, rmdir, getdirentries, revoke, lgetfh, getfh, extattrctl, extattr_set_file, extattr_set_link, extattr_get_file, extattr_get_link, extattr_delete_file, extattr_delete_link, extattr_list_file, extattr_list_link. In many cases the pathname and vnode auditing is done within namei lookup instead of directly in the system call. Audit the remaining arguments to these system calls: fstatfs, fchdir, open, mknod, chflags, lchflags, fchflags, chmod, lchmod, fchmod, chown, lchown, fchown, futimes, ftruncate, fsync, mkdir, getdirentries.	2006-02-22 16:04:20 +00:00
Jeff Roberson	c5dcb84008	- Revert r1.406 until a solution can be found that doesn't break nfs. The statfs handler in nfs will lock vnodes which may lead to deadlock or recursion. Found by: kris Pointy hat to: me	2006-02-22 09:52:25 +00:00
Jeff Roberson	a4aeaefe5a	- We can not hold a vnode lock while we do a lookup. Search for and load modules prior to looking up the directory which we will cover to avoid this problem in mount. - We must hold the coveredvp locked before we can busy the mountpoint to prevent a lock order reversal with the vfs_busy() in lookup which holds the directory lock prior to doing a vfs_busy(). The directory lock is required to safely clear the v_mountedhere field on the directory. MFC After: 1 week	2006-02-22 06:29:55 +00:00
Jeff Roberson	8a7cd2fdfb	- Grab a mnt ref in vfs_busy() before dropping the interlock. This will prevent the mount point from going away while we're waiting on the lock. The ref does not need to persist once we have the lock because the lock prevents the mount point from being unmounted. MFC After: 1 week	2006-02-22 06:20:12 +00:00
Jeff Roberson	05b6a20a66	- Hold the vnode used in the statfs related functions until we're done with the VFS_STATFS call to prevent the mount from disappearing while we're stating. - Convert these routines to use MPSAFE namei semantics. MFC After: 1 week	2006-02-22 06:19:08 +00:00
David Xu	ba0360b135	Abstract function mqfs_create_node() to create a mqueue node.	2006-02-22 02:38:25 +00:00
David Xu	ad8de0f243	If block size is zero, use normal file operations to do I/O; this eliminates a divide-by-zero fault. Recommended by: phk	2006-02-22 00:05:12 +00:00
John Baldwin	bd106be404	Move the ruadd() in kern_exit() to save our final stats in our child stats even further down in exit1() so that it includes the runtime and tick counts from the final time slice for the dying thread. Reviewed by: phk	2006-02-21 21:48:42 +00:00
John Baldwin	6fc6433ecd	Split calcru() back into a calcru1() function shared with calccru() and a calcru() wrapper that passes a local rusage_ext on the stack that is a snapshot to do the calculations on. Now we can pass p->p_crux to calcru1() in calccru() again which fixes the issues with runtime going backwards messages when dead processes are harvested by init. Reviewed by: phk Tested by: Stefan Ehmann shoesoft at gmx dot net	2006-02-21 21:47:46 +00:00
Andre Oppermann	80444f8803	The sysctls kern.ipc.[max_linkhdr|max_protohdr|max_hdr|max_datalen] can't be changed from userland. Make them read-only and provide descriptions. kern.ipc.max_datalen must never be less than one byte. Enforce this with a panic in net_init_domain(). Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-02-18 17:16:18 +00:00
Andre Oppermann	ec63cb90a3	Replace the 4k fixed sized jumbo mbuf clusters with PAGE_SIZE sized jumbo mbuf clusters. To make the variable size clear they are named MJUMPAGESIZE. Having jumbo clusters with the native PAGE_SIZE is more useful than a fixed 4k size according to the device driver writers using this API. The 9k and 16k jumbo mbuf clusters remain unchanged. Requested by: glebius, gallatin Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-02-17 14:14:15 +00:00
Andre Oppermann	a4684d742d	Make sysctl_msec_to_ticks(SYSCTL_HANDLER_ARGS) generally available instead of being private to tcp_timer.c. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-02-16 15:40:36 +00:00
David Xu	94f0972bec	Fix a long standing race between sleep queue and thread suspension code. When a thread A is going to sleep, it calls sleepq_catch_signals() to detect any pending signals or thread suspension request. If nothing happens, it returns without holding process lock or scheduler lock; this opens a race window which allows thread B to come in and do process suspension work. However, since A is still at running state, thread B can do nothing to A. Thread A continues and puts itself into actually sleeping state, but B has never seen it, and it sits there forever until B is woken up by other threads some time later (this can be a very long delay or never happen). Fix this bug by forcing sleepq_catch_signals to return with scheduler lock held. Fix sleepq_abort() by passing it an interrupted code; previously, it worked as wakeup_one(), and the interruption could not be identified correctly by sleep queue code when the sleeping thread was resumed. Let thread_suspend_check() return EINTR or ERESTART, so sleep queue no longer has to use SIGSTOP as a hack to build a return value. Reviewed by: jhb MFC after: 1 week	2006-02-15 23:52:01 +00:00
Wayne Salamon	085a0d43ca	Audit the arguments to the ptrace(2) system call. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-02-14 01:18:31 +00:00
Wayne Salamon	bfd7575a39	Audit the arguments to the kill(2) and killpg(2) system calls. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-02-14 01:17:03 +00:00
David Xu	d8267df729	In order to speed up process suspension on MP machine, send IPI to remote CPU. While here, abstract thread suspension code into a function called sig_suspend_threads, the function is called when a process received a STOP signal.	2006-02-13 03:16:55 +00:00
Robert Watson	13f322c2fc	Improve consistency of return() style. MFC after: 3 days	2006-02-12 15:00:27 +00:00
Poul-Henning Kamp	e8444a7e6f	CPU time accounting speedup (step 2) Keep accounting time (in per-cpu) cputicks and the statistics counts in the thread and summarize into struct proc when at context switch. Don't reach across CPUs in calcru(). Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines). Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true. Use 27MHz counter on i386/Geode. Use TSC on amd64 & i386 if present. Use tick counter on sparc64	2006-02-11 09:33:07 +00:00
David Xu	42925630b6	Test before modifying p_sflag to avoid unconditional cache line ping-pong on SMP.	2006-02-10 14:59:16 +00:00
David Xu	71b7afb2b4	Call thread_stopped in thr_exit to notify parent that the child process is now fully stopped, this was already in kse_exit().	2006-02-10 03:34:29 +00:00
Poul-Henning Kamp	eb2da9a51f	Simplify system time accounting for profiling. Rename struct thread's td_sticks to td_pticks, we will need the other name for more appropriately named use shortly. Reduce it from uint64_t to u_int. Clear td_pticks whenever we enter the kernel instead of recording its value as reference for userret(). Use the absolute value of td->pticks in userret() and eliminate third argument.	2006-02-08 08:09:17 +00:00
Poul-Henning Kamp	5b1a8eb397	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestamps at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.	2006-02-07 21:22:02 +00:00
John Baldwin	222fdf4bff	Provide some anti-footshooting. Don't allow the user to set the interval for acctwatch() runs to be negative or zero as this could result in either a possible hang (or panic if INVARIANTS is on). Previously the accounting code handled the <= 0 case by calling acctwatch on every clock tick (eww!) due to an implementation detail of callout_reset(). (Tick counts of <= 0 are converted to 1). MFC after: 3 days	2006-02-07 18:59:47 +00:00
John Baldwin	505a14934e	- Add a kthread to periodically call acctwatch() when accounting is active instead of calling acctwatch() from softclock. The acctwatch() function needs to hold an sx lock and also makes a VFS call, and neither of these are good things (or safe) to do from a callout. The kthread only exists and is running when accounting is turned on; it is started and stopped as needed. I didn't run acctwatch() via the thread taskqueue at Robert's request as he was worried that if the accounting file was over NFS the VFS_STAT() calls might stall other work on the taskqueue. - Add an acct_disable() function to take care of closing the accounting vnode and cleaning up so we don't duplicate the same code in two different places. MFC after: 3 days	2006-02-07 16:04:03 +00:00
John Baldwin	8917b8d28c	- Always call exec_free_args() in kern_execve() instead of doing it in all the callers if the exec either succeeds or fails early. - Move the code to call exit1() if the exec fails after the vmspace is gone to the bottom of kern_execve() to cut down on some code duplication.	2006-02-06 22:06:54 +00:00
John Baldwin	809f984b21	Add a kern_eaccess() function and use it to implement xenix_eaccess() rather than kern_access(). Suggested by: rwatson	2006-02-06 22:00:53 +00:00
John Baldwin	934ba9b2cf	- Move the wakeup() for exiting kthreads out of exit1() and into kthread_exit() as that is cleaner and less obscured. It also does the wakeup sooner. - Add some comments to kthread_exit().	2006-02-06 21:56:13 +00:00
John Baldwin	2c9d9d392a	We don't need the proc lock to check P_KTHREAD on curthread since it is only set before the kthread starts executing and is never cleared.	2006-02-06 21:54:47 +00:00
Olivier Houchard	2a3b10658d	rwlock expects the struct thread to be aligned on 8 bytes, so make sure thread0 is.	2006-02-06 16:03:10 +00:00
Jeff Roberson	04f6d3effa	- Add a ref count to the mount structure. Sleep for up to 3 seconds in vfs_mount_destroy waiting for this ref to hit 0. We don't print an error if we are rebooting as the root mount always retains some references by init proc. - Acquire a mnt ref for every vnode allocated to a mount point. Drop this ref only once vdestroy() has been called and the mount has been freed. - No longer NULL the v_mount pointer in delmntque() so that we may release the ref after vgone() has been called. This allows us to guarantee that the mount point structure will be valid until the last vnode has lost its last ref. - Fix a few places that rely on checking v_mount to detect recycling. Sponsored by: Isilon Systems, Inc. MFC After: 1 week	2006-02-06 10:19:50 +00:00
Jeff Roberson	2f0bca553a	- Don't check v_mount for NULL to determine if a vnode has been recycled. Use the more appropriate VI_DOOMED flag instead. Sponsored by: Isilon Systems, Inc. MFC After: 1 week	2006-02-06 10:15:27 +00:00
Jeff Roberson	36a52c3cae	- Add the global 'rebooting' variable that is used to detect when boot() has been called. Sponsored by: Isilon Systems, Inc. MFC After: 1 week	2006-02-06 10:12:00 +00:00
David Xu	ea8e65b0fa	Add members pl_sigmask and pl_siglist into ptrace_lwpinfo to get lwp's signal mask and pending signals.	2006-02-06 09:41:56 +00:00
Robert Watson	9653775b18	Regenerate.	2006-02-06 02:00:32 +00:00
Robert Watson	c983324ef5	Prefer AUE_FOO audit identifiers to AUE_O_FOO, which are largely left over from the Darwin implementation. When we implement a system call as a wrapper to sysctl(), audit it as AUE_SYSCTL. This leads to greater compatibility with Solaris audit trails as sysctl() argument tokens are not the same as the ones for the original system calls (i.e., setdomainname()). Replace references to AUE_ events that are equivalent to AUE_NULL with AUE_NULL. In the case of process signal configuration, this is because these events do not require auditing. Move from the Darwin spelling of getsockopt() to the FreeBSD/Solaris one. Audit nmount(). Obtained from: TrustedBSD Project	2006-02-06 02:00:06 +00:00
Robert Watson	89964dd284	When exiting a thread, submit any pending record. Today, we don't audit thread exit, but should that happen, this will prevent unhappiness, as the thread exit system call will never return, and hence not commit the record. Pointed out by/with: cognet Obtained from: TrustedBSD Project	2006-02-06 01:51:08 +00:00
Wayne Salamon	2f8a46d5ff	Audit the arguments (user/group IDs) for the system calls that set these IDs. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-02-06 00:32:33 +00:00
Wayne Salamon	ad20c8f325	Audit the args to rfork(), and the child PID for all fork system calls. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-02-06 00:28:50 +00:00
Wayne Salamon	de3007e8f3	Audit the pid being requested in wait4(). Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-02-06 00:19:09 +00:00
Wayne Salamon	a750d0b2a2	Add auditing of arguments to the close() and fstat() system calls. Much more argument auditing yet to come, for remaining system calls in this file. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-02-05 23:57:32 +00:00
Robert Watson	00c28d9678	On process exit, audit the return value of the process, and commit the record immediately, as this system call never returns. Obtained from: TrustedBSD Project	2006-02-05 21:08:25 +00:00
Robert Watson	6e8525ce84	When GC'ing a thread, assert that it has no active audit record. This should not happen, but with this assert, brueffer and I would not have spent 45 minutes trying to figure out why he wasn't seeing audit records with the audit version in CVS. Obtained from: TrustedBSD Project	2006-02-05 21:06:09 +00:00
Robert Watson	95fea57c65	Add AUDITVNODE[12] flags to namei(), which cause namei() to audit path and vnode attribute information for looked up vnodes during the lookup operation. This will allow consumers of namei() to specify that this information be added to the in-process audit record. Submitted by: wsalamon Obtained from: TrustedBSD Project	2006-02-05 15:42:01 +00:00
David Xu	25c926f1b0	Regenerate.	2006-02-05 02:23:41 +00:00
David Xu	9e7d72246f	Implement thr_set_name to set a name for thread. Reviewed by: julian	2006-02-05 02:18:46 +00:00
David Xu	7f96995ebd	Create childproc_jobstate function to report job control state, this also fixes a bug in childproc_continued which ignored PS_NOCLDSTOP.	2006-02-04 14:10:57 +00:00

9300 Commits