freebsd-skq

Author	SHA1	Message	Date
jhb	311f35fbe0	A few whitespace fixes.	2008-01-02 17:09:15 +00:00
jeff	d473542286	- Place the fhold() in unp_internalize_fp to be more consistent with refs. - Clear all of the gc flags before doing a run. Stale flags were causing us to skip some descriptors. - If a unp socket has been marked REF in a gc pass it can't be dead. Found by: rwatson's test tool.	2008-01-01 01:46:42 +00:00
rodrigc	9cea85864d	In vfs_scanopt(), make sure that the mount option value is not NULL before calling vsscanf(). PR: 118531 Submitted by: Jaakko Heinonen <jh saunalahti fi> MFC after: 3 days	2007-12-31 23:44:53 +00:00
jhb	f9f7dedb21	Actually declare the kern.features sysctl node. Pointy hat to: jhb	2007-12-31 22:03:57 +00:00
jeff	c8db393cdc	- Pause a while after disabling lock profiling and before resetting it to be sure that all participating CPUs have stopped updating it. - Restore the behavior of printing the name of the lock type in the output.	2007-12-31 03:45:51 +00:00
jeff	319a3672d9	- Check the correct variable against NULL in two places. - If the unp_file is NULL that means it has never been internalized and it must be reachable.	2007-12-31 03:44:54 +00:00
imp	95c2febd5a	Rather than not redirting the bp when we get ENXIO, only redirty it when the error is EIO. This catches a much larger class of errors that are unlikely to succeed if retried. Submitted by: bde	2007-12-30 05:53:45 +00:00
jeff	ce18638805	Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho	2007-12-30 01:42:15 +00:00
alc	4565fa1697	Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date. Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)	2007-12-29 19:53:04 +00:00
rwatson	166f16a0c6	In "show lockedvnods" DDB command, use db_printf() rather than printf() so that the results end up in the DDB output stream rather than the console output stream. This should likely also be done for the vprint() function it calls. MFC after: 3 months	2007-12-28 00:47:31 +00:00
attilio	c720b77ca9	Trimm out now unused option LK_EXCLUPGRADE from the lockmgr namespace. This option just adds complexity and the new implementation no longer will support it, so axing it now that it is unused is probabilly the better idea. FreeBSD version is bumped in order to reflect the KPI breakage introduced by this patch. In the ports tree, kris found that only old OSKit code uses it, but as it is thought to work only on 2.x kernels serie, version bumping will solve any problem.	2007-12-28 00:38:13 +00:00
attilio	bd42361fd1	In order to avoid a huge class of deadlocks (in particular in interactions with the interlock), owner of the lock should be only curthread or at least, for its limited usage, NULL which identifies LK_KERNPROC. The thread "extra argument" for the lockmgr interface is going to be removed in the near future, but for the moment, just let kernel run for some days with this check on in order to find potential deadlocking places around the kernel and fix them.	2007-12-27 22:56:57 +00:00
rwatson	010adfaab6	Return ESRCH when a kernel stack is queried on a process in execve() -- p_candebug() will return EAGAIN which, if the other process never leaves execve(), will result in the sysctl spinning and never returning to userspace. Processes should always eventually leave execve(), but spinning in kernel while we wait is bad for countless reasons, and particularly harmful if execve() itself is deadlocked. Possibly we should return another error, or return a marker indicating the thread is in execve() so it can be reported that way in userspace. Reported by: kris	2007-12-27 22:44:01 +00:00
attilio	d9b244638e	As LK_EXCLUPGRADE is used in conjuction with LK_NOWAIT, LK_UPGRADE becames equivalent with this and so operate the switch. That call is the only one remaining LK_EXCLUPGRADE consumer and removing it will prepare the ground for LK_EXCLUPGRADE axing and further lockmgr improvements. Discussed with: jeff, ups	2007-12-27 20:52:05 +00:00
imp	430e46303c	A partial solution to some of the 'pull the umass device with a mounted FS' problems. These are more along the lines of 'avoiding an avoidable panic' than a complete solution to removable devices. We now close the barn door after the horse has gotten lose and has been hit by a truck, as it were. The barn no longer catches fire in this case, but the horse is still dead :-). The vfs_bio.c fix causes us not to put a failed write back into the dirty pool if the error returned was ENXIO. In that case, the buffer is treated like any other clean buffer that's being retured. ENXIO means the device isn't there anymore and will never be there again in the future, so retrying is futile. The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file system. If the device is gone, retrying later won't help and we'll never be able to unmount the device. These two are part of a larger patch set submitted by the author. The other patches will be forth coming. I added comments to these two patches. Submitted by: Henrik Gulbrandsen Reviewed by: phk@ PR: usb/46176 (partial)	2007-12-27 16:38:28 +00:00
rwatson	956e2983ba	Add textdump(4) facility, which provides an alternative form of kernel dump using mechanically generated/extracted debugging output rather than a simple memory dump. Current sources of debugging output are: - DDB output capture buffer, if there is captured output to save - Kernel message buffer - Kernel configuration, if included in kernel - Kernel version string - Panic message Textdumps are stored in swap/dump partitions as with regular dumps, but are laid out as ustar files in order to allow multiple parts to be stored as a stream of sequentially written blocks. Blocks are written out in reverse order, as the size of a textdump isn't known a priori. As with regular dumps, they will be extracted using savecore(8). One new DDB(4) command is added, "textdump", which accepts "set", "unset", and "status" arguments. By default, normal kernel dumps are generated unless "textdump set" is run in order to schedule a textdump. It can be canceled using "textdump unset" to restore generation of a normal kernel dump. Several sysctls exist to configure aspects of textdumps; debug.ddb.textdump.pending can be set to check whether a textdump is pending, or set/unset in order to control whether the next kernel dump will be a textdump from userspace. While textdumps don't have to be generated as a result of a DDB script run automatically as part of a kernel panic, this is a particular useful way to use them, as instead of generating a complete memory dump, a simple transcript of an automated DDB session can be captured using the DDB output capture and textdump facilities. This can be used to generate quite brief kernel bug reports rich in debugging information but not dependent on kernel symbol tables or precisely synchronized source code. Most textdumps I generate are less than 100k including the full message buffer. Using textdumps with an interactive debugging session is also useful, with capture being enabled/disabled in order to record some but not all of the DDB session. MFC after: 3 months	2007-12-26 11:32:33 +00:00
wkoszek	2179e058f4	Rewrite kern.console handling in sbuf(9). My intention is to leave kern.console format as is. Thus, no difference in output format should appear after this commit. Reviewed by: cognet@ (mentor) Approved by: cognet@ (mentor)	2007-12-25 21:17:34 +00:00
rwatson	bdee30611d	Add a new 'why' argument to kdb_enter(), and a set of constants to use for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run. Assign approximate why values to all current consumers of the kdb_enter() interface.	2007-12-25 17:52:02 +00:00
julian	265714a11e	give thread0 the tid 100000 and bumpt the others to start at 100001 MFC after: 1 week	2007-12-22 04:56:48 +00:00
wkoszek	7d4fbe3a46	Make SCHED_ULE buildable with gcc3. Reviewed by: cognet (mentor), jeffr Approved by: cognet (mentor), jeffr	2007-12-21 23:30:18 +00:00
imp	46b79678a8	When devclass_get_maxunit is passed a NULL, return -1 to indicate that there's nothing allocated at all yet.	2007-12-19 22:05:07 +00:00
obrien	ee337f2c34	Be more exact with sigaction SA_SIGINFO handling. Reviewed by: marcel	2007-12-18 20:39:13 +00:00
kmacy	41d59439f8	Add SB_NOCOALESCE flag to disable socket buffer update in place	2007-12-17 10:02:01 +00:00
davidxu	496a3a2c52	Check NULL pointer.	2007-12-17 08:09:37 +00:00
davidxu	747bbe486b	Add missing changes for fixing LOR of umtx lock and thread lock, follow the committing of files: kern_resource.c revision 1.181 sched_4bsd.c revision 1.111 sched_ule.c revision 1.218	2007-12-17 05:55:07 +00:00
jeff	4ec9caf00c	Refactor select to reduce contention and hide internal implementation details from consumers. - Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescaned. - Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve api compatibility with existing code which does nothing but bzero() to setup selinfo structures. - Use a per-thread wait channel rather than a global wait channel. - Hide select implementation details in a seltd structure which is opaque to the rest of the kernel. - Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details. Tested by: kris Reviewed on: arch	2007-12-16 06:21:20 +00:00
rrs	edce837b8c	- fix tab to space issue, hmm maybe I should use vi.	2007-12-15 23:14:53 +00:00
jeff	12adc443d6	- Re-implement lock profiling in such a way that it no longer breaks the ABI when enabled. There is no longer an embedded lock_profile_object in each lock. Instead a list of lock_profile_objects is kept per-thread for each lock it may own. The cnt_hold statistic is now always 0 to facilitate this. - Support shared locking by tracking individual lock instances and statistics in the per-thread per-instance lock_profile_object. - Make the lock profiling hash table a per-cpu singly linked list with a per-cpu static lock_prof allocator. This removes the need for an array of spinlocks and reduces cache contention between cores. - Use a seperate hash for spinlocks and other locks so that only a critical_enter() is required and not a spinlock_enter() to modify the per-cpu tables. - Count time spent spinning in the lock statistics. - Remove the LOCK_PROFILE_SHARED option as it is always supported now. - Specifically drop and release the scheduler locks in both schedulers since we track owners now. In collaboration with: Kip Macy Sponsored by: Nokia	2007-12-15 23:13:31 +00:00
obrien	56e498fb3b	style.Makefile(5)	2007-12-14 21:30:51 +00:00
davidxu	3695a78889	Fix LOR of thread lock and umtx's priority propagation mutex due to the reworking of scheduler lock. MFC: after 3 days	2007-12-11 08:25:36 +00:00
rwatson	cce7cfdaf5	Check for P_WEXIT before PHOLD() on a process in kstack and vm query sysctls, as PHOLD() asserts !P_WEXIT. Reported by: Michael Plass <mfp49_freebsd at plass-family dot net>	2007-12-09 17:22:27 +00:00
jkoshy	72c27d71d8	Kernel and hwpmc(4) support for callchain capture. Sponsored by: FreeBSD Foundation and Google Inc.	2007-12-07 08:20:17 +00:00
jhb	0373a29045	Move several data structure definitions out of freebsd32_misc.c and into freebsd32.h instead. MFC after: 1 week	2007-12-06 23:11:27 +00:00
rrs	28889e11f4	- Puts default limits on 4k/9k and 16k zones for mbufs all based on 1/2 of each of the successive limits tied to the limit for 2k clusters. - Adds real functionality in so that doing a sysctl to change these actually changes them :-) MFC after: 1 week	2007-12-05 15:29:44 +00:00
kib	53229c8ee9	Use curthread instead of the FIRST_THREAD_IN_PROC for vnlru and syncer, when applicable. Aquire Giant slightly later for vnlru. In the syncer, aquire the Giant only when a vnode belongs to the non-MPsafe fs. In both speedup_syncer() and syncer_shutdown(), remove the syncer thread from the lbolt sleep queue after the syncer state is modified, not before. Herded by: attilio Tested by: Peter Holm Reviewed by: ups MFC after: 1 week	2007-12-05 09:34:04 +00:00
rodrigc	ca18adc0b5	In nmount(), internally convert the mount option: "rdonly" to "ro". This makes updates mounts such as: "mount -u -o rdonly" work more like, "mount -u -o ro". References to "-o rdonly" were changed to "-o ro" in revision 1.60 of the mount(8) man page, but some people still like to use "-o rdonly" since it was documented in earlier versions of FreeBSD. Requested by: rwatson MFC after: 1 week	2007-12-05 03:26:14 +00:00
thompsa	36f039142a	Apply a workaround for the unkillable jail problem where some devices created within the jail are never freed. si_cred is only used by the MAC framework so make the cred reference conditional on it being compiled in, this is not a fix and will need to be reviewed for any new consumers of si_cred. This will quell some user complaint when using jails with a default kernel. Reviewed by: rwatson MFC after: 3 days	2007-12-05 01:22:03 +00:00
kib	feb2aba5b6	Implement fetching of the __FreeBSD_version from the ELF ABI-tag note. The value is read into the p_osrel member of the struct proc. p_osrel is set to 0 for the binaries without the note. MFC after: 3 days	2007-12-04 12:28:07 +00:00
kib	dbef1afd93	Check for the program headers alignment of the ELF images before dereferencing. Unaligned access could cause panic on strict alignment architectures. Reviewed by: marcel, marius (also tested on sparc64, thanks !) MFC after: 3 days	2007-12-04 12:21:27 +00:00
alc	200640eacf	Introduce an UMA backend page allocator for the jumbo frame zones that allocates physically contiguous memory. MFC after: 3 months Requested and reviewed by: Kip Macy Tested by: Andrew Gallatin and Pyun YongHyeon	2007-12-04 07:06:08 +00:00
rwatson	0c4e2d79d0	When a symbol name can't be resolved, return "??" as the name, rather than "Unknown func", in order to avoid putting spaces in what ideally is a string separated by white space.	2007-12-03 14:44:35 +00:00
rwatson	e891688481	Add another new sysctl in support of the forthcoming procstat(1) to support its -k argument: kern.proc.kstack - dump the kernel stack of a process, if debugging is permitted. This sysctl is present if either "options DDB" or "options STACK" is compiled into the kernel. Having support for tracing the kernel stacks of processes from user space makes it much easier to debug (or understand) specific wmesg's while avoiding the need to enter DDB in order to determine the path by which a process came to be blocked on a particular wait channel or lock.	2007-12-02 21:52:18 +00:00
rwatson	99285f7544	Break out stack(9) from ddb(4): - Introduce per-architecture stack_machdep.c to hold stack_save(9). - Introduce per-architecture machine/stack.h to capture any common definitions required between db_trace.c and stack_machdep.c. - Add new kernel option "options STACK"; we will build in stack(9) if it is defined, or also if "options DDB" is defined to provide compatibility with existing users of stack(9). Add new stack_save_td(9) function, which allows the capture of a stacktrace of another thread rather than the current thread, which the existing stack_save(9) was limited to. It requires that the thread be neither swapped out nor running, which is the responsibility of the consumer to enforce. Update stack(9) man page. Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)	2007-12-02 20:40:35 +00:00
rwatson	c25458da37	Add two new sysctls in support of the forthcoming procstat(1) to support its -f and -v arguments: kern.proc.filedesc - dump file descriptor information for a process, if debugging is permitted, including socket addresses, open flags, file offsets, file paths, etc. kern.proc.vmmap - dump virtual memory mapping information for a process, if debugging is permitted, including layout and information on underlying objects, such as the type of object and path. These provide a superset of the information historically available through the now-deprecated procfs(4), and are intended to be exported in an ABI-robust form.	2007-12-02 10:10:27 +00:00
alc	ed1f1ac93c	Eliminate vfs_page_set_valid()'s unused argument.	2007-12-02 01:28:35 +00:00
rwatson	090235e567	Modify stack(9) stack_print() and stack_sbuf_print() routines to use new linker interfaces for looking up function names and offsets from instruction pointers. Create two variants of each call: one that is "DDB-safe" and avoids locking in the linker, and one that is safe for use in live kernels, by virtue of observing locking, and in particular safe when kernel modules are being loaded and unloaded simultaneous to their use. This will allow them to be used outside of debugging contexts. Modify two of three current stack(9) consumers to use the DDB-safe interfaces, as they run in low-level debugging contexts, such as inside lockmgr(9) and the kernel memory allocator. Update man page.	2007-12-01 22:04:16 +00:00
rwatson	0ab794a171	The kernel linker includes a number of utility functions to look up symbol information in support of DDB(4); these functions bypass normal linker locking as they may run in contexts where locking is unsafe (such as the kernel debugger). Add a new interface linker_ddb_search_symbol_name(), which looks up a symbol name and offset given an address, and also linker_search_symbol_name() which does the same but does follow the locking conventions of the linker. Unlike existing functions, these functions place the name in a caller-provided buffer, which is stable even after linker locks have been released. These functions will be used in upcoming revisions to stack(9) to support kernel stack trace generation in contexts as part of a live, rather than suspended, kernel.	2007-12-01 19:24:28 +00:00
peter	e287ae6b7a	Deal with the possibility of device_set_unit() being called when attaching the associated devinfo sysctl tree.	2007-11-30 21:30:14 +00:00
peter	8959a77f75	Add sysctl_rename_oid() to support device_set_unit() usage. Otherwise, when unit numbers are changed, the sysctl devinfo tree gets out of sync and duplicate trees are attempted to be attached with the original name.	2007-11-30 21:29:08 +00:00
rwatson	df297799bd	Move use of 'i' in cp_time sysctl under SCTL_MASK32 so that it compiles without warnings on systems that don't define it.	2007-11-29 08:38:22 +00:00
peter	8e9baed553	Move the shared cp_time array (counts %sys, %user, %idle etc) to the per-cpu area. cp_time[] goes away and a new function creates a merged cp_time-like array for things like linprocfs, sysctl etc. The atomic ops for updating cp_time[] in statclock go away, and the scope of the thread lock is reduced. sysctl kern.cp_time returns a backwards compatible cp_time[] array. A new kern.cp_times sysctl returns the individual per-cpu stats. I have pending changes to make top and vmstat optionally show per-cpu stats. I'm very aware that there are something like 5 or 6 other versions "out there" for doing this - but none were handy when I needed them. I did merge my changes with John Baldwin's, and ended up replacing a few chunks of my stuff with his, and stealing some other code. Reviewed by: jhb Partly obtained from: jhb	2007-11-29 06:34:30 +00:00
attilio	2562874cb6	Make ADAPTIVE_GIANT as the default in the kernel and remove the option. Currently, Giant is not too much contented so that it is ok to treact it like any other mutexes. Please don't forget to update your own custom config kernel files. Approved by: cognet, marcel (maintainers of arches where option is not enabled at the moment)	2007-11-28 05:50:45 +00:00
attilio	6fcea2407c	Simplify the adaptive spinning algorithm in rwlock and mutex: currently, before to spin the turnstile spinlock is acquired and the waiters flag is set. This is not strictly necessary, so just spin before to acquire the spinlock and to set the flags. This will simplify a lot other functions too, as now we have the waiters flag set only if there are actually waiters. This should make wakeup/sleeping couplet faster under intensive mutex workload. This also fixes a bug in rw_try_upgrade() in the adaptive case, where turnstile_lookup() will recurse on the ts_lock lock that will never be really released [1]. [1] Reported by: jeff with Nokia help Tested by: pho, kris (earlier, bugged version of rwlock part) Discussed with: jhb [2], jeff MFC after: 1 week [2] John had a similar patch about 6.x and/or 7.x about mutexes probabilly	2007-11-26 22:37:35 +00:00
attilio	7d4ec70696	Fix the spinlock static table adding missing spinlocks. - rm_spinlock has turnstile chain as child - srclock has callout and clk as child, found by witness "emulation". Just move it very high in our ranking	2007-11-24 04:32:32 +00:00
attilio	c61d48e731	transferlockers() is a very dangerous and hack-ish function as waiters should never be moved by one lock to another. As, luckily, nothing in our tree is using it, axe the function. This breaks lockmgr KPI, so interested, third-party modules should update their source code with appropriate replacement. Ok'ed by: ups, rwatson MFC after: 3 days	2007-11-24 04:22:28 +00:00
kris	e9fa0b52d6	Remove remaining Giant acquisition around vn_fullpath1. This was missed in r1.106 and has not been required for some years now. Reviewed by: jeff MFC After: 1 week	2007-11-22 21:26:25 +00:00
attilio	5ed5cf01be	Cache the value of c_lock as it can change, in the struct, while the global callout spinlock is not held, and can lead to PF#. Reported by: dougb, Mark Atkinson <atkin901 at yahoo dot com> Tested by: dougb Diagnosed by: jhb	2007-11-22 12:15:54 +00:00
davidxu	648833c953	Add function UMTX_OP_WAIT_UINT, the function causes thread to wait for an integer to be changed.	2007-11-21 04:21:02 +00:00
rwatson	6651bdd106	Test that p_textvp is non-NULL be dereferencing, as no executable vnode is set for kernel processes. Reported by: Skip Ford <skip at menantico dot com> MFC after: 3 days	2007-11-20 18:03:09 +00:00
attilio	6c4cd05be5	Add the function callout_init_rw() to callout facility in order to use rwlocks in conjuction with callouts. The function does basically what callout_init_mtx() alredy does with the difference of using a rwlock as extra argument. CALLOUT_SHAREDLOCK flag can be used, now, in order to acquire the lock only in read mode when running the callout handler. It has no effects when used in conjuction with mtx. In order to implement this, underlying callout functions have been made completely lock type-unaware, so accordingly with this, sysctl debug.to_avg_mtxcalls is now changed in the generic debug.to_avg_lockcalls. Note: currently the allowed lock classes are mutexes and rwlocks because callout handlers run in softclock swi, so they cannot sleep and they cannot acquire sleepable locks like sx or lockmgr. Requested by: kmacy, pjd, rwatson Reviewed by: jhb	2007-11-20 00:37:45 +00:00
jhb	3c82b6a5c7	Bump up the number of ttys supported by pty(4) to 512 by making use of [pt]ty[lmnoLMNO][0-9a-v]. MFC after: 3 days Reviewed by: rwatson	2007-11-19 20:49:42 +00:00
dumbbell	fff090c011	The kernel uses two ways to write data on a pipe: o buffered write, for chunks smaller than PIPE_MINDIRECT bytes o direct write, for everything else A call to writev(2) may receive struct iov of various size and the kernel may have to switch from one solution to the other. Before doing this, it must wake reader processes and any select/poll/kqueue up. This commit fixes a bug where select/poll/kqueue are not triggered when switching from buffered write to direct write. It adds calls to pipeselwakeup(). I give more details on freebsd-arch@: http://lists.freebsd.org/pipermail/freebsd-arch/2007-September/006790.html This should fix issues with Erlang (lang/erlang) and kqueue. Reported by: Rickard Green (Erlang)	2007-11-19 15:05:20 +00:00
attilio	bfc761fdba	Expand lock class with the "virtual" function lc_assert which will offer an unified way for all the lock primitives to express lock assertions. Currenty, lockmgrs and rmlocks don't have assertions, so just panic in that case. This will be a base for more callout improvements. Ok'ed by: jhb, jeff	2007-11-18 14:43:53 +00:00
rrs	9dbec8e7df	- Add in missing event handler invokes for initial proc and thread.	2007-11-18 13:56:51 +00:00
jb	9bd9c03e92	Add a function to list symbols in a file and their values at the same time rather than having to list the symbols and then go back and look each one up by name.	2007-11-18 00:23:31 +00:00
jhb	b113160d81	Acquire the process mutex and spin locks before calling thread_exit() in kthread_exit() to fix panics when using INVARIANTS.	2007-11-15 21:45:17 +00:00
rrs	303afeb279	- Adds event handlers for process_ctor,process_dtor, process_init, process_fini, thread_ctor, thread_dtor, thread_init, thread_fini. This will allow us to extend dynamically areas in proc/thread for dtrace ;-) Reviewed by: rwatson	2007-11-15 14:20:07 +00:00
glebius	93fa6a684a	Fix build.	2007-11-15 14:16:20 +00:00
rrs	3925f424bc	Adds an event handler for: - process_ctor,dtor, init and fini - thread_ctor,dtor, init and fini This allows the ability to add on additional things during construction/destruction of threads and processes. Reviewed by: rwatson	2007-11-15 13:28:54 +00:00
julian	303014d009	This time REALLY copy the name from the proc to the thread as a default.	2007-11-15 06:35:26 +00:00
julian	bfb9581757	When forking, the new thread deserves a name too. Don't just use the td_startcopy section as it is not the right thing to do in other cases (e.g. if starting a new thread from one that is already named).	2007-11-15 02:13:44 +00:00
attilio	4b290fe097	Remove a bogus KASSERT which will prevent rwlock to be acquired recursively in exclusive mode with debugging kernels. Submitted by: kmacy Approved by: jeff	2007-11-14 21:21:48 +00:00
marcel	1e7c4f0a3f	o Rename cpu_thread_setup() to cpu_thread_alloc() to better communicate that it relates to (is called by) thread_alloc() o Add cpu_thread_free() which is called from thread_free() to counter-act cpu_thread_alloc(). i386: Have cpu_thread_free() call cpu_thread_clean() to preserve behaviour. ia64: Have cpu_thread_free() call mtx_destroy() for the mutex initialized in cpu_thread_alloc(). PR: ia64/118024	2007-11-14 20:21:54 +00:00
julian	7ee6259be7	A bunch more files that should probably print out a thread name instead of a process name.	2007-11-14 06:51:33 +00:00
julian	b2732e0c22	generally we are interested in what thread did something as opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.	2007-11-14 06:21:24 +00:00
julian	b248158d8d	Make sure there is a good default thread name for all threads.	2007-11-14 06:04:57 +00:00
rwatson	9162bf224c	Add rm_wowned(9) function to test whether the current thread owns an exclusive lock on the passed rmlock. Reviewed by: ups	2007-11-10 15:06:30 +00:00
jhb	a97196ea0d	A couple of optimizations to the last commit. Submitted by: Christoph Mallon christoph mallon of gmx de	2007-11-08 21:45:56 +00:00
ups	c6c6cc7efa	Use VM_FAULT_DIRTY to fault in pages for write access in proc_rwmen. Otherwise copy on write may create an anonymous page that is not marked as dirty. Since writing data to these pages in this function also does not dirty these pages they may be later discarded by the pagedaemon.	2007-11-08 19:35:36 +00:00
jhb	3a40a5ed4b	Make it easier to add more ptys to the pty(4) driver: - Use unit2minor() and minor2unit() to generate minor numbers to support unit numbers higher than 255. - Use simple string operations on the 'names' array rather than hard-coded constants and switch statements so that more ptys can be added by simply expanding the 'names' array. MFC after: 1 week	2007-11-08 15:51:52 +00:00
ups	9c75ede409	Initial checkin for rmlock (read mostly lock) a multi reader single writer lock optimized for almost exclusive reader access. (see also rmlock.9) TODO: Convert to per cpu variables linkerset as soon as it is available. Optimize UP (single processor) case.	2007-11-08 14:47:55 +00:00
rwatson	70292a328f	Remove unused variable td from sched_idletd(). MFC after: 3 days Found with: Coverity Prevent(tm) CID: 3561	2007-11-05 12:01:12 +00:00
kib	9ae733819b	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb	2007-11-05 11:36:16 +00:00
julian	0bf0ae3b24	Completely remove the code for single threading the mainline fork code. Put in a little comment explaining why it went away. Re-enable it in the case there an exisiting process is just splitting off its address space and file descriptors. (I donpt think anything uses that code but it needs some sort of locking and this does the job. Reviewed by: Davidxu, alc, others MFC after: 3 days	2007-11-02 19:40:36 +00:00
njl	53b7cf834c	If we're on an SMP kernel and there is more than 1 CPU, reject any attempts to change the freq before the other CPUs are active. The current code always attempts to change all CPUs to match each other, and the requisite sched_bind() call won't work before APs are launched.	2007-10-30 22:18:08 +00:00
julian	03284e6e08	fix typo in code normally not compiled in.	2007-10-29 20:45:31 +00:00
julian	5d7e7a3ea7	Fix typo in code obviously not being compiled on any of my machines. found by: rdivacky@	2007-10-28 23:11:57 +00:00
jhb	3ee3b45ab8	Change the roundrobin implementation in the 4BSD scheduler to trigger a userland preemption directly from hardclock() via sched_clock() when a thread uses up a full quantum instead of using a periodic timeout to cause a userland preemption every so often. This fixes a potential deadlock when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held by a thread pinned or bound to another CPU. The current thread on that CPU will never be preempted while softclock is blocked. Note that ULE already drives its round-robin userland preemption from sched_clock() as well and always enables IPI_PREEMPT. MFC after: 1 week	2007-10-27 22:07:40 +00:00
rodrigc	0b0826633e	In nmount(), if MNT_ROOT is in the mount flags, filter it out instead of returning an error. (1) This makes the behavior consistent with mount(2). (2) This makes update mounts on the root file system work properly. (3) The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c and src/usr.sbin/mountd/mountd.c which were put in to eliminate errors during update mounts on the root file system can be removed. The only place were MNT_ROOTFS can be validly set is inside the kernel, i.e. with vfs_mountroot_try(). Reviewed by: phk MFC after: 3 days	2007-10-27 15:59:18 +00:00
julian	93719028ad	Add support for the pre-exisiting module shutdoen handshake. Fix some comments.	2007-10-27 00:54:16 +00:00
julian	bc607610fb	rename the process to 'idle' and 'intr' as per jhb.	2007-10-27 00:52:26 +00:00
julian	83e9ad5680	Initialise the initial process pointer to NULL so that we know we don't have an idle process yet. I'm guessing that on my system this was always 0 already. found by: Ed Schouten	2007-10-27 00:42:40 +00:00
julian	4eb58f88d1	If kthread_exit() is called on the last kthread in a kproc, then all the work in kproc_exit must be done. We don't actually have a user of this yet but why leave it to chance.	2007-10-26 22:18:20 +00:00
julian	9f54ea2c1e	if one changes a function's arguments, one must also change the callers.	2007-10-26 22:03:19 +00:00
julian	c21b38f409	oops, over optimised and broke non-SMP builds	2007-10-26 20:32:33 +00:00
julian	b6e413a633	kthread_exit needs no stinkin argument.	2007-10-26 17:03:22 +00:00
obrien	2fc050a9c2	style(9)	2007-10-26 16:33:47 +00:00
julian	11e1aa0d18	Introduce a way to make pure kernal threads. kthread_add() takes the same parameters as the old kthread_create() plus a pointer to a process structure, and adds a kernel thread to that process. kproc_kthread_add() takes the parameters for kthread_add, plus a process name and a pointer to a pointer to a process instead of just a pointer, and if the proc * is NULL, it creates the process to the specifications required, before adding the thread to it. All other old kthread_xxx() calls return, but act on (struct thread ) instead of (struct proc ). One reason to change the name is so that any old kernel modules that are lying around and expect kthread_create() to make a process will not just accidentally link. fix top to show kernel threads by their thread name in -SH mode add a tdnam formatting option to ps to show thread names. make all idle threads actual kthreads and put them into their own idled process. make all interrupt threads kthreads and put them in an interd process (mainly for aesthetic and accounting reasons) rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper' man page fixes to follow.	2007-10-26 08:00:41 +00:00
csjp	a16bb8381d	Implement AUE_CORE, which adds process core dump support into the kernel. This change introduces audit_proc_coredump() which is called by coredump(9) to create an audit record for the coredump event. When a process dumps a core, it could be security relevant. It could be an indicator that a stack within the process has been overflowed with an incorrectly constructed malicious payload or a number of other events. The record that is generated looks like this: header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec argument,0,0xb,signal path,/usr/home/csjp/test.core subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2 return,success,1 trailer,111 - We allocate a completely new record to make sure we arent clobbering the audit data associated with the syscall that produced the core (assuming the core is being generated in response to SIGABRT and not an invalid memory access). - Shuffle around expand_name() so we can use the coredump name at the very beginning of the coredump call. Make sure we free the storage referenced by "name" if we need to bail out early. - Audit both successful and failed coredump creation efforts Obtained from: TrustedBSD Project Reviewed by: rwatson MFC after: 1 month	2007-10-26 01:23:07 +00:00
rwatson	60570a92bf	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
csjp	76f4445c43	Move where we audit the PID argument such that we unconditionally audit it at the beginning of the syscall. This fixes a problem where the user supplies an invalid process ID which is > 0 which results in the PID argument not being audited. Obtained from: TrustedBSD Project MFC after: 1 week	2007-10-24 00:14:19 +00:00
julian	8b459baa42	Take out the single-threading code in fork. After discussions with jeff, alc, (various Ironport people), david Xu, and mostly Alfred (who found the problem) it has been demonstrated that this is not needed for our implementations of threads and represents a real (as in we've seen it happen a lot) deadlock danger. Several points: Since forking multiple threads is not allowed, and posix states that any mutexes owned by othre threads wilol be owned in the child by phantom threads, and therads shouldn't ba accessing shared structures without protection, It can be proved that if this leads to the child process accessing inconsistent data, it's a programming error. The mode of thread_single() being used in fork() is the wrong one. It is using SINGLE_NO_EXIT when it should be using SINGLE_BOUNDARY. Even if this we used, System processes have no need to do it as they have no userland to get inconsistent. This commmit first fixes the above bugs to get tehm correct in CVS. then removes them with #ifdef. This is so that history contains the corrected version should it be needed in the future. This code may be needed if we implement the forkall() syscall from Solaris. It may be needed for other non-posix thread libraries at some time in the future, so let the code sit for a short while while I do some work on it anyhow. This removes a reproducible lockup in NFS. It may be argued that maybe doing a fork while holding a vnode lock may not be the best idea in th efirst place but it shouldn't cause a deadlock. The removal has been running under soak test for several days now. This removal should be seriously considered for 7.0 and RELENG_6. Note. There is code in the core-dumping code that may have a similar problem with coredumping threaded processes MFC After: 4 days	2007-10-23 17:54:15 +00:00
grehan	7c82e0520d	Cut over to ULE on PowerPC kern/sched_ule.c - Add __powerpc__ to the list of supported architectures powerpc/conf/GENERIC - Swap SCHED_4BSD with SCHED_ULE powerpc/powerpc/genassym.c - Export TD_LOCK field of thread struct powerpc/powerpc/swtch.S - Handle new 3rd parameter to cpu_switch() by updating the old thread's lock. Note: uniprocessor-only, will require modification for MP support. powerpc/powerpc/vm_machdep.c - Set 3rd param of cpu_switch to mutex of old thread's lock, making the call a no-op. Reviewed by: marcel, jeffr (slightly older version)	2007-10-23 00:52:25 +00:00
jb	9dec415fef	Add the full module path name to the kld_file_stat structure for kldstat(2). This allows libdtrace to determine the exact file from which a kernel module was loaded without having to guess. The kldstat(2) API is versioned with the size of the kld_file_stat structure, so this change creates version 2. Add the pathname to the verbose output of kldstat(8) too. MFC: 3 days	2007-10-22 04:12:57 +00:00
rwatson	2bd7685cc6	Add PRIV_VFS_STAT privilege, which will allow overriding policy limits on the right to stat() a file, such as in mac_bsdextended. Obtained from: TrustedBSD Project MFC after: 3 months	2007-10-21 22:50:11 +00:00
julian	51d643caa6	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
emaste	87bf077fa0	Put comments about syscalls by the correct ones, and use the correct syscall number in the comment.	2007-10-19 19:17:53 +00:00
sam	026e744752	ULE works fine on arm; allow it to be used Reviewed by: jeff, cognet, imp MFC after: 1 week	2007-10-16 19:25:26 +00:00
alfred	3dcb842f61	Export maxswzone, maxbcache, maxtsiz, dfldsiz, maxdsiz, dflssiz, maxssiz, and sgrowsiz via sysctl. MFC after: 1 week	2007-10-16 10:40:53 +00:00
netchild	21c6e78ea7	Backout sensors framework. Requested by: phk Discussed on: cvs-all	2007-10-15 20:00:24 +00:00
netchild	4af9918bc0	Import OpenBSD's sysctl hardware sensors framework. This commit includes the following core components: * sample configuration file for sensorsd * rc(8) script and glue code for sensorsd(8) * sysctl(3) doc fixes for CTL_HW tree * sysctl(3) documentation for hardware sensors * sysctl(8) documentation for hardware sensors * support for the sensor structure for sysctl(8) * rc.conf(5) documentation for starting sensorsd(8) * sensor_attach(9) et al documentation * /sys/kern/kern_sensors.c o sensor_attach(9) API for drivers to register ksensors o sensor_task_register(9) API for the update task o sysctl(3) glue code o hw.sensors shadow tree for sysctl(8) internal magic * <sys/sensors.h> * HW_SENSORS definition for <sys/sysctl.h> * sensors display for systat(1), including documentation * sensorsd(8) and all applicable documentation The userland part of the framework is entirely source-code compatible with OpenBSD 4.1, 4.2 and -current as of today. All sensor readings can be viewed with `sysctl hw.sensors`, monitored in semi-realtime with `systat -sensors` and also logged with `sensorsd`. Submitted by: Constantine A. Murenin <cnst@FreeBSD.org> Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors) Mentored by: syrinx Tested by: many OKed by: kensmith Obtained from: OpenBSD (parts)	2007-10-14 10:45:31 +00:00
des	73606ae492	I don't know what I was smoking when I wrote these three years ago; the return value is an error code, hence always an int. While I'm here, add getenv_uint() for completeness.	2007-10-13 11:30:19 +00:00
mohans	11057bb00f	Set the NFS server sockbuf high watermarks to the system defaults (up form 32KB). The low highwatermark setting caused UDP fullsock request drops, throttling thruput greatly. Reported by: Kris Kennaway Approved by: re@ (Ken Smith)	2007-10-12 03:56:27 +00:00
jeff	8f126294e7	- Fix from pr kern/115469; Don't redeliver a signal once it has been handled by the target process. Contributed by: Tijl Coosemans <tijl@ulyssis.org> Approved by: re	2007-10-09 00:03:39 +00:00
jeff	2bbd1c2e70	- Bail out of tdq_idled if !smp_started or idle stealing is disabled. This fixes a bug on UP machines with SMP kernels where the idle thread constantly switches after trying to steal work from the local cpu. - Make the idle stealing code more robust against self selection. - Prefer to steal from the cpu with the highest load that has at least one transferable thread. Before we selected the cpu with the highest transferable count which excludes bound threads. Collaborated with: csjp Approved by: re	2007-10-08 23:50:39 +00:00
jeff	57102cf5ad	- Restore historical sched_yield() behavior by changing sched_relinquish() to simply switch rather than lowering priority and switching. This allows threads of equal priority to run but not lesser priority. Discussed with: davidxu Reported by: NIIMI Satoshi <sa2c@sa2c.net> Approved by: re	2007-10-08 23:45:24 +00:00
jeff	065472edb7	- Restore historical yield() behavior by manually lowering priority and switching. Approved by: re	2007-10-08 23:40:40 +00:00
jeff	5e4673913f	- Fix ULE in kernels without PREEMPTION compiled in by always enabling the critical_exit() owepreempt check. ULE will always use owepreempt to preempt the idle thread. This change does not effect 4BSD since it will never set owepreempt without PREEMPTION enabled. - Remove some unused code from choosethread(). Discussed with: jhb Approved by: re	2007-10-08 23:37:28 +00:00
kmacy	caeef6111c	This patch adds an M_NOFREE flag which allows one to mark an mbuf as not being independently freeable. This allows one to embed an mbuf in the cluster itself. This confers the benefits of the packet zone on all cluster sizes. Embedded mbufs currently suffer from the same limitation that packet zone mbufs do in that one cannot disconnect them and pass them around independently of the cluster. It would likely be possible to eliminate this limitation in the future by adding a second reference for the mbuf itself. Approved by: re(gnn)	2007-10-06 21:42:39 +00:00
kmacy	c00108fd67	Allow drivers to free an mbuf without having the mbuf be touched if the driver has already freed any attached tags Approved by: re(gnn)	2007-10-06 21:13:55 +00:00
pjd	9281633138	Fix sx_try_slock(), so it only fails when there is an exclusive owner. Before that fix, it was possible for the function to fail if number of sharers changes between 'x = sx->sx_lock' step and atomic_cmpset_acq_ptr() call. This fixes ZFS problem when ZFS returns strange EIO errors under load. In ZFS there is a code that depends on the fact that sx_try_slock() can only fail if there is an exclusive owner. Discussed with: attilio Reviewed by: jhb Approved by: re (kensmith)	2007-10-02 14:48:48 +00:00
jeff	4a1456cf5d	- Reassign the thread queue lock to newtd prior to switching. Assigning after the switch leads to a race where the outgoing thread still owns the local queue lock while another cpu may switch it in. This race is only possible on machines where cpu_switch can take significantly longer on different cpus which in practice means HTT machines with unfair thread scheduling algorithms. Found by: kris (of course) Approved by: re	2007-10-02 01:30:18 +00:00
jeff	de71038e60	- Move the rebalancer back into hardclock to prevent potential softclock starvation caused by unbalanced interrupt loads. - Change the rebalancer to work on stathz ticks but retain randomization. - Simplify locking in tdq_idled() to use the tdq_lock_pair() rather than complex sequences of locks to avoid deadlock. Reported by: kris Approved by: re	2007-10-02 00:36:06 +00:00
jeff	644f587c97	- Honor the PREEMPTION and FULL_PREEMPTION flags by setting the default value for kern.sched.preempt_thresh appropriately. It can still by adjusted at runtime. ULE will still use IPI_PREEMPT in certain migration situations. - Assert that we're not trying to compile ULE on an unsupported architecture. To date, I believe only i386 and amd64 have implemented the third cpu switch argument required. Approved by: re	2007-09-27 16:39:27 +00:00
ru	78adc698af	Fix the description of the formula used to autosize the number of buffers in the buffer cache. Approved by: re (kensmith)	2007-09-26 11:22:23 +00:00
alc	d1bce06c64	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
jeff	ea91086eb3	- Bound the interactivity score so that it cannot become negative. Approved by: re	2007-09-24 00:28:54 +00:00
jeff	280a86b2ee	- Improve grammar. s/it's/its/. - Improve load long-term load balancer by always IPIing exactly once. Previously the delay after rebalancing could cause problems with uneven workloads. - Allow nice to have a linear effect on the interactivity score. This allows negatively niced programs to stay interactive longer. It may be useful with very expensive Xorg servers under high loads. In general it should not be necessary to alter the nice level to improve interactive response. We may also want to consider never allowing positively niced processes to become interactive at all. - Initialize ccpu to 0 rather than 0.0. The decimal point was leftover from when the code was copied from 4bsd. ccpu is 0 in ULE because ULE only exports weighted cpu values. Reported by: Steve Kargl (Load balancing problem) Approved by: re	2007-09-22 02:20:14 +00:00
pjd	763f7449dc	Fix some locking cases where we ask for exclusively locked vnode, but we get shared locked vnode in instead when vfs.lookup_shared is set to 1. Discussed with: kib, kris Tested by: kris Approved by: re (kensmith)	2007-09-21 10:16:56 +00:00
jeff	bc0eadb21d	- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re	2007-09-21 04:10:23 +00:00
jeff	13d3160ef5	- Call sched_sleep() before we suspend threads. sched_wakeup() is already called via setrunnable(). This allows time slept while suspended to be accounted for swap. Approved by: re	2007-09-21 04:04:22 +00:00
attilio	e25b203061	Fix some entries in the locks static table of witness. In particular: - smp_tlb_mtx is no longer used, so it is axed. - smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in the table, however, has been the source of a false positive LOR reporting with the dt_lock. However, smp rendezvous lock would have had sched_lock there for older lock, so it wasn't still a leaf lock. - allpmaps is only used in ia32 architecture, so it is inserted in the appropriate stub. Addictionally: - kse_zombie_lock is no longer present, so its definition is axed out. - zombie_lock doesn't need to have an exported symbol, so just let's it be declared as static. Tested by: kris Approved by: jeff (mentor) Approved by: re	2007-09-20 20:38:43 +00:00
jeff	3fc0f8b973	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
rwatson	950792fe22	Remove the definition and implementation of 'CALLOUT_NETGIANT', a now- (and possibly always-) unused define. Reported by: kmacy Approved by: re (kensmith)	2007-09-15 12:33:24 +00:00
attilio	6160c569c2	Currently the LO_NOPROFILE flag (which is masked on upper level code by per-primitive macros like MTX_NOPROFILE, SX_NOPROFILE or RW_NOPROFILE) is not really honoured. In particular lock_profile_obtain_lock_failure() and lock_profile_obtain_lock_success() are naked respect this flag. The bug leads to locks marked with no-profiling to be profiled as well. In the case of the clock_lock, used by the timer i8254 this leads to unpredictable behaviour both on amd64 and ia32 (double faults panic, sudden reboots, etc.). The amd64 clock_lock is also not marked as not profilable as it should be. Fix these bugs adding proper checks in the lock profiling code and at clock_lock initialization time. i8254 bug pointed out by: kris Tested by: matteo, Giuseppe Cocomazzi <sbudella at libero dot it> Approved by: jeff (mentor) Approved by: re	2007-09-14 01:12:39 +00:00
attilio	4fcdd410b3	subr_sleepqueue.c presents a thread lock missing which leads to dangerous races for some struct thread members. More specifically, this bug seems responsible for some memory dumping problems people were experiencing. Fix this adding correct thread locking. Tested by: rwatson Submitted by: tegge Approved by: jeff Approved by: re	2007-09-13 09:12:36 +00:00
kib	e651705b7e	When restoring the mount after umount failed, the MNTK_UNMOUNT flag prevents insmntque() from placing reallocated syncer vnode on mount list, that causes panic in vfs_allocate_syncvnode(). Introduce MNTK_NOINSMNTQ flag, that marks the period when instmntque is not allowed to success, instead of MNTK_UNMOUNT. The MNTK_NOINSMNTQ is set and cleared simultaneously with MNTK_UNMOUNT, except on umount error path, where it is cleaned just before the syncer vnode is going to be allocated. Reported by: Peter Jeremy <peterjeremy optushome com au> Suggested by: tegge Approved by: re (rwatson)	2007-09-12 16:31:32 +00:00
attilio	ae7f786cc5	This is a follow-up, cleaning-up commit about recent changes involving topology foo functions. Working at the patch for topology problems in ia32/amd64 evicted some problems regarding functions ordering in the SI_SUB_CPU family of SYSINIT'ed subsystems. In order to avoid problems with new modified to involved functions, a correct ordering is not semantically specified for SI_SUB_CPU functions (for a larger view of the issue please visit: http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html ) Discussed with: peter Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org> Approved by: jeff Approved by: re	2007-09-11 22:54:09 +00:00
rwatson	198c38400a	Rename mac_check_vnode_delete() MAC Framework and MAC Policy entry point to mac_check_vnode_unlink(), reflecting UNIX naming conventions. This is the first of several commits to synchronize the MAC Framework in FreeBSD 7.0 with the MAC Framework as it will appear in Mac OS X Leopard. Reveiwed by: csjp, Samy Bahra <sbahra at gwu dot edu> Submitted by: Jacques Vidrine <nectar at apple dot com> Obtained from: Apple Computer, Inc. Sponsored by: SPARTA, SPAWAR Approved by: re (bmah)	2007-09-10 00:00:18 +00:00
rwatson	64e850e184	In userland_sysctl(), call useracc() with the actual newlen value to be used, rather than the one passed via 'req', which may not reflect a rewrite. This call to useracc() is redundant to validation performed by later copyin()/copyout() calls, so there isn't a security issue here, but this could technically lead to excessive validation of addresses if the length in newlen is shorter than req.newlen. Approved by: re (kensmith) Reviewed by: jhb Submitted by: Constantine A. Murenin <cnst+freebsd@bugmail.mojo.ru> Sponsored by: Google Summer of Code 2007	2007-09-02 09:59:33 +00:00
jhb	c54b68ab60	Close a race that snuck in with the recent changes to fix a LOR between the callout_lock spin lock and the sleepqueue spin locks. In the fix, callout_drain() has to drop the callout_lock so it can acquire the sleepqueue lock. The state of the callout can change while the callout_lock is held however (for example, it can be rescheduled via callout_reset()). The previous code assumed that the only state change that could happen is that the callout could finish executing. This change alters callout_drain() to effectively restart and recheck everything after it acquires the sleepqueue lock thus handling all the possible states that the callout could be in after any changes while callout_lock was dropped. Approved by: re (kensmith) Tested by: kris	2007-08-31 19:01:30 +00:00
dds	0134883846	Add missing newline in the log message of the previous commit. Approved by: re (kensmith) - implied	2007-08-31 13:56:26 +00:00
dds	674de1aff0	Don't panic. When encountering a negative value call log(LOG_NOTICE, ...) and record LONG_MAX, instead of calling KASSERT(...). Reported by: rwatson Approved by: re (kensmith)	2007-08-31 13:36:58 +00:00
jhb	7ec8dd9926	Partially revert the previous change. I failed to notice that where ktruserret() is invoked, an unlocked check of the per-process queue is performed inline, thus, we don't lock the ktrace_sx on every userret(). Pointy hat to: jhb Approved by: re (kensmith) Pointy hat recovered from: rwatson	2007-08-29 21:17:11 +00:00
jhb	736eaf5ce3	Rework the routines to convert a 5.x+ statfs structure (with fixed-size 64-bit counters) to a 4.x statfs structure (with long-sized counters). - For block counters, we scale up the block size sufficiently large so that the resulting block counts fit into a the long-sized (long for the ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs VOP did this already. This can lie about the block size to 4.x binaries, but it presents a more accurate picture of the ratios of free and available space. - For non-block counters, fix the freebsd32 stats converter to cap the values at INT32_MAX rather than losing the upper 32-bits to match the behavior of the 4.x statfs conversion routine in vfs_syscalls.c Approved by: re (kensmith)	2007-08-28 20:28:12 +00:00
rrs	e335457f91	- During shutdown pending, when the last sack came in and the last message on the send stream was "null" but still there, a state we allow, we could get hung and not clean it up and wait for the shutdown guard timer to clear the association without a graceful close. Fix this so that that we properly clean up. - Added support for Multiple ASCONF per new RFC. We only (so far) accept input of these and cannot yet generate a multi-asconf. - Sysctl'd support for experimental Fast Handover feature. Always disabled unless sysctl or socket option changes to enable. - Error case in add-ip where the peer supports AUTH and ADD-IP but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to ABORT in this case. - According to the Kyoto summit of socket api developers (Solaris, Linux, BSD). We need to have: o non-eeor mode messages be atomic - Fixed o Allow implicit setup of an assoc in 1-2-1 model if using the sctp_**() send calls - Fixed o Get rid of HAVE_XXX declarations - Done o add a sctp_pr_policy in hole in sndrcvinfo structure - Done o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch! - Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize when we close sending out the data and disabling Nagle. - Change key concatenation order to match the auth RFC - When sending OOTB shutdown_complete always do csum. - Don't send PKT-DROP to a PKT-DROP - For abort chunks just always checksums same for shutdown-complete. - inpcb_free front state had a bug where in queue data could wedge an assoc. We need to just abandon ones in front states (free_assoc). - If a peer sends us a 64k abort, we would try to assemble a response packet which may be larger than 64k. This then would be dropped by IP. Instead make a "minimum" size for us 64k-2k (we want at least 2k for our initack). If we receive such an init discard it early without all the processing. - When we peel off we must increment the tcb ref count to keep it from being freed from underneath us. - handling fwd-tsn had bugs that caused memory overwrites when given faulty data, fixed so can't happen and we also stop at the first bad stream no. - Fixed so comm-up generates the adaption indication. - peeloff did not get the hmac params copied. - fix it so we lock the addr list when doing src-addr selection (in future we need to use a multi-reader/one writer lock here) - During lowlevel output, we could end up with a _l_addr set to null if the iterator is calling the output routine. This means we would possibly crash when we gather the MTU info. Fix so we only do the gather where we have a src address cached. - we need to be sure to set abort flag on conn state when we receive an abort. - peeloff could leak a socket. Moved code so the close will find the socket if the peeloff fails (uipc_syscalls.c) Approved by: re@freebsd.org(Ken Smith)	2007-08-27 05:19:48 +00:00
kib	e23c502c5b	Destroy the kaio_mtx on the freeing the struct kaioinfo in the aio_proc_rundown. Do not allow for zero-length read to be passed to the fo_read file method by aio. Reported and tested by: Peter Holm Approved by: re (kensmith)	2007-08-20 11:53:26 +00:00
jeff	b3923a600f	- Improve runq_findbit_from() which is used by ULE's circular queue. Mask of the bits we want to ignore on the first pass rather than doing a linear scan. This puts us within a few instructions of the cost of runq_findbit() and removes this function from the top of profiling output for context switch heavy workloads. Approved by: re	2007-08-20 06:36:12 +00:00
jeff	0f3cc9a72e	- Set steal_thresh to log2(ncpus). This improves idle-time load balancing on 2cpu machines by reducing it to 1 by default. This improves loaded operation on 8cpu machines by increasing it to 3 where the extra idle time is not as critical. Approved by: re	2007-08-20 06:34:20 +00:00
njl	4140a5b735	Always call sched_bind(), even if on the CPU in question. It is wrong to check if we're already on that cpu and skip the bind since the thread could be migrated off in the meantime. Suggested by: jeff Approved by: re	2007-08-20 06:28:26 +00:00
njl	7d2f282057	Use a different loop variable for the inner loop. This previous reuse could have caused a hang, but we got lucky with the available multi-CPU states on actual hardware. Submitted by: Bjorn Koenig <bkoenig / alpha-tierchen.de> Approved by: re MFC after: 3 days	2007-08-19 20:34:13 +00:00
davidxu	3fef07aaec	Regenerate. Approved by: re(kensmith)	2007-08-16 05:32:26 +00:00
davidxu	0abd045472	Add thr_kill2 syscall which sends a signal to a thread in another process. Submitted by: Tijl Coosemans tijl at ulyssis dot org Approved by: re (kensmith)	2007-08-16 05:26:42 +00:00
jhb	7fdc86bfe3	On 6.x this works: % mount \| grep home /dev/ad4s1e on /home (ufs, local, noatime, soft-updates) % mount -u -o atime /home % mount \| grep home /dev/ad4s1e on /home (ufs, local, soft-updates) Restore this behavior for on 7.x for the following mount options: noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow In addition, on 7.x, the following are equivalent: mount -u -o atime /home mount -u -o nonoatime /home Ideally, when we introduce new mount options, we should avoid options starting with "no". :) Requested by: jhb Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com> Approved by: re (bmah) Proxy commit for: rodrigc	2007-08-15 17:40:09 +00:00
pjd	8d074382c8	Improve vn_printf() by: - adding missing vnode flags, - printing unknown flags as numbers, - using strlcat() instead of strcat(). Approved by: re (bmah)	2007-08-13 21:23:30 +00:00
kib	b236f3d925	Do not call free() while holding vnode interlock. Reported and tested by: Peter Holm Reviewed by: jeff Approved by: re (kensmith)	2007-08-07 09:04:50 +00:00
rwatson	23574c8673	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
jeff	155be8b518	- Fix one line that erroneously crept in my last commit. Approved by: re	2007-08-04 01:21:28 +00:00
jeff	8b42bd66ad	- Share scheduler locks between hyper-threaded cores to protect the tdq_group structure. Hyper-threaded cores won't really benefit from seperate locks anyway. - Seperate out the migration case from sched_switch to simplify the main switch code. We only migrate here if called via sched_bind(). - When preempted place the preempted thread back in the same queue at the head. - Improve the cpu group and topology infrastructure. Tested by: many on current@ Approved by: re	2007-08-03 23:38:46 +00:00
jeff	bfe5819efe	- Set SW_PREEMPT when we preempt in critical_exit(). Approved by: re	2007-08-03 23:35:35 +00:00
rwatson	c29e74320b	First in a series of changes to remove the now-unused Giant compatibility framework for non-MPSAFE network protocols: - Remove debug_mpsafenet variable, sysctl, and tunable. - Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel. - Remove logic to automatically flag interrupt handlers as non-MPSAFE if debug.mpsafenet is set for an INTR_TYPE_NET handler. - Remove logic to automatically flag netisr handlers as non-MPSAFE if debug.mpsafenet is set. - Remove references in a few subsystems, including NFS and Cronyx drivers, which keyed off debug_mpsafenet to determine various aspects of their own locking behavior. - Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into no-op's, as their entire behavior was determined by the value in debug_mpsafenet. - Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE. Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still present in subsystems, and will be removed in followup commits. Reviewed by: bz, jhb Approved by: re (kensmith)	2007-07-27 11:59:57 +00:00
attilio	c2dedaa0a9	Actually, upcalls cannot be freed while destroying the thread because we should call uma_zfree() with various spinlock helds. Rearranging the code would not help here because we cannot break atomicity respect prcess spinlock, so the only one choice we have is to defer the operation. In order to do this use a global queue synchronized through the kse_lock spinlock which is freed at any thread_alloc() / thread_wait() through a call to thread_reap(). Note that this approach is not ideal as we should want a per-process list of zombie upcalls, but it follows initial guidelines of KSE authors. Tested by: jkim, pav Approved by: jeff, julian Approved by: re	2007-07-27 09:21:18 +00:00
pjd	fe74e944d1	When we do open, we should lock the vnode exclusively. This fixes few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more... Discussed with: kib, ups Approved by: re (rwatson)	2007-07-26 16:58:09 +00:00
pjd	8dc8d42bdc	The v_mountedhere field is protected by the vnode lock, not vnode's internal lock. Approved by: re (rwatson)	2007-07-26 16:52:57 +00:00
attilio	0986b6b87f	upcall_free() was only used in kse_GC() which has been removed so it now results unused; this, with -Werror option of gcc, rise a warning for gcc which let the buildkernel to be busted. Fix this removing upcall_free(). Reported by: various Approved by: jeff Approved by: re Pointy hat to: attilio	2007-07-23 23:16:53 +00:00
attilio	ad75d346f7	Actually, KSE kernel bits locking is broken and can lead likely to dangerous races. Fix this problems adding correct locking for the members of 'struct kse_upcall' and other struct proc/struct thread related members. For the moment, just leave ku_mflag and ku_flags "lazy" locked. While here, cleanup the code removing the function kse_GC() (unused), and merging upcall_link(), upcall_unlink(), upcall_stash() in their respective callers (static functions, very short and only called in one place). Reported by: pav Tested by: pav (on some pointyhat cluster nodes) Approved by: jeff Approved by: re Sponsorized by: NGX Italy (http://www.ngx.it)	2007-07-23 14:52:22 +00:00
dwmalone	e8276674f3	If clock_ct_to_ts fails to convert time time from the real time clock, print a one line error message. Add some comments on not being able to trust the day of week field (I'll act on these comments in a follow up commit). Approved by: re MFC after: 3 weeks	2007-07-23 09:42:32 +00:00
kib	a8fa279159	ttyfree() frees the cdev(). But if there are pending kevents, filt_ttyrdetach() etc would later attempt to dereference cdev->si_tty, causing a 0xdeadc0de dereference. Change kn_hook value from cdev to struct tty to avoid dereferencing freed cdev. In ttygone(), wake up select(), sigio and kevent() users in addition to the queue sleepers. Return EV_EOF from kevent filters if TS_GONE is set. Submitted by: peter Tested by: Peter Holm Approved by: re (kensmith) MFC after: 2 weeks	2007-07-20 09:41:54 +00:00
attilio	636c12b5fb	Fix some problems with lock profiling in rw locks: - Adjust lock_profiling stubs semantic in the hard functions in order to be more accurate and trustable - As for sx locks, disable shared paths for lock_profiling. Actually, lock_profiling has a subtle race which makes results caming from shared paths not completely trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for re-enabling this paths, but is currently intended for developing use only. - style(9) fixes Approved by: jeff, kmacy, jhb[1] Approved by: re [1] Had initial reservations not shared by others, conceded in the end.	2007-07-20 08:43:42 +00:00
jeff	e2ebe96ef4	- Refine the load balancer to improve buildkernel times on dual core machines. - Leave the long-term load balancer running by default once per second. - Enable stealing load from the idle thread only when the remote processor has more than two transferable tasks. Setting this to one further improves buildworld. Setting it higher improves mysql. - Remove the bogus pick_zero option. I had not intended to commit this. - Entirely disallow migration for threads with SRQ_YIELDING set. This balances out the extra migration allowed for with the load balancers. It also makes pick_pri perform better as I had anticipated. Tested by: Dmitry Morozovsky <marck@rinet.ru> Approved by: re	2007-07-19 20:03:15 +00:00
jeff	550dacee12	- When newtd is specified to sched_switch() it was not being initialized properly. We have to temporarily unlock the TDQ lock so we can lock the thread and add it to the run queue. This is used only for KSE. - When we add a thread from the tdq_move() via sched_balance() we need to ipi the target if it's sitting in the idle thread or it'll never run. Reported by: Rene Landan Approved by: re	2007-07-19 19:51:45 +00:00
jeff	b7b0163326	- Remove explicit references to sched_lock. A simpler assert will do. Approved by: re	2007-07-19 08:58:40 +00:00
jeff	1961c471a3	- Calling sched_nice() in tdsigwakeup() is no longer required by ULE and actually causes LORs and other panics. Reported by: mlaier Approved by: re	2007-07-19 08:49:16 +00:00
jeff	e6da269e0c	- Remove the global definition of sched_lock in mutex.h to break new code and third party modules which try to depend on it. - Initialize sched_lock in sched_4bsd.c. - Declare sched_lock in sparc64 pmap.c and assert that we're compiling with SCHED_4BSD to prevent accidental crashes from running ULE. This is the sole remaining file outside of the scheduler that uses the global sched_lock. Approved by: re	2007-07-18 20:46:06 +00:00
jeff	45491c3f8a	- Add the proper lock profiling calls to _thread_lock(). Obtained from: kipmacy Approved by: re	2007-07-18 20:38:13 +00:00
jeff	119fa6328d	ULE 3.0: Fine grain scheduler locking and affinity improvements. This has been in development for over 6 months as SCHED_SMP. - Implement one spin lock per thread-queue. Threads assigned to a run-queue point to this lock via td_lock. - Improve the facility for assigning threads to CPUs now that sched_lock contention no longer dominates scheduling decisions on larger SMP machines. - Re-write idle time stealing in an attempt to make it less damaging to general performance. This is still disabled by default. See kern.sched.steal_idle. - Call the long-term load balancer from a callout rather than sched_clock() so there are no locks held. This is disabled by default. See kern.sched.balance. - Parameterize many scheduling decisions via sysctls. Try to document these via sysctl descriptions. - General structural and naming cleanups. - Document each function with comments. Tested by: current@ amd64, x86, UP, SMP. Approved by: re	2007-07-17 22:53:23 +00:00
jeff	a682b8e2c4	- Use ruxagg() in calcru() to make sure we have current tick information from all threads. Discussed with: bde, attilio Approved by: re	2007-07-17 01:08:09 +00:00
rodrigc	81f7b063ec	Revert previous commits which I committed by mistake. Approved by: re (implicit) Pointy hat to: me	2007-07-14 21:23:31 +00:00
rodrigc	def1240ba2	The last entry in the ext2_opts array must be NULL, otherwise the kernel with crash in vfs_filteropt() if an invalid mount option is passed to ext2fs. Approved by: re (kensmith)	2007-07-14 21:18:19 +00:00
jhb	ef42e8706b	Fix a couple of issues with the stack limit for 32-bit processes on 64-bit kernels exposed by the recent fixes to resource limits for 32-bit processes on 64-bit kernels: - Let ABIs expose their maximum stack size via a new pointer in sysentvec and use that in preference to maxssiz during exec() rather than always using maxssiz for all processses. - Apply the ABI's limit fixup to the previous stack size when adjusting RLIMIT_STACK to determine if the existing mapping for the stack needs to be grown or shrunk (as well as how much it should be grown or shrunk). Approved by: re (kensmith)	2007-07-12 18:01:31 +00:00
attilio	7f10997905	Fix some problems with lock_profiling in sx locks: - Adjust lock_profiling stubs semantic in the hard functions in order to be more accurate and trustable - Disable shared paths for lock_profiling. Actually, lock_profiling has a subtle race which makes results caming from shared paths not completely trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for re-enabling this paths, but is currently intended for developing use only. - Use homogeneous names for automatic variables in hard functions regarding lock_profiling - Style fixes - Add a CTASSERT for some flags building Discussed with: kmacy, kris Approved by: jeff (mentor) Approved by: re	2007-07-06 13:20:44 +00:00
kib	81a0028ff2	Revert destroy_dev() to the state before destroy_dev_sched() was introduced. Attempt to spawn destroy_dev_sched() from it causes inadmissible races. Requested by: tegge Approved by: re (kensmith)	2007-07-05 13:04:59 +00:00
bz	908dadf8c3	Remove netkey directory from cscope/TAGs generation and replace it with netipsec now that KAME IPsec is gone. While here add missing netinet6 directories. Add comments about the ports needed to be able to run those targets. Reviewed by: philip Approved by: re (rwatson)	2007-07-05 08:55:14 +00:00
peter	c23e287f99	Fix bad function type passed to destroy_dev_sched_cb(). Approved by: re (rwatson)	2007-07-05 05:54:47 +00:00
peter	fd45cf2a6d	Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate Approved by: re (kensmith)	2007-07-04 22:57:21 +00:00
peter	14c396a779	Regenerate after mmap/lseek/etc syscall changes. Approved by: re (kensmith)	2007-07-04 22:49:55 +00:00
peter	f69e26042a	Create new syscalls for mmap(), lseek(), pread(), pwrite(), truncate() and ftruncate(), but without the pad arg. There are several reasons for this. Consider 'mmap()'. On AMD64, the function call (and syscall) ABI allow for 6 register arguments. Additional arguments go on the stack. mmap(2) has 6 arguments. However, the syscall definition has an extra 'int pad' argument. This pushes it to 7 arguments, which means one must spill into the memory stack. Since the kernel API doesn't match userland API, we have a hack in libc - libc/sys/mmap.c. This implements the userland API by calling __syscall() with an extra argument and the pad argument, for a total of 8 args. This is all unnecessary and inconvenient for several things, including the kernel's syscall handler code which now has to handle merging stack arguments with register arguments. It is a big deal for certain 3rd party code. I'm adding libc glue to make the transition totally painless. I had intended to mark the old syscalls as COMPAT6, but the potential to shoot your feet by building a new kernel without COMPAT_FREEBSD6 but with a slighly older userland was too great. For now, they have manual "freebsd6_" prefixes rather than being COMPAT6. They will go back to being marked 'COMPAT6' after 7-stable starts. Approved by: re (kensmith)	2007-07-04 22:47:37 +00:00
peter	33b2e41038	Add support for COMPAT6 syscalls. Also, change the visibility of compat syscalls a slightly. Compat syscalls were missing from 'syscalls.h' entirely. This additionally adds them with their compat prefix. eg: SYS_freebsd6_mmap. Also, the syscalls.c names strings have different prefixes to differentiate syscalls. Instead of several "old.mmap" strings, there will now be a "compat.mmap" and "compat6.mmap" etc. Before, both would have had the same "old.mmap" label. Approved by: re	2007-07-04 22:38:28 +00:00
kib	bbfd17f7e8	Since cdev mutex is after system map mutex in global lock order, free() shall not be called while holding cdev mutex. devfs_inos unrhdr has cdev as mutex, thus creating this LOR situation. Postpone calling free() in kern/subr_unit.c:alloc_unr() and nested functions until the unrhdr mutex is dropped. Save the freed items on the ppfree list instead, and provide the clean_unrhdrl() and clean_unrhdr() functions to clean the list. Call clean_unrhdrl() after devfs_create() calls immediately before dropping cdev mutex. devfs_create() is the only user of the alloc_unrl() in the tree. Reviewed by: phk Tested by: Peter Holm LOR: 80 Approved by: re (kensmith)	2007-07-04 06:56:58 +00:00
jeff	af5bbfbc7b	- Use explicit locking in the various fcntl case statements so that we can acquire shared filedescriptor locks in the appropriate cases. - Remove Giant from calls that issue ioctls. The ioctl path has been mpsafe for some time now. - Only acquire giant for VOP_ADVLOCK when the filesystem requires giant. advlock is now mpsafe. Reviewed by: rwatson Approved by: re	2007-07-03 21:26:06 +00:00
jeff	4c779e4b9e	- Remove explicit Giant protection from lockf. Use the vnode interlock to protect this datastructure instead. - Preallocate an extra lockf structure in case we want to split a lock on insert or delete. - msleep() on the vnode interlock when blocking on a lock. Reviewed by: rwatson Approved by: re	2007-07-03 21:22:58 +00:00
jhb	160998c890	Tweak the low-level MI SMP code some: - Use cpu_spinwait() in the spin loops in stop_cpus(), restart_cpus(), and smp_rendezvous_action(). - Remove unneeded acq memory barriers in stop_cpus(), restart_cpus(), and smp_rendezvous_action(). - Add an additional synch point in smp_rendezvous() to ensure that all the CPUs will always see an up-to-date value of smp_rv_setup_func. Reviewed by: attilio Approved by: re (kensmith) Tested on: alpha, amd64, i386, sparc64 SMP (for several years)	2007-07-03 18:37:06 +00:00
kib	c88cac4c41	Rev. 1.204 and 1.205 got an erronous version of destroy_dev() that calls destroy_dev_sched() with cdev mutex locked. Commit the code that was actually tested. Pointy hat to: kib Approved by: re (implicit)	2007-07-03 18:18:30 +00:00
kib	764028016f	Lock Giant and proctree lock around dereferencing p_session->s_ttyvp->v_rdev. Lock cdev mutex too to close the race with tty being freed. Relock clone_drain_lock to prevent the LOR with proctree lock, thus add #include <fs/devfs/devfs_int.h>. Suggested by: tegge Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:46:37 +00:00
kib	a1bd6e4f52	Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from pty clone handler. Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:45:52 +00:00
kib	3f255f64e4	Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from the clone handler. Lock Giant in the clone handler. Use destroy_dev_sched() explicitely from pty_maybecleanup() and postpone pty_release() until both master and slave cdevs are destroyed by setting it as callback for destroy_dev_sched(). Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:44:59 +00:00
kib	c3c1199f3c	Automatically detect deadlock condition in destroy_dev(), that is, if destroy_dev() is called from csw method, and no d_purge driver method is provided. Transform the direct call to destroy_dev() into destroy_dev_sched(). Reviewed by: njl (programming interface) Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:43:20 +00:00
kib	0ae42a4095	Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls destroy_dev() from d_close() cdev method would self-deadlock. devfs_close() bump device thread reference counter, and destroy_dev() sleeps, waiting for si_threadcount to reach zero for cdev without d_purge method. destroy_dev_sched() could be used instead from d_close(), to schedule execution of destroy_dev() in another context. The destroy_dev_sched_drain() function can be used to drain the scheduled calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains the events clone to make sure no lingering devices are left after dev_clone event handler deregistered. make_dev_credf(MAKEDEV_REF) function should be used from dev_clone event handlers instead of make_dev()/make_dev_cred() to ensure that created device has reference counter bumped before cdev mutex is dropped inside make_dev(). Reviewed by: tegge (early versions), njl (programming interface) Debugging help and testing by: Peter Holm Approved by: re (kensmith)	2007-07-03 17:42:37 +00:00
kib	e711aeee1e	Relock the sema_mtxp unconditionally after copyin() for SETALL case in kern_semctl. Otherwise, later mtx_unlock() can operate on unlocked mutex. Submitted by: rdivacky MFC after: 3 days Approved by: re (kensmith)	2007-07-03 15:58:47 +00:00
rwatson	e95a48819f	Continue kernel privilege cleanup for 7.0: unstaticize suser_enabled and stop declaring it in systm.h -- it's used only in kern_priv.c and is not required elsewhere. Approved by: re (kensmith)	2007-07-02 14:03:29 +00:00

... 2 3 4 5 6 ...

10374 Commits