freebsd-skq

Author	SHA1	Message	Date
jeff	46f09d5bc3	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
jhb	c04bb048f6	Simplify the interrupt code a bit: - Always include the ie_disable and ie_eoi methods in 'struct intr_event' and collapse down to one intr_event_create() routine. The disable and eoi hooks simply aren't used currently in the !INTR_FILTER case. - Expand 'disab' to 'disable' in a few places. - Use function casts for arm and i386:intr_eoi_src() instead of wrapper routines since to trim one extra indirection. Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER} Tested on: {amd64,i386} x {FILTER, !FILTER}	2008-03-17 22:42:01 +00:00
kib	d5211e24af	Fix two races in the handling of the d_gianttrick for the D_NEEDGIANT drivers. In the giant_XXX wrappers for the device methods of the D_NEEDGIANT drivers, do not dereference the cdev->si_devsw. It is racing with the destroy_devl() clearing of the si_devsw. Instead, use the dev_refthread() and return ENXIO for the destroyed device. [1] The check for the D_INIT in the prep_cdevsw() was not synchronized with the call of the fini_cdevsw() in destroy_devl(), that under rapid device creation/destruction may result in the use of uninitialized cdevsw [2]. Change the protocol for the prep_cdevsw(), requiring it to be called under dev_mtx, where the check for D_INIT is done. Do not free the memory allocated for the gianttrick cdevsw while holding the dev_mtx, put it into the free list to be freed later. Reuse the d_gianttrick pointer to keep the size and layout of the struct cdevsw (requested by phk). Free the memory in the dev_unlock_and_free(), and do all the free after the dev_mtx is dropped (suggested by jhb). Reported by: bsdimp + many [1], pho [2] Reviewed by: phk, jhb Tested by: pho MFC after: 1 week	2008-03-17 13:17:10 +00:00
pjd	cf9cd1298d	- There is no more "uidinfo struct" mutex. - The "uidinfo hash" lock is now a rwlock. Reminded by: kib	2008-03-17 11:48:40 +00:00
pjd	4ef010fa26	Whitespace cleanups.	2008-03-16 21:32:20 +00:00
pjd	9123873999	- Use wait-free method to manage ui_sbsize and ui_proccnt fields in the uidinfo structure. This entirely removes contention observed on the ui_mtxp mutex (as it is now gone). - Convert the uihashtbl_mtx mutex to a rwlock, as most of the time we just need to read-lock it. Reviewed by: jhb, jeff, kris & others Tested by: kris	2008-03-16 21:29:02 +00:00
rwatson	e7b290ea3d	Consistently use ANSI C declarationsfor all functions in kern_synch.c.	2008-03-16 18:59:21 +00:00
pjd	52f3c4136e	Style fixes.	2008-03-16 18:26:59 +00:00
pjd	d61d590ad7	Fix information leak. We can find PIDs of running processes from within a jail, etc. by simply calling setpriority(PRIO_PROCESS, <PID>, 0) and checking the return value: 0 means that the process exists and -1 that it doesn't exist. Reviewed by: rwatson MFC after: 1 week	2008-03-16 17:55:06 +00:00
rwatson	877d7c65ba	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
sobomax	1560402d31	Properly set size of the file_zone to match kern.maxfiles parameter. Otherwise the parameter is no-op, since zone by default limits number of descriptors to some 12K entries. Attempt to allocate more ends up sleeping on zonelimit. MFC after: 2 weeks	2008-03-16 06:21:30 +00:00
ru	a786aa30ca	Fix panic on e.g. "kldload /dev/null". PR: kern/121427 Reviewed by: sem MFC after: 3 days	2008-03-15 17:40:18 +00:00
jhb	9c113163fb	Add preliminary support for binding interrupts to CPUs: - Add a new intr_event method ie_assign_cpu() that is invoked when the MI code wishes to bind an interrupt source to an individual CPU. The MD code may reject the binding with an error. If an assign_cpu function is not provided, then the kernel assumes the platform does not support binding interrupts to CPUs and fails all requests to do so. - Bind ithreads to CPUs on their next execution loop once an interrupt event is bound to a CPU. Only shared ithreads are bound. We currently leave private ithreads for drivers using filters + ithreads in the INTR_FILTER case unbound. - A new intr_event_bind() routine is used to bind an interrupt event to a CPU. - Implement binding on amd64 and i386 by way of the existing pic_assign_cpu PIC method. - For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up an interrupt source and binds its interrupt event to the specified CPU. MI code can currently (ab)use this by doing: intr_bind(rman_get_start(irq_res), cpu); however, I plan to add a truly MI interface (probably a bus_bind_intr(9)) where the implementation in the x86 nexus(4) driver would end up calling intr_bind() internally. Requested by: kmacy, gallatin, jeff Tested on: {amd64, i386} x {regular, INTR_FILTER}	2008-03-14 19:41:48 +00:00
jhb	c162b59cc2	Make the function prototype for cpu_search() match the declaration so that this still compiles with gcc3.	2008-03-14 15:22:38 +00:00
jeff	a469063987	PR 117603 - Close a sleepqueue signal race by interlocking with the per-process spinlock. This was mistakenly omitted from the thread_lock patch and has been a race since. MFC After: 1 week PR: bin/117603 Reported by: Danny Braniss <danny@cs.huji.ac.il>	2008-03-13 00:46:12 +00:00
jeff	acb93d599c	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
jeff	3b1acbdce2	- Pass the priority argument from sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_ to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter	2008-03-12 06:31:06 +00:00
jeff	e30139dff5	- KSE may free a thread that was never actually forked. This will leave td_cpuset NULL. Check for this condition before dereferencing the cpuset. Reported by: david@catwhisker.org, miwi@freebsd.org Sponsored by: Nokia	2008-03-12 05:01:14 +00:00
jeff	540fa064d9	- Fix the invalid priority panics people are seeing by forcing tdq_runq_add to select the runq rather than hoping we set it properly when we adjusted the priority. This involves the same number of branches as before so should perform identically without the extra fragility. Tested by: bz Reviewed by: bz	2008-03-10 22:48:27 +00:00
jeff	e53ae3b798	- Don't rely on a side effect of sched_prio() to set the initial ts_runq for thread0. Set it directly in sched_setup(). This fixes traps on boot seen on some machines. Reported by: phk	2008-03-10 09:50:29 +00:00
jeff	aa3cc14d3d	- Handle kdb switch panics outside of mi_switch() to remove some instructions from the common path and make the code more clear. Whether this has any impact on performance may depend on optimization levels. Sponsored by: Nokia	2008-03-10 03:16:51 +00:00
jeff	14a6f96adb	Reduce ULE context switch time by over 25%. - Only calculate timeshare priorities once per tick or when a thread is woken from sleeping. - Keep the ts_runq pointer valid after all priority changes. - Call tdq_runq_add() directly from sched_switch() without passing in via tdq_add(). We don't need to adjust loads or runqs anymore. - Sort tdq and ts_sched according to utilization to improve cache behavior. Sponsored by: Nokia	2008-03-10 03:15:19 +00:00
imp	be1fee2a1a	Tiny bit of KNF to make bus_setup_intr() look like the rest of this function.	2008-03-10 01:48:25 +00:00
jeff	128f1c2547	- Add the missing '2' case to the switch table for kern.smp.topology and assign it to create the flat 'none' topology where all cpus are scheduled as if they are equal and unrelated.	2008-03-10 01:38:53 +00:00
jeff	7dc7c824ee	- Add an implementation of sched_preempt() that avoids excessive IPIs. - Normalize the preemption/ipi setting code by introducing sched_shouldpreempt() so the logical is identical and not repeated between tdq_notify() and sched_setpreempt(). - In tdq_notify() don't set NEEDRESCHED as we may not actually own the thread lock this could have caused us to lose td_flags settings. - Garbage collect some tunables that are no longer relevant.	2008-03-10 01:32:01 +00:00
jeff	171a608f92	- Add a sched_preempt() routine to be called by md code after IPI_PREEMPT is delivered. - Add a simple implementation to 4bsd.	2008-03-10 01:30:35 +00:00
imp	a7ddb800e8	Any driver that relies on its parent to set the devclass has no way to know if has siblings that need an actual probe. Introduce a specail return value called BUS_PROBE_NOOWILDCARD. If the driver returns this, the probe is only successful for devices that have had a specific devclass set for them. Reviewed by: current@, jhb@, grehan@	2008-03-09 05:10:22 +00:00
antoine	514f31f40e	Introduce a new F_DUP2FD command to fcntl(2), for compatibility with Solaris and AIX. fcntl(fd, F_DUP2FD, arg) and dup2(fd, arg) are functionnaly equivalent. Document it. Add some regression tests (identical to the dup2(2) regression tests). PR: 120233 Submitted by: Jukka Ukkonen Approved by: rwaston (mentor) MFC after: 1 month	2008-03-08 22:02:21 +00:00
rwatson	bb69385843	Use sbuf routines to construct core dump filenames rather than custom string buffer handling, making the code both easier to read and more robust against string-handling bugs. MFC after: 1 week	2008-03-08 16:31:29 +00:00
rwatson	32931f304a	Unlock the process lock when expand_name() fails, or we may leak the process lock leading to a hang. This bug was introduced in kern_sig.c:1.351, when the call to expand_name() was moved earlier bit this particular error case was not updated.	2008-03-08 15:48:06 +00:00
rwatson	d91018d529	Add __FBSDID() tag. MFC after: 3 days Pointed out by: antoine	2008-03-07 15:27:08 +00:00
jeff	7b52c04658	- Add a missing unlock to cpuset_setaffinity(CPU_LEVEL_CPUSET, CPU_WHICH_PID) Found by: gallatin	2008-03-06 20:11:24 +00:00
jeff	f278be8741	- Don't overwrite the recently allocated 'nset' in cpuset_setthread() by passing it to cpuset_which(). Pass in 'set' instead. This argument is not used but for convenience cpuset_which() nulls all incoming parameters. Submitted by: davidxu	2008-03-05 08:08:32 +00:00
jeff	7e2fbaa872	- Verify that when a user supplies a mask that is bigger than the kernel mask none of the upper bits are set. - Be more careful about enforcing the boundaries of masks and child sets. - Introduce a few more CPU_* macros for implementing these tests. - Change the cpusetsize argument to be bytes rather than bits to match other apis. Sponsored by: Nokia	2008-03-05 01:49:20 +00:00
ru	fb19d1efe4	Make it possible to continue working after calling doadump() manually from debugger. (This got broken in rev. 1.122.)	2008-03-04 07:39:31 +00:00
raj	0757a4afb5	Initial support for Freescale PowerQUICC III MPC85xx system-on-chip family. The PQ3 is a high performance integrated communications processing system based on the e500 core, which is an embedded RISC processor that implements the 32-bit Book E definition of the PowerPC architecture. For details refer to: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC8555E This port was tested and successfully run on the following members of the PQ3 family: MPC8533, MPC8541, MPC8548, MPC8555. The following major integrated peripherals are supported: * On-chip peripherals bus * OpenPIC interrupt controller * UART * Ethernet (TSEC) * Host/PCI bridge * QUICC engine (SCC functionality) This commit brings the main functionality and will be followed by individual drivers that are logically separate from this base. Approved by: cognet (mentor) Obtained from: Juniper, Semihalf MFp4: e500	2008-03-03 17:17:00 +00:00
marcel	dcf8897ad7	Unbreak after cpuset: initialize td_cpuset in sched_fork_thread().	2008-03-02 21:34:57 +00:00
jeff	c12e39a76d	Add support for the new cpu topology api: - When searching for affinity search backwards in the tree from the last cpu we ran on while the thread still has affinity for the group. This can take advantage of knowledge of shared L2 or L3 caches among a group of cores. - When searching for the least loaded cpu find the least loaded cpu via the least loaded path through the tree. This load balances system bus links, individual cache levels, and hyper-threaded/SMT cores. - Make the periodic balancer recursively balance the highest and lowest loaded cpu across each link. Add support for cpusets: - Convert the cpuset to a simple native cpumask_t while the kernel still only supports cpumask. - Pass the derived cpumask down through the cpu_search functions to restrict the result cpus. - Make the various steal functions resilient to failure since all threads can not run on all cpus any longer. General improvements: - Precisely track the lowest priority thread on every runq with tdq_setlowpri(). Before it was more advisory but this ended up having pathological behaviors. - Remove many #ifdef SMP conditions to simplify the code. - Get rid of the old cumbersome tdq_group. This is more naturally expressed via the cpu_group tree. Sponsored by: Nokia Testing by: kris	2008-03-02 08:20:59 +00:00
jeff	ad2a31513f	- Remove the old smp cpu topology specification with a new, more flexible tree structure that encodes the level of cache sharing and other properties. - Provide several convenience functions for creating one and two level cpu trees as well as a default flat topology. The system now always has some topology. - On i386 and amd64 create a seperate level in the hierarchy for HTT and multi-core cpus. This will allow the scheduler to intelligently load balance non-uniform cores. Presently we don't detect what level of the cache hierarchy is shared at each level in the topology. - Add a mechanism for testing common topologies that have more information than the MD code is able to provide via the kern.smp.topology tunable. This should be considered a debugging tool only and not a stable api. Sponsored by: Nokia	2008-03-02 07:58:42 +00:00
jeff	9b809b84f1	- Regen for cpuset Sponsored by: Nokia	2008-03-02 07:41:10 +00:00
jeff	694203dedd	Add cpuset, an api for thread to cpu binding and cpu resource grouping and assignment. - Add a reference to a struct cpuset in each thread that is inherited from the thread that created it. - Release the reference when the thread is destroyed. - Add prototypes for syscalls and macros for manipulating cpusets in sys/cpuset.h - Add syscalls to create, get, and set new numbered cpusets: cpuset(), cpuset_{get,set}id() - Add syscalls for getting and setting affinity masks for cpusets or individual threads: cpuid_{get,set}affinity() - Add types for the 'level' and 'which' parameters for the cpuset. This will permit expansion of the api to cover cpu masks for other objects identifiable with an id_t integer. For example, IRQs and Jails may be coming soon. - The root set 0 contains all valid cpus. All thread initially belong to cpuset 1. This permits migrating all threads off of certain cpus to reserve them for special applications. Sponsored by: Nokia Discussed with: arch, rwatson, brooks, davidxu, deischen Reviewed by: antoine	2008-03-02 07:39:22 +00:00
jeff	3bd7de5a7c	- Add a new sched_affinity() api to be used in the upcoming cpuset implementation. - Add empty implementations of sched_affinity() to 4BSD and ULE. Sponsored by: Nokia	2008-03-02 07:19:35 +00:00
attilio	0d87334131	- Handle buffer lock waiters count directly in the buffer cache instead than rely on the lockmgr support [1]: * bump the waiters only if the interlock is held * let brelvp() return the waiters count * rely on brelvp() instead than BUF_LOCKWAITERS() in order to check for the waiters number - Remove a namespace pollution introduced recently with lockmgr.h including lock.h by including lock.h directly in the consumers and making it mandatory for using lockmgr. - Modify flags accepted by lockinit(): * introduce LK_NOPROFILE which disables lock profiling for the specified lockmgr * introduce LK_QUIET which disables ktr tracing for the specified lockmgr [2] * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it can only be used on a per-instance basis - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer used This patch breaks KPI so __FreBSD_version will be bumped and manpages updated by further commits. Additively, 'struct buf' changes results in a disturbed ABI also. [2] Really, currently there is no ktr tracing in the lockmgr, but it will be added soon. [1] Submitted by: kib Tested by: pho, Andrea Barberio <insomniac at slackware dot it>	2008-03-01 19:47:50 +00:00
kib	8b513f42f1	Do not assert any locks for VOP_PRINT. In particular, do not assert that the vnode interlock is not held. vn_printf() already correctly handles locked and unlocked vnode interlocks, and all the in-tree vop_print methods are interlock-agnostic. Some code calls vprintf() with the vnode interlock held, that causes unjustified panics with INVARIANTS (ffs_syncvnode() as example). Reported by: Peter Holm	2008-02-26 12:16:35 +00:00
attilio	4014b55830	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-25 18:45:57 +00:00
attilio	0d54671a48	Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.	2008-02-24 16:38:58 +00:00
cperciva	b02e531c35	After finishing sending file data in sendfile(2), don't forget to send the provided trailers. This has been broken since revision 1.240. Submitted by: Dan Nelson PR: kern/120948 "sounds ok to me" from: phk MFC after: 3 days	2008-02-24 00:07:00 +00:00
des	df26e399aa	This patch adds a new ktrace(2) record type, KTR_STRUCT, whose payload consists of the null-terminated name and the contents of any structure you wish to record. A new ktrstruct() function constructs and emits a KTR_STRUCT record. It is accompanied by convenience macros for struct stat and struct sockaddr. In kdump(1), KTR_STRUCT records are handled by a dispatcher function that runs stringent sanity checks on its contents before handing it over to individual decoding funtions for each type of structure. Currently supported structures are struct stat and struct sockaddr for the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK and AF_IPX is present but disabled, as I am unable to test it properly. Since 's' was already taken, the letter 't' is used by ktrace(1) to enable KTR_STRUCT trace points, and in kdump(1) to enable their decoding. Derived from patches by Andrew Li <andrew2.li@citi.com>. PR: kern/117836 MFC after: 3 weeks	2008-02-23 01:01:49 +00:00
yar	ce8c493400	Undo the damage I did in sys/kern/vfs_mount.c #1.274 and sbin/mount_nfs/mount_nfs.c #1.76. Let the dragons sleep. Requested by: rodrigc, des PR: kern/120319 (welcome the bug back)	2008-02-18 20:58:57 +00:00
yar	2bac23abfa	Add a remark on a questionable property of vfs_mergeopts().	2008-02-18 10:10:42 +00:00
antoine	fb176dbab6	Make sysctl_kern_arnd return a random buffer instead of a random long, as it is expected by userland (stack protector guard setup for example). PR: 119129 Approved by: rwatson (mentor) MFC after: 1 month	2008-02-17 16:44:48 +00:00
kris	8697995804	Switch from conditionally dropping Giant in exit1() to asserting it is not held, which appears to be always true.	2008-02-17 15:28:28 +00:00
imp	0514d6cc34	Fix typo in comment.	2008-02-17 02:46:54 +00:00
antoine	ab8945769a	Remove a superfluous line in run_interrupt_driven_config_hooks(), next_entry is already initialized during TAILQ_FOREACH_SAFE(). PR: kern/119604 Approved by: rwatson (mentor) MFC after: 1 month	2008-02-15 21:54:21 +00:00
attilio	265cb5fb91	- Introduce lockmgr_args() in the lockmgr space. This function performs the same operation of lockmgr() but accepting a custom wmesg, prio and timo for the particular lock instance, overriding default values lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo. - Use lockmgr_args() in order to implement BUF_TIMELOCK() - Cleanup BUF_LOCK() - Remove LK_INTERNAL as it is nomore used in the lockmgr namespace Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-15 21:04:36 +00:00
yar	9713f1f445	In the new order of things dictated by nmount(2), a read-only mount is to be requested via a "ro" option. At the same time, MNT_RDONLY is gradually becoming an indicator of the current state of the FS instead of a command flag. Today passing MNT_RDONLY alone to the kernel's mount machinery will lead to various glitches. (See the PRs for examples.) Therefore mount the root FS with a "ro" option instead of the MNT_RDONLY flag. (Note that MNT_RDONLY still is added to the mount flags internally, by vfs_donmount(), if "ro" was specified.) To be able to pass "ro" cleanly to kernel_vmount(), teach the latter function to accept options with NULL values. Also correct the comment explaining how mount_arg() handles length of -1. PR: bin/106636 kern/120319 Submitted by: Jaakko Heinonen <see PR kern/120319 for email> (originally)	2008-02-14 17:04:31 +00:00
simon	49aa39283b	Fix sendfile(2) write-only file permission bypass. Security: FreeBSD-SA-08:03.sendfile Submitted by: kib	2008-02-14 11:44:31 +00:00
jhb	fd8332efc0	Add KASSERT()'s to catch attempts to recurse on spin mutexes that aren't marked recursable either via mtx_lock_spin() or thread_lock(). MFC after: 1 week	2008-02-13 23:39:05 +00:00
jhb	32100bd15f	Mark sleepqueue chain spin mutexes are recursable since the sleepq code now recurses on them in sleepq_broadcast() and sleepq_signal() when resuming threads that are fully asleep. MFC after: 1 week	2008-02-13 23:36:56 +00:00
jhb	64735ffb5f	Add a couple of assertions and KTR logging to thread_lock_flags() to match mtx_lock_spin_flags(). MFC after: 1 week	2008-02-13 23:33:50 +00:00
jhb	e8b1d791b2	Add an automatic kernel module version dependency to prevent loading modules using invalid ABI versions (e.g. a 7.x module with an 8.x kernel) for a given kernel: - Add a 'kernel' module version whose value is __FreeBSD_version. - Add a version dependency on 'kernel' in every module that has an acceptable version range of __FreeBSD_version up to the end of the branch __FreeBSD_version is part of. E.g. a module compiled on 701000 would work on kernels with versions between 701000 and 799999 inclusive. Discussed on: arch@ MFC after: 1 week	2008-02-13 21:34:06 +00:00
attilio	456bfb1f0f	- Add real assertions to lockmgr locking primitives. A couple of notes for this: * WITNESS support, when enabled, is only used for shared locks in order to avoid problems with the "disowned" locks * KA_HELD and KA_UNHELD only exists in the lockmgr namespace in order to assert for a generic thread (not curthread) owning or not the lock. Really, this kind of check is bogus but it seems very widespread in the consumers code. So, for the moment, we cater this untrusted behaviour, until the consumers are not fixed and the options could be removed (hopefully during 8.0-CURRENT lifecycle) * Implementing KA_HELD and KA_UNHELD (not surported natively by WITNESS) made necessary the introduction of LA_MASKASSERT which specifies the range for default lock assertion flags * About other aspects, lockmgr_assert() follows exactly what other locking primitives offer about this operation. - Build real assertions for buffer cache locks on the top of lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp) paradigm. - Add checks at lock destruction time and use a cookie for verifying lock integrity at any operation. - Redefine BUF_LOCKFREE() in order to not use a direct assert but let it rely on the aforementioned destruction time check. KPI results evidently broken, so __FreeBSD_version bumping and manpage update result necessary and will be committed soon. Side note: lockmgr_assert() will be used soon in order to implement real assertions in the vnode namespace replacing the legacy and still bogus "VOP_ISLOCKED()" way. Tested by: kris (earlier version) Reviewed by: jhb	2008-02-13 20:44:19 +00:00
csjp	b24cb219b9	Make sure we restrict Linux only IPC calls from being executed through the FreeBSD ABI. IPC_INFO, SHM_INFO, SHM_STAT were added specifically for Linux binary support. They are not documented as being a part of the FreeBSD ABI, also, the structures necessary for them have been hidden away from the users for a long time. Also, the Linux ABI layer uses it's own structures to populate the responses back to the user to ensure that the ABI is consistent. I think there is a bit more separation work that needs to happen. Reviewed by: jhb Discussed with: jhb Discussed on: freebsd-arch@ (very briefly) MFC after: 1 month	2008-02-12 20:55:03 +00:00
ru	841dab65e0	Regenerate for readlink(2).	2008-02-12 20:11:54 +00:00
ru	56aa644e2a	Change readlink(2)'s return type and type of the last argument to match POSIX. Prodded by: Alexey Lyashkov	2008-02-12 20:09:04 +00:00
marcus	7e24637c24	Add support for displaying a process' current working directory, root directory, and jail directory within procstat. While this functionality is available already in fstat, encapsulating it in the kern.proc.filedesc sysctl makes it accessible without using kvm and thus without needing elevated permissions. The new procstat output looks like: PID COMM FD T V FLAGS REF OFFSET PRO NAME 76792 tcsh cwd v d -------- - - - /usr/src 76792 tcsh root v d -------- - - - / 76792 tcsh 15 v c rw------ 16 9130 - - 76792 tcsh 16 v c rw------ 16 9130 - - 76792 tcsh 17 v c rw------ 16 9130 - - 76792 tcsh 18 v c rw------ 16 9130 - - 76792 tcsh 19 v c rw------ 16 9130 - - I am also bumping __FreeBSD_version for this as this new feature will be used in at least one port. Reviewed by: rwatson Approved by: rwatson	2008-02-09 05:16:26 +00:00
attilio	e1db4e70b3	Conver all explicit instances to VOP_ISLOCKED(arg, NULL) into VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should only acquire curthread as argument; this will lead in axing the additional argument from both functions, making the code cleaner. Reviewed by: jeff, kib	2008-02-08 21:45:47 +00:00
jeff	e5687b20d7	- Add THREAD_LOCKPTR_ASSERT() to assert that the thread's lock points at the provided lock or &blocked_lock. The thread may be temporarily assigned to the blocked_lock by the scheduler so a direct comparison can not always be made. - Use THREAD_LOCKPTR_ASSERT() in the primary consumers of the scheduling interfaces. The schedulers themselves still use more explicit asserts. Sponsored by: Nokia	2008-02-07 06:55:38 +00:00
jeff	005506bb32	- In rw_wunlock_hard prefer to wakeup writers if there are both readers and writers available. Doing otherwise can cause deadlocks as no read locks can proceed while there are write waiters. Sponsored by: Nokia	2008-02-07 06:16:54 +00:00
alc	4a600fdd88	Change shm_dotruncate() so that it correctly handles cached pages that span the end of the object. (This change is analogous to revision 1.237 of vm/vnode_pager.c.) Discussed with: jhb	2008-02-07 05:55:16 +00:00
attilio	a715e455c6	td cannot be NULL in that place, so just axe out the check.	2008-02-06 13:26:01 +00:00
jeff	77ea5a24c7	Adaptive spinning in write path with readers and writer starvation avoidance. - Move recursion checking into rwlock inlines to free a bit for use with adaptive spinners. - Clear the RW_LOCK_WRITE_SPINNERS flag whenever the lock state changes causing write spinners to restart their loop. - Write spinners are limited by a count while readers hold the lock as there is no way to know for certain whether readers are running still. - In the read path block if there are write waiters or spinners to avoid starving writers. Use a new per-thread count, td_rw_rlocks, to skip starvation avoidance if it might cause a deadlock. - Remove or change invalid assertions in turnstiles. Reviewed by: attilio (developed parts of the patch as well) Sponsored by: Nokia	2008-02-06 01:02:13 +00:00
attilio	6234a71797	Add WITNESS support to lockmgr locking primitive. This support tries to be as parallel as possible with other locking primitives, but there are differences; more specifically: - The base witness support is alredy equipped for allowing lock duplication acquisition as lockmgr rely on this. - In the case of lockmgr_disown() the lock result unlocked by witness even if it is still held by the "kernel context" - In the case of upgrading we can have 3 different situations: * Total unlocking of the shared lock and nothing else * Real witness upgrade if the owner is the first upgrader * Shared unlocking and exclusive locking if the owner is not the first upgrade but it is still allowed to upgrade - LK_DRAIN is basically handled like an exclusive acquisition Additively new options LK_NODUP and LK_NOWITNESS can now be used with lockinit(): LK_NOWITNESS disables WITNESS for the specified lock while LK_NODUP enable duplicated locks tracking. This will require manpages update and a __FreeBSD_version bumping (addressed by further commits). This patch also fixes a problem occurring if a lockmgr is held in exclusive mode and the same owner try to acquire it in shared mode: currently there is a spourious shared locking acquisition while what we really want is a lock downgrade. Probabilly, this situation can be better served with a EDEADLK failing errno return. Side note: first testing on this patch alredy reveleated several LORs reported, so please expect LORs cascades until resolved. NTFS also is reported broken by WITNESS introduction. BTW, NTFS is exposing a lock leak which needs to be fixed, and this patch can help it out if rightly tweaked. Tested by: kris, yar, Scot Hetzel <swhetzel at gmail dot com>	2008-02-06 00:37:14 +00:00
attilio	acc2f89a7f	Really, no explicit checks against against lock_class_* object should be done in consumers code: using locks properties is much more appropriate. Fix current code doing these bogus checks. Note: Really, callout are not usable by all !(LC_SPINLOCK \| LC_SLEEPABLE) primitives like rmlocks doesn't implement the generic lock layer functions, but they can be equipped for this, so the check is still valid. Tested by: matteo, kris (earlier version) Reviewed by: jhb	2008-02-06 00:04:09 +00:00
rwatson	f23198af5c	Further clean up sorflush: - Expose sbrelease_internal(), a variant of sbrelease() with no expectations about the validity of locks in the socket buffer. - Use sbrelease_internel() in sorflush(), and as a result avoid intializing and destroying a socket buffer lock for the temporary stack copy of the actual buffer, asb. - Add a comment indicating why we do what we do, and remove an XXX since things have gotten less ugly in sorflush() lately. This makes socket close cleaner, and possibly also marginally faster. MFC after: 3 weeks	2008-02-04 12:25:13 +00:00
phk	13132840a1	Give sendfile(2) a SF_SYNC flag which makes it wait until all mbufs referencing the files VM pages are returned from the network stack, making changes to the file safe. This flag does not guarantee that the data has been transmitted to the other end.	2008-02-03 15:54:41 +00:00
phk	df9c99b9c2	Give MEXTADD() another argument to make both void pointers to the free function controlable, instead of passing the KVA of the buffer storage as the first argument. Fix all conventional users of the API to pass the KVA of the buffer as the first argument, to make this a no-op commit. Likely break the only non-convetional user of the API, after informing the relevant committer. Update the mbuf(9) manual page, which was already out of sync on this point. Bump __FreeBSD_version to 800016 as there is no way to tell how many arguments a CPP macro needs any other way. This paves the way for giving sendfile(9) a way to wait for the passed storage to have been accessed before returning. This does not affect the memory layout or size of mbufs. Parental oversight by: sam and rwatson. No MFC is anticipated.	2008-02-01 19:36:27 +00:00
rwatson	95bb3acd1f	Use FEATURE() macro to advertise aio availability.	2008-02-01 11:59:14 +00:00
rwatson	c57fa54759	Correct two problems relating to sorflush(), which is called to flush read socket buffers in shutdown() and close(): - Call socantrcvmore() before sblock() to dislodge any threads that might be sleeping (potentially indefinitely) while holding sblock(), such as a thread blocked in recv(). - Flag the sblock() call as non-interruptible so that a signal delivered to the thread calling sorflush() doesn't cause sblock() to fail. The sblock() is required to ensure that all other socket consumer threads have, in fact, left, and do not enter, the socket buffer until we're done flushin it. To implement the latter, change the 'flags' argument to sblock() to accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK flag. When SBL_NOINTR is set, it forces a non-interruptible sx acquisition, regardless of the setting of the disposition of SB_NOINTR on the socket buffer; without this change it would be possible for another thread to clear SB_NOINTR between when the socket buffer mutex is released and sblock() is invoked. Reviewed by: bz, kmacy Reported by: Jos Backus <jos at catnook dot com>	2008-01-31 08:22:24 +00:00
ru	910410640b	Add a wrapper function that bound checks writes to the dump device.	2008-01-28 19:04:07 +00:00
iwasaki	a9f086bbd3	Add devctl_process_running() so that power management system driver can check whether devd(8) is running. MFC after: 1 week	2008-01-27 16:06:37 +00:00
kib	82cf20c0b8	In rev. 1.156, the convertion of the minor number to the unit number resulted in the argument to the make_dev() to be a unit number. Correct this by supplying a minor number to make_dev(), and using the unit number for the calculation of the slave tty name. Reported and tested by: Peter Holm Reviewed by: jhb Yet another pointy hat to: kib MFC after: 1 day	2008-01-26 06:09:23 +00:00
jhb	dd3b84ba3a	Fix a bug where a thread that hit the race where the sleep timeout fires while the thread does not hold the thread lock would stop blocking for subsequent interruptible sleeps and would always immediately fail the sleep with EWOULDBLOCK instead (even sleeps that didn't have a timeout). Some background: - KSE has a facility for allowing one thread to interrupt another thread. During this process, the target thread aborts any interruptible sleeps much as if the target thread had a pending signal. Once the target thread acknowledges the interrupt, normal sleep handling resumes. KSE manages this via the TDF_INTERRUPTED flag. Specifically, it sets the flag when it sends an interrupt to another thread and clears it when the interrupt is acknowledged. (Note that this is purely a software interrupt sort of thing and has no relation to hardware interrupts or kernel interrupt threads.) - The old code for handling the sleep timeout race handled the race by setting the TDF_INTERRUPT flag and faking a KSE-style thread interrupt to the thread in the process of going to sleep. It probably should have just checked the TDF_TIMEOUT flag in sleepq_catch_signals() instead. - The bug was that the sleepq code would set TDF_INTERRUPT but it was never cleared. The sleepq code couldn't safely clear it in case there actually was a real KSE thread interrupt pending for the target thread (in fact, the sleepq timeout actually stomped on said pending interrupt). Thus, any future interruptible sleeps (sleep(.. PCATCH ..) or cv_wait_sig()) would see the TDF_INTERRUPT flag set and immediately fail with EWOULDBLOCK. The flag could be cleared if the thread belonged to a KSE process and another thread posted an interrupt to the original thread. However, in the more common case of a non-KSE process, the thread would pretty much stop sleeping. - Fix the bug by just setting TDF_TIMEOUT in the sleepq timeout code and not messing with TDF_INTERRUPT and td_intrval. With yesterday's fix to fix sleepq_switch() to check TDF_TIMEOUT, this is now sufficient. MFC after: 3 days	2008-01-25 19:44:46 +00:00
jhb	5d22bdedcf	Fix a race in the sleepqueue timeout code that resulted in sleeps not being properly cancelled by a timeout. In general there is a race between a the sleepq timeout handler firing while the thread is still in the process of going to sleep. In 6.x with sched_lock, the race was largely protected by sched_lock. The only place it was "exposed" and had to be handled was while checking for any pending signals in sleepq_catch_signals(). With the thread lock changes, the thread lock is dropped in between sleepq_add() and sleepq_wait() opening up a new window for this race. Thus, if the timeout fired while the sleeping thread was in between sleepq_add() and sleepq_wait(), the thread would be marked as timed out, but the thread would not be dequeued and sleepq_switch() would still block the thread until it was awakened via some other means. In the case of pause(9) where there is no other wakeup, the thread would never be awakened. Fix this by teaching sleepq_switch() to check if the thread has had its sleep canceled before blocking by checking the TDF_TIMEOUT flag and aborting the sleep and dequeueing the thread if it is set. MFC after: 3 days Reported by: dwhite, peter	2008-01-25 02:09:38 +00:00
dumbbell	ba3df23cb8	When asked to use kqueue, AIO stores its internal state in the `kn_sdata' member of the newly registered knote. The problem is that this member is overwritten by a call to kevent(2) with the EV_ADD flag, targetted at the same kevent/knote. For instance, a userland application may set the pointer to NULL, leading to a panic. A testcase was provided by the submitter. PR: kern/118911 Submitted by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp> MFC after: 1 day	2008-01-24 17:10:19 +00:00
attilio	7213f4c32b	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo	2008-01-24 12:34:30 +00:00
bz	1c376286e0	Replace the last susers calls in netinet6/ with privilege checks. Introduce a new privilege allowing to set certain IP header options (hop-by-hop, routing headers). Leave a few comments to be addressed later. Reviewed by: rwatson (older version, before addressing his comments)	2008-01-24 08:25:59 +00:00
jeff	be58be75dd	- sched_prio() should only adjust tdq_lowpri if the thread is running or on a run-queue. If the priority is numerically raised only change lowpri if we're certain it will be correct. Some slop is allowed however previously we could erroneously raise lowpri for an idle cpu that a thread had recently run on which lead to errors in load balancing decisions.	2008-01-23 03:10:18 +00:00
rwatson	0e6bbfc8e3	Regenerate.	2008-01-20 23:44:24 +00:00
rwatson	ff05f9dd9d	Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls shm_open() and shm_unlink(). More auditing will need to be done for these calls to capture arguments properly.	2008-01-20 23:43:06 +00:00
rwatson	ff397597d9	Export a type for POSIX SHM file descriptors via kern.proc.filedesc as used by procstat, or SHM descriptors will show up as type unknown in userspace.	2008-01-20 19:55:52 +00:00
attilio	caa2ca048b	- Introduce the function lockmgr_recursed() which returns true if the lockmgr lkp, when held in exclusive mode, is recursed - Introduce the function BUF_RECURSED() which does the same for bufobj locks based on the top of lockmgr_recursed() - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicative and SMP-compliant way. This allows us to axe out BUF_REFCNT() and leaving the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup. KPI results, obviously, broken so further commits will update manpages and freebsd version. Tested by: kris (on UFS and NFS)	2008-01-19 17:36:23 +00:00
rwatson	dccd51b54f	Move unlock of global UNIX domain socket lock slightly lower in unp_connect(): it is expected to return with the lock held, and two possible error paths otherwise returned with it unlocked. The fix committed here is slightly different from the patch in the PR, but along an alternative line suggested in the PR. PR: 119778 MFC after: 3 days Submitted by: James Juran <james dot juran at baesystems dot com>	2008-01-18 19:16:03 +00:00
kib	3628ae460c	In the rev. 1.153, the one place for converting minor number to unit was missed. As result, pty_create_slave() may index out of the names[] bounds, creating wrong slave tty names. Tested by: kensmith Reviewed by: jhb MFC after: 3 days	2008-01-18 18:07:04 +00:00
julian	d6aa139aef	refactor code so it can run in a chroot without having to have /dev/mounted MFC After: 1 week	2008-01-18 17:02:14 +00:00
davidxu	ae86af8218	Make sure reading td_runtime in critical section since thread may be preempted and td_runtime will be modified.	2008-01-18 13:00:28 +00:00
davidxu	80ec49a2cf	Add POSIX clock id CLOCK_THREAD_CPUTIME_ID, this can be used to measure per-thread runtime in user code.	2008-01-18 07:04:42 +00:00
jhb	7e32513a5b	Add 'compat_freebsd[4567]' features corresponding to the kernel options COMPAT_FREEBSD[4567]. MFC after: 1 week Requested by: kris	2008-01-17 22:46:32 +00:00
sam	e443f3b38c	promote ath_defrag to m_collapse (and retire private+unused m_collapse from cxgb) Reviewed by: pyun, jhb, kmacy MFC after: 2 weeks	2008-01-17 21:25:09 +00:00
jhb	e3c7bebe5f	Remove a conditional that is always true. MFC after: 2 weeks	2008-01-17 20:15:15 +00:00
jhb	52da96d26e	Add a set of regression tests for the POSIX shm API (shm_open(2) and shm_unlink(2)).	2008-01-16 15:51:24 +00:00
njl	27739d97dc	Remove duplicate cpufreq levels, i.e. ones that are within 25 Mhz of each other. The first one survives, the rest are removed. So far, it appears only some acpi_perf(4) BIOS tables have these invalid states, but address this in the core to be sure to handle other potential driver data. PR: kern/114722 Tested by: stefan.lambrev / moneybookers.com MFC after: 3 days	2008-01-16 01:05:21 +00:00
jeff	7a1cea460a	- When executing the 'tryself' branch in sched_pickcpu() look at the lowest priority on the queue for the current cpu vs curthread's priority. In the case that curthread is waking up many threads of a lower priority as would happen with a turnstile_broadcast() or wakeup() of many threads this prevents them from all ending up on the current cpu. - In sched_add() make the relationship between a scheduled ithread and the current cpu advisory rather than strict. Only give the ithread affinity for the current cpu if it's actually being scheduled from a hardware interrupt. This prevents it from migrating when it simply blocks on a lock. Sponsored by: Nokia	2008-01-15 09:03:09 +00:00
attilio	71b7824213	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
attilio	8032f44d5f	lockmgr() function will return successfully when trying to work under panic but it won't actually lock anything. This can lead some paths to reach lockmgr_disown() with inconsistent lock which will let trigger the relative assertions. Fix those in order to recognize panic situation and to not trigger. Reported by: pho Submitted by: kib	2008-01-11 16:38:12 +00:00
rwatson	f261f9865b	Don't zero td_runtime when billing thread CPU usage to the process; maintain a separate td_incruntime to hold unbilled CPU usage for the thread that has the previous properties of td_runtime. When thread information is requested using the thread monitoring sysctls, export thread td_runtime instead of process rusage runtime in kinfo_proc. This restores the display of individual ithread and other kernel thread CPU usage since inception in ps -H and top -SH, as well for libthr user threads, valuable debugging information lost with the move to try kthreads since they are no longer independent processes. There is universal agreement that we should rewrite the process and thread export sysctls, but this commit gets things going a bit better in the mean time. Likewise, there are resevations about the continued validity of statclock given the speed of modern processors. Reviewed by: attilio, emaste, jhb, julian	2008-01-10 22:11:20 +00:00
rwatson	5e4b4e0b32	Remove "lock pushdown" todo item in comment -- I did that for 7.0. MFC after: 3 weeks	2008-01-10 12:38:17 +00:00
rwatson	ffaa005137	Correct typos in comments. MFC after: 3 weeks	2008-01-10 12:29:12 +00:00
attilio	18d0a0dd51	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
attilio	c3923db2c0	Fix a last second typo about recent lockmgr_disown() introduction.	2008-01-09 00:02:43 +00:00
attilio	9e182da191	Remove explicit calling of lockmgr() with the NULL argument. Now, lockmgr() function can only be called passing curthread and the KASSERT() is upgraded according with this. In order to support on-the-fly owner switching, the new function lockmgr_disown() has been introduced and gets used in BUF_KERNPROC(). KPI, so, results changed and FreeBSD version will be bumped soon. Differently from previous code, we assume idle thread cannot try to acquire the lockmgr as it cannot sleep, so loose the relative check[1] in BUF_KERNPROC(). Tested by: kris [1] kib asked for a KASSERT in the lockmgr_disown() about this condition, but after thinking at it, as this is a well known general rule, I found it not really necessary.	2008-01-08 23:48:31 +00:00
jhb	1975c09543	Regen for shm_open(2) and shm_unlink(2).	2008-01-08 22:01:26 +00:00
jhb	8cd9437636	Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())	2008-01-08 21:58:16 +00:00
jhb	c9a8a65b69	Close a race in the kern.ttys sysctl handler that resulted in panics in dev2udev() when a tty was being detached concurrently with the sysctl handler: - Hold the 'tty_list_mutex' lock while we read all the fields out of the struct tty for copying out later. Previously the pty(4) and pts(4) destroy routines could set t_dev to NULL, drop their reference on the tty and destroy the cdev while the sysctl handler was attempting to invoke dev2udev() on the cdev being destroyed. This happened when the sysctl handler read the value of t_dev prior to it being set to NULL either due to it being stale or due to timing races. By holding the list lock we guarantee that the destroy routines will block in ttyrel() in that case and not destroy the cdev until after we've copied all of our data. We may see a NULL cdev pointer or we may see the previous value, but the previous value will no longer point to a destroyed cdev if we see it. - Fix the ttyfree() routine used by tty device drivers in their detach methods to use ttyrel() on the tty so we don't leak them. Also, fix it to use the same order of operations as pty/pts destruction (set t_dev NULL, ttyrel(), destroy_dev()) so it cooperates with the sysctl handler. MFC after: 3 days Tested by: avatar	2008-01-08 04:53:28 +00:00
kris	0b7e3f244a	Fix logic in skipcount handling (used to sample every 1/N lock operations to reduce profiling overhead)	2008-01-08 01:11:40 +00:00
rwatson	d08bce9f30	Free MAC label on a POSIX semaphore when the semaphore is freed. MFC after: 3 days Submitted by: jhb	2008-01-07 22:03:19 +00:00
jhb	f8a246b979	Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method. Submitted by: rwatson	2008-01-07 20:05:19 +00:00
bde	cf4bbeccc5	In sequential_heuristic(): - spell 16384 as 16384 and not as BKVASIZE. 16384 is (not quite) just a magic size that works well in practice. BKVASIZE should be MAXBSIZE (65536), but is 16384 because i386's don't have enough kva for it to be MAXBSIZE; 16384 works (not so well) for it for much the same reasons that it works well in the heuristic. - expand and/or add comments about this and other details. - don't explicitly inline this function. - fix some other style bugs.	2008-01-05 08:54:51 +00:00
peter	1e0f13faf7	Fall back to the binary-specified interpreter (ld-elf.so.1) if the ABI override binary isn't found. This could probably be smoother, but it is what I did in p4 change #126891 on 2007/09/27. It should solve the "ld-elf32.so.1"-in-chroot problem.	2008-01-05 08:35:56 +00:00
jeff	ebfbe60329	- Restore timeslicing code for all bit SCHED_FIFO priority classes. Reported by: Peter Jeremy <peterjeremy@optushome.com.au>	2008-01-05 04:47:31 +00:00
bz	69daf9ac7f	Add missing sb_sndptr* fields to db_print_sockbuf(). While here change %d to %u for u_ints. Discussed with: rwatson, kmacy	2008-01-03 15:19:31 +00:00
jeff	811f1d35ef	- In sysctl_kern_file skip fdps with negative lastfiles. This can happen if there are no files open. Accounting for these can eventually return a negative value for olenp causing sysctl to crash with a bad malloc. Reported by: Pawel Worach <pawel.worach@gmail.com>	2008-01-03 01:26:59 +00:00
obrien	97f31c31f6	Note what is too {short,long}.	2008-01-02 18:48:27 +00:00
jhb	311f35fbe0	A few whitespace fixes.	2008-01-02 17:09:15 +00:00
jeff	d473542286	- Place the fhold() in unp_internalize_fp to be more consistent with refs. - Clear all of the gc flags before doing a run. Stale flags were causing us to skip some descriptors. - If a unp socket has been marked REF in a gc pass it can't be dead. Found by: rwatson's test tool.	2008-01-01 01:46:42 +00:00
rodrigc	9cea85864d	In vfs_scanopt(), make sure that the mount option value is not NULL before calling vsscanf(). PR: 118531 Submitted by: Jaakko Heinonen <jh saunalahti fi> MFC after: 3 days	2007-12-31 23:44:53 +00:00
jhb	f9f7dedb21	Actually declare the kern.features sysctl node. Pointy hat to: jhb	2007-12-31 22:03:57 +00:00
jeff	c8db393cdc	- Pause a while after disabling lock profiling and before resetting it to be sure that all participating CPUs have stopped updating it. - Restore the behavior of printing the name of the lock type in the output.	2007-12-31 03:45:51 +00:00
jeff	319a3672d9	- Check the correct variable against NULL in two places. - If the unp_file is NULL that means it has never been internalized and it must be reachable.	2007-12-31 03:44:54 +00:00
imp	95c2febd5a	Rather than not redirting the bp when we get ENXIO, only redirty it when the error is EIO. This catches a much larger class of errors that are unlikely to succeed if retried. Submitted by: bde	2007-12-30 05:53:45 +00:00
jeff	ce18638805	Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho	2007-12-30 01:42:15 +00:00
alc	4565fa1697	Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date. Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)	2007-12-29 19:53:04 +00:00
rwatson	166f16a0c6	In "show lockedvnods" DDB command, use db_printf() rather than printf() so that the results end up in the DDB output stream rather than the console output stream. This should likely also be done for the vprint() function it calls. MFC after: 3 months	2007-12-28 00:47:31 +00:00
attilio	c720b77ca9	Trimm out now unused option LK_EXCLUPGRADE from the lockmgr namespace. This option just adds complexity and the new implementation no longer will support it, so axing it now that it is unused is probabilly the better idea. FreeBSD version is bumped in order to reflect the KPI breakage introduced by this patch. In the ports tree, kris found that only old OSKit code uses it, but as it is thought to work only on 2.x kernels serie, version bumping will solve any problem.	2007-12-28 00:38:13 +00:00
attilio	bd42361fd1	In order to avoid a huge class of deadlocks (in particular in interactions with the interlock), owner of the lock should be only curthread or at least, for its limited usage, NULL which identifies LK_KERNPROC. The thread "extra argument" for the lockmgr interface is going to be removed in the near future, but for the moment, just let kernel run for some days with this check on in order to find potential deadlocking places around the kernel and fix them.	2007-12-27 22:56:57 +00:00
rwatson	010adfaab6	Return ESRCH when a kernel stack is queried on a process in execve() -- p_candebug() will return EAGAIN which, if the other process never leaves execve(), will result in the sysctl spinning and never returning to userspace. Processes should always eventually leave execve(), but spinning in kernel while we wait is bad for countless reasons, and particularly harmful if execve() itself is deadlocked. Possibly we should return another error, or return a marker indicating the thread is in execve() so it can be reported that way in userspace. Reported by: kris	2007-12-27 22:44:01 +00:00
attilio	d9b244638e	As LK_EXCLUPGRADE is used in conjuction with LK_NOWAIT, LK_UPGRADE becames equivalent with this and so operate the switch. That call is the only one remaining LK_EXCLUPGRADE consumer and removing it will prepare the ground for LK_EXCLUPGRADE axing and further lockmgr improvements. Discussed with: jeff, ups	2007-12-27 20:52:05 +00:00
imp	430e46303c	A partial solution to some of the 'pull the umass device with a mounted FS' problems. These are more along the lines of 'avoiding an avoidable panic' than a complete solution to removable devices. We now close the barn door after the horse has gotten lose and has been hit by a truck, as it were. The barn no longer catches fire in this case, but the horse is still dead :-). The vfs_bio.c fix causes us not to put a failed write back into the dirty pool if the error returned was ENXIO. In that case, the buffer is treated like any other clean buffer that's being retured. ENXIO means the device isn't there anymore and will never be there again in the future, so retrying is futile. The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file system. If the device is gone, retrying later won't help and we'll never be able to unmount the device. These two are part of a larger patch set submitted by the author. The other patches will be forth coming. I added comments to these two patches. Submitted by: Henrik Gulbrandsen Reviewed by: phk@ PR: usb/46176 (partial)	2007-12-27 16:38:28 +00:00
rwatson	956e2983ba	Add textdump(4) facility, which provides an alternative form of kernel dump using mechanically generated/extracted debugging output rather than a simple memory dump. Current sources of debugging output are: - DDB output capture buffer, if there is captured output to save - Kernel message buffer - Kernel configuration, if included in kernel - Kernel version string - Panic message Textdumps are stored in swap/dump partitions as with regular dumps, but are laid out as ustar files in order to allow multiple parts to be stored as a stream of sequentially written blocks. Blocks are written out in reverse order, as the size of a textdump isn't known a priori. As with regular dumps, they will be extracted using savecore(8). One new DDB(4) command is added, "textdump", which accepts "set", "unset", and "status" arguments. By default, normal kernel dumps are generated unless "textdump set" is run in order to schedule a textdump. It can be canceled using "textdump unset" to restore generation of a normal kernel dump. Several sysctls exist to configure aspects of textdumps; debug.ddb.textdump.pending can be set to check whether a textdump is pending, or set/unset in order to control whether the next kernel dump will be a textdump from userspace. While textdumps don't have to be generated as a result of a DDB script run automatically as part of a kernel panic, this is a particular useful way to use them, as instead of generating a complete memory dump, a simple transcript of an automated DDB session can be captured using the DDB output capture and textdump facilities. This can be used to generate quite brief kernel bug reports rich in debugging information but not dependent on kernel symbol tables or precisely synchronized source code. Most textdumps I generate are less than 100k including the full message buffer. Using textdumps with an interactive debugging session is also useful, with capture being enabled/disabled in order to record some but not all of the DDB session. MFC after: 3 months	2007-12-26 11:32:33 +00:00
wkoszek	2179e058f4	Rewrite kern.console handling in sbuf(9). My intention is to leave kern.console format as is. Thus, no difference in output format should appear after this commit. Reviewed by: cognet@ (mentor) Approved by: cognet@ (mentor)	2007-12-25 21:17:34 +00:00
rwatson	bdee30611d	Add a new 'why' argument to kdb_enter(), and a set of constants to use for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run. Assign approximate why values to all current consumers of the kdb_enter() interface.	2007-12-25 17:52:02 +00:00
julian	265714a11e	give thread0 the tid 100000 and bumpt the others to start at 100001 MFC after: 1 week	2007-12-22 04:56:48 +00:00
wkoszek	7d4fbe3a46	Make SCHED_ULE buildable with gcc3. Reviewed by: cognet (mentor), jeffr Approved by: cognet (mentor), jeffr	2007-12-21 23:30:18 +00:00
imp	46b79678a8	When devclass_get_maxunit is passed a NULL, return -1 to indicate that there's nothing allocated at all yet.	2007-12-19 22:05:07 +00:00
obrien	ee337f2c34	Be more exact with sigaction SA_SIGINFO handling. Reviewed by: marcel	2007-12-18 20:39:13 +00:00
kmacy	41d59439f8	Add SB_NOCOALESCE flag to disable socket buffer update in place	2007-12-17 10:02:01 +00:00
davidxu	496a3a2c52	Check NULL pointer.	2007-12-17 08:09:37 +00:00
davidxu	747bbe486b	Add missing changes for fixing LOR of umtx lock and thread lock, follow the committing of files: kern_resource.c revision 1.181 sched_4bsd.c revision 1.111 sched_ule.c revision 1.218	2007-12-17 05:55:07 +00:00
jeff	4ec9caf00c	Refactor select to reduce contention and hide internal implementation details from consumers. - Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescaned. - Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve api compatibility with existing code which does nothing but bzero() to setup selinfo structures. - Use a per-thread wait channel rather than a global wait channel. - Hide select implementation details in a seltd structure which is opaque to the rest of the kernel. - Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details. Tested by: kris Reviewed on: arch	2007-12-16 06:21:20 +00:00
rrs	edce837b8c	- fix tab to space issue, hmm maybe I should use vi.	2007-12-15 23:14:53 +00:00
jeff	12adc443d6	- Re-implement lock profiling in such a way that it no longer breaks the ABI when enabled. There is no longer an embedded lock_profile_object in each lock. Instead a list of lock_profile_objects is kept per-thread for each lock it may own. The cnt_hold statistic is now always 0 to facilitate this. - Support shared locking by tracking individual lock instances and statistics in the per-thread per-instance lock_profile_object. - Make the lock profiling hash table a per-cpu singly linked list with a per-cpu static lock_prof allocator. This removes the need for an array of spinlocks and reduces cache contention between cores. - Use a seperate hash for spinlocks and other locks so that only a critical_enter() is required and not a spinlock_enter() to modify the per-cpu tables. - Count time spent spinning in the lock statistics. - Remove the LOCK_PROFILE_SHARED option as it is always supported now. - Specifically drop and release the scheduler locks in both schedulers since we track owners now. In collaboration with: Kip Macy Sponsored by: Nokia	2007-12-15 23:13:31 +00:00
obrien	56e498fb3b	style.Makefile(5)	2007-12-14 21:30:51 +00:00
davidxu	3695a78889	Fix LOR of thread lock and umtx's priority propagation mutex due to the reworking of scheduler lock. MFC: after 3 days	2007-12-11 08:25:36 +00:00
rwatson	cce7cfdaf5	Check for P_WEXIT before PHOLD() on a process in kstack and vm query sysctls, as PHOLD() asserts !P_WEXIT. Reported by: Michael Plass <mfp49_freebsd at plass-family dot net>	2007-12-09 17:22:27 +00:00
jkoshy	72c27d71d8	Kernel and hwpmc(4) support for callchain capture. Sponsored by: FreeBSD Foundation and Google Inc.	2007-12-07 08:20:17 +00:00
jhb	0373a29045	Move several data structure definitions out of freebsd32_misc.c and into freebsd32.h instead. MFC after: 1 week	2007-12-06 23:11:27 +00:00
rrs	28889e11f4	- Puts default limits on 4k/9k and 16k zones for mbufs all based on 1/2 of each of the successive limits tied to the limit for 2k clusters. - Adds real functionality in so that doing a sysctl to change these actually changes them :-) MFC after: 1 week	2007-12-05 15:29:44 +00:00
kib	53229c8ee9	Use curthread instead of the FIRST_THREAD_IN_PROC for vnlru and syncer, when applicable. Aquire Giant slightly later for vnlru. In the syncer, aquire the Giant only when a vnode belongs to the non-MPsafe fs. In both speedup_syncer() and syncer_shutdown(), remove the syncer thread from the lbolt sleep queue after the syncer state is modified, not before. Herded by: attilio Tested by: Peter Holm Reviewed by: ups MFC after: 1 week	2007-12-05 09:34:04 +00:00
rodrigc	ca18adc0b5	In nmount(), internally convert the mount option: "rdonly" to "ro". This makes updates mounts such as: "mount -u -o rdonly" work more like, "mount -u -o ro". References to "-o rdonly" were changed to "-o ro" in revision 1.60 of the mount(8) man page, but some people still like to use "-o rdonly" since it was documented in earlier versions of FreeBSD. Requested by: rwatson MFC after: 1 week	2007-12-05 03:26:14 +00:00
thompsa	36f039142a	Apply a workaround for the unkillable jail problem where some devices created within the jail are never freed. si_cred is only used by the MAC framework so make the cred reference conditional on it being compiled in, this is not a fix and will need to be reviewed for any new consumers of si_cred. This will quell some user complaint when using jails with a default kernel. Reviewed by: rwatson MFC after: 3 days	2007-12-05 01:22:03 +00:00
kib	feb2aba5b6	Implement fetching of the __FreeBSD_version from the ELF ABI-tag note. The value is read into the p_osrel member of the struct proc. p_osrel is set to 0 for the binaries without the note. MFC after: 3 days	2007-12-04 12:28:07 +00:00
kib	dbef1afd93	Check for the program headers alignment of the ELF images before dereferencing. Unaligned access could cause panic on strict alignment architectures. Reviewed by: marcel, marius (also tested on sparc64, thanks !) MFC after: 3 days	2007-12-04 12:21:27 +00:00
alc	200640eacf	Introduce an UMA backend page allocator for the jumbo frame zones that allocates physically contiguous memory. MFC after: 3 months Requested and reviewed by: Kip Macy Tested by: Andrew Gallatin and Pyun YongHyeon	2007-12-04 07:06:08 +00:00
rwatson	0c4e2d79d0	When a symbol name can't be resolved, return "??" as the name, rather than "Unknown func", in order to avoid putting spaces in what ideally is a string separated by white space.	2007-12-03 14:44:35 +00:00
rwatson	e891688481	Add another new sysctl in support of the forthcoming procstat(1) to support its -k argument: kern.proc.kstack - dump the kernel stack of a process, if debugging is permitted. This sysctl is present if either "options DDB" or "options STACK" is compiled into the kernel. Having support for tracing the kernel stacks of processes from user space makes it much easier to debug (or understand) specific wmesg's while avoiding the need to enter DDB in order to determine the path by which a process came to be blocked on a particular wait channel or lock.	2007-12-02 21:52:18 +00:00
rwatson	99285f7544	Break out stack(9) from ddb(4): - Introduce per-architecture stack_machdep.c to hold stack_save(9). - Introduce per-architecture machine/stack.h to capture any common definitions required between db_trace.c and stack_machdep.c. - Add new kernel option "options STACK"; we will build in stack(9) if it is defined, or also if "options DDB" is defined to provide compatibility with existing users of stack(9). Add new stack_save_td(9) function, which allows the capture of a stacktrace of another thread rather than the current thread, which the existing stack_save(9) was limited to. It requires that the thread be neither swapped out nor running, which is the responsibility of the consumer to enforce. Update stack(9) man page. Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)	2007-12-02 20:40:35 +00:00
rwatson	c25458da37	Add two new sysctls in support of the forthcoming procstat(1) to support its -f and -v arguments: kern.proc.filedesc - dump file descriptor information for a process, if debugging is permitted, including socket addresses, open flags, file offsets, file paths, etc. kern.proc.vmmap - dump virtual memory mapping information for a process, if debugging is permitted, including layout and information on underlying objects, such as the type of object and path. These provide a superset of the information historically available through the now-deprecated procfs(4), and are intended to be exported in an ABI-robust form.	2007-12-02 10:10:27 +00:00
alc	ed1f1ac93c	Eliminate vfs_page_set_valid()'s unused argument.	2007-12-02 01:28:35 +00:00
rwatson	090235e567	Modify stack(9) stack_print() and stack_sbuf_print() routines to use new linker interfaces for looking up function names and offsets from instruction pointers. Create two variants of each call: one that is "DDB-safe" and avoids locking in the linker, and one that is safe for use in live kernels, by virtue of observing locking, and in particular safe when kernel modules are being loaded and unloaded simultaneous to their use. This will allow them to be used outside of debugging contexts. Modify two of three current stack(9) consumers to use the DDB-safe interfaces, as they run in low-level debugging contexts, such as inside lockmgr(9) and the kernel memory allocator. Update man page.	2007-12-01 22:04:16 +00:00
rwatson	0ab794a171	The kernel linker includes a number of utility functions to look up symbol information in support of DDB(4); these functions bypass normal linker locking as they may run in contexts where locking is unsafe (such as the kernel debugger). Add a new interface linker_ddb_search_symbol_name(), which looks up a symbol name and offset given an address, and also linker_search_symbol_name() which does the same but does follow the locking conventions of the linker. Unlike existing functions, these functions place the name in a caller-provided buffer, which is stable even after linker locks have been released. These functions will be used in upcoming revisions to stack(9) to support kernel stack trace generation in contexts as part of a live, rather than suspended, kernel.	2007-12-01 19:24:28 +00:00
peter	e287ae6b7a	Deal with the possibility of device_set_unit() being called when attaching the associated devinfo sysctl tree.	2007-11-30 21:30:14 +00:00
peter	8959a77f75	Add sysctl_rename_oid() to support device_set_unit() usage. Otherwise, when unit numbers are changed, the sysctl devinfo tree gets out of sync and duplicate trees are attempted to be attached with the original name.	2007-11-30 21:29:08 +00:00
rwatson	df297799bd	Move use of 'i' in cp_time sysctl under SCTL_MASK32 so that it compiles without warnings on systems that don't define it.	2007-11-29 08:38:22 +00:00
peter	8e9baed553	Move the shared cp_time array (counts %sys, %user, %idle etc) to the per-cpu area. cp_time[] goes away and a new function creates a merged cp_time-like array for things like linprocfs, sysctl etc. The atomic ops for updating cp_time[] in statclock go away, and the scope of the thread lock is reduced. sysctl kern.cp_time returns a backwards compatible cp_time[] array. A new kern.cp_times sysctl returns the individual per-cpu stats. I have pending changes to make top and vmstat optionally show per-cpu stats. I'm very aware that there are something like 5 or 6 other versions "out there" for doing this - but none were handy when I needed them. I did merge my changes with John Baldwin's, and ended up replacing a few chunks of my stuff with his, and stealing some other code. Reviewed by: jhb Partly obtained from: jhb	2007-11-29 06:34:30 +00:00
attilio	2562874cb6	Make ADAPTIVE_GIANT as the default in the kernel and remove the option. Currently, Giant is not too much contented so that it is ok to treact it like any other mutexes. Please don't forget to update your own custom config kernel files. Approved by: cognet, marcel (maintainers of arches where option is not enabled at the moment)	2007-11-28 05:50:45 +00:00
attilio	6fcea2407c	Simplify the adaptive spinning algorithm in rwlock and mutex: currently, before to spin the turnstile spinlock is acquired and the waiters flag is set. This is not strictly necessary, so just spin before to acquire the spinlock and to set the flags. This will simplify a lot other functions too, as now we have the waiters flag set only if there are actually waiters. This should make wakeup/sleeping couplet faster under intensive mutex workload. This also fixes a bug in rw_try_upgrade() in the adaptive case, where turnstile_lookup() will recurse on the ts_lock lock that will never be really released [1]. [1] Reported by: jeff with Nokia help Tested by: pho, kris (earlier, bugged version of rwlock part) Discussed with: jhb [2], jeff MFC after: 1 week [2] John had a similar patch about 6.x and/or 7.x about mutexes probabilly	2007-11-26 22:37:35 +00:00
attilio	7d4ec70696	Fix the spinlock static table adding missing spinlocks. - rm_spinlock has turnstile chain as child - srclock has callout and clk as child, found by witness "emulation". Just move it very high in our ranking	2007-11-24 04:32:32 +00:00
attilio	c61d48e731	transferlockers() is a very dangerous and hack-ish function as waiters should never be moved by one lock to another. As, luckily, nothing in our tree is using it, axe the function. This breaks lockmgr KPI, so interested, third-party modules should update their source code with appropriate replacement. Ok'ed by: ups, rwatson MFC after: 3 days	2007-11-24 04:22:28 +00:00
kris	e9fa0b52d6	Remove remaining Giant acquisition around vn_fullpath1. This was missed in r1.106 and has not been required for some years now. Reviewed by: jeff MFC After: 1 week	2007-11-22 21:26:25 +00:00
attilio	5ed5cf01be	Cache the value of c_lock as it can change, in the struct, while the global callout spinlock is not held, and can lead to PF#. Reported by: dougb, Mark Atkinson <atkin901 at yahoo dot com> Tested by: dougb Diagnosed by: jhb	2007-11-22 12:15:54 +00:00
davidxu	648833c953	Add function UMTX_OP_WAIT_UINT, the function causes thread to wait for an integer to be changed.	2007-11-21 04:21:02 +00:00
rwatson	6651bdd106	Test that p_textvp is non-NULL be dereferencing, as no executable vnode is set for kernel processes. Reported by: Skip Ford <skip at menantico dot com> MFC after: 3 days	2007-11-20 18:03:09 +00:00
attilio	6c4cd05be5	Add the function callout_init_rw() to callout facility in order to use rwlocks in conjuction with callouts. The function does basically what callout_init_mtx() alredy does with the difference of using a rwlock as extra argument. CALLOUT_SHAREDLOCK flag can be used, now, in order to acquire the lock only in read mode when running the callout handler. It has no effects when used in conjuction with mtx. In order to implement this, underlying callout functions have been made completely lock type-unaware, so accordingly with this, sysctl debug.to_avg_mtxcalls is now changed in the generic debug.to_avg_lockcalls. Note: currently the allowed lock classes are mutexes and rwlocks because callout handlers run in softclock swi, so they cannot sleep and they cannot acquire sleepable locks like sx or lockmgr. Requested by: kmacy, pjd, rwatson Reviewed by: jhb	2007-11-20 00:37:45 +00:00
jhb	3c82b6a5c7	Bump up the number of ttys supported by pty(4) to 512 by making use of [pt]ty[lmnoLMNO][0-9a-v]. MFC after: 3 days Reviewed by: rwatson	2007-11-19 20:49:42 +00:00
dumbbell	fff090c011	The kernel uses two ways to write data on a pipe: o buffered write, for chunks smaller than PIPE_MINDIRECT bytes o direct write, for everything else A call to writev(2) may receive struct iov of various size and the kernel may have to switch from one solution to the other. Before doing this, it must wake reader processes and any select/poll/kqueue up. This commit fixes a bug where select/poll/kqueue are not triggered when switching from buffered write to direct write. It adds calls to pipeselwakeup(). I give more details on freebsd-arch@: http://lists.freebsd.org/pipermail/freebsd-arch/2007-September/006790.html This should fix issues with Erlang (lang/erlang) and kqueue. Reported by: Rickard Green (Erlang)	2007-11-19 15:05:20 +00:00
attilio	bfc761fdba	Expand lock class with the "virtual" function lc_assert which will offer an unified way for all the lock primitives to express lock assertions. Currenty, lockmgrs and rmlocks don't have assertions, so just panic in that case. This will be a base for more callout improvements. Ok'ed by: jhb, jeff	2007-11-18 14:43:53 +00:00
rrs	9dbec8e7df	- Add in missing event handler invokes for initial proc and thread.	2007-11-18 13:56:51 +00:00
jb	9bd9c03e92	Add a function to list symbols in a file and their values at the same time rather than having to list the symbols and then go back and look each one up by name.	2007-11-18 00:23:31 +00:00
jhb	b113160d81	Acquire the process mutex and spin locks before calling thread_exit() in kthread_exit() to fix panics when using INVARIANTS.	2007-11-15 21:45:17 +00:00
rrs	303afeb279	- Adds event handlers for process_ctor,process_dtor, process_init, process_fini, thread_ctor, thread_dtor, thread_init, thread_fini. This will allow us to extend dynamically areas in proc/thread for dtrace ;-) Reviewed by: rwatson	2007-11-15 14:20:07 +00:00
glebius	93fa6a684a	Fix build.	2007-11-15 14:16:20 +00:00
rrs	3925f424bc	Adds an event handler for: - process_ctor,dtor, init and fini - thread_ctor,dtor, init and fini This allows the ability to add on additional things during construction/destruction of threads and processes. Reviewed by: rwatson	2007-11-15 13:28:54 +00:00
julian	303014d009	This time REALLY copy the name from the proc to the thread as a default.	2007-11-15 06:35:26 +00:00
julian	bfb9581757	When forking, the new thread deserves a name too. Don't just use the td_startcopy section as it is not the right thing to do in other cases (e.g. if starting a new thread from one that is already named).	2007-11-15 02:13:44 +00:00
attilio	4b290fe097	Remove a bogus KASSERT which will prevent rwlock to be acquired recursively in exclusive mode with debugging kernels. Submitted by: kmacy Approved by: jeff	2007-11-14 21:21:48 +00:00
marcel	1e7c4f0a3f	o Rename cpu_thread_setup() to cpu_thread_alloc() to better communicate that it relates to (is called by) thread_alloc() o Add cpu_thread_free() which is called from thread_free() to counter-act cpu_thread_alloc(). i386: Have cpu_thread_free() call cpu_thread_clean() to preserve behaviour. ia64: Have cpu_thread_free() call mtx_destroy() for the mutex initialized in cpu_thread_alloc(). PR: ia64/118024	2007-11-14 20:21:54 +00:00
julian	7ee6259be7	A bunch more files that should probably print out a thread name instead of a process name.	2007-11-14 06:51:33 +00:00
julian	b2732e0c22	generally we are interested in what thread did something as opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.	2007-11-14 06:21:24 +00:00
julian	b248158d8d	Make sure there is a good default thread name for all threads.	2007-11-14 06:04:57 +00:00
rwatson	9162bf224c	Add rm_wowned(9) function to test whether the current thread owns an exclusive lock on the passed rmlock. Reviewed by: ups	2007-11-10 15:06:30 +00:00
jhb	a97196ea0d	A couple of optimizations to the last commit. Submitted by: Christoph Mallon christoph mallon of gmx de	2007-11-08 21:45:56 +00:00
ups	c6c6cc7efa	Use VM_FAULT_DIRTY to fault in pages for write access in proc_rwmen. Otherwise copy on write may create an anonymous page that is not marked as dirty. Since writing data to these pages in this function also does not dirty these pages they may be later discarded by the pagedaemon.	2007-11-08 19:35:36 +00:00
jhb	3a40a5ed4b	Make it easier to add more ptys to the pty(4) driver: - Use unit2minor() and minor2unit() to generate minor numbers to support unit numbers higher than 255. - Use simple string operations on the 'names' array rather than hard-coded constants and switch statements so that more ptys can be added by simply expanding the 'names' array. MFC after: 1 week	2007-11-08 15:51:52 +00:00
ups	9c75ede409	Initial checkin for rmlock (read mostly lock) a multi reader single writer lock optimized for almost exclusive reader access. (see also rmlock.9) TODO: Convert to per cpu variables linkerset as soon as it is available. Optimize UP (single processor) case.	2007-11-08 14:47:55 +00:00
rwatson	70292a328f	Remove unused variable td from sched_idletd(). MFC after: 3 days Found with: Coverity Prevent(tm) CID: 3561	2007-11-05 12:01:12 +00:00
kib	9ae733819b	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb	2007-11-05 11:36:16 +00:00
julian	0bf0ae3b24	Completely remove the code for single threading the mainline fork code. Put in a little comment explaining why it went away. Re-enable it in the case there an exisiting process is just splitting off its address space and file descriptors. (I donpt think anything uses that code but it needs some sort of locking and this does the job. Reviewed by: Davidxu, alc, others MFC after: 3 days	2007-11-02 19:40:36 +00:00
njl	53b7cf834c	If we're on an SMP kernel and there is more than 1 CPU, reject any attempts to change the freq before the other CPUs are active. The current code always attempts to change all CPUs to match each other, and the requisite sched_bind() call won't work before APs are launched.	2007-10-30 22:18:08 +00:00
julian	03284e6e08	fix typo in code normally not compiled in.	2007-10-29 20:45:31 +00:00
julian	5d7e7a3ea7	Fix typo in code obviously not being compiled on any of my machines. found by: rdivacky@	2007-10-28 23:11:57 +00:00
jhb	3ee3b45ab8	Change the roundrobin implementation in the 4BSD scheduler to trigger a userland preemption directly from hardclock() via sched_clock() when a thread uses up a full quantum instead of using a periodic timeout to cause a userland preemption every so often. This fixes a potential deadlock when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held by a thread pinned or bound to another CPU. The current thread on that CPU will never be preempted while softclock is blocked. Note that ULE already drives its round-robin userland preemption from sched_clock() as well and always enables IPI_PREEMPT. MFC after: 1 week	2007-10-27 22:07:40 +00:00
rodrigc	0b0826633e	In nmount(), if MNT_ROOT is in the mount flags, filter it out instead of returning an error. (1) This makes the behavior consistent with mount(2). (2) This makes update mounts on the root file system work properly. (3) The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c and src/usr.sbin/mountd/mountd.c which were put in to eliminate errors during update mounts on the root file system can be removed. The only place were MNT_ROOTFS can be validly set is inside the kernel, i.e. with vfs_mountroot_try(). Reviewed by: phk MFC after: 3 days	2007-10-27 15:59:18 +00:00
julian	93719028ad	Add support for the pre-exisiting module shutdoen handshake. Fix some comments.	2007-10-27 00:54:16 +00:00
julian	bc607610fb	rename the process to 'idle' and 'intr' as per jhb.	2007-10-27 00:52:26 +00:00
julian	83e9ad5680	Initialise the initial process pointer to NULL so that we know we don't have an idle process yet. I'm guessing that on my system this was always 0 already. found by: Ed Schouten	2007-10-27 00:42:40 +00:00
julian	4eb58f88d1	If kthread_exit() is called on the last kthread in a kproc, then all the work in kproc_exit must be done. We don't actually have a user of this yet but why leave it to chance.	2007-10-26 22:18:20 +00:00
julian	9f54ea2c1e	if one changes a function's arguments, one must also change the callers.	2007-10-26 22:03:19 +00:00
julian	c21b38f409	oops, over optimised and broke non-SMP builds	2007-10-26 20:32:33 +00:00
julian	b6e413a633	kthread_exit needs no stinkin argument.	2007-10-26 17:03:22 +00:00
obrien	2fc050a9c2	style(9)	2007-10-26 16:33:47 +00:00
julian	11e1aa0d18	Introduce a way to make pure kernal threads. kthread_add() takes the same parameters as the old kthread_create() plus a pointer to a process structure, and adds a kernel thread to that process. kproc_kthread_add() takes the parameters for kthread_add, plus a process name and a pointer to a pointer to a process instead of just a pointer, and if the proc * is NULL, it creates the process to the specifications required, before adding the thread to it. All other old kthread_xxx() calls return, but act on (struct thread ) instead of (struct proc ). One reason to change the name is so that any old kernel modules that are lying around and expect kthread_create() to make a process will not just accidentally link. fix top to show kernel threads by their thread name in -SH mode add a tdnam formatting option to ps to show thread names. make all idle threads actual kthreads and put them into their own idled process. make all interrupt threads kthreads and put them in an interd process (mainly for aesthetic and accounting reasons) rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper' man page fixes to follow.	2007-10-26 08:00:41 +00:00
csjp	a16bb8381d	Implement AUE_CORE, which adds process core dump support into the kernel. This change introduces audit_proc_coredump() which is called by coredump(9) to create an audit record for the coredump event. When a process dumps a core, it could be security relevant. It could be an indicator that a stack within the process has been overflowed with an incorrectly constructed malicious payload or a number of other events. The record that is generated looks like this: header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec argument,0,0xb,signal path,/usr/home/csjp/test.core subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2 return,success,1 trailer,111 - We allocate a completely new record to make sure we arent clobbering the audit data associated with the syscall that produced the core (assuming the core is being generated in response to SIGABRT and not an invalid memory access). - Shuffle around expand_name() so we can use the coredump name at the very beginning of the coredump call. Make sure we free the storage referenced by "name" if we need to bail out early. - Audit both successful and failed coredump creation efforts Obtained from: TrustedBSD Project Reviewed by: rwatson MFC after: 1 month	2007-10-26 01:23:07 +00:00
rwatson	60570a92bf	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
csjp	76f4445c43	Move where we audit the PID argument such that we unconditionally audit it at the beginning of the syscall. This fixes a problem where the user supplies an invalid process ID which is > 0 which results in the PID argument not being audited. Obtained from: TrustedBSD Project MFC after: 1 week	2007-10-24 00:14:19 +00:00
julian	8b459baa42	Take out the single-threading code in fork. After discussions with jeff, alc, (various Ironport people), david Xu, and mostly Alfred (who found the problem) it has been demonstrated that this is not needed for our implementations of threads and represents a real (as in we've seen it happen a lot) deadlock danger. Several points: Since forking multiple threads is not allowed, and posix states that any mutexes owned by othre threads wilol be owned in the child by phantom threads, and therads shouldn't ba accessing shared structures without protection, It can be proved that if this leads to the child process accessing inconsistent data, it's a programming error. The mode of thread_single() being used in fork() is the wrong one. It is using SINGLE_NO_EXIT when it should be using SINGLE_BOUNDARY. Even if this we used, System processes have no need to do it as they have no userland to get inconsistent. This commmit first fixes the above bugs to get tehm correct in CVS. then removes them with #ifdef. This is so that history contains the corrected version should it be needed in the future. This code may be needed if we implement the forkall() syscall from Solaris. It may be needed for other non-posix thread libraries at some time in the future, so let the code sit for a short while while I do some work on it anyhow. This removes a reproducible lockup in NFS. It may be argued that maybe doing a fork while holding a vnode lock may not be the best idea in th efirst place but it shouldn't cause a deadlock. The removal has been running under soak test for several days now. This removal should be seriously considered for 7.0 and RELENG_6. Note. There is code in the core-dumping code that may have a similar problem with coredumping threaded processes MFC After: 4 days	2007-10-23 17:54:15 +00:00
grehan	7c82e0520d	Cut over to ULE on PowerPC kern/sched_ule.c - Add __powerpc__ to the list of supported architectures powerpc/conf/GENERIC - Swap SCHED_4BSD with SCHED_ULE powerpc/powerpc/genassym.c - Export TD_LOCK field of thread struct powerpc/powerpc/swtch.S - Handle new 3rd parameter to cpu_switch() by updating the old thread's lock. Note: uniprocessor-only, will require modification for MP support. powerpc/powerpc/vm_machdep.c - Set 3rd param of cpu_switch to mutex of old thread's lock, making the call a no-op. Reviewed by: marcel, jeffr (slightly older version)	2007-10-23 00:52:25 +00:00
jb	9dec415fef	Add the full module path name to the kld_file_stat structure for kldstat(2). This allows libdtrace to determine the exact file from which a kernel module was loaded without having to guess. The kldstat(2) API is versioned with the size of the kld_file_stat structure, so this change creates version 2. Add the pathname to the verbose output of kldstat(8) too. MFC: 3 days	2007-10-22 04:12:57 +00:00
rwatson	2bd7685cc6	Add PRIV_VFS_STAT privilege, which will allow overriding policy limits on the right to stat() a file, such as in mac_bsdextended. Obtained from: TrustedBSD Project MFC after: 3 months	2007-10-21 22:50:11 +00:00
julian	51d643caa6	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
emaste	87bf077fa0	Put comments about syscalls by the correct ones, and use the correct syscall number in the comment.	2007-10-19 19:17:53 +00:00
sam	026e744752	ULE works fine on arm; allow it to be used Reviewed by: jeff, cognet, imp MFC after: 1 week	2007-10-16 19:25:26 +00:00
alfred	3dcb842f61	Export maxswzone, maxbcache, maxtsiz, dfldsiz, maxdsiz, dflssiz, maxssiz, and sgrowsiz via sysctl. MFC after: 1 week	2007-10-16 10:40:53 +00:00
netchild	21c6e78ea7	Backout sensors framework. Requested by: phk Discussed on: cvs-all	2007-10-15 20:00:24 +00:00
netchild	4af9918bc0	Import OpenBSD's sysctl hardware sensors framework. This commit includes the following core components: * sample configuration file for sensorsd * rc(8) script and glue code for sensorsd(8) * sysctl(3) doc fixes for CTL_HW tree * sysctl(3) documentation for hardware sensors * sysctl(8) documentation for hardware sensors * support for the sensor structure for sysctl(8) * rc.conf(5) documentation for starting sensorsd(8) * sensor_attach(9) et al documentation * /sys/kern/kern_sensors.c o sensor_attach(9) API for drivers to register ksensors o sensor_task_register(9) API for the update task o sysctl(3) glue code o hw.sensors shadow tree for sysctl(8) internal magic * <sys/sensors.h> * HW_SENSORS definition for <sys/sysctl.h> * sensors display for systat(1), including documentation * sensorsd(8) and all applicable documentation The userland part of the framework is entirely source-code compatible with OpenBSD 4.1, 4.2 and -current as of today. All sensor readings can be viewed with `sysctl hw.sensors`, monitored in semi-realtime with `systat -sensors` and also logged with `sensorsd`. Submitted by: Constantine A. Murenin <cnst@FreeBSD.org> Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors) Mentored by: syrinx Tested by: many OKed by: kensmith Obtained from: OpenBSD (parts)	2007-10-14 10:45:31 +00:00
des	73606ae492	I don't know what I was smoking when I wrote these three years ago; the return value is an error code, hence always an int. While I'm here, add getenv_uint() for completeness.	2007-10-13 11:30:19 +00:00
mohans	11057bb00f	Set the NFS server sockbuf high watermarks to the system defaults (up form 32KB). The low highwatermark setting caused UDP fullsock request drops, throttling thruput greatly. Reported by: Kris Kennaway Approved by: re@ (Ken Smith)	2007-10-12 03:56:27 +00:00
jeff	8f126294e7	- Fix from pr kern/115469; Don't redeliver a signal once it has been handled by the target process. Contributed by: Tijl Coosemans <tijl@ulyssis.org> Approved by: re	2007-10-09 00:03:39 +00:00
jeff	2bbd1c2e70	- Bail out of tdq_idled if !smp_started or idle stealing is disabled. This fixes a bug on UP machines with SMP kernels where the idle thread constantly switches after trying to steal work from the local cpu. - Make the idle stealing code more robust against self selection. - Prefer to steal from the cpu with the highest load that has at least one transferable thread. Before we selected the cpu with the highest transferable count which excludes bound threads. Collaborated with: csjp Approved by: re	2007-10-08 23:50:39 +00:00
jeff	57102cf5ad	- Restore historical sched_yield() behavior by changing sched_relinquish() to simply switch rather than lowering priority and switching. This allows threads of equal priority to run but not lesser priority. Discussed with: davidxu Reported by: NIIMI Satoshi <sa2c@sa2c.net> Approved by: re	2007-10-08 23:45:24 +00:00
jeff	065472edb7	- Restore historical yield() behavior by manually lowering priority and switching. Approved by: re	2007-10-08 23:40:40 +00:00
jeff	5e4673913f	- Fix ULE in kernels without PREEMPTION compiled in by always enabling the critical_exit() owepreempt check. ULE will always use owepreempt to preempt the idle thread. This change does not effect 4BSD since it will never set owepreempt without PREEMPTION enabled. - Remove some unused code from choosethread(). Discussed with: jhb Approved by: re	2007-10-08 23:37:28 +00:00
kmacy	caeef6111c	This patch adds an M_NOFREE flag which allows one to mark an mbuf as not being independently freeable. This allows one to embed an mbuf in the cluster itself. This confers the benefits of the packet zone on all cluster sizes. Embedded mbufs currently suffer from the same limitation that packet zone mbufs do in that one cannot disconnect them and pass them around independently of the cluster. It would likely be possible to eliminate this limitation in the future by adding a second reference for the mbuf itself. Approved by: re(gnn)	2007-10-06 21:42:39 +00:00
kmacy	c00108fd67	Allow drivers to free an mbuf without having the mbuf be touched if the driver has already freed any attached tags Approved by: re(gnn)	2007-10-06 21:13:55 +00:00
pjd	9281633138	Fix sx_try_slock(), so it only fails when there is an exclusive owner. Before that fix, it was possible for the function to fail if number of sharers changes between 'x = sx->sx_lock' step and atomic_cmpset_acq_ptr() call. This fixes ZFS problem when ZFS returns strange EIO errors under load. In ZFS there is a code that depends on the fact that sx_try_slock() can only fail if there is an exclusive owner. Discussed with: attilio Reviewed by: jhb Approved by: re (kensmith)	2007-10-02 14:48:48 +00:00
jeff	4a1456cf5d	- Reassign the thread queue lock to newtd prior to switching. Assigning after the switch leads to a race where the outgoing thread still owns the local queue lock while another cpu may switch it in. This race is only possible on machines where cpu_switch can take significantly longer on different cpus which in practice means HTT machines with unfair thread scheduling algorithms. Found by: kris (of course) Approved by: re	2007-10-02 01:30:18 +00:00
jeff	de71038e60	- Move the rebalancer back into hardclock to prevent potential softclock starvation caused by unbalanced interrupt loads. - Change the rebalancer to work on stathz ticks but retain randomization. - Simplify locking in tdq_idled() to use the tdq_lock_pair() rather than complex sequences of locks to avoid deadlock. Reported by: kris Approved by: re	2007-10-02 00:36:06 +00:00
jeff	644f587c97	- Honor the PREEMPTION and FULL_PREEMPTION flags by setting the default value for kern.sched.preempt_thresh appropriately. It can still by adjusted at runtime. ULE will still use IPI_PREEMPT in certain migration situations. - Assert that we're not trying to compile ULE on an unsupported architecture. To date, I believe only i386 and amd64 have implemented the third cpu switch argument required. Approved by: re	2007-09-27 16:39:27 +00:00
ru	78adc698af	Fix the description of the formula used to autosize the number of buffers in the buffer cache. Approved by: re (kensmith)	2007-09-26 11:22:23 +00:00
alc	d1bce06c64	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
jeff	ea91086eb3	- Bound the interactivity score so that it cannot become negative. Approved by: re	2007-09-24 00:28:54 +00:00

... 3 4 5 6 7 ...

10547 Commits