freebsd-dev

Author	SHA1	Message	Date
John Baldwin	6d541bf1ae	- Protect all accesses to nsw_[rw]count{,_{,a}sync} with the pbuf mutex. - Don't drop the vm mutex while grabbing the pbuf mutex to manipulate said variables.	2001-06-22 21:12:19 +00:00
Bosko Milekic	08442f8a82	Introduce numerous SMP friendly changes to the mbuf allocator. Namely, introduce a modified allocation mechanism for mbufs and mbuf clusters; one which can scale under SMP and which offers the possibility of resource reclamation to be implemented in the future. Notable advantages: o Reduce contention for SMP by offering per-CPU pools and locks. o Better use of data cache due to per-CPU pools. o Much less code cache pollution due to excessively large allocation macros. o Framework for `grouping' objects from same page together so as to be able to possibly free wired-down pages back to the system if they are no longer needed by the network stacks. Additional things changed with this addition: - Moved some mbuf specific declarations and initializations from sys/conf/param.c into mbuf-specific code where they belong. - m_getclr() has been renamed to m_get_clrd() because the old name is really confusing. m_getclr() HAS been preserved though and is defined to the new name. No tree sweep has been done "to change the interface," as the old name will continue to be supported and is not depracated. The change was merely done because m_getclr() sounds too much like "m_get a cluster." - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and systat(1) (see TODO below). - Fixed systat(1) to display number of "free mbufs" based on new per-CPU stat structures. - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported per-CPU stat structures. All infos are fetched via sysctl. TODO (in order of priority): - Re-enable mbtypes statistics in both netstat(1) and systat(1) after introducing an SMP friendly way to collect the mbtypes stats under the already introduced per-CPU locks (i.e. hopefully don't use atomic() - it seems too costly for a mere stat update, especially when other locks are already present). - Optionally have systat(1) display not only "total free mbufs" but also "total free mbufs per CPU pool." - Fix minor length-fetching issues in netstat(1) related to recently re-enabled option to read mbuf stats from a core file. - Move reference counters at least for mbuf clusters into an unused portion of the cluster itself, to save space and need to allocate a counter. - Look into introducing resource freeing possibly from a kproc. Reviewed by (in parts): jlemon, jake, silby, terry Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha) Preliminary performance measurements: jlemon (and me, obviously) URL: http://people.freebsd.org/~bmilekic/mb_alloc/	2001-06-22 06:35:32 +00:00
John Baldwin	ad6c5bbede	Don't lock around swap_pager_swap_init() that is only called once during the pagedaemon's startup code since it calls malloc which results in lock order reversals.	2001-06-20 23:34:06 +00:00
John Baldwin	69a78d4666	Put the scheduler, vmdaemon, and pagedaemon kthreads back under Giant for now. The proc locking isn't actually safe yet and won't be until the proc locking is finished.	2001-06-20 00:48:20 +00:00
Matthew Dillon	ef6a93ef81	Cleanup the tabbing	2001-06-11 19:17:05 +00:00
Matthew Dillon	ff2b5645b5	Two fixes to the out-of-swap process termination code. First, start killing processes a little earlier to avoid a deadlock. Second, when calculating the 'largest process' do not just count RSS. Instead count the RSS + SWAP used by the process. Without this the code tended to kill small inconsequential processes like, oh, sshd, rather then one of the many 'eatmem 200MB' I run on a whim :-). This fix has been extensively tested on -stable and somewhat tested on -current and will be MFCd in a few days. Shamed into fixing this by: ps	2001-06-09 18:06:58 +00:00
Thomas Moestl	5c5c8fa826	Change the way information about swap devices is exported to be more canonical: define a versioned struct xswdev, and add a sysctl node handler that allows the user to get this structure for a certain device index by specifying this index as last element of the MIB. This new node handler, vm.swap_info, replaces the old vm.nswapdev and vm.swapdevX.* (where X was the index) sysctls.	2001-06-01 22:53:10 +00:00
Thomas Moestl	d279178df7	Clean up the code exporting interrupt statistics via sysctl a bit: - move the sysctl code to kern_intr.c - do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine the length of the intrcnt array - move the declarations of intrnames, eintrnames, intrcnt and eintrcnt from machine-dependent include files to sys/interrupt.h - remove the hw.nintr sysctl, it is not needed. - fix various style bugs Requested by: bde Reviewed by: bde (some time ago)	2001-06-01 13:23:28 +00:00
John Baldwin	342a1480aa	Don't hold the VM lock across VOP's and other things that can sleep.	2001-05-29 16:58:25 +00:00
John Baldwin	190609dd48	Stick VM syscalls back under Giant if the BLEED option is not defined.	2001-05-24 18:04:29 +00:00
Matthew Dillon	ac8f990bde	This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon	2001-05-24 07:22:27 +00:00
John Baldwin	e6b961ffbd	- Assert Giant is held in the vnode pager methods. - Lock the VM while walking down a vm_object's backing_object list in vnode_pager_lock().	2001-05-23 22:51:23 +00:00
John Baldwin	3614c6fcbb	- Add in several asserts of vm_mtx. - Assert Giant in vm_pageout_scan() for the vnode hacking that it does. - Don't hold vm_mtx around vget() or vput(). - Lock Giant when calling vm_pageout_scan() from the pagedaemon. Also, lock curproc while setting the P_BUFEXHAUST flag. - For now we still hold Giant for all of the vm_daemon. When process limits are locked we will be only need Giant for swapout_procs().	2001-05-23 22:48:28 +00:00
John Baldwin	60517fd1f7	- Assert that the vm lock is held for all of _vm_object_allocate(). - Restore the previous order of setting up a new vm_object. The previous had a small bug where we zero'd out the flags after we set the OBJ_ONEMAPPING flag. - Add several asserts of vm_mtx. - Assert Giant is held rather than locking and unlocking it in a few places. - Add in some #ifdef objlocks code to lock individual vm objects when vm objects each have their own lock someday. - Don't bother acquiring the allproc lock for a ddb command. If DDB blocked on the lock, that would be worse than having an inconsistent allproc list.	2001-05-23 22:42:10 +00:00
John Baldwin	21c641b2a9	- Add lots of vm_mtx assertions. - Add a few KTR tracepoints to track the addition and removal of vm_map_entry's and the creation adn free'ing of vmspace's. - Adjust a few portions of code so that we update the process' vmspace pointer to its new vmspace before freeing the old vmspace.	2001-05-23 22:38:00 +00:00
John Baldwin	3a2189d451	- Lock the VM around the pmap_swapin_proc() call in faultin(). - Don't lock Giant in the scheduler() function except for when calling faultin(). - In swapout_procs(), lock the VM before the proccess to avoid a lock order violation. - In swapout_procs(), release the allproc lock before calling swapout(). We restart the process scan after swapping out a process. - In swapout_procs(), un #if 0 the code to bump the vmspace reference count and lock the process' vm structures. This bug was introduced by me and could result in the vmspace being free'd out from under a running process. - Fix an old bug where the vmspace reference count was not free'd if we failed the swap_idle_threshold2 test.	2001-05-23 22:35:45 +00:00
John Baldwin	b608320d4a	- Fix the sw_alloc_interlock to actually lock itself when the lock is acquired. - Assert Giant is held in the strategy, getpages, and putpages methods and the getchainbuf, flushchainbuf, and waitchainbuf functions. - Always call flushchainbuf() w/o the VM lock.	2001-05-23 22:31:15 +00:00
John Baldwin	6d556da5c2	Assert Giant is held for the device pager alloc and getpages methods since we call the mmap method of the cdevsw of the device we are mmap'ing.	2001-05-23 22:27:52 +00:00
John Baldwin	e4ca250d4b	- Obtain Giant in mmap() syscall while messing with file descriptors and vnodes. - Fix an old bug that would leak a reference to a fd if the vnode being mmap'd wasn't of type VREG or VCHR. - Lock Giant in vm_mmap() around calls into the VM that can call into pager routines that need Giant or into other VM routines that need Giant. - Replace code that used a goto to jump around the else branch of a test to use an else branch instead.	2001-05-23 22:17:43 +00:00
John Baldwin	bb10bb4978	Acquire Giant around vm_map_remove() inside of the obreak() syscall for vm_object_terminate().	2001-05-23 22:13:10 +00:00
John Baldwin	576f0c5fa4	Take a more conservative approach and still lock Giant around VM faults for now.	2001-05-23 22:09:18 +00:00
John Baldwin	c52f090cfb	Set the phys_pager_alloc_lock to 1 when it is acquired so that it is actually locked.	2001-05-23 19:52:23 +00:00
Alfred Perlstein	c5e62505ad	aquire Giant when playing with the buffercache and doing IO. use msleep against the vm mutex while waiting for a page IO to complete.	2001-05-23 10:28:11 +00:00
Alfred Perlstein	240e0fdd93	aquire vm mutex in swp_pager_async_iodone. Don't call swp_pager_async_iodone with the mutex held.	2001-05-22 19:01:26 +00:00
John Baldwin	86e92ee7e1	Remove duplicate include and sort includes.	2001-05-22 07:21:46 +00:00
John Baldwin	7d4ad42de5	Sort includes.	2001-05-22 07:01:11 +00:00
John Baldwin	12635f9c89	Unlock the VM lock at the end of munlock() instead of locking it again.	2001-05-22 06:07:36 +00:00
John Baldwin	874468957d	Sort includes from previous commit.	2001-05-22 05:35:45 +00:00
John Baldwin	4edf4a58e6	Sort includes.	2001-05-22 00:56:25 +00:00
Alfred Perlstein	2395531439	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
John Baldwin	ea7549540f	- Use a timeout for the tsleep in scheduler() instead of having vmmeter() wakeup proc0 by hand to enforce the timeout. - When swapping out a process, keep the process locked via the proc lock from the first checks up until we clear PS_INMEM and set PS_SWAPPING in swapout(). The swapout() function now must be called with the proc lock held and releases it before returning. - Comment out the code to attempt to lock a process' VM structures before swapping out. It is broken in that it releases the lock after obtaining it. If it does grab the lock, it needs to hand it off to swapout() instead of releasing it. This can be revisisted when the VM is locked as this is a valid test to perform. It also causes a lock order reversal for the time being, which is the immediate cause for temporarily disabling it.	2001-05-18 00:08:38 +00:00
John Baldwin	1c58e4e550	During the code to pick a process to kill when memory is exhausted, keep the process in question locked as soon as we find it and determine it to be eligible until we actually kill it. To avoid deadlock, we don't block on the process lock but skip any process that is already locked during our search.	2001-05-17 22:49:03 +00:00
John Baldwin	c96d52a913	- Use PROC_LOCK_ASSERT instead of a direct mtx_assert. - Don't hold Giant in the swapper daemon while we walk the list of processes looking for a process to swap back in. - Don't bother grabbing the sched_lock while checking a process' sleep time in swapout_procs() to ensure that a process has been idle for at least swap_idle_threshold2 before swapping it out. If we lose the race we just let a process stay in memory until the next call of swapout_procs(). - Remove some unneeded spl's, sched_lock does all the locking needed in this case.	2001-05-15 22:20:44 +00:00
Poul-Henning Kamp	a468031ce8	Actually biofinish(struct bio , struct devstat , int error) is more general than the bioerror(). Most of this patch is generated by scripts.	2001-05-06 20:00:03 +00:00
Mark Murray	559034b748	Putting sys/lockmgr.h in here allows us to depollute userland includes a bit. OK'ed by: bde	2001-05-03 11:33:51 +00:00
Mark Murray	fb919e4d5a	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
Greg Lehey	60fb0ce365	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
Alfred Perlstein	93c7ba9f09	Address a number of problems with sysctl_vm_zone(). The zone allocator's locks should be leaflocks, meaning that they should never be held when entering into another subsystem, however the sysctl grabs the zone global mutex and individual zone mutexes while holding the lock it calls SYSCTL_OUT which recurses into the VM subsystem in order to wire user memory to do a safe copy. This can block and cause lock order reversals. To fix this: lock zone global. get a count of the number of zones. unlock global. allocate temporary storage. format and SYSCTL_OUT the banner. lock global. traverse list. make sure we haven't looped more than the initial count taken to avoid overflowing the allocated buffer. lock each nodes. read values and format into buffer. unlock individual node. unlock global. format and SYSCTL_OUT the rest of the data. free storage. return. Other problems included not checking for errors when doing sysctl out of the column header. Fixed. Inconsistant termination of the copied string. Fixed. Objected to by: des (for not using sbuf) Since the output is not variable length and I'm actually over allocating signifigantly and I'd like to get this fixed now, I'll work on the sbuf convertion at a later date. I would not object to someone else taking it upon themselves to convert it to sbuf. I hold no MAINTIANER rights to this code (for now).	2001-04-27 22:24:45 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
Alfred Perlstein	d8d5fa8805	vnode_pager_freepage() is really vm_page_free() in disguise, nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()	2001-04-19 06:18:23 +00:00
Alfred Perlstein	a9fa2c05fc	Protect pager object creation with sx locks. Protect pager object list manipulation with a mutex. It doesn't look possible to combine them under a single sx lock because creation may block and we can't have the object list manipulation block on anything other than a mutex because of interrupt requests.	2001-04-18 20:24:16 +00:00
Alfred Perlstein	305dd591ee	Fix the botched rev 1.59 where I made it such that without INVARIANTS the map is never locked. Submitted by: tegge	2001-04-18 05:30:24 +00:00
Poul-Henning Kamp	f84e29a06c	This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.	2001-04-17 08:56:39 +00:00
Alfred Perlstein	cc64b484dd	use TAILQ_FOREACH, fix a comment's location	2001-04-15 10:22:04 +00:00
Alfred Perlstein	971dd34298	if/panic -> KASSERT	2001-04-13 11:15:40 +00:00
Alfred Perlstein	2a758ebe58	protect pbufs and associated counts with a mutex	2001-04-13 10:23:32 +00:00
Alfred Perlstein	493607117e	use %p for pointer printf, include sys/systm.h for printf proto	2001-04-13 10:22:14 +00:00
Alfred Perlstein	7d26b6a450	Use a macro wrapper over printf along with KASSERT to reduce the amount of code here.	2001-04-13 08:07:37 +00:00
Alfred Perlstein	b28cb1ca07	remove truncated part from commment	2001-04-12 21:50:03 +00:00
John Baldwin	1005a129e5	Convert the allproc and proctree locks from lockmgr locks to sx locks.	2001-03-28 11:52:56 +00:00
John Baldwin	f34fa851e0	Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>	2001-03-28 09:17:56 +00:00
Thomas Moestl	368d2edce4	Export intrnames and intrcnt as sysctls (hw.nintr, hw.intrnames and hw.intrcnt). Approved by: rwatson	2001-03-23 03:45:17 +00:00
Matthew Dillon	b823bbd6be	Fix a lock reversal problem in the VM subsystem related to threaded programs. There is a case during a fork() which can cause a deadlock. From Tor - The workaround that consists of setting a flag in the vm map that indicates that a fork is in progress and using that mark in the page fault handling to force a revalidation failure. That change will only affect (pessimize) page fault handling during fork for threaded (linuxthreads style) applications and applications using aio_*(). Submited by: tegge	2001-03-14 06:48:53 +00:00
Matthew Dillon	1a484d28dd	Temporarily remove the vm_map_simplify() call from vm_map_insert(). The call is correct, but it interferes with the massive hack called vm_map_growstack(). The call will be returned after our stack handling code is fixed. Reported by: tegge	2001-03-14 06:09:42 +00:00
Ian Dowse	d30344bdfa	When creating a shadow vm_object in vmspace_fork(), only one reference count was transferred to the new object, but both the new and the old map entries had pointers to the new object. Correct this by transferring the second reference. This fixes a panic that can occur when mmap(2) is used with the MAP_INHERIT flag. PR: i386/25603 Reviewed by: dillon, alc	2001-03-09 18:25:54 +00:00
John Baldwin	136d8f42b9	Unrevert the pmap_map() changes. They weren't broken on x86. Sense beaten into me by: peter	2001-03-07 05:29:21 +00:00
John Baldwin	4a01ebd482	Back out the pmap_map() change for now, it isn't completely stable on the i386.	2001-03-07 01:04:17 +00:00
John Baldwin	968950e5d1	- Rework pmap_map() to take advantage of direct-mapped segments on supported architectures such as the alpha. This allows us to save on kernel virtual address space, TLB entries, and (on the ia64) VHPT entries. pmap_map() now modifies the passed in virtual address on architectures that do not support direct-mapped segments to point to the next available virtual address. It also returns the actual address that the request was mapped to. - On the IA64 don't use a special zone of PV entries needed for early calls to pmap_kenter() during pmap_init(). This gets us in trouble because we end up trying to use the zone allocator before it is initialized. Instead, with the pmap_map() change, the number of needed PV entries is small enough that we can get by with a static pool that is used until pmap_init() is complete. Submitted by: dfr Debugging help: peter Tested by: me	2001-03-06 06:06:42 +00:00
Alfred Perlstein	8125b1e66e	Simplify vm_object_deallocate(), by decrementing the refcount first. This allows some of the conditionals to be combined.	2001-03-04 20:25:23 +00:00
Andrew Gallatin	c909b97167	Allocate vm_page_array and vm_page_buckets from the end of the biggest chunk of memory, rather than from the start. This fixes problems allocating bouncebuffers on alphas where there is only 1 chunk of memory (unlike PCs where there is generally at least one small chunk and a large chunk). Having 1 chunk had been fatal, because these structures take over 13MB on a machine with 1GB of ram. This doesn't leave much room for other structures and bounce buffers if they're at the front. Reviewed by: dfr, anderson@cs.duke.edu, silence on -arch Tested by: Yoriaki FUJIMORI <fujimori@grafin.fujimori.cache.waseda.ac.jp>	2001-03-01 19:21:24 +00:00
Matthew Dillon	5bf53acb74	If we intend to make the page writable without requiring another fault, make sure that PG_NOSYNC is properly set. Previously we only set it for a write-fault, but this can occur on a read-fault too. (will be MFCd prior to 4.3 freeze)	2001-02-28 04:26:43 +00:00
Robert Watson	edfa785a8e	Introduce per-swap area accounting in the VM system, and export this information via the vm.nswapdev sysctl (number of swap areas) and vm.swapdevX nodes (where X is the device), which contain the MIBs dev, blocks, used, and flags. These changes are required to allow top and other userland swap-monitoring utilities to run without setgid kmem. Submitted by: Thomas Moestl <tmoestl@gmx.net> Reviewed by: freebsd-audit	2001-02-23 18:46:21 +00:00
Dag-Erling Smørgrav	2f9564de0f	Fix formatting bugs introduced in sysctl_vm_zone() by the previous commit. Also, if SYSCTL_OUT() returns a non-zero value, stop at once.	2001-02-22 14:44:39 +00:00
Jake Burkholder	d5a08a6065	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.	2001-02-12 00:20:08 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
Poul-Henning Kamp	fc2ffbe604	Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 13:13:25 +00:00
Matthew Dillon	4e71e795a1	This commit represents work mainly submitted by Tor and slightly modified by myself. It solves a serious vm_map corruption problem that can occur with the buffer cache when block sizes > 64K are used. This code has been heavily tested in -stable but only tested somewhat on -current. An MFC will occur in a few days. My additions include the vm_map_simplify_entry() and minor buffer cache boundry case fix. Make the buffer cache use a system map for buffer cache KVM rather then a normal map. Ensure that VM objects are not allocated for system maps. There were cases where a buffer map could wind up with a backing VM object -- normally harmless, but this could also result in the buffer cache blocking in places where it assumes no blocking will occur, possibly resulting in corrupted maps. Fix a minor boundry case in the buffer cache size limit is reached that could result in non-optimal code. Add vm_map_simplify_entry() calls to prevent 'creeping proliferation' of vm_map_entry's in the buffer cache's vm_map. Previously only a simple linear optimization was made. (The buffer vm_map typically has only a handful of vm_map_entry's. This stabilizes it at that level permanently). PR: 20609 Submitted by: (Tor Egge) tegge	2001-02-04 06:19:28 +00:00
John Baldwin	45ece682fd	- Doh, lock faultin() with proc lock in scheduler(). - Lock p_swtime with sched_lock in scheduler() as well.	2001-01-25 01:38:09 +00:00
Jason Evans	1b367556b5	Convert all simplelocks to mutexes and remove the simplelock implementations.	2001-01-24 12:35:55 +00:00
John Baldwin	69b4045657	Argh, I didn't get this test right when I converted it. Break this up into two separate if's instead of nested if's. Also, reorder things slightly to avoid unnecessary mutex operations.	2001-01-24 12:23:17 +00:00
John Baldwin	8606d88043	- Catch up to proc flag changes. - Minimal proc locking. - Use queue macros.	2001-01-24 11:28:36 +00:00
John Baldwin	e2181d41d0	Add mtx_assert()'s to verify that kmem_alloc() and kmem_free() are called with Giant held.	2001-01-24 11:27:29 +00:00
John Baldwin	5074aecd6c	- Catch up to proc flag changes. - Proc locking in a few places. - faultin() now must be called with the proc lock held. - Split up swappable() into a couple of tests so that it can be locke in swapout_procs(). - Use queue macros.	2001-01-24 11:25:56 +00:00
John Baldwin	b939335607	- Catch up to proc flag changes.	2001-01-24 11:20:05 +00:00
John Baldwin	0f68b6595a	Add missing include.	2001-01-24 06:54:24 +00:00
Hajimu UMEMOTO	5d22597f3a	Add mibs to hold the number of forks since boot. New mibs are: vm.stats.vm.v_forks vm.stats.vm.v_vforks vm.stats.vm.v_rforks vm.stats.vm.v_kthreads vm.stats.vm.v_forkpages vm.stats.vm.v_vforkpages vm.stats.vm.v_rforkpages vm.stats.vm.v_kthreadpages Submitted by: Paul Herman <pherman@frenchfries.net> Reviewed by: alfred	2001-01-23 14:32:01 +00:00
Jake Burkholder	43be6e2fa2	Sigh. atomic_add_int takes a pointer, not an integer. Pointy-hat-to: des	2001-01-23 03:40:27 +00:00
Dag-Erling Smørgrav	ac2e223868	Use atomic operations to update the stat counters.	2001-01-23 01:11:11 +00:00
Dag-Erling Smørgrav	e78411656b	Call vm_zone_init() at the appropriate time. Reviewed by: jasone, jhb	2001-01-22 07:02:42 +00:00
Dag-Erling Smørgrav	0b0dfb6b07	Give this code a major facelift: - replace the simplelock in struct vm_zone with a mutex. - use a proper SLIST rather than a hand-rolled job for the zone list. - add a subsystem lock that protects the zone list and the statistics counters. - merge _zalloc() into zalloc() and _zfree() into zfree(), and move them below _zget() so there's no need for a prototype. - add two initialization functions: one which initializes the subsystem mutex and the zone list, and one that currently doesn't do anything. - zap zerror(); use KASSERTs instead. - dike out half of sysctl_vm_zone(), which was mostly trying to do manually what the snprintf() call could do better. Reviewed by: jhb, jasone	2001-01-22 07:01:50 +00:00
Dag-Erling Smørgrav	a3ea6d41b9	First step towards an MP-safe zone allocator: - have zalloc() and zfree() always lock the vm_zone. - remove zalloci() and zfreei(), which are now redundant. Reviewed by: bmilekic, jasone	2001-01-21 22:23:11 +00:00
Alfred Perlstein	030f23696c	fix comment which was outdated 3 years ago remove useless assignment purge entire file of 'register' keyword	2000-12-29 13:49:05 +00:00
Alfred Perlstein	6e4f51d1ac	clean up kmem_suballoc(): remove useless assignment remove 'register' variables	2000-12-29 13:05:22 +00:00
Assar Westerlund	68d74cd044	Make zalloc and zfree non-inline functions. This avoids having to have the code calling these be compiled with the same setting for INVARIANTS and SMP. Reviewed by: dillon	2000-12-27 02:54:37 +00:00
Matthew Dillon	2b6b0df712	This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit. Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things. NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.	2000-12-26 19:41:38 +00:00
Poul-Henning Kamp	065b25803d	Fix floppy drives on machines with lots of RAM. The fix works by reverting the ordering of free memory so that the chances of contig_malloc() succeeding increases. PR: 23291 Submitted by: Andrew Atrens <atrens@nortel.ca>	2000-12-18 20:12:13 +00:00
Seigo Tanimura	21cd6e6232	- If swap metadata does not fit into the KVM, reduce the number of struct swblock entries by dividing the number of the entries by 2 until the swap metadata fits. - Reject swapon(2) upon failure of swap_zone allocation. This is just a temporary fix. Better solutions include: (suggested by: dillon) o reserving swap in SWAP_META_PAGES chunks, and o swapping the swblock structures themselves. Reviewed by: alfred, dillon	2000-12-13 10:01:00 +00:00
Jake Burkholder	c0c2557090	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.	2000-12-13 00:17:05 +00:00
Matthew Dillon	02fa91d35e	Be less conservative with a recently added KASSERT. Certain edge cases with file fragments and read-write mmap's can lead to a situation where a VM page has odd dirty bits, e.g. 0xFC - due to being dirtied by an mmap and only the fragment (representing a non-page-aligned end of file) synced via a filesystem buffer. A correct solution that guarentees consistent m->dirty for the file EOF case is being worked on. In the mean time we can't be so conservative in the KASSERT.	2000-12-11 07:52:47 +00:00
David Malone	7cc0979fd6	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
Alfred Perlstein	b5861b3450	Really fix phys_pager: Backout the previous delta (rev 1.4), it didn't make any difference. If the requested handle is NULL then don't add it to the list of objects, to be found by handle. The problem is that when asking for a NULL handle you are implying you want a new object. Because objects with NULL handles were being added to the list, any further requests for phys backed objects with NULL handles would return a reference to the initial NULL handle object after finding it on the list. Basically one couldn't have more than one phys backed object without a handle in the entire system without this fix. If you did more than one shared memory allocation using the phys pager it would give you your initial allocation again.	2000-12-06 21:52:23 +00:00
Alfred Perlstein	54019b0afe	need to adjust allocation size to properly deal with non PAGE_SIZE allocations, specifically with allocations < PAGE_SIZE when the code doesn't work properly	2000-12-05 22:22:24 +00:00
Bruce Evans	03b67a395f	Backed out previous commit. Don't depend on namespace pollution in <sys/buf.h>.	2000-12-02 12:03:58 +00:00
John Baldwin	c5a44a6af6	Protect p_stat with sched_lock.	2000-12-02 06:09:44 +00:00
John Baldwin	c8a6b0011c	Protect p_stat with sched_lock.	2000-12-02 03:29:33 +00:00
Alfred Perlstein	82625cf321	remove unneded sys/ucred.h includes	2000-11-30 18:52:32 +00:00
Jake Burkholder	553629ebc9	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd	2000-11-22 07:42:04 +00:00
Robert Watson	cee313c431	o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT. This removes a reason that systat requires setgid kmem. More to come.	2000-11-20 00:39:04 +00:00
Matthew Dillon	936524aa02	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>	2000-11-18 23:06:26 +00:00
Matthew Dillon	ef0646f9d8	Add the splvm()'s suggested in PR 20609 to protect vm_pager_page_unswapped(). The remainder of the PR is still open. PR: kern/20609 (partial fix)	2000-11-18 21:11:23 +00:00

1 2 3 4 5 ...

1071 Commits