freebsd-nq

Author	SHA1	Message	Date
Alfred Perlstein	8125b1e66e	Simplify vm_object_deallocate(), by decrementing the refcount first. This allows some of the conditionals to be combined.	2001-03-04 20:25:23 +00:00
Andrew Gallatin	c909b97167	Allocate vm_page_array and vm_page_buckets from the end of the biggest chunk of memory, rather than from the start. This fixes problems allocating bouncebuffers on alphas where there is only 1 chunk of memory (unlike PCs where there is generally at least one small chunk and a large chunk). Having 1 chunk had been fatal, because these structures take over 13MB on a machine with 1GB of ram. This doesn't leave much room for other structures and bounce buffers if they're at the front. Reviewed by: dfr, anderson@cs.duke.edu, silence on -arch Tested by: Yoriaki FUJIMORI <fujimori@grafin.fujimori.cache.waseda.ac.jp>	2001-03-01 19:21:24 +00:00
Matthew Dillon	5bf53acb74	If we intend to make the page writable without requiring another fault, make sure that PG_NOSYNC is properly set. Previously we only set it for a write-fault, but this can occur on a read-fault too. (will be MFCd prior to 4.3 freeze)	2001-02-28 04:26:43 +00:00
Robert Watson	edfa785a8e	Introduce per-swap area accounting in the VM system, and export this information via the vm.nswapdev sysctl (number of swap areas) and vm.swapdevX nodes (where X is the device), which contain the MIBs dev, blocks, used, and flags. These changes are required to allow top and other userland swap-monitoring utilities to run without setgid kmem. Submitted by: Thomas Moestl <tmoestl@gmx.net> Reviewed by: freebsd-audit	2001-02-23 18:46:21 +00:00
Dag-Erling Smørgrav	2f9564de0f	Fix formatting bugs introduced in sysctl_vm_zone() by the previous commit. Also, if SYSCTL_OUT() returns a non-zero value, stop at once.	2001-02-22 14:44:39 +00:00
Jake Burkholder	d5a08a6065	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.	2001-02-12 00:20:08 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
Poul-Henning Kamp	fc2ffbe604	Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 13:13:25 +00:00
Matthew Dillon	4e71e795a1	This commit represents work mainly submitted by Tor and slightly modified by myself. It solves a serious vm_map corruption problem that can occur with the buffer cache when block sizes > 64K are used. This code has been heavily tested in -stable but only tested somewhat on -current. An MFC will occur in a few days. My additions include the vm_map_simplify_entry() and minor buffer cache boundry case fix. Make the buffer cache use a system map for buffer cache KVM rather then a normal map. Ensure that VM objects are not allocated for system maps. There were cases where a buffer map could wind up with a backing VM object -- normally harmless, but this could also result in the buffer cache blocking in places where it assumes no blocking will occur, possibly resulting in corrupted maps. Fix a minor boundry case in the buffer cache size limit is reached that could result in non-optimal code. Add vm_map_simplify_entry() calls to prevent 'creeping proliferation' of vm_map_entry's in the buffer cache's vm_map. Previously only a simple linear optimization was made. (The buffer vm_map typically has only a handful of vm_map_entry's. This stabilizes it at that level permanently). PR: 20609 Submitted by: (Tor Egge) tegge	2001-02-04 06:19:28 +00:00
John Baldwin	45ece682fd	- Doh, lock faultin() with proc lock in scheduler(). - Lock p_swtime with sched_lock in scheduler() as well.	2001-01-25 01:38:09 +00:00
Jason Evans	1b367556b5	Convert all simplelocks to mutexes and remove the simplelock implementations.	2001-01-24 12:35:55 +00:00
John Baldwin	69b4045657	Argh, I didn't get this test right when I converted it. Break this up into two separate if's instead of nested if's. Also, reorder things slightly to avoid unnecessary mutex operations.	2001-01-24 12:23:17 +00:00
John Baldwin	8606d88043	- Catch up to proc flag changes. - Minimal proc locking. - Use queue macros.	2001-01-24 11:28:36 +00:00
John Baldwin	e2181d41d0	Add mtx_assert()'s to verify that kmem_alloc() and kmem_free() are called with Giant held.	2001-01-24 11:27:29 +00:00
John Baldwin	5074aecd6c	- Catch up to proc flag changes. - Proc locking in a few places. - faultin() now must be called with the proc lock held. - Split up swappable() into a couple of tests so that it can be locke in swapout_procs(). - Use queue macros.	2001-01-24 11:25:56 +00:00
John Baldwin	b939335607	- Catch up to proc flag changes.	2001-01-24 11:20:05 +00:00
John Baldwin	0f68b6595a	Add missing include.	2001-01-24 06:54:24 +00:00
Hajimu UMEMOTO	5d22597f3a	Add mibs to hold the number of forks since boot. New mibs are: vm.stats.vm.v_forks vm.stats.vm.v_vforks vm.stats.vm.v_rforks vm.stats.vm.v_kthreads vm.stats.vm.v_forkpages vm.stats.vm.v_vforkpages vm.stats.vm.v_rforkpages vm.stats.vm.v_kthreadpages Submitted by: Paul Herman <pherman@frenchfries.net> Reviewed by: alfred	2001-01-23 14:32:01 +00:00
Jake Burkholder	43be6e2fa2	Sigh. atomic_add_int takes a pointer, not an integer. Pointy-hat-to: des	2001-01-23 03:40:27 +00:00
Dag-Erling Smørgrav	ac2e223868	Use atomic operations to update the stat counters.	2001-01-23 01:11:11 +00:00
Dag-Erling Smørgrav	e78411656b	Call vm_zone_init() at the appropriate time. Reviewed by: jasone, jhb	2001-01-22 07:02:42 +00:00
Dag-Erling Smørgrav	0b0dfb6b07	Give this code a major facelift: - replace the simplelock in struct vm_zone with a mutex. - use a proper SLIST rather than a hand-rolled job for the zone list. - add a subsystem lock that protects the zone list and the statistics counters. - merge _zalloc() into zalloc() and _zfree() into zfree(), and move them below _zget() so there's no need for a prototype. - add two initialization functions: one which initializes the subsystem mutex and the zone list, and one that currently doesn't do anything. - zap zerror(); use KASSERTs instead. - dike out half of sysctl_vm_zone(), which was mostly trying to do manually what the snprintf() call could do better. Reviewed by: jhb, jasone	2001-01-22 07:01:50 +00:00
Dag-Erling Smørgrav	a3ea6d41b9	First step towards an MP-safe zone allocator: - have zalloc() and zfree() always lock the vm_zone. - remove zalloci() and zfreei(), which are now redundant. Reviewed by: bmilekic, jasone	2001-01-21 22:23:11 +00:00
Alfred Perlstein	030f23696c	fix comment which was outdated 3 years ago remove useless assignment purge entire file of 'register' keyword	2000-12-29 13:49:05 +00:00
Alfred Perlstein	6e4f51d1ac	clean up kmem_suballoc(): remove useless assignment remove 'register' variables	2000-12-29 13:05:22 +00:00
Assar Westerlund	68d74cd044	Make zalloc and zfree non-inline functions. This avoids having to have the code calling these be compiled with the same setting for INVARIANTS and SMP. Reviewed by: dillon	2000-12-27 02:54:37 +00:00
Matthew Dillon	2b6b0df712	This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit. Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things. NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.	2000-12-26 19:41:38 +00:00
Poul-Henning Kamp	065b25803d	Fix floppy drives on machines with lots of RAM. The fix works by reverting the ordering of free memory so that the chances of contig_malloc() succeeding increases. PR: 23291 Submitted by: Andrew Atrens <atrens@nortel.ca>	2000-12-18 20:12:13 +00:00
Seigo Tanimura	21cd6e6232	- If swap metadata does not fit into the KVM, reduce the number of struct swblock entries by dividing the number of the entries by 2 until the swap metadata fits. - Reject swapon(2) upon failure of swap_zone allocation. This is just a temporary fix. Better solutions include: (suggested by: dillon) o reserving swap in SWAP_META_PAGES chunks, and o swapping the swblock structures themselves. Reviewed by: alfred, dillon	2000-12-13 10:01:00 +00:00
Jake Burkholder	c0c2557090	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.	2000-12-13 00:17:05 +00:00
Matthew Dillon	02fa91d35e	Be less conservative with a recently added KASSERT. Certain edge cases with file fragments and read-write mmap's can lead to a situation where a VM page has odd dirty bits, e.g. 0xFC - due to being dirtied by an mmap and only the fragment (representing a non-page-aligned end of file) synced via a filesystem buffer. A correct solution that guarentees consistent m->dirty for the file EOF case is being worked on. In the mean time we can't be so conservative in the KASSERT.	2000-12-11 07:52:47 +00:00
David Malone	7cc0979fd6	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
Alfred Perlstein	b5861b3450	Really fix phys_pager: Backout the previous delta (rev 1.4), it didn't make any difference. If the requested handle is NULL then don't add it to the list of objects, to be found by handle. The problem is that when asking for a NULL handle you are implying you want a new object. Because objects with NULL handles were being added to the list, any further requests for phys backed objects with NULL handles would return a reference to the initial NULL handle object after finding it on the list. Basically one couldn't have more than one phys backed object without a handle in the entire system without this fix. If you did more than one shared memory allocation using the phys pager it would give you your initial allocation again.	2000-12-06 21:52:23 +00:00
Alfred Perlstein	54019b0afe	need to adjust allocation size to properly deal with non PAGE_SIZE allocations, specifically with allocations < PAGE_SIZE when the code doesn't work properly	2000-12-05 22:22:24 +00:00
Bruce Evans	03b67a395f	Backed out previous commit. Don't depend on namespace pollution in <sys/buf.h>.	2000-12-02 12:03:58 +00:00
John Baldwin	c5a44a6af6	Protect p_stat with sched_lock.	2000-12-02 06:09:44 +00:00
John Baldwin	c8a6b0011c	Protect p_stat with sched_lock.	2000-12-02 03:29:33 +00:00
Alfred Perlstein	82625cf321	remove unneded sys/ucred.h includes	2000-11-30 18:52:32 +00:00
Jake Burkholder	553629ebc9	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd	2000-11-22 07:42:04 +00:00
Robert Watson	cee313c431	o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT. This removes a reason that systat requires setgid kmem. More to come.	2000-11-20 00:39:04 +00:00
Matthew Dillon	936524aa02	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>	2000-11-18 23:06:26 +00:00
Matthew Dillon	ef0646f9d8	Add the splvm()'s suggested in PR 20609 to protect vm_pager_page_unswapped(). The remainder of the PR is still open. PR: kern/20609 (partial fix)	2000-11-18 21:11:23 +00:00
Matthew Dillon	279d722604	This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>	2000-11-18 21:01:04 +00:00
Tor Egge	028fe6ec24	Clear the MAP_ENTRY_USER_WIRED flag from cloned vm_map entries. PR: 2840	2000-11-02 21:38:18 +00:00
Poul-Henning Kamp	9f69a4578a	Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing the offending inline function (BUF_KERNPROC) on it being #included already. I'm not sure BUF_KERNPROC() is even the right thing to do or in the right place or implemented the right way (inline vs normal function). Remove consequently unneeded #includes of <sys/proc.h>	2000-10-29 14:54:55 +00:00
John Baldwin	915cf38b11	- Catch a machine/mutex.h -> sys/mutex.h I somehow missed. - Close a small race condition. The sched_lock mutex protects p->p_stat as well as the run queues. Another CPU could change p_stat of the process while we are waiting for the lock, and we would end up scheduling a process that isn't runnable.	2000-10-25 00:04:16 +00:00
Paul Saab	c794ceb56a	Implement write combining for crashdumps. This is useful when write caching is disabled on both SCSI and IDE disks where large memory dumps could take up to an hour to complete. Taking an i386 scsi based system with 512MB of ram and timing (in seconds) how long it took to complete a dump, the following results were obtained: Before: After: WCE TIME WCE TIME ------------------ ------------------ 1 141.820972 1 15.600111 0 797.265072 0 65.480465 Obtained from: Yahoo! Reviewed by: peter	2000-10-17 10:05:49 +00:00
Matthew Dillon	64bcb9c815	The swap bitmap allocator was not calculating the bitmap size properly in the face of non-stripe-aligned swap areas. The bug could cause a panic during boot. Refuse to configure a swap area that is too large (67 GB or so) Properly document the power-of-2 requirement for SWB_NPAGES. The patch is slightly different then the one Tor enclosed in the P.R., but accomplishes the same thing. PR: kern/20273 Submitted by: Tor.Egge@fast.no	2000-10-13 16:44:34 +00:00
Jason Evans	9722d88fba	For lockmgr mutex protection, use an array of mutexes that are allocated and initialized during boot. This avoids bloating sizeof(struct lock). As a side effect, it is no longer necessary to enforce the assumtion that lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has been removed. Idea taken from: BSD/OS.	2000-10-12 22:37:28 +00:00
David Malone	a77c9610f2	If a process is over its resource limit for datasize, still allow it to lower its memory usage. This was mentioned on the mailing lists ages ago, and I've lost the name of the person who brought it up. Reviewed by: alc	2000-10-06 13:03:50 +00:00

1 2 3 4 5 ...

963 Commits