freebsd-skq

Author	SHA1	Message	Date
phk	cd531ac811	Make the first two pages magic to protect the BSD labels rather than only one.	2003-08-06 14:13:38 +00:00
phk	09d8ecf0bf	Staticize swap_pager_putpages() Eliminate a lot of checkes to make sure requests are not cross-device which is unnecessary with the new layout. We know a sequential request cannot possibly be cross-device because there is a reserved page between the devices. Remove a couple of comments which no longer are relevant.	2003-08-06 12:08:27 +00:00
phk	890df5b795	Explicitly set B_PAGING	2003-08-06 09:22:47 +00:00
phk	d2426c0f94	Rip out the totally bogos vnode swapdev_vp with extreeme prejudice. Don't mark buffers with B_KEEPGIANT, we don't drop giant in strategy at this point in time.	2003-08-06 06:53:31 +00:00
phk	d0c4c329b1	Use sparse struct initialization for struct pagerops. Mark our buffers B_KEEPGIANT before sending them downstream. Remove swap_pager_strategy implementation.	2003-08-05 06:54:56 +00:00
phk	8952fe0759	Put an uncovered page between the swap devices, that way we can be sure to not get any cross-device I/O requests. (The unallocated first page protecting BSD labels already gave us this, but that hack may go away at some point in time). Remove the check for cross-device I/O requests in swap_pager_strategy. Move the repeated statistics updating into flushchainbuf().	2003-08-04 08:22:49 +00:00
phk	049e0c4c31	Name swap_pager_find_dev() more correctly swp_pager_finde_dev(). Use ->bio_children to count child buffers, rather than abuse the bio_caller1 pointer. Expand the relevant bits of waitchainbuf() inline, this clarifies the code a little bit.	2003-08-03 21:22:42 +00:00
phk	2b330fbdd3	I accidentally hit undo before committing, fix the resulting off-by-one.	2003-08-03 14:53:52 +00:00
phk	b51aac6e92	Change the layout policy of the swap_pager from a hardcoded width striping to a per device round-robin algorithm. Because of the policy of not attempting to retain previous swap allocation on page-out, this means that a newly added swap device almost instantly takes its 1/N share of the I/O load but it takes somewhat longer for it to assume it's 1/N share of the pages if there is plenty of space on the other devices. Change the 8G total swapspace limitation to 8G per device instead by using a per device blist rather than one global blist. This reduces the memory footprint by 75% (typically a couple hundred kilobytes) for the common case with one swapdevice but NSWAPDEV=4. Remove the compile time constant limit of number of swap devices, there is no limit now. Instead of a fixed size array, store the per swapdev structure in a TAILQ. Total swap space is still addressed by a 32 bit page number and therefore the upper limit is now 2^42 bytes = 16TB (for i386). We still do not allocate the first page of each device in order to give some amount of protection to any bsdlabel at the start of the device. A new device is appended after the existing devices in the swap space, no attempt is made to fill in holes left behind by swapoff (this can trivially be changed should it ever become a problem). The sysctl vm.nswapdev now reflects the number of currently configured swap devices. Rename vm_swap_size to swap_pager_avail for consistency with other exported names. Change argument type for vm_proc_swapin_all() and swap_pager_isswapped() to be a struct swdevt pointer rather than an index. Not changed: we are still using blists to manage the free space, but since the swapspace is no longer fragmented by the striping different resource managers might fare better.	2003-08-03 13:35:31 +00:00
phk	1a2b042c99	Remove unused stuff. Move used stuff to swap_pager.c where it belongs. This file no longer exports anything to userland.	2003-07-31 22:19:28 +00:00
phk	6221ef9078	Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file " since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/ For now pass -1 all over the place.	2003-07-26 07:32:23 +00:00
phk	676c7dc42a	Remove all but one of the inlines here, this reduces the code size by 2032 bytes and has no measurable impact on performance.	2003-07-22 20:54:26 +00:00
peter	27c163f9f7	swp_pager_hash() was called before it was instantiated inline. This made gcc (quite rightly) unhappy. Move it earlier.	2003-07-22 06:55:48 +00:00
phk	ea98d2c3e5	Fix a printf format warning I introduced. Use the macro max number of swap devices rather than cache the constant in a variable. Avoid a (now) pointless variable.	2003-07-18 22:11:17 +00:00
phk	6fd98eaab7	If a proposed swap device exceeds the 8G artificial limit which out radix-tree code imposes, truncate the device instead of rejecting it.	2003-07-18 11:01:23 +00:00
phk	aa8896f3b8	Move the implementation of the vmspace_swap_count() (used only in the "toss the largest process" emergency handling) from vm_map.c to swap_pager.c. The quantity calculated depends strongly on the internals of the swap_pager and by moving it, we no longer need to expose the internal metrics of the swap_pager to the world.	2003-07-18 10:47:58 +00:00
phk	5fa40a3265	Add a new function swap_pager_status() which reports the total size of the paging space and how much of it is in use (in pages). Use this interface from the Linuxolator instead of groping around in the internals of the swap_pager.	2003-07-18 10:26:09 +00:00
phk	84f9cb2fa8	Merge swap_pager.c and vm_swap.c into swap_pager.c, the separation is not natural and needlessly exposes a lot of dirty laundry. Move private interfaces between the two from swap_pager.h to swap_pager.c and staticize as much as possible. No functional change.	2003-07-18 10:02:44 +00:00
phk	a8381d2cc6	Make sure that SWP_NPAGES always has the same value in all source files, so that SWAP_META_PAGES does not vary either. swap_pager.c ended up with a value of 16, everybody else 8. Go with the 16 for now. This should only have any effect in the "kill processes because we are out of swap" scenario, where it will make some sort of estimate of something more precise.	2003-07-17 21:58:43 +00:00
alc	a4f8c4746a	Maintain the lock on a vm object when calling vm_page_grab().	2003-06-25 04:53:56 +00:00
alc	6106a85499	Make swap_pager_haspages() static; remove unused function prototypes.	2003-06-20 20:20:06 +00:00
phk	bd4fe20bfd	This file was ignored by CVS in my last commit for some reason: Remove pointless initialization of b_spc field, which now no longer exists.	2003-06-16 09:31:15 +00:00
alc	d2198cef89	Extend the scope of the vm object lock in swp_pager_async_iodone() to cover a vm_page_free().	2003-06-13 06:17:42 +00:00
alc	d66a37a0f2	Add vm object locking to various pagers' "get pages" methods, i386 stack management functions, and a u area management function.	2003-06-13 03:02:28 +00:00
obrien	b0678d7a44	Use __FBSDID().	2003-06-11 23:50:51 +00:00
alc	c57f179395	Assert that the vm object is locked on entry to swap_pager_freespace().	2003-06-07 20:43:16 +00:00
alc	d18bfec38d	Lock the vm_object when performing vm_pager_deallocate().	2003-05-06 02:45:28 +00:00
alc	afa35c49d2	- Lock the vm_object when performing swap_pager_isswapped(). - Assert that the vm_object is locked in swap_pager_isswapped().	2003-04-28 17:13:53 +00:00
alc	373b18b5c3	- Convert vm_object_pip_wait() from using tsleep() to msleep(). - Make vm_object_pip_sleep() static. - Lock the vm_object when performing vm_object_pip_wait().	2003-04-26 18:33:18 +00:00
alc	114f28a272	- Lock the vm_object when performing vm_object_pip_add(). - Remove an unnecessary variable.	2003-04-20 07:08:30 +00:00
alc	5990076d78	- Lock the vm_object when performing vm_object_pip_add().	2003-04-20 03:41:21 +00:00
alc	dc48d3db81	- Lock the vm_object when performing vm_object_pip_subtract(). - Assert that the vm_object lock is held in vm_object_pip_subtract().	2003-04-19 22:11:41 +00:00
alc	ef4e8a19cf	- Lock the vm_object when performing vm_object_pip_wakeupn(). - Assert that the vm_object lock is held in vm_object_pip_wakeupn(). - Add a new macro VM_OBJECT_LOCK_ASSERT().	2003-04-19 21:15:44 +00:00
imp	cf874b345d	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
alfred	bf8e8a6e8f	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
phk	88bd74d184	Avoid extern decls in .c files by putting them in the vm/swap_pager.h include file where they belong. Share the dmmax_mask variable.	2003-01-03 14:30:46 +00:00
phk	daf6948653	Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.	2003-01-03 06:32:15 +00:00
alc	a0ec4e5670	Hold the page queues lock when performing vm_page_flag_set().	2002-12-18 04:02:02 +00:00
dillon	b43fb3e920	This is David Schultz's swapoff code which I am finally able to commit. This should be considered highly experimental for the moment. Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU> MFC after: 3 weeks	2002-12-15 19:17:57 +00:00
alc	631af658fa	Remove vm_page_protect(). Instead, use pmap_page_protect() directly.	2002-11-18 04:05:22 +00:00
cognet	1404d98db6	Remove extra #include<sys/vmmeter.h>.	2002-11-11 13:57:50 +00:00
phk	1dfc2c167f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
jeff	4792c0673b	- Lock access to numoutput on the swap devices.	2002-09-25 01:24:17 +00:00
dillon	1703af0c56	Reduce the maximum KVA reserved for swap meta structures from 70 to 32 MB. Reduce the swap meta calculation by a factor of 2, it's still massive overkill. X-MFC after: immediately	2002-08-31 21:15:29 +00:00
alc	b9718e8d79	o Lock page queue accesses by vm_page_free().	2002-07-21 20:38:45 +00:00
alc	d9f77c5b73	o Lock page queue accesses by vm_page_try_to_cache(). (The accesses in kern/vfs_bio.c are already locked.) o Assert that the page queues lock is held in vm_page_try_to_cache().	2002-07-20 20:58:46 +00:00
iedowse	e62ef89fc1	Avoid using the 64-bit vm_pindex_t in a few places where 64-bit types are not required, as the overhead is unnecessary: o In the i386 pmap_protect(), `sindex' and `eindex' represent page indices within the 32-bit virtual address space. o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary variable to store the low few bits of a vm_pindex_t that gets used as an array index. o vm_uiomove() uses `osize' and `idx' for page offsets within a map entry. o In vm_object_split(), `idx' is a page offset within a map entry.	2002-06-26 20:32:51 +00:00
iedowse	c3868afc98	Use an explicit cast to avoid relying on sign extension to do the right thing in code such as `vm_pindex_t x = ~SWAP_META_MASK'. Reviewed by: dillon	2002-06-26 19:18:14 +00:00
alc	ef43ad02fe	o Replace GIANT_REQUIRED in swap_pager_alloc() by the acquisition and release of Giant. (Annotate as MPSAFE.)	2002-06-22 08:03:21 +00:00
jhb	db9aa81e23	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
jeff	9ef9bf2eaf	Remove references to vm_zone.h and switch over to the new uma API.	2002-03-20 04:02:59 +00:00
alfred	1446d09429	Remove __P.	2002-03-19 22:20:14 +00:00
jeff	2923687da3	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@	2002-03-19 09:11:49 +00:00
eivind	0799ec54b1	- Remove a number of extra newlines that do not belong here according to style(9) - Minor space adjustment in cases where we have "( ", " )", if(), return(), while(), for(), etc. - Add /* SYMBOL */ after a few #endifs. Reviewed by: alc	2002-03-10 21:52:48 +00:00
jhb	b8b3ac8816	Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.	2002-02-27 19:18:10 +00:00
phk	b9b775cf13	GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED.	2002-02-22 09:26:35 +00:00
tegge	56f1506892	Don't use an uninitialized field reserved for callers in the bio structure passed to swap_pager_strategy(). Instead, use a field reserved for drivers and initialize it before usage. Reviewed by: dillon	2001-10-15 23:02:54 +00:00
jhb	4806d88677	Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.	2001-10-11 23:38:17 +00:00
dillon	05c33a209b	Limit the amount of KVM reserved for the buffer cache and for swap-meta information. The default limits only effect machines with > 1GB of ram and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and kern.maxbcache. This has the effect of leaving more KVM available for sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad adds memory to a machine and then sees the kernel panic on boot due to running out of KVM. Also change the default swap-meta auto-sizing calculation to allocate half of what it was previously allocating. The prior defaults were way too high. Note that we cannot afford to run out of swap-meta structures so we still stay somewhat conservative here.	2001-08-20 00:41:12 +00:00
alfred	1d105403d0	Fixups for the initial allocation by dillon: 1) allocate fewer buckets 2) when failing to allocate swap zone, keep reducing the zone by a third rather than a half in order to reduce the chance of allocating way too little. I also moved around some code for readability. Suggested by: dillon Reviewed by: dillon	2001-08-02 07:54:58 +00:00
dillon	cbc4469f38	whitespace / register cleanup	2001-07-04 19:00:13 +00:00
dillon	e028603b7e	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
jhb	ce09fa9fbc	- Protect all accesses to nsw_[rw]count{,_{,a}sync} with the pbuf mutex. - Don't drop the vm mutex while grabbing the pbuf mutex to manipulate said variables.	2001-06-22 21:12:19 +00:00
jhb	38c4b5afb4	- Fix the sw_alloc_interlock to actually lock itself when the lock is acquired. - Assert Giant is held in the strategy, getpages, and putpages methods and the getchainbuf, flushchainbuf, and waitchainbuf functions. - Always call flushchainbuf() w/o the VM lock.	2001-05-23 22:31:15 +00:00
alfred	587340efa6	aquire Giant when playing with the buffercache and doing IO. use msleep against the vm mutex while waiting for a page IO to complete.	2001-05-23 10:28:11 +00:00
alfred	5382981252	aquire vm mutex in swp_pager_async_iodone. Don't call swp_pager_async_iodone with the mutex held.	2001-05-22 19:01:26 +00:00
alfred	a3f0842419	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
phk	16caeec9b0	Actually biofinish(struct bio , struct devstat , int error) is more general than the bioerror(). Most of this patch is generated by scripts.	2001-05-06 20:00:03 +00:00
alfred	e6ee97803e	Protect pager object creation with sx locks. Protect pager object list manipulation with a mutex. It doesn't look possible to combine them under a single sx lock because creation may block and we can't have the object list manipulation block on anything other than a mutex because of interrupt requests.	2001-04-18 20:24:16 +00:00
alfred	bbee48d66d	protect pbufs and associated counts with a mutex	2001-04-13 10:23:32 +00:00
rwatson	9a2d215a5f	Introduce per-swap area accounting in the VM system, and export this information via the vm.nswapdev sysctl (number of swap areas) and vm.swapdevX nodes (where X is the device), which contain the MIBs dev, blocks, used, and flags. These changes are required to allow top and other userland swap-monitoring utilities to run without setgid kmem. Submitted by: Thomas Moestl <tmoestl@gmx.net> Reviewed by: freebsd-audit	2001-02-23 18:46:21 +00:00
tanimura	a8dbeb3f7b	- If swap metadata does not fit into the KVM, reduce the number of struct swblock entries by dividing the number of the entries by 2 until the swap metadata fits. - Reject swapon(2) upon failure of swap_zone allocation. This is just a temporary fix. Better solutions include: (suggested by: dillon) o reserving swap in SWAP_META_PAGES chunks, and o swapping the swblock structures themselves. Reviewed by: alfred, dillon	2000-12-13 10:01:00 +00:00
dwmalone	dd75d1d73b	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
rwatson	4a8be80602	o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT. This removes a reason that systat requires setgid kmem. More to come.	2000-11-20 00:39:04 +00:00
dillon	2ace352085	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>	2000-11-18 23:06:26 +00:00
dillon	15a44d16ca	This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>	2000-11-18 21:01:04 +00:00
dillon	b9b696f1ab	The swap bitmap allocator was not calculating the bitmap size properly in the face of non-stripe-aligned swap areas. The bug could cause a panic during boot. Refuse to configure a swap area that is too large (67 GB or so) Properly document the power-of-2 requirement for SWB_NPAGES. The patch is slightly different then the one Tor enclosed in the P.R., but accomplishes the same thing. PR: kern/20273 Submitted by: Tor.Egge@fast.no	2000-10-13 16:44:34 +00:00
peter	ee5cd6988f	Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly to various pmap_*() functions instead of looking up the physical address and passing that. In many cases, the first thing the pmap code was doing was going to a lot of trouble to get back the original vm_page_t, or it's shadow pv_table entry. Inspired by: John Dyson's 1998 patches. Also: Eliminate pv_table as a seperate thing and build it into a machine dependent part of vm_page_t. This eliminates having a seperate set of structions that shadow each other in a 1:1 fashion that we often went to a lot of trouble to translate from one to the other. (see above) This happens to save 4 bytes of physical memory for each page in the system. (8 bytes on the Alpha). Eliminate the use of the phys_avail[] array to determine if a page is managed (ie: it has pv_entries etc). Store this information in a flag. Things like device_pager set it because they create vm_page_t's on the fly that do not have pv_entries. This makes it easier to "unmanage" a page of physical memory (this will be taken advantage of in subsequent commits). Add a function to add a new page to the freelist. This could be used for reclaiming the previously wasted pages left over from preloaded loader(8) files. Reviewed by: dillon	2000-05-21 12:50:18 +00:00
phk	36c3965ff9	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter	2000-05-05 09:59:14 +00:00
phk	c7feb17572	Convert the vm_pager_strategy() interface to take a struct bio instead of a struct buf. Don't try to examine B_ASYNC, it is a layering violation to do so. The only current user of this interface is vn(4) which, since it emulates a disk interface, operates on struct bio already.	2000-05-03 07:47:46 +00:00
phk	54f7afc04f	Move and staticize the bufchain functions so they become local to the only piece of code using them. This will ease a rewrite of them.	2000-05-01 19:38:51 +00:00
phk	aaaef0b54e	Complete the bio/buf divorce for all code below devfs::strategy Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS	2000-04-15 05:54:02 +00:00
phk	8ee11d587f	Move B_ERROR flag to b_ioflags and call it BIO_ERROR. (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.	2000-04-02 15:24:56 +00:00
dillon	5ccef75e02	Add necessary spl protection for swapper. The problem was located by Alfred while testing his SPLASSERT stuff. This is not a complete fix, more protections are probably needed.	2000-03-27 21:33:32 +00:00
charnier	5c16c2a8f1	Revert spelling mistake I made in the previous commit Requested by: Alan and Bruce	2000-03-27 20:41:17 +00:00
charnier	686df89909	Spelling	2000-03-26 15:20:23 +00:00
phk	1b817afc6d	Fix one place which knew that B_WRITE was zero. Fix a stylistic mistake of mine while here. Found by: Stephen Hocking <shocking@prth.pgs.com>	2000-03-22 08:40:13 +00:00
phk	5df766a0f8	Rename the existing BUF_STRATEGY() to DEV_STRATEGY() substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.	2000-03-20 11:29:10 +00:00
phk	a246e10f55	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.	2000-03-20 10:44:49 +00:00
phk	6b3385b773	Eliminate the undocumented, experimental, non-delivering and highly dangerous MAX_PERF option.	2000-03-16 08:51:55 +00:00
peter	38a9421a1e	Fix the swap backed vn case - this was broken by my rev 1.128 to swap_pager.c and related commits. Essentially swap_pager.c is backed out to before the changes, but swapdev_vp is converted into a real vnode with just VOP_STRATEGY(). It no longer abuses specfs vnops and no longer needs a dev_t and /dev/drum (or /dev/swapdev) for the intermediate layer. This essentially restores the vnode interface as the interface to the bottom of the swap pager, and vm_swap.c provides a clean vnode interface. This will need to be revisited when we swap to files (vnodes) - which is the other reason for keeping the vnode interface between the swap pager and the swap devices. OK'ed by: dillon	1999-12-28 07:30:55 +00:00
phk	84a3f8a8d2	Isolate the swapdev_vp "not quite" vnode in the only source file which needs it now that /dev/drum is gone. Reviewed by: eivind, peter	1999-11-22 15:27:09 +00:00
peter	bae4ed31fd	Remove the non-functional "swap device" userland front-end to the multiplexed underlying swap devices (/dev/drum). The only thing it did was to allow root to open /dev/drum, but not do anything with it. Various utilities used to grovel around in here, but Matt has written a much nicer (and clean) front-end to this for libkvm, and nothing uses the old system any more. The VM system was calling VOP_STRATEGY() on the vp of the first underlying swap device (not the /dev/drum one, the first real device), and using the VOP system to indirectly (and only) call swstrategy() to choose an underlying device and enqueue it on that device. I have changed it to avoid diverting through the VOP system and to call the only possible target directly, saving a little bit of time and some complexity. In all, nothing much changes, except some scaffolding to support the roundabout way of calling swstrategy() is gone. Matt gave me the ok to do this some time ago, and I apologize for taking so long to get around to it.	1999-11-18 06:55:40 +00:00
phk	8e3c3eafed	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.	1999-10-29 18:09:36 +00:00
dillon	7a0052d268	Fix a number of spl bugs related to reserving and freeing swap space. Swap space can be freed from an interrupt and so swap reservation and freeing must occur at splvm. Add swap_pager_reserve() code to support a new swap pre-reservation capability for the VN device. Generally cleanup the swap code by simplifying the swp_pager_meta_build() static function and consolidating the SWAPBLK_NONE test from a bit test to an absolute compare. The bit test was left over from a rejected swap allocation scheme that was not ultimately committed. A few other minor cleanups were also made. Reorganize the swap strategy code, again for VN support, to not reallocate swap when writing as this messes up pre-reservation and can fragment I/O unnecessarily as VN-baesd disk is messed around with. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>	1999-09-17 05:09:24 +00:00
peter	3b842d34e8	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
bde	aff861d544	Use devtoname to print dev_t's instead of casting them to u_long for misprinting with %lx. Cast pointers to intptr_t instead of casting them to long. Cosmetic.	1999-08-23 23:55:03 +00:00
alc	fb63bd8ded	Correct an accidental omission of one "vm_page_undirty" replacement from the previous commit.	1999-08-17 05:56:00 +00:00
alc	075745f2e2	Add the (inline) function vm_page_undirty for clearing the dirty bitmask of a vm_page. Use it. Submitted by: dillon	1999-08-17 04:02:34 +00:00
alc	ab1033c835	Remove vm_object::last_read. It is used by the old swap pager, but not by the new one, i.e., vm/swap_pager.c rev 1.108. Reviewed by: dillon@backplane.com	1999-07-16 05:11:37 +00:00
peter	5d3d376190	Kirk missed a required BUF_KERNPROC(). Even though this is a non-async transfer, the b_iodone hook causes biodone() to release it from interrupt context.	1999-06-27 22:08:38 +00:00
mckusick	5b58f2f951	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.	1999-06-26 02:47:16 +00:00
phk	f57a01ebfc	remove b_proc from struct buf, it's (now) unused. Reviewed by: dillon, bde	1999-05-06 20:00:34 +00:00
julian	0c3f3973d2	Submitted by: Matt Dillon <dillon@freebsd.org> The old VN device broke in -4.x when the definition of B_PAGING changed. This patch fixes this plus implements additional capabilities. The new VN device can be backed by a file ( as per normal ), or it can be directly backed by swap. Due to dependencies in VM include files (on opt_xxx options) the new vn device cannot be a module yet. This will be fixed in a later commit. This commit delimitted by tags {PRE,POST}_MATT_VNDEV	1999-03-14 09:20:01 +00:00
dillon	fa06d8f968	Remove conditional sysctl's Leave swap_async_max sysctl intact, remove swap_cluster_max sysctl. Reviewed by: Alan Cox <alc@cs.rice.edu>	1999-02-21 08:34:15 +00:00
dillon	338a9c530d	Reviewed by: Alan Cox <alc@cs.rice.edu> Fix problem w/ low-swap/low-memory handling as reported by Bruce Evans.	1999-02-21 08:30:49 +00:00
dillon	03d8f2679e	Limit number of simultanious asynchronous swap pager I/Os that can be in progress at any given moment. Add two swap tuneables to sysctl: vm.swap_async_max: 4 vm.swap_cluster_max: 16 Recommended values are a cluster size of 8 or 16 pages. async_max is about right for 1-4 swap devices. Reduce to 2 if swap is eating too much bandwidth, or even 1 if swap is both eating too much bandwidth and sitting on a slow network (10BaseT). The defaults work well across a broad range of configurations and should normally be left alone.	1999-02-18 19:57:33 +00:00
dillon	638895c103	Add hysteresis to the 'swap_pager_getswapspace; failed' console message. Also widen the hysteresis levels a little ( these really should be dynamically configured ).	1999-02-06 07:22:21 +00:00
dillon	ca8ef4ff13	Remove unintended trigraph sequences in comments for -Wall	1999-01-27 18:19:53 +00:00
dillon	a4c067a459	Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL to use the vm_page_dirty() inline. The inline can thus do sanity checks ( or not ) over all cases.	1999-01-24 06:04:52 +00:00
dillon	9ed8ff237b	vm_pager_put_pages() is passed an rcval array to hold per-page return values. The 'int' return value for the procedure was never used and not well defined in any case when there are mixed errors on pages, so it has been removed. vm_pager_put_pages() and associated vm_pager functions now return void.	1999-01-24 02:32:15 +00:00
dillon	e3c11331d3	The default_pager's interaction with the swap_pager has been reorganized, and the swap_pager has been completely replaced. The new swap pager uses the new blist radix-tree based bitmap allocator for low level swap allocation and deallocation. The new allocator is effectively O(5) while the old one was O(N), and the new allocator allocates all required memory at init time rather then at allocate memory on the fly at run time. Swap metadata is allocated in clusters and stored in a hash table, eliminating linearly allocated structures. Many, many features have been rewritten or added. Swap space is now reallocated on the fly providing a poor-mans auto defragmentation of swap space. Swap space that is no longer needed is freed on a timely basis so no garbage collection is necessary. Swap I/O is marked B_ASYNC and NFS has been fixed to do the right thing with it, so NFS-based paging now has around 10x the performance as it did before ( previously NFS enforced synchronous I/O for paging ).	1999-01-21 09:33:07 +00:00
dillon	df24433bbe	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
eivind	89e1199534	KNFize, by bde.	1999-01-10 01:58:29 +00:00
eivind	a8dc66f457	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith	1999-01-08 17:31:30 +00:00
dt	065be55870	Don't free swap in swap_pager_getpages(): this code probably cause the "dying daemons" problem. (I thought this code was introduced in rev.1.80, but it just relaxed the condition.) Also, kill related "suggest more swap space" warning (also introduced in 1.80). It was confusing, to say the least... Requested by: msmith Not objected by: dg	1998-12-29 22:53:51 +00:00
bde	9094f7c05e	Fixed a null pointer panic in spc_free(). swap_pager_putpages() almost always causes this panic for the curproc != pageproc case. This case apparently doesn't happen in normal operation, but it happens when vm_page_alloc_contig() is called when there is a memory hogging application that hasn't already been paged out. PR: 8632 Reviewed by: info@opensound.com (Dev Mazumdar), dg Broken in: rev.1.89 (1998/02/23)	1998-11-19 06:20:42 +00:00
peter	8ef35acf90	Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.	1998-10-31 15:31:29 +00:00
dg	3defb6d13f	Fixed two potentially serious classes of bugs: 1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.	1998-10-13 08:24:45 +00:00
dfr	e2df972eb1	Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.	1998-09-04 08:06:57 +00:00
dfr	5fdaeb281d	Change various syscalls to use size_t arguments instead of u_int. Add some overflow checks to read/write (from bde). Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptable. Reviewed by: Bruce Evans <bde@zeta.org.au>	1998-08-24 08:39:39 +00:00
dfr	a1b2079000	Protect all modifications to paging_in_progress with splvm().	1998-08-13 08:05:13 +00:00
bde	d7aa77e789	Fixed two spl nesting bugs. They caused (at least) the entire pageout daemon to run at splvm() forever after swap_pager_putpages() is called from vm_pageout_scan(). Broken in: rev.1.189 (1998/02/23)	1998-07-28 15:30:01 +00:00
bde	f0b863f4b5	Fixed printf format errors.	1998-07-11 07:46:16 +00:00
julian	4363221ba2	VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>	1998-07-04 20:45:42 +00:00
dyson	dfdb369a7d	Work around some VM bugs, the worst being an overly aggressive swap space free calculation. More complete fixes will be forthcoming, in a week.	1998-05-04 03:01:44 +00:00
dyson	b5a79794cd	Tighten up management of memory and swap space during map allocation, deallocation cycles. This should provide a measurable improvement on swap and memory allocation on loaded systems. It is unlikely a complete solution. Also, provide more map info with procfs. Chuck Cranor spurred on this improvement.	1998-04-29 04:28:22 +00:00
bde	b598f559b2	Support compiling with `gcc -ansi'.	1998-04-15 17:47:40 +00:00
dyson	8ceb6160f4	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.	1998-03-07 21:37:31 +00:00
dyson	69e5a1e9f5	1) Use a more consistent page wait methodology. 2) Do not unnecessarily force page blocking when paging pages out. 3) Further improve swap pager performance and correctness, including fixing the paging in progress deadlock (except in severe I/O error conditions.) 4) Enable vfs_ioopt=1 as a default. 5) Fix and enable the page prezeroing in SMP mode. All in all, SMP systems especially should show a significant improvement in "snappyness."	1998-03-01 04:18:54 +00:00
dyson	014146c040	Fix page prezeroing for SMP, and fix some potential paging-in-progress hangs. The paging-in-progress diagnosis was a result of Tor Egge's excellent detective work. Submitted by: Partially from Tor Egge.	1998-02-25 03:56:15 +00:00
dyson	c4e82fbab0	Significantly improve the efficiency of the swap pager, which appears to have declined due to code-rot over time. The swap pager rundown code has been clean-up, and unneeded wakeups removed. Lots of splbio's are changed to splvm's. Also, set the dynamic tunables for the pageout daemon to be more sane for larger systems (thereby decreasing the daemon overheadla.)	1998-02-23 08:22:48 +00:00
eivind	d7a6ab2803	Staticize.	1998-02-09 06:11:36 +00:00
eivind	4547a09753	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
eivind	c552a9a1c3	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
dyson	7fde1e8379	This fix should help the panic problems in -current. There were some errors in "interval" management. Due to the clustering mechanism, the code is necessarily complex and error prone.	1998-02-03 00:50:36 +00:00
dyson	ef6e7f7b8d	Fix a performance problem caused by an earlier commit.	1998-02-01 02:00:20 +00:00
dyson	2aacd1ab4f	Change the busy page mgmt, so that when pages are freed, they MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtile "holes." Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistant scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.	1998-01-31 11:56:53 +00:00
dyson	197bd655c4	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)	1998-01-22 17:30:44 +00:00
dyson	b130b30c96	Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtile bugs aren't fixed yet.	1998-01-17 09:17:02 +00:00
dyson	d97fabbb53	Support running with inadequate swap space. Additionally, the code will complain with a suggestion of increasing it.	1997-12-24 15:05:25 +00:00
phk	a1bfb618d9	In all such uses of struct buf: 's/b_un.b_addr/b_data/g'	1997-12-02 21:07:20 +00:00
bde	a08aff2d02	Removed unused #includes.	1997-09-01 03:17:34 +00:00
bde	98fcb3f476	Print a device number in hex instead of decimal.	1997-09-01 02:28:32 +00:00
bde	c978fb3652	Fixed type mismatches for functions with args of type vm_prot_t and/or vm_inherit_t. These types are smaller than ints, so the prototypes should have used the promoted type (int) to match the old-style function definitions. They use just vm_prot_t and/or vm_inherit_t. This depends on gcc features to work. I fixed the definitions since this is easiest. The correct fix may be to change the small types to u_int, to optimize for time instead of space.	1997-08-25 22:15:31 +00:00
peter	94b6d72794	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.	1997-02-22 09:48:43 +00:00
jkh	808a36ef65	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.	1997-01-14 07:20:47 +00:00
dyson	73dafb0b2c	Prepare better for multi-platform by eliminating another required pmap routine (pmap_is_referenced.) Upper level recoded to use pmap_ts_referenced.	1997-01-11 07:19:02 +00:00
bde	2529e37cab	Removed __pure's and __pure2's. __pure is a no-op for recent versions of gcc by definition, and __pure2 is a no-op in effect (presumably the compiler can see when an inline function has no side effects).	1996-10-12 20:09:48 +00:00
dyson	62b009f8b1	Addition of page coloring support. Various levels of coloring are afforded. The default level works with minimal overhead, but one can also enable full, efficient use of a 512K cache. (Parameters can be generated to support arbitrary cache sizes also.)	1996-09-08 20:44:49 +00:00

1 2 3 4 5 ...

320 Commits