freebsd-nq

Author	SHA1	Message	Date
David Greenman	7c818168d5	Fixed a major bug that caused various pmap related panics, hangs, and reboots. The i386 pmap module uses a special area of kernel virtual memory for mapping page table pages when it needs to modify another process's virtual address space. It's called the 'alternate page table map'. There is only one of them and it's expected that only one process will be using it at once and that the operation is atomic. When the merged VM/buffer cache was implemented over a year ago, it became necessary to rundown VM pages at I/O completion. The unfortunate and unforeseen side effect of this is that pmap functions are now called at bio interrupt time. If there happened to be a process using the alternate page table map when this I/O completion occurred, it was possible for a different process's address space to be switched into the alternate page table map - leaving the current pmap process with the wrong address space mapped when the interrupt completed. This resulted in BAD things happening like pages being mapped or removed from the wrong address space, etc. Since a very common case of a process modifying another process's address space is during fork when the kernel stack is inserted, one of the most common manifestations of this bug was the kernel stack not being mapped properly, resulting in a silent hang or reboot. This made it VERY difficult to troubleshoot this bug (I've been trying to figure out the cause of this for >6 months). Fortunately, the set of conditions that must be true before this problem occurs is sufficiently rare that most people never saw the bug occur. As I/O rates increase, however, so does the frequency of the crashes. This problem used to kill wcarchive about every 10 days, but in more recent times when the traffic exceeded 100GB/day, the machine could barely manage 6 hours of uptime. The fix is to make certain that no process has the pages mapped that are involved in the I/O, before the I/O is started.
The pages are made busy, so no process will be able to map them, either, until the I/O has finished. This side-steps the issue by still allowing the pmap functions to be called at interrupt time, but also assuring that the alternate page table map won't be switched. Unfortunately, this appears to not be the only cause of this problem. :-( Reviewed by: dyson	1996-06-30 05:17:08 +00:00
Satoshi Asami	ad63a118b2	The Great PC98 Merge. All new code is "#ifdef PC98"ed so this should make no difference to PC/AT (and its clones) users. Ok'd by: core Submitted by: FreeBSD(98) development team	1996-06-14 11:02:28 +00:00
John Dyson	268e9c5397	Keep brelse from freeing busy pages.	1996-05-31 00:41:37 +00:00
John Dyson	301051a01e	Make sure that we don't place a busy or held page onto the PQ_CACHE queue.	1996-05-24 05:21:58 +00:00
John Dyson	b18bfc3da7	This set of commits to the VM system does the following, and contains contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>, Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me: More usage of the TAILQ macros. Additional minor fix to queue.h. Performance enhancements to the pageout daemon. Addition of a wait in the case that the pageout daemon has to run immediately. Slightly modify the pageout algorithm. Significant revamp of the pmap/fork code: 1) PTE's and UPAGES's are NO LONGER in the process's map. 2) PTE's and UPAGES's reside in their own objects. 3) TOTAL elimination of recursive page table pagefaults. 4) The page directory now resides in the PTE object. 5) Implemented pmap_copy, thereby speeding up fork time. 6) Changed the pv entries so that the head is a pointer and not an entire entry. 7) Significant cleanup of pmap_protect, and pmap_remove. 8) Removed significant amounts of machine dependent fork code from vm_glue. Pushed much of that code into the machine dependent pmap module. 9) Support more completely the reuse of already zeroed pages (Page table pages and page directories) as being already zeroed. Performance and code cleanups in vm_map: 1) Improved and simplified allocation of map entries. 2) Improved vm_map_copy code. 3) Corrected some minor problems in the simplify code. Implemented splvm (combo of splbio and splimp.) The VM code now seldom uses splhigh. Improved the speed of and simplified kmem_malloc. Minor mod to vm_fault to avoid using pre-zeroed pages in the case of objects with backing objects along with the already existent condition of having a vnode. (If there is a backing object, there will likely be a COW... With a COW, it isn't necessary to start with a pre-zeroed page.) Minor reorg of source to perhaps improve locality of ref.	1996-05-18 03:38:05 +00:00
Poul-Henning Kamp	aa8de40ae5	Another sweep over the pmap/vm macros, this time with more focus on the usage. I'm not satisfied with the naming, but now at least there is less bogus stuff around.	1996-05-03 21:01:54 +00:00
John Dyson	18ff64943e	Correct handling of dirty pages in I/O buffers. The case where pages residing in a buffer that had been dirtied by a process was being handled incorrectly. The pages were mistakenly placed into the cache queue. This would likely have the effect of mmaped page modifications being lost when I/O system calls were being used simultaneously to the same locations in a file. Submitted by: davidg	1996-03-09 06:46:51 +00:00
John Dyson	c735bcf57d	Fix the buffer queue problem differently. The previous fix could panic with a buffer not on queue panic.	1996-03-03 01:04:28 +00:00
John Dyson	6538dda3dc	1) Fix a bug that a buffer is removed from a queue, but the queue type is not set to QUEUE_NONE. This appears to have caused a hang bug that has been lurking. 2) Fix bugs that brelse'ing locked buffers do not "free" them, but the code assumes so. This can cause hangs when LFS is used. 3) Use malloced memory for directories when applicable. The amount of malloced memory is seriously limited, but should decrease the amount of memory used by an average directory to 1/4 - 1/2 previous. This capability is fully tunable. (Note that there is no config parameter, and might never be.) 4) Bias slightly the buffer cache usage towards non-VMIO buffers. Since the data in VMIO buffers is not lost when the buffer is reclaimed, this will help performance. This is adjustable also.	1996-03-02 04:40:56 +00:00
John Dyson	91477adc6e	Enable VMIO for non-VDIR metadata and block device.	1996-03-02 03:45:12 +00:00
John Dyson	bd7e5f992e	Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do a lot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30 seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.	1996-01-19 04:00:31 +00:00
David Greenman	7548aeb5c5	Print out the queue index if it's found to be inconsistent.	1996-01-06 23:58:03 +00:00
David Greenman	2199f986f1	Rework vm_hold_{load,free}_pages to calculate an index once and use that. At the same time, be sure to page-truncate bp->b_data so that the result of the calculation isn't negative.	1996-01-06 23:23:02 +00:00
Garrett Wollman	8890984dc9	Convert BOUNCE_BUFFERS and BOUNCEPAGES to new option scheme.	1996-01-05 20:12:53 +00:00
David Greenman	a5782ecc36	Fixed minor struct cred leak. Discovered while looking for the opposite condition - too many frees, which has yet to be found. Reviewed by: dyson	1996-01-04 06:09:00 +00:00
Poul-Henning Kamp	87b6de2b76	A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.	1995-12-14 08:32:45 +00:00
John Dyson	1cdb60485c	Fix a problem that was caused by new (partial) support for merged cache metadata and VBLK type devices. The code is currently mostly disabled, and a work-around has been added to disable attempted clustered writes for VBLK type device buffers. Clustered write of meta-data is currently a work in progress.	1995-12-13 03:47:01 +00:00
John Dyson	beb2f78fb0	This should have fixed some conditions that could cause the "getblk" hang. The B_WANTED flag was being cleared gratuitously, also the optimization of gbincore for ignoring the B_INVAL flag was incorrect. There is no place in the code where buffers are on the hash list that are B_INVAL and not B_BUSY.	1995-12-12 04:18:10 +00:00
John Dyson	a316d390bd	Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.	1995-12-11 04:58:34 +00:00
David Greenman	efeaf95a41	Untangled the vm.h include file spaghetti.	1995-12-07 12:48:31 +00:00
Poul-Henning Kamp	946bb7a268	A major sweep over the sysctl stuff. Move a lot of variables home to their own code (In good time before xmas :-) Introduce the string description of format. Add a couple more functions to poke into these marvels, while I try to decide what the correct interface should look like. Next is adding vars on the fly, and sysctl looking at them too. Removed a tiny bit of defunct and #ifdefed notused code in swapgeneric.	1995-12-04 16:48:58 +00:00
Bruce Evans	98d938220c	Completed function declarations and/or added prototypes.	1995-12-02 18:58:56 +00:00
Bruce Evans	d841aaa740	Finished (?) cleaning up sysinit stuff.	1995-12-02 17:11:20 +00:00
John Dyson	5fe17eeb8a	General fixes to the vfs clustering code: 1) Make cluster buffer list be a non-malloced chain. This eliminates yet another 'evil' M_WAITOK and generally cleans up the code. 2) Fix write clustering for ext2fs. It was just broken. Also, ffs clustering had an efficiency problem in that more bawrites were happening than should have been. 3) Make changes to buf.h to support the above, plus remove b_pfcent at the request of David Greenman. Reviewed by: davidg (partially)	1995-11-19 19:54:31 +00:00
John Dyson	0ada0d6748	Added a missing splx(s).	1995-11-18 23:33:48 +00:00
John Dyson	aef922f514	Greatly simplify the msync code. Eliminate complications in vm_pageout for msyncing. Remove a bug that manifests itself primarily on NFS (the dirty range on the buffers is not set on msync.)	1995-11-05 20:46:03 +00:00
Poul-Henning Kamp	a98ca4699e	Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.	1995-10-29 15:33:36 +00:00
John Dyson	6d875bf526	If we clear the B_CACHE flag because a buffer isn't composed fully of valid bytes, we must also clear the B_DONE flag. Some filesystems depend on this (incl NFS), and its absence is probably the cause of the biodone error and subsequent crash. Anyway this change needs to be made.	1995-10-19 23:48:25 +00:00
Steven Wallace	ad7507e248	Remove prototype definitions from <sys/systm.h>. Prototypes are located in <sys/sysproto.h>. Add appropriate #include <sys/sysproto.h> to files that needed protos from systm.h. Add structure definitions to appropriate files that relied on sys/systm.h, right before system call definition, as in the rest of the kernel source. In kern_prot.c, instead of using the dummy structure "args", create individual dummy structures named <syscall>_args. This makes life easier for prototype generation.	1995-10-08 00:06:22 +00:00
David Greenman	7329854ac4	Two critical bugfixes: 1) "obj" wasn't initialized properly, resulting in an important vm_page_lookup always failing (resulting in a panic). 2) busy pages could be put on the cache queue or freed (resulting in a panic).	1995-10-01 05:50:27 +00:00
John Dyson	164fd96f34	These changes fix a bug in the clustering code that I made worse when adding support for EXT2FS. Note that the Sig-11 problems appear to be caused by this, but there is still probably an underlying VM problem that let this clustering bug cause vnode objects to appear to be corrupted. The direct manifestation of this bug would have been severely mis-read files. It is possible that processes would Sig-11 on very damaged input files and might explain the mysterious differences in system behaviour when phk's malloc is being used.	1995-09-23 21:12:45 +00:00
David Greenman	4590fd3a2a	Fixed init functions argument type - caddr_t -> void *. Fixed a couple of compiler warnings.	1995-09-09 18:10:37 +00:00
John Dyson	c83ebe7781	Added VOP_GETPAGES/VOP_PUTPAGES and also the "backwards" block count for VOP_BMAP. Updated affected filesystems...	1995-09-04 00:21:16 +00:00
John Dyson	8c601f7da8	Improvements to the cluster code, minor vfs_bio efficiency: Better performance -- more aggressive read-ahead under certain circumstances. Mods to support clustering on small ( < PAGE_SIZE) block size filesystems (e.g. ext2fs, msdosfs.)	1995-09-03 19:56:15 +00:00
Julian Elischer	2b14f991e6	Reviewed by: julian with quick glances by bruce and others Submitted by: terry (terry lambert) This is a composite of 3 patch sets submitted by terry. they are: New low-level init code that supports loadable modules better some cleanups in the namei code to help terry in 16-bit character support some changes to the mount-root code to make it a little more modular.. NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases.. certainly mounting root off disk still works just fine.. mfs should work but is untested. (tomorrow's task) The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can be added to the kernel without editing any other files other than the 'files' file.	1995-08-28 09:19:25 +00:00
David Greenman	9f95e53cff	Another minor optimization, this time to incore().	1995-08-24 13:59:14 +00:00
David Greenman	ff3aaf2582	Minor optimization.	1995-08-24 13:28:16 +00:00
David Greenman	d63abe4172	Resize both VMIO and non-VMIO buffers if the size changes.	1995-08-06 12:10:39 +00:00
Bruce Evans	28f8db1403	Eliminate sloppy common-style declarations. There should be none left for the LINT configuration.	1995-07-29 11:44:31 +00:00
David Greenman	23953f95f6	Killed bogus casts in tsleep/wakeup calls.	1995-07-25 05:41:57 +00:00
David Greenman	cd9015557e	Fixed broken offset use in vfs_unbusy_pages() which resulted in several different types of panics/inconsistencies with NFS clients. Cleared PG_WANTED where appropriate. Added checks for buffer busy in allocbuf and biodone. Reviewed by: John Dyson	1995-07-25 05:03:06 +00:00
David Greenman	23f762689d	Panic if no object in biodone. Slightly optimized allocbuf() again.	1995-07-24 03:16:41 +00:00
David Greenman	be49bd16c6	Added some additional diagnostic information output when panicking in biodone().	1995-07-23 19:37:52 +00:00
David Greenman	31de6175b7	Fixed two cases where some parens were missing, resulting in some bogus logic. Slightly simplified allocbuf().	1995-07-23 18:49:48 +00:00
David Greenman	44918dfed7	Re-lookup the buffer if the vnode isn't locked. The previous check for VBLK vnodes isn't adequate since all NFS nodes aren't locked, either. The result is a race condition that would lead to duplicate buffers at the same block offset. Submitted by: John Dyson	1995-07-21 04:55:45 +00:00
David Greenman	1ce781c3af	Fixed "bufspace" calculation. It was lossy in some circumstances of the buffer resizing and caused a "newbuf" deadlock. Reviewed by: John Dyson & David Greenman Submitted by: Peter Wemm	1995-07-17 06:26:07 +00:00
David Greenman	8393c48a22	Resize buffers if they aren't the correct size. Several months ago we made a change to NFS that caused buffers at EOF to be variable size. This had the undesired side-effect of breaking delayed writes on NFS. This fixes it. Submitted by: John Dyson	1995-07-15 16:01:46 +00:00
David Greenman	aa2cabb958	1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.	1995-06-28 12:01:13 +00:00
Rodney W. Grimes	9b2e535452	Remove trailing whitespace.	1995-05-30 08:16:23 +00:00
David Greenman	61f5d51062	Changes to fix the following bugs: 1) Files weren't properly synced on filesystems other than UFS. In some cases, this led to lost data. Most likely would be noticed on NFS. The fix is to make the VM page sync/object_clean general rather than in each filesystem. 2) Mixing regular and mmaped file I/O on NFS was very broken. It caused chunks of files to end up as zeroes rather than the intended contents. The fix was to fix several race conditions and to kludge up the "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention to page modifications that occurred via the mmapping. Reviewed by: David Greenman Submitted by: John Dyson	1995-05-21 21:39:31 +00:00

94 Commits