freebsd-dev

Author	SHA1	Message	Date
Kirk McKusick	02b0085406	Prettyness police: Identify flags in b_xflags with BX_ to distinguish them from flags in b_flags which are prefixed with B_	1999-12-22 03:11:04 +00:00
Matthew Dillon	4f79d873c1	Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to madvise(). This feature prevents the update daemon from gratuitously flushing dirty pages associated with a mapped file-backed region of memory. The system pager will still page the memory as necessary and the VM system will still be fully coherent with the filesystem. Modifications made by other means to the same area of memory, for example by write(), are unaffected. The feature works on a page-granularity basis. MAP_NOSYNC allows one to use mmap() to share memory between processes without incuring any significant filesystem overhead, putting it in the same performance category as SysV Shared memory and anonymous memory. Reviewed by: julian, alc, dg	1999-12-12 03:19:33 +00:00
Eivind Eklund	6bdfe06ad9	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter	1999-12-11 16:13:02 +00:00
Matthew Dillon	245efbba4d	Remove vfs_getrootfsid() function (a temporary hack added a few months ago to make BOOTP work again). It is no longer required by BOOTP and no longer used.	1999-11-29 22:25:36 +00:00
Poul-Henning Kamp	38224dcd59	Convert various pieces of code to use vn_isdisk() rather than checking for vp->v_type == VBLK. In ccd: we don't need to call VOP_GETATTR to find the type of a vnode. Reviewed by: sos	1999-11-22 10:33:55 +00:00
Poul-Henning Kamp	0429e37ade	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967	1999-11-20 10:00:46 +00:00
Poul-Henning Kamp	1b7277516b	Commit the remaining part of PR14914: Alot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914	1999-11-16 16:28:58 +00:00
Poul-Henning Kamp	698f9cf828	Next step in the device cleanup process. Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code. Unify spec_open() for bdev and cdev cases. Remove the disabled bdev specific read/write code.	1999-11-09 14:15:33 +00:00
Poul-Henning Kamp	923502ff91	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.	1999-10-29 18:09:36 +00:00
Peter Wemm	d1f088dab5	Trim unused options (or #ifdef for undoc options). Submitted by: phk	1999-10-11 15:19:12 +00:00
Poul-Henning Kamp	aa4f4b695e	Move the buffered read/write code out of spec_{read\|write} and into two new functions spec_buf{read\|write}. Add sysctl vfs.bdev_buffered which defaults to 1 == true. This sysctl can be used to experimentally turn buffered behaviour for bdevs off. I should not be changed while any blockdevices are open. Remove the misplaced sysctl vfs.enable_userblk_io. No other changes in behaviour.	1999-10-04 11:23:10 +00:00
Poul-Henning Kamp	1b5464ef9d	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde	1999-09-29 20:05:33 +00:00
Matthew Dillon	40360b1bbb	Final commit to remove vnode->v_lastr. vm_fault now handles read clustering issues (replacing code that used to be in ufs/ufs/ufs_readwrite.c). vm_fault also now uses the new VM page counter inlines. This completes the changeover from vnode->v_lastr to vm_entry_t->v_lastr for VM, and fp->f_nextread and fp->f_seqcount (which have been in the tree for a while). Determination of the I/O strategy (sequential, random, and so forth) is now handled on a descriptor-by-descriptor basis for base I/O calls, and on a memory-region-by-memory-region and process-by-process basis for VM faults. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>	1999-09-21 00:36:16 +00:00
Poul-Henning Kamp	552f337f1f	Initialize vp->v_maxio to its default in getnetvnode() rather than four different places in vfs_cluster.c	1999-09-20 19:53:23 +00:00
Matthew Dillon	e6f7111170	Fix BOOTP root FS mounts. Also cleanup vfs_getnewfsid() and collapse addaliasu() into addalias() (no operational change) and clarify comments relating to a trick that vclean() uses. The fix to BOOTP is yet another hack. Actually, rootfsid handling is already a major hack. The whole thing needs to be cleaned up. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>	1999-09-19 06:24:21 +00:00
Matthew Dillon	bb01f28e97	Add vfs.enable_userblk_io sysctl to control whether user reads and writes to buffered block devices are allowed. The default is to be backwards compatible, i.e. reads and writes are allowed. The idea is for a larger crowd to start running with this disabled and see what problems, if any, crop up, and then to change the default to off and see if any problems crop up in the next 6 months prior to potentially removing support entirely. There are still a few people, Julian and myself included, who believe the buffered block device access from usermode to be useful. Remove use of vnode->v_lastr from buffered block device I/O in preparation for removal of vnode->v_lastr field, replacing it with the already existing seqcount metric to detect sequential operation. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>	1999-09-17 06:10:27 +00:00
Poul-Henning Kamp	d137accc89	Add dev_t freeing code. Controlled by sysctl debug.free_devt, default is off.	1999-08-29 09:09:12 +00:00
Poul-Henning Kamp	9626728875	remove unused variables.	1999-08-28 19:21:03 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Poul-Henning Kamp	dbafb3660f	Simplify the handling of VCHR and VBLK vnodes using the new dev_t: Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a prexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED;	1999-08-26 14:53:31 +00:00
Poul-Henning Kamp	41d2e3e09e	Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.	1999-08-25 12:24:39 +00:00
Julian Elischer	0ff7b13acd	Make DEVFS use PHK's specinfo struct as the source of dev_t and devsw. In lookup() however it's the other way around as we need to supply the dev_t for the vnode, so devfs still has a copy of it stashed away. Sourcing it from the vnode in the vnops however is useful as it makes a lot of the code almost the same as that in specfs.	1999-08-25 04:55:20 +00:00
John Polstra	a2801b7731	Support full-precision file timestamps. Until now, only the seconds have been maintained, and that is still the default. A new sysctl variable "vfs.timestamp_precision" can be used to enable higher levels of precision: 0 = seconds only; nanoseconds zeroed (default). 1 = seconds and nanoseconds, accurate within 1/HZ. 2 = seconds and nanoseconds, truncated to microseconds. >=3 = seconds and nanoseconds, maximum precision. Level 1 uses getnanotime(), which is fast but can be wrong by up to 1/HZ. Level 2 uses microtime(). It might be desirable for consistency with utimes() and friends, which take timeval structures rather than timespecs. Level 3 uses nanotime() for the higest precision. I benchmarked levels 0, 1, and 3 by copying a 550 MB tree with "cpio -pdu". There was almost negligible difference in the system times -- much less than 1%, and less than the variation among multiple runs at the same level. Bruce Evans dreamed up a torture test involving 1-byte reads with intervening fstat() calls, but the cpio test seems more realistic to me. This feature is currently implemented only for the UFS (FFS and MFS) filesystems. But I think it should be easy to support it in the others as well. An earlier version of this was reviewed by Bruce. He's not to blame for any breakage I've introduced since then. Reviewed by: bde (an earlier version of the code)	1999-08-22 00:15:16 +00:00
Poul-Henning Kamp	7dc5cd047f	The bdevsw() and cdevsw() are now identical, so kill the former.	1999-08-13 10:29:38 +00:00
Poul-Henning Kamp	4d4f932326	s/v_specinfo/v_rdev/	1999-08-13 10:10:12 +00:00
Poul-Henning Kamp	0ef1c82630	Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.	1999-08-08 18:43:05 +00:00
Alan Cox	6745299365	Add sysctl and support code to allow directories to be VMIO'd. The default setting for the sysctl is OFF, which is the historical operation. Submitted by: dillon	1999-07-26 06:25:53 +00:00
Poul-Henning Kamp	698bfad7f2	Now a dev_t is a pointer to struct specinfo which is shared by all specdev vnodes referencing this device. Details: cdevsw->d_parms has been removed, the specinfo is available now (== dev_t) and the driver should modify it directly when applicable, and the only driver doing so, does so: vn.c. I am not sure the logic in checking for "<" was right before, and it looks even less so now. An intial pool of 50 struct specinfo are depleted during early boot, after that malloc had better work. It is likely that fewer than 50 would do. Hashing is done from udev_t to dev_t with a prime number remainder hash, experiments show no better hash available for decent cost (MD5 is only marginally better) The prime number used should not be close to a power of two, we use 83 for now. Add new checkalias2() to get around the loss of info from dev2udev() in bdevvp(); The aliased vnodes are hung on a list straight of the dev_t, and speclisth[SPECSZ] is unused. The sharing of struct specinfo means that the v_specnext moves into the vnode which grows by 4 bytes. Don't use a VBLK dev_t which doesn't make sense in MFS, now we hang a dummy cdevsw on B/Cmaj 253 so that things look sane. Storage overhead from all of this is O(50k). Bump __FreeBSD_version to 400009 The next step will add the stuff needed so device-drivers can start to hang things from struct specinfo	1999-07-20 09:47:55 +00:00
Poul-Henning Kamp	3de280c443	[click] Now all dev_t's in the kernel have their char device major. Only know casualy of this is swapinfo/pstat which should be fixes the right way: Store the actual pathname in the kernel like mount does. [Volounteers sought for this task] The road map from here is roughly: expand struct specinfo into struct based dev_t. Add dev_t registration facilities for device drivers and start to use them.	1999-07-19 09:37:59 +00:00
Poul-Henning Kamp	6ca5486476	Introduce the vn_todev(struct vnode*) function, which returns the dev_t corresponding to a VBLK or VCHR node, or NODEV.	1999-07-18 14:30:37 +00:00
Poul-Henning Kamp	c7119ea7dd	Fix 2nd arg to udev2dev().	1999-07-17 19:38:00 +00:00
Poul-Henning Kamp	f008cfcc1a	I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g	1999-07-17 18:43:50 +00:00
Kris Kennaway	e7647e6c20	Correct a couple of spelling errors in comments.	1999-07-12 15:02:51 +00:00
Kirk McKusick	ad8ac923fa	These changes appear to give us benefits with both small (32MB) and large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bring the machine to a grinding halt. * buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems. * minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propogate to deeper inlines and should produce better code. * removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely. * removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal. * buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system. * removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers. * slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers. * addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation. * Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ). * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ). * removal of extra arguments passed to getnewbuf() that are not necessary. * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c * vn_write() now calls bwillwrite() PRIOR to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently. * removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon. Submitted by: Matthew Dillon <dillon@backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-08 06:06:00 +00:00
Kirk McKusick	e929c00d23	The buffer queue mechanism has been reformulated. Instead of having QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN, QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean and dirty buffers have been separated. Empty buffers with KVM assignments have been separated from truely empty buffers. getnewbuf() has been rewritten and now operates in a 100% optimal fashion. That is, it is able to find precisely the right kind of buffer it needs to allocate a new buffer, defragment KVM, or to free-up an existing buffer when the buffer cache is full (which is a steady-state situation for the buffer cache). Buffer flushing has been reorganized. Previously buffers were flushed in the context of whatever process hit the conditions forcing buffer flushing to occur. This resulted in processes blocking on conditions unrelated to what they were doing. This also resulted in inappropriate VFS stacking chains due to multiple processes getting stuck trying to flush dirty buffers or due to a single process getting into a situation where it might attempt to flush buffers recursively - a situation that was only partially fixed in prior commits. We have added a new daemon called the buf_daemon which is responsible for flushing dirty buffers when the number of dirty buffers exceeds the vfs.hidirtybuffers limit. This daemon attempts to dynamically adjust the rate at which dirty buffers are flushed such that getnewbuf() calls (almost) never block. The number of nbufs and amount of buffer space is now scaled past the 8MB limit that was previously imposed for systems with over 64MB of memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed somewhat. The number of physical buffers has been increased with the intention that we will manage physical I/O differently in the future. reassignbuf previously attempted to keep the dirtyblkhd list sorted which could result in non-deterministic operation under certain conditions, such as when a large number of dirty buffers are being managed. This algorithm has been changed. reassignbuf now keeps buffers locally sorted if it can do so cheaply, and otherwise gives up and adds buffers to the head of the dirtyblkhd list. The new algorithm is deterministic but not perfect. The new algorithm greatly reduces problems that previously occured when write_behind was turned off in the system. The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive P_BUFEXHAUST bit. This bit allows processes working with filesystem buffers to use available emergency reserves. Normal processes do not set this bit and are not allowed to dig into emergency reserves. The purpose of this bit is to avoid low-memory deadlocks. A small race condition was fixed in getpbuf() in vm/vm_pager.c. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-04 00:25:38 +00:00
Poul-Henning Kamp	8947a90a90	Make sure that stat(2) and friends always return a valid st_dev field. Pseudo-FS need not fill in the va_fsid anymore, the syscall code will use the first half of the fsid, which now looks like a udev_t with major 255.	1999-07-02 16:29:47 +00:00
Peter Wemm	9c8b8baa38	Slight reorganization of kernel thread/process creation. Instead of using SYSINIT_KT() etc (which is a static, compile-time procedure), use a NetBSD-style kthread_create() interface. kproc_start is still available as a SYSINIT() hook. This allowed simplification of chunks of the sysinit code in the process. This kthread_create() is our old kproc_start internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work the same as the NetBSD one. One thing I'd like to do shortly is get rid of nfsiod as a user initiated process. It makes sense for the nfs client code to create them on the fly as needed up to a user settable limit. This means that nfsiod doesn't need to be in /sbin and is always "available". This is a fair bit easier to do outside of the SYSINIT_KT() framework.	1999-07-01 13:21:46 +00:00
Kirk McKusick	67812eacd7	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.	1999-06-26 02:47:16 +00:00
Kirk McKusick	f9c8cab591	Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.	1999-06-16 23:27:55 +00:00
Kirk McKusick	e4ab40bcb6	Get rid of the global variable rushjob and replace it with a function in kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.	1999-06-15 23:37:29 +00:00
Poul-Henning Kamp	2447bec829	Simplify cdevsw registration. The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing. cdevsw_add() will print an message if the d_maj field looks bogus. Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL. Move bdevsw() and devsw() functions to kern/kern_conf.c Bump __FreeBSD_version to 400006 This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions if_xe.c bogusly accessed cdevsw[], author/maintainer please fix. I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.	1999-05-31 11:29:30 +00:00
John Birrell	02013ff886	Remove the test for bdevsw(dev) == NULL from bdevvp() because it fails if there is no character device associated with the block device. In this case that doesn't matter because bdevvp() doesn't use the character device structure. I can use the pointy bit of the axe too.	1999-05-24 00:34:10 +00:00
Luoqi Chen	0ce54cbb0c	Legally acquire a major number for mfs.	1999-05-14 20:40:23 +00:00
Kirk McKusick	eaea7a9e9f	Previously directories were sync'ed every 10 seconds while bitmaps & inodes were synced every 15 seconds. This is now reversed as during directory create, we cannot commit the directory entry until its inode has been written. With this switch, the inodes will be more likely to be written by the time that the directory is written thus reducing the number of directory rollbacks that are needed.	1999-05-14 01:29:21 +00:00
Peter Wemm	cc5881cff5	Fix (?) SPECHASH dev_t/major/minor/etc args	1999-05-12 19:06:40 +00:00
Poul-Henning Kamp	8bee45c44e	Don't peek into dev_t	1999-05-12 11:06:07 +00:00
Poul-Henning Kamp	bfbb9ce670	Divorce "dev_t" from the "major\|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I belive this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).	1999-05-11 19:55:07 +00:00
Poul-Henning Kamp	1637aa4b1c	Fix some of the places where too much inside knowledge about major/minor layout and dev_t structure is being (ab)used.	1999-05-08 07:02:41 +00:00
Poul-Henning Kamp	4be2eb8c49	I got tired of seeing all the cdevsw[major(foo)] all over the place. Made a new (inline) function devsw(dev_t dev) and substituted it. Changed to the BDEV variant to this format as well: bdevsw(dev_t dev) DEVFS will eventually benefit from this change too.	1999-05-08 06:40:31 +00:00
Poul-Henning Kamp	46eede0058	Continue where Julian left off in July 1998: Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function. Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!) Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!) (Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)	1999-05-07 10:11:40 +00:00
Bill Fumerola	3d177f465a	Add sysctl descriptions to many SYSCTL_XXXs PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)	1999-05-03 23:57:32 +00:00
Julian Elischer	4ef2094e45	Reviewed by: Many at differnt times in differnt parts, including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org> This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the writerecursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existance: figuring out when getnewbuf() needs to sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)	1999-03-12 02:24:58 +00:00
Matthew Dillon	155f87daf2	Reviewed by: Julian Elischer <julian@whistle.com> Add d_parms() to {c,b}devsw[]. If non-NULL this function points to a device routine that will properly fill in the specinfo structure. vfs_subr.c's checkalias() supplies appropriate defaults. This change should be fully backwards compatible with existing devices.	1999-02-25 05:22:30 +00:00
Matthew Dillon	42e26d47bd	Protect vn worklist and vn->v_{clean,dirty}blkhd at splbio(). Get rid of extra LIST_REMOVE() Reviewed by: hsu@FreeBSD.ORG (Jeffrey Hsu), mckusick@McKusick.COM Submitted by: hsu@FreeBSD.ORG (Jeffrey Hsu), dillon@backplane.com ( Matthew Dillon )	1999-02-19 17:36:58 +00:00
Matthew Dillon	82b23b5384	vp->v_object must be valid after normal flow of vfs_object_create() completes, change if() to KASSERT(). This is not a bug, we are simplify clarifying and optimizing the code. In if/else in vfs_object_create(), the failure of both conditionals will lead to a NULL object. Exit gracefully if this case occurs. ( this case does not normally occur, but needed to be handled ). Obtained from: Eivind Eklund <eivind@FreeBSD.org>	1999-02-04 18:25:39 +00:00
Matthew Dillon	bc81493155	More const fixes for -Wall, -Wcast-qual	1999-01-29 23:18:50 +00:00
Matthew Dillon	8aef171243	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-28 00:57:57 +00:00
Matthew Dillon	1c7c3c6a86	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
Eivind Eklund	219cbf59f2	KNFize, by bde.	1999-01-10 01:58:29 +00:00
Eivind Eklund	5526d2d920	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith	1999-01-08 17:31:30 +00:00
Eivind Eklund	fb1167777a	Remove the 'waslocked' parameter to vfs_object_create().	1999-01-05 18:50:03 +00:00
Eivind Eklund	0df45b5a31	Finish staticization.	1999-01-05 18:12:29 +00:00
Bruce Evans	289bdf33d3	Ifdefed conditionally used simplock variables.	1999-01-02 11:34:57 +00:00
Bruce Evans	4d94881342	Restored rev.1.31 which was clobbered by rev.1.69 (the big Lite2 merge). This fixes at least hanging in revoke(2) when a somewhat active slave pty is revoked. The hang made the window for the null pointer bug in ufsspec_{read,write} much larger. There are many other bugs in this area (revoke of an active fifo at best leaks memory...).	1998-12-24 12:07:16 +00:00
Eivind Eklund	29c98cd8bf	Check return value of tsleep(). I've checked of all call points - there does not seem to be a problem with this. PR: kern/8732 Analysis by: David G Andersen <danderse@cs.utah.edu> Tested by: Alfred Perlstein <bright@hotjobs.com>	1998-12-22 00:44:11 +00:00
Eivind Eklund	db878ba480	Staticize.	1998-12-21 23:38:33 +00:00
Archie Cobbs	2127f26023	Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for others cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc. These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>	1998-12-04 22:54:57 +00:00
Peter Wemm	16e9e530cc	Convert lists for bufs attached to vnodes from a LIST to a TAILQ. - Use TAILQ_* macros extensively instead of internal names - use b_xflags instead of the NOLIST magic number hack in the next pointer - clean bufs are inserted at the tail rather than the head. - redo dirty buffer insert so that metadata (negative lbn) goes to the tail directly rather than at the HEAD. This makes a difference when inserting dirty data blocks in lbn sorted order since data block insertion will not have to bypass all the metadata cruft. data is lbn sorted since it makes sense for clustering and writeback ordering, while metadata sorting doesn't help much since the lbn's are meaningless when walking the list for writebacks. Small systems will not notice much (if any) benefit from this, but really busy systems with large dirty block lists should get a lot more. I've tested this with softdep, and it doesn't seem to mind the change of queueing of metadata. Reviewed (in princible) by: dg Obtained from: partly from John Dyson's work-in-progress patches in June.	1998-10-31 14:20:39 +00:00
Peter Wemm	b421db370b	The last argument to vm_object_page_clean() are now bit flags, rather than the old true/false. While here, have vfs_msync() only call vm_object_page_clean() with OBJPC_SYNC if called with MNT_WAIT flags. vfs_msync() is called at unmount time (with MNT_WAIT) and from the syncer process (formerly update). This should make dirty mmap writebacks a little less nasty. I have tested this a little with SOFTUPDATES enabled, but I don't normally use it since I've been badly burned too many times.	1998-10-31 07:42:04 +00:00
Bruce Evans	cbbbd4c330	Oops, rev.1.167 made the device number checking in bdevvp() too strict for mfs root mounts. Don't require major 255 to be in bdevsw[].	1998-10-29 11:50:32 +00:00
Peter Wemm	20f02ef5e9	Remove the V_SAVEMETA flag, nothing uses it any more now that msdosfs and ext2fs call vtruncbuf() directly. This simplifies and cleans up vinvalbuf() a little.	1998-10-29 09:51:28 +00:00
Bruce Evans	885bf0b57a	Updated the major number check in vfs_object_create(). It's not clear if the check is necessary, but vfs_object_create() is called for all vnodes and it was silly to create objects for VBLK vnodes that don't even have a driver.	1998-10-26 08:07:00 +00:00
Poul-Henning Kamp	f5ef029e92	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.	1998-10-25 17:44:59 +00:00
Bruce Evans	37906c686d	Fixed device number checking in bdevvp(): - dev != NODEV was checked for, but 0 was returned on failure. This was fixed in Lite2 (except the return code was still slightly wrong (ENODEV instead of ENXIO)) but the changes were not merged. This case probably doesn't actually occur under FreeBSD. - major(dev) was not checked to have a valid non-NULL bdevsw entry. This caused panics when the driver for the root device didn't exist. Fixed minor misformattings in bdevvp(). Rev.1.14 consisted mainly of gratuitous reformattings that seem to have caused many Lite2 merge errors. PR: 8417	1998-10-25 16:11:49 +00:00
Dmitrij Tejblum	f74d75a2b6	Backed out rev. 1.164. It caused problems on SMP. PR: 8309	1998-10-14 15:05:52 +00:00
David Greenman	6cde7a165f	Fixed two potentially serious classes of bugs: 1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.	1998-10-13 08:24:45 +00:00
Dmitrij Tejblum	9bbd8a2498	UnVMIO vnodes of block devices when they are no longer in use. (Some things, like msdosfs, do not work (panic) on devices with VMIO enabled. FFS enable VMIO on mounted devices, and nothing previously disabled it, so, after you mounted FFS floppy, you could not mount msdosfs floppy anymore...) This is mostly a quick before-release fix. Reviewed by: bde	1998-10-12 20:14:09 +00:00
Søren Schmidt	d024c95599	Remove the SLICE code. This clearly needs alot more thought, and we dont need this to hunt us down in 3.0-RELEASE.	1998-09-14 19:56:42 +00:00
Bruce Evans	500b04a257	Instantiate `nfs_mount_type' in a standard file so that it is present when nfs is an LKM. Declare it in a header file. Don't forget to use it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0 will soon be the mount type number for the first vfs loaded. NetBSD uses strcmp() to avoid this ugly global.	1998-09-05 15:17:34 +00:00
Bruce Evans	f5ce675296	Oops, the previous revision unconfigured too much pre-Lite2 compatibilty cruft. At least lsvfs(1) was broken.	1998-08-29 13:13:10 +00:00
Bruce Evans	13950bd2ed	Don't configure compatibility code for pre-Lite2 mount() calls by default. This code should go away soon.	1998-08-12 20:17:42 +00:00
Doug Rabson	7a6c46b55a	Initialise all the fields separately in vattr_null since on the alpha they are not all the same width.	1998-07-12 16:45:39 +00:00
Bruce Evans	ac1e407b32	Fixed printf format errors.	1998-07-11 07:46:16 +00:00
Bruce Evans	be160d60ab	Removed unused includes.	1998-06-21 18:02:50 +00:00
Julian Elischer	32f5d4d843	Replace 'sleep()' with 'tsleep()' Accidentally imported from Kirk's codebase. Pointed out by: various.	1998-06-10 22:02:14 +00:00
Julian Elischer	28913ebe4e	Submitted by: Kirk McKusick <mckusick@McKusick.COM> Fix for potential hang when trying to reboot the system or to forcibly unmount a soft update enabled filesystem. FreeBSD already handled the reboot case differently, this is however a better fix.	1998-06-10 18:13:19 +00:00
Doug Rabson	ecbb00a262	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.	1998-06-07 17:13:14 +00:00
Tor Egge	cb87a87c16	Supply the correct process argument to dounmount when possible.	1998-05-17 19:38:55 +00:00
Julian Elischer	3e425b968d	Add changes and code to implement a functional DEVFS. This code will be turned on with the TWO options DEVFS and SLICE. (see LINT) Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes. /dev will be automatically mounted by init (thanks phk) on bootup. See /sys/dev/slice/slice.4 for more info. All code should act the same without these options enabled. Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others This code does not support the following: bad144 handling. Persistance. (My head is still hurting from the last time we discussed this) ATAPI flopies are not handled by the SLICE code yet. When this code is running, all major numbers are arbitrary and COULD be dynamically assigned. (this is not done, for POLA only) Minor numbers for disk slices ARE arbitray and dynamically assigned.	1998-04-19 23:32:49 +00:00
Peter Wemm	37b8ccd37a	In vfs_msync(), test to see if the vnode being examined is "interesting" (ie: it has a vm_object attached and is marked as OBJ_MIGHTBEDIRTY) before attempting to lock it. This should reduce the cpu hit that is incurred when doing a sync(2) and when the syncer process is doing the 30-second writeback of dirty mmap() data to disk. Skip this speedup if we are doing an unmount() to be sure to get everything - we can afford to occasionally miss a msync while the system is running, but not at unmount. I'm not sure about the VXLOCK and MNT_WAIT case, it seems a bit odd to skip doing a page_clean at unmount time just because a vnode is VXLOCKed, but that's what was being done before...	1998-04-18 06:26:16 +00:00
Peter Wemm	efdc5523c0	When the softdep conversion took place, the periodic vfs_msync() from update got lost. This is responsible for ensuring that dirty mmap() pages get periodically written to disk. Without it, long time mmap's might not have their dirty pages written out at all of the system crashes or isn't cleanly shut down. This could be nasty if you've got a long-running writing via mmap(), dirty pages used to get written to disk within 30 seconds or so.	1998-04-16 03:31:26 +00:00
Tor Egge	71033a8c50	Unlock mountlist_slock if the mount point was busy (unmount in progress) during the attempt at lazy fsync.	1998-04-15 18:37:49 +00:00
Poul-Henning Kamp	227ee8a188	Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now uses the new variable time_second instead. gettime() changed to getmicrotime(0. Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeard time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde	1998-03-30 09:56:58 +00:00
Bruce Evans	3c1300a6b3	Removed unused #includes.	1998-03-28 13:25:01 +00:00
Bruce Evans	771b51ef7b	Don't depend on <sys/mount.h> including <sys/socket.h>.	1998-03-28 12:04:40 +00:00
John Dyson	52c64c95c5	In kern_physio.c fix tsleep priority messup. In vfs_bio.c, remove b_generation count usage, remove redundant reassignbuf, remove redundant spl(s), manage page PG_ZERO flags more correctly, utilize in invalid value for b_offset until it is properly initialized. Add asserts for #ifdef DIAGNOSTIC, when b_offset is improperly used. when a process is not performing I/O, and just waiting on a buffer generally, make the sleep priority low. only check page validity in getblk for B_VMIO buffers. In vfs_cluster, add b_offset asserts, correct pointer calculation for clustered reads. Improve readability of certain parts of the code. Remove redundant spl(s). In vfs_subr, correct usage of vfs_bio_awrite (From Andrew Gallatin <gallatin@cs.duke.edu>). More vtruncbuf problems fixed.	1998-03-19 22:48:16 +00:00
John Dyson	1c77c6b7b0	Fix an embarassing problem in vtruncbuf.	1998-03-19 18:46:58 +00:00
John Dyson	2deb5d0417	Correct a severely evil bug in the vtruncbuf code. It didn't cause me any problems until after the previous commit. This problem then caused a severe case of creeping crud on my diskdrive, and hosed my system so bad, that I needed to do a complete reinstall. Sorry!!! I assume that others have manifest this bug.	1998-03-17 06:30:52 +00:00
John Dyson	e85c1afb7c	Allow vfs_ioopt to be enabled with a (temporary) config option.	1998-03-16 02:13:03 +00:00
John Dyson	bef608bd7e	Some VM improvements, including elimination of alot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!! pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.) pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances. vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.) vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust. vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true. vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.) vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities. vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.) vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliabiliy and performance issue.) 10) Modify FFS to use vtruncbuf. vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.) vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly. vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.) vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.	1998-03-16 01:56:03 +00:00
John Dyson	26300b34f1	Disable the vfs.ioopt option for now, so that we don't get gratuitious bugreports. I might not be able to fix the problems before 3.0, due to other, more important things.	1998-03-14 19:50:36 +00:00
Tor Egge	8293f20aee	Don't misuse vnode interlocks in routines that can be called from interrupts. PR: 5893	1998-03-14 02:55:01 +00:00
Julian Elischer	b1897c197c	Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree	1998-03-08 09:59:44 +00:00
John Dyson	8f9110f6a1	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.	1998-03-07 21:37:31 +00:00
John Dyson	59228495d7	Change vfs.ioopt default back to '0'.	1998-03-01 23:07:45 +00:00
John Dyson	ffc82b0a70	1) Use a more consistent page wait methodology. 2) Do not unnecessarily force page blocking when paging pages out. 3) Further improve swap pager performance and correctness, including fixing the paging in progress deadlock (except in severe I/O error conditions.) 4) Enable vfs_ioopt=1 as a default. 5) Fix and enable the page prezeroing in SMP mode. All in all, SMP systems especially should show a significant improvement in "snappyness."	1998-03-01 04:18:54 +00:00
John Dyson	64d3c7e32d	Clean-up the vget mechanism by permanently attaching VM objects to vnodes, therefore vget doesn't need to do so anymore. Other minor improvements include the temp free vnode queue obeying the VAGE flag and a printf that warns of to-be-removed code being executed.	1998-02-23 06:59:52 +00:00
KATO Takenori	1b11919b2b	Fixed vnode interlock handling. Reviewed by: Bruce Evans <bde@zeta.org.au> Tor Egge <Tor.Egge@idi.ntnu.no>	1998-02-10 02:54:24 +00:00
Eivind Eklund	303b270b0a	Staticize.	1998-02-09 06:11:36 +00:00
KATO Takenori	16e3b0b67a	When the vp is lcoked, vget() calls vfs_object_create() with waslocked = TRUE. This change may fix lockmgr panic in umapfs/nullfs. PR: 5634 Reviewed by: "John S. Dyson" <toor@dyson.iquest.net> Suggested by: Bruce Evans <bde@zeta.org.au>	1998-02-07 08:44:31 +00:00
Eivind Eklund	0b08f5f737	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
John Dyson	95461b450d	1) Start using a cleaner and more consistant page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagin. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.	1998-02-05 03:32:49 +00:00
Eivind Eklund	47cfdb166d	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
Tor Egge	d09a16d804	Update freevnodes when adding a vnode to the head of the free list.	1998-01-31 01:17:58 +00:00
John Dyson	50ce7ff499	Add better support for larger I/O clusters, including larger physical I/O. The support is not mature yet, and some of the underlying implementation needs help. However, support does exist for IDE devices now.	1998-01-24 02:01:46 +00:00
John Dyson	2d8acc0f4a	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)	1998-01-22 17:30:44 +00:00
John Dyson	4722175765	Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtile bugs aren't fixed yet.	1998-01-17 09:17:02 +00:00
John Dyson	53f6f08545	Fix another vnode leak.	1998-01-12 03:15:01 +00:00
John Dyson	925a3a419a	Fix some vnode management problems, and better mgmt of vnode free list. Fix the UIO optimization code. Fix an assumption in vm_map_insert regarding allocation of swap pagers. Fix an spl problem in the collapse handling in vm_object_deallocate. When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list. Some minor syntax changes changing pre-decs, pre-incs to post versions. Remove a bogus timeout (that I added for debugging) from vn_lock. PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.	1998-01-12 01:46:33 +00:00
John Dyson	857d737ed6	Disable io optimizations again, minor bug found, and will be fixed in a few days.	1998-01-07 09:26:29 +00:00
John Dyson	95e5e988e0	Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.	1998-01-06 05:26:17 +00:00
John Dyson	483140ead1	Add the vnode interlock back around vref.	1997-12-29 16:54:03 +00:00
John Dyson	60f8d46448	Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem with the object cache removal.	1997-12-29 01:03:55 +00:00
John Dyson	2be70f79f6	Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.	1997-12-29 00:25:11 +00:00
John Dyson	1efb74fbcc	Some performance improvements, and code cleanups (including changing our expensive OFF_TO_IDX to btoc whenever possible.)	1997-12-19 09:03:37 +00:00
Garrett Wollman	1cbbd625cc	Add support for poll(2) on files. vop_nopoll() now returns POLLNVAL if one of the new poll types is requested; hopefully this will not break any existing code. (This is done so that programs have a dependable way of determining whether a filesystem supports the extended poll types or not.) The new poll types added are: POLLWRITE - file contents may have been modified POLLNLINK - file was linked, unlinked, or renamed POLLATTRIB - file's attributes may have been changed POLLEXTEND - file was extended Note that the internal operation of poll() means that it is impossible for two processes to reliably poll for the same event (this could be fixed but may not be worth it), so it is not possible to rewrite `tail -f' to use poll at this time.	1997-12-15 03:09:59 +00:00
Bruce Evans	cb451ebdbd	Staticized.	1997-11-22 08:35:46 +00:00
Julian Elischer	b1f4a44b03	Reviewed by: various. Ever since I first say the way the mount flags were used I've hated the fact that modes, and events, internal and exported, and short-term and long term flags are all thrown together. Finally it's annoyed me enough.. This patch to the entire FreeBSD tree adds a second mount flag word to the mount struct. it is not exported to userspace. I have moved some of the non exported flags over to this word. this means that we now have 8 free bits in the mount flags. There are another two that might well move over, but which I'm not sure about. The only user visible change would have been in pstat -v, except that davidg has disabled it anyhow. I'd still like to move the state flags and the 'command' flags apart from each other.. e.g. MNT_FORCE really doesn't have the same semantics as MNT_RDONLY, but that's left for another day.	1997-11-12 05:42:33 +00:00
Poul-Henning Kamp	4a11ca4e29	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused	1997-11-07 08:53:44 +00:00
Poul-Henning Kamp	dba3870c10	VFS interior redecoration. Rename vn_default_error to vop_defaultop all over the place. Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite. Use vop_null instead of nullop. Move vop_nopoll from vfs_subr.c to vfs_default.c Move vop_sharedlock from vfs_subr.c to vfs_default.c Move vop_nolock from vfs_subr.c to vfs_default.c Move vop_nounlock from vfs_subr.c to vfs_default.c Move vop_noislocked from vfs_subr.c to vfs_default.c Use vop_ebadf instead of *_ebadf. Add vop_defaultop for getpages on master vnode in MFS.	1997-10-26 20:55:39 +00:00
Poul-Henning Kamp	a1c995b626	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde	1997-10-12 20:26:33 +00:00
Poul-Henning Kamp	55166637cd	Distribute and statizice a lot of the malloc M_* types. Substantial input from: bde	1997-10-11 18:31:40 +00:00
Poul-Henning Kamp	f7891f9adb	Dike out a weird warning.	1997-10-11 07:34:27 +00:00
Poul-Henning Kamp	d047b580c6	I lost a bit of my change in the last commit, this is more like it. Noticed by: bde	1997-09-26 08:08:58 +00:00
Poul-Henning Kamp	87b1940afa	Reduce the target number of vnodes on the freelist from desiredvnodes (usually a couple of thousand) to 25. The measured impact on cache-hits doesn't justify spending memory this way: Target number of free vnodes versus namecache hit rate in % during a make world: 10 98.5316 200 98.5479 500 98.5546 1000 98.5709 3000 98.6006 4000 98.6126	1997-09-25 16:17:57 +00:00
Poul-Henning Kamp	0054419366	A couple of handles to tweak, more statistics.	1997-09-24 07:46:54 +00:00
Bruce Evans	514ede0953	Fixed gratuitous ANSIisms.	1997-09-16 11:44:05 +00:00
Peter Wemm	7fab77996c	Provide a 'return true' poll vnode op rather than duplicating the 'do nothing' case all over the various filesystems.	1997-09-14 02:49:06 +00:00
Peter Wemm	557fe2c5e1	print correct function name in a panic (vop_nolock -> vop_sharedlock)	1997-09-13 15:02:28 +00:00
Bruce Evans	41fadeeb28	Removed yet more vestiges of config-time swap configuration and/or cleaned up nearby cruft.	1997-09-07 16:21:11 +00:00
Bruce Evans	a910e75cdc	Removed vestiges of config-time "argument processing" configuration.	1997-09-07 13:49:56 +00:00
Poul-Henning Kamp	fd9d9ff13e	Hmm, this is hopefully better.	1997-09-03 13:29:41 +00:00
Poul-Henning Kamp	7cb22688e9	Revert the v_usecount handling in relation to VOP_INACTIVE.	1997-09-03 09:18:48 +00:00
Bruce Evans	e4ba6a82b0	Removed unused #includes.	1997-09-02 20:06:59 +00:00
Poul-Henning Kamp	a051452ae2	Change the 0xdeadb hack to a flag called VDOOMED. Introduce VFREE which indicates that vnode is on freelist. Rename vholdrele() to vdrop(). Create vfree() and vbusy() to add/delete vnode from freelist. Add vfree()/vbusy() to keep (v_holdcnt != 0 \|\| v_usecount != 0) vnodes off the freelist. Generalize vhold()/v_holdcnt to mean "do not recycle". Fix reassignbuf()s lack of use of vhold(). Use vhold() instead of checking v_cache_src list. Remove vtouch(), the vnodes are always vget'ed soon enough after for it to have any measuable effect. Add sysctl debug.freevnodes to keep track of things. Move cache_purge() up in getnewvnodes to avoid race. Decrement v_usecount after VOP_INACTIVE(), put a vhold() on it during VOP_INACTIVE() Unmacroize vhold()/vdrop() Print out VDOOMED and VFREE flags (XXX: should use %b) Reviewed by: dyson	1997-08-31 07:32:39 +00:00
Bruce Evans	d3114049c1	Restored rev.1.92 which was clobbered by the previous commit.	1997-08-26 11:59:20 +00:00
John Dyson	a5db4bf475	Back out some incorrect changes that was worse than the original bug.	1997-08-26 04:36:27 +00:00
John Dyson	89721f6f1a	This is a trial improvement for the vnode reference count while on the vnode free list problem. Also, the vnode age flag is no longer used by the vnode pager. (It is actually incorrect to use then.) Constructive feedback welcome -- just be kind.	1997-08-22 03:56:37 +00:00
Bruce Evans	b1037dcd53	#include <machine/limits.h> explicitly in the few places that it is required.	1997-08-21 20:33:42 +00:00
Garrett Wollman	57bf258e3d	Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.	1997-08-16 19:16:27 +00:00
John Dyson	23be6be853	Fix a problem with the vfs vnode caching that it doesn't grow quickly enough and can cause some strange performance problems. Specifically, at or near startup time is when the problem is worst. To reproduce the problem, run "lat_syscall stat" from the alpha lmbench code right after bootup. A positive side effect of this mod is that the name cache can be set to grow again by sysctl. A noticable positive performance impact is realized due to a larger namecache being available as needed (or tuned.)	1997-08-04 07:43:28 +00:00
Doug Rabson	f6b4c28555	Merge WebNFS support from NetBSD Obtained from: NetBSD	1997-07-17 07:17:33 +00:00
John Dyson	3c631446d3	Remove a window during running down a file vnode. Also, the OBJ_DEAD flag wasn't being respected during vref(), et. al. Note that this isn't the eventual fix for the locking problem. Fine grained SMP in the VM and VFS code will require (lots) more work.	1997-06-22 03:00:24 +00:00
David Greenman	2e58c0f892	Disabled the kern.vnode sysctl variable. It's causing system crashes on large systems and needs to be re-thinked or removed wholesale.	1997-06-10 02:48:08 +00:00
Poul-Henning Kamp	8670684a2f	Fix a race condition that did, after all, exist. Reviewed by: phk Submitted by: dfr	1997-05-06 15:19:38 +00:00
Poul-Henning Kamp	b15a966ec6	1. Add a {pointer, v_id} pair to the vnode to store the reference to the ".." vnode. This is cheaper storagewise than keeping it in the namecache, and it makes more sense since it's a 1:1 mapping. 2. Also handle the case of "." more intelligently rather than stuff the namecache with pointless entries. 3. Add two lists to the vnode and hang namecache entries which go from or to this vnode. When cleaning a vnode, delete all namecache entries it invalidates. 4. Never reuse namecache enties, malloc new ones when we need it, free old ones when they die. No longer a hard limit on how many we can have. 5. Remove the upper limit on namelength of namecache entries. 6. Make a global list for negative namecache entries, limit their number to a sysctl'able (debug.ncnegfactor) fraction of the total namecache. Currently the default fraction is 1/16th. (Suggestions for better default wanted!) 7. Assign v_id correctly in the face of 32bit rollover. 8. Remove the LRU list for namecache entries, not needed. Remove the #ifdef NCH_STATISTICS stuff, it's not needed either. 9. Use the vnode freelist as a true LRU list, also for namecache accesses. 10. Reuse vnodes more aggresively but also more selectively, if we can't reuse, malloc a new one. There is no longer a hard limit on their number, they grow to the point where we don't reuse potentially usable vnodes. A vnode will not get recycled if still has pages in core or if it is the source of namecache entries (Yes, this does indeed work :-) "." and ".." are not namecache entries any longer...) 11. Do not overload the v_id field in namecache entries with whiteout information, use a char sized flags field instead, so we can get rid of the vpid and v_id fields from the namecache struct. Since we're linked to the vnodes and purged when they're cleaned, we don't have to check the v_id any more. 12. NFS knew about the limitation on name length in the namecache, it shouldn't and doesn't now. Bugs: The namecache statistics no longer includes the hits for ".." and "." hits. Performance impact: Generally in the +/- 0.5% for "normal" workstations, but I hope this will allow the system to be selftuning over a bigger range of "special" applications. The case where RAM is available but unused for cache because we don't have any vnodes should be gone. Future work: Straighten out the namecache statistics. "desiredvnodes" is still used to (bogusly ?) size hash tables in the filesystems. I have still to find a way to safely free unused vnodes back so their number can shrink when not needed. There is a few uses of the v_id field left in the filesystems, scheduled for demolition at a later time. Maybe a one slot cache for unused namecache entries should be implemented to decrease the malloc/free frequency.	1997-05-04 09:17:38 +00:00
John Dyson	82b8e119b4	Staticize an unnecessarily global function: vputrele. Submitted by: Michael Hancock <michaelh@cet.co.jp>	1997-04-30 03:09:15 +00:00
Peter Wemm	5f61c81d66	copyin the export network mask to the correct variable. Submitted by: Mike Hibler <mike@marker.cs.utah.edu>, PR#3380	1997-04-25 06:47:12 +00:00
Doug Rabson	de15ef6aef	Add a function vop_sharedlock which a copy of vop_nolock without the implementation #ifdef out. This can be used for now by NFS. As soon as all the other filesystems' locking is fixed, this can go away. Print the vnode address in vprint for easier debugging.	1997-04-04 17:46:21 +00:00
Bruce Evans	0f1adf65ab	Use OID_AUTO instead of magic number for the Lite2 sysctl debug.busyprt. Removed declaration of vfs_unmountroot() again. Staticized vgonel().	1997-04-01 13:05:34 +00:00
David Greenman	2f2160da3b	Fixed splbio problems in vinvalbuf. Closes PR#2875, although fixed differently by me.	1997-03-05 04:54:54 +00:00
Bruce Evans	4a8b966013	Attach vfs_sysctl() one level lower so that only the levels below VFS_GENERIC aren't done in the FreeBSD way. The previous commit broke the nfs sysctls.	1997-03-04 18:31:56 +00:00
Bruce Evans	3a76a5949b	Merged Lite2's vfs_sysctl(). It doesn't fit very well into FreeBSD's (phk's) sysctl framework, and I needed special code to disambiguate the VFS_GENERIC node from the VFS_VFSCONF leaf, so I only converted the leaves to the FreeBSD framework. The error handling isn't quite right. CSRGS's sysctls seem to return ENOTDIR too much and FreeBSD's sysctls don't agree with the man page.	1997-03-03 12:58:20 +00:00
Bruce Evans	dc91a89e83	Restored some pre-Lite2-merge source-level compatibility to the mount() and getvfsbyname() interfaces. The new interfaces are now hidden from applications unless _NEW_VFSCONF is defined. The new vfsconf interfaces don't work yet.	1997-03-02 17:53:37 +00:00
Bruce Evans	a896f0256c	Moved vfs sysctls to where Lite2 put them. No code changes yet.	1997-03-02 11:06:22 +00:00
Bruce Evans	b98afd0d00	Fixed Lite2 merge of spechash simplelocking. It was misplaced in checkalias() and missing in vfinddev() and vcount().	1997-02-27 16:08:43 +00:00
John Dyson	fd7f690f94	Fix the previous simple_lock fix breakage in the combined vput/vrele routine. Fix a panic message. Fix the vop_nounlock routine so that "special" filesystems that use it work correctly.	1997-02-27 05:28:58 +00:00
John Dyson	0d955f71a1	Fix the simple_lock problem with the physical I/O buffer code, and also fix the missing simple_unlock in vrele, and improve vrele/vput by merging them into one routine. BDE pointed these problems out.	1997-02-27 02:57:03 +00:00
Bruce Evans	7c1557c4af	Fixed unmounting of the root fs. vfs_unmountroot() wasn't fully updated to do Lite2 locking and vfs_unmountall() wasn't as simple as the Lite2 version.	1997-02-26 15:35:42 +00:00
Bruce Evans	c35e283a80	Merged some missing locking from Lite2: - getnewvnode() and vref() were missing one simple_unlock() each. - the Lite2 locking changes weren't merged at all in printlockedvnodes() or sysctl_vnode(). Merging these undid some KNF style regressions.	1997-02-25 19:33:23 +00:00
Peter Wemm	6875d25465	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.	1997-02-22 09:48:43 +00:00
John Dyson	996c772f58	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>	1997-02-10 02:22:35 +00:00
Bruce Evans	5131d64e0c	Removed option EXTRAVNODES. All versions of FreeBSD-2.x have a sysctl variable `kern.maxvnodes' which gives much better control over vnode allocation than EXTRAVNODES (except in -current between 1995/10/28 and 1996/11/12, kern.maxvnodes was read-only and thus useless).	1997-01-16 13:16:10 +00:00
Jordan K. Hubbard	1130b656e5	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.	1997-01-14 07:20:47 +00:00
John Dyson	8b612c4b4a	This commit is the embodiment of some VFS read clustering improvements. Firstly, now our read-ahead clustering is on a file descriptor basis and not on a per-vnode basis. This will allow multiple processes reading the same file to take advantage of read-ahead clustering. Secondly, there previously was a problem with large reads still using the ramp-up algorithm. Of course, that was bogus, and now we read the entire "chunk" off of the disk in one operation. The read-ahead clustering algorithm should use less CPU than the previous also (I hope :-)). NOTE: THAT LKMS MUST BE REBUILT!!!	1996-12-29 02:45:28 +00:00
Bruce Evans	b83ddf9c86	Restored writability of kern.maxvnodes. It was broken a year ago in rev.1.29 of kern_sysctl.c. Should be in 2.2.	1996-11-12 09:24:31 +00:00
Poul-Henning Kamp	19060a3ad9	init_main.c: pass -d to init if DEVFS_ROOT kern_conf.c: gd driver is a disk. vfs_subr.c: include opt_devfs.h	1996-10-28 11:34:57 +00:00
Jordan K. Hubbard	0082fb4657	I'm not sure why, but Netcon's TFS filesystem code doesn't want to add free vnodes back to the freelist. They must do their own vnode management. Anyway, this change is only activated with their filesystem and doesn't affect anyone else. Whoops, forgot the submitted-by lines in my previous commits too.. :-( Submitted-By: Tony Ardolino <tony@netcon.com>	1996-10-17 17:56:07 +00:00
John Dyson	ad98052216	Clean up the rundown of the object backing a vnode. This should fix NFS problems associated with forcible dismounts.	1996-10-17 02:49:35 +00:00
John Dyson	a8f42fa9a6	Correct vget by removing a window where a vnode can potentially go away.	1996-09-28 03:36:07 +00:00
Nate Williams	030e2e9ebb	In sys/time.h, struct timespec is defined as: /* * Structure defined by POSIX.4 to be like a timeval. / struct timespec { time_t ts_sec; / seconds / long ts_nsec; / and nanoseconds */ }; The correct names of the fields are tv_sec and tv_nsec. Reminded by: James Drobina <jdrobina@infinet.com>	1996-09-19 18:21:32 +00:00
John Dyson	6476c0d204	Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistant and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct. This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc. Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility. Reviewed by: davidg	1996-08-21 21:56:23 +00:00
John Dyson	619594e898	Certain vnode buffer list operations were not being spl protected, and they needed to be. Brelse for example can be called at interrupt level, and the buffer list operations were not being protected from it.	1996-08-15 06:45:01 +00:00
Bruce Evans	8c2ff39670	Only use the special bdevvp() for DEVFS if DEVFS_ROOT is defined. This makes option DEVFS safe to use again (although mounting devfs is unsafe).	1996-07-30 18:00:32 +00:00
Poul-Henning Kamp	e83cf165d6	DEVFS needs a special bdevvp().	1996-07-24 21:21:43 +00:00
Bruce Evans	cba2a7c614	Staticized a few variables. Fixed warnings about unused variables.	1996-07-12 07:41:34 +00:00
Peter Wemm	114a8cff43	Add an option "EXTRA_VNODES" to cause an extra number of vnode structures to be allocated at boot time. This is an expensive option, as they consume physical ram and are not pageable etc. In certain situations, this kind of option is quite useful, especially for news servers that access a large number of directories at random and torture the name cache. Defining 5000 or 10000 extra vnodes should cut down the amount of vnode recycling somewhat, which should allow better name and directory caching etc. This is a "your mileage may vary" option, with no real indication of what works best for your machine except trial and error. Too many will cost you ram that you could otherwise use for disk buffers etc. This is based on something John Dyson mentioned to me a while ago.	1996-05-31 00:20:34 +00:00
John Dyson	e5fadd05f2	Put the "free vnode isn't" check back in the right place.	1996-03-09 06:43:19 +00:00
John Dyson	bd7e5f992e	Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.	1996-01-19 04:00:31 +00:00
Garrett Wollman	0e41ee3037	Convert DDB to new-style option.	1996-01-04 21:13:23 +00:00
David Greenman	864ef7d17e	Moved the #ifdef DIAGNOSTIC in vrele() so that the check for negative v_usecount is always performed and only the call to vprint is conditional.	1996-01-02 18:13:20 +00:00
Poul-Henning Kamp	27a0b398a7	Staticize. Unstaticize a function in scsi/scsi_base that was used, with an undocumented option. My last count on the LINT kernel shows: Total symbols: 3647 unref symbols: 463 undef symbols: 4 1 ref symbols: 1751 2 ref symbols: 485 Approaching the pain threshold now.	1995-12-17 21:23:44 +00:00
John Dyson	a316d390bd	Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.	1995-12-11 04:58:34 +00:00
David Greenman	efeaf95a41	Untangled the vm.h include file spaghetti.	1995-12-07 12:48:31 +00:00
Poul-Henning Kamp	65d0bc1387	A couple of minor tweaks to the sysctl stuff.	1995-12-06 13:27:39 +00:00
Bruce Evans	98d938220c	Completed function declarations and/or added prototypes.	1995-12-02 18:58:56 +00:00
Poul-Henning Kamp	2d0b1d708c	A test was backwards. Noticed by: Cheng, Hsiao-Yang <sycheng@cis.ufl.edu>	1995-11-29 11:28:00 +00:00
Poul-Henning Kamp	4b2af45f4b	Mega commit for sysctl. Convert the remaining sysctl stuff to the new way of doing things. the devconf stuff is the reason for the large number of files. Cleaned up some compiler warnings while I were there.	1995-11-20 12:42:39 +00:00
Bruce Evans	986f4ce75e	Fixed support for DIAGNOSTIC option. SYSCTL_INT() depends on kernel.h.	1995-11-16 09:45:23 +00:00
Poul-Henning Kamp	395e673587	Change some of the debug sysctl vars. The semantics of these will change.	1995-11-14 09:19:16 +00:00
Bruce Evans	ceba6236e0	Fixed type of vfs_free_netcred(). Removed redundant declaration of insmntque().	1995-11-11 00:27:00 +00:00
Bruce Evans	f57e65478d	Introduced a type `vop_t' for vnode operation functions and used it 1138 times (:-() in casts and a few more times in declarations. This change is null for the i386. The type has to be `typedef int vop_t(void *)' and not `typedef int vop_t()' because `gcc -Wstrict-prototypes' warns about the latter. Since vnode op functions are called with args of different (struct pointer) types, neither of these function types is any use for type checking of the arg, so it would be preferable not to use the complete function type, especially since using the complete type requires adding 1138 casts to avoid compiler warnings and another 40+ casts to reverse the function pointer conversions before calling the functions.	1995-11-09 08:17:23 +00:00
John Dyson	5e527f652f	This is a modification missed by me in the msync fixes a few days ago.	1995-11-07 05:09:43 +00:00
Bruce Evans	e887950a45	Call vfs_unbusy() before error returns from sysctl_vnode(). This fixes PR 795. Set the size before one error return from sysctl_vnode() the same as before the other. The caller might want to know about the amount successfully read although the current caller doesn't.	1995-10-28 08:50:08 +00:00
Bruce Evans	430179f0c3	Don't compile the diagnostic functions vhold() and holdrele() unless DIAGNOSTIC is defined.	1995-08-25 20:49:44 +00:00
David Greenman	628641f8a6	Converted mountlist to a CIRCLEQ. Partially obtained from: 4.4BSD-Lite2	1995-08-11 11:31:18 +00:00
David Greenman	24a1cce34f	NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!! Much needed overhaul of the VM system. Included in this first round of changes: 1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers". 2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items. 3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed. 4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug. 5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance. 6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain. 7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance. 8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed. 9) Some almost useless debugging code removed. 10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology. 11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended. 12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course). 13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE. 14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes) TODO: 1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size. 2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness. 3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind. 4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems. 5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).	1995-07-13 08:48:48 +00:00
David Greenman	e0dca2b939	Improve negative usecount diagnostic a little.	1995-07-08 04:10:32 +00:00
David Greenman	aa2cabb958	1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.	1995-06-28 12:01:13 +00:00
Bruce Evans	6acceb40dc	Pass the correct nonblocking flag to VOP_CLOSE() in vclean(). VOP_CLOSE() takes `F' (file) flags, not `IO' flags. At least that's what close() passes. I previously fixed ttylclose() to check FNONBLOCK instead of IO_NDELAY. This broke the call from vclean() and cleaning of ptys sometimes deadlocked.	1995-06-27 21:29:08 +00:00
David Greenman	61f5d51062	Changes to fix the following bugs: 1) Files weren't properly synced on filesystems other than UFS. In some cases, this lead to lost data. Most likely would be noticed on NFS. The fix is to make the VM page sync/object_clean general rather than in each filesystem. 2) Mixing regular and mmaped file I/O on NFS was very broken. It caused chunks of files to end up as zeroes rather than the intended contents. The fix was to fix several race conditions and to kludge up the "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention to page modifications that occurred via the mmapping. Reviewed by: David Greenman Submitted by: John Dyson	1995-05-21 21:39:31 +00:00
David Greenman	15f1b09648	Increased ratio of allowed vnodes on freelist to 1/4th of the total. This is more representative of worst case situations of 4 files/directory. (If that last sentence doesn't make any sense, I'm not surprised. It's rather compilcated how this all fits together....). This should fix a problem that Ed Hudson has been complaining about where directories with lots of symlinks could cause excessive disk I/O.	1995-05-12 04:24:53 +00:00
David Greenman	1a477b0c27	Changed #ifdef around printlockedvnodes() from DEBUG to DDB.	1995-04-16 11:33:33 +00:00
David Greenman	213fd1b6e8	Changes from John Dyson and myself: Fixed remaining known bugs in the buffer IO and VM system. vfs_bio.c: Fixed some race conditions and locking bugs. Improved performance by removing some (now) unnecessary code and fixing some broken logic. Fixed process accounting of # of FS outputs. Properly handle NFS interrupts (B_EINTR). (various) Replaced calls to clrbuf() with calls to an optimized routine call vfs_bio_clrbuf(). (various FS sync) Sync out modified vnode_pager backed pages. ffs_vnops.c: Do two passes: Sync out file data first, then indirect blocks. vm_fault.c: Fixed deadly embrace caused by acquiring locks in the wrong order. vnode_pager.c: Changed to use buffer I/O system for writing out modified pages. This should fix the problem with the modification date previous not getting updated. Also dramatically simplifies the code. Note that this is going to change in the future and be implemented via VOP_PUTPAGES(). vm_object.c: Fixed a pile of bugs related to cleaning (vnode) objects. The performance of vm_object_page_clean() is terrible when dealing with huge objects, but this will change when we implement a binary tree to keep the object pages sorted. vm_pageout.c: Fixed broken clustering of pageouts. Fixed race conditions and other lockup style bugs in the scanning of pages. Improved performance.	1995-04-09 06:02:46 +00:00
David Greenman	b946193056	Fixed vinvalbuf() to work like NFS wants it to. The previous code wouldn't flush pages in the vm object if V_SAVE was true.	1995-03-21 01:13:16 +00:00
David Greenman	62b71ed629	Don't gain/lose a reference to the object when yanking its pages in vinvalbuf()...it will cause vnode locking problems in vm_object_terminate, and isn't necessary anyway.	1995-03-20 10:19:09 +00:00
David Greenman	ff769afc81	Don't attempt to sync pages in the V_SAVE case of vinvalbuf; doing so can lead to a deadlock. Just let the VM system deal with it.	1995-03-20 02:08:24 +00:00
Bruce Evans	b5e8ce9f12	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.	1995-03-16 18:17:34 +00:00
David Greenman	3d2a8cf3d9	Added a comment.	1995-03-11 22:29:07 +00:00
David Greenman	f8a0b2dd88	Reorganized an if() expression for efficiency.	1995-03-10 21:18:24 +00:00
Poul-Henning Kamp	fbd6e6c9ef	Clean up and improve the namecache. 1. We always keep one 16th of the vnodes on the freelist, so that the namecache doesn't get trashed. It used to be that it wasn't a problem, but the only vnodes getting released these days are directories and things which Clean up and improve the namecache. 1. We always keep one 16th of the vnodes on the freelist, so that the namecache doesn't get trashed. It used to be that it wasn't a problem, but the only vnodes getting released these days are directories and things which gets forced out of the VM/cache. The latter is not numerous enough to keep the pool of vnodes needed for the namecache sufficiently big. 2. Purge invalid entries in the namecache as soon as we notice them. This avoids a stale entry pushing out a valid entry on the LRU list. 3. Speed up the lookup in the namecache by avoid a special case branch. 4. Make the cache purge routines do the thing they're supposed to, and in a decently efficient manner. 5. Make the size of the namecache follow the number of vnodes, so that we can always point to all the vnodes we have in core. 6. Readability has gone way up. 7. Added a "options NCH_STATISTICS" feature that will gather more detailed statistics on the performance of the namecache. Reviewed by: davidg (cvs is dumping core on me :-( )	1995-03-09 20:27:04 +00:00
David Greenman	acc835fd3f	Put VAGE vnodes at the head of the free list.	1995-03-07 18:59:45 +00:00
David Greenman	7500ed1d63	Backed out previous change. I forgot (for about the fourth time) that v_rdev is a #define which is dereferenced through v_specinfo->si_rdev, and that isn't initialized until later in checkalias().	1995-02-27 10:15:38 +00:00
David Greenman	f9ceb7c7b5	Initialize v_rdev in getnewvnode() - it appears that some filesystems may not properly initialize this field in all cases, and this would result in very anti-social behavior (overwriting on some other random device/location). Submitted by: John Dyson	1995-02-27 06:50:08 +00:00
David Greenman	a3a8bb29d2	vfs_cluster.c: Various more tweaks from John Dyson to improve read ahead calculations. vfs_subr.c: Only wakeup if numoutput is 0 in vwakeup(). Submitted by: John Dyson	1995-02-22 09:39:22 +00:00
David Greenman	480dff540b	Fixed some formatting weirdness that I overlooked in the previous commit.	1995-01-10 07:32:52 +00:00
David Greenman	0d94caffca	These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D. The majority of the merged VM/cache work is by John Dyson. The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme. vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering. vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff. vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption. vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up. vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme. pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs. vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping. proc.h Fixed the problem that the p_lock flag was not being cleared on a fork. swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore. machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme. machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed. ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers. Submitted by: John Dyson and David Greenman	1995-01-09 16:06:02 +00:00
David Greenman	602d2b481a	Protect vnode buffer chain manipulation with splbio to prevent list corruption..	1994-12-23 04:52:55 +00:00
David Greenman	824789192c	Use tsleep() rather than sleep so that 'ps' is more informative about the wait.	1994-10-06 21:07:04 +00:00
David Greenman	8e58bf6875	Stuff object into v_vmdata rather than pager. Not important which at the moment, but will be in the future. Other changes mostly cosmetic, but are made for future VMIO considerations. Submitted by: John Dyson	1994-10-05 09:48:45 +00:00
Poul-Henning Kamp	797f2d22f0	All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.	1994-10-02 17:35:40 +00:00
Poul-Henning Kamp	bb56ec4a05	While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if you kernel breaks.	1994-09-25 19:34:02 +00:00
David Greenman	1cdeb653a8	"bogus" fixes from 1.1.5 to work around some cache coherency problems.	1994-08-29 06:09:15 +00:00
David Greenman	e0c0215442	Initialized v_writecount.	1994-08-24 04:06:39 +00:00
David Greenman	4f5a3fef1a	print "BUSY" instead of error number if filesystem was busy during vfs_unmountall() - this is the most common case. If it was a different error, then print the error number.	1994-08-22 17:05:00 +00:00
David Greenman	e0e9c42112	Implemented filesystem clean bit via: machdep.c: Changed printf's a little and call vfs_unmountall() if the sync was successful. cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c: Allow dismount of root FS. It is now disallowed at a higher level. vfs_conf.c: Removed unused rootfs global. vfs_subr.c: Added new routines vfs_unmountall and vfs_unmountroot. Filesystems are now dismounted if the machine is properly rebooted. ffs_vfsops.c: Toggle clean bit at the appropriate places. Print warning if an unclean FS is mounted. ffs_vfsops.c, lfs_vfsops.c: Fix bug in selecting proper flags for VOP_CLOSE(). vfs_syscalls.c: Disallow dismounting root FS via umount syscall.	1994-08-20 16:03:26 +00:00
Garrett Wollman	f23b4c91c4	Fix up some sloppy coding practices: - Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above. NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.	1994-08-18 22:36:09 +00:00
David Greenman	3c4dd3568f	Added $Id$	1994-08-02 07:55:43 +00:00
Rodney W. Grimes	26f9a76710	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman	1994-05-25 09:21:21 +00:00
Rodney W. Grimes	df8bae1de4	BSD 4.4 Lite Kernel Sources	1994-05-24 10:09:53 +00:00

... 4 5 6 7 8 ...

490 Commits