freebsd-dev

Author	SHA1	Message	Date
Poul-Henning Kamp	0429e37ade	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967	1999-11-20 10:00:46 +00:00
Peter Wemm	63034ded71	Fix a warning (unused static declaration without MFS_ROOT)	1999-11-18 08:49:40 +00:00
Eivind Eklund	dd8c04f4c7	Remove WILLRELE from VOP_SYMLINK Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.	1999-11-13 20:58:17 +00:00
Eivind Eklund	edfe736df9	Remove WILLRELE from VOP_RENAME	1999-11-12 03:34:28 +00:00
Poul-Henning Kamp	698f9cf828	Next step in the device cleanup process. Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code. Unify spec_open() for bdev and cdev cases. Remove the disabled bdev specific read/write code.	1999-11-09 14:15:33 +00:00
Bruce Evans	5bd5c8b9e5	Quick fix for breakage of ext2fs link counts as reported by stat(2) by the soft updates changes: only report the link count to be i_effnlink in ufs_getattr() for file systems that maintain i_effnlink. Tested by: Mike Dracopoulos <mdraco@math.uoa.gr>	1999-11-03 12:05:39 +00:00
Mike Smith	88d4183b84	Make MFS work with the new root filesystem search process. In order to achieve this, root filesystem mount is moved from SI_ORDER_FIRST to SI_ORDER_SECOND in the SI_SUB_MOUNT_ROOT sysinit group. Now, modules which wish to usurp the default root mount can use SI_ORDER_FIRST. A compiled-in or preloaded MFS filesystem will become the root filesystem unless the vfs.root.mountfrom environment variable refers to a valid bootable device. This will normally only be the case when the kernel and MFS image have been loaded from a disk which has a valid /etc/fstab file. In this case, the variable should be manually overridden in the loader, or the kernel booted with -a. In either case "mfs:" should be supplied as the new value. Also fix a typo in one DFLTROOT case that would not have compiled.	1999-11-03 11:02:47 +00:00
Mike Smith	6d14782861	Newline-terminate the complaint message about not being able to find the root vnode pointer.	1999-11-01 23:57:28 +00:00
Matthew Dillon	f9eb66d73a	Add sysctl debug.dircheck to allow directory sanity checking to be turned on with a sysctl. Fix two bugs in ufs_lookup that can cause deadlocks due to out-of-order locking. This fix was tested for a few days prior to commit.	1999-10-30 00:51:14 +00:00
Poul-Henning Kamp	923502ff91	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.	1999-10-29 18:09:36 +00:00
Poul-Henning Kamp	b89392e703	Remove the D_NOCLUSTER[RW] options which were added because vn had problems. Now that Matt has fixed vn, this can go. The vn driver should have used d_maxio (now si_iosize_max) anyway.	1999-09-30 07:11:30 +00:00
Poul-Henning Kamp	1b5464ef9d	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde	1999-09-29 20:05:33 +00:00
Marcel Moolenaar	2c42a14602	sigset_t change (part 2 of 5): The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements. The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t. struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done through function sigprop(). The latter because the table doesn't hold properties for all signals, but only for the first NSIG signals. signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace pollution. Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types. NOTE: kdump (and port linux_kdump) must be recompiled. Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.	1999-09-29 15:03:48 +00:00
Poul-Henning Kamp	d6a0e38a1b	Remove five now unused fields from struct cdevsw. They should never have been there in the first place. A GENERIC kernel shrinks almost 1k. Add a slightly different safetybelt under nostop for tty drivers. Add some missing FreeBSD tags	1999-09-25 18:24:47 +00:00
Matthew Dillon	67ddfcaf69	More removals of vnode->v_lastr, replaced by preexisting seqcount heuristic to detect sequential operation. VM-related forced clustering code removed from ufs in preparation for a commit to vm/vm_fault.c that does it more generally. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>	1999-09-20 23:27:58 +00:00
Poul-Henning Kamp	faad302913	Fix a harmless bug I introduced, simplify a bit more while here.	1999-09-20 21:14:43 +00:00
Poul-Henning Kamp	fae03f66d1	Step one of replacing devsw->d_maxio with si_bsize_max. Rename dev->si_bsize_max to si_iosize_max and set it in spec_open if the device didn't. Set vp->v_maxio from dev->si_bsize_max in spec_open rather than in ufs_bmap.c	1999-09-20 19:57:28 +00:00
Bruce Evans	887ba12fc5	Removed diskerr()'s unused d_name arg and updated callers. This fixes warnings caused by the arg having the wrong type (not const enough). The arg was also wrong (a full name instead of a short one) for calls from subr_diskmbr.c and pc98/diskslice_machdep.c.	1999-09-13 12:59:41 +00:00
Alfred Perlstein	c24fda81c9	Separate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open|stat|statfs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD	1999-09-11 00:46:08 +00:00
Julian Elischer	85a219d201	Changes to centralise the default blocksize behaviour. More likely to follow. Submitted by: phk@freebsd.org	1999-09-09 19:08:44 +00:00
Julian Elischer	7012bab988	Revert a bunch of controversial changes by PHK. After a quick think and discussion among various people some form of some of these changes will probably be recommitted. The reversion was requested by dg while discussions proceed. PHK has indicated that he can live with this, and it has been agreed that some form of some of these changes may return shortly after further discussion.	1999-09-03 05:16:59 +00:00
Poul-Henning Kamp	02e1576966	Make bdev userland access work like cdev userland access unless the highly non-recommended option ALLOW_BDEV_ACCESS is used. (bdev access is evil because you don't get write errors reported.) Kill si_bsize_best before it kills Matt :-) Use the specfs routines rather than having cloned copies in devfs.	1999-08-30 07:56:23 +00:00
Poul-Henning Kamp	9626728875	remove unused variables.	1999-08-28 19:21:03 +00:00
Poul-Henning Kamp	10af1a2b5f	We don't need to pass the diskname argument all over the diskslice/label code, we can find the name from any convenient dev_t	1999-08-28 14:33:44 +00:00
Peter Wemm	280652828b	$Id$ -> $FreeBSD$	1999-08-28 02:16:32 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Poul-Henning Kamp	dbafb3660f	Simplify the handling of VCHR and VBLK vnodes using the new dev_t: Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a preexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED;	1999-08-26 14:53:31 +00:00
Poul-Henning Kamp	41d2e3e09e	Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.	1999-08-25 12:24:39 +00:00
Poul-Henning Kamp	cb5eef8f2b	Initialize the si_bsize fields for the MFS bogodevices. (This broke MFS rootfs and thereby installation)	1999-08-24 18:35:33 +00:00
Sheldon Hearn	740e3a15f7	Fix bug introduced in rev 1.28, which causes kernel build to break for the case where DEBUG is defined but not DIAGNOSTIC. ffs_checkblk is declared conditionally on DIAGNOSTIC, not DEBUG. PR: 13314 Reviewed by: bde	1999-08-24 08:39:41 +00:00
Bruce Evans	d918320517	Use devtoname() to print dev_t's instead of casting them to long or u_long for misprinting in %lx format.	1999-08-23 20:35:21 +00:00
John Polstra	a2801b7731	Support full-precision file timestamps. Until now, only the seconds have been maintained, and that is still the default. A new sysctl variable "vfs.timestamp_precision" can be used to enable higher levels of precision: 0 = seconds only; nanoseconds zeroed (default). 1 = seconds and nanoseconds, accurate within 1/HZ. 2 = seconds and nanoseconds, truncated to microseconds. >=3 = seconds and nanoseconds, maximum precision. Level 1 uses getnanotime(), which is fast but can be wrong by up to 1/HZ. Level 2 uses microtime(). It might be desirable for consistency with utimes() and friends, which take timeval structures rather than timespecs. Level 3 uses nanotime() for the highest precision. I benchmarked levels 0, 1, and 3 by copying a 550 MB tree with "cpio -pdu". There was almost negligible difference in the system times -- much less than 1%, and less than the variation among multiple runs at the same level. Bruce Evans dreamed up a torture test involving 1-byte reads with intervening fstat() calls, but the cpio test seems more realistic to me. This feature is currently implemented only for the UFS (FFS and MFS) filesystems. But I think it should be easy to support it in the others as well. An earlier version of this was reviewed by Bruce. He's not to blame for any breakage I've introduced since then. Reviewed by: bde (an earlier version of the code)	1999-08-22 00:15:16 +00:00
Alan Cox	2c28a10540	Add the (inline) function vm_page_undirty for clearing the dirty bitmask of a vm_page. Use it. Submitted by: dillon	1999-08-17 04:02:34 +00:00
Poul-Henning Kamp	49ff4debd3	Spring cleaning around strategy and disklabels/slices: Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout. please see comment in sys/conf.h about the flag argument. Remove strategy argument from all the diskslice/label/bad144 implementations, it should be found from the dev_t. Remove bogus and unused strategy1 routines. Remove open/close arguments from dssize(). Pick them up from dev_t. Remove unused and unfinished setgeom support from diskslice/label/bad144 code.	1999-08-14 11:40:51 +00:00
Poul-Henning Kamp	3a965c0db0	Move the special-casing of stat(2)->st_blksize for device files from UFS to the generic level. For chr/blk devices we don't care about the blocksize of the filesystem, we want what the device asked for.	1999-08-13 10:56:07 +00:00
Poul-Henning Kamp	7dc5cd047f	The bdevsw() and cdevsw() are now identical, so kill the former.	1999-08-13 10:29:38 +00:00
Poul-Henning Kamp	4d4f932326	s/v_specinfo/v_rdev/	1999-08-13 10:10:12 +00:00
Poul-Henning Kamp	0ef1c82630	Decommission miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.	1999-08-08 18:43:05 +00:00
Alan Cox	7f866e4b29	Move the memory access behavior information provided by madvise from the vm_object to the vm_map. Submitted by: dillon	1999-08-01 06:05:09 +00:00
Bruce Evans	3dfdfdb27f	Fixed access timestamp bugs: Set IN_ACCESS for successful reads of 0 bytes (except for requests to read 0 bytes). This was broken in rev.1.42. PR: misc/10148 Don't set IN_ACCESS for requests to read 0 bytes. Don't set IN_ACCESS for unsuccessful reads.	1999-07-25 02:07:16 +00:00
Poul-Henning Kamp	698bfad7f2	Now a dev_t is a pointer to struct specinfo which is shared by all specdev vnodes referencing this device. Details: cdevsw->d_parms has been removed, the specinfo is available now (== dev_t) and the driver should modify it directly when applicable, and the only driver doing so, does so: vn.c. I am not sure the logic in checking for "<" was right before, and it looks even less so now. An initial pool of 50 struct specinfo are depleted during early boot, after that malloc had better work. It is likely that fewer than 50 would do. Hashing is done from udev_t to dev_t with a prime number remainder hash, experiments show no better hash available for decent cost (MD5 is only marginally better) The prime number used should not be close to a power of two, we use 83 for now. Add new checkalias2() to get around the loss of info from dev2udev() in bdevvp(); The aliased vnodes are hung on a list straight off the dev_t, and speclisth[SPECSZ] is unused. The sharing of struct specinfo means that the v_specnext moves into the vnode which grows by 4 bytes. Don't use a VBLK dev_t which doesn't make sense in MFS, now we hang a dummy cdevsw on B/Cmaj 253 so that things look sane. Storage overhead from all of this is O(50k). Bump __FreeBSD_version to 400009 The next step will add the stuff needed so device-drivers can start to hang things from struct specinfo	1999-07-20 09:47:55 +00:00
Poul-Henning Kamp	f008cfcc1a	I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g	1999-07-17 18:43:50 +00:00
Kirk McKusick	4dc0c8f521	Create the macro DOINGASYNC to check whether the MNT_ASYNC flag has been set for a mount point. Insert missing checks to ensure that all write operations are done asynchronously when the MNT_ASYNC option has been requested. Submitted by: Craig A Soules <soules+@andrew.cmu.edu> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-13 18:20:13 +00:00
Poul-Henning Kamp	68de329e34	Use the fsid from the superblock, unless it looks bogus or has already been taken by some other filesystem.	1999-07-11 19:16:50 +00:00
Kirk McKusick	ad8ac923fa	These changes appear to give us benefits with both small (32MB) and large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bringing the machine to a grinding halt. * buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems. * minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propagate to deeper inlines and should produce better code. * removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely. * removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal. * buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system. * removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers. * slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers. * addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation. * Removal of the newbufcnt and lastnewbuf counters that Kirk added.
They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ). * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ). * removal of extra arguments passed to getnewbuf() that are not necessary. * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c * vn_write() now calls bwillwrite() PRIOR to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently. * removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon. Submitted by: Matthew Dillon <dillon@backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-08 06:06:00 +00:00
Ollivier Robert	7fe29b0aef	Add $Id$ Approved by: kirk	1999-07-07 07:51:04 +00:00
John Polstra	24755bdc25	Update pathnames for new location of soft-updates sources.	1999-07-03 21:34:05 +00:00
Kirk McKusick	48703fedf1	No longer need to set B_ASYNC flag since BUF_KERNPROC now unconditionally sets the identity of the buffer.	1999-06-29 15:57:40 +00:00
Peter Wemm	a6451da76b	Keep the inlines for <sys/buf.h> happy..	1999-06-27 13:26:23 +00:00
Kirk McKusick	67812eacd7	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.	1999-06-26 02:47:16 +00:00
Kirk McKusick	7481264c1e	On our final pass through ffs_fsync, do all I/O synchronously so that we can find out if our flush is failing because of write errors. This change avoids a "flush failed" panic during unrecoverable disk errors.	1999-06-18 05:49:46 +00:00
Kirk McKusick	f9c8cab591	Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.	1999-06-16 23:27:55 +00:00
Kirk McKusick	e4ab40bcb6	Get rid of the global variable rushjob and replace it with a function in kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.	1999-06-15 23:37:29 +00:00
Poul-Henning Kamp	2447bec829	Simplify cdevsw registration. The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing. cdevsw_add() will print a message if the d_maj field looks bogus. Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL. Move bdevsw() and devsw() functions to kern/kern_conf.c Bump __FreeBSD_version to 400006 This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions if_xe.c bogusly accessed cdevsw[], author/maintainer please fix. I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.	1999-05-31 11:29:30 +00:00
John Birrell	ed3a2fb7b3	- Back out Luoqi's cdevsw stuff. It panics on my system and is not required. - Fix an error message. - Do the MFS_ROOT setting of mountrootfsname in mfs_init() instead of cpu_rootconf(). - Set rootdev in mfs_init instead of later in mfs_mount() iff MFS_ROOT.	1999-05-24 00:27:12 +00:00
Julian Elischer	2e897e94b6	Cosmetic changes to make it compile without errors in gcc -Wall	1999-05-22 04:43:04 +00:00
Luoqi Chen	0ce54cbb0c	Legally acquire a major number for mfs.	1999-05-14 20:40:23 +00:00
Kirk McKusick	c2606ec5c6	Add a hook to ffs_fsync to allow soft updates to get first chance at doing a sync on the block device for the filesystem. That allows it to push the bitmap blocks before the inode blocks which greatly reduces the number of inode rollbacks that need to be done.	1999-05-14 01:26:46 +00:00
Peter Wemm	51b5226683	Try and fix a dev_t/major/minor etc nit.	1999-05-12 22:32:07 +00:00
Poul-Henning Kamp	bfbb9ce670	Divorce "dev_t" from the "major|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to an actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few housekeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I believe this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).	1999-05-11 19:55:07 +00:00
Bruce Evans	6b88526425	Fixed disordering in previous 2 commits.	1999-05-11 03:11:09 +00:00
Peter Wemm	7f2d5fc4f2	Move the mfs_getimage() prototype to mfs_extern.h instead of duplicating it everywhere.	1999-05-10 17:12:45 +00:00
Kirk McKusick	71a0942aca	Put back changes that might be causing trouble on Alpha.	1999-05-09 19:39:54 +00:00
Poul-Henning Kamp	4be2eb8c49	I got tired of seeing all the cdevsw[major(foo)] all over the place. Made a new (inline) function devsw(dev_t dev) and substituted it. Changed the BDEV variant to this format as well: bdevsw(dev_t dev) DEVFS will eventually benefit from this change too.	1999-05-08 06:40:31 +00:00
Poul-Henning Kamp	46eede0058	Continue where Julian left off in July 1998: Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function. Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!) Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!) (Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)	1999-05-07 10:11:40 +00:00
Kirk McKusick	36cfb417de	Whitespace cleanup.	1999-05-07 05:21:16 +00:00
Kirk McKusick	7957996abd	Get rid of random debugging cruft; sync up with latest version.	1999-05-07 05:11:31 +00:00
Kirk McKusick	224a6aa241	Severe slowdowns have been reported when creating or removing many files at once on a filesystem running soft updates. The root of the problem is that soft updates limits the amount of memory that may be allocated to dependency structures so as to avoid hogging kernel memory. The original algorithm just waited for the disk I/O to catch up and reduce the number of dependencies. This new code takes a much more aggressive approach. Basically there are two resources that routinely hit the limit. Inode dependencies during periods with a high file creation rate and file and block removal dependencies during periods with a high file removal rate. I have attacked these problems from two fronts. When the inode dependency limits are reached, I pick a random inode dependency, UFS_UPDATE it together with all the other dirty inodes contained within its disk block and then write that disk block. This trick usually clears 5-50 inode dependencies in a single disk I/O. For block and file removal dependencies, I pick a random directory page that has at least one remove pending and VOP_FSYNC its directory. That releases all its removal dependencies to the work queue. To further hasten things along, I also immediately start the work queue process rather than waiting for its next one second scheduled run.	1999-05-07 02:26:47 +00:00
Peter Wemm	dfd5dee1b0	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.	1999-05-06 18:13:11 +00:00
Alan Cox	4221e284a3	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most notably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistency issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vice versa). biodone() is now responsible for setting B_CACHE when a successful read completes.
B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-05-02 23:57:16 +00:00
Poul-Henning Kamp	75c1354190	This implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no provisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/	1999-04-28 11:38:52 +00:00
Mike Smith	f4711b2df4	Simplify the tunefs example, since tunefs uses getfsfile(). Lots of people complain about working out what device their filesystems are mounted on.	1999-04-27 21:11:19 +00:00
Poul-Henning Kamp	f711d546d2	Suser() simplification: 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>. 3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.	1999-04-27 11:18:52 +00:00
Dmitrij Tejblum	8d81b5d631	Change type of a variable from u_int to size_t, so that pointer to it may be used as a last argument to copyinstr().	1999-04-21 09:41:07 +00:00
Eivind Eklund	ee45a71480	Correct typo in panic message	1999-04-11 02:28:32 +00:00
Peter Wemm	30c56d468c	Hold the mfs process's upages in-core with PHOLD rather than P_NOSWAP.	1999-04-06 03:08:43 +00:00
Julian Elischer	8d17e69460	Catch a case spotted by Tor where files mmapped could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>	1999-04-05 19:38:30 +00:00
Peter Wemm	fa8e1794b5	There's not much point in the EXPORTMFS #ifdef. I've had this sitting in my tree for 12+ months, and I just noticed that NetBSD have (I think, I've just seen the commit, not the change) just zapped it there. It wasn't in the options files or LINT either.	1999-04-05 06:39:10 +00:00
Julian Elischer	51df594922	Stop the mfs from trying to swap out crucial bits of the mfs as this can lead to deadlock. Submitted by: Matt Dillon <dillon@freebsd.org>	1999-03-12 00:44:03 +00:00
Bruce Evans	44f332052d	Don't depend on <ufs/ufs/quota.h> or another (old) prerequisite including <sys/queue.h>. This fixes my recent breakage of biosboot by unpolluting <ufs/ufs/quota.h> in the !KERNEL case.	1999-03-06 05:21:09 +00:00
Bruce Evans	fdc79256c1	Moved kernel declarations inside the KERNEL ifdef, and removed include of <sys/queue.h> in the !KERNEL case. The prerequisites for <ufs/ufs/quota.h> were broken in Lite2 by converting some of the kernel declarations to use queue macros without including <sys/queue.h>. <sys/queue.h> was included in applications in /usr/src instead. We polluted this file instead of merging the changes in the applications. Include <sys/queue.h> in the KERNEL case, and forward-declare all structs that are used in prototypes, so that this file is almost self-sufficient even in the kernel. Obtained from: mostly from NetBSD	1999-03-05 11:25:31 +00:00
Bruce Evans	abd022381d	Changed the type of quotactl()'s 4th arg from `char *' to `void *' so that non-sloppy applications can call it without using disgusting casts to avoid warnings. The 4th arg is sort of varargs -- it must sometimes represent a filename, sometimes a struct pointer, and is sometimes unused. The arg type is still caddr_t in the kernel. Obtained from: mostly from NetBSD	1999-03-05 09:28:33 +00:00
Kirk McKusick	38e28fd66b	Reorganize locking to avoid holding the lock during calls to bdwrite and brelse (which may sleep in some systems). Obtained from: Matthew Dillon <dillon@apollo.backplane.com>	1999-03-02 06:38:07 +00:00
Warner Losh	5369eb85ca	Merge patch to ufs_vnops.c's ufs_rename to the copy of ufs_rename that lives in ext2_vnops.c for ext2fs. Also remove cast from comparison. Bruce pointed out that it was bogus since we'd force a signed comparison when we really wanted an unsigned comparison.	1999-03-02 05:31:47 +00:00
Kirk McKusick	eef33ce9bd	When fsync'ing a file on a filesystem using soft updates, we first try to write all the dirty blocks. If some of those blocks have dependencies, they will be remarked dirty when the I/O completes. On systems with really fast I/O systems, it is possible to get in an infinite loop trying to flush the buffers, because the I/O finishes before we can get all the dirty buffers off the v_dirtyblkhd list and into the I/O queue. (The previous algorithm looped over the v_dirtyblkhd list writing out buffers until the list emptied.) So, now we mark each buffer that we try to write so that we can distinguish the ones that are being remarked dirty from those that we have not yet tried to flush. Once we have tried to push every buffer once, we then push any associated metadata that is causing the remaining buffers to be redirtied. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-03-02 04:04:31 +00:00
Kirk McKusick	4cbb89d95d	Ensure that softdep_sync_metadata can handle bmsafemap and mkdir entries if they ever arise (which should not happen as softdep_sync_metadata is currently used).	1999-03-02 00:19:47 +00:00
Warner Losh	00db131a60	Fix last commit based on feedback from Guido, Bruce and Terry. Specifically, the test was in the wrong place, lacked a cast, didn't unlock the node, and exited to bad rather than abortit. Now we don't allow renaming of a file with LINK_MAX references. Move the test to earlier in the code as it is closer to where ip is obtained, as that is the style of the rest of the function. Didn't fix the problems bruce pointed out in the rename man page to include EMLINK, nor address his complaints about how the whole idea of incrementing the link count during a rename is potentially asking for trouble. Also didn't try to correct potential problem Terry pointed out with decrements not being similarly protected against underflow.	1999-02-26 05:34:16 +00:00
Warner Losh	b82396269c	Add missing check for LINK_MAX in ufs_rename. Since ip->i_effnlink and ip->nlink were different types, there was a masked overflow. Reported by: Mark Slemco <marcs@znep.com>	1999-02-25 09:52:46 +00:00
Matthew Dillon	06f7b4ebd1	Update ufs_vnops code to use new specinfo fields rather than guess. This is part of general specinfo / d_parms() commit.	1999-02-25 05:35:53 +00:00
Kirk McKusick	133ff2619a	fix double LIST_REMOVE; other cosmetic changes to match version 9.32. Obtained from: Jeffrey Hsu <hsu@FreeBSD.ORG>	1999-02-17 20:01:20 +00:00
Matthew Dillon	89a01116cf	Remove XXX comment in regards to why NFS doesn't use VOP_ABORT(). NFS is being fixed now.	1999-02-13 08:38:28 +00:00
Matthew Dillon	8aef171243	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-28 00:57:57 +00:00
Matthew Dillon	5e24f1a2f6	Remove unintended trigraph sequences in comments for -Wall	1999-01-27 18:19:53 +00:00
David Greenman	8ab2fa0073	Gutted softdep_deallocate_dependencies and replaced it with a panic. It turns out to not be useful to unwind the dependencies and continue in the face of a fatal error. Also changed the log() to a printf() in softdep_error() so that it will be output in the case of an impending panic. Submitted by: Kirk McKusick <mckusick@mckusick.com>	1999-01-22 09:07:32 +00:00
Matthew Dillon	50c7d0d513	Added support for VOP_FREEBLKS(), reducing MFS's impact on swap and increasing performance by deallocating at least some of the backing store when files are removed. Protect mfsp->buf_queue access at splbio().	1999-01-21 09:27:03 +00:00
Matthew Dillon	3fe2487dfc	Access to mfsp->buf_queue must be protected at splbio(). Other minor adjustments also made, such as passing mfsp to mfs_doio() directly.	1999-01-21 09:24:46 +00:00
Matthew Dillon	1c7c3c6a86	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
Eivind Eklund	5b1b6c5859	Silence warning about unused debug function. (I'll turn this function into a DDB command in my next staticization sweep).	1999-01-12 11:42:41 +00:00
Eivind Eklund	a862221fa0	Add a warning about the copyright restraints.	1999-01-08 16:03:12 +00:00
Bruce Evans	de5d1ba57c	Don't pass unused timestamp args to UFS_UPDATE() or waste time initializing them. This almost finishes centralizing (in-core) timestamp updates in ufs_itimes().	1999-01-07 16:14:19 +00:00
Bruce Evans	4591d9bb7e	UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value MNT_WAIT when we mean boolean `true' or check for that value not being passed. There was no problem in practice because MNT_WAIT had the magic value of 1.	1999-01-06 18:18:06 +00:00
Bruce Evans	d64dbc8719	Ifdefed the conditionally used variable `prtrealloc'. Declare it as volatile so that there is no chance that the code that it controls is optimised away.	1999-01-06 17:04:33 +00:00
Bruce Evans	5991fd0370	Backed out rev.1.47. It just broke my optimisations for lazy syncing of timestamps in rev.1.45. The soft updates bug was elsewhere. Forgotten by: luoqi	1999-01-06 16:52:38 +00:00
Eivind Eklund	fb1167777a	Remove the 'waslocked' parameter to vfs_object_create().	1999-01-05 18:50:03 +00:00
Bruce Evans	289bdf33d3	Ifdefed conditionally used simplock variables.	1999-01-02 11:34:57 +00:00
Eivind Eklund	a777e82019	Remove the last clients of vfs_object_create(..., waslocked=1); waslocked will go away shortly. Reviewed by: dg	1999-01-02 01:32:36 +00:00
Matthew Dillon	2ae122f64a	The mount_mfs process that stays in a supervisor context handling MFS I/O requests must be marked P_SYSTEM because if it isn't and the system decides to swap it or (god forbid) kill it, the system stands a good chance of locking up.	1999-01-01 04:14:11 +00:00
Bruce Evans	d26105a9ce	Fixed null pointer panics which I introduced in rev.1.86. Vnodes may be revoked, so vnop routines must be careful about accessing the vnode if they may have blocked. Fixed marking for update after successfully reading or writing 0 bytes. In this case, POSIX.1 specifies marking if and only if the requested count is nonzero, but rev.1.86 never marked.	1998-12-24 09:45:10 +00:00
Bruce Evans	450fefa9f3	Remove unused file. It seems to have been a vestige of when mfs did its own memory allocation.	1998-12-20 17:05:54 +00:00
Doug Rabson	6839466b30	In ufs_setattr(), if only one of va_atime or va_mtime are != VNOVAL, then the code set the other field in the inode to VNOVAL. This can happen sometimes on an NFS server.	1998-12-20 12:36:01 +00:00
Julian Elischer	feb17f5acb	Add comments to code that I was trying to understand. Hopefully will save others time. Someone who understands this better might check for correctness.	1998-12-15 03:29:52 +00:00
Matthew Dillon	f7bb75c92a	Fix -Wuninitialized warning regarding zero-length var-args ctl element. ( this isn't really an error, but I think it is important to fix the warning ).	1998-12-14 05:37:37 +00:00
Julian Elischer	1f35e8c8da	Remove some compiler warnings.	1998-12-10 20:11:47 +00:00
Eivind Eklund	bf51e54f46	Make compare correct with unsigned types. (Problem introduced by Lite/2).	1998-12-09 02:06:27 +00:00
Archie Cobbs	f1d19042b0	The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.	1998-12-07 21:58:50 +00:00
Bruce Evans	672be20b9f	Don't use the strange null pointer constant `(ufs_daddr_t)0' in a call to VOP_BMAP(). Don't use uncast NULLs in the same call.	1998-11-29 03:12:06 +00:00
David Greenman	1c680b45a2	Restored the "reallocblks" code to its former glory. What this does is basically do an on-the-fly defragmentation of the FFS filesystem, changing file block allocations to make them contiguous. Thanks to Kirk McKusick for providing hints on what needed to be done to get this working.	1998-11-13 01:01:44 +00:00
Peter Wemm	1c5bb3eaa1	add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()	1998-11-10 09:16:29 +00:00
Peter Wemm	2ec07c6614	Change dirty block list handling to use TAILQ macros.	1998-10-31 15:33:32 +00:00
Peter Wemm	40c8cfe552	Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.	1998-10-31 15:31:29 +00:00
Jordan K. Hubbard	2dcc2f0693	Clarify a rather ambiguous debugging message.	1998-10-28 10:37:54 +00:00
Bruce Evans	b5ee16407f	Oops, the redundant tests for major numbers weren't redundant here. They checked for the magic major number for the "device" behind mfs mount points. Use a more obvious check for this device. Debugged by: Andrew Gallatin <gallatin@cs.duke.edu>	1998-10-27 11:47:08 +00:00
Bruce Evans	569555b969	Removed redundant bitrotted checks for major numbers instead of updating them.	1998-10-26 08:53:13 +00:00
Bruce Evans	9c0619dace	Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted when bdevsw[] became sparse. We still depend on magic to avoid having to check that (v_rdev) device numbers in vnodes are not NODEV. Removed redundant `major(dev) < nblkdev' tests instead of updating them.	1998-10-25 19:02:48 +00:00
Poul-Henning Kamp	f5ef029e92	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.	1998-10-25 17:44:59 +00:00
Bruce Evans	e36b4f594a	Use only the correct raw partition for writing labels. Don't use the partition that the label ioctl is being done on just because it has offset 0, since there is no guarantee that such a partition is large enough to contain the label. Don't use the wrong raw partition (0 instead of RAW_PART). This fixes problems rewriting bizarre labels (with a nonzero offset for the 'a' partition) in newfs(8). Such labels shouldn't normally be used, but creating them was allowed if the ioctl was done on the raw partition, and sysinstall creates them if the root partition isn't allocated first. Note that allowing write access to a partition other than the one that has been checked for write access doesn't increase security holes significantly, since write access to any partition already allows changing the in-core label. This fix should be in 3.0R. Rev.1.26 of newfs/newfs.c shouldn't be in 3.0R.	1998-10-17 07:49:04 +00:00
Jordan K. Hubbard	908dcbd2a4	fixup for alpha.	1998-10-16 10:14:21 +00:00
Bruce Evans	d2165c2f7d	Fixed bloatage of `struct inode'. We used 5 "spare" fields for ext2fs, but when i_effnlink was added to support soft updates, there was only room for 4 spares. The number of spares was not reduced, so the inode size became 260 (on i386's), or 512 after rounding up by malloc(). Use one spare field in `struct dinode' instead of the 5th spare field in the inode and reduced to 4 spares in the inode so that the size is 256 again. Changed the types of the spares in the inode from int to u_int32_t so that the inode size has more chance of being <= 256 under other arches, and downdated ext2fs to match (it was broken to use ints before rev.1.1).	1998-10-13 15:45:43 +00:00
Peter Wemm	624b326270	"fix" a warning	1998-10-12 09:02:19 +00:00
Jordan K. Hubbard	a33b93ff31	Allow more flexible use of MFS root. Submitted by: peter	1998-10-10 08:12:24 +00:00
Peter Wemm	cfb55a60f9	MODINFO_ADDR has real addresses now, remove the manual relocation based on cpu type.	1998-10-09 23:37:37 +00:00
Jordan K. Hubbard	b526345c9e	Add some evil temporary phys-to-kern translation for mfs.	1998-10-09 06:21:12 +00:00
Jordan K. Hubbard	48c540f9a1	include proper header for Mike's new stuff.	1998-10-09 01:40:56 +00:00
Jordan K. Hubbard	f97ad428df	Allow the module area to be used in order to find the MFS image (in addition to allowing it to be compiled in) and stop overloading the MFS_ROOT variable to store size information.	1998-10-08 23:34:44 +00:00
Luoqi Chen	523e3b0f7f	Use vm_page_xxx() inline functions to manipulate vm_page::flags, vm_page::busy. As a side effect, a few wakeup() calls are added, which might fix some of the missing vm_page wakeups people have been seeing. Reviewed by: Doug Rabson <dfr@nlsystems.com>	1998-10-07 13:59:26 +00:00
Nate Williams	ed8d80c2de	Fix 'noatime' bug that was unrelated to use of noatime. The problem is caused when a directory block is compacted. When this occurs, softdep_change_directoryentry_offset() is called to relocate each directory entry and adjust its matching diradd structure, if any, to match the new location of the entry. The bug is that while softdep_change_directoryentry_offset() correctly adjusts the offsets of the diradd structures on the pd_diraddhd[] lists (which are not yet ready to be committed to disk), it fails to adjust the offsets of the diradd structures on the pd_pendinghd list (which are ready to be committed to disk). This causes the dependency structures to be inconsistent with the buf contents. Now, if the compaction has moved a directory entry to the same offset as one of the diradd structures on the pd_pendinghd list and a syscall is done that tries to remove this directory entry before this directory block has been written to disk (which would empty pd_pendinghd), a sanity check in newdirrem() will call panic() when it notices that the inode number in the entry that is to be removed doesn't match the inode number in the diradd structure with that offset of that entry. Reviewed by: Kirk McKusick <mckusick@McKusick.COM> Submitted by: Don Lewis <Don.Lewis@tsc.tdk.com>	1998-10-03 19:17:11 +00:00
Kirk McKusick	df077352a7	Do not allow a mounted on directory to be rmdir'ed. This removal can happen when an NFS exported filesystem tries to remove a locally mounted on directory. PR: kern/7272 Submitted by: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>	1998-09-30 00:53:40 +00:00
Bruce Evans	0922cce61c	Fixed clean flag handling: - don't set the clean flag on unmount of an unclean filesystem that was (forcibly) mounted rw. - set the clean flag on rw -> ro update of a mounted initially-clean filesystem. - fixed some style bugs (mostly long lines). This uses the fs_flags field and FS_UNCLEAN state bit which were introduced in the softdep changes. NetBSD uses extra state bits in fs_clean. Reviewed by: luoqi	1998-09-26 04:59:42 +00:00
Luoqi Chen	e266594c25	Eliminate a race in VOP_FSYNC() when softupdates is enabled. Submitted by: Kirk McKusick <mckusick@McKusick.COM> Two minor changes are also included, 1. Remove gratuitous checks for error return from vn_lock with LK_RETRY set, vn_lock should always succeed in these cases. 2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount a little more unstable. It also keeps us in sync with other BSDs. Suggested by: Bruce Evans <bde@zeta.org.au>	1998-09-24 15:02:46 +00:00
Luoqi Chen	f9e84c2fee	Restore pre-v1.44 behavior: always copy modified in-core inode to disk buffer. Otherwise some in-core inode changes might be lost, including important meta data (e.g. size) if softupdates is enabled.	1998-09-15 14:45:28 +00:00
Justin T. Gibbs	eda00cb5d2	When a buffer is removed from a buffer queue, remember its block number and use it as "the currently active" buffer in doing disk sort calculations.	1998-09-15 08:55:03 +00:00
Søren Schmidt	d024c95599	Remove the SLICE code. This clearly needs a lot more thought, and we don't need this to hunt us down in 3.0-RELEASE.	1998-09-14 19:56:42 +00:00
Bruce Evans	9164000766	Don't dereference an uninitialized pointer in dead code. The dead code gets executed if it is compiled without optimization.	1998-09-12 14:46:15 +00:00
Bruce Evans	8994ca3ce9	Removed statically configured mount type numbers (MOUNT_*) and all references to them. The change a couple of days ago to ignore these numbers in statically configured vfsconf structs was slightly premature because the cd9660, cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number in their vfsconf struct.	1998-09-07 13:17:06 +00:00
Bruce Evans	ff261f16f6	Put the zombie ffs sysctl node in "notyet" state together with its few remaining children. Prepare it for MOUNT_UFS going away.	1998-09-07 11:50:19 +00:00
Poul-Henning Kamp	21aa768ea1	Make MFS do the default on VOP_FREEBLKS(). XXX: we could deallocate the storage, but somebody else will have to pick up that task.	1998-09-07 06:52:01 +00:00
Poul-Henning Kamp	0375c9f2b8	Add a new vnode op, VOP_FREEBLKS(), which filesystems can use to inform device drivers about sectors no longer in use. Device-drivers receive the call through d_strategy, if they have D_CANFREE in d_flags. This allows flash based devices to erase the sectors and avoid pointlessly carrying them around in compactions. Reviewed by: Kirk Mckusick, bde Sponsored by: M-Systems (www.m-sys.com)	1998-09-05 14:13:12 +00:00
Bruce Evans	1874ef935c	Quick fix for breakage of read clustering on non-IDE drives. Read clustering is obsolescent technology so hardly anyone noticed. On a DORS 32160 SCSI drive with 4 tags, read clustering makes very little difference even for huge sequential reads. However, on a ZIP SCSI drive with 0 tags, the minimum overhead per block is about 40 msec, so very large clusters must be used to get anywhere near the maximum transfer rate. Using clusters consisting of 1 8K block reduces the transfer rate to about 250K/sec. Under msdosfs, missing read clustering is normal and a cluster size of 1 512 byte block reduces the transfer rate to about 25K/sec. Broken in: rev.1.18	1998-08-18 03:54:39 +00:00
Bruce Evans	0492d857d1	Removed unused includes.	1998-08-17 19:09:36 +00:00
Mike Smith	f01beb610a	"The releasing of the reference and lock is not temporary and belongs where it is. The reference and lock(s) are acquired just above the code in VREF() and relookup()." Submitted by: Michael Hancock <michaelh@cet.co.jp>	1998-08-12 21:42:54 +00:00
Julian Elischer	55d80b2df1	Handle the case of moving a directory onto the top of a sibling's child of the same name. Submitted by: Kirk Mckusick with fixes from luoqi Chen Obtained from: Whistle test tree.	1998-08-12 20:46:47 +00:00
Bruce Evans	aa6db4230d	Used daddr_t's, not ints, to store disk block numbers. Updated printf formats and args to match. Fixed old printf format errors (all related; most were hidden by calling printf indirectly). This change somehow avoids compiler bugs for 64-bit longs on i386's, although it increases the number of 64-bit calculations.	1998-07-28 18:25:51 +00:00
Bruce Evans	15b29aabe4	Made lazy syncing of timestamps for special files non-optional.	1998-07-27 15:37:00 +00:00
Bruce Evans	a23d65bfc8	Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this change is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.	1998-07-15 02:32:35 +00:00
Bruce Evans	ac1e407b32	Fixed printf format errors.	1998-07-11 07:46:16 +00:00
Julian Elischer	f763857cff	Add code missed in the initial Soft updates integration. Make the unallocated parts of a directory have a known state in case we need it later.	1998-07-10 00:10:20 +00:00
Julian Elischer	bcbd6c6fdd	Don't update superblock if mounted readonly, also fixes some problems with softupdates on root. More cleanups are needed here.. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>	1998-07-08 23:52:27 +00:00
Julian Elischer	6deaf84b1f	Catch a few corner cases where FreeBSD differs enough from BSD 4.4 to confuse Soft updates.. Should solve several "dangling deps" panics.	1998-07-08 01:04:33 +00:00
Julian Elischer	fd5d1124e2	VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>	1998-07-04 20:45:42 +00:00
Bruce Evans	99977261a1	Restored revs.1.89-1.90 which I somehow clobbered in rev.1.91.	1998-07-03 22:37:43 +00:00
Bruce Evans	3055187290	Sync timestamp changes for inodes of special files to disk as late as possible (when the inode is reclaimed). Temporarily only do this if option UFS_LAZYMOD configured and softupdates aren't enabled. UFS_LAZYMOD is intentionally left out of /sys/conf/options. This is mainly to avoid almost useless disk i/o on battery powered machines. It's silly to write to disk (on the next sync or when the inode becomes inactive) just because someone hit a key or something wrote to the screen or /dev/null. PR: 5577 Previous version reviewed by: phk	1998-07-03 22:17:03 +00:00
Bruce Evans	33cc029eab	Centralized in-core inode update. Update the in-core inode directly in ufs_setattr() so that there is no need to pass timestamps to UFS_UPDATE() (everything else just needs the current time). Ignore the passed-in timestamps in UFS_UPDATE() and always call ufs_itimes() (was: itimes()) to do the update. The timestamps are still passed so that all the callers don't need to be changed yet.	1998-07-03 18:46:52 +00:00
Poul-Henning Kamp	94c69b7e15	Make vprint() print dev_t in hex also.	1998-06-27 07:28:49 +00:00
Poul-Henning Kamp	81b42c386e	Report the type from the inode, not the vnode.	1998-06-27 06:45:04 +00:00
Jordan K. Hubbard	d94ce17be4	Flesh this document out just a little in response to some user questions and also recommend linking over copying since, at this stage, a stale copy is a real concern.	1998-06-26 10:35:55 +00:00
Bruce Evans	e5b19842ef	Removed unused includes.	1998-06-21 14:53:44 +00:00
Julian Elischer	c619155f0e	Slight change to directory cleanup Makes soft updates a bit cleaner. Eliminates some warnings about 'corrupted directories' from fsck.	1998-06-14 19:31:28 +00:00
Julian Elischer	28ed032673	Note which version of Kirk's sources this corresponds to.	1998-06-12 21:21:26 +00:00
Julian Elischer	aa75cb86b4	Fix the case when renaming to a file that you've just created and deleted, that had an inode that has not yet been written to disk, when the inode of the new file is also not yet written to disk, and your old directory entry is not yet on disk but you need to remove it and the new name exists in memory but has been deleted but the transaction to write the deleted name to disk exists and has not yet been cancelled by the request to delete the nonexistent name. I don't know how kirk could have missed such a glaring problem for so long. :-) Especially since the inconsistency survived on the disk for a whole 4 seconds on average before being fixed by other code. This was not a crashing bug but just led to filesystem inconsistencies if you crashed. Submitted by: Kirk McKusick (mckusick@mckusick.com)	1998-06-12 20:48:30 +00:00
Julian Elischer	6d0ba44288	Add B_NOCACHE to several cases where BSD4.4 only required a B_INVAL. Change worked out by john and kirk in consort.	1998-06-11 17:44:32 +00:00
Julian Elischer	8c221701c3	Fix for "live inode" panic. Submitted by: Kirk McKusick <mckusick@McKusick.COM> Reviewed by: yeah right...	1998-06-10 20:45:46 +00:00
Julian Elischer	4af0bb0f9e	Remove buggy debugging code.	1998-06-10 20:03:16 +00:00
Julian Elischer	939001af5c	Back out John's changes 1.45 -> 1.46 Kirk confirms that the original semantic was what he wanted... (well, a very slight difference) May fix "dangling deps" panic with soft updates.	1998-06-10 19:27:56 +00:00
Julian Elischer	3f2419f9e4	The version of the softdep changes in FreeBSD broke the (doingdirectory && !newparent) case of ufs_rename(). rename("D1/X/", "D2/Y/") gives a wrong link count for D2. Submitted by: Bruce Evans <bde@zeta.org.au> Reviewed by: Kirk McKusick <mckusick@McKusick.COM>	1998-06-08 23:55:33 +00:00
Bruce Evans	9399d2c5ad	Null change. Forgot to mention in previous log message that MNT_NOATIME is now ignored for special files, so that mounting root with option noatime doesn't break reporting of idle times in programs like `w'. The problem of excessive disk updates just to stamp atimes will be handled for special files by only writing atimes to disk when inodes become active. This works well because special files are relatively uncommon and their atimes are even more disposable at panic time than regular files' atimes.	1998-06-07 11:04:26 +00:00
Bruce Evans	12f66dd32f	Fixed some longstanding timestamp bugs: 1. mark atimes and mtimes of special files and fifos for update upon successful completion of non-null i/o, not at the beginning of the syscall. 2. never update file times for readonly filesystems. They were updated for stats and closes but not for syncs. The updates were of course only in-core and were thrown away when the inode was uncached, so the times sometimes appeared to go backwards. Improved comments in code related to (1) (mostly by removing them). Unmacroized ITIMES(). The test in (2) bloated it even more. Don't call getmicrotime() in the function version of it when we only need the time in seconds.	1998-06-07 10:49:18 +00:00
Doug Rabson	8435e0aef5	Use size_t instead of u_int for sizes.	1998-06-04 17:21:39 +00:00
Doug Rabson	6589ab80a9	If the filesystem blocksize is less than the VM page size, use the generic getpages code. This happens for filesystems with 4k pages on the alpha since the normal alpha pagesize is 8k.	1998-06-04 17:04:44 +00:00
Doug Rabson	58b395a905	Don't cast a pointer to an int in DQHASH.	1998-06-04 17:03:16 +00:00
Julian Elischer	00076e7cf9	Add a reference to the original softupdates paper	1998-06-02 01:30:51 +00:00
Julian Elischer	3942b533f8	Add a reference to the Ganger/Patt paper	1998-06-02 01:27:27 +00:00
Julian Elischer	b8cf4de4c8	A fix to a debug test from Kirk.	1998-05-27 03:32:23 +00:00
Julian Elischer	928c9ddf81	Ensure that there is enough information here, so that people can use soft updates should they desire.	1998-05-19 23:18:37 +00:00
Julian Elischer	25db4e8a66	Bring up-to-date with Whistle's current version Includes some debugging code.	1998-05-19 23:07:25 +00:00
Julian Elischer	46e752be05	Merge with Kirk's version as of Feb 20 His version 9.23 == our version 1.5 of ffs_softdep.c His version 9.5 == our version 1.4 of softdep.c	1998-05-19 22:54:53 +00:00
Julian Elischer	62e12c760c	Merge in Kirk's changes to stop softupdates from hogging all of memory.	1998-05-19 21:45:53 +00:00
Julian Elischer	b6dad36385	Change to stop a silly panic. This should be understood better. Change a buffer swizzle trick to a bcopy. It would be nice if the efficient trick could be used in the future.	1998-05-19 20:50:41 +00:00
Julian Elischer	987614a910	First published FreeBSD version of soft updates Feb 5.	1998-05-19 20:18:42 +00:00
Julian Elischer	a697eb98d4	This commit was generated by cvs2svn to compensate for changes in r36206, which included commits to RCS files with non-trunk default branches.	1998-05-19 20:03:29 +00:00
Julian Elischer	8e95b94dec	Import the next version received from kirk after some FreeBSD feedback.	1998-05-19 20:03:29 +00:00
Julian Elischer	8d1c524575	This commit was generated by cvs2svn to compensate for changes in r36201, which included commits to RCS files with non-trunk default branches.	1998-05-19 19:47:22 +00:00
Julian Elischer	467e1a6e7a	Import the earliest version of the soft update code that I have.	1998-05-19 19:47:22 +00:00
Julian Elischer	c11d29814e	try stop the user from using mount -u to set the async flag on a filesystem currently using soft updates. Also needs a new copy of ffs_softdep.c to complete the fix.	1998-05-18 06:38:18 +00:00
Poul-Henning Kamp	c21410e119	s/nanoruntime/nanouptime/g s/microruntime/microuptime/g Reviewed by: bde	1998-05-17 11:53:46 +00:00
Julian Elischer	5d0957193a	Add missing splx() Submitted by: Luoqi Chen <luoqi@chen.ml.org>	1998-05-11 21:41:13 +00:00
Julian Elischer	336c78bb90	Submitted by: abial@nask.pl Minor fix to support SLICE in MFS...	1998-05-11 19:27:18 +00:00
Mike Smith	7be2d30077	In the words of the submitter: --------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly. Quality testing was done with testvn, and lat_fs from the lmbench suite. Some NFS client testing courtesy of Patrik Kudo. vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. --------- Submitted by: Michael Hancock <michaelh@cet.co.jp>	1998-05-07 04:58:58 +00:00
Mike Smith	79cc756d8b	As described by the submitter: Reverse the VFS_VRELE patch. Reference counting of vnodes does not need to be done per-fs. I noticed this while fixing vfs layering violations. Doing reference counting in generic code is also the preference cited by John Heidemann in recent discussions with him. The implementation of alternative vnode management per-fs is still a valid requirement for some filesystems but will be revisited sometime later, most likely using a different framework. Submitted by: Michael Hancock <michaelh@cet.co.jp>	1998-05-06 05:29:41 +00:00
John Dyson	e60606c0af	Correct an error that I made where the vtruncbuf was changed back to vinvalbuf, but I incorrectly added the "V_SAVE|V_SAVEMETA" flags. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>	1998-05-04 17:43:48 +00:00
John Dyson	83ad4e3dbc	Fix an error that I made with an optimization. In the case of softupdates, we need to do vtruncbuf the old way. Luoqi caught, found the bug and submitted this fix. Submitted by: Luoqi Chen <luoqi@chen.ml.org>	1998-04-30 05:28:53 +00:00
Julian Elischer	c0bab11dfe	Make the devfs SLICE option a standard type option. (hopefully it will go away eventually anyhow)	1998-04-20 03:57:41 +00:00
Julian Elischer	3e425b968d	Add changes and code to implement a functional DEVFS. This code will be turned on with the TWO options DEVFS and SLICE. (see LINT) Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will delineate these changes. /dev will be automatically mounted by init (thanks phk) on bootup. See /sys/dev/slice/slice.4 for more info. All code should act the same without these options enabled. Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others This code does not support the following: bad144 handling. Persistence. (My head is still hurting from the last time we discussed this) ATAPI floppies are not handled by the SLICE code yet. When this code is running, all major numbers are arbitrary and COULD be dynamically assigned. (this is not done, for POLA only) Minor numbers for disk slices ARE arbitrary and dynamically assigned.	1998-04-19 23:32:49 +00:00
Dag-Erling Smørgrav	dc73342347	Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.	1998-04-17 22:37:19 +00:00
Bruce Evans	37223939f0	Fixed bitrot in the non-softdep case of ufs_dirremove(): - restored async mount support. The first entry in a block is still always written synchronously, although it probably shouldn't be in the async case. - restored use of BWRITE() instead of bowrite() for the DOWHITEOUT case, although bowrite() is probably better. Broken by: merge of softdep changes (rev.1.22). Found by: lmbench2 delete-file benchmarks.	1998-04-15 12:27:31 +00:00
Peter Wemm	a66ae6f438	Back this out, allowing users to get a fd connected to a symlink is just too dangerous.	1998-04-06 18:18:50 +00:00
Peter Wemm	b587fd008d	Don't panic if a VOP_READ() gets through on a short link, Just Do It (because we can :-). This means you can open a link file (or pseudo-file in the case of short links where the data is stored in the inode rather than disk blocks) and read the contents. However, trap any writes from the user as it's difficult to do the right thing in all cases. A link may be short and the user may be trying to extend it beyond the limit and so on. Although.. being able to re-target a symlink without deleting it first might have been nice. This stuff is a bit perverse since symlink() and readlink() calls can end up actually being implemented as read/write vnode ops. Reviewed by: phk	1998-04-06 17:44:40 +00:00
Poul-Henning Kamp	00af9731c9	Time changes mark 2: * Figure out UTC relative to boottime. Four new functions provide time relative to boottime. * move "runtime" into struct proc. This helps fix the calcru() problem in SMP. * kill mono_time. * add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!) * nanosleep, select & poll take long sleeps one day at a time Reviewed by: bde Tested by: ache and others	1998-04-04 13:26:20 +00:00
Poul-Henning Kamp	227ee8a188	Eradicate the variable "time" from the kernel, using various measures. "time" wasn't an atomic variable, so splfoo() protection was needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now use the new variable time_second instead. gettime() changed to getmicrotime(). Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeared time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passed &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde	1998-03-30 09:56:58 +00:00
Bruce Evans	08637435f2	Moved some #includes from <sys/param.h> nearer to where they are actually used.	1998-03-28 10:33:27 +00:00
Peter Wemm	26cf9c3b75	Enable the use of soft updates on the root filesystem. Previously, the softdep mode could only be activated on the initial mount of a filesystem and then only if it was a read-write mount. A 'mount -r' (as done in the rootfs mount) followed by a 'mount -u' to convert to read-write didn't start softdep mode.	1998-03-27 14:20:57 +00:00
Poul-Henning Kamp	a0502b19d4	Add two new functions, get{micro|nano}time. They are atomic, but return in essence what is in the "time" variable. gettime() is now a macro front for getmicrotime(). Various patches to use the two new functions instead of the various hacks used in their absence. Some punctuation and grammar patches from Bruce. A couple of XXX comments.	1998-03-26 20:54:05 +00:00
Bruce Evans	3d2d6cc3d8	Forward declare even more structs to restore some self-sufficiency. Didn't fix new dependence on <ufs/ufs/inode.h> and its prerequisites.	1998-03-23 14:12:37 +00:00
John Dyson	9eebcfcf8c	Softdep_sync_metadata appears to expect that it is called at splbio, so make it so...	1998-03-21 05:16:09 +00:00
John Dyson	34f72be5af	Fix vfs_bio_awrite usage, and correct vtruncbuf usage.	1998-03-19 22:49:44 +00:00
John Dyson	bef608bd7e	Some VM improvements, including elimination of a lot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!! pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.) pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances. vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.) vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust. vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true. vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.) vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities. vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.) vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliability and performance issue.) 10) Modify FFS to use vtruncbuf. vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.) vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly. vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.) vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.	1998-03-16 01:56:03 +00:00
John Dyson	31bba5f966	Correct a problem with the ffs_getpages routine that manifests itself during the tail command. The amount to read is incorrectly calculated. Submitted by: Tor Egge	1998-03-09 22:12:52 +00:00
Julian Elischer	b1897c197c	Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: Whistle development tree	1998-03-08 09:59:44 +00:00
Julian Elischer	2cbcee772b	Submitted by: Kirk McKusick Stub file for soft updates.	1998-03-08 08:38:41 +00:00
John Dyson	8f9110f6a1	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifested themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistency, and as a side-effect broke the vfs.ioopt code. This code might have been committed separately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitous buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups associated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistency problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify its status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and its descendants to support an int flag instead of a boolean, so that we can pass down the invalidate bit.	1998-03-07 21:37:31 +00:00
Bruce Evans	16337c2efb	Fixed missing simple_lock() in ffs_mountfs().	1998-03-07 14:59:44 +00:00
Mike Smith	34bdbbd0de	The intent is to get rid of WILLRELE in vnode_if.src by making a complement to all ops that return a vpp, VFS_VRELE. This is initially only for file systems that implement the following ops that do a WILLRELE: vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link, vop_rename, vop_mkdir, vop_rmdir, vop_symlink This is initial DNA that doesn't do anything yet. VFS_VRELE is implemented but not called. A default vfs_vrele was created for fs implementations that use the standard vnode management routines. VFS_VRELE implementations were made for the following file systems: Standard (vfs_vrele) ffs mfs nfs msdosfs devfs ext2fs Custom union umapfs Just EOPNOTSUPP fdesc procfs kernfs portal cd9660 These implementations may change as VOP changes are implemented. In the next phase, in the vop implementations calls to vrele and the vrele part of vput will be moved to the top layer vfs_vnops and made visible to all layers. vput will be replaced by unlock in these cases. Unlocking will still be done in the per fs layer but the refcount decrement will be triggered at the top because it doesn't hurt to hold a vnode reference a little longer. This will have minimal impact on the structure of the existing code. This will only be done for vnode arguments that are released by the various fs vop implementations. Wider use of VFS_VRELE will likely require restructuring of the code. Reviewed by: phk, dyson, terry et. al. Submitted by: Michael Hancock <michaelh@cet.co.jp>	1998-03-01 22:46:53 +00:00
Mike Smith	ce75f2c365	In the author's words: These diffs implement the first stage of a VOP_{GET|PUT}PAGES pushdown for local media FS's. See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation details for generic _{get|put}pages for local media FS's. Support is trivial to add for any FS that formerly relied on the default behaviour of the vnode_pager in EOPNOTSUPP cases (just copy the ffs_getpages() code for the FS in question's _{get|put}pages). Obviously, it would be better if each local media FS implemented a more optimal method, instead of calling an exported interface from the /sys/vm/vnode_pager.c, but this is a necessary first step in getting the FS's to a point where they can be supplied with better implementations on a case-by-case basis. Obviously, the cd9660_putpages() can be rather trivial (since it is a read-only FS type 8-)). A slight (temporary) modification is made to print a diagnostic message in the case where the underlying filesystem attempts to engage in the previous behaviour. Failure is likely to be ungraceful. Submitted by: terry@freebsd.org (Terry Lambert)	1998-02-26 06:39:59 +00:00
Bruce Evans	c9b9921363	Fixed missing permissions checking for mounting by non-root. There is now less need for the vfs.usermount sysctl. msdosfs already has this change, modulo a missing LK_RETRY, via NetBSD. At least ext2fs is missing this and many other changes from Lite2. Obtained from: Lite2	1998-02-25 04:47:04 +00:00
Bruce Evans	d68fa50ccb	Don't depend on "implicit int".	1998-02-20 13:37:40 +00:00
Mike Smith	b35809ad68	Fix a panic resulting from executing off an MFS image. This corrects the recently observed problem with the install image. Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>	1998-02-16 23:55:53 +00:00
Bruce Evans	721a23b12f	Removed unnecessary dependencies on KERNEL and DIAGNOSTIC. This was more useful when opt_diagnostic.h had to be included.	1998-02-13 00:20:36 +00:00
Eivind Eklund	303b270b0a	Staticize.	1998-02-09 06:11:36 +00:00
Eivind Eklund	0b08f5f737	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
John Dyson	95461b450d	1) Start using a cleaner and more consistent page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagein. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.	1998-02-05 03:32:49 +00:00
Eivind Eklund	47cfdb166d	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
Bruce Evans	9cf2c3e77a	Forward declare some structs so that this file is more self-sufficient.	1998-02-03 21:52:02 +00:00
John Dyson	9cfcd01101	Back out recent laptop sync changes. They had significant errors.	1998-02-01 08:24:00 +00:00
John Dyson	de1050d8e4	Support more intelligent sync operations for MNT_NOATIME. PR: kern/5577 Submitted by: Craig Leres <leres@ee.lbl.gov>	1998-02-01 01:59:12 +00:00
Julian Elischer	abafd7f814	Serves me right for not putting SUIDDIR in LINT. It got bitrot. This should stop complaints about it not working for people.	1998-01-31 19:28:28 +00:00
Poul-Henning Kamp	c5b193bfba	Retire LFS. If you want to play with it, you can find the final version of the code in the repository under the tag LFS_RETIREMENT. If somebody makes LFS work again, adding it back is certainly desirable, but as it is now nobody seems to care much about it, and it has suffered considerable bitrot since its somewhat haphazard integration. R.I.P	1998-01-30 11:34:06 +00:00
Eivind Eklund	7b778b5e61	Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related options new-style. This introduces an xxxFS_BOOT for each of the rootable filesystems. (Presently not required, but encouraged to allow a smooth move of option *FS to opt_dontuse.h later.) LFS is temporarily disabled, and will be re-enabled tomorrow.	1998-01-24 02:54:56 +00:00
John Dyson	50ce7ff499	Add better support for larger I/O clusters, including larger physical I/O. The support is not mature yet, and some of the underlying implementation needs help. However, support does exist for IDE devices now.	1998-01-24 02:01:46 +00:00
John Dyson	2d8acc0f4a	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneeded collapse operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)	1998-01-22 17:30:44 +00:00
John Dyson	4722175765	Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtle bugs aren't fixed yet.	1998-01-17 09:17:02 +00:00
John Dyson	95e5e988e0	Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. Vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon as reasonably possible.	1998-01-06 05:26:17 +00:00
Bruce Evans	e5cc36223b	Removed unused #includes again. They thrashed when mfs_reclaim thrashed to ufs_reclaim and back.	1998-01-01 12:40:25 +00:00
John Dyson	60f8d46448	Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem with the object cache removal.	1997-12-29 01:03:55 +00:00
John Dyson	2be70f79f6	Lots of improvements, including restructuring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.	1997-12-29 00:25:11 +00:00
Bruce Evans	675ea6f083	Unspammed nested include of <vm/vm_zone.h>.	1997-12-27 02:56:39 +00:00
John Dyson	4854f102a0	I added vfs_ioopt prematurely, disabled.	1997-12-21 10:41:19 +00:00
John Dyson	1efb74fbcc	Some performance improvements, and code cleanups (including changing our expensive OFF_TO_IDX to btoc whenever possible.)	1997-12-19 09:03:37 +00:00
Eivind Eklund	16a4e2f328	Make LINT compile again after wollman introduced poll() here. Overlooked by: wollman	1997-12-16 22:28:26 +00:00
Eivind Eklund	8c13c35718	Convert SUIDDIR fully to a new-style option. Forgotten by: julian	1997-12-15 21:51:45 +00:00
Garrett Wollman	1cbbd625cc	Add support for poll(2) on files. vop_nopoll() now returns POLLNVAL if one of the new poll types is requested; hopefully this will not break any existing code. (This is done so that programs have a dependable way of determining whether a filesystem supports the extended poll types or not.) The new poll types added are: POLLWRITE - file contents may have been modified POLLNLINK - file was linked, unlinked, or renamed POLLATTRIB - file's attributes may have been changed POLLEXTEND - file was extended Note that the internal operation of poll() means that it is impossible for two processes to reliably poll for the same event (this could be fixed but may not be worth it), so it is not possible to rewrite `tail -f' to use poll at this time.	1997-12-15 03:09:59 +00:00
Bruce Evans	2573d99c75	Restored ufs_pathconf() from rev.1.61. vop_stdpathconf() is too general to be of much use. Using it here broke the _PC_NAME_MAX, _PC_NO_TRUNC and _PC_PATH_MAX cases, and weakened the _PC_MAX_CANON, _PC_MAX_INPUT and _PC_VDISABLE cases.	1997-12-13 12:30:34 +00:00

748 Commits