freebsd-nq

Author	SHA1	Message	Date
Poul-Henning Kamp	0c25dbeb17	Remove faint traces of non-existant ffs_bmap().	2001-04-29 10:23:32 +00:00
Greg Lehey	60fb0ce365	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
Kirk McKusick	c9509f5865	Rather than copying all the indirect blocks of the snapshot, simply mark them as BLK_NOCOPY. This trick cuts the initial size of the snapshot in half and cuts the time to take a snapshot by a third.	2001-04-26 00:50:53 +00:00
Kirk McKusick	112f737245	When closing the last reference to an unlinked file, it is freed by the inactive routine. Because the freeing causes the filesystem to be modified, the close must be held up during periods when the filesystem is suspended. For snapshots to be consistent across crashes, they must write blocks that they copy and claim those written blocks in their on-disk block pointers before the old blocks that they referenced can be allowed to be written. Close a loophole that allowed unwritten blocks to be skipped when doing ffs_sync with a request to wait for all I/O activity to be completed.	2001-04-25 08:11:18 +00:00
Poul-Henning Kamp	a13234bb35	Move the netexport structure from the fs-specific mountstructure to struct mount. This makes the "struct netexport *" paramter to the vfs_export and vfs_checkexport interface unneeded. Consequently that all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>	2001-04-25 07:07:52 +00:00
Ian Dowse	5d69bac493	Pre-dirpref versions of fsck may zero out the new superblock fields fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause panics if these fields were zeroed while a filesystem was mounted read-only, and then remounted read-write. Add code to ffs_reload() which copies the fs_contigdirs pointer from the previous superblock, and reinitialises fs_avgf* if necessary. Reviewed by: mckusick	2001-04-24 00:37:16 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
Kirk McKusick	5819ab3f12	Add debugging option to always read/write cylinder groups as full sized blocks. To enable this option, use: `sysctl -w debug.bigcgs=1'. Add debugging option to disable background writes of cylinder groups. To enable this option, use: `sysctl -w debug.dobkgrdwrite=0'. These debugging options should be tried on systems that are panicing with corrupted cylinder group maps to see if it makes the problem go away. The set of panics in question are: ffs_clusteralloc: map mismatch ffs_nodealloccg: map corrupted ffs_nodealloccg: block not in map ffs_alloccg: map corrupted ffs_alloccg: block not in map ffs_alloccgblk: cyl groups corrupted ffs_alloccgblk: can't find blk in cyl ffs_checkblk: partially free fragment The following panics are less likely to be related to this problem, but might be helped by these debugging options: ffs_valloc: dup alloc ffs_blkfree: freeing free block ffs_blkfree: freeing free frag ffs_vfree: freeing free inode If you try these options, please report whether they helped reduce your bitmap corruption panics to Kirk McKusick at <mckusick@mckusick.com> and to Matt Dillon <dillon@earth.backplane.com>.	2001-04-17 05:37:51 +00:00
Kirk McKusick	f0f3f19f05	Background fsck sysctl operations must use vn_start_write and vn_finished_write so that they do not attempt to modify a suspended filesystem.	2001-04-17 05:06:37 +00:00
Kirk McKusick	74046077a7	Update to describe use of mdconfig instead of deprecated vnconfig. Submitted by: Steve Ames <steve@virtual-voodoo.com>	2001-04-14 18:32:09 +00:00
Kirk McKusick	1a6a661032	This checkin adds support in ufs/ffs for the FS_NEEDSFSCK flag. It is described in ufs/ffs/fs.h as follows: /* * Filesystem flags. * * Note that the FS_NEEDSFSCK flag is set and cleared only by the * fsck utility. It is set when background fsck finds an unexpected * inconsistency which requires a traditional foreground fsck to be * run. Such inconsistencies should only be found after an uncorrectable * disk error. A foreground fsck will clear the FS_NEEDSFSCK flag when * it has successfully cleaned up the filesystem. The kernel uses this * flag to enforce that inconsistent filesystems be mounted read-only. / #define FS_UNCLEAN 0x01 / filesystem not clean at mount / #define FS_DOSOFTDEP 0x02 / filesystem using soft dependencies / #define FS_NEEDSFSCK 0x04 / filesystem needs sync fsck before mount */	2001-04-14 05:26:28 +00:00
Kirk McKusick	a61ab64ac4	Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>. His description of the problem and solution follow. My own tests show speedups on typical filesystem intensive workloads of 5% to 12% which is very impressive considering the small amount of code change involved. ------ One day I noticed that some file operations run much faster on small file systems then on big ones. I've looked at the ffs algorithms, thought about them, and redesigned the dirpref algorithm. First I want to describe the results of my tests. These results are old and I have improved the algorithm after these tests were done. Nevertheless they show how big the perfomance speedup may be. I have done two file/directory intensive tests on a two OpenBSD systems with old and new dirpref algorithm. The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports". The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release. It contains 6596 directories and 13868 files. The test systems are: 1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for test is at wd1. Size of test file system is 8 Gb, number of cg=991, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=35 2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system at wd0, file system for test is at wd1. Size of test file system is 40 Gb, number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50 You can get more info about the test systems and methods at: http://www.ptci.ru/gluk/dirpref/old/dirpref.html Test Results tar -xzf ports.tar.gz rm -rf ports mode old dirpref new dirpref speedup old dirprefnew dirpref speedup First system normal 667 472 1.41 477 331 1.44 async 285 144 1.98 130 14 9.29 sync 768 616 1.25 477 334 1.43 softdep 413 252 1.64 241 38 6.34 Second system normal 329 81 4.06 263.5 93.5 2.81 async 302 25.7 11.75 112 2.26 49.56 sync 281 57.0 4.93 263 90.5 2.9 softdep 341 40.6 8.4 284 4.76 59.66 "old dirpref" and "new dirpref" columns give a test time in seconds. speedup - speed increasement in times, ie. old dirpref / new dirpref. ------ Algorithm description The old dirpref algorithm is described in comments: /* * Find a cylinder to place a directory. * * The policy implemented by this algorithm is to select from * among those cylinder groups with above the average number of * free inodes, the one with the smallest number of directories. / A new directory is allocated in a different cylinder groups than its parent directory resulting in a directory tree that is spreaded across all the cylinder groups. This spreading out results in a non-optimal access to the directories and files. When we have a small filesystem it is not a problem but when the filesystem is big then perfomance degradation becomes very apparent. What I mean by a big file system ? 1. A big filesystem is a filesystem which occupy 20-30 or more percent of total drive space, i.e. first and last cylinder are physically located relatively far from each other. 2. It has a relatively large number of cylinder groups, for example more cylinder groups than 50% of the buffers in the buffer cache. The first results in long access times, while the second results in many buffers being used by metadata operations. Such operations use cylinder group blocks and on-disk inode blocks. The cylinder group block (fs->fs_cblkno) contains struct cg, inode and block bit maps. It is 2k in size for the default filesystem parameters. If new and parent directories are located in different cylinder groups then the system performs more input/output operations and uses more buffers. On filesystems with many cylinder groups, lots of cache buffers are used for metadata operations. My solution for this problem is very simple. I allocate many directories in one cylinder group. I also do some things, so that the new allocation method does not cause excessive fragmentation and all directory inodes will not be located at a location far from its file's inodes and data. The algorithm is: / * Find a cylinder group to place a directory. * * The policy implemented by this algorithm is to allocate a * directory inode in the same cylinder group as its parent * directory, but also to reserve space for its files inodes * and data. Restrict the number of directories which may be * allocated one after another in the same cylinder group * without intervening allocation of files. * * If we allocate a first level directory then force allocation * in another cylinder group. / My early versions of dirpref give me a good results for a wide range of file operations and different filesystem capacities except one case: those applications that create their entire directory structure first and only later fill this structure with files. My solution for such and similar cases is to limit a number of directories which may be created one after another in the same cylinder group without intervening file creations. For this purpose, I allocate an array of counters at mount time. This array is linked to the superblock fs->fs_contigdirs[cg]. Each time a directory is created the counter increases and each time a file is created the counter decreases. A 60Gb filesystem with 8mb/cg requires 10kb of memory for the counters array. The maxcontigdirs is a maximum number of directories which may be created without an intervening file creation. I found in my tests that the best performance occurs when I restrict the number of directories in one cylinder group such that all its files may be located in the same cylinder group. There may be some deterioration in performance if all the file inodes are in the same cylinder group as its containing directory, but their data partially resides in a different cylinder group. The maxcontigdirs value is calculated to try to prevent this condition. Since there is no way to know how many files and directories will be allocated later I added two optimization parameters in superblock/tunefs. They are: int32_t fs_avgfilesize; / expected average file size / int32_t fs_avgfpdir; / expected # of files per directory */ These parameters have reasonable defaults but may be tweeked for special uses of a filesystem. They are only necessary in rare cases like better tuning a filesystem being used to store a squid cache. I have been using this algorithm for about 3 months. I have done a lot of testing on filesystems with different capacities, average filesize, average number of files per directory, and so on. I think this algorithm has no negative impact on filesystem perfomance. It works better than the default one in all cases. The new dirpref will greatly improve untarring/removing/coping of big directories, decrease load on cvs servers and much more. The new dirpref doesn't speedup a compilation process, but also doesn't slow it down. Obtained from: Grigoriy Orlov <gluk@ptci.ru>	2001-04-10 08:38:59 +00:00
Jeroen Ruigrok van der Werven	5d0b660f2a	Fix typo ); -> ,	2001-03-24 15:25:04 +00:00
Kirk McKusick	fca26df055	Check that background fsck operation is being done on a ufs filesystem. Obtained from: Robert Watson <rwatson@FreeBSD.org>	2001-03-23 20:58:25 +00:00
Kirk McKusick	812b1d416c	Add kernel support for running fsck on active filesystems.	2001-03-21 04:09:01 +00:00
Kirk McKusick	31c6ce0aed	Clear the fs_clean flag only when the FS_UNCLEAN flag is not set (as is done in unmount). Remove a snapshot inode from the superblock list when its last name goes away rather than when its last reference goes away. That way it will be properly reclaimed by fsck after a crash rather than reenabled when the filesystem is mounted.	2001-03-21 04:05:20 +00:00
Kirk McKusick	7e72e9918a	Report the correct inode number when panicing with freeing free inode. Report the correct block number when panicing with freeing free block.	2001-03-21 04:01:02 +00:00
Robert Watson	516081f288	o Change options FFS_EXTATTR and options FFS_EXTATTR_AUTOSTART to options UFS_EXTATTR and UFS_EXTATTR_AUTOSTART respectively. This change reflects the fact that our EA support is implemented entirely at the UFS layer (modulo FFS start/stop/autostart hooks for mount and unmount events). This also better reflects the fact that [shortly] MFS will also support EAs, as well as possibly IFS. o Consumers of the EA support in FFS are reminded that as a result, they must change kernel config files to reflect the new option names. Obtained from: TrustedBSD Project	2001-03-19 04:35:40 +00:00
Robert Watson	f5161237ad	o Implement "options FFS_EXTATTR_AUTOSTART", which depends on "options FFS_EXTATTR". When extended attribute auto-starting is enabled, FFS will scan the .attribute directory off of the root of each file system, as it is mounted. If .attribute exists, EA support will be started for the file system. If there are files in the directory, FFS will attempt to start them as attribute backing files for attributes baring the same name. All attributes are started before access to the file system is permitted, so this permits race-free enabling of attributes. For attributes backing support for security features, such as ACLs, MAC, Capabilities, this is vital, as it prevents the file system attributes from getting out of sync as a result of file system operations between mount-time and the enabling of the extended attribute. The userland extattrctl tool will still function exactly as previously. Files must be placed directly in .attribute, which must be directly off of the file system root: symbolic links are not permitted. FFS_EXTATTR will continue to be able to function without FFS_EXTATTR_AUTOSTART for sites that do not want/require auto-starting. If you're using the UFS_ACL code available from www.TrustedBSD.org, using FFS_EXTATTR_AUTOSTART is recommended. o This support is implemented by adding an invocation of ufs_extattr_autostart() to ffs_mountfs(). In addition, several new supporting calls are introduced in ufs_extattr.c: ufs_extattr_autostart(): start EAs on the specified mount ufs_extattr_lookup(): given a directory and filename, return the vnode for the file. ufs_extattr_enable_with_open(): invoke ufs_extattr_enable() after doing the equililent of vn_open() on the passed file. ufs_extattr_iterate_directory(): iterate over a directory, invoking ufs_extattr_lookup() and ufs_extattr_enable_with_open() on each entry. o This feature is not widely tested, and therefore may contain bugs, caution is advised. Several changes are in the pipeline for this feature, including breaking out of EA namespaces into subdirectories of .attribute (this is waiting on the updated EA API), as well as a per-filesystem flag indicating whether or not EAs should be auto-started. This is required because administrators may not want .attribute auto-started on all file systems, especially if non-administrators have write access to the root of a file system. Obtained from: TrustedBSD Project	2001-03-14 05:32:31 +00:00
Kirk McKusick	589c7af992	Fixes to track snapshot copy-on-write checking in the specinfo structure rather than assuming that the device vnode would reside in the FFS filesystem (which is obviously a broken assumption with the device filesystem).	2001-03-07 07:09:55 +00:00
Kirk McKusick	8775e64a5d	Free lock before returning from process_worklist_item. Obtained from: Constantine Sapuntzakis <csapuntz@stanford.edu>	2001-03-01 21:43:46 +00:00
Adrian Chadd	f3a90da995	Reviewed by: jlemon An initial tidyup of the mount() syscall and VFS mount code. This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work. * the guts of the mount work has been moved into vfs_mount(). * move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace. * Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long. * rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.) * remove the copyin() stuff for `path'. `data' still requires copyin() since its a pointer into userland. * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to. * NOTE: f_mntonname is intiailised with "/" in the case of a root mount.	2001-03-01 21:00:17 +00:00
Kirk McKusick	a5a94e3936	Free lock before calling panic so that subsequent attempt to write out buffers does not re-panic with `locking against myself'. This change should not affect normal operations of soft updates in any way.	2001-02-23 09:01:31 +00:00
Kirk McKusick	cc686e21c0	When cleaning up excess inode dependencies, check for being done. Reviewed by: Jan Koum <jkb@yahoo-inc.com>	2001-02-22 10:17:57 +00:00
Kirk McKusick	2cf5d587a9	This patch corrects two problems with the rate limiting code that was introduced in revision 1.80. The problem manifested itself with a `locking against myself' panic and could also result in soft updates inconsistences associated with inodedeps. The two problems are: 1) One of the background operations could manipulate the bitmap while holding it locked with intent to create. This held lock results in a `locking against myself' panic, when the background processing that we have been coopted to do tries to lock the bitmap which we are already holding locked. To understand how to fix this problem, first, observe that we can do the background cleanups in inodedep_lookup only when allocating inodedeps (DEPALLOC is set in the call to inodedep_lookup). Second observe that calls to inodedep_lookup with DEPALLOC set can only happen from the following calls into the softdep code: softdep_setup_inomapdep softdep_setup_allocdirect softdep_setup_remove softdep_setup_freeblocks softdep_setup_directory_change softdep_setup_directory_add softdep_change_linkcnt Only the first two of these can come from ffs_alloc.c while holding a bitmap locked. Thus, inodedep_lookup must not go off to do request_cleanups when being called from these functions. This change adds a flag, NODELAY, that can be passed to inodedep_lookup to let it know that it should not do background processing in those cases. 2) The return value from request_cleanup when helping out with the cleanup was 0 instead of 1. This meant that despite the fact that we may have slept while doing the cleanups, the code did not recheck for the appearance of an inodedep (e.g., goto top in inodedep_lookup). This lead to the softdep inconsistency in which we ended up with two inodedep's for the same inode. Reviewed by: Peter Wemm <peter@yahoo-inc.com>, Matt Dillon <dillon@earth.backplane.com>	2001-02-20 11:14:38 +00:00
Jeroen Ruigrok van der Werven	d7d97eb0aa	Preceed/preceeding are not english words. Use precede and preceding.	2001-02-18 10:43:53 +00:00
Jake Burkholder	d5a08a6065	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.	2001-02-12 00:20:08 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
Poul-Henning Kamp	37d4006626	Another round of the <sys/queue.h> FOREACH transmogriffer. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 16:08:18 +00:00
Poul-Henning Kamp	fc2ffbe604	Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 13:13:25 +00:00
Poul-Henning Kamp	ef9e85abba	Use <sys/queue.h> macro API.	2001-02-04 12:37:48 +00:00
Matthew Dillon	f8e071a1eb	Fix a race between the syncer and umount. When you umount a softupdates filesystem softdep_process_worklist() is called in a loop until it indicates that no dependancies remain, but the determination of that fact depends on there only being one softdep_process_worklist() instance running. It was possible for the syncer to also be running softdep_process_worklist() and the pre-existing checks in the code to prevent this were not sufficient to prevent the race. This patch solves the problem. Approved-by: mckusick	2001-01-30 06:31:59 +00:00
Jason Evans	1b367556b5	Convert all simplelocks to mutexes and remove the simplelock implementations.	2001-01-24 12:35:55 +00:00
Ian Dowse	f55ff3f3ef	The ffs superblock includes a 128-byte region for use by temporary in-core pointers to summary information. An array in this region (fs_csp) could overflow on filesystems with a very large number of cylinder groups (~16000 on i386 with 8k blocks). When this happens, other fields in the superblock get corrupted, and fsck refuses to check the filesystem. Solve this problem by replacing the fs_csp array in 'struct fs' with a single pointer, and add padding to keep the length of the 128-byte region fixed. Update the kernel and userland utilities to use just this single pointer. With this change, the kernel no longer makes use of the superblock fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c to indicate that these fields must be calculated for compatibility with older kernels. Reviewed by: mckusick	2001-01-15 18:30:40 +00:00
Kirk McKusick	cb3ab5aaf7	Properly compute the size of the final block of superblock summary information. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2001-01-12 21:56:55 +00:00
Kirk McKusick	48d617487d	Several small but important fixes for snapshots: 1) Be more tolerant of missing snapshot files by only trying to decrement their reference count if they are registered as active. 2) Fix for snapshots of filesystems with block sizes larger than 8K (from Ollivier Robert <roberto@eurocontrol.fr>). 3) Fix to avoid losing last block in snapshot file when calculating blocks that need to be copied (from Don Coleman <coleman@coleman.org>).	2000-12-19 04:41:09 +00:00
Kirk McKusick	6da443cb22	Get rid of spurious check in ffs_truncate for i_size == length which fails to set the modification time on the file. The same check a few lines later takes the correct action. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2000-12-19 04:20:13 +00:00
Assar Westerlund	ca85ca6099	add a stub for softdep_slowdown so that it's possible to build the kernel without SOFTUPDATES	2000-12-17 23:59:56 +00:00
Seigo Tanimura	937c4dfa08	Do not race for the lock of an inode hash. Reviewed by: jhb	2000-12-13 10:04:01 +00:00
Kirk McKusick	1d733bbd10	Preventing runaway kernel soft updates memory, take three. Previously, the syncer process was the only process in the system that could process the soft updates background work list. If enough other processes were adding requests to that list, it would eventually grow without bound. Because some of the work list requests require vnodes to be locked, it was not generally safe to let random processes process the work list while they already held vnodes locked. By adding a flag to the work list queue processing function to indicate whether the calling process could safely lock vnodes, it becomes possible to co-opt other processes into helping out with the work list. Now when the worklist gets too large, other processes can safely help out by picking off those work requests that can be handled without locking a vnode, leaving only the small number of requests requiring a vnode lock for the syncer process. With this change, it appears possible to keep even the nastiest workloads under control. Submitted by: Paul Saab <ps@yahoo-inc.com>	2000-12-13 08:30:35 +00:00
David Malone	7cc0979fd6	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
Poul-Henning Kamp	959b7375ed	Staticize some malloc M_ instances.	2000-12-08 20:09:00 +00:00
Kirk McKusick	71868b020d	More aggressively rate limit the growth of soft dependency structures in the face of multiple processes doing massive numbers of filesystem operations. While this patch will work in nearly all situations, there are still some perverse workloads that can overwhelm the system. Detecting and handling these perverse workloads will be the subject of another patch. Reviewed by: Paul Saab <ps@yahoo-inc.com> Obtained from: Ethan Solomita <ethan@geocast.com>	2000-11-20 06:22:39 +00:00
Matthew Dillon	936524aa02	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>	2000-11-18 23:06:26 +00:00
Kirk McKusick	bd4bd019fb	When deleting a file, the ordering of events imposed by soft updates is to first write the deleted directory entry to disk, second write the zero'ed inode to disk, and finally to release the freed blocks and the inode back to the cylinder-group map. As this ordering requires two disk writes to occur which are normally spaced about 30 seconds apart (except when memory is under duress), it takes about a minute from the time that a file is deleted until its inode and data blocks show up in the cylinder-group map for reallocation. If a file has had only a brief lifetime (less than 30 seconds from creation to deletion), neither its inode nor its directory entry may have been written to disk. If its directory entry has not been written to disk, then we need not wait for that directory block to be written as the on-disk directory block does not reference the inode. Similarly, if the allocated inode has never been written to disk, we do not have to wait for it to be written back either as its on-disk representation is still zero'ed out. Thus, in the case of a short lived file, we can simply release the blocks and inode to the cylinder-group map immediately. As the inode and its blocks are released immediately, they are immediately available for other uses. If they are not released for a minute, then other inodes and blocks must be allocated for short lived files, cluttering up the vnode and buffer caches. The previous code was a bit too aggressive in trying to release the blocks and inode back to the cylinder-group map resulting in their being made available when in fact the inode on disk had not yet been zero'ed. This patch takes a more conservative approach to doing the release which avoids doing the release prematurely.	2000-11-14 09:00:25 +00:00
Adrian Chadd	0b0c10b48d	Initial commit of IFS - a inode-namespaced FFS. Here is a short description: How it works: -- Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.) I didn't see the need in duplicating all of sys/ufs/ffs to get this off the ground. File creation is done through a special file - 'newfile' . When newfile is called, the system allocates and returns an inode. Note that newfile is done in a cloning fashion: fd = open("newfile", O_CREAT\|O_RDWR, 0644); fstat(fd, &st); printf("new file is %d\n", (int)st.st_ino); Once you have created a file, you can open() and unlink() it by its returned inode number retrieved from the stat call, ie: fd = open("5", O_RDWR); The creation permissions depend entirely if you have write access to the root directory of the filesystem. To get the list of currently allocated inodes, VOP_READDIR has been added which returns a directory listing of those currently allocated. -- What this entails: * patching conf/files and conf/options to include IFS as a new compile option (and since ifs depends upon FFS, include the FFS routines) * An entry in i386/conf/NOTES indicating IFS exists and where to go for an explanation * Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS routines require (ffs_mount() and ffs_reload()) * a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS routines. IFS replaces some of the vfsops, and a handful of vnops - most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR(). Any other directory operation is marked as invalid. What this results in: * an IFS partition's create permissions are controlled by the perm/ownership of the root mount point, just like a normal directory * Each inode has perm and ownership too * IFS does NOT mean an FFS partition can be opened per inode. This is a completely seperate filesystem here * Softupdates doesn't work with IFS, and really I don't think it needs it. Besides, fsck's are FAST. (Try it :-) * Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC). Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against this particular inode, and unravelling THAT code isn't trivial. Therefore, useful inodes start at 3. Enjoy, and feedback is definitely appreciated!	2000-10-14 03:02:30 +00:00
Eivind Eklund	7eb9fca557	Blow away the v_specmountpoint define, replacing it with what it was defined as (rdev->si_mountpoint)	2000-10-09 17:31:39 +00:00
Robert Watson	ff435dcb91	o Move initialization of ump from mp to the top of the function so that it is defined whenm used in ufs_extattr_uepm_destroy(), fixing a panic due to a NULL pointer dereference. Submitted by: Wesley Morgan <morganw@chemicals.tacorp.com>	2000-10-06 15:31:28 +00:00
Robert Watson	9de54ba513	o Add call to ufs_extattr_uepm_destroy() in ffs_unmount() so as to clean up lock on extattrs. o Get for free a comment indicating where auto-starting of extended attributes will eventually occur, as it was in my commit tree also. No implementation change here, only a comment.	2000-10-04 04:44:51 +00:00
Jason Evans	a18b1f1d4d	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.	2000-10-04 01:29:17 +00:00
Boris Popov	67e871664b	Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or keept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems. From this point vop_nolock() functions maintain only interlock lock. vop_stdlock() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick	2000-09-25 15:24:04 +00:00
Robert Watson	907da7c385	o Permit UFS Extended Attributes to be associated with special devices and FIFOs. Obtained from: TrustedBSD Project	2000-09-21 19:06:02 +00:00
Dag-Erling Smørgrav	8461bdba85	Silence a warning.	2000-09-17 19:41:26 +00:00
Kirk McKusick	52a3bfa2e7	Cannot do MALLOC with M_WAITOK while holding ACQUIRE_LOCK Obtained from: Ethan Solomita <ethan@geocast.com>	2000-09-07 23:02:55 +00:00
Jason Evans	0384fff8c5	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh	2000-09-07 01:33:02 +00:00
Tor Egge	b5ee7ec63a	Initialize *countp to 0 in stub for softdep_flushworklist(). This allows ffs_fsync() to break out of a loop that might otherwise be infinite on kernels compiled without the SOFTUPDATES option. The observed symptom was a system hang at the first unmount attempt.	2000-08-09 00:41:54 +00:00
Ollivier Robert	8694d8e912	Fix the lockmgr panic everyone is seeing at shutdown time. vput assumes curproc is the lock holder, but it's not true in this case. Thanks a lot Luoqi ! Submitted by: luoqi Tested by: phk	2000-08-01 14:15:07 +00:00
Peter Wemm	6ee6b42ef7	Minor change: fix warning - move a 'struct vnode *vp' declaration inside a #ifdef DIAGNOSTIC to match its corresponding usage.	2000-07-28 22:27:00 +00:00
Kirk McKusick	3592b7155c	Clean up the snapshot code so that it no longer depends on the use of the SF_IMMUTABLE flag to prevent writing. Instead put in explicit checking for the SF_SNAPSHOT flag in the appropriate places. With this change, it is now possible to rename and link to snapshot files. It is also possible to set or clear any of the owner, group, or other read bits on the file, though none of the write or execute bits can be set. There is also an explicit test to prevent the setting or clearing of the SF_SNAPSHOT flag via chflags() or fchflags(). Note also that the modify time cannot be changed as it needs to accurately reflect the time that the snapshot was taken. Submitted by: Robert Watson <rwatson@FreeBSD.org>	2000-07-26 23:07:01 +00:00
Kirk McKusick	55ba28c60a	Add stub for softdep_flushworklist() so that kernels compiled without the SOFTUPDATES option will load correctly. Obtained from: John Baldwin <jhb@bsdi.com>	2000-07-25 05:28:59 +00:00
Kirk McKusick	9b97113391	This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic. Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.	2000-07-24 05:28:33 +00:00
Boris Popov	3fbd97427e	Prevent possible dereference of NULL pointer. Submitted by: Marius Bendiksen <mbendiks@eunet.no>	2000-07-13 02:17:14 +00:00
Kirk McKusick	d303f71fdc	Brain fault, forgot to update ffs_snapshot.c with the new calling convention for vn_start_write.	2000-07-12 00:27:27 +00:00
Kirk McKusick	f2a2857bb3	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).	2000-07-11 22:07:57 +00:00
Kirk McKusick	d4c1816924	Clean up warning about undeclared function by declaring softdep_fsync in mount.h instead of ffs_extern.h. The correct solution is to use an indirect function pointer so that the kernel does not have to be built with options FFS, but that will be left for another day.	2000-07-11 19:28:26 +00:00
Kirk McKusick	cc3962a9cd	Delete README as it is now obsolete. Relevant information is in README.softupdates.	2000-07-08 02:32:49 +00:00
Kirk McKusick	876578906d	Update to reflect current status.	2000-07-08 02:31:21 +00:00
Kirk McKusick	22e5a6234e	Get userland visible flags added for snapshots to give a few days advance preparation for them to get migrated into place so that subsequent changes in utilities will not fail to compile for lack of up-to-date header files in /usr/include.	2000-07-04 04:58:34 +00:00
Poul-Henning Kamp	3275cf7379	Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES, that is way cleaner than using the softupdates_stub stunt, which should be killed when convenient. Discussed with: mckusick	2000-07-03 13:26:54 +00:00
Andrey A. Chernov	2d90744fd8	Remove obsoleted info about linking from contrib	2000-06-24 13:29:25 +00:00
Kirk McKusick	858c16fab8	Update to new copyright.	2000-06-22 00:29:53 +00:00
Kirk McKusick	6019e6208f	When running with quotas enabled on a filesystem using soft updates, the system would panic when a user's inode quota was exceeded (see PR 18959 for details). This fixes that problem. PR: 18959 Submitted by: Jason Godsey <jason@unixguy.fidalgo.net>	2000-06-18 22:14:28 +00:00
Kirk McKusick	d3abb52714	Some additional performance improvements. When freeing an inode check to see if it has been committed to disk. If it has never been written, it can be freed immediately. For short lived files this change allows the same inode to be reused repeatedly. Similarly, when upgrading a fragment to a larger size, if it has never been claimed by an inode on disk, it too can be freed immediately making it available for reuse often in the next slowly growing block of the same file.	2000-06-18 22:05:57 +00:00
Poul-Henning Kamp	7c50d77218	Revert part of my bioops change which implemented panic(8).	2000-06-16 14:32:13 +00:00
Poul-Henning Kamp	7523681895	ARGH! I have too many source trees :-( Fix prototype errors in last commit.	2000-06-16 13:00:33 +00:00
Poul-Henning Kamp	a2e7a027a7	Virtualizes & untangles the bioops operations vector. Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@	2000-06-16 08:48:51 +00:00
Poul-Henning Kamp	6ea6805f8c	Remove a comment which should never have made it in.	2000-06-14 21:48:19 +00:00
Robert Watson	b2b0497ab5	o If FFS_EXTATTR is defined, don't print out an error message on unmount if an FFS partition returns EOPNOTSUPP, as it just means extended attributes weren't enabled on that partition. Prevents spurious warning per-partition at shutdown.	2000-06-04 04:50:36 +00:00
Jake Burkholder	e39756439c	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others	2000-05-26 02:09:24 +00:00
Jake Burkholder	740a1973a6	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd	2000-05-23 20:41:01 +00:00
Robert Watson	f3706a0361	s/ffs_unmonut/ffs_unmount/ in a gratuitous ufs_extattr printf. Reported by: knu	2000-05-07 17:21:08 +00:00
Poul-Henning Kamp	9626b608de	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter	2000-05-05 09:59:14 +00:00
Poul-Henning Kamp	2c9b67a8df	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude	2000-04-30 18:52:11 +00:00
Poul-Henning Kamp	87150cb06d	s/biowait/bufwait/g Prodded by: several.	2000-04-29 16:25:22 +00:00
Poul-Henning Kamp	eb95c536ad	Remove unneeded #include <sys/kernel.h>	2000-04-29 15:36:14 +00:00
Poul-Henning Kamp	3389ae9350	Remove ~25 unneeded #include <sys/conf.h> Remove ~60 unneeded #include <sys/malloc.h>	2000-04-19 14:58:28 +00:00
Robert Watson	a64ed08955	Introduce extended attribute support for FFS, allowing arbitrary (name, value) pairs to be associated with inodes. This support is used for ACLs, MAC labels, and Capabilities in the TrustedBSD security extensions, which are currently under development. In this implementation, attributes are backed to data vnodes in the style of the quota support in FFS. Support for FFS extended attributes may be enabled using the FFS_EXTATTR kernel option (disabled by default). Userland utilities and man pages will be committed in the next batch. VFS interfaces and man pages have been in the repo since 4.0-RELEASE and are unchanged. o ufs/ufs/extattr.h: UFS-specific extattr defines o ufs/ufs/ufs_extattr.c: bulk of support routines o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes o contrib/softupdates/ffs_softdep.c: extattr.h includes o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h (This should not be the case, and will be fixed in a future commit) Currently attributes are not supported in MFS. This will be fixed. Reviewed by: adrian, bp, freebsd-fs, other unthanked souls Obtained from: TrustedBSD Project	2000-04-15 03:34:27 +00:00
Poul-Henning Kamp	c244d2de43	Move B_ERROR flag to b_ioflags and call it BIO_ERROR. (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.	2000-04-02 15:24:56 +00:00
Poul-Henning Kamp	b99c307a21	Rename the existing BUF_STRATEGY() to DEV_STRATEGY() substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.	2000-03-20 11:29:10 +00:00
Poul-Henning Kamp	21144e3bf1	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.	2000-03-20 10:44:49 +00:00
Kirk McKusick	584508a741	Use 64-bit math to calculate if we have hit our freespace limit. Necessary for coherent results on filesystems bigger than 0.5Tb.	2000-03-17 03:44:47 +00:00
Kirk McKusick	9f043878d0	Use 64-bit math to decide if optimization needs to be changed. Necessary for coherent results on filesystems bigger than 0.5Tb. Submitted by: Paul Saab <ps@yahoo-inc.com>	2000-03-15 07:08:36 +00:00
Matthew Dillon	f8fa53397f	Fix a 'freeing free block' panic in UFS. The problem occurs when the filesystem fills up. If the first indirect block exists and FFS is able to allocate deeper indirect blocks, but is not able to allocate the data block, FFS improperly unwinds the indirect blocks and leaves a block pointer hanging to a freed block. This will cause a panic later when the file is removed. The solution is to properly account for the first block-pointer-to-an-indirect-block we had to create in a balloc operation and then unwind it if a failure occurs. Detective work by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: mckusick, Ian Dowse <iedowse@maths.tcd.ie> Approved by: jkh	2000-02-24 20:43:20 +00:00
Kirk McKusick	4434ff1d38	When writing out bitmap buffers, need to skip over ones that already have a write in progress. Otherwise one can get in an infinite loop trying to get them all flushed. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	2000-01-30 20:32:59 +00:00
Kirk McKusick	57a91f6fb0	During fastpath processing for removal of a short-lived inode, the set of restrictions for cancelling an inode dependency (inodedep) is somewhat stronger than originally coded. Since this check appears in two places, we codify it into the function check_inode_unwritten which we then call from the two sites, one freeing blocks and the other freeing directory entries. Submitted by: Steinar Haug via Matthew Dillon	2000-01-18 01:33:05 +00:00
Kirk McKusick	4c6adb0622	Need to reorganize the flushing of directory entry (pagedep) dependencies so that they never try to lock an inode corresponding to ".." as this can lead to deadlock. We observe that any inode with an updated link count is always pushed into its buffer at the time of the link count change, so we do not need to do a VOP_UPDATE, but merely find its buffer and write it. The only time we need to get the inode itself is from the result of a mkdir whose name will never be ".." and hence locking such an inode will never request a lock above us in the filesystem tree. Thanks to Brian Fundakowski Feldman for providing the test program that tickled soft updates into hanging in "inode" sleep. Submitted by: Brian Fundakowski Feldman <green@FreeBSD.org>	2000-01-18 01:30:03 +00:00
Kirk McKusick	105ef72c55	Better bounding on softdep_flushfiles; other minor tweeks to checks.	2000-01-17 06:35:11 +00:00
Kirk McKusick	107d5039ef	Must track multiple uncommitted renames until one ultimately gets committed to disk or is removed.	2000-01-17 06:28:18 +00:00
Matthew Dillon	173cce7c8e	Non-operational change, fix compiler warning. Reviewed by: mckusick	2000-01-14 04:39:28 +00:00
Kirk McKusick	d7127837a2	Confirming Peter's fix (locking 101: release the lock before you go to sleep). Locking 101, part 2: do not look at buffer contents after you have been asleep. There is no telling what wonderous changes may have occurred.	2000-01-13 20:03:22 +00:00
Peter Wemm	7f473504e6	Free the global softupdates lock prior to tsleep() in getdirtybuf(). This seems to be responsible for a bunch of panics where the process sleeps and something else finds softupdates "locked" when it shouldn't be. This commit is unreviewed, but has been a big help here. Previously my boxes would panic pretty much on the first fsync() that wrote something to disk.	2000-01-13 18:48:12 +00:00
Kirk McKusick	1c2ceb2880	Because cylinder group blocks are now written in background, it is no longer sufficient to get a lock on a buffer to know that its write has been completed. We have to first get the lock on the buffer, then check to see if it is doing a background write. If it is doing background write, we have to wait for the background write to finish, then check to see if that fullfilled our dependency, and if not to start another write. Luckily the explanation is longer than the fix.	2000-01-13 07:20:01 +00:00
Kirk McKusick	94313add1f	A panic occurs during an fsync when a dirty block associated with a vnode has not been written (which would clear certain of its dependencies). The problems arises because fsync with MNT_NOWAIT no longer pushes all the dirty blocks associated with a vnode. It skips those that require rollbacks, since they will just get instantly dirty again. Such skipped blocks are marked so that they will not be skipped a second time (otherwise circular dependencies would never clear). So, we fsync twice to ensure that everything will be written at least once.	2000-01-13 07:17:39 +00:00
Kirk McKusick	4ed62fbd7f	The only known cause of this panic is running out of disk space. The problem occurs when an indirect block and a data block are being allocated at the same time. For example when the 13th block of the file is written, the filesystem needs to allocate the first indirect block and a data block. If the indirect block allocation succeeds, but the data block allocation fails, the error code dellocates the indirect block as it has nothing at which to point. Unfortunately, it does not deallocate the indirect block's associated dependencies which then fail when they find the block unexpectedly gone (ptr == 0 instead of its expected value). The fix is to fsync the file before doing the block rollback, as the fsync will flush out all of the dependencies. Once the rollback is done the file must be fsync'ed again so that the soft updates code does not find unexpected changes. This approach is much slower than writing the code to back out the extraneous dependencies, but running out of disk space is not expected to be a common occurence, so just getting it right is the main criterion. PR: kern/15063 Submitted by: Assar Westerlund <assar@stacken.kth.se>	2000-01-11 08:27:00 +00:00
Kirk McKusick	10767f840b	We cannot proceed to free the blocks of the file until the dependencies have been cleaned up by deallocte_dependencies(). Once that is done, it is safe to post the request to free the blocks. A similar change is also needed for the freefile case.	2000-01-11 06:52:35 +00:00
Poul-Henning Kamp	ba4ad1fcea	Give vn_isdisk() a second argument where it can return a suitable errno. Suggested by: bde	2000-01-10 12:04:27 +00:00
Kirk McKusick	26e5527c86	Missing FREE_LOCK call before handle_workitem_freeblocks. Submitted by: "Kenneth D. Merry" <ken@kdm.org>	2000-01-10 08:39:03 +00:00
Kirk McKusick	cf60e8e4bf	Several performance improvements for soft updates have been added: 1) Fastpath deletions. When a file is being deleted, check to see if it was so recently created that its inode has not yet been written to disk. If so, the delete can proceed to immediately free the inode. 2) Background writes: No file or block allocations can be done while the bitmap is being written to disk. To avoid these stalls, the bitmap is copied to another buffer which is written thus leaving the original available for futher allocations. 3) Link count tracking. Constantly track the difference in i_effnlink and i_nlink so that inodes that have had no change other than i_effnlink need not be written. 4) Identify buffers with rollback dependencies so that the buffer flushing daemon can choose to skip over them.	2000-01-10 00:24:24 +00:00
Kirk McKusick	f0f7d38386	Keep tighter control of removal dependencies by limiting the number of dirrem structure rather than the collaterally created freeblks and freefile structures. Limit the rate of buffer dirtying by the syncer process during periods of intense file removal.	2000-01-09 23:35:38 +00:00
Kirk McKusick	3f5b28bc07	Reorganize softdep_fsync so that it only does the inode-is-flushed check before the inode is unlocked while grabbing its parent directory. Once it is unlocked, other operations may slip in that could make the inode-is-flushed check fail. Allowing other writes to the inode before returning from fsync does not break the semantics of fsync since we have flushed everything that was dirty at the time of the fsync call.	2000-01-09 23:14:57 +00:00
Kirk McKusick	e2dc60835d	Get rid of unreferenced function.	2000-01-09 22:42:42 +00:00
Kirk McKusick	83aaf63ab2	Make static non-exported functions from soft updates.	2000-01-09 22:40:09 +00:00
Peter Wemm	c447342094	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.	1999-12-29 05:07:58 +00:00
Bruce Evans	7e58bfacbe	Update the unclean flag for mount -u. I forgot to handle this case when I made the absence of the clean flag sticky in rev.1.88. This was a problem main for "mount /". There is no way to mount "/" for writing without using mount -u (normally implicitly), so after "mount -f /" of an unclean filesystem, the absence of the clean flag was sticky forever.	1999-12-23 15:42:14 +00:00
Eivind Eklund	369dc8ceb8	Change incorrect NULLs to 0s	1999-12-21 11:14:12 +00:00
Robert Watson	91f37dcba1	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind	1999-12-19 06:08:07 +00:00
Kirk McKusick	6a4152243f	The function request_cleanup() had a tsleep() with PCATCH. It is quite dangerous, since the process may hold locks at the point, and if it is stopped in that tsleep the machine may hang. Because the sleep is so short, the PCATCH is not required here, so it has been removed. For the future, the FreeBSD team needs to decide whether it is still reasonable to stop a process in tsleep, as that may affect any other code that uses PCATCH while holding kernel locks. Submitted by: Dmitrij Tejblum <tejblum@arc.hq.cti.ru> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-12-16 22:02:09 +00:00
Eivind Eklund	762e6b856c	Introduce NDFREE (and remove VOP_ABORTOP)	1999-12-15 23:02:35 +00:00
Eivind Eklund	6bdfe06ad9	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter	1999-12-11 16:13:02 +00:00
Bill Fumerola	43cd4e8815	Remove the 'alpha, use at your own risk' death-statement. Reviewed by: mckusick (verbally at FreeBSDcon)	1999-12-03 00:40:31 +00:00
Bill Fumerola	cfa5001489	Fix typo, add $FreeBSD$	1999-12-03 00:34:26 +00:00
Kirk McKusick	9f54c05286	Preferentially allocate the first indirect block in the same cylinder group as the inode. This makes a 15% difference in read speed for files in the 96K to 500K size range.	1999-12-01 19:33:12 +00:00
Poul-Henning Kamp	38224dcd59	Convert various pieces of code to use vn_isdisk() rather than checking for vp->v_type == VBLK. In ccd: we don't need to call VOP_GETATTR to find the type of a vnode. Reviewed by: sos	1999-11-22 10:33:55 +00:00
Eivind Eklund	b2f2b704d0	We do not have ffs_checkexp, so remove the prototype	1999-11-20 16:44:44 +00:00
Poul-Henning Kamp	0429e37ade	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967	1999-11-20 10:00:46 +00:00
Poul-Henning Kamp	698f9cf828	Next step in the device cleanup process. Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code. Unify spec_open() for bdev and cdev cases. Remove the disabled bdev specific read/write code.	1999-11-09 14:15:33 +00:00
Bruce Evans	5bd5c8b9e5	Quick fix for breakage of ext2fs link counts as reported by stat(2) by the soft updates changes: only report the link count to be i_effnlink in ufs_getattr() for file systems that maintain i_effnlink. Tested by: Mike Dracopoulos <mdraco@math.uoa.gr>	1999-11-03 12:05:39 +00:00
Mike Smith	6d14782861	Newline-terminate the complaint message about not being able to find the root vnode pointer.	1999-11-01 23:57:28 +00:00
Poul-Henning Kamp	923502ff91	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.	1999-10-29 18:09:36 +00:00
Poul-Henning Kamp	b89392e703	Remove the D_NOCLUSTER[RW] options which were added because vn had problems. Now that Matt has fixed vn, this can go. The vn driver should have used d_maxio (now si_iosize_max) anyway.	1999-09-30 07:11:30 +00:00
Poul-Henning Kamp	1b5464ef9d	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde	1999-09-29 20:05:33 +00:00
Alfred Perlstein	c24fda81c9	Seperate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open\|stat\|stafs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD	1999-09-11 00:46:08 +00:00
Peter Wemm	280652828b	$Id$ -> $FreeBSD$	1999-08-28 02:16:32 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Poul-Henning Kamp	41d2e3e09e	Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.	1999-08-25 12:24:39 +00:00
Sheldon Hearn	740e3a15f7	Fix bug introduced in rev 1.28, which causes kernel build to break for the case where DEBUG is defined but not DIAGNOSTIC. ffs_checkblk is declared conditionally on DIAGNOSTIC, not DEBUG. PR: 13314 Reviewed by: bde	1999-08-24 08:39:41 +00:00
Bruce Evans	d918320517	Use devtoname() to print dev_t's instead of casting them to long or u_long for misprinting in %lx format.	1999-08-23 20:35:21 +00:00
Poul-Henning Kamp	7dc5cd047f	The bdevsw() and cdevsw() are now identical, so kill the former.	1999-08-13 10:29:38 +00:00
Poul-Henning Kamp	0ef1c82630	Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.	1999-08-08 18:43:05 +00:00
Kirk McKusick	4dc0c8f521	Create the macro DOINGASYNC to check whether the MNT_ASYNC flag has been set for a mount point. Insert missing checks to ensure that all write operations are done asynchronously when the MNT_ASYNC option has been requested. Submitted by: Craig A Soules <soules+@andrew.cmu.edu> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-13 18:20:13 +00:00
Poul-Henning Kamp	68de329e34	Use the fsid from the superblock, unless it looks bogus or has already been taken by some other filesystem.	1999-07-11 19:16:50 +00:00
Ollivier Robert	7fe29b0aef	Add $Id$ Approved by: kirk	1999-07-07 07:51:04 +00:00
John Polstra	24755bdc25	Update pathnames for new location of soft-updates sources.	1999-07-03 21:34:05 +00:00
Kirk McKusick	48703fedf1	No longer need to set B_ASYNC flag since BUF_KERNPROC now unconditionally sets the identity of the buffer.	1999-06-29 15:57:40 +00:00
Peter Wemm	a6451da76b	Keep the inlines for <sys/buf.h> happy..	1999-06-27 13:26:23 +00:00
Kirk McKusick	67812eacd7	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.	1999-06-26 02:47:16 +00:00
Kirk McKusick	7481264c1e	On our final pass through ffs_fsync, do all I/O synchronously so that we can find out if our flush is failing because of write errors. This change avoids a "flush failed" panic during unrecoverable disk errors.	1999-06-18 05:49:46 +00:00
Kirk McKusick	f9c8cab591	Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.	1999-06-16 23:27:55 +00:00
Kirk McKusick	e4ab40bcb6	Get rid of the global variable rushjob and replace it with a function in kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.	1999-06-15 23:37:29 +00:00
Poul-Henning Kamp	2447bec829	Simplify cdevsw registration. The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing. cdevsw_add() will print an message if the d_maj field looks bogus. Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL. Move bdevsw() and devsw() functions to kern/kern_conf.c Bump __FreeBSD_version to 400006 This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions if_xe.c bogusly accessed cdevsw[], author/maintainer please fix. I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.	1999-05-31 11:29:30 +00:00
Julian Elischer	2e897e94b6	Cosmetic changes to make it compile without errors in gcc -Wall	1999-05-22 04:43:04 +00:00
Kirk McKusick	c2606ec5c6	Add a hook to ffs_fsync to allow soft updates to get first chance at doing a sync on the block device for the filesystem. That allows it to push the bitmap blocks before the inode blocks which greatly reduces the number of inode rollbacks that need to be done.	1999-05-14 01:26:46 +00:00
Peter Wemm	51b5226683	Try and fix a dev_t/major/minor etc nit.	1999-05-12 22:32:07 +00:00
Kirk McKusick	71a0942aca	Put back changes that might be causing trouble on Alpha.	1999-05-09 19:39:54 +00:00
Poul-Henning Kamp	4be2eb8c49	I got tired of seeing all the cdevsw[major(foo)] all over the place. Made a new (inline) function devsw(dev_t dev) and substituted it. Changed to the BDEV variant to this format as well: bdevsw(dev_t dev) DEVFS will eventually benefit from this change too.	1999-05-08 06:40:31 +00:00
Poul-Henning Kamp	46eede0058	Continue where Julian left off in July 1998: Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function. Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!) Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!) (Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)	1999-05-07 10:11:40 +00:00
Kirk McKusick	36cfb417de	Whitespace cleanup.	1999-05-07 05:21:16 +00:00
Kirk McKusick	7957996abd	Get rid of random debugging cruft; sync up with latest version.	1999-05-07 05:11:31 +00:00
Kirk McKusick	224a6aa241	Severe slowdowns have been reported when creating or removing many files at once on a filesystem running soft updates. The root of the problem is that soft updates limits the amount of memory that may be allocated to dependency structures so as to avoid hogging kernel memory. The original algorithm just waited for the disk I/O to catch up and reduce the number of dependencies. This new code takes a much more aggressive approach. Basically there are two resources that routinely hit the limit. Inode dependencies during periods with a high file creation rate and file and block removal dependencies during periods with a high file removal rate. I have attacked these problems from two fronts. When the inode dependency limits are reached, I pick a random inode dependency, UFS_UPDATE it together with all the other dirty inodes contained within its disk block and then write that disk block. This trick usually clears 5-50 inode dependencies in a single disk I/O. For block and file removal dependencies, I pick a random directory page that has at least one remove pending and VOP_FSYNC its directory. That releases all its removal dependencies to the work queue. To further hasten things along, I also immediately start the work queue process rather than waiting for its next one second scheduled run.	1999-05-07 02:26:47 +00:00
Peter Wemm	dfd5dee1b0	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.	1999-05-06 18:13:11 +00:00
Alan Cox	4221e284a3	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-05-02 23:57:16 +00:00
Mike Smith	f4711b2df4	Simplify the tunefs example, since tunefs uses getfsfile(). Lots of people complain about working out what device their filesystems are mounted on.	1999-04-27 21:11:19 +00:00
Kirk McKusick	38e28fd66b	Reorganize locking to avoid holding the lock during calls to bdwrite and brelse (which may sleep in some systems). Obtained from: Matthew Dillon <dillon@apollo.backplane.com>	1999-03-02 06:38:07 +00:00
Kirk McKusick	eef33ce9bd	When fsync'ing a file on a filesystem using soft updates, we first try to write all the dirty blocks. If some of those blocks have dependencies, they will be remarked dirty when the I/O completes. On systems with really fast I/O systems, it is possible to get in an infinite loop trying to flush the buffers, because the I/O finishes before we can get all the dirty buffers off the v_dirtyblkhd list and into the I/O queue. (The previous algorithm looped over the v_dirtyblkhd list writing out buffers until the list emptied.) So, now we mark each buffer that we try to write so that we can distinguish the ones that are being remarked dirty from those that we have not yet tried to flush. Once we have tried to push every buffer once, we then push any associated metadata that is causing the remaining buffers to be redirtied. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-03-02 04:04:31 +00:00
Kirk McKusick	4cbb89d95d	Ensure that softdep_sync_metadata can handle bmsafemap and mkdir entries if they ever arise (which should not happen as softdep_sync_metadata is currently used).	1999-03-02 00:19:47 +00:00
Kirk McKusick	133ff2619a	fix double LIST_REMOVE; other cosmetic changes to match version 9.32. Obtained from: Jeffrey Hsu <hsu@FreeBSD.ORG>	1999-02-17 20:01:20 +00:00
Matthew Dillon	8aef171243	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-28 00:57:57 +00:00
David Greenman	8ab2fa0073	Gutted softdep_deallocate_dependencies and replaced it with a panic. It turns out to not be useful to unwind the dependencies and continue in the face of a fatal error. Also changed the log() to a printf() in softdep_error() so that it will be output in the case of a impending panic. Submitted by: Kirk McKusick <mckusick@mckusick.com>	1999-01-22 09:07:32 +00:00
Eivind Eklund	5b1b6c5859	Silence warning about unused debug function. (I'll turn this function into a DDB command in my next staticization sweep).	1999-01-12 11:42:41 +00:00
Eivind Eklund	a862221fa0	Add a warning about the copyright restraints.	1999-01-08 16:03:12 +00:00
Bruce Evans	de5d1ba57c	Don't pass unused unused timestamp args to UFS_UPDATE() or waste time initializing them. This almost finishes centralizing (in-core) timestamp updates in ufs_itimes().	1999-01-07 16:14:19 +00:00
Bruce Evans	4591d9bb7e	UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value MNT_WAIT when we mean boolean `true' or check for that value not being passed. There was no problem in practice because MNT_WAIT had the magic value of 1.	1999-01-06 18:18:06 +00:00
Bruce Evans	d64dbc8719	Ifdefed the conditionally used variable `prtrealloc'. Declare it as volatile so that there is no chance that the code that it controls is optimised away.	1999-01-06 17:04:33 +00:00
Bruce Evans	5991fd0370	Backed out rev.1.47. It just broke my optimisations for lazy syncing of timestamps in rev.1.45. The soft updates bug was elsewhere. Forgotten by: luoqi	1999-01-06 16:52:38 +00:00
Eivind Eklund	fb1167777a	Remove the 'waslocked' parameter to vfs_object_create().	1999-01-05 18:50:03 +00:00
Eivind Eklund	a777e82019	Remove the last clients of vfs_object_create(..., waslocked=1); waslocked will go away shortly. Reviewed by: dg	1999-01-02 01:32:36 +00:00
Julian Elischer	1f35e8c8da	Remove some compiler warnings.	1998-12-10 20:11:47 +00:00
Bruce Evans	672be20b9f	Don't use the strange null pointer constant `(ufs_daddr_t)0' in a call to VOP_BMAP(). Don't use uncast NULLs in the same call.	1998-11-29 03:12:06 +00:00
David Greenman	1c680b45a2	Restored the "reallocblks" code to its former glory. What this does is basically do a on-the-fly defragmentation of the FFS filesystem, changing file block allocations to make them contiguous. Thanks to Kirk McKusick for providing hints on what needed to be done to get this working.	1998-11-13 01:01:44 +00:00
Peter Wemm	2ec07c6614	Change dirty block list handling to use TAILQ macros.	1998-10-31 15:33:32 +00:00
Peter Wemm	40c8cfe552	Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.	1998-10-31 15:31:29 +00:00
Jordan K. Hubbard	2dcc2f0693	Clarify a rather ambiguous debugging message.	1998-10-28 10:37:54 +00:00
Bruce Evans	b5ee16407f	Oops, the redundant tests for major numbers weren't redundant here. They checked for the magic major number for the "device" behind mfs mount points. Use a more obvious check for this device. Debugged by: Andrew Gallatin <gallatin@cs.duke.edu>	1998-10-27 11:47:08 +00:00
Bruce Evans	9c0619dace	Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted when bdevsw[] became sparse. We still depend on magic to avoid having to check that (v_rdev) device numbers in vnodes are not NODEV. Removed redundant `major(dev) < nblkdev' tests instead of updating them.	1998-10-25 19:02:48 +00:00
Poul-Henning Kamp	f5ef029e92	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.	1998-10-25 17:44:59 +00:00
Nate Williams	ed8d80c2de	Fix 'noatime' bug that was unrelated to use of noatime. The problem is caused when a directory block is compacted. When this occurs, softdep_change_directoryentry_offset() is called to relocate each directory entry and adjust its matching diradd structure, if any, to match the new location of the entry. The bug is that while softdep_change_directoryentry_offset() correctly adjusts the offsets of the diradd structures on the pd_diraddhd[] lists (which are not yet ready to be committed to disk), it fails to adjust the offsets of the diradd structures on the pd_pendinghd list (which are ready to be committed to disk). This causes the dependency structures to be inconsistent with the buf contents. Now, if the compaction has moved a directory entry to the same offset as one of the diradd structures on the pd_pendinghd list and a syscall is done that tries to remove this directory entry before this directory block has been written to disk (which would empty pd_pendinghd), a sanity check in newdirrem() will call panic() when it notices that the inode number in the entry that it is to be removed doesn't match the inode number in the diradd structure with that offset of that entry. Reviewed by: Kirk McKusick <mckusick@McKusick.COM> Submitted by: Don Lewis <Don.Lewis@tsc.tdk.com>	1998-10-03 19:17:11 +00:00
Bruce Evans	0922cce61c	Fixed clean flag handling: - don't set the clean flag on unmount of an unclean filesystem that was (forcibly) mounted rw. - set the clean flag on rw -> ro update of a mounted initially-clean filesystem. - fixed some style bugs (mostly long lines). This uses the fs_flags field and FS_UNCLEAN state bit which were introduced in the softdep changes. NetBSD uses extra state bits in fs_clean. Reviewed by: luoqui	1998-09-26 04:59:42 +00:00
Luoqi Chen	e266594c25	Eliminate a race in VOP_FSYNC() when softupdates is enabled. Submitted by: Kirk McKusick <mckusick@McKusick.COM> Two minor changes are also included, 1. Remove gratuitious checks for error return from vn_lock with LK_RETRY set, vn_lock should always succeed in these cases. 2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount a little more unstable. It also keeps us in sync with other BSDs. Suggested by: Bruce Evans <bde@zeta.org.au>	1998-09-24 15:02:46 +00:00
Luoqi Chen	f9e84c2fee	Restore pre-v1.44 behavior: always copy modified in-core inode to disk buffer. Otherwise some in-core inode changes might be lost, including important meta data (e.g. size) if softupdates is enabled.	1998-09-15 14:45:28 +00:00
Søren Schmidt	d024c95599	Remove the SLICE code. This clearly needs alot more thought, and we dont need this to hunt us down in 3.0-RELEASE.	1998-09-14 19:56:42 +00:00
Bruce Evans	9164000766	Don't dereference an uninitialized pointer in dead code. The dead code gets executed if it is compiled without optimization.	1998-09-12 14:46:15 +00:00
Bruce Evans	8994ca3ce9	Removed statically configured mount type numbers (MOUNT_) and all references to them. The change a couple of days ago to ignore these numbers in statically configured vfsconf structs was slightly premature because the cd9660, cfs, devfs, ext2fs, nfs vfs's still used MOUNT_ instead of the number in their vfsconf struct.	1998-09-07 13:17:06 +00:00
Bruce Evans	ff261f16f6	Put the zombie ffs sysctl node in "notyet" state together with its few remaining children. Prepare it for MOUNT_UFS going away.	1998-09-07 11:50:19 +00:00
Poul-Henning Kamp	0375c9f2b8	Add a new vnode op, VOP_FREEBLKS(), which filesystems can use to inform device drivers about sectors no longer in use. Device-drivers receive the call through d_strategy, if they have D_CANFREE in d_flags. This allows flash based devices to erase the sectors and avoid pointlessly carrying them around in compactions. Reviewed by: Kirk Mckusick, bde Sponsored by: M-Systems (www.m-sys.com)	1998-09-05 14:13:12 +00:00
Bruce Evans	0492d857d1	Removed unused includes.	1998-08-17 19:09:36 +00:00
Julian Elischer	55d80b2df1	Handle the case of moving a directory onto the top of a sibling's child of the same name. Submitted by: Kirk Mckusick with fixes from luoqi Chen Obtained from: Whistle test tree.	1998-08-12 20:46:47 +00:00
Bruce Evans	ac1e407b32	Fixed printf format errors.	1998-07-11 07:46:16 +00:00
Julian Elischer	bcbd6c6fdd	Don't update superblock if mounted readonly, also fixes some problems with softupdates on root. More cleanups are needed here.. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>	1998-07-08 23:52:27 +00:00
Julian Elischer	fd5d1124e2	VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>	1998-07-04 20:45:42 +00:00
Bruce Evans	3055187290	Sync timestamp changes for inodes of special files to disk as late as possible (when the inode is reclaimed). Temporarily only do this if option UFS_LAZYMOD configured and softupdates aren't enabled. UFS_LAZYMOD is intentionally left out of /sys/conf/options. This is mainly to avoid almost useless disk i/o on battery powered machines. It's silly to write to disk (on the next sync or when the inode becomes inactive) just because someone hit a key or something wrote to the screen or /dev/null. PR: 5577 Previous version reviewed by: phk	1998-07-03 22:17:03 +00:00
Bruce Evans	33cc029eab	Centralized in-core inode update. Update the in-core inode directly in ufs_setattr() so that there is no need to pass timestamps to UFS_UPDATE() (everything else just needs the current time). Ignore the passed-in timestamps in UFS_UPDATE() and always call ufs_itimes() (was: itimes()) to do the update. The timestamps are still passed so that all the callers don't need to be changed yet.	1998-07-03 18:46:52 +00:00
Jordan K. Hubbard	d94ce17be4	Flesh this document out just a little in response to some user questions and also recommend linking over copying since, at this stage, a stale copy is a real concern.	1998-06-26 10:35:55 +00:00
Julian Elischer	c619155f0e	Slight change to directory cleanup Makes soft updates a bit cleaner. Eliminates some warnings about 'corrupted directories' from fsck.	1998-06-14 19:31:28 +00:00
Julian Elischer	28ed032673	Note which version of Kirk's sources this corresponds to.	1998-06-12 21:21:26 +00:00
Julian Elischer	aa75cb86b4	Fix the case when renaming to a file that you've just created and deleted, that had an inode that has not yet been written to disk, when the inode of the new file is also not yet written to disk, and your old directory entry is not yet on disk but you need to remove it and the new name exists in memory but has been deleted but the transaction to write the deleted name to disk exists and has not yet been cancelled by the request to delete the non existant name. I don't know how kirk could have missed such a glaring problem for so long. :-) Especially since the inconsitency survived on the disk for a whole 4 second on average before being fixed by other code. This was not a crashing bug but just led to filesystem inconsitencies if you crashed. Submitted by: Kirk McKusick (mckusick@mckusick.com)	1998-06-12 20:48:30 +00:00
Julian Elischer	6d0ba44288	Add B_NOCACHE to several cases where BSD4.4 only required a B_INVAL. Change worked out by john and kirk in consort.	1998-06-11 17:44:32 +00:00
Julian Elischer	8c221701c3	Fix for "live inode" panic. Submitted by: Kirk McKusick <mckusick@McKusick.COM> Reviewed by: yeah right...	1998-06-10 20:45:46 +00:00
Julian Elischer	4af0bb0f9e	Remove buggy debugging code.	1998-06-10 20:03:16 +00:00
Julian Elischer	939001af5c	Back out John's changes 1.45 -> 1.46 Kirk confirms that the original semantic was what he wanted... (well, a very slight difference) May fix "dangling deps" panic with soft updates.	1998-06-10 19:27:56 +00:00
Doug Rabson	8435e0aef5	Use size_t instead of u_int for sizes.	1998-06-04 17:21:39 +00:00
Julian Elischer	00076e7cf9	Add a reference to the original softupdates paper	1998-06-02 01:30:51 +00:00
Julian Elischer	3942b533f8	Add a reference to the Ganger/Patt paper	1998-06-02 01:27:27 +00:00
Julian Elischer	b8cf4de4c8	A fix to a debug test from Kirk.	1998-05-27 03:32:23 +00:00
Julian Elischer	928c9ddf81	Ensure that there is enough information here, so that people can use soft updates should they desire.	1998-05-19 23:18:37 +00:00
Julian Elischer	25db4e8a66	Bring up-to-date with Whistle's current version Includes some debugging code.	1998-05-19 23:07:25 +00:00
Julian Elischer	46e752be05	Merge with Kirk's version as of Feb 20 His version 9.23 == our version 1.5 of ffs_softdep.c His version 9.5 == our version 1.4 of softdep.c	1998-05-19 22:54:53 +00:00
Julian Elischer	62e12c760c	Merge in Kirk's changes to stop softupdates from hogging all of memory.	1998-05-19 21:45:53 +00:00
Julian Elischer	b6dad36385	Change to stop a silly panic. This should be understood better. Change a buffer swizzle trick to a bcopy. It would be nice if the efficient trick could be used in the future.	1998-05-19 20:50:41 +00:00
Julian Elischer	987614a910	First published FreeBSD version of soft updates Feb 5.	1998-05-19 20:18:42 +00:00
Julian Elischer	a697eb98d4	This commit was generated by cvs2svn to compensate for changes in r36206, which included commits to RCS files with non-trunk default branches.	1998-05-19 20:03:29 +00:00
Julian Elischer	8e95b94dec	Import the next version received from kirk after some FreeBSD feedback.	1998-05-19 20:03:29 +00:00
Julian Elischer	8d1c524575	This commit was generated by cvs2svn to compensate for changes in r36201, which included commits to RCS files with non-trunk default branches.	1998-05-19 19:47:22 +00:00
Julian Elischer	467e1a6e7a	Import the earliest version of the soft update code that I have.	1998-05-19 19:47:22 +00:00
Julian Elischer	c11d29814e	try stop the user from using mount -u to set the async flag on a filesystem currently using soft updates. Also needs a new copy of ffs_softdep.c to complete the fix.	1998-05-18 06:38:18 +00:00
Julian Elischer	5d0957193a	Add missing splx() Submitted by: Luoqi Chen <luoqi@chen.ml.org>	1998-05-11 21:41:13 +00:00
Mike Smith	79cc756d8b	As described by the submitter: Reverse the VFS_VRELE patch. Reference counting of vnodes does not need to be done per-fs. I noticed this while fixing vfs layering violations. Doing reference counting in generic code is also the preference cited by John Heidemann in recent discussions with him. The implementation of alternative vnode management per-fs is still a valid requirement for some filesystems but will be revisited sometime later, most likely using a different framework. Submitted by: Michael Hancock <michaelh@cet.co.jp>	1998-05-06 05:29:41 +00:00
John Dyson	e60606c0af	Correct an error that I made where the vtruncbuf was changed back to vinvalbuf, but I incorrectly added the "V_SAVE\|V_SAVEMETA" flags. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>	1998-05-04 17:43:48 +00:00
John Dyson	83ad4e3dbc	Fix an error that I made with an optimization. In the case of softupdates, we need to do vtruncbuf the old way. Luoqi caught, found the bug and submitted this fix. Submitted by: Luoqi Chen <luoqi@chen.ml.org>	1998-04-30 05:28:53 +00:00
Julian Elischer	c0bab11dfe	Make the devfs SLICE option a standard type option. (hopefully it will go away eventually anyhow)	1998-04-20 03:57:41 +00:00
Julian Elischer	3e425b968d	Add changes and code to implement a functional DEVFS. This code will be turned on with the TWO options DEVFS and SLICE. (see LINT) Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes. /dev will be automatically mounted by init (thanks phk) on bootup. See /sys/dev/slice/slice.4 for more info. All code should act the same without these options enabled. Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others This code does not support the following: bad144 handling. Persistance. (My head is still hurting from the last time we discussed this) ATAPI flopies are not handled by the SLICE code yet. When this code is running, all major numbers are arbitrary and COULD be dynamically assigned. (this is not done, for POLA only) Minor numbers for disk slices ARE arbitray and dynamically assigned.	1998-04-19 23:32:49 +00:00
Poul-Henning Kamp	227ee8a188	Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now uses the new variable time_second instead. gettime() changed to getmicrotime(0. Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeard time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde	1998-03-30 09:56:58 +00:00
Bruce Evans	08637435f2	Moved some #includes from <sys/param.h> nearer to where they are actually used.	1998-03-28 10:33:27 +00:00
Peter Wemm	26cf9c3b75	Enable the use of soft updates on the root filesystem. Previously, the softdep mode could only be activated on the initial mount of a filesystem and then only if it was a read-write mount. A 'mount -r' (as done in the rootfs mount) followed by a 'mount -u' to convert to read-write didn't start softdep mode.	1998-03-27 14:20:57 +00:00
Poul-Henning Kamp	a0502b19d4	Add two new functions, get{micro\|nano}time. They are atomic, but return in essence what is in the "time" variable. gettime() is now a macro front for getmicrotime(). Various patches to use the two new functions instead of the various hacks used in their absence. Some puntuation and grammer patches from Bruce. A couple of XXX comments.	1998-03-26 20:54:05 +00:00
Bruce Evans	3d2d6cc3d8	Forward declare even more structs to restore some self-sufficiency. Didn't fix new dependence on <ufs/ufs/inode.h> and its prerequisites.	1998-03-23 14:12:37 +00:00
John Dyson	9eebcfcf8c	Softdep_sync_metadata appears to expect that it is called at splbio, so make it so...	1998-03-21 05:16:09 +00:00
John Dyson	34f72be5af	Fix vfs_bio_awrite usage, and correct vtruncbuf usage.	1998-03-19 22:49:44 +00:00
John Dyson	bef608bd7e	Some VM improvements, including elimination of alot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!! pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.) pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances. vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.) vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust. vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true. vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.) vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities. vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.) vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliabiliy and performance issue.) 10) Modify FFS to use vtruncbuf. vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.) vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly. vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.) vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.	1998-03-16 01:56:03 +00:00
Julian Elischer	b1897c197c	Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree	1998-03-08 09:59:44 +00:00
Julian Elischer	2cbcee772b	Submitted by: kirk McKusick Stub file for soft updates.	1998-03-08 08:38:41 +00:00
John Dyson	8f9110f6a1	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.	1998-03-07 21:37:31 +00:00
Bruce Evans	16337c2efb	Fixed missing simple_lock() in ffs_mountfs().	1998-03-07 14:59:44 +00:00
Mike Smith	34bdbbd0de	The intent is to get rid of WILLRELE in vnode_if.src by making a complement to all ops that return a vpp, VFS_VRELE. This is initially only for file systems that implement the following ops that do a WILLRELE: vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link, vop_rename, vop_mkdir, vop_rmdir, vop_symlink This is initial DNA that doesn't do anything yet. VFS_VRELE is implemented but not called. A default vfs_vrele was created for fs implementations that use the standard vnode management routines. VFS_VRELE implementations were made for the following file systems: Standard (vfs_vrele) ffs mfs nfs msdosfs devfs ext2fs Custom union umapfs Just EOPNOTSUPP fdesc procfs kernfs portal cd9660 These implementations may change as VOP changes are implemented. In the next phase, in the vop implementations calls to vrele and the vrele part of vput will be moved to the top layer vfs_vnops and made visible to all layers. vput will be replaced by unlock in these cases. Unlocking will still be done in the per fs layer but the refcount decrement will be triggered at the top because it doesn't hurt to hold a vnode reference a little longer. This will have minimal impact on the structure of the existing code. This will only be done for vnode arguments that are released by the various fs vop implementations. Wider use of VFS_VRELE will likely require restructuring of the code. Reviewed by: phk, dyson, terry et. al. Submitted by: Michael Hancock <michaelh@cet.co.jp>	1998-03-01 22:46:53 +00:00
Mike Smith	ce75f2c365	In the author's words: These diffs implement the first stage of a VOP_{GET\|PUT}PAGES pushdown for local media FS's. See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation details for generic _{get\|put}pages for local media FS's. Support is trivial to add for any FS that formerly relied on the default behaviour of the vnode_pager in in EOPNOTSUPP cases (just copy the ffs_getpages() code for the FS in question's _{get\|put}pages). Obviously, it would be better if each local media FS implemented a more optimal method, instead of calling an exported interface from the /sys/vm/vnode_pager.c, but this is a necessary first step in getting the FS's to a point where they can be supplied with better implementations on a case-by-case basis. Obviously, the cd9660_putpages() can be rather trivial (since it is a read-only FS type 8-)). A slight (temporary) modification is made to print a diagnostic message in the case where the underlying filesystem attempts to engage in the previous behaviour. Failure is likely to be ungraceful. Submitted by: terry@freebsd.org (Terry Lambert)	1998-02-26 06:39:59 +00:00
Bruce Evans	c9b9921363	Fixed missing permissions checking for mounting by non-root. There is now less need for the vfs.usermount sysctl. msdosfs already has this change, modulo a missing LK_RETRY, via NetBSD. At least ext2fs is missing this and many other changes from Lite2. Obtained from: Lite2	1998-02-25 04:47:04 +00:00
Bruce Evans	721a23b12f	Removed unnecessary dependencies on KERNEL and DIAGNOSTIC. This was more useful when opt_diagnostic.h had to be included.	1998-02-13 00:20:36 +00:00
Eivind Eklund	303b270b0a	Staticize.	1998-02-09 06:11:36 +00:00
Eivind Eklund	0b08f5f737	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
Eivind Eklund	47cfdb166d	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
Bruce Evans	9cf2c3e77a	Forward declare some structs so that this file is more self-sufficient.	1998-02-03 21:52:02 +00:00
John Dyson	9cfcd01101	Back out recent laptop sync changes. They had significant errors.	1998-02-01 08:24:00 +00:00
John Dyson	de1050d8e4	Support more intelligent sync operations for MNT_NOATIME. PR: kern/5577 Submitted by: Craig Leres <leres@ee.lbl.gov>	1998-02-01 01:59:12 +00:00
John Dyson	2d8acc0f4a	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)	1998-01-22 17:30:44 +00:00
John Dyson	4722175765	Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtile bugs aren't fixed yet.	1998-01-17 09:17:02 +00:00
John Dyson	95e5e988e0	Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.	1998-01-06 05:26:17 +00:00
John Dyson	2be70f79f6	Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.	1997-12-29 00:25:11 +00:00
Bruce Evans	1cd52ec333	Don't include <sys/lock.h> in headers when only `struct simplelock' is required. Fixed everything that depended on the pollution.	1997-12-05 19:55:52 +00:00
Bruce Evans	23906a782b	Fix a small style bug in the generation number change (rev.1.33) before copying the change to other fs's.	1997-12-02 11:21:16 +00:00
Bruce Evans	cb451ebdbd	Staticized.	1997-11-22 08:35:46 +00:00
Bruce Evans	2ea354c3bb	Unremoved prtrealloc and the declaration of ffs_clusteralloc(). These are used in the `#ifdef notyet' case :-). This case is used except in the `#if !defined (not_yes)' case :-\|. This has something to do with the `#ifdef notyet_block_reallocation_enabled' case in vfs_cluster.c :-(.	1997-11-22 07:00:40 +00:00
Bruce Evans	c5697b1f22	Removed an unused #include in the `#ifdef KERNEL' case. Fixed a comment to match the code. The code is still wrong (ffs_checkoverlap() should be staticized and called from a ddb command).	1997-11-18 15:10:38 +00:00
Julian Elischer	b1f4a44b03	Reviewed by: various. Ever since I first say the way the mount flags were used I've hated the fact that modes, and events, internal and exported, and short-term and long term flags are all thrown together. Finally it's annoyed me enough.. This patch to the entire FreeBSD tree adds a second mount flag word to the mount struct. it is not exported to userspace. I have moved some of the non exported flags over to this word. this means that we now have 8 free bits in the mount flags. There are another two that might well move over, but which I'm not sure about. The only user visible change would have been in pstat -v, except that davidg has disabled it anyhow. I'd still like to move the state flags and the 'command' flags apart from each other.. e.g. MNT_FORCE really doesn't have the same semantics as MNT_RDONLY, but that's left for another day.	1997-11-12 05:42:33 +00:00
Poul-Henning Kamp	4a11ca4e29	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused	1997-11-07 08:53:44 +00:00
Bruce Evans	ef91bd5734	Removed unused #includes. The need for most of them went away with recent changes (docluster* and vfs improvements).	1997-10-27 13:33:47 +00:00
Poul-Henning Kamp	987f569678	Another VFS cleanup "kilo commit" 1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} intereface function, and now lives in the ufsmount structure. 2. Remove VOP_SEEK, it was unused. 3. Add mode default vops: VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp And remove identical functionality from filesystems 4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?) 5. Try to make system wide VOP functions have vop_* names. 6. Initialize the um_* vectors in LFS. (Recompile your LKMS!!!)	1997-10-16 20:32:40 +00:00
Poul-Henning Kamp	cec0f20ce7	VFS mega cleanup commit (x/N) 1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here. 2. Change VOP_BLKATOFF to a normal function in cd9660. 3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead. 4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There are still some cruft left, but the bulk of it is done. 5. Fix another VCALL in vfs_cache.c (thanks Bruce!)	1997-10-16 10:50:27 +00:00
Poul-Henning Kamp	138ec1f71a	vnops megacommit 1. Use the default function to access all the specfs operations. 2. Use the default function to access all the fifofs operations. 3. Use the default function to access all the ufs operations. 4. Fix VCALL usage in vfs_cache.c 5. Use VOCALL to access specfs functions in devfs_vnops.c 6. Staticize most of the spec and fifofs vnops functions. 7. Make UFS panic if it lacks bits of the underlying storage handling.	1997-10-15 13:24:07 +00:00
Poul-Henning Kamp	6a525123aa	Hmm, realign the vnops into two columns.	1997-10-15 10:05:29 +00:00
Poul-Henning Kamp	539ef70c2d	Stylistic overhaul of vnops tables. 1. Remove comment stating the blatantly obvious. 2. Align in two columns. 3. Sort all but the default element alphabetically. 4. Remove XXX comments pointing out entries not needed.	1997-10-15 09:22:02 +00:00
Poul-Henning Kamp	40715905a7	I think my previous change may have opened a race conditio. This patch does the same thing, with no change in semantics.	1997-10-14 18:46:48 +00:00
Poul-Henning Kamp	34a6a33036	ufs_ihashrem() should not be called from the UFS layer, but from the lower layer (LFS/FFS/?) like the rest of the ihash functions. Otherwise it is impossible to make a lower layer that doesn't use the ihash facility.	1997-10-14 14:22:31 +00:00
Poul-Henning Kamp	a1c995b626	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde	1997-10-12 20:26:33 +00:00
Poul-Henning Kamp	55166637cd	Distribute and statizice a lot of the malloc M_* types. Substantial input from: bde	1997-10-11 18:31:40 +00:00
Poul-Henning Kamp	0be6b890f9	Add type arg to ffs_mountfs and avoid examining v_tag to find out if MFS is getting a free ride. Use generic ufs_reclaim().	1997-10-10 18:17:00 +00:00
KATO Takenori	81bca6ddae	Clustered read and write are switched at mount-option level. 1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread and vfs.foo.doclusterwrite are deleted. Only mount option can control clustered I/O from userland. 2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR and D_CLUSTERW bits of the d_flags member in the block device switch table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR / MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and MNT_NOCLUSTERW cannot be cleared from userland. 3. Vnode driver disables both clustered read and write. 4. Union filesystem disables clutered write. Reviewed by: bde	1997-09-27 13:40:20 +00:00
Poul-Henning Kamp	ec1d10e413	[Regarding the previous patch] This is completely wrong. 1. ffs_alloc() actually allowed writing one block less one frag (normally 7 frags or 7/8 blocks) beyond the limit. 2. freebufspace() gives the free space in frags, but `size' is in bytes, so the change results in approximately `size' fragments too many being reserved. 3. ffs_realloccg() has the same bug but wasn't changed. PR: 3398 Submitted by: bde Eyeballed by: phk	1997-09-19 11:13:16 +00:00
Poul-Henning Kamp	6772045776	Ffs_alloc allow users to write one block beyond the limit. PR: 3398 Reviewed by: phk Submitted by: Wolfram Schneider <wosch@apfel.de>	1997-09-18 18:07:45 +00:00
Peter Wemm	a6aeade2c4	Convert select -> poll. Delete 'always succeed' select/poll handlers, replaced with generic call. Flag missing vnode op table entries.	1997-09-14 02:58:12 +00:00
Bruce Evans	41fadeeb28	Removed yet more vestiges of config-time swap configuration and/or cleaned up nearby cruft.	1997-09-07 16:21:11 +00:00
Bruce Evans	e4ba6a82b0	Removed unused #includes.	1997-09-02 20:06:59 +00:00
Poul-Henning Kamp	0fa2443f0e	Uncut&paste cache_lookup(). This unifies several times in theory indentical 50 lines of code. The filesystems have a new method: vop_cachedlookup, which is the meat of the lookup, and use vfs_cache_lookup() for their vop_lookup method. vfs_cache_lookup() will check the namecache and pass on to the vop_cachedlookup method in case of a miss. It's still the task of the individual filesystems to populate the namecache with cache_enter(). Filesystems that do not use the namecache will just provide the vop_lookup method as usual.	1997-08-26 07:32:51 +00:00
KATO Takenori	804f67f778	Renamed doclusterread/write to unique names (ffs_doclusterread/write), and staticize them. Move the #include of <sys/sysctl.h> to the top of the file. Pointed out by: Bruce Evans <bde@zeta.org.au>	1997-08-25 08:18:39 +00:00
Garrett Wollman	57bf258e3d	Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.	1997-08-16 19:16:27 +00:00
Poul-Henning Kamp	3758b280f2	We got a couple of "map mismatch" panics from the following code. According to the crash dump, bpref is set to 445 and cgp->cg_nclusterblks is 444. Hence in the for loop, the test fails immediately but the following failure check (got == cgp->cg_nclusterblks) doesn't trigger because got > cgp->cg_nclusterblks. This wreaks havoc in the code after that. Fix: Move one source bit to the left :-) Noticed by: Mike Hibler <mike@fast.cs.utah.edu> Submitted by: Kirk McKusick <mckusick@McKusick.COM>	1997-08-04 07:30:43 +00:00
Bruce Evans	1fd0b0588f	Removed unused #includes.	1997-08-02 14:33:27 +00:00
Bruce Evans	401116d519	Use smalllblktosize() instead of multiplying small block numbers by fs->fs_bsize. The macro is usually faster and makes it clearer that the multiplication can't overflow.	1997-04-10 13:17:09 +00:00
Bruce Evans	fce002fdef	Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include it when it is not used. In most cases, the reasons for including it went away when the special ioctl headers became self-sufficient.	1997-03-24 11:25:10 +00:00
Bruce Evans	90b42d7e40	Fixed corrupted newline and corrupted tab in previous commit.	1997-03-24 03:19:37 +00:00
Guido van Rooij	8f89943eda	Add generation number randomization. Newly created filesystems wil now automatically have random generation numbers. The kenel way of handling those also changed. Further it is advised to run fsirand on all your nfs exported filesystems. the code is mostly copied from OpenBSD, with the randomization chanegd to use /dev/urandom Reviewed by: Garrett Obtained from: OpenBSD	1997-03-23 20:08:22 +00:00
Bruce Evans	3ac4d1ef0c	Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.	1997-03-23 03:37:54 +00:00
Bruce Evans	3c81694426	Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.	1997-03-22 06:53:45 +00:00
Peter Wemm	cc9d89901f	Restore the lost MNT_LOCAL flag twiddle. Lite2 has a different mechanism of setting it (compiled into vfs_conf.c), but we have a dynamic system in place. This could probably be better done via a runtime configure flag in the VFS_SET() VFS declaration, perhaps VFCF_LOCAL, and have the VFS code propagate this down into MNT_LOCAL at mount time. The other FS's would need to be updated, havinf UFS and MSDOSFS filesystems without MNT_LOCAL breaks a few things.. the man page rebuild scans for local filesystems and currently fails, I suspect that other tools like find and tar with their "local filesystem only" modes might be affected.	1997-03-18 19:50:12 +00:00
Søren Schmidt	6ae8358759	Fix support for != 512 byte sector devices. Restores the use of SBLOCK instead of the BSOFF/sectorsize calculation. Using SBLOCK is bogus however in that it uses DEV_BSIZE instead of the actual sector size, but that is taken care of in other places. Changing the SBLOCK would be better, but it affects the system in other places, and doing it this way makes it possible to use filesystems that was made before the lite2 merge.	1997-03-15 18:58:10 +00:00
Mike Pritchard	5ace3b260a	Update a number of panic messages to reflect the actual name of the routine that caused the panic.	1997-03-09 06:00:44 +00:00
Bruce Evans	e4c4150e70	Fixed connection of vfs.ffs node to the sysctl tree.	1997-03-04 18:35:15 +00:00
Peter Wemm	6875d25465	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.	1997-02-22 09:48:43 +00:00
Mike Pritchard	724ab19569	Add function prototypes for most of the new Lite2 functions. Also made a few of the miscfs routines static to be consistent. Some modules simply required some additional #includes to remove -Wall warnings.	1997-02-12 06:52:51 +00:00
Mike Pritchard	812ac98e83	Correct the new Lite2 #ifdef DIAGNOSTIC ffs_checkblk routine to not return without setting a return value when it can't read a block error or detects a bad cylinder group, since the caller is expecting a return value. It will now panic at this point, since the thing to do in this case would be to return a "bad block" status to the caller, and the caller will panic anyways when that happens. Also updated to panic strings in this routine to read "ffs_checkblk: ..." instead of "checkblk: ...".	1997-02-10 17:05:30 +00:00
Mike Pritchard	d2b52eb257	Make ffs_subr.c compile when DIAGNOSTIC is defined. It looks like this was broken before the Lite2 merge :-(. VOP_BMAP was being called with the wrong number of arguments.	1997-02-10 16:27:53 +00:00
John Dyson	996c772f58	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>	1997-02-10 02:22:35 +00:00

... 4 5 6 7 8 ...

641 Commits