freebsd-skq

Author	SHA1	Message	Date
phk	709379c1ae	Another round of the <sys/queue.h> FOREACH transmogriffer. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 16:08:18 +00:00
phk	e87f7a15ad	Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 13:13:25 +00:00
phk	f3b4fbe35f	Use <sys/queue.h> macro API.	2001-02-04 12:37:48 +00:00
phk	236808f33a	Remove a DIAGNOSTIC check which belongs in <sys/queue.h> if anyplace at all.	2001-02-04 11:53:51 +00:00
iedowse	be2876f24f	Extend the sanity checks in ufs_lookup to ensure that each directory entry fits within its DIRBLKSIZ block. The surrounding code is extremely fragile with respect to corruption of the directory entry 'd_reclen' field; if directory corruption occurs, it can blindly scan forward beyond the end of the filesystem block. Usually this results in a 'fault on nofault entry' panic. Directory corruption is now much more likely to be detected, resulting in a 'ufs_dirbad' panic. If the filesystem is read-only, it will simply print a warning message, and skip the corrupted block. Reviewed by: mckusick	2001-02-04 01:52:11 +00:00
iedowse	e061532c92	Use the correct flags field when checking for a read-only filesystem in ufs_dirbad(). The mnt_stat.f_flags field is only updated by the syscalls *statfs and getfsstat, so mnt_flag should be used instead. This only affects whether or not a panic is generated on detection of certain types of directory corruption. Reviewed by: mckusick	2001-02-03 21:25:32 +00:00
dillon	1f3366270d	Fix a race between the syncer and umount. When you umount a softupdates filesystem softdep_process_worklist() is called in a loop until it indicates that no dependancies remain, but the determination of that fact depends on there only being one softdep_process_worklist() instance running. It was possible for the syncer to also be running softdep_process_worklist() and the pre-existing checks in the code to prevent this were not sufficient to prevent the race. This patch solves the problem. Approved-by: mckusick	2001-01-30 06:31:59 +00:00
jasone	8d2ec1ebc4	Convert all simplelocks to mutexes and remove the simplelock implementations.	2001-01-24 12:35:55 +00:00
iedowse	5cc8ff22fa	The ffs superblock includes a 128-byte region for use by temporary in-core pointers to summary information. An array in this region (fs_csp) could overflow on filesystems with a very large number of cylinder groups (~16000 on i386 with 8k blocks). When this happens, other fields in the superblock get corrupted, and fsck refuses to check the filesystem. Solve this problem by replacing the fs_csp array in 'struct fs' with a single pointer, and add padding to keep the length of the 128-byte region fixed. Update the kernel and userland utilities to use just this single pointer. With this change, the kernel no longer makes use of the superblock fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c to indicate that these fields must be calculated for compatibility with older kernels. Reviewed by: mckusick	2001-01-15 18:30:40 +00:00
mckusick	8ef2c7028e	Properly compute the size of the final block of superblock summary information. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2001-01-12 21:56:55 +00:00
rwatson	8e64b8803d	o Commit reems of style(9) changes, whitespace improvements, and comment cleanups. Obtained from: TrustedBSD Project	2001-01-07 23:45:56 +00:00
rwatson	34031fcb8d	o Zero the ufs_extattr_header length field (not necessary, but not a bad idea either) in ufs_extattr_rm. o More completely fill out the local_aio structure when writing out the zero'd extended attribute in ufs_extattr_rm -- previoulsy, this worked fine, but probably should not have. This corrects extraneous warnings about inconsistent inodes following file deletion. Reviewed by: jedgar	2001-01-07 23:31:51 +00:00
rwatson	d0fa22cb5d	o Add an additional EA inconsistency reporting opportunity in ufs_extattr_rm. o Make both reporting locations report the function name where the inconsistency is discovered, as well as the inode number in question. Reviewed by: jedgar	2001-01-07 23:27:58 +00:00
rwatson	17469eca96	o Make call to ufs_extattr_rm() in ufs_extattr_vnode_inactive() use NULL as the credential, not 0, so as to make it more clear what's going on. Obtained from: TrustedBSD Project	2001-01-07 21:38:26 +00:00
rwatson	e32240b99a	o Remove unnecessary sanity check involving requested offset of extended attribute read--the offset is required to be 0 by an earlier check, meaning that it will always be within the scope of the attribute data. This change should have no impact on executed code paths other than removing the unnecessary check: please report if any new failures start to occur as a result. Obtained from: TrustedBSD Project	2001-01-07 21:07:22 +00:00
dillon	fd223545d4	This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit. Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things. NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.	2000-12-26 19:41:38 +00:00
mckusick	d1a83511c9	Several small but important fixes for snapshots: 1) Be more tolerant of missing snapshot files by only trying to decrement their reference count if they are registered as active. 2) Fix for snapshots of filesystems with block sizes larger than 8K (from Ollivier Robert <roberto@eurocontrol.fr>). 3) Fix to avoid losing last block in snapshot file when calculating blocks that need to be copied (from Don Coleman <coleman@coleman.org>).	2000-12-19 04:41:09 +00:00
mckusick	1198265231	Get rid of spurious check in ffs_truncate for i_size == length which fails to set the modification time on the file. The same check a few lines later takes the correct action. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2000-12-19 04:20:13 +00:00
assar	4d01c1e206	add a stub for softdep_slowdown so that it's possible to build the kernel without SOFTUPDATES	2000-12-17 23:59:56 +00:00
dillon	69242be380	Avoid a data-consistency race between write() and mmap() by ensuring that newly allocated blocks are zerod. The race can occur even in the case where the write covers the entire block. Reported by: Sven Berkvens <sven@berkvens.net>, Marc Olzheim <zlo@zlo.nu>	2000-12-17 23:57:05 +00:00
tanimura	016329311e	- Move ifs_init() so that it can initialize ifs_inode_hash_mtx. - s/ffs_inode_hash_lock/ifs_inode_hash_lock/	2000-12-14 09:15:27 +00:00
tanimura	635b424e75	Do not race for the lock of an inode hash. Reviewed by: jhb	2000-12-13 10:04:01 +00:00
mckusick	f52f7b670a	Preventing runaway kernel soft updates memory, take three. Previously, the syncer process was the only process in the system that could process the soft updates background work list. If enough other processes were adding requests to that list, it would eventually grow without bound. Because some of the work list requests require vnodes to be locked, it was not generally safe to let random processes process the work list while they already held vnodes locked. By adding a flag to the work list queue processing function to indicate whether the calling process could safely lock vnodes, it becomes possible to co-opt other processes into helping out with the work list. Now when the worklist gets too large, other processes can safely help out by picking off those work requests that can be handled without locking a vnode, leaving only the small number of requests requiring a vnode lock for the syncer process. With this change, it appears possible to keep even the nastiest workloads under control. Submitted by: Paul Saab <ps@yahoo-inc.com>	2000-12-13 08:30:35 +00:00
dwmalone	dd75d1d73b	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
phk	c3f2ee9700	Staticize some malloc M_ instances.	2000-12-08 20:09:00 +00:00
dillon	978bf0288d	Add necessary bwillwrite() in writev() entry point. Deal with excessive dirty buffers when msync() syncs non-contiguous dirty buffers by checking for the case in UFS before checking for clusterability.	2000-12-06 20:55:09 +00:00
mckusick	690af6322b	More aggressively rate limit the growth of soft dependency structures in the face of multiple processes doing massive numbers of filesystem operations. While this patch will work in nearly all situations, there are still some perverse workloads that can overwhelm the system. Detecting and handling these perverse workloads will be the subject of another patch. Reviewed by: Paul Saab <ps@yahoo-inc.com> Obtained from: Ethan Solomita <ethan@geocast.com>	2000-11-20 06:22:39 +00:00
dillon	2ace352085	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>	2000-11-18 23:06:26 +00:00
mckusick	56cb617a9d	When deleting a file, the ordering of events imposed by soft updates is to first write the deleted directory entry to disk, second write the zero'ed inode to disk, and finally to release the freed blocks and the inode back to the cylinder-group map. As this ordering requires two disk writes to occur which are normally spaced about 30 seconds apart (except when memory is under duress), it takes about a minute from the time that a file is deleted until its inode and data blocks show up in the cylinder-group map for reallocation. If a file has had only a brief lifetime (less than 30 seconds from creation to deletion), neither its inode nor its directory entry may have been written to disk. If its directory entry has not been written to disk, then we need not wait for that directory block to be written as the on-disk directory block does not reference the inode. Similarly, if the allocated inode has never been written to disk, we do not have to wait for it to be written back either as its on-disk representation is still zero'ed out. Thus, in the case of a short lived file, we can simply release the blocks and inode to the cylinder-group map immediately. As the inode and its blocks are released immediately, they are immediately available for other uses. If they are not released for a minute, then other inodes and blocks must be allocated for short lived files, cluttering up the vnode and buffer caches. The previous code was a bit too aggressive in trying to release the blocks and inode back to the cylinder-group map resulting in their being made available when in fact the inode on disk had not yet been zero'ed. This patch takes a more conservative approach to doing the release which avoids doing the release prematurely.	2000-11-14 09:00:25 +00:00
bde	fee8e1a16f	Fixed breakage of mknod() in rev.1.48 of ext2_vnops.c and rev.1.126 of ufs_vnops.c: 1) i_ino was confused with i_number, so the inode number passed to VFS_VGET() was usually wrong (usually 0U). 2) ip was dereferenced after vgone() freed it, so the inode number passed to VFS_VGET() was sometimes not even wrong. Bug (1) was usually fatal in ext2_mknod(), since ext2fs doesn't have space for inode 0 on the disk; ino_to_fsba() subtracts 1 from the inode number, so inode number 0U gives a way out of bounds array index. Bug(1) was usually harmless in ufs_mknod(); ino_to_fsba() doesn't subtract 1, and VFS_VGET() reads suitable garbage (all 0's?) from the disk for the invalid inode number 0U; ufs_mknod() returns a wrong vnode, but most callers just vput() it; the correct vnode is eventually obtained by an implicit VFS_VGET() just like it used to be. Bug (2) usually doesn't happen.	2000-11-04 08:10:56 +00:00
eivind	1afa7eea27	Give vop_mmap an untimely death. The opportunity to give it a timely death timed out in 1996.	2000-11-01 17:57:24 +00:00
phk	cbbd2b08c2	Add a missing <sys/systm.h>	2000-10-30 20:37:19 +00:00
phk	ff5cdfae2d	Move suser() and suser_xxx() prototypes and a related #define from <sys/proc.h> to <sys/systm.h>. Correctly document the #includes needed in the manpage. Add one now needed #include of <sys/systm.h>. Remove the consequent 48 unused #includes of <sys/proc.h>.	2000-10-29 16:06:56 +00:00
phk	f82e4ca62c	Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing the offending inline function (BUF_KERNPROC) on it being #included already. I'm not sure BUF_KERNPROC() is even the right thing to do or in the right place or implemented the right way (inline vs normal function). Remove consequently unneeded #includes of <sys/proc.h>	2000-10-29 14:54:55 +00:00
phk	94a5006c9a	Remove unneeded #include <sys/proc.h> lines.	2000-10-29 13:57:19 +00:00
rwatson	9c993b44d0	o Introduce new VOP_ACCESS() flag VADMIN, allowing file systems to perform "administrative" authorization checks. In most cases, the VADMIN test checks to make sure the credential effective uid is the same as the file owner. o Modify vaccess() to set VADMIN as an available right if the uid is appropriate. o Modify references to uid-based access control operations such that they now always invoke VOP_ACCESS() instead of using hard-coded policy checks. o This allows alternative UFS policies to be implemented by replacing only ufs_access() (such as mandatory system policies). o VOP_ACCESS() requires the caller to hold an exclusive vnode lock on the vnode: I believe that new invocations of VOP_ACCESS() are always called with the lock held. o Some direct checks of the uid remain, largely associated with the QUOTA and SUIDDIR code. Reviewed by: eivind Obtained from: TrustedBSD Project	2000-10-19 07:53:59 +00:00
adrian	0458054c4e	Initial commit of IFS - a inode-namespaced FFS. Here is a short description: How it works: -- Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.) I didn't see the need in duplicating all of sys/ufs/ffs to get this off the ground. File creation is done through a special file - 'newfile' . When newfile is called, the system allocates and returns an inode. Note that newfile is done in a cloning fashion: fd = open("newfile", O_CREAT\|O_RDWR, 0644); fstat(fd, &st); printf("new file is %d\n", (int)st.st_ino); Once you have created a file, you can open() and unlink() it by its returned inode number retrieved from the stat call, ie: fd = open("5", O_RDWR); The creation permissions depend entirely if you have write access to the root directory of the filesystem. To get the list of currently allocated inodes, VOP_READDIR has been added which returns a directory listing of those currently allocated. -- What this entails: * patching conf/files and conf/options to include IFS as a new compile option (and since ifs depends upon FFS, include the FFS routines) * An entry in i386/conf/NOTES indicating IFS exists and where to go for an explanation * Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS routines require (ffs_mount() and ffs_reload()) * a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS routines. IFS replaces some of the vfsops, and a handful of vnops - most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR(). Any other directory operation is marked as invalid. What this results in: * an IFS partition's create permissions are controlled by the perm/ownership of the root mount point, just like a normal directory * Each inode has perm and ownership too * IFS does NOT mean an FFS partition can be opened per inode. This is a completely seperate filesystem here * Softupdates doesn't work with IFS, and really I don't think it needs it. Besides, fsck's are FAST. (Try it :-) * Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC). Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against this particular inode, and unravelling THAT code isn't trivial. Therefore, useful inodes start at 3. Enjoy, and feedback is definitely appreciated!	2000-10-14 03:02:30 +00:00
rwatson	113a9e1b35	o Sanity check was inverted, resulting in a possible spurious panic during unmount if extended attributes were in use. Correct by removing an unneeded (and undesirable) '!'.	2000-10-09 20:04:39 +00:00
eivind	4a39f454a0	Blow away the v_specmountpoint define, replacing it with what it was defined as (rdev->si_mountpoint)	2000-10-09 17:31:39 +00:00
rwatson	8d202cd4d0	o Move initialization of ump from mp to the top of the function so that it is defined whenm used in ufs_extattr_uepm_destroy(), fixing a panic due to a NULL pointer dereference. Submitted by: Wesley Morgan <morganw@chemicals.tacorp.com>	2000-10-06 15:31:28 +00:00
rwatson	469c6c2bf3	o Add call to ufs_extattr_uepm_destroy() in ffs_unmount() so as to clean up lock on extattrs. o Get for free a comment indicating where auto-starting of extended attributes will eventually occur, as it was in my commit tree also. No implementation change here, only a comment.	2000-10-04 04:44:51 +00:00
rwatson	4e14377293	o Correct use of lockdestroy() by adding a new ufs_extattr_uepm_destroy() call, which should be the last thing down to a per-mount extattr management structure, after ufs_extattr_stop() on the file system. This currently has the effect only of destroying the per-mount lock on extended attributes, and clearing appropriate flags. o Remove inappropriate invocation in ufs_extattr_vnode_inactive().	2000-10-04 04:41:33 +00:00
jasone	4e290e67b7	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.	2000-10-04 01:29:17 +00:00
bp	6110b03d24	Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or keept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems. From this point vop_nolock() functions maintain only interlock lock. vop_stdlock() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick	2000-09-25 15:24:04 +00:00
rwatson	40aced8438	o Permit UFS Extended Attributes to be associated with special devices and FIFOs. Obtained from: TrustedBSD Project	2000-09-21 19:06:02 +00:00
rwatson	07ac219faf	o Disallow privileged processes in jail() from directly accessing system namespace extended attributes. o Document privilege/jail() interaction relating to extended attributes. Obtained from: TrustedBSD Project	2000-09-18 18:10:13 +00:00
rwatson	3546d27e15	o Allow privileged processes in jail() to override sticky bit behavior on directories. o Allow privileged processes in jail() to create inodes with the setgid bit set even if they are not a member of the group denoted by the file creation gid. This occurs due to inherited gid's from parent directories on file creation, allowing a user to create a file with a gid that is not in the creating process's credentials. Obtained from: TrustedBSD Project	2000-09-18 18:03:49 +00:00
rwatson	b324dcbd3d	o Add a comment clarifying interaction between jail(), privileged processes, and UFS file flags. Here's what the comment says, for reference: Privileged processes in jail() are permitted to modify arbitrary user flags on files, but are not permitted to modify system flags. In other words, privilege does allow a process in jail to modify user flags for objects that the process does not own, but privilege will not permit the setting of system flags on the file. Obtained from: TrustedBSD Project	2000-09-18 17:58:15 +00:00
rwatson	f193def48e	o Add missing PRISON_ROOT allowing a privileged process in a jail() to not remove the setuid/setgid bits by virtue of a change to a file with those bits set, even if the process doesn't own the file, or isn't a group member of the file's gid. Obtained from: TrustedBSD Project	2000-09-18 17:53:22 +00:00
rwatson	4ba86892be	o Substitute suser() calls for direct credential checks, which is now safe as suser() no longer sets ASU. o Note that in some cases, the PRISON_ROOT flag is used even though no process structure is passed, to indicate that if a process structure (and hence jail) was available, it would be ok. In the long run, the jail identifier should probably be moved to ucred, as the uidinfo information was. o Some uid 0 checks remain relating to the quota code, which I'll leave for another day. Reviewed by: phk, eivind Obtained from: TrustedBSD Project	2000-09-18 16:13:02 +00:00

1 2 3 4 5 ...

706 Commits