freebsd-skq

Author	SHA1	Message	Date
rwatson	f193def48e	o Add missing PRISON_ROOT allowing a privileged process in a jail() to not remove the setuid/setgid bits by virtue of a change to a file with those bits set, even if the process doesn't own the file, or isn't a group member of the file's gid. Obtained from: TrustedBSD Project	2000-09-18 17:53:22 +00:00
rwatson	4ba86892be	o Substitute suser() calls for direct credential checks, which is now safe as suser() no longer sets ASU. o Note that in some cases, the PRISON_ROOT flag is used even though no process structure is passed, to indicate that if a process structure (and hence jail) was available, it would be ok. In the long run, the jail identifier should probably be moved to ucred, as the uidinfo information was. o Some uid 0 checks remain relating to the quota code, which I'll leave for another day. Reviewed by: phk, eivind Obtained from: TrustedBSD Project	2000-09-18 16:13:02 +00:00
des	86bd96948b	Silence a warning.	2000-09-17 19:41:26 +00:00
bp	02544af7d4	Add new flag PDIRUNLOCK to the component.cn_flags which should be set by filesystem lookup() routine if it unlocks parent directory. This flag should be carefully tracked by filesystems if they want to work properly with nullfs and other stacked filesystems. VFS takes advantage of this flag to perform semantically correct usage of vrele() instead of vput() if parent directory is already unlocked. If a filesystem fails to track this flag then the previous codepath in VFS is left unchanged. Convert UFS code to set PDIRUNLOCK flag if necessary. Other filesystems will be changed after some period of testing. Reviewed in general by: mckusick, dillon, adrian Obtained from: NetBSD	2000-09-17 07:26:42 +00:00
phk	f2b4e59044	Remove a pointless casting of a gid_t to a gid_t.	2000-09-16 18:20:27 +00:00
bp	8437d5b6f4	Add VOP_*VOBJECT vops, because MFS requires explicit vop specification. Noted by: knu	2000-09-12 16:21:16 +00:00
rwatson	d12caa21f3	o Variety of extended attribute fixes - In ufs_extattr_enable(), return EEXIST instead of EOPNOTSUPP if the caller tries to configure an attribute name that is already configured - Throughout, add IO_NODELOCKED to VOP_{READ,WRITE} calls to indicate lock status of passed vnode. Apparently not a problem, but worth fixing. - For all writes, make use of IO_SYNC consistent. Really, IO_UNIT and combining of VOP_WRITE's should happen, but I don't have that tested. At least with this, it's consistent usage. (pointed out by: bde) - In ufs_extattr_get(), fixed nested locking of backing vnode (fine due to recursive lock support, but make it more consistent with other code) - In ufs_extattr_get(), clean up return code to set uio_resid more consistently with other pieces of code (worked fine, this is just a cleanup) - Fix ufs_extattr_rm(), which was broken--effectively a nop. - Minor comment and whitespace fixes. Obtained from: TrustedBSD Project	2000-09-12 05:35:47 +00:00
jhb	e467813373	Fix a 64-bitism. Use size_t instead of int for 4th argument to copyinstr. Approved by: rwatson	2000-09-11 05:43:02 +00:00
mckusick	7438b4ca6f	Cannot do MALLOC with M_WAITOK while holding ACQUIRE_LOCK Obtained from: Ethan Solomita <ethan@geocast.com>	2000-09-07 23:02:55 +00:00
jasone	769e0f974d	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh	2000-09-07 01:33:02 +00:00
rwatson	e6a536221c	Modify extended attribute protection model to authorize based on attribute namespace and DAC protection on file: - Attribute names beginning with '$' are in the system namespace - The attribute name "$" is reserved - System namespace attributes may only be read/set by suser() or by kernel (cred == NULL) - Other attribute names are in the application namespace - The attribute name "" is reserved - Application namespace attributes are protected in the manner of the target file permission o Kernel changes - Add ufs_extattr_valid_attrname() to check whether the requested attribute "set" or "enable" is appropriate (i.e., non-reserved) - Modify ufs_extattr_credcheck() to accept target file vnode, not to take inode uid - Modify ufs_extattr_credcheck() to check namespace, then enforce either kernel/suser for system namespace, or vaccess() for application namespace o EA backing file format changes - Remove permission fields from extended attribute backing file header - Bump extended attribute backing file header version to 3 o Update extattrctl.c and extattrctl.8 - Remove now deprecated -r and -w arguments to initattr, as permissions are now implicit - (unrelated) fix error reporting and unlinking during failed initattr to remove duplicate/inaccurate error messages, and to only unlink if the failure wasn't in the backing file open() Obtained from: TrustedBSD Project	2000-09-02 20:31:26 +00:00
rwatson	e54ea574fa	o Restructure vaccess() so as to check for DAC permission to modify the object before falling back on privilege. Make vaccess() accept an additional optional argument, privused, to determine whether privilege was required for vaccess() to return 0. Add commented out capability checks for reference. Rename some variables to make it more clear which modes/uids/etc are associated with the object, and which with the access mode. o Update file system use of vaccess() to pass NULL as the optional privused argument. Once additional patches are applied, suser() will no longer set ASU, so privused will permit passing of privilege information up the stack to the caller. Reviewed by: bde, green, phk, -security, others Obtained from: TrustedBSD Project	2000-08-29 14:45:49 +00:00
rwatson	251e663e8a	o Correct spelling of ufs_exttatr_find_attr -> ufs_extattr_find_attr o Add "const" qualifier to attrname argument of various calls to remove warnings Obtained from: TrustedBSD Project	2000-08-26 22:00:58 +00:00
phk	b648921acc	Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c) Remove old DEVFS support fields from dev_t. Make uid, gid & mode members of dev_t and set them in make_dev(). Use correct uid, gid & mode in make_dev in disk minilayer. Add support for registering alias names for a dev_t using the new function make_dev_alias(). These will show up as symlinks in DEVFS. Use makedev() rather than make_dev() for MFSs magic devices to prevent DEVFS from noticing this abuse. Add a field for DEVFS inode number in dev_t. Add new DEVFS in fs/devfs. Add devfs cloning to: disk minilayer (ie: ad(4), sd(4), cd(4) etc etc) md(4), tun(4), bpf(4), fd(4) If DEVFS add -d flag to /sbin/inits args to make it mount devfs. Add commented out DEVFS to GENERIC	2000-08-20 21:34:39 +00:00
phk	3d2aecdc81	Centralize the canonical vop_access user/group/other check in vaccess(). Discussed with: bde	2000-08-20 08:36:26 +00:00
tegge	6dac8645b8	Initialize *countp to 0 in stub for softdep_flushworklist(). This allows ffs_fsync() to break out of a loop that might otherwise be infinite on kernels compiled without the SOFTUPDATES option. The observed symptom was a system hang at the first unmount attempt.	2000-08-09 00:41:54 +00:00
roberto	3d4cf3c369	Fix the lockmgr panic everyone is seeing at shutdown time. vput assumes curproc is the lock holder, but it's not true in this case. Thanks a lot Luoqi ! Submitted by: luoqi Tested by: phk	2000-08-01 14:15:07 +00:00
peter	3f9fc32ece	Minor tweak - removed unused variable 'struct mount *mp';	2000-07-28 22:28:05 +00:00
peter	cfc0cd38b7	Minor change: fix warning - move a 'struct vnode *vp' declaration inside a #ifdef DIAGNOSTIC to match its corresponding usage.	2000-07-28 22:27:00 +00:00
mckusick	b86877bef0	Clean up the snapshot code so that it no longer depends on the use of the SF_IMMUTABLE flag to prevent writing. Instead put in explicit checking for the SF_SNAPSHOT flag in the appropriate places. With this change, it is now possible to rename and link to snapshot files. It is also possible to set or clear any of the owner, group, or other read bits on the file, though none of the write or execute bits can be set. There is also an explicit test to prevent the setting or clearing of the SF_SNAPSHOT flag via chflags() or fchflags(). Note also that the modify time cannot be changed as it needs to accurately reflect the time that the snapshot was taken. Submitted by: Robert Watson <rwatson@FreeBSD.org>	2000-07-26 23:07:01 +00:00
phk	9aed458325	Fix the "mfs_badop[vop_getwritemount] = 45" messages.	2000-07-26 17:53:04 +00:00
mckusick	4223e4856e	Add stub for softdep_flushworklist() so that kernels compiled without the SOFTUPDATES option will load correctly. Obtained from: John Baldwin <jhb@bsdi.com>	2000-07-25 05:28:59 +00:00
mckusick	7d9ff6c133	Eliminate periodic 'mfs_badop[vop_getwritemount] = 45' messages. Submitted by: Sheldon Hearn <sheldonh@uunet.co.za>	2000-07-25 05:11:57 +00:00
mckusick	acc66855bf	This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occasionally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic.
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.	2000-07-24 05:28:33 +00:00
rwatson	1293199940	o Marius pointed out an unusually inconvenient upper bound on extended attribute data size. o Fortunately it turned out to be an unused constant left over from an earlier implementation, and is therefore being removed so as not to confuse casual observers. Submitted by: mbendiks@eunet.no	2000-07-14 03:30:52 +00:00
bp	c956469da1	Prevent possible dereference of NULL pointer. Submitted by: Marius Bendiksen <mbendiks@eunet.no>	2000-07-13 02:17:14 +00:00
mckusick	6e81eafe20	Brain fault, forgot to update ffs_snapshot.c with the new calling convention for vn_start_write.	2000-07-12 00:27:27 +00:00
mckusick	a3d0c189ea	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timeliness are not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).	2000-07-11 22:07:57 +00:00
mckusick	b2ed023a03	Clean up warning about undeclared function by declaring softdep_fsync in mount.h instead of ffs_extern.h. The correct solution is to use an indirect function pointer so that the kernel does not have to be built with options FFS, but that will be left for another day.	2000-07-11 19:28:26 +00:00
phk	9241ff9fc6	Finish repo-copy: Move ufs/ufs/ufs_disksubr.c to kern/subr_disklabel.c. These functions are not UFS specific and are in fact used all over the place.	2000-07-10 13:48:06 +00:00
mckusick	92dfcced5b	Delete README as it is now obsolete. Relevant information is in README.softupdates.	2000-07-08 02:32:49 +00:00
mckusick	0ab089c771	Update to reflect current status.	2000-07-08 02:31:21 +00:00
mckusick	5e6b00a0a7	Get userland visible flags added for snapshots to give a few days advance preparation for them to get migrated into place so that subsequent changes in utilities will not fail to compile for lack of up-to-date header files in /usr/include.	2000-07-04 04:58:34 +00:00
mckusick	040e64cd97	Move the truncation code out of vn_open and into the open system call after the acquisition of any advisory locks. This fix corrects a case in which a process tries to open a file with a non-blocking exclusive lock. Even if it fails to get the lock it would still truncate the file even though its open failed. With this change, the truncation is done only after the lock is successfully acquired. Obtained from: BSD/OS	2000-07-04 03:34:11 +00:00
phk	2a91a9dd04	Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES, that is way cleaner than using the softupdates_stub stunt, which should be killed when convenient. Discussed with: mckusick	2000-07-03 13:26:54 +00:00
phk	0535bee2fb	Move prtactive to vfs from ufs. It is used all over the place.	2000-06-27 07:46:22 +00:00
ache	8b610ecf81	Remove obsoleted info about linking from contrib	2000-06-24 13:29:25 +00:00
mckusick	aa0e1b74b0	Update to new copyright.	2000-06-22 00:29:53 +00:00
mckusick	2be2bf630e	When running with quotas enabled on a filesystem using soft updates, the system would panic when a user's inode quota was exceeded (see PR 18959 for details). This fixes that problem. PR: 18959 Submitted by: Jason Godsey <jason@unixguy.fidalgo.net>	2000-06-18 22:14:28 +00:00
mckusick	cad9618566	Some additional performance improvements. When freeing an inode check to see if it has been committed to disk. If it has never been written, it can be freed immediately. For short lived files this change allows the same inode to be reused repeatedly. Similarly, when upgrading a fragment to a larger size, if it has never been claimed by an inode on disk, it too can be freed immediately making it available for reuse often in the next slowly growing block of the same file.	2000-06-18 22:05:57 +00:00
phk	cb90cb2b60	Revert part of my bioops change which implemented panic(8).	2000-06-16 14:32:13 +00:00
phk	74e1ff15ad	ARGH! I have too many source trees :-( Fix prototype errors in last commit.	2000-06-16 13:00:33 +00:00
phk	4ec91666fa	Virtualizes & untangles the bioops operations vector. Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@	2000-06-16 08:48:51 +00:00
phk	34fec64322	Remove a comment which should never have made it in.	2000-06-14 21:48:19 +00:00
rwatson	051a92f4cd	o Remove unneeded off_t variable to clean up compile warning Obtained from: TrustedBSD Project	2000-06-05 14:22:51 +00:00
rwatson	584644aae9	o If FFS_EXTATTR is defined, don't print out an error message on unmount if an FFS partition returns EOPNOTSUPP, as it just means extended attributes weren't enabled on that partition. Prevents spurious warning per-partition at shutdown.	2000-06-04 04:50:36 +00:00
jake	961b97d434	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others	2000-05-26 02:09:24 +00:00
jake	d93fbc9916	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd	2000-05-23 20:41:01 +00:00
rwatson	5dc4cdc7ab	s/ffs_unmonut/ffs_unmount/ in a gratuitous ufs_extattr printf. Reported by: knu	2000-05-07 17:21:08 +00:00
phk	36c3965ff9	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter	2000-05-05 09:59:14 +00:00
rwatson	51a3d7f35d	Don't allow VOP_GETEXTATTR to set uio->uio_offset != 0, as we don't provide locking over extended attribute operations, requiring that individual operations be atomic. Allowing non-zero starting offsets permits applications/etc to put themselves at risk for inconsistent behavior. As VOP_SETEXTATTR already prohibited non-zero write offsets, this makes sense. Suggested by: Andreas Gruenbacher <a.gruenbacher@bestbits.at>	2000-05-03 05:50:46 +00:00
phk	10914aa708	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude	2000-04-30 18:52:11 +00:00
phk	1931990da0	s/biowait/bufwait/g Prodded by: several.	2000-04-29 16:25:22 +00:00
phk	ce2aa22c93	Remove unneeded #include <sys/kernel.h>	2000-04-29 15:36:14 +00:00
mckusick	9ae79363c4	When files are given to users by root, the quota system failed to reset their grace timer as their ownership crossed the soft limit threshold. Thus if they had been over their limit in the past, they were suddenly penalized as if they had been over their limit ever since. The fix is to check when root gives away files, that when the receiving user crosses their soft limit, their grace timer is reset. See the PR report for a detailed method of reproducing the bug. PR: kern/17128 Submitted by: Andre Albsmeier <andre.albsmeier@mchp.siemens.de> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	2000-04-28 06:12:56 +00:00
phk	7473110e20	Convert the magic MFS device to a VCHR. Detected by: obrien	2000-04-22 05:45:38 +00:00
rwatson	5a7c0dbf09	o Introduce an extended attribute backing file header magic number o Introduce an extended attribute backing file header version number	2000-04-19 20:12:41 +00:00
phk	6be1308ad1	Remove ~25 unneeded #include <sys/conf.h> Remove ~60 unneeded #include <sys/malloc.h>	2000-04-19 14:58:28 +00:00
rwatson	5b36470f20	o Cause attribute data writes to use IO_SYNC since this improves the chances of consistency with other file/directory meta-data in a write. In the current set of extended attribute applications, this does not hurt much. This should be discussed again later when it comes time to optimize performance of attributes. o Include an inode generation number in the per-attribute header information. This allows consistency verification to catch when a crash occurs, or an inode is recycled while attributes are not properly configured. For now, an irritating error message is displayed when an inconsistency occurs. At some point, may introduce an ``extattrctl check ...'' which catches these before attributes are enabled. Not today. If you get this message, it means you somehow managed to get your attribute backing file out of synch with the file system. When this occurs, attribute not found is returned (== undefined). Writes will overwrite the value there correcting the problem. Might want to think about introducing a new errno or two to handle this kind of situation. Discussed with: kris	2000-04-19 07:38:20 +00:00
phk	99e3753c3a	Retire bufqdisksort(), all drivers use bioqdisksort now.	2000-04-18 13:25:19 +00:00
jlemon	eb30412fdc	Remove unneeded cast.	2000-04-17 03:37:13 +00:00
jlemon	42f19b9069	Replace the POLLEXTEND extensions with the kqueue() mechanism.	2000-04-16 18:55:20 +00:00
rwatson	95acaf111c	Fix two bugs in extended attribute support for UFS/FFS: o Put back in {} removed during over-zealous cleanup of gratuitous debugging output during preparation for the commit. Due to the missing {}, writes on extended attributes always silently failed. Doh. o Don't unlock the target vnode if it's the backing vnode, as we don't lock the target vnode if it's the backing vnode.	2000-04-16 01:35:30 +00:00
phk	aaaef0b54e	Complete the bio/buf divorce for all code below devfs::strategy Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS	2000-04-15 05:54:02 +00:00
rwatson	a0dd5ab0fd	Introduce extended attribute support for FFS, allowing arbitrary (name, value) pairs to be associated with inodes. This support is used for ACLs, MAC labels, and Capabilities in the TrustedBSD security extensions, which are currently under development. In this implementation, attributes are backed to data vnodes in the style of the quota support in FFS. Support for FFS extended attributes may be enabled using the FFS_EXTATTR kernel option (disabled by default). Userland utilities and man pages will be committed in the next batch. VFS interfaces and man pages have been in the repo since 4.0-RELEASE and are unchanged. o ufs/ufs/extattr.h: UFS-specific extattr defines o ufs/ufs/ufs_extattr.c: bulk of support routines o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes o contrib/softupdates/ffs_softdep.c: extattr.h includes o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h (This should not be the case, and will be fixed in a future commit) Currently attributes are not supported in MFS. This will be fixed. Reviewed by: adrian, bp, freebsd-fs, other unthanked souls Obtained from: TrustedBSD Project	2000-04-15 03:34:27 +00:00
phk	f37bdf3ad7	Clone bio versions of certain bits of infrastructure: devstat_end_transaction_bio() bioq_* versions of bufq_* incl bioqdisksort() the corresponding "buf" versions will disappear when no longer used. Move b_offset, b_data and b_bcount to struct bio. Add BIO_FORMAT as a hack for fd.c etc. We are now largely ready to start converting drivers to use struct bio instead of struct buf.	2000-04-02 19:08:05 +00:00
phk	8ee11d587f	Move B_ERROR flag to b_ioflags and call it BIO_ERROR. (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.	2000-04-02 15:24:56 +00:00
dillon	057e33d02c	Change the write-behind code to take more care when starting async I/O's. The sequential read heuristic has been extended to cover writes as well. We continue to call cluster_write() normally, thus blocks in the file will still be reallocated for large (but still random) I/O's, but I/O will only be initiated for truly sequential writes. This solves a number of annoying situations, especially with DBM (hash method) writes, and also has the side effect of fixing a number of (stupid) benchmarks. Reviewed-by: mckusick	2000-04-02 00:55:28 +00:00
phk	9f5fd263aa	diff, patch and cvs didn't like these three last time around, try again.	2000-03-20 12:34:21 +00:00
phk	5df766a0f8	Rename the existing BUF_STRATEGY() to DEV_STRATEGY() substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.	2000-03-20 11:29:10 +00:00
phk	a246e10f55	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch was machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.	2000-03-20 10:44:49 +00:00
mckusick	a02c1c5b8a	Use 64-bit math to calculate if we have hit our freespace limit. Necessary for coherent results on filesystems bigger than 0.5Tb.	2000-03-17 03:44:47 +00:00
mckusick	5ce14e7844	Bug fixes for currently harmless bugs that could rise to bite the unwary if the code were called in slightly different ways. 1) In ufs_bmaparray() the code for calculating 'runb' will stop one block short of the first entry in an indirect block. i.e. if an indirect block contains N block numbers b[0]..b[N-1] then the code will never check if b[0] and b[1] are sequential. For reference, compare with the equivalent code that deals with direct blocks. 2) In ufs_lookup() there is an off-by-one error in the test that checks if dp->i_diroff is outside the range of the current directory size. This is completely harmless, since the following while-loop condition 'dp->i_offset < endsearch' is never met, so the code immediately does a second pass starting at dp->i_offset = 0. 3) Again in ufs_lookup(), the condition in a sanity check is wrong for directories that are longer than one block. This bug means that the sanity check is only effective for small directories. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2000-03-15 07:18:15 +00:00
mckusick	acdd0d6f53	Use 64-bit math to decide if optimization needs to be changed. Necessary for coherent results on filesystems bigger than 0.5Tb. Submitted by: Paul Saab <ps@yahoo-inc.com>	2000-03-15 07:08:36 +00:00
dillon	464af2ea27	In the 'found' case for ufs_lookup() the underlying bp's data was being accessed after the bp had been released. A simple move of the brelse() solves the problem. Approved by: jkh Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2000-03-09 18:54:59 +00:00
dillon	414d15acb8	Fix a 'freeing free block' panic in UFS. The problem occurs when the filesystem fills up. If the first indirect block exists and FFS is able to allocate deeper indirect blocks, but is not able to allocate the data block, FFS improperly unwinds the indirect blocks and leaves a block pointer hanging to a freed block. This will cause a panic later when the file is removed. The solution is to properly account for the first block-pointer-to-an-indirect-block we had to create in a balloc operation and then unwind it if a failure occurs. Detective work by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: mckusick, Ian Dowse <iedowse@maths.tcd.ie> Approved by: jkh	2000-02-24 20:43:20 +00:00
rwatson	baa4395a04	After much consulting with bde, concluded that this fix was the best fix to the current jail/chflags interactions. This fix conditionalizes ``root behavior'' in the chflags() case on not being in jail, so attempts to perform a chflags in a jail are limited to what a normal user could do. For example, this does allow setting of user flags as appropriate, but prohibits changing of system flags. Reviewed by: bde	2000-02-22 03:56:58 +00:00
rwatson	fd37898b9f	Disable chflags() from within jail() so that root within jail can't make a mess in securelevel environments. Results in one warning during /etc/rc as it attempts to remove file flags, but this is harmless. Approved by: High Lord Hubbard	2000-02-20 01:10:36 +00:00
mckusick	541e13d43c	When writing out bitmap buffers, need to skip over ones that already have a write in progress. Otherwise one can get in an infinite loop trying to get them all flushed. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	2000-01-30 20:32:59 +00:00
mckusick	e8eebed1f3	During fastpath processing for removal of a short-lived inode, the set of restrictions for cancelling an inode dependency (inodedep) is somewhat stronger than originally coded. Since this check appears in two places, we codify it into the function check_inode_unwritten which we then call from the two sites, one freeing blocks and the other freeing directory entries. Submitted by: Steinar Haug via Matthew Dillon	2000-01-18 01:33:05 +00:00
mckusick	37dbb3e53f	Need to reorganize the flushing of directory entry (pagedep) dependencies so that they never try to lock an inode corresponding to ".." as this can lead to deadlock. We observe that any inode with an updated link count is always pushed into its buffer at the time of the link count change, so we do not need to do a VOP_UPDATE, but merely find its buffer and write it. The only time we need to get the inode itself is from the result of a mkdir whose name will never be ".." and hence locking such an inode will never request a lock above us in the filesystem tree. Thanks to Brian Fundakowski Feldman for providing the test program that tickled soft updates into hanging in "inode" sleep. Submitted by: Brian Fundakowski Feldman <green@FreeBSD.org>	2000-01-18 01:30:03 +00:00
mckusick	c6b8373708	Better bounding on softdep_flushfiles; other minor tweaks to checks.	2000-01-17 06:35:11 +00:00
mckusick	e7e567fb65	Must track multiple uncommitted renames until one ultimately gets committed to disk or is removed.	2000-01-17 06:28:18 +00:00
dillon	53da3b72da	Non-operational change, fix compiler warning. Reviewed by: mckusick	2000-01-14 04:39:28 +00:00
mckusick	7eac0e762a	Confirming Peter's fix (locking 101: release the lock before you go to sleep). Locking 101, part 2: do not look at buffer contents after you have been asleep. There is no telling what wondrous changes may have occurred.	2000-01-13 20:03:22 +00:00
peter	f45fdd3f47	Free the global softupdates lock prior to tsleep() in getdirtybuf(). This seems to be responsible for a bunch of panics where the process sleeps and something else finds softupdates "locked" when it shouldn't be. This commit is unreviewed, but has been a big help here. Previously my boxes would panic pretty much on the first fsync() that wrote something to disk.	2000-01-13 18:48:12 +00:00
mckusick	e5a3075fbb	Because cylinder group blocks are now written in background, it is no longer sufficient to get a lock on a buffer to know that its write has been completed. We have to first get the lock on the buffer, then check to see if it is doing a background write. If it is doing background write, we have to wait for the background write to finish, then check to see if that fulfilled our dependency, and if not to start another write. Luckily the explanation is longer than the fix.	2000-01-13 07:20:01 +00:00
mckusick	28d13b9ecf	A panic occurs during an fsync when a dirty block associated with a vnode has not been written (which would clear certain of its dependencies). The problem arises because fsync with MNT_NOWAIT no longer pushes all the dirty blocks associated with a vnode. It skips those that require rollbacks, since they will just get instantly dirty again. Such skipped blocks are marked so that they will not be skipped a second time (otherwise circular dependencies would never clear). So, we fsync twice to ensure that everything will be written at least once.	2000-01-13 07:17:39 +00:00
mckusick	2ba5b46007	The only known cause of this panic is running out of disk space. The problem occurs when an indirect block and a data block are being allocated at the same time. For example when the 13th block of the file is written, the filesystem needs to allocate the first indirect block and a data block. If the indirect block allocation succeeds, but the data block allocation fails, the error code deallocates the indirect block as it has nothing at which to point. Unfortunately, it does not deallocate the indirect block's associated dependencies which then fail when they find the block unexpectedly gone (ptr == 0 instead of its expected value). The fix is to fsync the file before doing the block rollback, as the fsync will flush out all of the dependencies. Once the rollback is done the file must be fsync'ed again so that the soft updates code does not find unexpected changes. This approach is much slower than writing the code to back out the extraneous dependencies, but running out of disk space is not expected to be a common occurrence, so just getting it right is the main criterion. PR: kern/15063 Submitted by: Assar Westerlund <assar@stacken.kth.se>	2000-01-11 08:27:00 +00:00
mckusick	57887aa35c	We cannot proceed to free the blocks of the file until the dependencies have been cleaned up by deallocate_dependencies(). Once that is done, it is safe to post the request to free the blocks. A similar change is also needed for the freefile case.	2000-01-11 06:52:35 +00:00
phk	ae0c1ec8f7	Give vn_isdisk() a second argument where it can return a suitable errno. Suggested by: bde	2000-01-10 12:04:27 +00:00
mckusick	b09e759229	Missing FREE_LOCK call before handle_workitem_freeblocks. Submitted by: "Kenneth D. Merry" <ken@kdm.org>	2000-01-10 08:39:03 +00:00
mckusick	d4409da210	Several performance improvements for soft updates have been added: 1) Fastpath deletions. When a file is being deleted, check to see if it was so recently created that its inode has not yet been written to disk. If so, the delete can proceed to immediately free the inode. 2) Background writes: No file or block allocations can be done while the bitmap is being written to disk. To avoid these stalls, the bitmap is copied to another buffer which is written thus leaving the original available for further allocations. 3) Link count tracking. Constantly track the difference in i_effnlink and i_nlink so that inodes that have had no change other than i_effnlink need not be written. 4) Identify buffers with rollback dependencies so that the buffer flushing daemon can choose to skip over them.	2000-01-10 00:24:24 +00:00
mckusick	db94728905	Keep tighter control of removal dependencies by limiting the number of dirrem structures rather than the collaterally created freeblks and freefile structures. Limit the rate of buffer dirtying by the syncer process during periods of intense file removal.	2000-01-09 23:35:38 +00:00
mckusick	51339d9a78	Reorganize softdep_fsync so that it only does the inode-is-flushed check before the inode is unlocked while grabbing its parent directory. Once it is unlocked, other operations may slip in that could make the inode-is-flushed check fail. Allowing other writes to the inode before returning from fsync does not break the semantics of fsync since we have flushed everything that was dirty at the time of the fsync call.	2000-01-09 23:14:57 +00:00
mckusick	6f8af35d26	Get rid of unreferenced function.	2000-01-09 22:42:42 +00:00
mckusick	722b90c9d5	Make static non-exported functions from soft updates.	2000-01-09 22:40:09 +00:00
peter	d53e4c1d80	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSDs who made this change quite some time ago. More commits to come.	1999-12-29 05:07:58 +00:00
bde	af5509c2c7	Update the unclean flag for mount -u. I forgot to handle this case when I made the absence of the clean flag sticky in rev.1.88. This was a problem mainly for "mount /". There is no way to mount "/" for writing without using mount -u (normally implicitly), so after "mount -f /" of an unclean filesystem, the absence of the clean flag was sticky forever.	1999-12-23 15:42:14 +00:00
eivind	8befc1a2b8	Change incorrect NULLs to 0s	1999-12-21 11:14:12 +00:00
rwatson	4b6baecfc7	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind	1999-12-19 06:08:07 +00:00
mckusick	f9019037ba	The function request_cleanup() had a tsleep() with PCATCH. It is quite dangerous, since the process may hold locks at that point, and if it is stopped in that tsleep the machine may hang. Because the sleep is so short, the PCATCH is not required here, so it has been removed. For the future, the FreeBSD team needs to decide whether it is still reasonable to stop a process in tsleep, as that may affect any other code that uses PCATCH while holding kernel locks. Submitted by: Dmitrij Tejblum <tejblum@arc.hq.cti.ru> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-12-16 22:02:09 +00:00
eivind	87724eb673	Introduce NDFREE (and remove VOP_ABORTOP)	1999-12-15 23:02:35 +00:00
eivind	287836faea	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() get a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter	1999-12-11 16:13:02 +00:00
billf	92aa1686d2	Remove the 'alpha, use at your own risk' death-statement. Reviewed by: mckusick (verbally at FreeBSDcon)	1999-12-03 00:40:31 +00:00
billf	3d7f6e72a3	Fix typo, add $FreeBSD$	1999-12-03 00:34:26 +00:00
mckusick	579c93e793	Preferentially allocate the first indirect block in the same cylinder group as the inode. This makes a 15% difference in read speed for files in the 96K to 500K size range.	1999-12-01 19:33:12 +00:00
phk	ccda399c72	Retire MFS_ROOT and MFS_ROOT_SIZE options from the MFS implementation. Add MD_ROOT and MD_ROOT_SIZE options to the md driver. Make the md driver handle MFS_ROOT and MFS_ROOT_SIZE options for compatibility. Add md driver to GENERIC, PCCARD and LINT. This is a cleanup which removes the need for some of the worse hacks in MFS: We really want to have a rootvnode but MFS on a preloaded image doesn't really have one. md is a true device, so it is less trouble. This has been tested with make release, and if people remember to add the "md" pseudo-device to their kernels, PicoBSD should be just fine as well. If people have no other use for MFS, it can be removed from the kernel.	1999-11-26 20:08:44 +00:00
phk	1848d96439	Convert various pieces of code to use vn_isdisk() rather than checking for vp->v_type == VBLK. In ccd: we don't need to call VOP_GETATTR to find the type of a vnode. Reviewed by: sos	1999-11-22 10:33:55 +00:00
eivind	f65e4dc8cd	We do not have ffs_checkexp, so remove the prototype	1999-11-20 16:44:44 +00:00
phk	1adcecffd9	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967	1999-11-20 10:00:46 +00:00
peter	bbcf774aff	Fix a warning (unused static declaration without MFS_ROOT)	1999-11-18 08:49:40 +00:00
eivind	4ce73d7096	Remove WILLRELE from VOP_SYMLINK Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.	1999-11-13 20:58:17 +00:00
eivind	21fff7b1c2	Remove WILLRELE from VOP_RENAME	1999-11-12 03:34:28 +00:00
phk	8c9bc6b146	Next step in the device cleanup process. Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code. Unify spec_open() for bdev and cdev cases. Remove the disabled bdev specific read/write code.	1999-11-09 14:15:33 +00:00
bde	d60ac1963e	Quick fix for breakage of ext2fs link counts as reported by stat(2) by the soft updates changes: only report the link count to be i_effnlink in ufs_getattr() for file systems that maintain i_effnlink. Tested by: Mike Dracopoulos <mdraco@math.uoa.gr>	1999-11-03 12:05:39 +00:00
msmith	219fe6842a	Make MFS work with the new root filesystem search process. In order to achieve this, root filesystem mount is moved from SI_ORDER_FIRST to SI_ORDER_SECOND in the SI_SUB_MOUNT_ROOT sysinit group. Now, modules which wish to usurp the default root mount can use SI_ORDER_FIRST. A compiled-in or preloaded MFS filesystem will become the root filesystem unless the vfs.root.mountfrom environment variable refers to a valid bootable device. This will normally only be the case when the kernel and MFS image have been loaded from a disk which has a valid /etc/fstab file. In this case, the variable should be manually overridden in the loader, or the kernel booted with -a. In either case "mfs:" should be supplied as the new value. Also fix a typo in one DFLTROOT case that would not have compiled.	1999-11-03 11:02:47 +00:00
msmith	c36e70686e	Newline-terminate the complaint message about not being able to find the root vnode pointer.	1999-11-01 23:57:28 +00:00
dillon	6554c27772	Add sysctl debug.dircheck to allow directory sanity checking to be turned on with a sysctl. Fix two bugs in ufs_lookup that can cause deadlocks due to out-of-order locking. This fix was tested for a few days prior to commit.	1999-10-30 00:51:14 +00:00
phk	8e3c3eafed	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.	1999-10-29 18:09:36 +00:00
phk	1fc218b676	Remove the D_NOCLUSTER[RW] options which were added because vn had problems. Now that Matt has fixed vn, this can go. The vn driver should have used d_maxio (now si_iosize_max) anyway.	1999-09-30 07:11:30 +00:00
phk	073b941095	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde	1999-09-29 20:05:33 +00:00
marcel	d5e8d714b9	sigset_t change (part 2 of 5) ----------------------------- The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements. The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t. struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done through function sigprop(). The latter because the table doesn't hold properties for all signals, but only for the first NSIG signals. signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace pollution. Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types. NOTE: kdump (and port linux_kdump) must be recompiled. Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.	1999-09-29 15:03:48 +00:00
phk	e9e0512210	Remove five now unused fields from struct cdevsw. They should never have been there in the first place. A GENERIC kernel shrinks almost 1k. Add a slightly different safetybelt under nostop for tty drivers. Add some missing FreeBSD tags	1999-09-25 18:24:47 +00:00
dillon	3bddba7951	More removals of vnode->v_lastr, replaced by preexisting seqcount heuristic to detect sequential operation. VM-related forced clustering code removed from ufs in preparation for a commit to vm/vm_fault.c that does it more generally. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>	1999-09-20 23:27:58 +00:00
phk	915efc6d1e	Fix a harmless bug I introduced, simplify a bit more while here.	1999-09-20 21:14:43 +00:00
phk	3ea30afc2d	Step one of replacing devsw->d_maxio with si_bsize_max. Rename dev->si_bsize_max to si_iosize_max and set it in spec_open if the device didn't. Set vp->v_maxio from dev->si_bsize_max in spec_open rather than in ufs_bmap.c	1999-09-20 19:57:28 +00:00
bde	d01e107d02	Removed diskerr()'s unused d_name arg and updated callers. This fixes warnings caused by the arg having the wrong type (not const enough). The arg was also wrong (a full name instead of a short one) for calls from subr_diskmbr.c and pc98/diskslice_machdep.c.	1999-09-13 12:59:41 +00:00
alfred	b9136a6115	Separate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open|stat|statfs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD	1999-09-11 00:46:08 +00:00
julian	5c78e7345a	Changes to centralise the default blocksize behaviour. More likely to follow. Submitted by: phk@freebsd.org	1999-09-09 19:08:44 +00:00
julian	fd9cb11e53	Revert a bunch of controversial changes by PHK. After a quick think and discussion among various people some form of some of these changes will probably be recommitted. The reversion was requested by dg while discussions proceed. PHK has indicated that he can live with this, and it has been agreed that some form of some of these changes may return shortly after further discussion.	1999-09-03 05:16:59 +00:00
phk	216936ca6d	Make bdev userland access work like cdev userland access unless the highly non-recommended option ALLOW_BDEV_ACCESS is used. (bdev access is evil because you don't get write errors reported.) Kill si_bsize_best before it kills Matt :-) Use the specfs routines rather than having cloned copies in devfs.	1999-08-30 07:56:23 +00:00
phk	d311a0563b	remove unused variables.	1999-08-28 19:21:03 +00:00
phk	9c72381e09	We don't need to pass the diskname argument all over the diskslice/label code, we can find the name from any convenient dev_t	1999-08-28 14:33:44 +00:00
peter	d41244b69e	$Id$ -> $FreeBSD$	1999-08-28 02:16:32 +00:00
peter	3b842d34e8	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
phk	591c94d4c6	Simplify the handling of VCHR and VBLK vnodes using the new dev_t: Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a preexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED.	1999-08-26 14:53:31 +00:00
phk	ea55d63475	Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.	1999-08-25 12:24:39 +00:00
phk	957b68b507	Initialize the si_bsize fields for the MFS bogodevices. (This broke MFS rootfs and thereby installation)	1999-08-24 18:35:33 +00:00
sheldonh	190863bb6d	Fix bug introduced in rev 1.28, which causes kernel build to break for the case where DEBUG is defined but not DIAGNOSTIC. ffs_checkblk is declared conditionally on DIAGNOSTIC, not DEBUG. PR: 13314 Reviewed by: bde	1999-08-24 08:39:41 +00:00
bde	2a5ff1f726	Use devtoname() to print dev_t's instead of casting them to long or u_long for misprinting in %lx format.	1999-08-23 20:35:21 +00:00
jdp	9f71d680aa	Support full-precision file timestamps. Until now, only the seconds have been maintained, and that is still the default. A new sysctl variable "vfs.timestamp_precision" can be used to enable higher levels of precision: 0 = seconds only; nanoseconds zeroed (default). 1 = seconds and nanoseconds, accurate within 1/HZ. 2 = seconds and nanoseconds, truncated to microseconds. >=3 = seconds and nanoseconds, maximum precision. Level 1 uses getnanotime(), which is fast but can be wrong by up to 1/HZ. Level 2 uses microtime(). It might be desirable for consistency with utimes() and friends, which take timeval structures rather than timespecs. Level 3 uses nanotime() for the highest precision. I benchmarked levels 0, 1, and 3 by copying a 550 MB tree with "cpio -pdu". There was almost negligible difference in the system times -- much less than 1%, and less than the variation among multiple runs at the same level. Bruce Evans dreamed up a torture test involving 1-byte reads with intervening fstat() calls, but the cpio test seems more realistic to me. This feature is currently implemented only for the UFS (FFS and MFS) filesystems. But I think it should be easy to support it in the others as well. An earlier version of this was reviewed by Bruce. He's not to blame for any breakage I've introduced since then. Reviewed by: bde (an earlier version of the code)	1999-08-22 00:15:16 +00:00
alc	075745f2e2	Add the (inline) function vm_page_undirty for clearing the dirty bitmask of a vm_page. Use it. Submitted by: dillon	1999-08-17 04:02:34 +00:00
phk	5f45261e99	Spring cleaning around strategy and disklabels/slices: Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout. please see comment in sys/conf.h about the flag argument. Remove strategy argument from all the diskslice/label/bad144 implementations, it should be found from the dev_t. Remove bogus and unused strategy1 routines. Remove open/close arguments from dssize(). Pick them up from dev_t. Remove unused and unfinished setgeom support from diskslice/label/bad144 code.	1999-08-14 11:40:51 +00:00
phk	a45a44a2bd	Move the special-casing of stat(2)->st_blksize for device files from UFS to the generic level. For chr/blk devices we don't care about the blocksize of the filesystem, we want what the device asked for.	1999-08-13 10:56:07 +00:00
phk	7b7ae40370	The bdevsw() and cdevsw() are now identical, so kill the former.	1999-08-13 10:29:38 +00:00
phk	683c2698ff	s/v_specinfo/v_rdev/	1999-08-13 10:10:12 +00:00
phk	e938d317d5	Decommission miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.	1999-08-08 18:43:05 +00:00
alc	33da09bf48	Move the memory access behavior information provided by madvise from the vm_object to the vm_map. Submitted by: dillon	1999-08-01 06:05:09 +00:00
bde	13dd3005e3	Fixed access timestamp bugs: Set IN_ACCESS for successful reads of 0 bytes (except for requests to read 0 bytes). This was broken in rev.1.42. PR: misc/10148 Don't set IN_ACCESS for requests to read 0 bytes. Don't set IN_ACCESS for unsuccessful reads.	1999-07-25 02:07:16 +00:00
phk	cacc73aa18	Now a dev_t is a pointer to struct specinfo which is shared by all specdev vnodes referencing this device. Details: cdevsw->d_parms has been removed, the specinfo is available now (== dev_t) and the driver should modify it directly when applicable, and the only driver doing so, does so: vn.c. I am not sure the logic in checking for "<" was right before, and it looks even less so now. An initial pool of 50 struct specinfo are depleted during early boot, after that malloc had better work. It is likely that fewer than 50 would do. Hashing is done from udev_t to dev_t with a prime number remainder hash, experiments show no better hash available for decent cost (MD5 is only marginally better) The prime number used should not be close to a power of two, we use 83 for now. Add new checkalias2() to get around the loss of info from dev2udev() in bdevvp(); The aliased vnodes are hung on a list straight off the dev_t, and speclisth[SPECSZ] is unused. The sharing of struct specinfo means that the v_specnext moves into the vnode which grows by 4 bytes. Don't use a VBLK dev_t which doesn't make sense in MFS, now we hang a dummy cdevsw on B/Cmaj 253 so that things look sane. Storage overhead from all of this is O(50k). Bump __FreeBSD_version to 400009 The next step will add the stuff needed so device-drivers can start to hang things from struct specinfo	1999-07-20 09:47:55 +00:00
phk	6c373ff516	I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g	1999-07-17 18:43:50 +00:00
mckusick	f091c51c34	Create the macro DOINGASYNC to check whether the MNT_ASYNC flag has been set for a mount point. Insert missing checks to ensure that all write operations are done asynchronously when the MNT_ASYNC option has been requested. Submitted by: Craig A Soules <soules+@andrew.cmu.edu> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-13 18:20:13 +00:00
phk	f94fffca16	Use the fsid from the superblock, unless it looks bogus or has already been taken by some other filesystem.	1999-07-11 19:16:50 +00:00
mckusick	52ea4270f3	These changes appear to give us benefits with both small (32MB) and large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bringing the machine to a grinding halt. * buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems. * minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propagate to deeper inlines and should produce better code. * removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely. * removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal. * buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system. * removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers. * slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers. * addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation. * Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ). * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ). * removal of extra arguments passed to getnewbuf() that are not necessary. * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c * vn_write() now calls bwillwrite() PRIOR to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently. * removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon. Submitted by: Matthew Dillon <dillon@backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-08 06:06:00 +00:00
roberto	30e244e4e1	Add $Id$ Approved by: kirk	1999-07-07 07:51:04 +00:00
jdp	32460a339d	Update pathnames for new location of soft-updates sources.	1999-07-03 21:34:05 +00:00
mckusick	582bbe6a3b	No longer need to set B_ASYNC flag since BUF_KERNPROC now unconditionally sets the identity of the buffer.	1999-06-29 15:57:40 +00:00
peter	80b4d1b002	Keep the inlines for <sys/buf.h> happy..	1999-06-27 13:26:23 +00:00
mckusick	5b58f2f951	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.	1999-06-26 02:47:16 +00:00
mckusick	3050d8dd0b	On our final pass through ffs_fsync, do all I/O synchronously so that we can find out if our flush is failing because of write errors. This change avoids a "flush failed" panic during unrecoverable disk errors.	1999-06-18 05:49:46 +00:00
mckusick	88e39a63db	Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.	1999-06-16 23:27:55 +00:00
mckusick	02e5fe8035	Get rid of the global variable rushjob and replace it with a function in kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.	1999-06-15 23:37:29 +00:00
phk	6a5dc97620	Simplify cdevsw registration. The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing. cdevsw_add() will print a message if the d_maj field looks bogus. Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL. Move bdevsw() and devsw() functions to kern/kern_conf.c Bump __FreeBSD_version to 400006 This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions if_xe.c bogusly accessed cdevsw[], author/maintainer please fix. I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.	1999-05-31 11:29:30 +00:00
jb	0e5212b792	- Back out Luoqi's cdevsw stuff. It panics on my system and is not required. - Fix an error message. - Do the MFS_ROOT setting of mountrootfsname in mfs_init() instead of cpu_rootconf(). - Set rootdev in mfs_init instead of later in mfs_mount() iff MFS_ROOT.	1999-05-24 00:27:12 +00:00
julian	fa6608381d	Cosmetic changes to make it compile without errors in gcc -Wall	1999-05-22 04:43:04 +00:00
luoqi	6f6fbfa99e	Legally acquire a major number for mfs.	1999-05-14 20:40:23 +00:00
mckusick	365073a062	Add a hook to ffs_fsync to allow soft updates to get first chance at doing a sync on the block device for the filesystem. That allows it to push the bitmap blocks before the inode blocks which greatly reduces the number of inode rollbacks that need to be done.	1999-05-14 01:26:46 +00:00
peter	5c0287c834	Try and fix a dev_t/major/minor etc nit.	1999-05-12 22:32:07 +00:00
phk	7e26ca1d1a	Divorce "dev_t" from the "major|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to an actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few housekeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I believe this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).	1999-05-11 19:55:07 +00:00
bde	4558abbfc5	Fixed disordering in previous 2 commits.	1999-05-11 03:11:09 +00:00
peter	8b9aff36cb	Move the mfs_getimage() prototype to mfs_extern.h instead of duplicating it everywhere.	1999-05-10 17:12:45 +00:00
mckusick	81c1d3f4c6	Put back changes that might be causing trouble on Alpha.	1999-05-09 19:39:54 +00:00
phk	500e41bd71	I got tired of seeing all the cdevsw[major(foo)] all over the place. Made a new (inline) function devsw(dev_t dev) and substituted it. Changed the BDEV variant to this format as well: bdevsw(dev_t dev) DEVFS will eventually benefit from this change too.	1999-05-08 06:40:31 +00:00
phk	693dd58bb3	Continue where Julian left off in July 1998: Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function. Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!) Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!) (Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)	1999-05-07 10:11:40 +00:00
mckusick	ea5a4be9ab	Whitespace cleanup.	1999-05-07 05:21:16 +00:00
mckusick	b22fad8e64	Get rid of random debugging cruft; sync up with latest version.	1999-05-07 05:11:31 +00:00
mckusick	1a318ee963	Severe slowdowns have been reported when creating or removing many files at once on a filesystem running soft updates. The root of the problem is that soft updates limits the amount of memory that may be allocated to dependency structures so as to avoid hogging kernel memory. The original algorithm just waited for the disk I/O to catch up and reduce the number of dependencies. This new code takes a much more aggressive approach. Basically there are two resources that routinely hit the limit. Inode dependencies during periods with a high file creation rate and file and block removal dependencies during periods with a high file removal rate. I have attacked these problems from two fronts. When the inode dependency limits are reached, I pick a random inode dependency, UFS_UPDATE it together with all the other dirty inodes contained within its disk block and then write that disk block. This trick usually clears 5-50 inode dependencies in a single disk I/O. For block and file removal dependencies, I pick a random directory page that has at least one remove pending and VOP_FSYNC its directory. That releases all its removal dependencies to the work queue. To further hasten things along, I also immediately start the work queue process rather than waiting for its next one second scheduled run.	1999-05-07 02:26:47 +00:00
peter	73556bfee1	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.	1999-05-06 18:13:11 +00:00
alc	5cb08a2652	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most notably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistency issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vice versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-05-02 23:57:16 +00:00
phk	ca21a25f17	This implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no provisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/	1999-04-28 11:38:52 +00:00
msmith	81f00f311a	Simplify the tunefs example, since tunefs uses getfsfile(). Lots of people complain about working out what device their filesystems are mounted on.	1999-04-27 21:11:19 +00:00
phk	16e3fbd2c1	Suser() simplification: 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>. 3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.	1999-04-27 11:18:52 +00:00
dt	306f74ff3a	Change type of a variable from u_int to size_t, so that pointer to it may be used as a last argument to copyinstr().	1999-04-21 09:41:07 +00:00
eivind	6277625e29	Correct typo in panic message	1999-04-11 02:28:32 +00:00
peter	8d6ca2a948	Hold the mfs process's upages in-core with PHOLD rather than P_NOSWAP.	1999-04-06 03:08:43 +00:00
julian	0ed09d2ad5	Catch a case spotted by Tor where files mmapped could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>	1999-04-05 19:38:30 +00:00
peter	af77d6f191	There's not much point in the EXPORTMFS #ifdef. I've had this sitting in my tree for 12+ months, and I just noticed that NetBSD have (I think, I've just seen the commit, not the change) just zapped it there. It wasn't in the options files or LINT either.	1999-04-05 06:39:10 +00:00
julian	4726cfcda9	Stop the mfs from trying to swap out crucial bits of the mfs as this can lead to deadlock. Submitted by: Matt Dillon <dillon@freebsd.org>	1999-03-12 00:44:03 +00:00
bde	7435c5f5ec	Don't depend on <ufs/ufs/quota.h> or another (old) prerequisite including <sys/queue.h>. This fixes my recent breakage of biosboot by unpolluting <ufs/ufs/quota.h> in the !KERNEL case.	1999-03-06 05:21:09 +00:00
bde	6d203414ca	Moved kernel declarations inside the KERNEL ifdef, and removed include of <sys/queue.h> in the !KERNEL case. The prerequisites for <ufs/ufs/quota.h> were broken in Lite2 by converting some of the kernel declarations to use queue macros without including <sys/queue.h>. <sys/queue.h> was included in applications in /usr/src instead. We polluted this file instead of merging the changes in the applications. Include <sys/queue.h> in the KERNEL case, and forward-declare all structs that are used in prototypes, so that this file is almost self-sufficient even in the kernel. Obtained from: mostly from NetBSD	1999-03-05 11:25:31 +00:00
bde	801213cd08	Changed the type of quotactl()'s 4th arg from `char *' to `void *' so that non-sloppy applications can call it without using disgusting casts to avoid warnings. The 4th arg is sort of varargs -- it must sometimes represent a filename, sometimes a struct pointer, and is sometimes unused. The arg type is still caddr_t in the kernel. Obtained from: mostly from NetBSD	1999-03-05 09:28:33 +00:00
mckusick	4806ae523d	Reorganize locking to avoid holding the lock during calls to bdwrite and brelse (which may sleep in some systems). Obtained from: Matthew Dillon <dillon@apollo.backplane.com>	1999-03-02 06:38:07 +00:00
imp	f86ebee5a7	Merge patch to ufs_vnops.c's ufs_rename to the copy of ufs_rename that lives in ext2_vnops.c for ext2fs. Also remove cast from comparison. Bruce pointed out that it was bogus since we'd force a signed comparison when we really wanted an unsigned comparison.	1999-03-02 05:31:47 +00:00
mckusick	421edf71f1	When fsync'ing a file on a filesystem using soft updates, we first try to write all the dirty blocks. If some of those blocks have dependencies, they will be remarked dirty when the I/O completes. On systems with really fast I/O systems, it is possible to get in an infinite loop trying to flush the buffers, because the I/O finishes before we can get all the dirty buffers off the v_dirtyblkhd list and into the I/O queue. (The previous algorithm looped over the v_dirtyblkhd list writing out buffers until the list emptied.) So, now we mark each buffer that we try to write so that we can distinguish the ones that are being remarked dirty from those that we have not yet tried to flush. Once we have tried to push every buffer once, we then push any associated metadata that is causing the remaining buffers to be redirtied. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-03-02 04:04:31 +00:00
mckusick	0674f5c758	Ensure that softdep_sync_metadata can handle bmsafemap and mkdir entries if they ever arise (which should not happen as softdep_sync_metadata is currently used).	1999-03-02 00:19:47 +00:00
imp	b4c5cb0560	Fix last commit based on feedback from Guido, Bruce and Terry. Specifically, the test was in the wrong place, lacked a cast, didn't unlock the node, and exited to bad rather than abortit. Now we don't allow renaming of a file with LINK_MAX references. Move the test to earlier in the code as it is closer to where ip is obtained, as that is the style of the rest of the function. Didn't fix the problems bruce pointed out in the rename man page to include EMLINK, nor address his complaints about how the whole idea of incrementing the link count during a rename is potentially asking for trouble. Also didn't try to correct potential problem Terry pointed out with decrements not being similarly protected against underflow.	1999-02-26 05:34:16 +00:00
imp	f6b1037575	Add missing check for LINK_MAX in ufs_rename. Since ip->i_effnlink and ip->nlink were different types, there was a masked overflow. Reported by: Mark Slemco <marcs@znep.com>	1999-02-25 09:52:46 +00:00
dillon	499ea70f5f	Update ufs_vnops code to use new specinfo fields rather than guess. This is part of general specinfo / d_parms() commit.	1999-02-25 05:35:53 +00:00
mckusick	1f1828bc6d	fix double LIST_REMOVE; other cosmetic changes to match version 9.32. Obtained from: Jeffrey Hsu <hsu@FreeBSD.ORG>	1999-02-17 20:01:20 +00:00
dillon	39fab1fa0c	Remove XXX comment in regard to why NFS doesn't use VOP_ABORT(). NFS is being fixed now.	1999-02-13 08:38:28 +00:00
dillon	975fba8a24	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-28 00:57:57 +00:00
dillon	ca8ef4ff13	Remove unintended trigraph sequences in comments for -Wall	1999-01-27 18:19:53 +00:00
dg	416e7bfc5a	Gutted softdep_deallocate_dependencies and replaced it with a panic. It turns out to not be useful to unwind the dependencies and continue in the face of a fatal error. Also changed the log() to a printf() in softdep_error() so that it will be output in the case of an impending panic. Submitted by: Kirk McKusick <mckusick@mckusick.com>	1999-01-22 09:07:32 +00:00
dillon	99dfef7f2a	Added support for VOP_FREEBLKS(), reducing MFS's impact on swap and increasing performance by deallocating at least some of the backing store when files are removed. Protect mfsp->buf_queue access at splbio().	1999-01-21 09:27:03 +00:00
dillon	92d48e1c28	Access to mfsp->buf_queue must be protected at splbio(). Other minor adjustments also made, such as passing mfsp to mfs_doio() directly.	1999-01-21 09:24:46 +00:00
dillon	df24433bbe	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
eivind	99c0da0833	Silence warning about unused debug function. (I'll turn this function into a DDB command in my next staticization sweep).	1999-01-12 11:42:41 +00:00
eivind	1e06085274	Add a warning about the copyright restraints.	1999-01-08 16:03:12 +00:00
bde	2facf6978a	Don't pass unused timestamp args to UFS_UPDATE() or waste time initializing them. This almost finishes centralizing (in-core) timestamp updates in ufs_itimes().	1999-01-07 16:14:19 +00:00
bde	2a8b656860	UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value MNT_WAIT when we mean boolean `true' or check for that value not being passed. There was no problem in practice because MNT_WAIT had the magic value of 1.	1999-01-06 18:18:06 +00:00
bde	e5ba679f2a	Ifdefed the conditionally used variable `prtrealloc'. Declare it as volatile so that there is no chance that the code that it controls is optimised away.	1999-01-06 17:04:33 +00:00
bde	f77c71a1d3	Backed out rev.1.47. It just broke my optimisations for lazy syncing of timestamps in rev.1.45. The soft updates bug was elsewhere. Forgotten by: luoqi	1999-01-06 16:52:38 +00:00
eivind	ffaaca5874	Remove the 'waslocked' parameter to vfs_object_create().	1999-01-05 18:50:03 +00:00
bde	734d13314e	Ifdefed conditionally used simplock variables.	1999-01-02 11:34:57 +00:00
eivind	9922763a3d	Remove the last clients of vfs_object_create(..., waslocked=1); waslocked will go away shortly. Reviewed by: dg	1999-01-02 01:32:36 +00:00
dillon	e4a4ff7180	The mount_mfs process that stays in a supervisor context handling MFS I/O requests must be marked P_SYSTEM because if it isn't and the system decides to swap it or (god forbid) kill it, the system stands a good chance of locking up.	1999-01-01 04:14:11 +00:00
bde	20e32654dd	Fixed null pointer panics which I introduced in rev.1.86. Vnodes may be revoked, so vnop routines must be careful about accessing the vnode if they may have blocked. Fixed marking for update after successfully reading or writing 0 bytes. In this case, POSIX.1 specifies marking if and only if the requested count is nonzero, but rev.1.86 never marked.	1998-12-24 09:45:10 +00:00
bde	cbf08b2af8	Remove unused file. It seems to have been a vestige of when mfs did its own memory allocation.	1998-12-20 17:05:54 +00:00
dfr	7a9bc41cc4	In ufs_setattr(), if only one of va_atime or va_mtime are != VNOVAL, then the code set the other field in the inode to VNOVAL. This can happen sometimes on an NFS server.	1998-12-20 12:36:01 +00:00
julian	c179278e54	Add comments to code that I was trying to understand. Hopefully will save others time. Someone who understands this better might check for correctness.	1998-12-15 03:29:52 +00:00
dillon	6d407291a8	Fix -Wuninitialized warning regarding zero-length var-args ctl element. ( this isn't really an error, but I think it is important to fix the warning ).	1998-12-14 05:37:37 +00:00
julian	da920d916d	Remove some compiler warnings.	1998-12-10 20:11:47 +00:00
eivind	c1d9b8bf7a	Make compare correct with unsigned types. (Problem introduced by Lite/2).	1998-12-09 02:06:27 +00:00
archie	60d13c7a9d	The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.	1998-12-07 21:58:50 +00:00
bde	a76d32989b	Don't use the strange null pointer constant `(ufs_daddr_t)0' in a call to VOP_BMAP(). Don't use uncast NULLs in the same call.	1998-11-29 03:12:06 +00:00
dg	841cc6703a	Restored the "reallocblks" code to its former glory. What this does is basically do an on-the-fly defragmentation of the FFS filesystem, changing file block allocations to make them contiguous. Thanks to Kirk McKusick for providing hints on what needed to be done to get this working.	1998-11-13 01:01:44 +00:00
peter	73192d8050	add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()	1998-11-10 09:16:29 +00:00
peter	8ad638ff9e	Change dirty block list handling to use TAILQ macros.	1998-10-31 15:33:32 +00:00
peter	8ef35acf90	Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.	1998-10-31 15:31:29 +00:00
jkh	d68de0ddb3	Clarify a rather ambiguous debugging message.	1998-10-28 10:37:54 +00:00
bde	bd7a76a938	Oops, the redundant tests for major numbers weren't redundant here. They checked for the magic major number for the "device" behind mfs mount points. Use a more obvious check for this device. Debugged by: Andrew Gallatin <gallatin@cs.duke.edu>	1998-10-27 11:47:08 +00:00
bde	873d7be484	Removed redundant bitrotted checks for major numbers instead of updating them.	1998-10-26 08:53:13 +00:00
bde	5a7ea1209a	Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted when bdevsw[] became sparse. We still depend on magic to avoid having to check that (v_rdev) device numbers in vnodes are not NODEV. Removed redundant `major(dev) < nblkdev' tests instead of updating them.	1998-10-25 19:02:48 +00:00
phk	13c66194f4	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.	1998-10-25 17:44:59 +00:00
bde	b92f5250f9	Use only the correct raw partition for writing labels. Don't use the partition that the label ioctl is being done on just because it has offset 0, since there is no guarantee that such a partition is large enough to contain the label. Don't use the wrong raw partition (0 instead of RAW_PART). This fixes problems rewriting bizarre labels (with a nonzero offset for the 'a' partition) in newfs(8). Such labels shouldn't normally be used, but creating them was allowed if the ioctl was done on the raw partition, and sysinstall creates them if the root partition isn't allocated first. Note that allowing write access to a partition other than the one that has been checked for write access doesn't increase security holes significantly, since write access to any partition already allows changing the in-core label. This fix should be in 3.0R. Rev.1.26 of newfs/newfs.c shouldn't be in 3.0R.	1998-10-17 07:49:04 +00:00
jkh	f598ae3929	fixup for alpha.	1998-10-16 10:14:21 +00:00
bde	8ccf93af58	Fixed bloatage of `struct inode'. We used 5 "spare" fields for ext2fs, but when i_effnlink was added to support soft updates, there was only room for 4 spares. The number of spares was not reduced, so the inode size became 260 (on i386's), or 512 after rounding up by malloc(). Use one spare field in `struct dinode' instead of the 5th spare field in the inode and reduced to 4 spares in the inode so that the size is 256 again. Changed the types of the spares in the inode from int to u_int32_t so that the inode size has more chance of being <= 256 under other arches, and downdated ext2fs to match (it was broken to use ints before rev.1.1).	1998-10-13 15:45:43 +00:00
peter	71fd6ef94b	"fix" a warning	1998-10-12 09:02:19 +00:00
jkh	a196ba12b0	Allow more flexible use of MFS root. Submitted by: peter	1998-10-10 08:12:24 +00:00
peter	4235e6da0c	MODINFO_ADDR has real addresses now, remove the manual relocation based on cpu type.	1998-10-09 23:37:37 +00:00
jkh	ff1c526f59	Add some evil temporary phys-to-kern translation for mfs.	1998-10-09 06:21:12 +00:00
jkh	b222d6d82f	include proper header for Mike's new stuff.	1998-10-09 01:40:56 +00:00
jkh	ababcfc884	Allow the module area to be used in order to find the MFS image (in addition to allowing it to be compiled in) and stop overloading the MFS_ROOT variable to store size information.	1998-10-08 23:34:44 +00:00
luoqi	b09f25123d	Use vm_page_xxx() inline functions to manipulate vm_page::flags, vm_page::busy. As a side effect, a few wakeup() calls are added, which might fix some of the missing vm_page wakeups people have been seeing. Reviewed by: Doug Rabson <dfr@nlsystems.com>	1998-10-07 13:59:26 +00:00
nate	75914bbe38	Fix 'noatime' bug that was unrelated to use of noatime. The problem is caused when a directory block is compacted. When this occurs, softdep_change_directoryentry_offset() is called to relocate each directory entry and adjust its matching diradd structure, if any, to match the new location of the entry. The bug is that while softdep_change_directoryentry_offset() correctly adjusts the offsets of the diradd structures on the pd_diraddhd[] lists (which are not yet ready to be committed to disk), it fails to adjust the offsets of the diradd structures on the pd_pendinghd list (which are ready to be committed to disk). This causes the dependency structures to be inconsistent with the buf contents. Now, if the compaction has moved a directory entry to the same offset as one of the diradd structures on the pd_pendinghd list and a syscall is done that tries to remove this directory entry before this directory block has been written to disk (which would empty pd_pendinghd), a sanity check in newdirrem() will call panic() when it notices that the inode number in the entry that it is to be removed doesn't match the inode number in the diradd structure with that offset of that entry. Reviewed by: Kirk McKusick <mckusick@McKusick.COM> Submitted by: Don Lewis <Don.Lewis@tsc.tdk.com>	1998-10-03 19:17:11 +00:00
mckusick	a66a5d3d3a	Do not allow a mounted on directory to be rmdir'ed. This removal can happen when an NFS exported filesystem tries to remove a locally mounted on directory. PR: kern/7272 Submitted by: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>	1998-09-30 00:53:40 +00:00
bde	d7210d0597	Fixed clean flag handling: - don't set the clean flag on unmount of an unclean filesystem that was (forcibly) mounted rw. - set the clean flag on rw -> ro update of a mounted initially-clean filesystem. - fixed some style bugs (mostly long lines). This uses the fs_flags field and FS_UNCLEAN state bit which were introduced in the softdep changes. NetBSD uses extra state bits in fs_clean. Reviewed by: luoqui	1998-09-26 04:59:42 +00:00
luoqi	10b8717849	Eliminate a race in VOP_FSYNC() when softupdates is enabled. Submitted by: Kirk McKusick <mckusick@McKusick.COM> Two minor changes are also included, 1. Remove gratuitous checks for error return from vn_lock with LK_RETRY set, vn_lock should always succeed in these cases. 2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount a little more unstable. It also keeps us in sync with other BSDs. Suggested by: Bruce Evans <bde@zeta.org.au>	1998-09-24 15:02:46 +00:00
luoqi	2ff38e0785	Restore pre-v1.44 behavior: always copy modified in-core inode to disk buffer. Otherwise some in-core inode changes might be lost, including important meta data (e.g. size) if softupdates is enabled.	1998-09-15 14:45:28 +00:00
gibbs	048a0d3b5b	When a buffer is removed from a buffer queue, remember its block number and use it as "the currently active" buffer in doing disk sort calculations.	1998-09-15 08:55:03 +00:00
sos	8397655514	Remove the SLICE code. This clearly needs a lot more thought, and we don't need this to hunt us down in 3.0-RELEASE.	1998-09-14 19:56:42 +00:00
bde	bf0874491d	Don't dereference an uninitialized pointer in dead code. The dead code gets executed if it is compiled without optimization.	1998-09-12 14:46:15 +00:00
bde	e170b2ba75	Removed statically configured mount type numbers (MOUNT_*) and all references to them. The change a couple of days ago to ignore these numbers in statically configured vfsconf structs was slightly premature because the cd9660, cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number in their vfsconf struct.	1998-09-07 13:17:06 +00:00
bde	4100b68615	Put the zombie ffs sysctl node in "notyet" state together with its few remaining children. Prepare it for MOUNT_UFS going away.	1998-09-07 11:50:19 +00:00
phk	fcef0795af	Make MFS do the default on VOP_FREEBLKS(). XXX: we could deallocate the storage, but somebody else will have to pick up that task.	1998-09-07 06:52:01 +00:00
phk	4630814c8b	Add a new vnode op, VOP_FREEBLKS(), which filesystems can use to inform device drivers about sectors no longer in use. Device-drivers receive the call through d_strategy, if they have D_CANFREE in d_flags. This allows flash based devices to erase the sectors and avoid pointlessly carrying them around in compactions. Reviewed by: Kirk Mckusick, bde Sponsored by: M-Systems (www.m-sys.com)	1998-09-05 14:13:12 +00:00
bde	605f4cb33b	Quick fix for breakage of read clustering on non-IDE drives. Read clustering is obsolescent technology so hardly anyone noticed. On a DORS 32160 SCSI drive with 4 tags, read clustering makes very little difference even for huge sequential reads. However, on a ZIP SCSI drive with 0 tags, the minimum overhead per block is about 40 msec, so very large clusters must be used to get anywhere near the maximum transfer rate. Using clusters consisting of 1 8K block reduces the transfer rate to about 250K/sec. Under msdosfs, missing read clustering is normal and a cluster size of 1 512 byte block reduces the transfer rate to about 25K/sec. Broken in: rev.1.18	1998-08-18 03:54:39 +00:00
bde	4e2d834c27	Removed unused includes.	1998-08-17 19:09:36 +00:00
msmith	64b624ba3e	"The releasing of the reference and lock is not temporary and belongs where it is. The reference and lock(s) are acquired just above the code in VREF() and relookup()." Submitted by: Michael Hancock <michaelh@cet.co.jp>	1998-08-12 21:42:54 +00:00
julian	4c11bd9897	Handle the case of moving a directory onto the top of a sibling's child of the same name. Submitted by: Kirk Mckusick with fixes from luoqi Chen Obtained from: Whistle test tree.	1998-08-12 20:46:47 +00:00
bde	0c1764387c	Used daddr_t's, not ints, to store disk block numbers. Updated printf formats and args to match. Fixed old printf format errors (all related; most were hidden by calling printf indirectly). This change somehow avoids compiler bugs for 64-bit longs on i386's, although it increases the number of 64-bit calculations.	1998-07-28 18:25:51 +00:00
bde	1881ba1352	Made lazy syncing of timestamps for special files non-optional.	1998-07-27 15:37:00 +00:00
bde	863d5c8b68	Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this change is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.	1998-07-15 02:32:35 +00:00
bde	f0b863f4b5	Fixed printf format errors.	1998-07-11 07:46:16 +00:00
julian	3032208d42	Add code missed in the initial Soft updates integration. Make the unallocated parts of a directory have a known state in case we need it later.	1998-07-10 00:10:20 +00:00
julian	c520e2ce97	Don't update superblock if mounted readonly, also fixes some problems with softupdates on root. More cleanups are needed here.. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>	1998-07-08 23:52:27 +00:00
julian	78a155ddaf	Catch a few corner cases where FreeBSD differs enough from BSD 4.4 to confuse Soft updates.. Should solve several "dangling deps" panics.	1998-07-08 01:04:33 +00:00
julian	4363221ba2	VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>	1998-07-04 20:45:42 +00:00
bde	cd727ef682	Restored revs.1.89-1.90 which I somehow clobbered in rev.1.91.	1998-07-03 22:37:43 +00:00
bde	660e6408e6	Sync timestamp changes for inodes of special files to disk as late as possible (when the inode is reclaimed). Temporarily only do this if option UFS_LAZYMOD configured and softupdates aren't enabled. UFS_LAZYMOD is intentionally left out of /sys/conf/options. This is mainly to avoid almost useless disk i/o on battery powered machines. It's silly to write to disk (on the next sync or when the inode becomes inactive) just because someone hit a key or something wrote to the screen or /dev/null. PR: 5577 Previous version reviewed by: phk	1998-07-03 22:17:03 +00:00
bde	dfd9848c30	Centralized in-core inode update. Update the in-core inode directly in ufs_setattr() so that there is no need to pass timestamps to UFS_UPDATE() (everything else just needs the current time). Ignore the passed-in timestamps in UFS_UPDATE() and always call ufs_itimes() (was: itimes()) to do the update. The timestamps are still passed so that all the callers don't need to be changed yet.	1998-07-03 18:46:52 +00:00
phk	9905e9d2e7	Make vprint() print dev_t in hex also.	1998-06-27 07:28:49 +00:00
phk	3354f8d129	Report the type from the inode, not the vnode.	1998-06-27 06:45:04 +00:00
jkh	cfe1e92767	Flesh this document out just a little in response to some user questions and also recommend linking over copying since, at this stage, a stale copy is a real concern.	1998-06-26 10:35:55 +00:00
bde	403bdcb97b	Removed unused includes.	1998-06-21 14:53:44 +00:00
julian	fb17974b6a	Slight change to directory cleanup Makes soft updates a bit cleaner. Eliminates some warnings about 'corrupted directories' from fsck.	1998-06-14 19:31:28 +00:00
julian	e30abc2a08	Note which version of Kirk's sources this corresponds to.	1998-06-12 21:21:26 +00:00
julian	6b27bc7737	Fix the case when renaming to a file that you've just created and deleted, that had an inode that has not yet been written to disk, when the inode of the new file is also not yet written to disk, and your old directory entry is not yet on disk but you need to remove it and the new name exists in memory but has been deleted but the transaction to write the deleted name to disk exists and has not yet been cancelled by the request to delete the nonexistent name. I don't know how kirk could have missed such a glaring problem for so long. :-) Especially since the inconsistency survived on the disk for a whole 4 seconds on average before being fixed by other code. This was not a crashing bug but just led to filesystem inconsistencies if you crashed. Submitted by: Kirk McKusick (mckusick@mckusick.com)	1998-06-12 20:48:30 +00:00
julian	19e664debc	Add B_NOCACHE to several cases where BSD4.4 only required a B_INVAL. Change worked out by john and kirk in consort.	1998-06-11 17:44:32 +00:00
julian	21ee11979f	Fix for "live inode" panic. Submitted by: Kirk McKusick <mckusick@McKusick.COM> Reviewed by: yeah right...	1998-06-10 20:45:46 +00:00
julian	ab4debc1cf	Remove buggy debugging code.	1998-06-10 20:03:16 +00:00
julian	27341c23aa	Back out John's changes 1.45 -> 1.46 Kirk confirms that the original semantic was what he wanted... (well, a very slight difference) May fix "dangling deps" panic with soft updates.	1998-06-10 19:27:56 +00:00
julian	b4e8c144c1	The version of the softdep changes in FreeBSD broke the (doingdirectory && !newparent) case of ufs_rename(). rename("D1/X/", "D2/Y/") gives a wrong link count for D2. Submitted by: Bruce Evans <bde@zeta.org.au> Reviewed by: Kirk McKusick <mckusick@McKusick.COM>	1998-06-08 23:55:33 +00:00
bde	b8ca4196f1	Null change. Forgot to mention in previous log message that MNT_NOATIME is now ignored for special files, so that mounting root with option noatime doesn't break reporting of idle times in programs like `w'. The problem of excessive disk updates just to stamp atimes will be handled for special files by only writing atimes to disk when inodes become active. This works well because special files are relatively uncommon and their atimes are even more disposable at panic time than regular files' atimes.	1998-06-07 11:04:26 +00:00
bde	0af8745beb	Fixed some longstanding timestamp bugs: 1. mark atimes and mtimes of special files and fifos for update upon successful completion of non-null i/o, not at the beginning of the syscall. 2. never update file times for readonly filesystems. They were updated for stats and closes but not for syncs. The updates were of course only in-core and were thrown away when the inode was uncached, so the times sometimes appeared to go backwards. Improved comments in code related to (1) (mostly by removing them). Unmacroized ITIMES(). The test in (2) bloated it even more. Don't call getmicrotime() in the function version of it when we only need the time in seconds.	1998-06-07 10:49:18 +00:00
dfr	3879235125	Use size_t instead of u_int for sizes.	1998-06-04 17:21:39 +00:00
dfr	491642ca32	If the filesystem blocksize is less than the VM page size, use the generic getpages code. This happens for filesystems with 4k pages on the alpha since the normal alpha pagesize is 8k.	1998-06-04 17:04:44 +00:00
dfr	56e5ba84df	Don't cast a pointer to an int in DQHASH.	1998-06-04 17:03:16 +00:00
julian	1464500611	Add a reference to the original softupdates paper	1998-06-02 01:30:51 +00:00
julian	3fd4b55938	Add a reference to the Ganger/Patt paper	1998-06-02 01:27:27 +00:00
julian	0086b00b30	A fix to a debug test from Kirk.	1998-05-27 03:32:23 +00:00
julian	92e0f9da97	Ensure that there is enough information here, so that people can use soft updates should they desire.	1998-05-19 23:18:37 +00:00
julian	44ee923017	Bring up-to-date with Whistle's current version Includes some debugging code.	1998-05-19 23:07:25 +00:00
julian	988e3e4a34	Merge with Kirk's version as of Feb 20 His version 9.23 == our version 1.5 of ffs_softdep.c His version 9.5 == our version 1.4 of softdep.c	1998-05-19 22:54:53 +00:00
julian	0cc808ba0d	Merge in Kirk's changes to stop softupdates from hogging all of memory.	1998-05-19 21:45:53 +00:00
julian	99669d2e37	Change to stop a silly panic. This should be understood better. Change a buffer swizzle trick to a bcopy. It would be nice if the efficient trick could be used in the future.	1998-05-19 20:50:41 +00:00
julian	ad8fcbb0ce	First published FreeBSD version of soft updates Feb 5.	1998-05-19 20:18:42 +00:00
julian	9ae1fc57cc	This commit was generated by cvs2svn to compensate for changes in r36206, which included commits to RCS files with non-trunk default branches.	1998-05-19 20:03:29 +00:00
julian	6df1279ada	Import the next version received from kirk after some FreeBSD feedback.	1998-05-19 20:03:29 +00:00

908 Commits