freebsd-dev

Author	SHA1	Message	Date
Matthew Dillon	23b590188f	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release	2001-12-20 22:42:27 +00:00
Kirk McKusick	f305c5d199	Change the atomic_set_char to atomic_set_int and atomic_clear_char to atomic_clear_int to ease the implementation for the sparc64. Requested by: Jake Burkholder <jake@locore.ca>	2001-12-18 18:05:17 +00:00
Ian Dowse	143a5346c9	Make sure we ignore the value of `fs_active' when reloading the superblock, and move the initialisation of it to beside where other pointer fields are initialised.	2001-12-16 18:54:09 +00:00
Ian Dowse	3fa4044e34	Move the new superblock field `fs_active' into the region of the superblock that is already set up to handle pointer types. This fixes an accidental change in the superblock size on 64-bit platforms caused by revision 1.24.	2001-12-16 18:51:11 +00:00
Kirk McKusick	cc5a92334f	Minimize the time necessary to suspend operations on a filesystem when taking a snapshot. The two time consuming operations are scanning all the filesystem bitmaps to determine which blocks are in use and scanning all the other snapshots so as to be able to expunge their blocks from the view of the current snapshot. The bitmap scanning is broken into two passes. Before suspending the filesystem all bitmaps are scanned. After the suspension, those bitmaps that changed after being scanned the first time are rescanned. Typically there are few bitmaps that need to be rescanned. The expunging of other snapshots is now done after the suspension is released by observing that we can easily identify any blocks that were allocated to them after the suspension (they will be marked as `not needing to be copied' in the just created snapshot). For all the gory details, see the ``Running fsck in the Background'' paper in the Usenix BSDCon 2002 Conference Proceedings, pages 55-64.	2001-12-14 00:15:06 +00:00
Kirk McKusick	9db12e5108	When a file is partially truncated, we first check to see if the new file end will land in the middle of a file hole. Since the last block of a file must always be allocated, the hole is filled by allocating a block at that location. If the hole being filled is a direct block, then the truncation may eventually reduce the full sized block down to a fragment. When running with soft updates, it is necessary to FSYNC the file after allocating the block and before creating the fragment to avoid triggering a soft updates inconsistency when the block unexpectedly shrinks. Found by: Matthew Dillon <dillon@apollo.backplane.com> MFC after: 1 week	2001-12-13 05:07:48 +00:00
Robert Watson	24373ce6ed	Use 'mkdir -p /.attribute/system' instead of breaking it into two separate mkdir targets. Submitted by: jedgar	2001-11-30 15:32:07 +00:00
Robert Watson	cff9580525	Use 'mkdir -p /.attribute/system' instead of breaking it into two separate mkdir targets.	2001-11-30 15:21:20 +00:00
Robert Watson	15f1c8d3d2	README.extattr incorrectly specified sample command lines for UFS_EXTATTR_AUTOSTART. Insert the missing 'initattr' arguments to extattrctl. Noticed by: green	2001-11-30 15:15:27 +00:00
Guido van Rooij	40e294f796	When mkdir()-ing, the parent dir gets its link count increased. Fix VN_KNOTE to reflect that. Found by: tobez@freebsd.org MFC after: 2 days	2001-11-22 15:33:12 +00:00
Ian Dowse	4202b366fc	Oops, when trying the dirhash sequential-access optimisation, compare the slot offset against the predicted offset, not a boolean flag. This typo effectively disabled the sequential optimisation, but was otherwise harmless. Not surprisingly, fixing this improves performance in the sequential access case. I am seeing a 7% speedup on one machine here; using dirhash when sequentially looking up directory entries is now about 5% faster instead of 2% slower than the non-dirhash case. Submitted by: KOIE Hidetaka <koie@suri.co.jp> MFC after: 1 week	2001-11-14 15:08:07 +00:00
Matthew Dillon	7e76bb562e	Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking in wdrain during a write. This flag needs to be used in devices whose strategy routines turn around and issue another high level I/O, such as when MD turns around and issues a VOP_WRITE to vnode backing store, in order to avoid deadlocking the dirty buffer draining code. Remove a vprintf() warning from MD when the backing vnode is found to be in-use. The syncer or buf_daemon could be flushing the backing vnode at the time of an MD operation so the warning is not correct. MFC after: 1 week	2001-11-05 18:48:54 +00:00
Robert Watson	6d8785434f	o Update copyright dates. o Add reference to TrustedBSD Project in license header. o Update dated comments, including comment in extattr.h claiming that no file systems support extended attributes. o Improve comment consistency.	2001-11-01 21:37:07 +00:00
Robert Watson	b6e0472987	o Although this is not specified in POSIX.1e, the UFS ACL implementation coerces the deletion of a default ACL on a directory when no default ACL EA is present to success. Because the UFS EA implementation doesn't distinguish the EA failure modes "that EA name has not been administratively enabled" from "that EA name has no defined data", there's a potential conflict in error return values. Normally, the lack of administratively configured EA support is coerced to EOPNOTSUPP to indicate that ACLs are not available; in this case, it is possible to get a successful return, even if ACLs are not available because EA support for them has not been enabled. Expand the comment in ufs_setacl() to identify this case. Obtained from: TrustedBSD Project	2001-10-27 05:39:17 +00:00
Robert Watson	ac8b3dd7dc	o Clarify a comment about the locking condition of the vnode upon exit from ufs_extattr_enable_with_open(). o Print auto-start notifications if (bootverbose). This was previously commented out since it didn't know how to check for bootverbose. o Drop in comments throughout indicating where ENOENT should be replaced with ENOATTR once that is available. Obtained from: TrustedBSD Project	2001-10-27 05:19:14 +00:00
Robert Watson	29543004bd	o The comment about ordering the destruction of the lock and the removal of the flag indicating that the structure was initialized didn't need an XXX, since it didn't need fixing. Obtained from: TrustedBSD Project	2001-10-27 05:05:39 +00:00
Robert Watson	9444746795	o Wrap a number of long lines of code, many of which were introduced due to KSE-related (p) expansions. Obtained from: TrustedBSD Project	2001-10-27 05:03:05 +00:00
Robert Watson	ce5ddec25f	Since namespace support was added to the UFS extended attribute implementation to replace single-character namespace prefixes, '$' is no longer an invalid attribute name, and the namespace is relevant to validity determination. o Remove '$' case from ufs_extattr_valid_attrname() o Add attrnamespace argument to ufs_extattr_valid_attrname(), and fill out appropriately. Currently no decisions are made based on the namespace argument, but may be in the future. Obtained from: TrustedBSD Project	2001-10-27 04:58:28 +00:00
Matthew Dillon	245df27cee	Implement kern.maxvnodes. Adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week	2001-10-26 00:08:05 +00:00
Ian Dowse	71fc5e11c7	Default to not performing ufs_dirhash's extensive directory-block sanity check after every directory modification. This check can be re-enabled at any time by setting the sysctl "vfs.ufs.dirhash_docheck" to 1. This group of sanity tests was there to ensure that any UFS_DIRHASH bugs could be caught by a panic before a potentially corrupted directory block would be written to disk. It has served its main purpose now, so disable it in the interest of performance. MFC after: 1 week	2001-10-25 22:55:59 +00:00
Matthew Dillon	c72ccd014d	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days	2001-10-23 01:21:29 +00:00
John Baldwin	bd78cece5d	Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.	2001-10-11 23:38:17 +00:00
John Baldwin	7106ca0d1a	Add missing includes of sys/lock.h.	2001-10-11 17:52:20 +00:00
Matthew Dillon	962922dcd2	Remove panics for rename() race conditions. The panics are inappropriate because the IN_RENAME flag only fixes a few of the huge number of race conditions that can result in the source path becoming invalid even prior to the VOP_RENAME() call. The panics created a serious security issue whereby an attacker could fairly easily cause the panic to occur, crashing the machine. The correct solution requires a great deal of work in the namei path cache code. MFC after: 0 days	2001-10-08 00:37:54 +00:00
Robert Watson	ab66aa1468	o Replace two direct uid!=0 comparisons with suser_xxx() calls. Obtained from: TrustedBSD Project	2001-10-02 14:41:43 +00:00
Robert Watson	b73d2870cd	o Replace two direct uid!=0 comparisons with suser_td() calls. Obtained from: TrustedBSD Project	2001-10-02 14:34:22 +00:00
Matthew Dillon	4c94c7bfb9	Backout the last commit. The problem is actually much worse than I first thought and may require serious work to the VOP_RENAME() api itself. Basically, by the time the VOP_RENAME() function is called, it's already too late.	2001-10-02 04:26:58 +00:00
Matthew Dillon	be2a975a9f	IN_RENAME should only be cleared by the routine that set it. This fixes a rename/rmdir race that has been shown to cause a panic. Bug reported by: Yevgeniy Aleynikov <eugenea@infospace.com> MFC after: 3 days	2001-10-02 02:58:48 +00:00
John Baldwin	eb46fac565	- Fix some minor whitespace nits. - Move the SPECIAL_FLAG #define up next to the NOHOLDER #define and fix a little nit that caused it to be defined as -(sizeof (struct thread) + 1) instead of -2.	2001-09-27 21:04:13 +00:00
Robert Watson	57358f1e93	o Re-enable support of system file flags in jail() by adding back the PRISON_ROOT to the suser_xxx() check. Since securelevels may now be raised in specific jails, use of system flags can still be restricted in jail(), but in a more configurable way. o Users of jail() expecting system flags (such as schg) to restrict jail()'s should be sure to set the securelevel appropriately in jail()'s. o This fixes activities involving automated system flag removal in jail(), including installkernel and friends. Obtained from: TrustedBSD Project	2001-09-26 20:44:41 +00:00
Robert Watson	6748bcc51e	o Modify ufs_setattr() so that it uses securelevel_gt() instead of direct variable access. Obtained from: TrustedBSD Project	2001-09-26 20:31:37 +00:00
Robert Watson	aaef1c3934	o Further clarify comment: at Udo's request, re-insert the 'if' referring to securelevels; also, update the unprivileged process text to better indicate the scope of actions permitted when any system flags are already set (limited). Submitted by: Udo Schweigert <udo.schweigert@siemens.com>	2001-09-25 12:02:44 +00:00
Robert Watson	82e83c60b3	o Parallelize the comment on the relationship between privileged un-jailed processes and the actual securelevel check: make the comment use '> 0' instead of inverted '<= 0'.	2001-09-25 02:26:10 +00:00
Ian Dowse	5d76690a7f	The addition of i_dirhash to struct inode pushed RELENG_4's sizeof(struct inode) into a new malloc bucket on the i386. This didn't happen in -current due to the removal of i_lock, but it does no harm to apply the workaround to -current first. Reduce the size of the i_spare[] array in struct inode from 4 to 3 entries, and change ext2fs to use i_din.di_spare[1] so that it does not need i_spare[3]. Reviewed by: bde MFC after: 3 days	2001-09-24 18:29:20 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doozy!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Ian Dowse	4691e9ead0	The "dirpref" directory layout preference improvements make use of an array "fs_contigdirs[]" to avoid too many directories getting created in each cylinder group. The memory required for this and two other arrays (fs_csp[] and fs_maxcluster[]) is allocated with a single malloc() call, and divided up afterwards. However, the 'space' pointer is not advanced correctly, so fs_contigdirs and fs_maxcluster end up pointing to the same address. Add the missing code to advance the 'space' pointer, and remove an unnecessary update of the pointer that follows. This is likely to fix the "ffs_clusteralloc: map mismatch" panics that have been reported recently. Submitted by: Luke Mewburn <lukem@wasabisystems.com>	2001-09-09 23:48:28 +00:00
Chris D. Faulhaber	dac4a67ce7	Use ACL_PERM_NONE instead of hardcoding 0 when initializing ACL entry permissions. Reviewed by: rwatson	2001-09-01 23:18:15 +00:00
Robert Watson	7df97b6117	o At some point, unmounting a non-EA file system with EA's compiled in got a bit broken, when ufs_extattr_stop() was called and failed, ufs_extattr_destroy() would panic. This makes the call to destroy() conditional on the success of stop(). Submitted by: Christian Carstensen <cc@devcon.net> Obtained from: TrustedBSD Project	2001-09-01 20:11:05 +00:00
Peter Wemm	0f7289022b	If a file has been completely unlinked, stop automatically syncing the file. ffs will discard any pending dirty pages when it is closed, so we may as well not waste time trying to clean them. This doesn't stop other things from writing it out, eg: pageout, fsync(2) etc.	2001-08-27 06:09:56 +00:00
Ian Dowse	be70fc04ce	Stop using dirhash when a directory is removed, and ensure that we never attempt to hash directories once they are deleted. This fixes a problem where operations on a deleted directory could trigger dirhash sanity panics.	2001-08-26 20:47:19 +00:00
Ian Dowse	2ed42812bd	When compacting directories, ufs_direnter() always trusted DIRSIZ() to supply the number of bytes to be bcopy()'d to move an entry. If d_ino == 0 however, DIRSIZ() is not guaranteed to return a sensible length, so ufs_direnter could end up corrupting a directory during compaction. In practice I believe this can only happen after fsck_ffs has fixed a previously-corrupted directory. We now deal with any mid-block unused entries specially to avoid using DIRSIZ() or bcopy() on such entries. We also ensure that the variables 'dsize' and 'spacefree' contain meaningful values at all times. Add a few comments to describe better this intricate piece of code. The special handling of mid-block unused entries makes the dirhash- specific bugfix in the previous revision (1.53) now unnecessary, so this change removes it. Reviewed by: mckusick	2001-08-26 01:25:12 +00:00
Ian Dowse	7dfb550e0c	When compressing directory blocks, the dirhash code didn't check that the directory entry was in use before attempting to find it in the hash structures to change its offset. Normally, unused entries do not need to be moved, but fsck can leave behind some unused entries that do. A dirhash sanity panic resulted when the entry to be moved was not found. Add a check that stops entries with d_ino == 0 from being passed to ufsdirhash_move().	2001-08-22 01:35:17 +00:00
Peter Wemm	61a4237001	Sigh. ufs_lookup() calls ffs_snapgone(), meaning that 'options EXT2FS' without 'options FFS' would fail to link.	2001-08-18 03:08:48 +00:00
Ian Dowse	9e27954de1	Two recent commits in sys/ufs/ufs interacted badly with ext2fs because it shares ufs code. In ufs_fhtovp(), the test on i_effnlink is invalid because ext2fs does not maintain this field. In ufs_close(), i_effnlink is also tested, to determine whether or not to call vn_start_write(). The ufs_fhtovp issue breaks NFS exporting of ext2fs filesystems; I believe the other is harmless. Fix both cases by checking um_i_effnlink_valid in the ufsmount struct, and use i_nlink if necessary. Noticed by: bde Reviewed by: mckusick, bde	2001-07-29 22:26:01 +00:00
Ian Dowse	54d6d2dfaf	Disable the dirhash sanity check that panics if an unused directory entry (d_ino == 0) is found in a position that is not the start of a DIRBLKSIZ block. While such entries cannot occur normally (ufs always extends the previous entry to cover the free space instead), they do not cause problems and fsck does not fix them, so panicking is bad.	2001-07-27 18:45:41 +00:00
Peter Wemm	815d14ddab	Use a fixed type for times in on-disk structures for ufs rather than something that could potentially change like time_t.	2001-07-16 00:55:27 +00:00
Ian Dowse	50c7c3a7c8	Return a locked struct buf from ufsdirhash_lookup() to avoid one extra getblk/brelse sequence for each lookup. We already had this buf in ufsdirhash_lookup(), so there was no point in brelse'ing it only to have the caller immediately reacquire the same buffer. This should make the case of sequential lookups marginally faster; in my tests, sequential lookups with dirhash enabled are now only around 1% slower than without dirhash.	2001-07-13 20:50:38 +00:00
Ian Dowse	9b5ad47fb7	Bring in dirhash, a simple hash-based lookup optimisation for large directories. When enabled via "options UFS_DIRHASH", in-core hash arrays are maintained for large directories. These allow all directory operations to take place quickly instead of requiring long linear searches. For now anyway, dirhash is not enabled by default. The in-core hash arrays have a memory requirement that is approximately half the size of the on-disk directory file. A number of new sysctl variables allow control over which directories get hashed and over the maximum amount of memory that dirhash will use: vfs.ufs.dirhash_minsize The minimum on-disk directory size for which hashing should be used. The default is 2560 (2.5k). vfs.ufs.dirhash_maxmem The system-wide maximum total memory to be used by dirhash data structures. The default is 2097152 (2MB). The current amount of memory being used by dirhash is visible through the read-only sysctl variable vfs.ufs.dirhash_mem. Finally, some extra sanity checks that are enabled by default, but which may have an impact on performance, can be disabled by setting vfs.ufs.dirhash_docheck to 0. Discussed on: -fs, -hackers	2001-07-10 21:21:29 +00:00
Matthew Dillon	0cddd8f023	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
John Baldwin	ed87274d16	Fix more mntvnode and vnode interlock order reversals.	2001-06-28 22:21:33 +00:00


843 Commits