freebsd-dev

Author	SHA1	Message	Date
Attilio Rao	cb05b60a89	vn_lock() is currently only used with 'curthread' passed as its argument. Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This modification makes the code cleaner and, in particular, removes an annoying dependency, helping the next lockmgr() cleanup. The KPI, obviously, changes as a result. Manpage and FreeBSD_version will be updated through further commits. As a side note, it is worth saying that upcoming commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Konstantin Belousov	9ddfa9c6e9	ffs_balloc_ufsX() routines, in the case of recovering from the failed allocation, free the indirect blocks before clearing the disk pointers, that could lead to the softupdate inconsistencies in the case of the machine or disk crash at the wrong time. Rearrange the recover code to do the ffs_blkfree() after the second ffs_syncvnode(), that clears the pointers chain. Proposed and reviewed by: tegge Tested by: Peter Holm MFC after: 3 weeks	2008-01-03 12:28:57 +00:00
David E. O'Brien	029839a449	style(9)	2008-01-02 01:19:17 +00:00
Konstantin Belousov	e7627b2c62	The ffs_balloc() routines, when allocating the indirect blocks for the inode, do the rollback in case the allocation failed (due to insufficient free space or quota limits). But the code leaves the buffers corresponding to the indirect blocks on the vnode bufobj list. This causes several assertions (for instance, "ffs_truncate3" in ffs_truncate()) to fail, and could result in the indirect block aliasing problem, like writing the contents of such blocks to a random disk location. Remove the buffers from the bufobj properly. Reported and tested by: Peter Holm Reviewed by: tegge MFC after: 3 weeks	2007-12-29 13:31:27 +00:00
Ken Smith	d9e6294e4f	Fix a broken check that recently became more annoying because it now gets enabled when INVARIANTS is on instead of DIAGNOSTIC (which apparently nobody uses). From Tor's description: This happens when the block range spans two block maps, the first in the inode (mapping up to NDADDR direct blocks) and the second being the first indirect block. The current check assumes that both block maps are indirect blocks. Work done by: tegge Tested by: kris, kensmith	2007-12-01 13:12:43 +00:00
Ruslan Ermilov	5b4ab4a032	Fix build without INVARIANTS and update a comment to match a change made in previous revision.	2007-11-09 11:04:36 +00:00
David E. O'Brien	1102b89baa	Turn most ffs 'DIAGNOSTIC's into INVARIANTS.	2007-11-08 17:21:51 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Julian Elischer	3745c395ec	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
Alfred Perlstein	77465d9390	Get rid of qaddr_t. Requested by: bde	2007-10-16 10:54:55 +00:00
Bjoern A. Zeeb	7fd627f00f	Fix a DIV0 in case a large value for fs_avgfilesize or fs_avgfpdir is given (with newfs or tunefs) and dirsize overflows. In case dirsize is <= 0 because of an overflow set maxcontigdirs to 0 so it will be 1 later. This is what would happen for large fs_avgfilesize. [1] Identified with help from: roberto, pjd Submitted by: pjd [1] Approved by: re (rwatson) MFC after: 8 days	2007-09-10 14:12:29 +00:00
Craig Rodrigues	7a920f5761	Perform range check before allocating memory when reading extended attributes. Reviewed by: kib Approved by: re (hrs) PR: 114389	2007-07-13 18:51:08 +00:00
Peter Wemm	ae259a3d16	Fix an annoying pointer/int cast warning that shows up on 64 bit systems. Approved by: re	2007-07-02 01:31:43 +00:00
Konstantin Belousov	d66ba37013	Fix livelock that could occur when snapshotting UFS with quotas, where some quota limit was exceeded. The sequence of UFS_VALLOC()/UFS_VFREE() calls there could cause inodeblock to have both freefile and inodedep dependencies without any inode in the block being marked for write. Then, softdep_check_suspend() would return EAGAIN forever. Force write of inodeblock with allocated freefile softdependency by setting IN_MODIFIED flag in softdep_freefile and unconditionally calling UFS_UPDATE() in ufs_reclaim. Reported by: kris Debug help and tested by: Peter Holm Approved by: re (kensmith) MFC after: 3 weeks	2007-06-22 13:22:37 +00:00
Robert Watson	32f9753cfb	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Konstantin Belousov	7a31868ed0	Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file: part 2. Convert calls missed in the first big commit. Noted by: rwatson Pointy hat to: kib	2007-06-01 14:33:11 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads' rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch(), to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Konstantin Belousov	9e223287c0	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
Pawel Jakub Dawidek	5d14c414ec	- Remove unnecessary vnode internal locking - v_vflag is protected by vnode's lock (not vnode's interlock). - Simplify code a bit.	2007-05-28 00:28:15 +00:00
Pawel Jakub Dawidek	64c40cdcb0	Eliminate VI_LOCK()/VI_UNLOCK() pair from getattr and close code paths. It's hard to measure performance improvement on my test machine, but the change won't degrade performance for sure. I can measure slight improvement for debugging kernel and it can also be a win for machines where atomic operation is more expensive. Reviewed by: kib	2007-05-23 11:06:09 +00:00
Konstantin Belousov	d413d21071	Since the renaming of vop_lock to _vop_lock, pre- and post-condition function calls are no longer generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy the tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.	2007-05-18 13:02:13 +00:00
Andrew Thompson	832eef31d1	Add a newline to the printf message.	2007-05-03 22:39:52 +00:00
Konstantin Belousov	5b959aa44f	Fix the NAMEI zone leak when snapshot was successfully created. Reported and tested by: Peter Holm MFC after: 2 weeks	2007-04-10 09:31:42 +00:00
Konstantin Belousov	9724167c2a	Recalculate the NEWBLOCK flag for pagedep structure after the softdep lock is dropped, since pagedep may be already processed and deallocated. Found and tested by: kris MFC after: 2 weeks	2007-04-10 09:30:41 +00:00
Konstantin Belousov	23743f6a11	When LK_NOWAIT is passed as argument to process_worklist_item(), this does not prevent handle_workitem_remove() from recursing into a blocking version. Add the dirrem to worklist instead of processing it now if this is the case. Reported and tested by: kris Submitted by: tegge MFC after: 2 weeks	2007-04-10 09:28:17 +00:00
Xin LI	04533fc68e	Use *_EMPTY macros when appropriate.	2007-04-04 07:29:53 +00:00
Konstantin Belousov	06f0c8dc4d	Revert rev. 1.205. Replace unconditional acquisition of Giant when QUOTAS are defined with VFS_LOCK_GIANT(NULL) call. This shall fix softdep operation when mpsafe_vfs = 0. Reported and tested by: kris Submitted by: tegge MFC after: 1 week	2007-03-29 08:26:04 +00:00
Konstantin Belousov	36d4667907	Mark UFS as being MP-Safe in "options QUOTA" case too. Remove no longer necessary Giant acquisitions in softdepend processing code. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)	2007-03-20 10:51:45 +00:00
Brian Somers	dd51858d31	When we write extended attributes, assert that the inode hasn't already been deleted. The assertion is important to show that we won't end up accounting for extended attribute blocks (using fs_pendingblocks) in our subsequent call to fs_alloc(). Agreed verbally by: mckusick MFC after: 3 weeks	2007-03-19 18:51:02 +00:00
Konstantin Belousov	088ffd2086	Implement fine-grained locking for UFS quotas. Each struct dquot gets dq_lock mutex to protect dq_flags and to interlock with DQ_LOCK. qhash, dqfreelist and dq.dq_cnt are protected by global dqhlock mutex. i_dquot array for inode is protected by the lockmgr vnode lock, corresponding assert added to the dqget(). Access to struct ufsmount quota-related fields (um_quotas and um_qflags) is protected by um_lock. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith) This work would not have been possible without the enormous amount of help given by Tor Egge and Peter Holm. Tor reviewed each version of the patch, pointed out numerous errors and provided invaluable suggestions. Peter did tireless testing of the patch as it was developed.	2007-03-14 08:54:08 +00:00
Konstantin Belousov	df0f953ae2	Call getinoquota() before allocating new block for the directory to properly account for block allocation. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)	2007-03-14 08:50:27 +00:00
Konstantin Belousov	762c75b209	Remove unneeded getinoquota() call in the ufs_access(). Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)	2007-03-14 08:48:57 +00:00
Tor Egge	61b9d89ff0	Make insmntque() externally visible and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnodes belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently set up. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib	2007-03-13 01:50:27 +00:00
Kirk McKusick	a9093e846d	Move macros describing extended attributes in UFS from <sys/extattr.h> to <ufs/ufs/extattr.h>. Move description of extended attributes in UFS from man9/extattr.9 to man5/fs.5. Note that restore will not compile until <sys/extattr.h> and <ufs/ufs/extattr.h> have been updated. Suggested by: Robert Watson	2007-03-06 08:13:21 +00:00
Pawel Jakub Dawidek	b6f6e672f7	Fix build breakage.	2007-03-01 23:14:46 +00:00
Pawel Jakub Dawidek	7869327cfa	Change: "... try to use VADMIN in preference to VADMIN ..." To: "... try to use VADMIN in preference to VWRITE ..."	2007-03-01 21:44:08 +00:00
Pawel Jakub Dawidek	bb531912ff	Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better describe the privilege. OK'ed by: rwatson	2007-03-01 20:47:42 +00:00
Pawel Jakub Dawidek	3b2eb461e0	Avoid checking for privileges if there is no need to. Discussed with: rwatson	2007-03-01 20:38:24 +00:00
Brian Somers	98fff6b57c	Account for di_blocks allocations when IN_SPACECOUNTED is set in an inode's i_flag. It's possible that after ufs_inactive() calls softdep_releasefile(), i_nlink stays >0 for a considerable amount of time (> 60 seconds here). During this period, any ffs allocation routines that alter di_blocks must also account for the blocks in the filesystem's fs_pendingblocks value. This change fixes an eventual df/du discrepancy that will happen as the result of fs_pendingblocks being reduced to <0. The only manifestation of this that people may recognise is the following message on boot: /somefs: update error: blocks -N files M at which point the negative pending block count is adjusted to zero. Reviewed by: tegge MFC after: 3 weeks	2007-02-23 20:23:35 +00:00
Kirk McKusick	6e6b7d44ef	The functions that set and delete external attributes must check that the filesystem is not mounted read-only before proceeding. Reported by: Ryan Beasley <ryanb@FreeBSD.org> MFC after: 1 week	2007-02-21 08:50:06 +00:00
Robert Watson	95b091d2f2	Rename three quota privileges from the UFS privilege namespace to the VFS privilege namespace: exceedquota, getquota, and setquota. Leave UFS-specific quota configuration privileges in the UFS name space. This renumbers VFS and UFS privileges, so requires rebuilding modules if you are using security policies aware of privilege identifiers. This is likely no one at this point since none of the committed MAC policies use the privilege checks.	2007-02-19 13:33:10 +00:00
Robert Watson	e82d0201bd	Limit quota privileges in jail to PRIV_UFS_GETQUOTA and PRIV_UFS_SETQUOTA.	2007-02-19 13:26:39 +00:00
Kirk McKusick	5a86fe5361	This README file is obsolete. The cited problems were fixed long ago and the code is installed by default so no longer requires action by the administrator to be included.	2007-02-17 08:25:43 +00:00
Pawel Jakub Dawidek	10bcafe9ab	Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs	2007-02-15 22:08:35 +00:00
Konstantin Belousov	32e2b5f1e5	Style(9).	2007-02-15 09:24:58 +00:00
Konstantin Belousov	6a000036fc	Remove unneeded acquisition of the mount interlock around reading of mnt_kern_flags in ufs_itimes(). Suggested by: ssouhlal Confirmed by: tegge MFC after: 2 weeks	2007-02-08 09:47:19 +00:00
Tor Egge	0d86a7f7c2	Call pbgetvp() and pbrelvp() instead of setting b_vp directly. PR: kern/108151	2007-02-04 23:42:02 +00:00
Mike Pritchard	522883b87f	If quotacheck or edquota reset the block or inode grace time for a user or group, when the kernel first sees this, it will update the grace time value. However, it never flags the quota as modified and the updated value never makes it to the quota data file unless the user actually makes some other change that would write the data out. Fixed to flag the quota as modified if the soft limit has actually been reached and should be now enforced.	2007-02-04 06:46:57 +00:00
Mike Pritchard	6c62e3fce9	Prevent quotactl calls that pass in an id of -1 from incorrectly using the callers UID instead of the GID when performing group operations. This could allow users to determine group quota information for groups they are not a member of in some cases. Rename the "uid" parameter in ufs_quotactl to "id" to better show that it is used for more than just the uid, and to be more in line with the naming conventions in the other quota routines. PR: kern/33940	2007-02-01 02:13:53 +00:00
Mike Pritchard	3c0508582d	Disallow negative UIDs when processing quotactl options.	2007-02-01 01:01:56 +00:00
Konstantin Belousov	2cc7d26f7f	Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquisition. Avoid dealing with buffers in bdwrite() that are from the other side of the snaplock divisor in the lock order than the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)	2007-01-23 10:01:19 +00:00
Xin LI	e499c6135c	Fix build. chkdquot() should not return anything.	2007-01-20 13:54:28 +00:00
Mike Pritchard	db9b81eabc	Quota system cleanup. 1) Do not do quota accounting for the actual quota data files or for file system snapshot files ("system" files). This prevents a deadlock described in PR kern/30958 if the kernel ever has to grow the quota file. Snapshot files were already exempt from the quota checks, but this change generalized the check. 2) Fix a cast that caused extremely large uids/gids to incorrectly write the quota information to the data file at a truncated value for a uint_t32 id value. The incorrect cast caused quota files in this case to be around 4GB in size, with the correct cast they can now be 131GB in size. Also related to PR kern/30958. 3) Check for what appear to be negative UIDs/GIDs and not account for them. This prevents the quota files from becoming 131GB in size and causing quotacheck to run forever at bootup. This could also cause the kernel to try and expand the quota file, which might deadlock due to the issue in #1. kern/30958 and kern/38156 (and some much older closed PR's). 4) With the deadlock problems gone, the kernel can now expand the size of the quota database files if it needs to. 5) Pass in the i-node count change value to chkiq and chkiqchg as an int, like it used to be before the common routine was split up into 2 different routines to increase / decrease the i-node in-use count. Prevents an underflow on the i-node count. Related to PR kern/89247. 6) Prevent the block usage from growing slowly if a file system is full and the write was denied due to that fact. PR kern/89247. Some of these changes require an updated quotacheck to prevent the creation of huge (131GB) quota data files (item #3). #1/#4 probably fixes a lot of the random hangs when quotas are enabled, possibly some of the jail hangs.	2007-01-20 11:58:32 +00:00
Mike Pritchard	6a5c532911	Fix a spelling error. heirarchy -> hierarchy. Obtained from: OpenBSD	2007-01-16 19:40:25 +00:00
Mike Pritchard	6192525baf	Fix a spelling error in some comments. heirarchy -> hierarchy. Obtained from: OpenBSD	2007-01-16 19:35:43 +00:00
Robert Watson	8102a9d4d5	Canonicalize copyright: use a date range rather than comma-delimited list. MFC after: 3 days	2007-01-08 17:55:32 +00:00
Kip Macy	2f6a774be4	Change vop_lock handling to allow tracking of callers' file and line for acquisition of lockmgr locks. Approved by: scottl (standing in for mentor rwatson)	2006-11-13 05:51:22 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Konstantin Belousov	2276d0814f	Acquire Giant in the softdep_flush for clear_remove() and clear_inodedeps() processing when QUOTA is set. Reported and tested by: Peter Holm Reviewed by: tegge MFC after: 3 days	2006-11-01 13:48:44 +00:00
Pawel Jakub Dawidek	1a60c7fc8e	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl	2006-10-31 21:48:54 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Konstantin Belousov	ec7a247a24	Do not translate the IN_ACCESS inode flag into the IN_MODIFIED while filesystem is suspending/suspended. Doing so may result in deadlock. Instead, set the (new) IN_LAZYACCESS flag, that becomes IN_MODIFIED when suspend is lifted. Change the locking protocol in order to set the IN_ACCESS and timestamps without upgrading shared vnode lock to exclusive (see comments in the inode.h). Before that, inode was modified while holding only shared lock. Tested by: Peter Holm Reviewed by: tegge, bde Approved by: pjd (mentor) MFC after: 3 weeks	2006-10-10 09:20:54 +00:00
Tor Egge	ad4276811a	Correct check for when IO_SYNC should be set for filesystem not using softupdates when truncating a directory to zero length. Discussed with: bde	2006-10-02 02:08:31 +00:00
Tor Egge	8d0547c68b	Protect change to bo_flag by holding the bufobj mutex.	2006-09-26 04:21:20 +00:00
Tor Egge	e60c361218	Reduce fluctuations of mnt_flag to allow unlocked readers to get a slightly more consistent view.	2006-09-26 04:20:09 +00:00
Tor Egge	9b65c22cf4	Don't restore MNT_QUOTA bit in mnt_flag after snapshot creation, closing a race between nmount() and quotactl().	2006-09-26 04:19:11 +00:00
Tor Egge	55b4ff0d9f	Increase mnt_noasync once in softdep_mount() to disallow async io, closing a window where a file system using softupdates could be async for a short while if both MNT_UPDATE and MNT_ASYNC were passed as flags to nmount(). Add MNTK_SOFTDEP flag to ensure that softdep_mount() doesn't increase mnt_noasync multiple times.	2006-09-26 04:17:17 +00:00
Tor Egge	a1e363f256	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.	2006-09-26 04:15:59 +00:00
Tor Egge	5da56ddb21	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().	2006-09-26 04:12:49 +00:00
Konstantin Belousov	28de2218ec	Fix the glitch introduced in rev. 1.93. In softdep_sync_metadata(), the switch by worklist type contains two for() loops, for D_INDIRDEP and D_PAGEDEP. On error, these loops are exited by break, where the switch actually should be left. Use goto instead of break to reach the error handling code. Reported by: Peter Holm Reviewed by: tegge Approved by: pjd (mentor) MFC after: 2 weeks	2006-09-20 07:49:28 +00:00
Robert Watson	5702e0965e	Declare security and security.bsd sysctl hierarchies in sysctl.h along with other commonly used sysctl name spaces, rather than declaring them all over the place. MFC after: 1 month Sponsored by: nCircle Network Security, Inc.	2006-09-17 20:00:36 +00:00
Konstantin Belousov	3f65847e2f	While checking for update of snapshot file in the ffs_copyonwrite, first filter out metadata update. Otherwise, devfs vnode could be erroneously interpreted as ufs one, causing further check of i_flags to use random memory. PR: kern/100365 Debugged and fix described by: tegge Approved by: pjd (mentor) MFC after: 2 weeks	2006-08-21 17:20:19 +00:00
Pawel Jakub Dawidek	f4cc92c97c	Correct typo in comment.	2006-08-20 10:52:44 +00:00
David E. O'Brien	80cd95f9cc	Rather than print out a nice error message giving details sufficient to fix a 'ufs_dirbad' and then panicking (making it very hard to see the details), put them in the panic message itself.	2006-07-31 15:44:13 +00:00
Stefan Farfeleder	2b8c9fa46b	Drop two unnecessary casts.	2006-07-18 07:03:43 +00:00
Daichi GOTO	55e9893a66	The ufs_lookup.c has a critical bug around the whiteout process. UFS must check a whiteout name when it uses the whiteout, but the current implementation does not check the whiteout name, so sometimes UFS writes over a wrong whiteout. UFS MUST check the whiteout name to use a correct whiteout. This bug leads to a unionfs panic. This commit fixes this trouble. Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer) Reviewed by: tegge & rodrigc (mentor) Approved by: rodrigc (mentor) MFC after: 2 weeks	2006-07-11 17:27:04 +00:00
Pawel Jakub Dawidek	5fe6d2beb4	Declare UFS module version.	2006-07-09 14:11:09 +00:00
Pawel Jakub Dawidek	946478fca6	Change fs->fs_fsmnt to mp->mnt_stat.f_mntonname in warnings about missing MAC and ACLs support in the kernel. If it is a first mount, fs->fs_fsmnt is empty. MFC after: 1 week	2006-07-09 14:10:35 +00:00
Craig Rodrigues	71ac2d7c7c	Check the sectorsize of the underlying disk before trying to bread() the UFS superblock. Should eliminate crashes when trying to do: mount -t ufs on an audio CD. PR: kern/85893 Reported by: Russell Francis <rfrancis at ev dot net> MFC after: 1 week	2006-06-03 21:20:37 +00:00
Maxim Konovalov	e680b88a3d	o Rearrange and remove incorrect comments. Requested by: bde	2006-05-31 15:55:52 +00:00
Maxim Konovalov	3593da9e81	o According to POSIX, the result of ftruncate(2) is unspecified for file types other than VREG, VDIR and shared memory objects. We already handle VREG, VLNK and VDIR cases. Silently ignore truncate requests for all the rest. Adjust comments. PR: kern/98064 Submitted by: bde Security: local DoS Regress. test: regression/fifo/fifo_misc MFC after: 2 weeks	2006-05-31 13:15:29 +00:00
Craig Rodrigues	ee98eb825b	Remove "update" from ffs_opts. It has been moved to global_opts in vfs_mount.c.	2006-05-26 12:44:12 +00:00
Craig Rodrigues	5eb304a91a	Remove calls to vfs_export() for exporting a filesystem for NFS mounting from individual filesystems. Call it instead in vfs_mount.c, after we call VFS_MOUNT() for a specific filesystem.	2006-05-26 00:32:21 +00:00
Craig Rodrigues	4ba8c2a5d3	Take errmsg out of ffs_opts. It is already part of global_opts in vfs_mount.c.	2006-05-24 00:12:21 +00:00
Maxim Konovalov	5d1d31b4b3	o Fix a comment: ufs2_dinode.di_blocks counts blocks not bytes actually held.	2006-05-21 21:55:29 +00:00
Maxim Konovalov	b6893ab299	o Fix a comment: directory whiteout type is DT_WHT not DT_W.	2006-05-21 21:28:34 +00:00
Tom Rhodes	e45269bf51	Provide a less cryptic panic message in place of just "found inode."	2006-05-16 18:51:22 +00:00
Tor Egge	e0cf717542	Read block hints list from last snapshot on the active snapshot list.	2006-05-16 00:14:20 +00:00
Tor Egge	d93d98d98f	Copy last block on file system again after file system has been suspended. Obtained from: NetBSD	2006-05-15 23:18:49 +00:00
Tor Egge	ae5d9f3b1c	Don't leak a locked buffer if last block on file system cannot be read.	2006-05-15 22:59:23 +00:00
Tor Egge	ebb78f64c7	Errors detected while file system is suspended should not trigger an assertion failure.	2006-05-15 22:52:22 +00:00
Tor Egge	b405cb5ea5	Expunge traces of unlinked snapshot files when making a new snapshot.	2006-05-13 20:41:37 +00:00
Tor Egge	4613aa0e99	Bring the call to softdep_releasefile() within the region protected by vn_start_secondary_write() since it might cause file system write activity (e.g. ffs_snapremove()).	2006-05-09 22:33:43 +00:00
Tor Egge	43e07fffb6	ffs_syncvnode() might skip some of the blocks due to them being locked, assuming them to be inflight write buffers. This is not always the case. bufdaemon might hold the buffer lock and give up writing the buffer due to it having dependencies, the file system being suspended or the vnode lock being held by another thread. When bufdaemon decides to write the buffer there is still a window before bufobj_wref() has been called, allowing other threads to believe that the vnode has no dirty buffers or inflight writes. Try harder to flush first block of new subdirectory to get rid of MKDIR_BODY dependency.	2006-05-06 20:51:31 +00:00
Tor Egge	b673e7b7eb	Return error if vnode was reclaimed while it was temporarily unlocked. Add missing calls to vn_finished_write() in error handling.	2006-05-05 21:27:31 +00:00
Tor Egge	0911ecffe7	Turn off disk quotas for snapshot files.	2006-05-05 20:10:04 +00:00
Tor Egge	c7793f61dc	Avoid locking overhead when snapshots are disabled.	2006-05-05 19:58:36 +00:00
Pawel Jakub Dawidek	5b139b2d75	- Set bio_done directly to NULL to indicate that we want to wait for the bio. - Use biowait() instead of copying the code. MFC after: 1 month	2006-05-05 10:06:22 +00:00
Tor Egge	d81daf63bc	Detect the snapshot file being prematurely unlinked.	2006-05-03 00:29:22 +00:00
Tor Egge	868bb88ff2	Temporarily undo clusters contribution to global runningbufspace while handling copy on write for the buffers taking part in the cluster.	2006-05-03 00:10:29 +00:00
Tor Egge	5515ad4282	A side effect of calling runningbufwakeup() is that bp->b_runningbufspace is cleared. Save old value and restore bp->b_runningbufspace before returning from ffs_copyonwrite().	2006-05-03 00:04:38 +00:00
Tor Egge	6d94935d36	Close a race when VOP_LOCK() on a snapshot file is attempted at the same time as it is changed back into a normal file. The locker would get the shared "snaplk" lock which would no longer be the correct lock for the vnode.	2006-05-02 23:52:43 +00:00
Scott Long	cbd6fedbf2	Fix a typo.	2006-04-28 04:39:50 +00:00
Jeff Roberson	6ca9fcc586	- Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child buffers to go on the buf daemon's DIRTYGIANT queue. - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler runs in the context of the buf daemon and may require Giant.	2006-04-28 01:05:31 +00:00
Tom Rhodes	7b3f1bbd61	Revert previous to this file before an actual request is made.	2006-04-22 04:22:15 +00:00
Tom Rhodes	8fc22c9d2e	Remove what I believe are two useless ifdefs. If a user or administrator enables multilabel, or any option for that matter, most likely they have a reason. This will allow users to see that multilabel is enabled via an issued "mount" command and remove an annoying warning - printed only when a MAC kernel is not installed - on boot up. Discussed with: green, brueffer, Samy Al Bahra. Probably ran past: csjp (though I can't remember).	2006-04-21 07:14:25 +00:00
Ken Smith	39fac37953	Fix panic() message to give the right function name.	2006-04-17 07:43:56 +00:00
Tor Egge	68e8466655	Eliminate softdep_flush() livelock by accounting for number of worklist items marked as being in progress.	2006-04-03 22:23:23 +00:00
Jeff Roberson	3bbd6d8ae6	- Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs(). Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.	2006-03-31 03:54:20 +00:00
Tor Egge	700118c72f	Allow compilation when not using softupdates.	2006-03-19 22:16:44 +00:00
Tor Egge	7de3839d0d	Let snapshots make a copy of old contents for all buffers taking part in a cluster instead of just the first buffer. Delay buf_start() calls until snapshots have a copy of old content. PR: kern/93942	2006-03-19 21:43:36 +00:00
Tor Egge	30b3a49fab	Add kludge to avoid deadlock when unlinking snapshot.	2006-03-19 21:29:20 +00:00
Tor Egge	95e7a3c3ac	Reduce probability of unmount failing after having unmounted snapshots.	2006-03-19 21:09:19 +00:00
Tor Egge	8c86028f11	Ensure that vnode for directory isn't reclaimed before ffs_snapshot() has completed expunging unlinked files. It could come back at another memory location causing a lock order reversal.	2006-03-19 21:05:10 +00:00
Jeff Roberson	8db357205c	- Remove the call to softdep_waitidle after suspending the filesystem. This does not do what I wanted as all dirty buffers must be flushed by the call to ffs_sync and any remaining dependency work would mean that this failed. Pointed out by: tegge	2006-03-12 05:26:12 +00:00
Jeff Roberson	2eedeb7e60	- Remove the call to softdep_waitidle after suspending the filesystem. This does not do what I wanted as all dirty buffers must be flushed by the call to ffs_sync and any remaining dependency work would mean that this failed. Pointed out by: tegge	2006-03-12 05:24:14 +00:00
Tor Egge	ca2fa80767	Block secondary writes while expunging active unlinked files. Fix detection of active unlinked files by checking VI_OWEINACT and VI_DOINGINACT in addition to v_usecount. Defer inactive handling for unlinked files if the file system is mostly suspended (secondary writes being blocked). Perform deferred inactive handling after the file system is resumed.	2006-03-11 01:08:37 +00:00
Tor Egge	1e70cd7fc7	Remove unneeded (and broken) usage of MNT_REF()/MNT_REL().	2006-03-10 02:31:12 +00:00
Tor Egge	791dd2fade	Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.	2006-03-08 23:43:39 +00:00
Tor Egge	a695d54404	Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended file systems. This could cause deadlocks when creating snapshots. Reviewed by: jeff	2006-03-08 02:14:39 +00:00
Tor Egge	3b582b4e72	Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.	2006-03-02 22:13:28 +00:00
Jeff Roberson	b9b12498fd	- Acquire lk in softdep_slowdown so that it's owned when we call softdep_speedup(). - Assert that lk is held in softdep_speedup() rather than acquiring it. This avoids a potential lock recursion.	2006-03-02 08:52:53 +00:00
Jeff Roberson	eb2ea10590	- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it. Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris	2006-03-02 05:50:23 +00:00
Jeff Roberson	f5a4db791d	- Using LK_NOWAIT in qsync() can get us into infinite loop situations that lead to deadlocks. Remove it. MFC After: 1 week	2006-02-22 06:12:53 +00:00
Robert Watson	5652c15c24	In quotaoff(), lock the vnode instead of asserting it when manipulating v_vflags. MFC after: 1 week Submitted by: Antoine Brodin <antoine at brodin at laposte dot net>	2006-02-12 13:20:06 +00:00
Robert Watson	4a99d6f90a	Instead of asserting the vnode lock before manipulating v_vflag, acquire it and drop it afterwards. Found by: kris MFC after: 1 week	2006-02-11 21:09:27 +00:00
Jeff Roberson	89b0e10910	- Reorder calls to vrele() after calls to vput() when the vrele is a directory. vrele() may lock the passed vnode, which in these cases would give an invalid lock order of child -> parent. These situations are deadlock prone although do not typically deadlock because the vrele is typically not releasing the last reference to the vnode. Users of vrele must consider it as a call to vn_lock() and order it appropriately. MFC After: 1 week Sponsored by: Isilon Systems, Inc. Tested by: kkenn	2006-02-01 00:25:26 +00:00
Tor Egge	82be0a5a24	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman	2006-01-09 20:42:19 +00:00
Tor Egge	6c62b2acd0	If the lock passed to getdirtybuf() is the softdep lock then the background write completed wakeup could be missed. Close the race by grabbing the lock normally used for protection of bp->b_xflags. Reviewed by: truckman	2006-01-09 19:32:21 +00:00
Tor Egge	c8c7711d66	Broaden scope of softdep_worklist_busy rwlock protection of softdep processing to avoid some dependencies being missed by softdep_flushworklist(). Reviewed by: truckman	2006-01-09 19:16:56 +00:00
Warner Losh	5c65ae3a88	New option: NO_FFS_SNAPSHOT. I did this in p4 about the same time that NetBSD implemented it independently of them (don't know which one was actually first). This saves about 24k for those times you don't need snapshot support (like when running off a ram disk, or in an embedded environment where size matters).	2006-01-06 04:44:09 +00:00
Xin LI	cd34c8b6a2	Typo.	2005-12-23 15:50:57 +00:00
Dag-Erling Smørgrav	0430a5e289	Eradicate caddr_t from the VFS API.	2005-12-14 00:49:52 +00:00
Craig Rodrigues	b6bd025c35	Fix parsing of atime, clusterr, clusterw, exec, suid, symfollow mount options. Noticed by: Amir Shalem < amir at boom dot org dot il>	2005-11-24 15:06:40 +00:00
Craig Rodrigues	cea903627f	If export mount flag is not passed in, set default parameters for export structure and pass that to vfs_export(). Currently in userland mount(8), an export structure is unconditionally passed in, only for UFS. This is an attempt to move that UFS-specific behavior out of mount(8) and into the UFS filesystem code.	2005-11-20 17:04:50 +00:00
Craig Rodrigues	359d438885	Add more options to ffs_opts, so that vfs_filteropts() will not complain when we pass these options to a UFS filesystem as strings via nmount(): noexec, nosuid, nosymfollow, sync, suiddir	2005-11-19 23:28:19 +00:00
Craig Rodrigues	26f59b6455	- Add parsing for the following existing UFS/FFS mount options in the nmount() callpath via vfs_getopt(), and set the appropriate MNT_* flag: -> acls, async, force, multilabel, noasync, noatime, -> noclusterr, noclusterw, snapshot, update - Allow errmsg as a valid mount option via vfs_getopt(), so we can later add a hook to propagate mount errors back to userspace via vfs_mount_error().	2005-11-18 06:06:10 +00:00
Xin LI	fad951e3a0	Slightly reorganize to reduce duplicated code. Reviewed by: rwatson	2005-11-07 18:25:23 +00:00
Paul Saab	e1cef62715	Rate limit filesystem full and out of inodes messages to once a second.	2005-10-31 20:33:28 +00:00
Robert Watson	5bb84bc84b	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.	2005-10-31 15:41:29 +00:00
Xin LI	eb2893ec18	Remove an unneeded "a" from comment.	2005-10-25 19:46:15 +00:00
Nate Lawson	8680d6985f	Adjust maxfilesize for UFS1 and old 4.4 FFS. For UFS1, increase the limit to (max block - 1) * bsize. For DEV_BSIZE, this doubles the limit from 0.5 TB to 1 TB. For the old 4.4 FFS case, decrease the limit from 0.5 TB to 2 GB - 1. Older systems had a 32 bit off_t so they couldn't access the larger files anyway. Collaboration with: bde	2005-10-21 01:54:00 +00:00
Don Lewis	875e108755	Correct the type of the temporary variable used by ufs_lookup.c:1.78 to fix the race condition in the ufs_lookup() ISDOTDOT code. Noticed by: bde MFC after: 12 days	2005-10-16 21:31:46 +00:00
Don Lewis	12d360453c	Close a race in the ufs_lookup() code that handles the ISDOTDOT case by saving the value of dp->i_ino before unlocking the vnode for the current directory and passing the saved value to VFS_VGET(). Without this change, another thread can overwrite dp->i_ino after the current directory is unlocked, causing ufs_lookup() to lock and return the wrong vnode in place of the vnode for its parent directory. A deadlock can occur if dp->i_ino was changed to a subdirectory of the current directory because the root to leaf vnode lock ordering will be violated. A vnode lock can be leaked if dp->i_ino was changed to point to the current directory, which causes the current vnode lock for the current directory to be recursed, which confuses lookup() into calling vrele() when it should be calling vput(). The probability of this bug being triggered seems to be quite low unless the sysctl variable debug.vfscache is set to 0. Reviewed by: jhb MFC after: 2 weeks	2005-10-14 22:13:33 +00:00
Robert Watson	606dcf085f	When performing a VOP_LOOKUP() as part of UFS1 extended attribute auto-start, set cnp.cn_lkflags to LK_EXCLUSIVE. This flag must now be set so that lockmgr knows what kind of lock to acquire, and it will panic if not specified. This resulted in a panic when using extended attributes on UFS1 as of locking work present in the 6.x branch. This is a RELENG_6_0 merge candidate. Reported by: lofi MFC after: 3 days	2005-10-12 14:18:58 +00:00
Diomidis Spinellis	9f5c1d1955	Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks	2005-10-12 06:56:00 +00:00
Tor Egge	48c2ac4539	Avoid unintended VMIO on directories and symlinks due to leftover object not having been destroyed.	2005-10-10 19:02:04 +00:00
Tor Egge	4e0cd00988	Adjust totread argument passed to cluster_read() to account for offset not being block aligned.	2005-10-09 21:11:25 +00:00
Tor Egge	9248a8271c	Don't pretend that a failed sync write was successful.	2005-10-09 20:49:01 +00:00

1598 Commits