freebsd-skq

Author	SHA1	Message	Date
des	604db9f61a	WIP	2009-01-30 13:54:03 +00:00
trasz	d27cdd2fdd	Make sure the cdev doesn't go away while the filesystem is still mounted. Otherwise dev2udev() could return garbage. Reviewed by: kib Approved by: rwatson (mentor) Sponsored by: FreeBSD Foundation	2009-01-29 16:47:15 +00:00
rwatson	c99f201c9e	Following a fair amount of real world experience with ACLs and extended attributes since FreeBSD 5, make the following semantic changes: - Don't update the inode modification time (mtime) when extended attributes (and hence also ACLs) are added, modified, or removed. - Don't update the inode access tie (atime) when extended attributes (and hence also ACLs) are queried. This means that rsync (and related tools) won't improperly think that the data in the file has changed when only the ACL has changed. Note that ffs_reallocblks() has not been changed to not update on an IO_EXT transaction, but currently EAs don't use the cluster write routines so this shouldn't be a problem. If EAs grow support for clustering, then VOP_REALLOCBLKS() will need to grow a flag argument to carry down IO_EXT to UFS. MFC after: 1 week PR: ports/125739 Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: pluknet <pluknet@gmail.com>, Greg Byshenk <freebsd@byshenk.net> Discussed with: kib, kientzle, timur, Alexander Bokovoy <ab@samba.org>	2009-01-27 21:48:47 +00:00
jhb	2939c2f76f	Fix a few style bogons. Submitted by: bde	2009-01-21 20:08:17 +00:00
kib	44eef9d9bb	Move the code from ufs_lookup.c used to do dotdot lookup, into the helper function. It is supposed to be useful for any filesystem that has to unlock dvp to walk to the ".." entry in lookup routine. Requested by: jhb Tested by: pho MFC after: 1 month	2009-01-21 14:51:38 +00:00
jhb	47455a7b41	Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP: VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME can be performed while holding a shared vnode lock (the same functionality is done internally by VOP_READ which can run with a shared vnode lock). Add missing locking of the vnode interlock to the ufs implementation and remove a special note and test from the NFS client about not supporting the feature. Inspired by: ups Tested by: pho	2009-01-21 14:42:00 +00:00
kib	1a9032d339	The r187467 should remove all pages for V_NORMAL case too, because indirect block pages are not removed by the mentioned invocation of the vnode_pager_setsize(). Put a common code into the helper function ffs_pages_remove(). Reported and tested by: dchagin Reviewed by: ups MFC after: 3 weeks	2009-01-20 22:00:19 +00:00
jhb	19686e364d	Add a comment explaining why the "bufwait" / "dirhash" LOR reported by WITNESS will not actually result in a deadlock. Discussed with: kib MFC after: 1 week	2009-01-20 16:35:34 +00:00
kib	ac57d6e717	When extending inode size, we call vnode_pager_setsize(), to have a address space where to put vnode pages, and then call UFS_BALLOC(), to actually allocate new block and map it. When UFS_BALLOC() returns error, sometimes we forget to revert the vm object size increase, allowing for the pages that are not backed by the logical disk blocks. Revert vnode_pager_setsize() back when UFS_BALLOC() failed, for ffs_truncate() and ffs_write(). PR: 129956 Reviewed by: ups MFC after: 3 weeks	2009-01-20 11:30:22 +00:00
kib	cbb8defa10	FFS puts the extended attributes blocks at the negative blocks for the vnode, from -1 down. When vinvalbuf(vp, V_ALT) is done for the vnode, it incorrectly does vm_object_page_remove(0, 0), removing all pages from the underlying vm object, not only the pages that back the extended attributes data. Change vinvalbuf() to not remove any pages from the object when V_NORMAL or V_ALT are specified. Instead, the only in-tree caller in ffs_inode.c:ffs_truncate() that specifies V_ALT explicitely removes the corresponding page range. The V_NORMAL caller does vnode_pager_setsize(vp, 0) immediately after the call to vinvalbuf(V_NORMAL) already. Reported by: csjp Reviewed by: ups MFC after: 3 weeks	2009-01-20 11:27:45 +00:00
kib	ca2e9cb3a9	Lock the uepm_lock around the autostart of extattrs. Reported and tested by: pho Reviewed by: rwatson MFC after: 3 weeks	2009-01-08 12:49:55 +00:00
kib	1f52da301f	If unmount of the ffs mp failed, reinitialize the extended attributes for the mp, and restart them if autostart is enabled. Reported and tested by: pho Reviewed by: rwatson MFC after: 3 weeks	2009-01-08 12:48:27 +00:00
kib	068931a989	Do not busy twice the mount point where a quota operation is performed. Tested by: pho MFC after: 1 month	2008-12-18 12:01:53 +00:00
trasz	b2515a861b	According to phk@, VOP_STRATEGY should never, _ever_, return anything other than 0. Make it so. This fixes "panic: VOP_STRATEGY failed bp=0xc320dd90 vp=0xc3b9f648", encountered when writing to an orphaned filesystem. Reason for the panic was the following assert: KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp)); at vfs_bio:bufstrategy(). Reviewed by: scottl, phk Approved by: rwatson (mentor) Sponsored by: FreeBSD Foundation	2008-12-16 21:13:11 +00:00
kib	01085e085d	The dqrele() function syncs the dq, then acquires the dqh lock, and then does final drop of the the dq reference to put it onto the free list. There is a possibility that the dq would be found by another thread after sync and before the dqh lock is acquired. If that other thread drops the dq before we have taken the dqh lock, the dirty dq is put on the free list. Recheck the DQ_MOD after the dqh lock is relocked. Repeat dqsync() if the dq is dirty. This ensures that up to date dq is written in the quota file and fixes assertion in dqget(). Reported and tested by: Frode Nordahl <frode nordahl net> MFC after: 3 days	2008-12-08 11:04:17 +00:00
kib	868630039f	Improve usefulness of the panic by printing the pointer to the problematic dquot. In-tree gdb is often unable to get the dq value, so supply it in panic message. MFC after: 3 days	2008-12-07 13:25:06 +00:00
kib	294fd952b0	Do not lock vnode interlock around reading of v_iflag to check VI_DOOMED. Read of the pointer is atomic, and flag cannot be set while vnode lock is held. Requested by: jhb MFC after: 1 month	2008-12-02 11:12:50 +00:00
kib	371d0216fb	Busy ufs filesystem around block of code that does ".." lookup. Since mnt_lock is before lock of any vnode on the mp, it uses LK_NOWAIT. Since MNTK_UNMOUNT may be transient, pdp lock is dropped when vfs_busy() failed, and operation is retried after some time. This way, ffs_vget() is not called on the mp that may be in the process of being destroyed by unmount. Check for the VI_DOOMED flag on pdp after its lock is reacquired, to better detect some situations where directory containing ".." entry is removed during the lookup. Reviewed by: tegge, attilio (previous version) Tested by: pho MFC after: 1 month	2008-11-22 13:11:11 +00:00
jhb	f8948bf5ef	Fix typo.	2008-11-19 20:06:59 +00:00
ambrisko	4ee5cbf821	For now on every 10 cyclinder groups flush the buffer cache to free up space. If the buffer cache fills up then the disk systems can grind to a halt. Better tuning can be figured out later. Tested by: Tim, others and work Reviewed by: Kostik Belousov PR: 128832	2008-11-13 17:40:21 +00:00
jhb	9f264a6a75	Quiet a WITNESS warning with the dirhash sx locks by setting the DUPOK flag. Specifically, if two threads race to create a dirhash for a directory, then one might already have created a private dirhash structure (and locked it) when it realizes the directory now has a structure and tries to lock that one.	2008-11-04 18:56:12 +00:00
trasz	5ff0dc16bf	In UFS, when reading EA that contains ACL fails for some reason, include inode number and filesystem name, so the administrator can fix the problem. Approved by: rwatson (mentor)	2008-11-04 12:30:31 +00:00
attilio	e1f493235e	Improve VFS locking: - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho	2008-11-02 10:15:42 +00:00
trasz	0ad8692247	Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)	2008-10-28 13:44:11 +00:00
kib	19fb7eaf9f	Provide an explanation for getinoquota() call in the ufs_access vop. MFC after: 3 days	2008-10-28 12:00:28 +00:00
des	a1e1ad22e0	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
des	66f807ed8b	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
kib	e4785f6af4	Assert that v_holdcnt is non-zero before entering lockmgr in vn_lock and ffs_lock. This cannot catch situations where holdcnt is incremented not by curthread, but I think it is useful. Reviewed by: tegge, attilio Tested by: pho MFC after: 2 weeks	2008-10-20 10:11:33 +00:00
kib	f2993add72	Sync up summary information for cylinder groups while data is already in memory during snapshot creation. This improves the results of the background fsck. Submitted by: tegge MFC after: 1 week	2008-10-13 14:05:01 +00:00
attilio	b8bf37e585	Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-10-10 21:23:50 +00:00
jhb	536c1c47bb	Enable shared lookups on UFS. There are some remaining issues with forced unmounts, but those are in the VFS lookup code are not UFS specific. Tested by: pho, kris	2008-09-24 18:53:04 +00:00
jhb	82d0acb613	Close a race between concurrent calls to ufsdirhash_recycle() and ufsdirhash_free() introduced in my last commit by removing the dirhash about to be free'd in ufsdirhash_free() from the global dirhash list before dropping the sx lock. Tested by: kris	2008-09-22 20:53:22 +00:00
kib	81d455e702	Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don't initialize va_vaflags and va_spare because they are not part of the VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero. Submitted by: Jaakko Heinonen <jh saunalahti fi> Reviewed by: bde Discussed on: freebsd-fs MFC after: 1 month	2008-09-20 19:46:45 +00:00
jhb	281938f04d	Retire the 'i_reclen' field from the in-memory i-node. Previously, during a DELETE lookup operation, lookup would cache the length of the directory entry to be deleted in 'i_reclen'. Later, the actual VOP to remove the directory entry (ufs_remove, ufs_rename, etc.) would call ufs_dirremove() which extended the length of the previous directory entry to "remove" the deleted entry. However, we always read the entire block containing the directory entry when doing the removal, so we always have the directory entry to be deleted in-memory when doing the update to the directory block. Also, we already have to figure out where the directory entry that is being removed is in the block so that we can pass the component name to the dirhash code to update the dirhash. So, instead of passing 'i_reclen' from ufs_lookup() to the ufs_dirremove() routine, just read the 'd_reclen' field directly out of the entry being removed when updating the length of the previous entry in the block. This avoids a cosmetic issue of writing to 'i_reclen' while holding a shared vnode lock. It also slightly reduces the amount of side-band data passed from ufs_lookup() to operations updating a directory via the directory's i-node. Reviewed by: jeff	2008-09-16 19:06:44 +00:00
jhb	020ede1151	Fix a race with shared lookups on UFS. If the the dirhash code reached the cap on memory usage, then shared LOOKUP operations could start free'ing dirhash structures. Without these fixes, concurrent free's on the same directory could result in one of the threads blocked on a lock in a dirhash structure free'd by the other thread. - Replace the lockmgr lock in the dirhash structure with an sx lock. - Use a reference count managed with ufsdirhash_hold()/drop() to determine when to free the dirhash structures. The directory i-node holds a reference while the dirhash is attached to an i-node. Code that wishes to lock the dirhash while holding a shared vnode lock must first acquire a private reference to the dirhash while holding the vnode interlock before acquiring the dirhash sx lock. After acquiring the sx lock, it drops the private reference after checking to see if the dirhash is still used by the directory i-node.	2008-09-16 16:23:56 +00:00
jhb	3a8bef861f	- Only set i_offset in the parent directory's i-node during a lookup for non-LOOKUP operations. - Relax a VOP assertion for a DELETE lookup. rename() uses WANTPARENT instead of LOCKPARENT when looking up the source pathname. ufs_rename() uses a relookup() to lock the parent directory when it decides to finally remove the source path. Thus, it is ok for a DELETE with WANTPARENT set instead of LOCKPARENT to use a shared vnode lock rather than an exclusive vnode lock. Reported by: kris (2) Reviewed by: jeff	2008-09-16 16:18:36 +00:00
jhb	37890de0aa	vdropl() drops the vnode interlock. Thus, the code in the QUOTA case that upgrades the vnode lock if it is share locked was dropping the interlock before actually checking VI_DOOMED. Fix this by do the vdropl() after the check and relying on it to drop the vnode interlock. Reported by: pho Reviewed by: kib MFC after: 1 week	2008-09-16 16:15:38 +00:00
kib	18a3116541	Suspend the write operations on the UFS filesystem being unmounted or remounted from rw to ro. Proposed and reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 11:55:53 +00:00
kib	f6863a9ef7	When attempt is made to suspend a filesystem that is already syspended, wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write() resume when not owning the suspension are not well-defined at best. Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually. Add the thread flag TDP_IGNSUSP that allows to bypass the suspension point in the vn_start_write. It is intended for use by VFS in the situations where the suspender want to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later. Reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 11:51:06 +00:00
kib	0488506405	Add the ffs structures introspection functions for ddb. Show the b_dep value for the buffer in the show buffer command. Add a comand to dump the dirty/clean buffer list for vnode. Reviewed by: tegge Tested and used by: pho MFC after: 1 month	2008-09-16 11:19:38 +00:00
kib	ad287e3d61	When downgrading the read-write mount to read-only, do_unmount() sets MNT_RDONLY flag before the VFS_MOUNT() is called. In ufs_inactive() and ufs_itimes_locked(), UFS verifies whether the fs is read-only by checking MNT_RDONLY, but this may cause loss of the IN_MODIFIED flag for inode on the fs being remounted rw->ro. Introduce UFS_RDONLY() struct ufsmount' method that reports the value of the fs_ronly. The later is set to 1 only after the remount is finished. Reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 10:59:35 +00:00
kib	f67f57a431	The struct inode *ip supplied to softdep_freefile is not neccessary the inode having number ino. In r170991, the ip was marked IN_MODIFIED, that is not quite correct. Mark only the right inode modified by checking inode number. Reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 10:52:25 +00:00
trasz	88f6f49133	When calling extattr_check_cred, use V{READ,WRITE}, not I{READ,WRITE}. Approved by: rwatson (mentor)	2008-09-03 12:46:09 +00:00
attilio	e2ca413d09	Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions. Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>	2008-08-31 14:26:08 +00:00
attilio	dbf35e279f	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
kib	8ccd4bd3a1	In ffs_valloc(), ffs_vget() may fail because insmntque() refused to insert new vnode into the mount vnode list. Then, for the SU-enabled mount, ffs_vfree could create freefile dependency. This dependency can hang around forever since inode is not marked as IN_MODIFIED and correspondingly inodeblock may be not marked as dirty. After ffs_vget() fails, retry with FFSV_FORCEINSMQ, mark the inode as modified, and vput() it immediately. Take care of the dup alloc. Tested by: pho Reviewed by: tegge MFC after: 1 month	2008-08-28 09:19:50 +00:00
kib	2a5baa49d2	Softdep code may need to instantiate vnode when processing dependencies. In particular, it may need this while syncing filesystem being unmounted. Since during unmount MNTK_NOINSMNTQUE flag is set, that could sometimes disallow insertion of the vnode into the vnode mount list, softdep code needs to overwrite the MNTK_NOINSMNTQUE flag. Create the ffs_vgetf() function that sets the VV_FORCEINSMQ flag for new vnode and use it consistently from the softdep code instead of ffs_vget(). Add the retry logic to the softdep_flushfiles() to flush the vnodes that could be instantiated while flushing softdep dependencies. Tested by: pho, kris Reviewed by: tegge MFC after: 1 month	2008-08-28 09:18:20 +00:00
kib	5890c8d4b7	Put the relocked variable from the r182111 into the #ifdef QUOTA braces to prevent warning about unused var on the !QUOTA kernels. Reported by: ed MFC after: 1 week	2008-08-24 19:06:19 +00:00
kib	1eb6d5f22f	Revert the r167541: "Remove unneeded getinoquota() call in the ufs_access()." The call to getinoquota in ufs_access() serves the purpose of instantiating inode dquot from the vn_open(). Since quotas are accounted only for the inodes with already attached dquot, removal of the call prevented opened inodes from participation in the quota calculations. Since ufs_access() may be called with the vnode being only shared locked, upgrade (and then downgrade) vnode lock if calling getinoquota(). Reported by: simon at optinet com In collaboration with: pho MFC after: 1 week	2008-08-24 17:24:22 +00:00
kib	ca3c43733a	Revert r181345. Move the NULL pointer check to the vfs_deleteopt() function. Discussed with: rodrigc MFC after: 3 days	2008-08-10 12:15:36 +00:00
kib	28272d34d6	User may do "mount -o snapshot ...", that causes new FFS mount to be performed with snapshot option, while the mp->mnt_opt is NULL. Protect against NULL pointer dereference. Noted by: Mateusz Guzik <mjguzik gmail com> MFC after: 3 days	2008-08-06 14:47:19 +00:00
des	f42aa30836	ufsmount.h uses "struct\tfoo bar;", except where it doesn't. quota.h uses "struct foo\tbar;", except where it doesn't. Try to make them both agree with themselves (though not with eachother)	2008-08-05 15:24:07 +00:00
des	c4f7fdb253	Whitespace, prototypes	2008-08-05 10:25:55 +00:00
jhb	69486dc077	Whitespace tweak.	2008-07-30 21:07:56 +00:00
kib	167593c766	The ffs_balloc_ufs{1,2} functions call bdwrite() while having several vnode buffers locked at once. In particular, there are indirect buffers among locked ones. The bdwrite() may start the flushing to keep dirty buffer list at the bounds. If any buffer on the dirty list requires translation from logical to physical block number, code may ends up trying to lock an indirect buffer already locked in ffs_balloc_ufsX. Prevent the bdflush() activity when several buffers are locked at once by setting the TDP_INBDFUSH for the problematic code blocks. Reported and tested by: pho, Josef Buchsteiner at Juniper In collaboration with: kan MFC after: 1 month	2008-07-23 14:32:44 +00:00
pjd	5373c91a91	Say hi to svn, by simplifing ffs_vget() function a bit - there is no need for a variable that is used only once.	2008-07-19 22:29:44 +00:00
rodrigc	4e4b0a667c	Fix comments to replace SBSIZE with SBLOCKSIZE, since SBSIZE was renamed to SBLOCKSIZE in version 1.33 Reviewed by: mckusick	2008-05-24 20:44:14 +00:00
rodrigc	c1896c3fec	After converting the "snapshot" mount option to the MNT_SNAPSHOT flag, delete "snapshot" from the persistent mount options list. This should fix problems with doing a mount -o snapshot of a file system, followed by an NFS export of the same file system. PR: 122833 Reported by: Leon Kos <leon.kos lecad fs uni-lj si>, Jaakko Heinonen <jh saunalahti fi> MFC after: 1 month	2008-05-24 00:41:32 +00:00
rodrigc	44e2059da2	For the following mount options, do not perform the string to flag conversions here, because we already do them further up in vfs_donmount() in vfs_mount.c async -> MNT_ASYNC force -> MNT_FORCE multilabel -> MNT_MULTILABEL noatime -> MNT_NOATIME noclusterr -> MNT_NOCLUSTERR noclusterw -> MNT_NOCLUSTERW MFC after: 1 month	2008-05-24 00:02:12 +00:00
ups	fbd329664f	Allow VM object creation in ufs_lookup. (If vfs.vmiodirenable is set) Directory IO without a VM object will store data in 'malloced' buffers severely limiting caching of the data. Without this change VM objects for directories are only created on an open() of the directory. TODO: Inline test if VM object already exists to avoid locking/function call overhead. Tested by: kris@ Reviewed by: jeff@ Reported by: David Filo	2008-05-20 19:05:43 +00:00
jeff	057cb45df3	- Use a local variable for i_ino in ufs_lookup. It is only used to communicate between two parts of this one function. This was causing problems with shared lookups as each would trash the ino value in the inode. - Remove the unused i_ino field from the inode structure.	2008-04-22 12:34:16 +00:00
kib	52243403eb	Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks	2008-04-16 11:33:32 +00:00
jeff	8facfb0569	- Use a lockmgr lock rather than a mtx to protect dirhash. This lock may be held for the duration of the various dirhash operations which avoids many complex unlock/lock/revalidate sequences. - Permit shared locks on lookup. To protect the ip->i_dirhash pointer we use the vnode interlock in the shared case. Callers holding the exclusive vnode lock can run without fear of concurrent modification to i_dirhash. - Hold an exclusive dirhash lock when creating the dirhash structure for the first time or when re-creating a dirhash structure which has been recycled. Tested by: kris, pho	2008-04-11 09:48:12 +00:00
jeff	fef79bb2db	- cache dp->i_offset in the local 'i_offset' variable for use in loop indexes so directory lookup becomes shared lock safe. In the modifying cases an exclusive lock is held here so the commit routine may rely on the state of i_offset. - Similarly handle i_diroff by fetching at the start and setting only once the operation is complete. Without the exclusive lock these are only considered hints. - Assert that an exclusive lock is held when we're preparing for a commit routine. - Honor the lock type request from lookup instead of always using exclusive locking. Tested by: pho, kris	2008-04-11 09:44:25 +00:00
pjd	e83dd39a00	Correct function name in panic(). Reported by: kensmith	2008-04-07 18:12:37 +00:00
attilio	07441f19e1	Optimize lockmgr in order to get rid of the pool mutex interlock, of the state transitioning flags and of msleep(9) callings. Use, instead, an algorithm very similar to what sx(9) and rwlock(9) alredy do and direct accesses to the sleepqueue(9) primitive. In order to avoid writer starvation a mechanism very similar to what rwlock(9) uses now is implemented, with the correspective per-thread shared lockmgrs counter. This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and lockmgr_args_rw(). These two are like the 2 "normal" versions, but they both accept a rwlock as interlock. In order to realize this, the general lockmgr manager function "__lockmgr_args()" has been implemented through the generic lock layer. It supports all the blocking primitives, but currently only these 2 mappers live. The patch drops the support for WITNESS atm, but it will be probabilly added soon. Also, there is a little race in the draining code which is also present in the current CVS stock implementation: if some sharers, once they wakeup, are in the runqueue they can contend the lock with the exclusive drainer. This is hard to be fixed but the now committed code mitigate this issue a lot better than the (past) CVS version. In addition assertive KA_HELD and KA_UNHELD have been made mute assertions because they are dangerous and they will be nomore supported soon. In order to avoid namespace pollution, stack.h is splitted into two parts: one which includes only the "struct stack" definition (_stack.h) and one defining the KPI. In this way, newly added _lockmgr.h can just include _stack.h. Kernel ABI results heavilly changed by this commit (the now committed version of "struct lock" is a lot smaller than the previous one) and KPI results broken by lockmgr_rw() / lockmgr_args_rw() introduction, so manpages and __FreeBSD_version will be updated accordingly. Tested by: kris, pho, jeff, danger Reviewed by: jeff Sponsored by: Google, Summer of Code program 2007	2008-04-06 20:08:51 +00:00
kib	eff8c6d35e	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
jeff	1968343329	- Since rev 1.142 of ffs_snapshot.c the interlock has not been required to protect the v_lock pointer. Removing the interlock acquisition here allows vn_lock() to proceed without requiring the interlock at all. - If the lock mutated while we were sleeping on it the interlock has been dropped. It is conceivable that the upper layer code was relying on the interlock and LK_NOWAIT to protect the identity or state of the vnode while acquiring the lock. In this case return EBUSY rather than trying the new lock to prevent potential races. Reviewed by: tegge	2008-03-31 07:55:45 +00:00
jeff	5e4f326d87	- Don't free snapdata structures when they are no longer in use. Keeping the lockmgr lock valid allows us to switch the v_lock pointer in snapshot vnodes between the embedded lockmgr lock and snapdata lock without needing the vnode interlock to protect against races - Keep unused snapdata structures in a list. - Add a function to lock the devvp and allocate a snapdata to it or acquire a new one without races. The old function was safe from creation races because we set the mount flag when creating snapshots and thus serializing them. However, it might have been subject to destroying races. Reviewed by: tegge	2008-03-31 07:47:08 +00:00
jhb	20cadd93f0	Fix a nit with the 'nofoo' options where 'foo' is mapped to 'nonofoo' (such as 'atime' vs 'noatime'). The filesystems will always see either 'nofoo' or 'nonofoo', never plain 'foo'. As such, their list of valid mount options should include 'nofoo' instead of 'foo'. With this fix, you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as noatime without getting an error. You can also update a noatime FFS filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime' using nmount(2) (e.g. 7.x /sbin/mount binary). MFC after: 1 week Reviewed by: crodig	2008-03-26 20:48:07 +00:00
dfr	79d2dfdaa6	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. * Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks	2008-03-26 15:23:12 +00:00
kib	5ddf5664cc	Yield the cpu in the kernel while iterating the list of the vnodes belonging to the mountpoint. Also, yield when in the softdep_process_worklist() even when we are not going to sleep due to buffer drain. It is believed that the ULE fixed the problem [1], but the yielding seems to be needed at least for the 4BSD case. Discussed: on stable@, with bde Reviewed by: tegge, jeff [1] MFC after: 2 weeks	2008-03-23 13:45:24 +00:00
jeff	a9d123c3ab	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)	2008-03-22 09:15:16 +00:00
kib	04661caa35	Reduce the acquisition of the vnode interlock in the ffs_read() and ffs_extread() when setting the IN_ACCESS flag by checking whether the IN_ACCESS is already set. The possible race there is admissible. Tested by: pho Submitted by: jeff	2008-03-21 12:33:00 +00:00
jeff	46f09d5bc3	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
rwatson	877d7c65ba	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
cokane	1dbd762cae	Replace the non-MPSAFE timeout(9) API in ffs_softdep.c with the MPSAFE callout_* API (e.g. callout_init_mtx(9)). This was one of the numerous items on the http://wiki.freebsd.org/SMPTODO list. Reviewed by: imp, obrien, jhb MFC after: 1 week	2008-03-13 20:15:48 +00:00
emaste	747c71e58b	Remove include of opt_quota.h; as of revision 1.205 there is no longer any #ifdef QUOTA conditional code.	2008-03-10 18:44:07 +00:00
kib	e378ea9934	Initialize mnt_stat.f_iosize before autostarting UFS1 extattrs. It is normally initialized by ffs_statfs() after ffs_mount finished. The extattr autostart code calls the ufs_lookup(), that uses value above to iterate over the directory blocks, see bmask initialization in the ufs_lookup() and ufsdirhash. Having the filesystem with root directory spanning more then one block would result in reading a random kernel memory. PR: kern/120781 Test case provided by: rwatson MFC after: 1 week	2008-03-05 16:34:03 +00:00
rwatson	48aef5a3f0	Continue on-going campaign to replace lockmgr locks with sx locks where the specific semantics of ockmgr aren't required: update UFS1 extended attributes to protect its data structures using an sx lock. While here, update comments on lock granularity. MFC after: 2 weeks	2008-03-04 12:50:11 +00:00
rwatson	96767d9090	Move setting of MNTK_MPSAFE flag before UFS1 extended attribute auto-start so that the flag is set before we start performing I/O in the auto-start routine. MFC after: 2 weeks Suggested by: kib	2008-03-04 12:10:03 +00:00
rwatson	ba77105862	Don't auto-start or allow extattrctl for UFS2 file systems, as UFS2 has native extended attributes. This didn't interfere with the operation of UFS2 extended attributes, but the code shouldn't be running for UFS2. MFC after: 2 weeks	2008-03-02 22:52:14 +00:00
keramida	41eb7744b1	Minor typo nit.	2008-02-25 19:31:44 +00:00
attilio	4014b55830	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-25 18:45:57 +00:00
attilio	0d54671a48	Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.	2008-02-24 16:38:58 +00:00
attilio	265cb5fb91	- Introduce lockmgr_args() in the lockmgr space. This function performs the same operation of lockmgr() but accepting a custom wmesg, prio and timo for the particular lock instance, overriding default values lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo. - Use lockmgr_args() in order to implement BUF_TIMELOCK() - Cleanup BUF_LOCK() - Remove LK_INTERNAL as it is nomore used in the lockmgr namespace Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-15 21:04:36 +00:00
attilio	7213f4c32b	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo	2008-01-24 12:34:30 +00:00
attilio	caa2ca048b	- Introduce the function lockmgr_recursed() which returns true if the lockmgr lkp, when held in exclusive mode, is recursed - Introduce the function BUF_RECURSED() which does the same for bufobj locks based on the top of lockmgr_recursed() - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicative and SMP-compliant way. This allows us to axe out BUF_REFCNT() and leaving the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup. KPI results, obviously, broken so further commits will update manpages and freebsd version. Tested by: kris (on UFS and NFS)	2008-01-19 17:36:23 +00:00
attilio	71b7824213	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
attilio	18d0a0dd51	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
kib	149cd5b092	ffs_balloc_ufsX() routines, in the case of recovering from the failed allocation, free the indirect blocks before clearing the disk pointers, that could lead to the softupdate inconsistencies in the case of the machine or disk crash at the wrong time. Rearrange the recover code to do the ffs_blkfree() after the second ffs_syncvnode(), that clears the pointers chain. Proposed and reviewed by: tegge Tested by: Peter Holm MFC after: 3 weeks	2008-01-03 12:28:57 +00:00
obrien	51625bf288	style(9)	2008-01-02 01:19:17 +00:00
kib	c9fb56ffc8	The ffs_balloc() routines, whan allocating the indirect blocks for the inode, do the rollback in case the allocation failed (due to insufficient free space or quota limits). But, the code does leaves the buffers corresponding to the inoirect blocks on the vnode bufobj list. This causes several assertion failures (for instance, "ffs_truncate3" in ffs_truncate()) to fail, and could result in the indirect block aliasing problem, like writing the context of such blocks to random disk location. Remove the buffers from the bufobj properly. Reported and tested by: Peter Holm Reviewed by: tegge MFC after: 3 weeks	2007-12-29 13:31:27 +00:00
kensmith	66cb6fd44f	Fix a broken check that recently became more annoying because it now gets enabled when INVARIANTS is on instead of DIAGNOSTIC (which apparently nobody uses). From Tor's description: This happens when the block range spans two block maps, the first in the inode (mapping up to NDADDR direct blocks) and the second being the first indirect block. The current check assumes that both block maps are indirect blocks. Work done by: tegge Tested by: kris, kensmith	2007-12-01 13:12:43 +00:00
ru	1f9b2f54c8	Fix build without INVARIANTS and update a comment to match a change made in previous revision.	2007-11-09 11:04:36 +00:00
obrien	1ae16b4e64	Turn most ffs 'DIAGNOSTIC's into INVARIANTS.	2007-11-08 17:21:51 +00:00
rwatson	60570a92bf	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
julian	51d643caa6	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
alfred	3a60df401c	Get rid of qaddr_t. Requested by: bde	2007-10-16 10:54:55 +00:00
bz	621c3a5b99	Fix a DIV0 in case a large value for fs_avgfilesize or fs_avgfpdir is given (with newfs or tunefs) and dirsize overflows. In case dirsize is <= 0 because of an overflow set maxcontigdirs to 0 so it will be 1 later. This is what would happen for large fs_avgfilesize. [1] Identified with help from: roberto, pjd Submitted by: pjd [1] Approved by: re (rwatson) MFC after: 8 days	2007-09-10 14:12:29 +00:00
rodrigc	85690ea861	Perform range check before allocating memory when reading extended attributes. Reviewed by: kib Approved by: re (hrs) PR: 114389	2007-07-13 18:51:08 +00:00
peter	5c6fefddf1	Fix an annoying pointer/int cast warning that shows up on 64 bit systems. Approved by: re	2007-07-02 01:31:43 +00:00
kib	2f486f25b6	Fix livelock that could occur when snapshoting UFS with quotas, where some quota limit was exceeded. Sequence of UFS_VALLOC()/UFS_VFREE() call there could cause inodeblock to have both freefile and inodedep dependencies without any inode in the block being marked for write. Then, softdep_check_suspend() would return EAGAIN forewer. Force write of inodeblock with allocated freefile softdependency by setting IN_MODIFIED flag in softdep_freefile and unconditionally calling UFS_UPDATE() in ufs_reclaim. Reported by: kris Debug help and tested by: Peter Holm Approved by: re (kensmith) MFC after: 3 weeks	2007-06-22 13:22:37 +00:00
rwatson	00b02345d4	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
jeff	91d1501790	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
kib	17260ba6f1	Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file: part 2. Convert calls missed in the first big commit. Noted by: rwatson Pointy hat to: kib	2007-06-01 14:33:11 +00:00
jeff	a7a8bac81f	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
kib	f13486a222	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
pjd	ee9e45ce7d	- Remove unnecessary vnode internal locking - v_vflag is protect by vnode's lock (not vnode's interlock). - Simplify code a bit.	2007-05-28 00:28:15 +00:00
pjd	1e693d6e52	Eliminate VI_LOCK()/VI_UNLOCK() pair from getattr and close code paths. It's hard to measure performance improvement on my test machine, but the change won't degrade performance for sure. I can measure slight improvement for debugging kernel and it can also be a win for machines where atomic operation is more expensive. Reviewed by: kib	2007-05-23 11:06:09 +00:00
kib	162fa8dc6d	Since renaming of vop_lock to _vop_lock, pre- and post-condition function calls are no more generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.	2007-05-18 13:02:13 +00:00
thompsa	25f570bc48	Add a newline to the printf message.	2007-05-03 22:39:52 +00:00
kib	0f0ec54312	Fix the NAMEI zone leak when snapshot was successfully created. Reported and tested by: Peter Holm MFC after: 2 weeks	2007-04-10 09:31:42 +00:00
kib	859db1e740	Recalculate the NEWBLOCK flag for pagedep structure after the softdep lock is dropped, since pagedep may be already processed and deallocated. Found and tested by: kris MFC after: 2 weeks	2007-04-10 09:30:41 +00:00
kib	61f6e27c62	When LK_NOWAIT is passed as argument to process_worklist_item(), this does not prevent handle_workitem_remove() from recursing into a blocking version. Add the dirrem to worklist instead of processing it now if this is the case. Reported and tested by: kris Submitted by: tegge MFC after: 2 weeks	2007-04-10 09:28:17 +00:00
delphij	5a4b35079b	Use *_EMPTY macros when appropriate.	2007-04-04 07:29:53 +00:00
kib	951894c128	Revert rev. 1.205. Replace unconditional acquision of Giant when QUOTAS are defined with VFS_LOCK_GIANT(NULL) call. This shall fix softdep operation when mpsafe_vfs = 0. Reported and tested by: kris Submitted by: tegge MFC after: 1 week	2007-03-29 08:26:04 +00:00
kib	7f02b9589e	Mark UFS as being MP-Safe in "options QUOTA" case too. Remove no more neccessary Giant acquisions in softdepend processing code. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)	2007-03-20 10:51:45 +00:00
brian	c235d1165c	When we write extended attributes, assert that the inode hasn't already been deleted. The assertion is important to show that we won't end up accounting for extended attribute blocks (using fs_pendingblocks) in our subsequent call to fs_alloc(). Agreed verbally by: mckusick MFC after: 3 weeks	2007-03-19 18:51:02 +00:00
kib	3f7d43feef	Implement fine-grained locking for UFS quotas. Each struct dquot gets dq_lock mutex to protect dq_flags and to interlock with DQ_LOCK. qhash, dqfreelist and dq.dq_cnt are protected by global dqhlock mutex. i_dquot array for inode is protected by lockmgr' vnode lock, corresponding assert added to the dqget(). Access to struct ufsmount quota-related fields (um_quotas and um_qflags) is protected by um_lock. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith) This work were not possible without enormous amount of help given by Tor Egge and Peter Holm. Tor reviewed each version of patch, pointed out numerous errors and provided invaluable suggestions. Peter did tireless testing of the patch as it was developed.	2007-03-14 08:54:08 +00:00
kib	104c10948a	Call getinoquota() before allocating new block for the directory to properly account for block allocation. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)	2007-03-14 08:50:27 +00:00
kib	5db0b75a18	Remove unneeded getinoquota() call in the ufs_access(). Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)	2007-03-14 08:48:57 +00:00
tegge	214bc5723c	Make insmntque() externally visibile and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnode belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib	2007-03-13 01:50:27 +00:00
mckusick	e5953785d0	Move macros describing extended attributes in UFS from <sys/extattr.h> to <ufs/ufs/extattr.h>. Move description of extended attributes in UFS from man9/extattr.9 to man5/fs.5. Note that restore will not compile until <sys/extattr.h> and <ufs/ufs/extattr.h> have been updated. Suggested by: Robert Watson	2007-03-06 08:13:21 +00:00
pjd	d38fdb51a0	Fix build breakage.	2007-03-01 23:14:46 +00:00
pjd	23ac3fc28a	Change: "... try to use VADMIN in preference to VADMIN ..." To: "... try to use VADMIN in preference to VWRITE ..."	2007-03-01 21:44:08 +00:00
pjd	e544923453	Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better describe the privilege. OK'ed by: rwatson	2007-03-01 20:47:42 +00:00
pjd	9558665f1e	Avoid checking for privileges if there is no need to. Discussed with: rwatson	2007-03-01 20:38:24 +00:00
brian	c3843b2cca	Account for di_blocks allocations when IN_SPACECOUNTED is set in an inode's i_flag. It's possible that after ufs_infactive() calls softdep_releasefile(), i_nlink stays >0 for a considerable amount of time (> 60 seconds here). During this period, any ffs allocation routines that alter di_blocks must also account for the blocks in the filesystem's fs_pendingblocks value. This change fixes an eventual df/du discrepency that will happen as the result of fs_pendingblocks being reduced to <0. The only manifestation of this that people may recognise is the following message on boot: /somefs: update error: blocks -N files M at which point the negative pending block count is adjusted to zero. Reviewed by: tegge MFC after: 3 weeks	2007-02-23 20:23:35 +00:00
mckusick	96503737f7	The functions that set and delete external attributes must check that the filesystem is not mounted read-only before proceeding. Reported by: Ryan Beasley <ryanb@FreeBSD.org> MFC after: 1 week	2007-02-21 08:50:06 +00:00
rwatson	d298e8c0c2	Rename three quota privileges from the UFS privilege namespace to the VFS privilege namespace: exceedquota, getquota, and setquota. Leave UFS-specific quota configuration privileges in the UFS name space. This renumbers VFS and UFS privileges, so requires rebuilding modules if you are using security policies aware of privilege identifiers. This is likely no one at this point since none of the committed MAC policies use the privilege checks.	2007-02-19 13:33:10 +00:00
rwatson	58e926bc94	Limit quota privileges in jail to PRIV_UFS_GETQUOTA and PRIV_UFS_SETQUOTA.	2007-02-19 13:26:39 +00:00
mckusick	368de56c4b	This README file is obsolete. The cited problems were fixed long ago and the code is installed by default so no longer requires action by the administrator to be included.	2007-02-17 08:25:43 +00:00
pjd	cb2d7c85a8	Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs	2007-02-15 22:08:35 +00:00
kib	80c0e5fb98	Style(9).	2007-02-15 09:24:58 +00:00
kib	08a6b49351	Remove not needed acquision of the mount interlock aroung reading of mnt_kern_flags in ufs_itimes(). Suggested by: ssouhlal Confirmed by: tegge MFC after: 2 weeks	2007-02-08 09:47:19 +00:00
tegge	06da132002	Call pbgetvp() and pbrelvp() instead of setting b_vp directly. PR: kern/108151	2007-02-04 23:42:02 +00:00
mpp	2ba513e47f	If quotacheck or edquota reset the block or inode grace time for a user or group, when the kernel first sees this, it will update the grace time value. However, it never flags the quota as modified and the updated value never makes it to the quota data file unless the user actually makes some other change that would write the data out. Fixed to flag the quota as modified if the soft limit has actually been reached and should be now enforced.	2007-02-04 06:46:57 +00:00
mpp	261dbc8078	Prevent quotactl calls that pass in an id of -1 from incorrectly using the callers UID instead of the GID when performing group operations. This could allow users to determine group quota information for groups they are not a member of in some cases. Rename the "uid" parameter in ufs_quotactl to "id" to better show that it is used for more than just the uid, and to be more in line with the naming conventions in the other quota routines. PR: kern/33940	2007-02-01 02:13:53 +00:00
mpp	3cdb06d461	Disallow negative UIDs when processing quotactl options.	2007-02-01 01:01:56 +00:00
kib	fdd50404d1	Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquision. Avoid dealing with buffers in bdwrite() that are from other side of snaplock divisor in the lock order then the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)	2007-01-23 10:01:19 +00:00
delphij	49f7e5db02	Fix build. chkdquot() should not return anything.	2007-01-20 13:54:28 +00:00
mpp	0f6ed07b89	Quota system cleanup. 1) Do not do quota accounting for the actual quota data files or for file system snapshot files ("system" files). This prevents a deadlock descibed in PR kern/30958 if the kernel ever has to grow the quota file. Snapshot files were already exempt from the quota checks, but this change generalized the check. 2) Fix a cast that caused extremely large uids/gids to incorrectly write the quota information to the data file at a truncated value for a uint_t32 id value. The incorrect cast caused quota files in this case to be around 4GB in size, with the correct cast they can now be 131GB in size. Also related to PR kern/30958. 3) Check for what appear to be negative UIDs/GIDs and not account for them. This prevents the quota files from becoming 131GB in size and causing quotacheck to run forever at bootup. This could also cause the kernel to try and expand the quota file, which might deadlock due to the issue in #1. kern/30958 and kern/38156 (and some much older closed PR's). 4) With the deadlock problems gone, the kernel can now expand the size of the quota database files if it needs to. 5) Pass in the i-node count change value to chkiq and chkiqchg as an int, like it used to be before the common routine was split up into 2 different routines to increase / decrease the i-node in-use count. Prevents an underflow on the i-node count. Related to PR kern/89247. 6) Prevent the block usage from growing slowly if a file system is full and the write was denied due to that fact. PR kern/89247. Some of these changes require an updated quotacheck to prevent the creation of huge (131GB) quota data files (item #3). #1/#4 probably fixes a lot of the random hangs when quotas are enabled, possibly some of the jail hangs.	2007-01-20 11:58:32 +00:00
mpp	be61542c54	Fix a spelling error. heirarchy -> hierarchy. Obtained from: OpenBSD	2007-01-16 19:40:25 +00:00
mpp	5c20abfae9	Fix a spelling error in some comments. heirarchy -> hierarchy. Obtained from: OpenBSD	2007-01-16 19:35:43 +00:00
rwatson	7a85fd85df	Canonicalize copyright: use a date range rather than comma-delimited list. MFC after: 3 days	2007-01-08 17:55:32 +00:00
kmacy	0c00ea16db	change vop_lock handling to allowing tracking of callers' file and line for acquisition of lockmgr locks Approved by: scottl (standing in for mentor rwatson)	2006-11-13 05:51:22 +00:00
rwatson	10d0d9cf47	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
kib	1776bc8845	Aquire Giant in the softdep_flush for clear_remove() and clear_inodedeps() processing when QUOTA is set. Reported and tested by: Peter Holm Reviewed by: tegge MFC after: 3 days	2006-11-01 13:48:44 +00:00
pjd	036e929548	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl	2006-10-31 21:48:54 +00:00

1 2 3 4 5 ...

1687 Commits