freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	48b05c3f82	Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho	2008-04-08 09:45:49 +00:00
Konstantin Belousov	57b4252e45	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Attilio Rao	81c794f998	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-25 18:45:57 +00:00
Attilio Rao	628f51d275	Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.	2008-02-24 16:38:58 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Pawel Jakub Dawidek	b4d7e2983c	Fix some locking cases where we ask for exclusively locked vnode, but we get shared locked vnode in instead when vfs.lookup_shared is set to 1. Discussed with: kib, kris Tested by: kris Approved by: re (kensmith)	2007-09-21 10:16:56 +00:00
Robert Watson	e1e8f51b85	Universally adopt most conventional spelling of acquire.	2007-05-27 20:50:23 +00:00
Robert Watson	5e3f7694b1	Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff	2007-04-04 09:11:34 +00:00
Robert Watson	e92d773fbc	Rather than ignoring any error return from getnewvnode() in nameiinit(), explicitly test and panic. This should not ever happen, but if it does, this is a preferred failure mode to a NULL pointer dereference in kernel. Coverity CID: 1716 Found with: Coverity Prevent(tm)	2007-03-31 16:08:50 +00:00
Konstantin Belousov	478a8db4ce	If both ISDOTDOT and NOCROSSMOUNT are set then lookup() might breaks out of the special handling for ".." and perform an ISDOTDOT VOP_LOOKUP() for a filesystem root vnode. Handle this case inside lookup(). Submitted by: tegge PR: 92785 MFC after: 1 week	2007-02-15 09:53:49 +00:00
Konstantin Belousov	7f92c4ee02	Below is slightly edited description of the LOR by Tor Egge: -------------------------- [Deadlock] is caused by a lock order reversal in vfs_lookup(), where [some] process is trying to lock a directory vnode, that is the parent directory of covered vnode) while holding an exclusive vnode lock on covering vnode. A simplified scenario: root fs var fs / A / (/var) D /var B /log (/var/log) E vfs lock C vfs lock F Within each file system, the lock order is clear: C->A->B and F->D->E When traversing across mounts, the system can choose between two lock orders, but everything must then follow that lock order: L1: C->A->B \| +->F->D->E L2: F->D->E \| +->C->A->B The lookup() process for namei("/var") mixes those two lock orders: VOP_LOOKUP() obtains B while A is held vfs_busy() obtains a shared lock on F while A and B are held (follows L1, violates L2) vput() releases lock on B VOP_UNLOCK() releases lock on A VFS_ROOT() obtains lock on D while shared lock on F is held vfs_unbusy() releases shared lock on F vn_lock() obtains lock on A while D is held (violates L1, follows L2) dounmount() follows L1 (B is locked while F is drained). Without unmount activity, vfs_busy() will always succeed without blocking and the deadlock isn't triggered (the system behaves as if L2 is followed). With unmount, you can get 4 processes in a deadlock: p1: holds D, want A (in lookup()) p2: holds shared lock on F, want D (in VFS_ROOT()) p3: holds B, want drain lock on F (in dounmount()) p4: holds A, want B (in VOP_LOOKUP()) You can have more than one instance of p2. The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode. - Tor Egge To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp is actually not used by the callers of namei. Thus, placeholder deadfs vnode vp_crossmp is introduced that is filled into ni_dvp. Idea by: ups Reviewed by: tegge, ups, jeff, rwatson (mac interaction) Tested by: Peter Holm MFC after: 2 weeks	2007-01-22 11:25:22 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Mohan Srinivasan	3c5b80d6c2	Fix for a potential bug caught by Coverity. Pointed out to me by Kris Kennaway.	2006-09-14 17:57:02 +00:00
Mohan Srinivasan	7d7d9e2242	Fixes up the handling of shared vnode lock lookups in the NFS client, adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@	2006-09-13 18:39:09 +00:00
Robert Watson	5111b5e180	Remove register, use ANSI function headers.	2006-08-05 21:40:59 +00:00
Robert Watson	12de451046	We now spell "inode" as "vnode" in the VFS layer, so update comment for new world order. MFC after: 3 days Pointed out by: mckusick	2006-08-05 21:08:47 +00:00
Kris Kennaway	cef31ff7d9	Lock giant when assigning ni_vp and keep vfslocked state valid. Committed for: jeff	2006-04-29 07:13:49 +00:00
Jeff Roberson	4b5b86816c	- Consistently track ni_dvp and ni_vp with dvfslocked and vfslocked rather than trying to optimize it into a single lock. This adds more calls to lock giant with non smpsafe filesystems but is the only way to reliably hold the correct lock. - Remove an invalid assert in the mountedhere case in lookup and fix the code to properly deal with the scenario. We can actually have a lookup that returns dp == dvp with mountedhere set with certain unmount races. Tested by: kris Reported by: kris/mohans	2006-04-28 00:59:48 +00:00
Jeff Roberson	fdf86b2dcd	- LK_RETRY means nothing when passed to VOP_LOCK. Call vn_lock instead. - Move the vn_lock of the dvp until after we've unbusied the filesystem to avoid a LOR with the mount point lock. - In the v_mountedhere while loop we acquire a new instance of giant each time through without releasing the first. This would cause us to leak Giant. Sponsored by: Isilon Systems, Inc.	2006-03-31 02:59:23 +00:00
Jeff Roberson	2f0bca553a	- Don't check v_mount for NULL to determine if a vnode has been recycled. Use the more appropriate VI_DOOMED flag instead. Sponsored by: Isilon Systems, Inc. MFC After: 1 week	2006-02-06 10:15:27 +00:00
Robert Watson	95fea57c65	Add AUDITVNODE[12] flags to namei(), which cause namei() to audit path and vnode attribute information for looked up vnodes during the lookup operation. This will allow consumers of namei() to specify that this information be added to the in-process audit record. Submitted by: wsalamon Obtained from: TrustedBSD Project	2006-02-05 15:42:01 +00:00
Jeff Roberson	9157b485f0	- Solve a problem where a vput could be called on an outgoing directory without Giant held. Do this by tracking the vfslocked state for the directory seperate from the child. This is only important in the case where we cross a mountpoint. Sponsored by: Isilon Systems, Inc. MFC After: 3 days	2006-02-01 09:34:32 +00:00
Don Lewis	1dd5fc0fde	Tweak previous vfs_lookup.c commit to return an EINVAL error from lookup() instead of EPERM when a DELETE or RENAME operation is attempted on "..". In kern_unlink(), remap EINVAL errors returned from namei() to EPERM to match existing (and POSIX required) behaviour. Discussed with: bde MFC after: 3 days	2006-01-22 19:37:02 +00:00
Don Lewis	bea7a8d75c	Return EPERM from lookup() if cn_nameiop is DELETE or RENAME and the last component of the path name is "..". This keeps VOP_LOOKUP() from locking vnodes in reverse order. Tested by: Denis Shaposhnikov <dsh AT vlink DOT ru> MFC after: 3 days	2006-01-21 19:57:56 +00:00
John Baldwin	e12560dd4b	Use correct VFS locking rather than unconditionally grabbing Giant around namei() calls in kern_alternate_path(). Reviewed by: csjp MFC after: 1 week	2005-09-21 19:49:42 +00:00
Christian S.J. Peron	68ff2a4397	Improve the MP safeness associated with the creation of symbolic links and the execution of ELF binaries. Two problems were found: 1) The link path wasn't tagged as being MP safe and thus was not properly protected. 2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was insufficiently protected. This commit makes the following changes: -Sets the MPSAFE flag in NDINIT for symbolic link paths -Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been picked up. -Drop in an assertion into vfs_lookup which ensures that if the MPSAFE flag is NOT set, that we have picked up giant. If not panic (if WITNESS compiled into the kernel). This should help us find conditions where vnode operations are in-sufficiently protected. This is a RELENG_6 candidate. Discussed with: jeff MFC after: 4 days	2005-09-15 15:03:48 +00:00
Alexander Kabaev	0c207975f2	Do not keep parent directory locked while calling VFS_ROOT to traverse mount points in lookup(). The lock can be dropped safely around VFS_ROOT because LOCKPARENT semantics with child and perent vnodes coming from different FSes does not really have any meaningful use. On the other hard, this prevents easily triggered deadlock on systems using automounter daemon.	2005-08-14 18:10:04 +00:00
Jeff Roberson	74a5123246	- Remove a debugging printf that slipped in. Spotted by: Peter Wemm	2005-04-13 23:36:28 +00:00
Jeff Roberson	18ef8344b4	- Further simplify lookup; Force all filesystems to relock in the DOTDOT case. There are bugs in some which didn't unlock in the ISDOTDOT case to begin with that need to be addressed seperately. This simplifies things anyway. - Fix relookup() to prevent it from vrele()'ing the dvp while the vp is locked. Catch up to other lookup changes. Sponsored by: Isilon Systems, Inc. Reported by: Peter Wemm	2005-04-13 10:57:13 +00:00
Jeff Roberson	d3b78f7337	- If we vrele() a dvp while the child is locked we can potentially deadlock when vrele() acquires the directory lock in the wrong order. Fix this via the following changes: - Keep the directory locked after VOP_LOOKUP() until we've determined what we're going to do with the child. This allows us to remove the complicated post LOOKUP code which determins whether we should lock or unlock the parent. This means we may have to vput() in the appropriate cases later, rather than doing an unsafe vrele. - in NDFREE() keep two flags to indicate whether we need to unlock vp or dvp. This allows us to vput rather than vrele in the appropriate cases without rechecking the flags. Move the code to handle dvp after we handle vp. - Remove some dead code from namei() that was the result of changes to VFS_LOCK_GIANT(). Sponsored by: Isilon Systems, Inc.	2005-04-09 11:53:16 +00:00
Jeff Roberson	2bbd6c9818	- Move NDFREE() from vfs_subr to vfs_lookup where namei() is.	2005-04-05 08:58:49 +00:00
Jeff Roberson	9a6bb8ad8f	- Include opt_vfs.h for LOOKUP_SHARED. - Control the behavior of shared lookups with the lookup_shared sysctl which has its default behavior set via the LOOKUP_SHARED option.	2005-04-03 23:50:20 +00:00
Jeff Roberson	99f3c87034	- Set cn_lkflags to LK_SHARED in the LOOKUP_SHARED case so that we only acquire shared locks on intermediate directories. - For the LASTCN, we may have to LK_UPGRADE the parent directory before we lookup the last component. - Acquire VFS_ROOT and dp locks based on the cn_lkflag. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:07:15 +00:00
Jeff Roberson	ea9aa09dd1	- Remove an unused variable from relookup(). - Assert that REMOVE, CREATE, and RENAME callers have WANTPARENT or LOCKPARENT set. You can't complete any of these operations without at least a reference to the parent. Many filesystems check for this case even though it isn't possible in the current system.	2005-03-28 13:56:56 +00:00
Jeff Roberson	1e38e08e76	- Get rid of PDIRUNLOCK, instead, we fixup the lock state immediately after calling VOP_LOOKUP(). Rather than having each filesystem check the LOCKPARENT flag, we simply check it once here and unlock as required. The only unusual case is ISDOTDOT, where we require an unlocked vnode on return. Relocking this vnode with the child locked is allowed since the child is actually its parent. - Add a few asserts for some unusual conditions that I do not believe can happen. These will later go away and turn into implementations for these conditions. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:24:50 +00:00
Jeff Roberson	d830f82824	- Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For now, all calls to VFS_ROOT() should still acquire exclusive locks. Sponsored by: Isilon Systems, Inc.	2005-03-24 07:31:38 +00:00
Jeff Roberson	ad09e57f41	- Clear LOCKSHARED if LOOKUP_SHARED is not enabled. This is not strictly necessary since we disable the shared locks in vfs_cache, but it is prefered that the option not leak out into filesystems when it is disabled. Sponsored by: Isilon Systems, Inc.	2005-03-24 06:02:37 +00:00
John Baldwin	76951d21d1	- Tweak kern_msgctl() to return a copy of the requested message queue id structure in the struct pointed to by the 3rd argument for IPC_STAT and get rid of the 4th argument. The old way returned a pointer into the kernel array that the calling function would then access afterwards without holding the appropriate locks and doing non-lock-safe things like copyout() with the data anyways. This change removes that unsafeness and resulting race conditions as well as simplifying the interface. - Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(), fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of code duplication for freebsd4 and netbsd compatability system calls. - Add a new lookup function kern_alternate_path() that looks up a filename under an alternate prefix and determines which filename should be used. This is basically a more general version of linux_emul_convpath() that can be shared by all the ABIs thus allowing for further reduction of code duplication.	2005-02-07 18:44:55 +00:00
Poul-Henning Kamp	dcff5b1440	Don't call VOP_CREATEVOBJECT(), it's the responsibility of the filesystem which owns the vnode.	2005-01-24 23:53:54 +00:00
Jeff Roberson	22a960a69c	- Acquire and release Giant as we enter and leave filesystems which require it. - Track the status of Giant with the nd flag HASGIANT. - Release giant on return of namei() callers are not marked MPSAFE as they already own giant. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:27:05 +00:00
Poul-Henning Kamp	e39db32ab0	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.	2005-01-13 12:25:19 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Poul-Henning Kamp	082d21222b	Make NAMEI_DIAGNOSTIC compile again and add a stragic vprint()	2004-12-03 12:15:39 +00:00
Robert Watson	7a36e1d6c7	Assert Giant in namei(). Bugs have been reported in which, following a sleep() call waking up in namei(), a later assertion triggers that Giant is not held. By asserting Giant at the start of namei(), we can know that if that assertion triggers, Giant is lost during the call to namei(), and not before.	2004-08-04 18:39:07 +00:00
Alfred Perlstein	f257b7a54b	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
Warner Losh	7f8a436ff2	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00

1 2 3

110 Commits