Commit Graph

110 Commits

Author SHA1 Message Date
kib
eb77b477b4 Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
    renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by:	rdivacky
Sponsored by:	Google Summer of Code 2007
Tested by:	pho
2008-04-08 09:45:49 +00:00
kib
eff8c6d35e Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
rwatson
877d7c65ba In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
attilio
4014b55830 Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by:	Andrea Barberio <insomniac at slackware dot it>
2008-02-25 18:45:57 +00:00
attilio
0d54671a48 Introduce some functions in the vnode locks namespace and in the ffs
namespace in order to handle lockmgr fields in a controlled way instead
than spreading all around bogus stubs:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode

In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock

Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.
2008-02-24 16:38:58 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
rwatson
60570a92bf Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
pjd
763f7449dc Fix some locking cases where we ask for exclusively locked vnode, but we get
shared locked vnode in instead when vfs.lookup_shared is set to 1.

Discussed with:	kib, kris
Tested by:	kris
Approved by:	re (kensmith)
2007-09-21 10:16:56 +00:00
rwatson
79a2e40812 Universally adopt most conventional spelling of acquire. 2007-05-27 20:50:23 +00:00
rwatson
765a83fd79 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
rwatson
76104e0492 Rather than ignoring any error return from getnewvnode() in nameiinit(),
explicitly test and panic.  This should not ever happen, but if it does,
this is a preferred failure mode to a NULL pointer dereference in kernel.

Coverity CID:	1716
Found with:	Coverity Prevent(tm)
2007-03-31 16:08:50 +00:00
kib
b4f5200c2e If both ISDOTDOT and NOCROSSMOUNT are set then lookup() might breaks out
of the special handling for ".." and perform an ISDOTDOT VOP_LOOKUP()
for a filesystem root vnode. Handle this case inside lookup().

Submitted by:	tegge
PR:		92785
MFC after:	1 week
2007-02-15 09:53:49 +00:00
kib
79752b63e1 Below is slightly edited description of the LOR by Tor Egge:
--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode, that is the parent
directory of covered vnode) while holding an exclusive vnode lock on
covering vnode.

A simplified scenario:

root fs					var fs
/    		A			/    (/var)	D
/var		B			/log (/var/log) E
vfs lock	C			vfs lock	F

Within each file system, the lock order is clear: C->A->B and F->D->E

When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:

      L1: C->A->B
		|
	        +->F->D->E

      L2: F->D->E
	     |
             +->C->A->B

The lookup() process for namei("/var") mixes those two lock orders:

    VOP_LOOKUP() obtains B while A is held
    vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
    violates L2)
    vput() releases lock on B
    VOP_UNLOCK() releases lock on A
    VFS_ROOT() obtains lock on D while shared lock on F is held
    vfs_unbusy() releases shared lock on F
    vn_lock() obtains lock on A while D is held (violates L1, follows L2)

dounmount() follows L1 (B is locked while F is drained).

Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).

With unmount, you can get 4 processes in a deadlock:

     p1: holds D, want A (in lookup())
     p2: holds shared lock on F, want D (in VFS_ROOT())
     p3: holds B, want drain lock on F (in dounmount())
     p4: holds A, want B (in VOP_LOOKUP())

You can have more than one instance of p2.

The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.

- Tor Egge

To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, placeholder deadfs
vnode vp_crossmp is introduced that is filled into ni_dvp.

Idea by:	ups
Reviewed by:	tegge, ups, jeff, rwatson (mac interaction)
Tested by:	Peter Holm
MFC after:	2 weeks
2007-01-22 11:25:22 +00:00
rwatson
7beaaf5cd2 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
mohans
a569229408 Fix for a potential bug caught by Coverity. Pointed out to me by Kris Kennaway. 2006-09-14 17:57:02 +00:00
mohans
21daa650a9 Fixes up the handling of shared vnode lock lookups in the NFS client,
adds a FS type specific flag indicating that the FS supports shared
vnode lock lookups, adds some logic in vfs_lookup.c to test this flag
and set lock flags appropriately.

- amd on 6.x is a non-starter (without this change). Using amd under
  heavy load results in a deadlock (with cascading vnode locks all the
  way to the root) very quickly.
- This change should also fix the more general problem of cascading
  vnode deadlocks when an NFS server goes down.

Ideally, we wouldn't need these changes, as enabling shared vnode lock
lookups globally would work. Unfortunately, UFS, for example isn't
ready for shared vnode lock lookups, crashing pretty quickly.

This change is the result of discussions with Stephan Uphoff (ups@).

Reviewed by:	ups@
2006-09-13 18:39:09 +00:00
rwatson
3a02c014e3 Remove register, use ANSI function headers. 2006-08-05 21:40:59 +00:00
rwatson
a6a5fadace We now spell "inode" as "vnode" in the VFS layer, so update comment
for new world order.

MFC after:	3 days
Pointed out by:	mckusick
2006-08-05 21:08:47 +00:00
kris
78ed66ae08 Lock giant when assigning ni_vp and keep vfslocked state valid.
Committed for:	jeff
2006-04-29 07:13:49 +00:00
jeff
3450f7fc51 - Consistently track ni_dvp and ni_vp with dvfslocked and vfslocked rather
than trying to optimize it into a single lock.  This adds more calls to
   lock giant with non smpsafe filesystems but is the only way to reliably
   hold the correct lock.
 - Remove an invalid assert in the mountedhere case in lookup and fix the
   code to properly deal with the scenario.  We can actually have a lookup
   that returns dp == dvp with mountedhere set with certain unmount races.

Tested by:	kris
Reported by:	kris/mohans
2006-04-28 00:59:48 +00:00
jeff
73f46586c6 - LK_RETRY means nothing when passed to VOP_LOCK. Call vn_lock instead.
- Move the vn_lock of the dvp until after we've unbusied the filesystem
   to avoid a LOR with the mount point lock.
 - In the v_mountedhere while loop we acquire a new instance of giant each
   time through without releasing the first.  This would cause us to leak
   Giant.

Sponsored by:	Isilon Systems, Inc.
2006-03-31 02:59:23 +00:00
jeff
05c6b40c0d - Don't check v_mount for NULL to determine if a vnode has been recycled.
Use the more appropriate VI_DOOMED flag instead.

Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-06 10:15:27 +00:00
rwatson
62220258e1 Add AUDITVNODE[12] flags to namei(), which cause namei() to audit path
and vnode attribute information for looked up vnodes during the lookup
operation.  This will allow consumers of namei() to specify that this
information be added to the in-process audit record.

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2006-02-05 15:42:01 +00:00
jeff
b2e03708cb - Solve a problem where a vput could be called on an outgoing directory
without Giant held.  Do this by tracking the vfslocked state for
   the directory seperate from the child.  This is only important
   in the case where we cross a mountpoint.

Sponsored by:	Isilon Systems, Inc.
MFC After:	3 days
2006-02-01 09:34:32 +00:00
truckman
7ef6769a30 Tweak previous vfs_lookup.c commit to return an EINVAL error from
lookup() instead of EPERM when a DELETE or RENAME operation is
attempted on "..".

In kern_unlink(), remap EINVAL errors returned from namei() to EPERM
to match existing (and POSIX required) behaviour.

Discussed with:	bde
MFC after:	3 days
2006-01-22 19:37:02 +00:00
truckman
60d9e55a8a Return EPERM from lookup() if cn_nameiop is DELETE or RENAME and
the last component of the path name is "..".  This keeps VOP_LOOKUP()
from locking vnodes in reverse order.

Tested by:	Denis Shaposhnikov <dsh AT vlink DOT ru>
MFC after:	3 days
2006-01-21 19:57:56 +00:00
jhb
02342146b6 Use correct VFS locking rather than unconditionally grabbing Giant around
namei() calls in kern_alternate_path().

Reviewed by:	csjp
MFC after:	1 week
2005-09-21 19:49:42 +00:00
csjp
f7f404fd08 Improve the MP safeness associated with the creation of symbolic
links and the execution of ELF binaries. Two problems were found:

1) The link path wasn't tagged as being MP safe and thus was not properly
   protected.
2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was
   insufficiently protected.

This commit makes the following changes:

-Sets the MPSAFE flag in NDINIT for symbolic link paths
-Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which
 will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been
 picked up.
-Drop in an assertion into vfs_lookup which ensures that if the MPSAFE
 flag is NOT set, that we have picked up giant. If not panic (if WITNESS
 compiled into the kernel). This should help us find conditions where vnode
 operations are in-sufficiently protected.

This is a RELENG_6 candidate.

Discussed with:	jeff
MFC after:	4 days
2005-09-15 15:03:48 +00:00
kan
5f5eca170a Do not keep parent directory locked while calling VFS_ROOT to traverse mount
points in lookup(). The lock can be dropped safely around VFS_ROOT because
LOCKPARENT semantics with child and perent vnodes coming from different FSes
does not really have any meaningful use. On the other hard, this prevents
easily triggered deadlock on systems using automounter daemon.
2005-08-14 18:10:04 +00:00
jeff
b38ffce1e7 - Remove a debugging printf that slipped in.
Spotted by:	Peter Wemm
2005-04-13 23:36:28 +00:00
jeff
ee55350db0 - Further simplify lookup; Force all filesystems to relock in the DOTDOT
case.  There are bugs in some which didn't unlock in the ISDOTDOT case
   to begin with that need to be addressed seperately.  This simplifies
   things anyway.
 - Fix relookup() to prevent it from vrele()'ing the dvp while the vp
   is locked.  Catch up to other lookup changes.

Sponsored by:	Isilon Systems, Inc.
Reported by:	Peter Wemm
2005-04-13 10:57:13 +00:00
jeff
46c812596a - If we vrele() a dvp while the child is locked we can potentially deadlock
when vrele() acquires the directory lock in the wrong order.  Fix this
   via the following changes:
 - Keep the directory locked after VOP_LOOKUP() until we've determined
   what we're going to do with the child.  This allows us to remove the
   complicated post LOOKUP code which determins whether we should lock or
   unlock the parent.  This means we may have to vput() in the appropriate
   cases later, rather than doing an unsafe vrele.
 - in NDFREE() keep two flags to indicate whether we need to unlock vp or
   dvp.  This allows us to vput rather than vrele in the appropriate
   cases without rechecking the flags.  Move the code to handle dvp after
   we handle vp.
 - Remove some dead code from namei() that was the result of changes to
   VFS_LOCK_GIANT().

Sponsored by:	Isilon Systems, Inc.
2005-04-09 11:53:16 +00:00
jeff
d42252c158 - Move NDFREE() from vfs_subr to vfs_lookup where namei() is. 2005-04-05 08:58:49 +00:00
jeff
d9338897d1 - Include opt_vfs.h for LOOKUP_SHARED.
- Control the behavior of shared lookups with the lookup_shared sysctl
   which has its default behavior set via the LOOKUP_SHARED option.
2005-04-03 23:50:20 +00:00
jeff
4a949bda2b - Set cn_lkflags to LK_SHARED in the LOOKUP_SHARED case so that we only
acquire shared locks on intermediate directories.
 - For the LASTCN, we may have to LK_UPGRADE the parent directory before
   we lookup the last component.
 - Acquire VFS_ROOT and dp locks based on the cn_lkflag.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:07:15 +00:00
jeff
a5324b331d - Remove an unused variable from relookup().
- Assert that REMOVE, CREATE, and RENAME callers have WANTPARENT
   or LOCKPARENT set.  You can't complete any of these operations without
   at least a reference to the parent.  Many filesystems check for this case
   even though it isn't possible in the current system.
2005-03-28 13:56:56 +00:00
jeff
9043d7b0a7 - Get rid of PDIRUNLOCK, instead, we fixup the lock state immediately after
calling VOP_LOOKUP().  Rather than having each filesystem check the
   LOCKPARENT flag, we simply check it once here and unlock as required.
   The only unusual case is ISDOTDOT, where we require an unlocked vnode
   on return.  Relocking this vnode with the child locked is allowed since
   the child is actually its parent.
 - Add a few asserts for some unusual conditions that I do not believe can
   happen.  These will later go away and turn into implementations for these
   conditions.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:24:50 +00:00
jeff
0210925e42 - Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For
now, all calls to VFS_ROOT() should still acquire exclusive locks.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:31:38 +00:00
jeff
9b04831cbb - Clear LOCKSHARED if LOOKUP_SHARED is not enabled. This is not strictly
necessary since we disable the shared locks in vfs_cache, but it is
   prefered that the option not leak out into filesystems when it is
   disabled.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:02:37 +00:00
jhb
71c05d27c0 - Tweak kern_msgctl() to return a copy of the requested message queue id
structure in the struct pointed to by the 3rd argument for IPC_STAT and
  get rid of the 4th argument.  The old way returned a pointer into the
  kernel array that the calling function would then access afterwards
  without holding the appropriate locks and doing non-lock-safe things like
  copyout() with the data anyways.  This change removes that unsafeness and
  resulting race conditions as well as simplifying the interface.
- Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(),
  fstatfs(), and fhstatfs().  Use these wrappers to cut out a lot of
  code duplication for freebsd4 and netbsd compatability system calls.
- Add a new lookup function kern_alternate_path() that looks up a filename
  under an alternate prefix and determines which filename should be used.
  This is basically a more general version of linux_emul_convpath() that
  can be shared by all the ABIs thus allowing for further reduction of
  code duplication.
2005-02-07 18:44:55 +00:00
phk
716e67e429 Don't call VOP_CREATEVOBJECT(), it's the responsibility of the
filesystem which owns the vnode.
2005-01-24 23:53:54 +00:00
jeff
e794a01e4e - Acquire and release Giant as we enter and leave filesystems which
require it.
 - Track the status of Giant with the nd flag HASGIANT.
 - Release giant on return of namei() callers are not marked MPSAFE as
   they already own giant.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:27:05 +00:00
phk
3760addae2 Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT()
directly.
2005-01-13 12:25:19 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
phk
410936c3f9 Make NAMEI_DIAGNOSTIC compile again and add a stragic vprint() 2004-12-03 12:15:39 +00:00
rwatson
5d6fea3b71 Assert Giant in namei(). Bugs have been reported in which, following
a sleep() call waking up in namei(), a later assertion triggers that
Giant is not held.  By asserting Giant at the start of namei(), we can
know that if that assertion triggers, Giant is lost during the call to
namei(), and not before.
2004-08-04 18:39:07 +00:00
alfred
8a1713aada Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
imp
74cf37bd00 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
obrien
3b8fff9e4c Use __FBSDID(). 2003-06-11 00:56:59 +00:00