links and the execution of ELF binaries. Two problems were found:
1) The link path wasn't tagged as being MP safe and thus was not properly
protected.
2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was
insufficiently protected.
This commit makes the following changes:
-Sets the MPSAFE flag in NDINIT for symbolic link paths
-Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which
will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been
picked up.
-Drop in an assertion into vfs_lookup which ensures that if the MPSAFE
flag is NOT set, that we have picked up giant. If not panic (if WITNESS
compiled into the kernel). This should help us find conditions where vnode
operations are in-sufficiently protected.
This is a RELENG_6 candidate.
Discussed with: jeff
MFC after: 4 days
points in lookup(). The lock can be dropped safely around VFS_ROOT because
LOCKPARENT semantics with child and perent vnodes coming from different FSes
does not really have any meaningful use. On the other hard, this prevents
easily triggered deadlock on systems using automounter daemon.
case. There are bugs in some which didn't unlock in the ISDOTDOT case
to begin with that need to be addressed seperately. This simplifies
things anyway.
- Fix relookup() to prevent it from vrele()'ing the dvp while the vp
is locked. Catch up to other lookup changes.
Sponsored by: Isilon Systems, Inc.
Reported by: Peter Wemm
when vrele() acquires the directory lock in the wrong order. Fix this
via the following changes:
- Keep the directory locked after VOP_LOOKUP() until we've determined
what we're going to do with the child. This allows us to remove the
complicated post LOOKUP code which determins whether we should lock or
unlock the parent. This means we may have to vput() in the appropriate
cases later, rather than doing an unsafe vrele.
- in NDFREE() keep two flags to indicate whether we need to unlock vp or
dvp. This allows us to vput rather than vrele in the appropriate
cases without rechecking the flags. Move the code to handle dvp after
we handle vp.
- Remove some dead code from namei() that was the result of changes to
VFS_LOCK_GIANT().
Sponsored by: Isilon Systems, Inc.
acquire shared locks on intermediate directories.
- For the LASTCN, we may have to LK_UPGRADE the parent directory before
we lookup the last component.
- Acquire VFS_ROOT and dp locks based on the cn_lkflag.
Sponsored by: Isilon Systems, Inc.
- Assert that REMOVE, CREATE, and RENAME callers have WANTPARENT
or LOCKPARENT set. You can't complete any of these operations without
at least a reference to the parent. Many filesystems check for this case
even though it isn't possible in the current system.
calling VOP_LOOKUP(). Rather than having each filesystem check the
LOCKPARENT flag, we simply check it once here and unlock as required.
The only unusual case is ISDOTDOT, where we require an unlocked vnode
on return. Relocking this vnode with the child locked is allowed since
the child is actually its parent.
- Add a few asserts for some unusual conditions that I do not believe can
happen. These will later go away and turn into implementations for these
conditions.
Sponsored by: Isilon Systems, Inc.
necessary since we disable the shared locks in vfs_cache, but it is
prefered that the option not leak out into filesystems when it is
disabled.
Sponsored by: Isilon Systems, Inc.
structure in the struct pointed to by the 3rd argument for IPC_STAT and
get rid of the 4th argument. The old way returned a pointer into the
kernel array that the calling function would then access afterwards
without holding the appropriate locks and doing non-lock-safe things like
copyout() with the data anyways. This change removes that unsafeness and
resulting race conditions as well as simplifying the interface.
- Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(),
fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of
code duplication for freebsd4 and netbsd compatability system calls.
- Add a new lookup function kern_alternate_path() that looks up a filename
under an alternate prefix and determines which filename should be used.
This is basically a more general version of linux_emul_convpath() that
can be shared by all the ABIs thus allowing for further reduction of
code duplication.
require it.
- Track the status of Giant with the nd flag HASGIANT.
- Release giant on return of namei() callers are not marked MPSAFE as
they already own giant.
Sponsored By: Isilon Systems, Inc.
a sleep() call waking up in namei(), a later assertion triggers that
Giant is not held. By asserting Giant at the start of namei(), we can
know that if that assertion triggers, Giant is lost during the call to
namei(), and not before.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
caller to indicate that MAC checks are not required for the lookup.
Similar to IO_NOMACCHECK for vn_rdwr(), this indicates that the caller
has already performed all required protections and that this is an
internally generated operation. This will be used by the NFS server
code, as we don't currently enforce MAC protections against requests
delivered via NFS.
While here, add NOCROSSMOUNT to PARAMASK; apparently this was used at
one point for name lookup flag checking, but isn't any longer or it
would have triggered from the NFS server code passing it to indicate
that mountpoints shouldn't be crossed in lookups.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.
Idea stolen from: BSD/OS
kernel access control.
Authorize vop_readlink() and vop_lookup() activities during recursive
path lookup via namei() via calls to appropriate MAC entry points.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
pnbuf to increase the chances of detecting use of a free'd name buffer
if SAVENAME or SAVESTART wasn't passed in. Curiously, running with these
changes doesn't panic the kernel, and should.
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and
adapting it for KSE.
Locks:
1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.
1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp);
/* increments reference count on a file */
struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to locked */
struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */
struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
people are on track with the cause and effect of this, and although
fixing this severely degenerate case appears to violate the letter of
POSIX.1-200x, Bruce and I (and enough others) agree that it should be
comitted.
So, this patch generates an ENOENT error for any attempt to do a path lookup
through an empty symlink (e.g. open(), stat()).
Submitted by: "Andrey A. Chernov" <ache@nagual.pp.ru>
Reviewed by: bde
Discussed exhaustively on: freebsd-current
Previously committed to: NetBSD 4 years ago
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
recently discussed at -hackers. The problem is a null-pointer
dereference that happens in kern/vfs_lookup.c when accessing ".."
with a v_mount entry for the current directory vnode of NULL. This
happens when a volume is forcibly unmounted, and the vnode for a
working directory in the mounted volume is cleared.
PR: 23191
Submitted by: Thomas Moestl <tmoestl@gmx.net>
filesystem lookup() routine if it unlocks parent directory. This flag should
be carefully tracked by filesystems if they want to work properly with nullfs
and other stacked filesystems.
VFS takes advantage of this flag to perform symantically correct usage
of vrele() instead of vput() if parent directory already unlocked.
If filesystem fails to track this flag then previous codepath in VFS left
unchanged.
Convert UFS code to set PDIRUNLOCK flag if necessary. Other filesystmes will
be changed after some period of testing.
Reviewed in general by: mckusick, dillon, adrian
Obtained from: NetBSD
filesystem may hold the lock. Otherwise unavoidable deadlock will occur.
This shouldn't have any side effects as long as we hold vfs lock.
Obtained from: NetBSD