mount(2) system call:
* Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
linprocfs). This (mostly) restores the ability to mount these
filesystems using the old mount(2) system call (see below for the
rest of the fix).
* Remove not-NULL check for the data argument from the mount(2) entry
point. Per the mount(2) man page, it is up to the individual
filesystem being mounted to verify data. Or, in the case of procfs,
etc. the filesystem is free to ignore the data parameter if it does
not use it. Enforcing data to be not-NULL in the mount(2) system call
entry point prevented passing NULL to filesystems which ignored the
data pointer value. Apparently, passing NULL was common practice
in such cases, as even our own mount_std(8) used to do it in the
pre-nmount(2) world.
All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change. One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
stop event earlier. After we have signalled that, we set P_WEXIT and
then wait for any processes with a hold on the vmspace via PHOLD to
release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is
invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
to zero.
- Change proc_rwmem() to require that the processing read from has its
vmspace held via PHOLD by the caller and get rid of all the junk to
screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
to clear an earlier single-step simualted via a breakpoint). We only
do one to avoid races. Also, by making the EINVAL error for unknown
requests be part of the default: case in the switch, the various
switch cases can now just break out to return which removes a _lot_ of
duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug
where a LWP ptrace command could return EINVAL with the proc lock still
held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
ptrace_clear_single_step() to always be called with the proc lock
held (it was a mixed bag previously). Alpha and arm have to drop
the lock while the mess around with breakpoints, but other archs
avoid extra lock release/acquires in ptrace(). I did have to fix a
couple of other consumers in kern_kse and a few other places to
hold the proc lock and PHOLD.
Tested by: ps (1 mostly, but some bits of 2-4 as well)
MFC after: 1 week
associated with the passed in pfs_node. If it does return a pointer, it
keeps the process locked. This allows a lot of places that were calling
pfind() again right after pfs_visible() to not have to do that and avoids
races since we don't drop the proc lock just to turn around and lock it
again. This will become more important with future changes to fix races
between procfs/ptrace and exit(2). Also, removed a duplicate pfs_visible()
call in pfs_getextattr().
Reviewed by: des
MFC after: 1 week
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.
Sponsored by: Isilon Systems, Inc.
as suggested by Matt's comment. Also fix some style and paranoia issues.
The entire function could benefit from review by a VFS guru.
MFC after: 6 weeks
Since we used an sbuf of size resid to accumulate dirents, we would end
up returning one byte short when we had enough dirents to fill or exceed
the size of the sbuf (the last byte being lost to bogus NUL termination)
causing the next call to return EINVAL due to an unaligned offset. This
went undetected for a long time because I did most of my testing in
single-user mode, where there are rarely enough processes to fill the
4096-byte buffer ls(1) uses. The most common symptom of this bug is that
tab completion of /proc or /compat/linux/proc does not work properly when
many processes are running.
Also, a check near the top would return EINVAL if resid was smaller than
PFS_DELEN, even if it was 0, which is frequently the case and perfectly
allowable. Change the test so that it returns 0 if resid is 0.
MFC after: 2 weeks
the filesystem. Check that rather than VI_XLOCK.
- VOP_INACTIVE should no longer drop the vnode lock.
- The vnode lock is required around calls to vrecycle() and vgone().
Sponsored by: Isilon Systems, Inc.
initializations but we did have lofty goals and big ideals.
Adjust to more contemporary circumstances and gain type checking.
Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we can not add a new
VOP_ method with a loadable module. History has not given
us reason to belive this would ever be feasible in the the
first place.
Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.
Give coda correct prototypes and function definitions for
all vop_()s.
Generate a bit more data from the vnode_if.src file: a
struct vop_vector and protype typedefs for all vop methods.
Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.
Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.
Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.
Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.
Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)
returning incompletely initialized processes. This problem was
eliminated by kern_proc.c:1.215, which causes pfind() not to
return processes in the PRS_NEW state.
check whether p_ucred is NULL or not in pfs_getattr() before
dereferencing the credential, and return ENOENT if there wasn't one.
This is a symptom of a larger problem, wherein pfind() can return
references to incompletely initialized processes, and we instead ought
to not return them, or check the process state before acting on the
process.
Reported by: kris
Discussed with: tjr, others
and refuse initializing filesystems with a wrong version. This will
aid maintenance activites on the 5-stable branch.
s/vfs_mount/vfs_omount/
s/vfs_nmount/vfs_mount/
Name our filesystems mount function consistently.
Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.
Reorganize the root filesystem selection code.
for unknown events.
A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
one go before returning. This avoids calling uiomove() while holding
allproc_lock.
Don't adjust uio->uio_offset manually, uiomove() does that for us.
Don't drop allproc_lock before calling panic().
Suggested by: alfred
validating the offset within a given memory buffer before handing the
real work off to uiomove(9).
Use uiomove_frombuf in procfs to correct several issues with
integer arithmetic that could result in underflows/overflows. As a
side-effect, the code is significantly simplified.
Add additional sanity checks when computing a memory allocation size
in pfs_read.
Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-)
Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)
also fixes pfs_access() since it relies on VOP_GETATTR() which will call
pfs_getattr(). This prevents jailed processes from discovering the
existence, start time and ownership of processes outside the jail.
PR: kern/48156
directories. Previously, pfs_iterate() would return -1 when it
reached the end of the process list while processing a process
directory node, even if the parent directory contained further nodes
(which is the case for the linprocfs root directory, where the process
directory node is actually first in the list). With this patch,
pfs_iterate() will continue to traverse the parent directory's node
list after exhausting the process list (as was the intention all
along). The code should hopefully be easier to read as well.
While I'm here, have pfs_iterate() assert that the allproc lock is
held.
flexible process_fork, process_exec, and process_exit eventhandlers. This
reduces code duplication and also means that I don't have to go duplicate
the eventhandler locking three more times for each of at_fork, at_exec, and
at_exit.
Reviewed by: phk, jake, almost complete silence on arch@
closely what function is really doing. Update all existing consumers
to use the new name.
Introduce a new vfs_stdsync function, which iterates over mount
point's vnodes and call FSYNC on each one of them in turn.
Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.
Reviewed by: jeff
kind of pseudofs-based filesystem. Fixes (at least) one problem where
when procfs is mounted mupltiple times, trying to unmount one will often
cause the wrong one to get unmounted, and other problem where mounting
one procfs on top of another caused the kernel to lock up.
Reviewed by: des
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.
Found by: src/tools/tools/vop_table
"refreshing" the label on the vnode before use, just get the label
right from inception. For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system. With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance. This
also corrects sematics for shared vnode locks, which were not
previously present in the system. This chances the cache
coherrency properties WRT out-of-band access to label data, but in
an acceptable form. With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception. We'll introduce a work around for this shortly.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories