structure changes now rather then piecemeal later on. mnt_nvnodelist
currently holds all the vnodes under the mount point. This will eventually
be split into a 'dirty' and 'clean' list. This way we only break kld's once
rather then twice. nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.
that a buffer's b_blkno would be valid. This is true when vmiodirenable
is turned off because the B_MALLOC'd buffer's data is invalidated when
the buffer is destroyed. But when vmiodirenable is turned on a buffer
can be reconstituted from its VMIO backing store. The reconstituted buffer
will have no knowledge of the physical block translation and the result is
serious directory corruption of the CDROM.
The solution is to fix cd9660_blkatoff() to always BMAP the buffer if
b_lblkno == b_blkno.
MFC after: 0 days
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
refcount is > 1 and false (0) otherwise.
Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.
This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().
It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".
the target process was being held locked during the uiomove() call. If the
process calling readdir() was the same as the target process (for instance
'ls /proc/curproc/'), and uiomove() caused a page fault, the result would
be a proc lock recursion. I have no idea how long this has been broken -
possibly ever since pfind() was changed to lock the process it returns.
Also replace the one and only call to procfs_findtextvp() with a direct
test of td->td_proc->p_textvp.
YA pseudofs megacommit, part 2:
- Merge the pfs_vnode and pfs_vdata structures, and make the vnode cache
a doubly-linked list. This eliminates the need to walk the list in
pfs_vncache_free().
- Add an exit callout which revokes vnodes associated with the process
that just exited. Since it needs to lock the cache when it does this,
pfs_vncache_mutex needs MTX_RECURSE.
- Add a third callback to the pfs_node structure. This one simply returns
non-zero if the specified requesting process is allowed to access the
specified node for the specified target process. This is used in
addition to the usual permission checks, e.g. when certain files don't
make sense for certain (system) processes.
- Make sure that pfs_lookup() and pfs_readdir() don't yap about files
which aren't pfs_visible(). Also check pfs_visible() before performing
reads and writes, to prevent the kind of races reported in SA-00:77 and
SA-01:55 (fork a child, open /proc/child/ctl, have that child fork a
setuid binary, and assume control of it).
- Add some more trace points.
- Rearrange the flag constants a little to simplify specifying and testing
for readability and writeability.
pseudofs_vnops.c:
- Track the aforementioned change.
- Add checks to pfs_open() to prevent opening read-only files for writing
or vice versa (pfs_{read,write} would block the actual reads and writes,
but it's still a bug to allow the open() to succeed). Also, return
EOPNOTSUPP if the caller attempts to lock the file.
- Add more trace points.
- Remove hardcoded uid, gid, mode from struct pfs_node; make pfs_getattr()
smart enough to get it right most of the time, and allow for callbacks
to handle the remaining cases. Rework the definition macros to match.
- Add lots of (conditional) debugging output.
- Fix a long-standing bug inherited from procfs: don't pretend to be a
read-only file system. Instead, return EOPNOTSUPP for operations we
truly can't support and allow others to fail silently. In particular,
pfs_lookup() now treats CREATE as LOOKUP. This may need more work.
- In pfs_lookup(), if the parent node is process-dependent, check that
the process in question still exists.
- Implement pfs_open() - its only current function is to check that the
process opening the file can see the process it belongs to.
- Finish adding support for writeable nodes.
- Bump module version number.
- Introduce lots of new bugs.
- Use some simple #define's at the top of the files for proc -> thread
changes instead of having lots of needless #ifdef's in the code.
- Don't try to use struct thread in !FreeBSD code.
- Don't use a few struct lwp's in some of the NetBSD code since it isn't
in their HEAD.
The new diff relative to before KSE is now signficantly smaller and easier
to maintain.
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
linux_getdents uses VOP_READDIR( ..., &ncookies, &cookies ) instead of
VOP_READDIR( ..., NULL, NULL ) because it seems to need the offsets for
linux_dirent and sizeof(dirent) != sizeof(linux_dirent)...
PR: 29467
Submitted by: Michael Reifenberger <root@nihil.plaut.de>
Reviewed by: phk
as there are now "unusual" protection properties to Pmem that differ
from the other files. While I'm at it, introduce proc locking for
the other files, which was previously present only in the Pmem case.
Obtained from: TrustedBSD Project
and so special-casing was introduced to provide extra procfs privilege
to the kmem group. With the advent of non-setgid kmem ps, this code
is no longer required, and in fact, can is potentially harmful as it
allocates privilege to a gid that is increasingly less meaningful.
Knowledge of specific gid's in kernel is also generally bad precedent,
as the kernel security policy doesn't distinguish gid's specifically,
only uid 0.
This commit removes reference to kmem in procfs, both in terms of
access control decisions, and the applying of gid kmem to the
/proc/*/mem file, simplifying the associated code considerably.
Processes are still permitted to access the mem file based on
the debugging policy, so ps -e still works fine for normal
processes and use.
Reviewed by: tmm
Obtained from: TrustedBSD Project
The p_can(...) construct was a premature (and, it turns out,
awkward) abstraction. The individual calls to p_canxxx() better
reflect differences between the inter-process authorization checks,
such as differing checks based on the type of signal. This has
a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
invocation of p_candebug(), while maintaining the special case
check of KTR_ROOT. This allows ktrace() to "play more nicely"
with new mandatory access control schemes, as well as making its
authorization checks consistent with other "debugging class"
checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
caller to determine if privilege was required for successful
evaluation of the access control check. This primitive is currently
unused, and as such, serves only to complicate the API.
Approved by: ({procfs,linprocfs} changes) des
Obtained from: TrustedBSD Project
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
Replace the a.out emulation of 'struct linker_set' with something
a little more flexible. <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.
The linker_set.h macros have been on the back burner in various
forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).
The macros declare a strongly typed set. They return elements with the
type that you declare the set with, rather than a generic void *.
For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>). Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.
For a.out, we use the old linker_set struct.
NOTE: the item lists are no longer null terminated. This is why
the code impact is high in certain areas.
The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.
linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.
Reviewed by: eivind
not behaving correctly. Fix by attaching to the correct socket.
Also call so{rw}wakeup in addition to the fifo wakeup, so that any
kqfilters attached to the socket buffer get poked.
Only tun0 -> tun32767 may now be opened as struct ifnet's if_unit
is a short.
It's now possible to open /dev/tun and get a handle back for an available
tun device (use devname to find out what you got).
The implementation uses rman by popular demand (and against my judgement)
to track opened devices and uses the new dev_depends() to ensure that
all make_dev()d devices go away before the module is unloaded.
Reviewed by: phk
dev_t. The dev_depends(dev_t, dev_t) function is for tying them
to each other.
When destroy_dev() is called on a dev_t, all dev_t's depending
on it will also be destroyed (depth first order).
Rewrite the make_dev_alias() to use this dependency facility.
kern/subr_disk.c:
Make the disk mini-layer use dependencies to make sure all
relevant dev_t's are removed when the disk disappears.
Make the disk mini-layer precreate some magic sub devices
which the disk/slice/label code expects to be there.
kern/subr_disklabel.c:
Remove some now unneeded variables.
kern/subr_diskmbr.c:
Remove some ancient, commented out code.
kern/subr_diskslice.c:
Minor cleanup. Use name from dev_t instead of dsname()
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.
Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit