these in the main filesystems. This does not change the resulting code
but makes the source a little bit more grepable.
Sponsored by: DARPA and NAI Labs.
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.
Idea stolen from: BSD/OS
embedded into their file_entry descriptor. This is more for
correctness, since these files cannot be bmap'ed/mmap'ed anyways.
Enforce this restriction.
Submitted by: tes@sgi.com
kernel access control.
Teach devfs how to respond to pathconf() _POSIX_MAC_PRESENT queries,
allowing it to indicate to user processes that individual vnode labels
are available.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
kernel access control.
Modify procfs so that (when mounted multilabel) it exports process MAC
labels as the vnode labels of procfs vnodes associated with processes.
Approved by: des
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
kernel access control.
Modify pseudofs so that it can support synthetic file systems with
the multilabel flag set. In particular, implement vop_refreshlabel()
as pn_refreshlabel(). Implement pfs_refreshlabel() to invoke this,
and have it fall back to the mount label if the file system does
not implement pn_refreshlabel() for the node. Otherwise, permit
the file system to determine how the service is provided.
Approved by: des
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
kernel access control.
Instrument devfs to support per-dirent MAC labels. In particular,
invoke MAC framework when devfs directory entries are instantiated
due to make_dev() and related calls, and invoke the MAC framework
when vnodes are instantiated from these directory entries. Implement
vop_setlabel() for devfs, which pushes the label update into the
devfs directory entry for semi-persistant store. This permits the MAC
framework to assign labels to devices and directories as they are
instantiated, and export access control information via devfs vnodes.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
except for the fact tha they are presently swapped out. Also add a process
flag to indicate that the process has started the struggle to swap
back in. This will be needed for the case where multiple threads
start the swapin action top a collision. Also add code to stop
a process fropm being swapped out if one of the threads in this
process is actually off running on another CPU.. that might hurt...
Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
ruleset. If we do, that means there's a ruleset loop (10 includes 20
include 30 includes 10), which will quickly cause a double fault due
to stack overflow (since "include" is implemented by recursion).
(Previously, we only checked that X didn't include X.)
administrator to define certain properties of new devfs nodes before
they become visible to the userland. Both static (e.g., /dev/speaker)
and dynamic (e.g., /dev/bpf*, some removable devices) nodes are
supported. Each DEVFS mount may have a different ruleset assigned to
it, permitting different policies to be implemented for things like
jails.
Approved by: phk
- Initialize lock structure in vncache_alloc
- Return locked vnodes from vncache_alloc
- Setup vnode op vectors to use default lock, unlock, and islocked
- Implement simple locking scheme required for lookup
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
adding vnode to hash. The fix is to use atomic hash-lookup-and-add-if-
not-found operation. The odd thing is that this race can't happen
actually because the lowervp vnode is locked exclusively now during the
whole process of null node creation. This must be thought as a step
toward shared lookups.
Also remove vp->v_mount checks when looking for a match in the hash,
as this is the vestige.
Also add comments and cosmetic changes.
byte offset of the directory entry for the inode number for all types
of files except directories, although this breaks hard links for
non-directories even if it doesn't cause overflow. Just ignore this
broken inode number for stat() and readdir() and return a less broken
one (the block offset of the file), so that applications normally can't
see the brokenness.
This leaves at least the following brokenness:
- extra inodes, vnodes and caching for hard links.
- various overflow bugs. cd9660 supports 64-bit block numbers, but we
silently ignore the top 32 bits in isonum_733() and then drop another
10 bits for our broken inode numbers. We may also have sign extension
bugs from storing 32-bit extents in ints and longs even if ints are
32-bits. These bugs affect DVDs. mkisofs apparently limits them
by writing directory entries first.
Inode numbers were broken mainly in 4.4BSD-Lite2. FreeBSD-1.1.5 seems
to have a correct implementation modulo the overflow bugs. We need
to look up directory entries from inodes for symlinks only. FreeBSD-1.1.5
use separate fields (iso_parent_extent, iso_parent) to point to the
directory entry. 4.4BSD-Lite doesn't have these, and abuses i_ino to
point to the directory entry. Correct pointers are impossible for
hard links, but symlinks can't be hard links.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.
o Determine the lock strategy for each members in struct socket.
o Lock down the following members:
- so_count
- so_options
- so_linger
- so_state
o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:
- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()
Reviewed by: alfred
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.
make sure it's a correct operation for devfs, do it only in the
ISLASTCN case. If we don't, we are assuming that the final file will
be in devfs, which is not true if another partition is mounted on top
of devfs or with special filenames (like /dev/net/../../foo).
Reviewed by: phk
the block to read and copy out. This removes the hack in
udf_readatoffset() for only reading one block at a time. WooHoo!
Remove a redundant test for fragmented fids in both udf_readdir()
and udf_lookup(). Add comment to both as to why the test is
written the way it is. Add a few more safety checks for brelse().
Thanks to Timothy Shimmin <tes@boing.melbourne.sgi.com> for pointing
out these problems.
Requested by: bde
Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.
While I am here, sort include files alphabetically, where possible.
of a process pointer.
- Move the p_candebug() at the start of procfs_control() a bit to make
locking feasible. We still perform the access check before doing
anything, we just now perform it after acquiring locks.
- Don't lock the sched_lock for TRACE_WAIT_P() and when checking to see if
p_stat is SSTOP. We lock the process while setting p_stat to SSTOP
so locking the process is sufficient to do a read to see if p_stat is
SSTOP or not.
Setting of timestamps on devices had no effect visible to userland
because timestamps for devices were set in places that are never used.
This broke:
- update of file change time after a change of an attribute
- setting of file access and modification times.
The VA_UTIMES_NULL case did not work. Revs 1.31-1.32 were supposed to
fix this by copying correct bits from ufs, but had little or no effect
because the old checks were not removed.
files. We didn't clear the update marks when we set the times, so
some of the settings were sometimes clobbered with the current time a
little later. This caused cp -p even by root to almost always fail
to preserve any times despite not reporting any errors in attempting
to preserve them.
Don't forget to set the archive attribute when we set the read-only
attribute. We should only set the archive attribute if we actually
change something, but we mostly don't bother avoiding setting it
elsewhere, so don't bother here yet.
MFC after: 1 week
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.
Avoid locking in userret() in most of the remaining cases.
Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
provided the latter is nonzero. At this point, the former is a fairly
arbitrary default value (DFTPHYS), so changing it to any reasonable
value specified by the device driver is safe. Using the maximum of
these limits broke ffs clustered i/o for devices whose si_iosize_max
is < DFLTPHYS. Using the minimum would break device drivers' ability
to increase the active limit from DFTLPHYS up to MAXPHYS.
Copied the code for this and the associated (unnecessary?) fixup of
mp_iosize_max to all other filesystems that use clustering (ext2fs and
msdosfs). It was completely missing.
PR: 36309
MFC-after: 1 week
as it leaves the nullfs vnode allocated, but with no identity. The
effect is that a null mount can slowly accumulate all the vnodes
in the system, reclaiming them only when it is unmounted. Thus
the null_inactive state instead accelerates the release of the
null vnode by calling vrecycle which will in turn call the
null_reclaim operator. The null_reclaim routine then does the
freeing actions previosuly (incorrectly) done in null_inactive.
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
New locks are:
- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.
Please refer to sys/proc.h for the coverage of these locks.
Changes on the pgrp/session interface:
- pgfind() needs the pgrpsess_lock held.
- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.
- Call enterthispgrp() in order to enter an existing pgrp.
- pgsignal() requires a pgrp lock held.
Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)