in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.
This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.
While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures (e.g., struct vnode, struct socket) for the !MAC case, and
should have a small (but measurable) performance benefit due to memory
conservation and the reduced cost of zeroing memory.
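A minimal userspace sketch of the indirection described above. It assumes a hypothetical `struct label` whose slot count (`label_nslots`) is picked at run time, with `calloc()` standing in for the UMA zone in mac_label.c. The names `obj_sketch`, `label_alloc()` and `label_free()` are invented for illustration. Only the pointer-instead-of-embedded-struct shape comes from the text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical label: one slot per policy, count chosen at "boot". */
struct label {
	int		l_flags;
	uintptr_t	l_perpolicy[];	/* flexible array member */
};

/* Hypothetical boot-time tunable for the number of label slots. */
static int label_nslots = 4;

/* The object stores only a pointer, so its size never depends on the label. */
struct obj_sketch {
	int		 o_refcount;
	struct label	*o_label;	/* NULL when no label is attached */
};

/* calloc() stands in for a UMA zone allocation; slots start zeroed. */
static struct label *
label_alloc(void)
{
	return (calloc(1, sizeof(struct label) +
	    (size_t)label_nslots * sizeof(uintptr_t)));
}

static void
label_free(struct label *l)
{
	free(l);
}
```

Because the object holds only a pointer, a policy is handed `obj->o_label` rather than `&obj->o_label`. Changing `label_nslots` never changes `sizeof(struct obj_sketch)`.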
NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.
Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
a resource leak. Move the resource deallocation code from fifo_close()
to a new function, fifo_cleanup(), and call fifo_cleanup() from
fifo_close() and the appropriate places in fifo_open().
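A sketch of this refactoring. It assumes a much simplified, hypothetical `struct fifoinfo_sketch` in which one heap allocation stands in for the fifo's socket pair. The function bodies are illustrative and are not the actual fifo_vnops.c code. The point is that the error path in open and the normal path in close share one deallocation routine, so the error path can no longer leak.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical fifo state; fi_buffers stands in for the socket pair. */
struct fifoinfo_sketch {
	int	 fi_readers;
	int	 fi_writers;
	void	*fi_buffers;
};

/* Release resources once nobody has the fifo open; safe to call repeatedly. */
static void
fifo_cleanup_sketch(struct fifoinfo_sketch *fip)
{
	if (fip->fi_readers == 0 && fip->fi_writers == 0) {
		free(fip->fi_buffers);
		fip->fi_buffers = NULL;
	}
}

/* Hypothetical open: allocate on first use, back out through cleanup on error. */
static int
fifo_open_sketch(struct fifoinfo_sketch *fip, int want_write,
    int simulate_error)
{
	if (fip->fi_buffers == NULL &&
	    (fip->fi_buffers = malloc(64)) == NULL)
		return (ENOMEM);
	if (simulate_error) {
		fifo_cleanup_sketch(fip);	/* the path that used to leak */
		return (ENXIO);
	}
	if (want_write)
		fip->fi_writers++;
	else
		fip->fi_readers++;
	return (0);
}

static void
fifo_close_sketch(struct fifoinfo_sketch *fip, int was_write)
{
	if (was_write)
		fip->fi_writers--;
	else
		fip->fi_readers--;
	fifo_cleanup_sketch(fip);
}
```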
Tested by: Lukas Ertl
Pointy hat to: truckman
thread being woken up. The woken thread can then run at a priority as
high as it would after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.
- Add cv_broadcastpri(), which raises the priority of the threads woken
  by the broadcast. It is used by selwakeuppri() if a collision occurs.
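The priority-raising broadcast can be sketched as follows. This is a userspace model, not the kernel's sleep queue code. `struct thread_sketch` and `cv_broadcastpri_sketch()` are hypothetical, and treating a non-positive `pri` as "leave priorities unchanged" is an assumption of this sketch. As in FreeBSD, a numerically lower value means a higher priority.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thread record; a lower number means a higher priority. */
struct thread_sketch {
	int	td_priority;
	bool	td_sleeping;
};

/*
 * Wake every sleeper and, when pri is numerically lower (better) than a
 * sleeper's current priority, raise the sleeper to pri.  A non-positive
 * pri leaves priorities alone (assumption of this sketch).
 */
static void
cv_broadcastpri_sketch(struct thread_sketch *waiters, int nwaiters, int pri)
{
	for (int i = 0; i < nwaiters; i++) {
		if (!waiters[i].td_sleeping)
			continue;
		waiters[i].td_sleeping = false;
		if (pri > 0 && pri < waiters[i].td_priority)
			waiters[i].td_priority = pri;
	}
}
```

A sleeper that already has a better priority than `pri` keeps it. Only sleepers that would otherwise run at a worse priority are boosted.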
Not objected to in: -arch, -current
Introduce two new macros, MNT_ILOCK(mp) and MNT_IUNLOCK(mp), to
operate on this mutex transparently.
Eventually the new mutex will protect more fields in
struct mount, not only the vnode list.
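A sketch of how such macros hide the lock from callers. A spin lock built on C11 `atomic_flag` stands in for the kernel's mtx(9) mutex. `struct mount_sketch`, `mnt_mtx` and `mnt_nvnodes` are hypothetical names; only the macro names come from the text.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a kernel mutex: a spin lock on an atomic_flag. */
struct mtx_sketch {
	atomic_flag	m_flag;
};

static void
mtx_init_sketch(struct mtx_sketch *m)
{
	atomic_flag_clear(&m->m_flag);
}

static void
mtx_lock_sketch(struct mtx_sketch *m)
{
	while (atomic_flag_test_and_set_explicit(&m->m_flag,
	    memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void
mtx_unlock_sketch(struct mtx_sketch *m)
{
	atomic_flag_clear_explicit(&m->m_flag, memory_order_release);
}

/* Hypothetical struct mount carrying the new interlock. */
struct mount_sketch {
	struct mtx_sketch	mnt_mtx;
	int			mnt_nvnodes;	/* stands in for the vnode list */
};

/* Callers never name the mutex, so what it protects can grow later. */
#define	MNT_ILOCK(mp)	mtx_lock_sketch(&(mp)->mnt_mtx)
#define	MNT_IUNLOCK(mp)	mtx_unlock_sketch(&(mp)->mnt_mtx)
```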
Discussed with: jeff
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.
The only case known at the moment where td != curthread is
boot() calling sync with the thread0 pointer.
This fixes the panic on shutdown people have reported.
pick up the DEVFS inode number from the dev_t and find our directory
entry from that; we don't need to scan the directory to find it.
This also solves an issue with on-demand devices in subdirectories.
Submitted by: cognet
passes the fdidx from VOP_OPEN down.
This is, for all I know, the final API for this functionality, but
the locking semantics for messing with the file descriptor from
the device driver are not settled at this time.
stack trace supplied by phk, I now understand what's going on here. The
check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been
partially torn down in vclean(). It is not clear that this would cause
a problem. Document this in nfs_bio.c, which is where the other two
filesystems copied this code from.
validating the offset within a given memory buffer before handing the
real work off to uiomove(9).
Use uiomove_frombuf in procfs to correct several issues with
integer arithmetic that could result in underflows/overflows. As a
side-effect, the code is significantly simplified.
Add additional sanity checks when computing a memory allocation size
in pfs_read.
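The offset check can be sketched in userspace as below. `struct uio_sketch` is a hypothetical, single-buffer simplification of struct uio, and `uiomove_sketch()` stands in for uiomove(9). The validation order follows the description above but is not the committed code:

- reject a negative offset or residual;
- return 0 when the offset is at or past the end of the buffer;
- otherwise copy from `buf + offset`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, much simplified uio: a single destination buffer. */
struct uio_sketch {
	off_t	 uio_offset;	/* offset into the source buffer */
	ssize_t	 uio_resid;	/* bytes the caller still wants */
	char	*uio_dst;	/* where the data goes */
};

/* Stand-in for uiomove(9): copy up to n bytes and advance the uio. */
static int
uiomove_sketch(const char *src, size_t n, struct uio_sketch *uio)
{
	if (n > (size_t)uio->uio_resid)
		n = (size_t)uio->uio_resid;
	memcpy(uio->uio_dst, src, n);
	uio->uio_dst += n;
	uio->uio_offset += (off_t)n;
	uio->uio_resid -= (ssize_t)n;
	return (0);
}

/*
 * Validate the offset against the buffer once, here, so that callers do
 * not repeat signed/unsigned arithmetic that can underflow or overflow.
 */
static int
uiomove_frombuf_sketch(const void *buf, int buflen, struct uio_sketch *uio)
{
	if (uio->uio_offset < 0 || uio->uio_resid < 0)
		return (EINVAL);
	if (buflen <= 0 || uio->uio_offset >= buflen)
		return (0);	/* at or past the end: nothing to copy */
	return (uiomove_sketch((const char *)buf + uio->uio_offset,
	    (size_t)(buflen - uio->uio_offset), uio));
}
```

A procfs-style read handler would format its output into a local buffer and then make a single call like this, instead of computing `buflen - offset` itself.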
Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-)
Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)
file for vnode mappings. Note that this uses vn_fullpath() and may
be somewhat unreliable, although not too unreliable for shared
libraries. For non-vnode mappings, just print "-" for the field.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, AFRL, Network Associates Laboratories
struct msdosfsmount so that this file has the same prerequisites as
it used to. The new prerequisite was a meta-style bug. It required
many style bugs (unsorted includes ...) elsewhere.
Formatted prototypes in KNF. Resisted urge to sort all the prototypes,
to minimise differences with NetBSD. (NetBSD has reformatted the
prototypes but has not sorted them and still uses __P(()).)
are allowed by Windows (ref: MS KB article 120138).
XXX From my reading of the CIFS specification, it's not clear that
clients need to validate filenames at all.
PR: 57123
Submitted by: Paul Coucher
MFC after: 1 month
sufficient to guarantee that this race is not hit. The XLOCK will likely
have to be redesigned due to the way reference counting and mutexes work
in FreeBSD. We currently cannot guarantee that XLOCK was not set
and cleared while we were blocked on the interlock waiting to check
for XLOCK. This would lead us to reference a vnode which was not the
vnode we requested.
- Add a backtrace() call inside of INVARIANTS in the hopes of finding out if
this condition is ever hit. It should not, since we should be retaining
a reference to the vnode in these cases. The reference would be sufficient
to block recycling.
FIDs to be 128-bits wide and adds support for realms.
Add a new CODA_COMPAT_5 option, which requests support for the old
Coda 5.x interface instead of the new one.
Create a new coda5.ko module that supports the 5.x interface, and make
the existing coda.ko module use the new 6.x interface. These modules
cannot both be loaded at the same time.
Obtained from: Jan Harkes & the coda-6.0.2 distribution,
NetBSD (drochner) (CODA_COMPAT_5 option).
32K pages are selected. In spec_getpages() change the printf format
specifier and add an explicit cast so that we always print the field
as a long type.
also fixes pfs_access() since it relies on VOP_GETATTR() which will call
pfs_getattr(). This prevents jailed processes from discovering the
existence, start time and ownership of processes outside the jail.
PR: kern/48156
directories. Previously, pfs_iterate() would return -1 when it
reached the end of the process list while processing a process
directory node, even if the parent directory contained further nodes
(which is the case for the linprocfs root directory, where the process
directory node is actually first in the list). With this patch,
pfs_iterate() will continue to traverse the parent directory's node
list after exhausting the process list (as was the intention all
along). The code should hopefully be easier to read as well.
While I'm here, have pfs_iterate() assert that the allproc lock is
held.
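The corrected traversal order can be modelled with a small iterator. The types and `iterate_sketch()` are hypothetical. A directory is reduced to a list of nodes in which a PROCDIR node expands to one entry per process. The key step is what happens when the process list is exhausted: the iterator resets the process cursor and moves on to the next node instead of returning end-of-directory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node kinds; NODE_PROCDIR expands to one entry per process. */
enum node_type { NODE_FILE, NODE_PROCDIR };

struct dir_sketch {
	const enum node_type	*nodes;
	int			 nnodes;
	int			 nprocs;	/* length of the process list */
};

/* Cursor: position in the node list and, inside PROCDIR, the process list. */
struct iter_sketch {
	int	node;
	int	proc;
};

/*
 * Advance to the next entry.  Returns false only when the parent's node
 * list is exhausted, not merely when the process list runs out.
 */
static bool
iterate_sketch(const struct dir_sketch *d, struct iter_sketch *it)
{
	while (it->node < d->nnodes) {
		if (d->nodes[it->node] != NODE_PROCDIR) {
			it->node++;
			return (true);
		}
		if (it->proc < d->nprocs) {
			it->proc++;
			return (true);
		}
		/* Process list exhausted: continue with the next node. */
		it->proc = 0;
		it->node++;
	}
	return (false);
}
```

Consider a root whose first node is the process directory, followed by two ordinary nodes, with three processes. The loop visits all five entries. Returning at the end of the process list, as before the fix, would have stopped after three.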
masks for files and directories. This should make some
of the Midnight Commander users happy.
Remove an extra ')' in the manual page.
PR: 35699
Submitted by: Eugene Grosbein <eugen@grosbein.pp.ru> (original version)
Tested by: simon
contain the file descriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful in particular to fdescfs and /dev/fd/*.
For now pass -1 all over the place.
in ntfs_writentvattr_plain and ntfs_readntvattr_plain, and purge the boot
block from the buffer cache if it isn't exactly one cluster long. These two
changes work around the same buffer cache bug that ntfs_subr.c 1.30 tried
to, but in a different way. This may decrease throughput by reading smaller
amounts of data from the disk at a time, but may increase it by avoiding
bogus writes of clean buffers.
Problem (re)reported by Karel J. Bosschaart on -current.
the user requests a read-only mount. This is necessary because we
don't do the VOP_OPEN again if they upgrade a read-only mount to
read-write.
Fixes lockup when creating files on msdosfs mounts that have been
mounted read-only then upgraded to read-write. The exact cause of
the lockup is not known, but it is likely to be the kernel getting
stuck in an infinite loop trying to write dirty buffers to a device
without write permission.
Reported/tested by andreas, discussed with phk.
an MSDOSFS file system either failed, silently corrupted the file, or
sometimes corrupted the neighboring file.
PR: 53695
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my> (original version)
MFC: 3 days
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.
By giving the vnode a separate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free up f_data for subtype-specific use.
At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
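A sketch of the check this enables, using a hypothetical `struct file_sketch` and hypothetical subtype names. Which subtypes carry a vnode here is an assumption of the example, not a statement about the real DTYPE_* list.

```c
#include <assert.h>
#include <stddef.h>

struct vnode_sketch {
	int	v_usecount;
};

/* Hypothetical file subtypes; here FT_VNODE and FT_FIFO are vnode-backed. */
enum ftype_sketch { FT_VNODE, FT_FIFO, FT_SOCKET };

struct file_sketch {
	enum ftype_sketch	 f_type;
	void			*f_data;	/* still aliases the vnode, for now */
	struct vnode_sketch	*f_vnode;	/* NULL unless vnode-backed */
};

/* Before: every caller had to know which subtypes carry a vnode. */
static int
has_vnode_old(const struct file_sketch *fp)
{
	return (fp->f_type == FT_VNODE || fp->f_type == FT_FIFO);
}

/* After: a single NULL check, independent of the subtype. */
static int
has_vnode_new(const struct file_sketch *fp)
{
	return (fp->f_vnode != NULL);
}
```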
- Avoid calling bread() with different sizes on the same blkno.
Although the buffer cache is designed to handle differing size
buffers, it erroneously tries to write the incorrectly-sized buffer
back to disk before reading the correctly-sized one, even
when it's not dirty. This behaviour caused a panic for read-only
NTFS mounts when INVARIANTS was enabled ("bundirty: buffer x still
on queue y"), reported by NAKAJI Hiroyuki.
- Fix a bug in the code handling holes: a variable was incremented
instead of decremented, which could cause an infinite loop.
smbfs_close(). This fixes paging to and from mmap()'d regions of smbfs
files after the descriptor has been closed, and makes thttpd, GNU ld,
and perhaps more things work that depend on being able to do this.
PR: 48291
- Emulate lock draining (LK_DRAIN) in null_lock() to avoid deadlocks
when the vnode is being recycled.
- Don't allow null_nodeget() to return a nullfs vnode from the wrong
mount when multiple nullfs's are mounted. It's unclear why these checks
were removed in null_subr.c 1.35, but they are definitely necessary.
Without the checks, trying to unmount a nullfs mount will erroneously
return EBUSY, and forcibly unmounting with -f will cause a panic.
- Bump LOG2_SIZEVNODE up to 8, since vnodes are >256 bytes now. The old
value (7) didn't cause any problems, but made the hash algorithm
suboptimal.
These changes fix nullfs enough that a parallel buildworld succeeds.
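A hash of the form `(address >> LOG2_SIZEVNODE) & mask` shows why the shift should track the vnode size. This sketch makes three assumptions: the table size, the base address, and, as a simplification, that objects sit exactly 256 bytes apart. `node_hash()` and `buckets_used()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define	NHASH		16		/* hypothetical table size, power of two */
#define	HASHMASK	(NHASH - 1)

/* Bucket for an object address, dropping low bits the object size makes redundant. */
static unsigned
node_hash(uintptr_t addr, unsigned log2size)
{
	return ((unsigned)((addr >> log2size) & HASHMASK));
}

/* Count the distinct buckets hit by n objects spaced `stride` bytes apart. */
static int
buckets_used(uintptr_t base, uintptr_t stride, int n, unsigned log2size)
{
	int used[NHASH] = { 0 };
	int count = 0;

	for (int i = 0; i < n; i++) {
		unsigned b = node_hash(base + (uintptr_t)i * stride, log2size);

		if (!used[b]) {
			used[b] = 1;
			count++;
		}
	}
	return (count);
}
```

With a shift of 8, sixteen such objects fill all sixteen buckets. With a shift of 7, bit 7 of every address is the same, so only the eight even buckets are used.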
Submitted by: tegge (partially; LK_DRAIN)
Tested by: kris
resource deallocation back to fifo_close(). This eliminates any
stale data that might be stuck in the socket buffers after all the
readers and writers have closed the fifo.
Tested by: Thorsten Schroeder <ths@katjusha.de>
directory vnodes use to refer to their constituent vnodes, into
union_dircache_free(). Also s/union_dircache/union_dircache_get/ and
tweak the structure of union_dircache_r().
MFC after: 3 days
in smb_fphelp(): the parent vnode may have already been recycled
since we don't hold a reference to it. Fixes a panic when rebooting
with mdconfig -t vnode devices referring to vnodes on a smbfs mount.
been tested extensively, but -CURRENT testing has been hampered by a
number of panics that also occur without the patch. Since the
destabilizing changes between 4.X and 5.X are external to unionfs,
I believe this patch applies equally well to both.
Thanks to scrappy for assistance testing these and other changes.
MFC after: 4 days
Restructure the error handling portion of the resource allocation
code to eliminate duplicated code.
Test for the O_NONBLOCK && fi_readers == 0 case before incrementing
fi_writers and modifying the socket flag to avoid having to
undo these operations in this error case.
Restructure and simplify the code that handles blocking opens.
There should be no change to functionality.
Sleep on the vnode interlock while waiting for another
caller to increment fi_readers or fi_writers. Hold the
vnode interlock while incrementing fi_readers or fi_writers
to prevent a wakeup from being missed.
Only access fi_readers and fi_writers while holding the vnode
lock. Previously fifo_close() decremented their values without
holding a lock.
Move resource deallocation from fifo_close() to fifo_inactive(),
which allows the VOP_CLOSE() call in the error return path in
fifo_open() to be removed. Fifo_open() was calling VOP_CLOSE()
with the vnode lock held, in violation of the current vnode locking
API. Also the way fifo_close() used vrefcnt() to decide whether
to deallocate resources was bogus according to comments in the
vrefcnt() implementation.
Reviewed by: bde
entering sys_process.c debugging primitives, or we violate assertions.
Also, be more careful about releasing the process lock around calls
to uiomove() which may sleep waiting for paging machinations or
related notions. We may want to defer the uiomove() in at least
one case, but jhb will look into that at a later date.
Reported by: Philippe Charnier <charnier@xp11.frmug.org>
Reviewed by: jhb
uptime. Where necessary, convert it back to Unix time by adding boottime
to it. This fixes a potential problem in the accounting code, which would
compute the elapsed time incorrectly if the Unix time was stepped during
the lifetime of the process.
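A sketch of this bookkeeping, with times reduced to whole seconds and a hypothetical `struct proc_sketch` (the real code works with struct timeval). Elapsed time is a difference of two uptimes, so stepping the wall clock cannot distort it. The Unix start time is recovered by adding boottime only when it is needed.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical process record: start time kept as uptime, not Unix time. */
struct proc_sketch {
	time_t	p_start_uptime;
};

/* Elapsed time is a difference of uptimes, immune to wall-clock steps. */
static time_t
elapsed(const struct proc_sketch *p, time_t now_uptime)
{
	return (now_uptime - p->p_start_uptime);
}

/* Where a Unix timestamp is still needed, add the boot time back. */
static time_t
start_unix_time(const struct proc_sketch *p, time_t boottime)
{
	return (boottime + p->p_start_uptime);
}
```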
race where a thread could assume that a process was swapped in by
PHOLD() when it actually wasn't fully swapped in yet.
- In faultin(), always msleep() if PS_SWAPPINGIN is set instead of doing
this check after bumping p_lock in the PS_INMEM == 0 case. Also,
sched_lock is only needed for setting and clearing swapping PS_*
flags and the swap thread inhibitor.
- Don't set and clear the thread swap inhibitor in the same loops as the
pmap_swapin/out_thread() since we have to do it under sched_lock.
Instead, mimic the treatment of the PS_INMEM flag and use separate loops
to set the inhibitors when clearing PS_INMEM and clear the inhibitors
when setting PS_INMEM.
- swapout() now returns with the proc lock held as it holds the lock
while adjusting the swapping-related PS_* flags so that the proc lock
can be used to test those flags.
- Only use the proc lock to check the swapping-related PS_* flags in
several places.
- faultin() no longer requires sched_lock to be held by callers.
- Rename PS_SWAPPING to PS_SWAPPINGOUT to be less ambiguous now that we
have PS_SWAPPINGIN.
printed out needs a prefix such as when a thread is blocked on a lock.
- Use another local variable to close another race for the td_wmesg and
td_wchan members of struct thread.
on my system where I preload msdosfs and have it in my kernel.
There's likely another bug that's causing msdosfs_init() to be called
multiple times, but this makes that harmless.
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
flexible process_fork, process_exec, and process_exit eventhandlers. This
reduces code duplication and also means that I don't have to duplicate
the eventhandler locking three more times, once each for at_fork, at_exec,
and at_exit.
Reviewed by: phk, jake, almost complete silence on arch@
fifo_open() waiting for another reader or writer if one arrived and
departed while we were waiting (or a little earlier).
Rev.1.79 broke blocking opens of fifos by making them time out after 1
second. This was bad for at least apsfilter.
Tested by: "Simon 'corecode' Schubert" <corecode@corecode.ath.cx>,
Alexander Leidinger <Alexander@leidinger.net>,
phk
MFC after: 4 weeks
to avoid a "locking against myself" panic when udf_hashins() tries
to lock it again. Lock the vnode in udf_hashins() before adding it to
the hash bucket.
- Create a new function bdone() which sets B_DONE and calls wakeup(bp). This
is suitable for use as b_iodone for buf consumers who are not going
through the buf cache.
- Create a new function bwait() which waits for the buf to be done at a set
priority and with a specific wmesg.
- Replace several cases where the above functionality was implemented
without locking with the new functions.
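A userspace sketch of the bdone()/bwait() pair, using a POSIX mutex and condition variable in place of the kernel's msleep()/wakeup(). `struct buf_sketch` and `B_DONE_S` are hypothetical. The priority and wmesg arguments of bwait() are dropped because they have no userspace counterpart. The waiter tests the flag under the same lock that the completion side holds, which rules out a lost wakeup.

```c
#include <assert.h>
#include <pthread.h>

#define	B_DONE_S	0x1	/* hypothetical flag value */

/* Hypothetical buf with its own lock and wait channel. */
struct buf_sketch {
	int		b_flags;
	pthread_mutex_t	b_lock;
	pthread_cond_t	b_done_cv;
};

/* Mark the I/O finished and wake every waiter; usable as a completion hook. */
static void
bdone_sketch(struct buf_sketch *bp)
{
	pthread_mutex_lock(&bp->b_lock);
	bp->b_flags |= B_DONE_S;
	pthread_cond_broadcast(&bp->b_done_cv);
	pthread_mutex_unlock(&bp->b_lock);
}

/* Sleep until B_DONE_S is set, re-checking the flag after every wakeup. */
static void
bwait_sketch(struct buf_sketch *bp)
{
	pthread_mutex_lock(&bp->b_lock);
	while ((bp->b_flags & B_DONE_S) == 0)
		pthread_cond_wait(&bp->b_done_cv, &bp->b_lock);
	pthread_mutex_unlock(&bp->b_lock);
}
```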
closely what the function is really doing. Update all existing consumers
to use the new name.
Introduce a new vfs_stdsync function, which iterates over a mount
point's vnodes and calls FSYNC on each one of them in turn.
Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.
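The generic sync loop can be sketched over a hypothetical singly linked vnode list, with `fsync_one()` standing in for VOP_FSYNC(). A real implementation also needs locking, skipping of clean vnodes, and MNT_WAIT handling; the sketch omits all three. Remembering the last error rather than stopping at the first one is a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vnode on a mount's vnode list. */
struct vnode_sk {
	struct vnode_sk	*v_next;
	int		 v_dirty;
};

struct mount_sk {
	struct vnode_sk	*mnt_vnodes;
};

/* Stand-in for VOP_FSYNC(): write back one vnode. */
static int
fsync_one(struct vnode_sk *vp)
{
	vp->v_dirty = 0;
	return (0);
}

/* Walk every vnode on the mount and fsync it, keeping the last error. */
static int
stdsync_sketch(struct mount_sk *mp)
{
	int error, allerror = 0;

	for (struct vnode_sk *vp = mp->mnt_vnodes; vp != NULL; vp = vp->v_next)
		if ((error = fsync_one(vp)) != 0)
			allerror = error;
	return (allerror);
}
```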
Reviewed by: jeff
occurs when mounting the filesystem. The problem is that venus issues
the mount() syscall, which calls vfs_mount(), which calls coda_root()
which attempts to communicate with venus.
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
flag to the initial BUF_LOCK(). This will eventually be used in cases
where we want to use a buffer only if it is not currently in use.
- Convert all consumers of the getblk() api to use this extra parameter.
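The non-blocking lookup can be sketched as follows. `struct buf_sk`, its `b_busy` flag and `getblk_sketch()` are hypothetical, and the value of GB_LOCK_NOWAIT is assumed. A single-threaded sketch cannot sleep, so where a real getblk() would wait for a busy buffer, this one asserts that the buffer is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	GB_LOCK_NOWAIT	0x0001	/* value assumed for this sketch */

/* Hypothetical cached buffer; b_busy stands in for BUF_LOCK() state. */
struct buf_sk {
	long	b_blkno;
	bool	b_busy;
};

/*
 * Return the buffer locked for the caller.  With GB_LOCK_NOWAIT a busy
 * buffer yields NULL instead of sleeping.
 */
static struct buf_sk *
getblk_sketch(struct buf_sk *bp, int flags)
{
	if (bp->b_busy && (flags & GB_LOCK_NOWAIT) != 0)
		return (NULL);	/* in use: the caller does without it */
	/* A real getblk() would sleep here until the holder releases it. */
	assert(!bp->b_busy);
	bp->b_busy = true;
	return (bp);
}
```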
Reviewed by: arch
Not objected to by: mckusick
Remove extraneous uses of vop_null, instead deferring to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.
branches:
Initialize struct cdevsw using C99 sparse initialization and remove
all initializations to default values.
This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.
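A sketch of this initialization style, using a trimmed, hypothetical `struct cdevsw_sk` rather than the real struct cdevsw and its typedefs. Only the members that differ from their defaults are named, and everything else is zero or NULL. That is why reordering the fields, as in the LINT test above, cannot break the initializers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method types for a trimmed-down cdevsw. */
typedef int d_openclose_sk_t(void *dev, int flags);
typedef int d_rw_sk_t(void *dev, char *buf, int len);

struct cdevsw_sk {
	d_openclose_sk_t	*d_open;
	d_openclose_sk_t	*d_close;
	d_rw_sk_t		*d_read;
	d_rw_sk_t		*d_write;
	const char		*d_name;
	int			 d_flags;
};

static int
foo_open(void *dev, int flags)
{
	(void)dev;
	(void)flags;
	return (0);
}

static int
foo_read(void *dev, char *buf, int len)
{
	(void)dev;
	(void)buf;
	return (len);
}

/* Designated initializers: order-independent, omitted members are zero. */
static struct cdevsw_sk foo_cdevsw = {
	.d_open =	foo_open,
	.d_read =	foo_read,
	.d_name =	"foo",
};
```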
Approved by: re(scottl)
One of the vnodes is on different mount and is possibly on a different
kind of filesystem; treating it as an smbfs vnode then writing to it
will probably corrupt it.
PR: 48381
MFC after: 1 month
that is protected by the vnode lock.
- Move B_SCANNED into b_vflags and call it BV_SCANNED.
- Create a vop_stdfsync() modeled after spec's sync.
- Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
fs specific processing. This gives all of these filesystems proper
behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
- Annotate the locking in buf.h