Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.
Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.
Discussed with: jeff
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.
The only case there td != curthread known at the moment is
boot() calling sync with thread0 pointer.
This fixes the panic on shutdown people have reported.
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory. The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.
Modify the calculation of minifree in the same way.
Decrease maxcontigdirs as the file system fills to decrease the
likelyhood that a cluster of directories will overflow the available
space in a cylinder group.
Reviewed by: mckusick
Tested by: kmarx@vicor.com
MFC after: 2 weeks
so make the code slightly more uniform. The vnode lock is acquired in
all cases and now the only difference between VCHR and other is we
call UFS_UPDATE instead of VOP_FSYNC().
- Slightly rewrite the fsync loop to be more lock friendly. We must
acquire the vnode interlock before dropping the mnt lock. We must
also check XLOCK to prevent vclean() races.
- Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean()
races.
- Use a local variable to store the results of the nvp == TAILQ_NEXT
test so that we do not access the vp after we've vrele()d it.
- Add an XXX comment about UFS_UPDATE() not being protected by any lock
here. I suspect that it should need the VOP lock.
we release the mntvnode_mtx.
- Call vgonel() directly instead of going through vrecycle() since we own
the interlock now.
- Remove a few cases where we locked the interlock just so that we could
call VOP_UNLOCK with interlock held.
mntvnode_mtx.
- Use a local variable to store the results of the test to see if the
next vnode on the mount list has changed. This is so that we no longer
acess the vnode after we vput() it.
in a list head instead of a pointer to the first element at the time of
the first call. These lists are subject to change, and getdirtybuf()
would refetch from the wrong list in some cases.
Spottedy by: tegge
Pointy hat to: me
bail out if the buffer is not already present.
- The buffer returned by incore() is not locked and should not be sent to
brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the
desired semantics.
caller to acquire it. This permits drain_output() to be done atomically
with other operations as well as reducing the number of lock operations.
- Assert that the proper locks are held in drain_output().
- Change getdirtybuf() to accept a mutex as an argument. This mutex is used
to protect the vnode's buf list and the BKGRDWAIT flag. This lock is
dropped when we successfully acquire a buffer and held on return
otherwise. These semantics reduce the number of cumbersome cases in
calling code.
- Pass the mtx from getdirtybuf() into interlocked_sleep() and allow this
mutex to be used as the interlock argument to BUF_LOCK() in the LOCKBUF
case of interlocked_sleep().
- Change the return value of getdirtybuf() to be the resulting locked buffer
or NULL otherwise. This is for callers who pass in a list head that
requires a lock. It is necessary since the lock that protects the list
head must be dropped in getdirtybuf() so that we don't have a lock order
reversal with the buf queues lock in bremfree().
- Adjust all callers of getdirtybuf() to match the new semantics.
- Add a comment in indir_trunc() that points at unlocked access to a buf.
This may also be one of the last instances of incore() in the tree.
- Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode
interlock.
- Don't use the B_LOCKED flag and QUEUE_LOCKED for background write
buffers. Check for the BKGRDINPROG flag before recycling or throwing
away a buffer. We do this instead because it is not safe for us to move
the original buffer to a new queue from the callback on the background
write buffer.
- Remove the B_LOCKED flag and the locked buffer queue. They are no longer
used.
- The vnode interlock is used around checks for BKGRDINPROG where it may
not be strictly necessary. If we hold the buf lock the a back-ground
write will not be started without our knowledge, one may only be
completed while we're not looking. Rather than remove the code, Document
two of the places where this extra locking is done. A pass should be
done to verify and minimize the locking later.
get the same value from ip->i_ump->um_devvp.
This saves a pointer in the memory copies of inodes, which can
easily run into several hundred kilobytes.
The extra indirection is unmeasurable in benchmarks.
Approved by: mckusick
generate the inode mode from a default ACL and creation mask,
implement ufs_sync_inode_from_acl() using acl_posix1e_newfilemode().
Since ACL_OVERRIDE_MASK/ACL_PRESERVE_MASK are defined, we no
longer need to explicitly pass in a "preserve_mask" field: this
is implicit in the use of POSIX.1e semantics.
Note: this change contains a semantic bugfix for new file creation:
we now intersect the ACL-generated mode and the cmode requested by
the user process. This means permissions on newly created file
objects will now be more conservative. In the future, we may want
to provide alternative semantics (similar to Solaris and Linux) in
which the ACL mask overrides the umask, permitting ACLs to broaden
the rights beyond the requested umask.
PR: 50148
Reported by: Ritz, Bruno <bruno_ritz@gmx.ch>
Obtained from: TrustedBSD Project
cases:
- Setting sticky bit on non-directory
- Setting setgid on a file with a group that isn't in the effective
or extended groups of the authorizing credential
I.e., test the requirement first, then do the privilege test,
rather than doing the privilege test regardless of the need for
privilege.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
kernel ACL interfaces and system call names.
Break out UFS2 and FFS extattr delete and list vnode operations from
setextattr and getextattr to deleteextattr and listextattr, which
cleans up the implementations, and makes the results more readable,
and makes the APIs more clear.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
contain the filedescriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*
For now pass -1 all over the place.
UFS quota implementation. Push some quite broken access control
logic out of ufs_quotactl() into the individual command
implementations in ufs_quota.c; fix that logic. Pass in the thread
argument to any quotactl command that will need to perform access
control.
o quotaon() requires privilege (PRISON_ROOT).
o quotaoff() requires privilege (PRISON_ROOT).
o getquota() requires that:
If the type is USRQUOTA, either the effective uid match the
requested quota ID, that the unprivileged_get_quota flag be
set, or that the thread be privileged (PRISON_ROOT).
If the type is GRPQUOTA, require that either the thread be
a member of the group represented by the requested quota ID,
that the unprivileged_get_quota flag be set, or that the
thread be privileged (PRISON_ROOT).
o setquota() requires privilege (PRISON_ROOT).
o setuse() requires privilege (PRISON_ROOT).
o qsync() requires no special privilege (consistent with what
was present before, but probably not very useful).
Add a new sysctl, security.bsd.unprivileged_get_quota, which when
set to a non-zero value, will permit unprivileged users to query user
quotas with non-matching uids and gids. Set this to 0 by default
to be mostly consistent with the previous behavior (the same for
USRQUOTA, but not for GRPQUOTA).
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
attribute name of "" from ffs_getextattr(). Invoking VOP_GETETATTR()
with an empty name is now no longer supported; user application
compatibility is provided by a system call level compatibility
wrapper. We make sure to explicitly reject attempts to set an EA
with the name "".
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
extended attribute retrieval code: it's no longer special-cased,
and is caught by the normal UFS1 EA validity checks (and, in
fact, returns the same error, EINVAL).
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
if we permit them to occur, the kernel panics due to our performing
EA operations using VOP_STRATEGY on the vnode. This went unnoticed
previously because there are very for users of device nodes on UFS2
due to the introduction of devfs. However, this can come up with
the Linux compat directories and its hard-coded dev nodes (which will
need to go away as we move away from hard-coded device numbers).
This can come up if you use EA-intensive features such as ACLs and
MAC.
The proper fix is pretty complicated, but this band-aid would be
an excellent MFC candidate for the release.