unconditional acquisition of Giant for ACL related operations. If the file
system is set as being MP safe and debug.mpsafevfs is 1, do not pick up
Giant.
For any operations which require namei(9) lookups:
__acl_get_file
__acl_get_link
__acl_set_file
__acl_set_link
__acl_delete_file
__acl_delete_link
__acl_aclcheck_file
__acl_aclcheck_link
-Set the MPSAFE flag in NDINIT
-Initialize vfslocked variable using the NDHASGIANT macro
For functions which operate on fds, make sure the operations are locked:
__acl_get_fd
__acl_set_fd
__acl_delete_fd
__acl_aclcheck_fd
-Initialize vfslocked using VFS_LOCK_GIANT before we manipulate the vnode
Discussed with: jeff
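The conditional locking described above can be sketched in user space
roughly as follows. This is only a model under stated assumptions:
struct mount_model, mpsafevfs_model, lock_giant_model(),
unlock_giant_model() and acl_op_model() are hypothetical stand-ins, not
the kernel's VFS_LOCK_GIANT/VFS_UNLOCK_GIANT macros, and Giant is reduced
to a hold counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mounted file system. */
struct mount_model {
    bool mpsafe;            /* file system declares itself MP safe */
};

/* Models the debug.mpsafevfs setting (assumption: 1 means enabled). */
int mpsafevfs_model = 1;

/* Models Giant as a simple hold count. */
int giant_held;

/*
 * Pick up "Giant" only when it is needed.  The return value plays the
 * role of the vfslocked variable.
 */
int
lock_giant_model(const struct mount_model *mp)
{
    if (mp->mpsafe && mpsafevfs_model == 1)
        return (0);
    giant_held++;
    return (1);
}

/* Drop "Giant" only if lock_giant_model() actually picked it up. */
void
unlock_giant_model(int vfslocked)
{
    if (vfslocked) {
        assert(giant_held > 0);
        giant_held--;
    }
}

/*
 * Hypothetical fd-based operation.  Returns the hold count observed
 * while the vnode would have been manipulated.
 */
int
acl_op_model(const struct mount_model *mp)
{
    int vfslocked, held;

    vfslocked = lock_giant_model(mp);
    held = giant_held;      /* the vnode would be touched here */
    unlock_giant_model(vfslocked);
    return (held);
}
```

In this model an MP safe file system runs the operation without the
counter ever being raised, while any other file system, or any file
system when mpsafevfs_model is 0, runs it with the counter held.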
any other non-sleepable lock. In plain English: Giant comes before all
other mutexes.
- Add some extra description to the lock order reversal printf's to indicate
when a reversal is triggered by a hard-coded implicit rule.
Requested by: truckman (2)
MFC after: 1 week
state where sleeping on a sleep queue is not allowed. The facility
doesn't support recursion but uses a simple private per-thread flag
(TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is
set and INVARIANTS is enabled.
- Use this new facility to replace the g_xup and g_xdown mutexes that were
(ab)used to achieve similar behavior.
- Disallow sleeping in interrupt threads when invoking interrupt handlers.
MFC after: 1 week
Reviewed by: phk
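A non-recursive per-thread "no sleeping" flag of the kind described above
might look roughly like this in a user-space model. The struct, the flag
value and all function names here are hypothetical; the real
TDP_NOSLEEPING flag lives in the kernel's thread structure, and the real
sleepq_add() panics rather than returning an error.

```c
#include <assert.h>

#define TDP_NOSLEEPING_MODEL 0x1    /* hypothetical bit value */

/* Hypothetical stand-in for the per-thread private flags. */
struct thread_model {
    int td_pflags;
};

/* Enter the no-sleep state.  Not recursive: the flag must be clear. */
void
no_sleeping_enter(struct thread_model *td)
{
    assert((td->td_pflags & TDP_NOSLEEPING_MODEL) == 0);
    td->td_pflags |= TDP_NOSLEEPING_MODEL;
}

/* Leave the no-sleep state.  The flag must currently be set. */
void
no_sleeping_exit(struct thread_model *td)
{
    assert((td->td_pflags & TDP_NOSLEEPING_MODEL) != 0);
    td->td_pflags &= ~TDP_NOSLEEPING_MODEL;
}

/*
 * Models the check made when a thread is about to be put on a sleep
 * queue.  Returns 0 if sleeping is allowed and -1 where the kernel
 * would panic under INVARIANTS.
 */
int
sleepq_add_model(const struct thread_model *td)
{
    return ((td->td_pflags & TDP_NOSLEEPING_MODEL) != 0 ? -1 : 0);
}
```

Because a single bit is used instead of a counter, nesting two enter
calls cannot be represented, which matches the "doesn't support
recursion" remark.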
links and the execution of ELF binaries. Two problems were found:
1) The link path wasn't tagged as being MP safe and thus was not properly
protected.
2) The ELF interpreter vnode wasn't being locked in namei(9) and thus was
insufficiently protected.
This commit makes the following changes:
-Sets the MPSAFE flag in NDINIT for symbolic link paths
-Sets the MPSAFE flag in NDINIT and introduces a vfslocked variable which
will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been
picked up.
-Drops an assertion into vfs_lookup which ensures that if the MPSAFE
flag is NOT set, we have picked up Giant. If not, panic (if WITNESS is
compiled into the kernel). This should help us find conditions where vnode
operations are insufficiently protected.
This is a RELENG_6 candidate.
Discussed with: jeff
MFC after: 4 days
shutdown procedures (which have a duration of more than 120 seconds).
We have two user-space affecting shutdown timeouts: a "soft" one in
/etc/rc.shutdown and a "hard" one in init(8). The first one can be
configured via /etc/rc.conf variable "rcshutdown_timeout" and defaults
to 30 seconds. The second one was originally (in 1998) intended to be
configured via sysctl(8) variable "kern.shutdown_timeout" and defaults
to 120 seconds.
Unfortunately, the "kern.shutdown_timeout" was declared "unused" in 1999
(as it obviously is actually not used within the kernel itself) and
hence was intentionally but misleadingly removed in revision 1.107 from
init_main.c. Kernel sysctl(8) variables are certainly the wrong way to
control user-space processes in general, but in this particular case the
sysctl(8) variable should have remained as it supports init(8), which
isn't passed command line flags (which in turn could have been set via
/etc/rc.conf), etc.
As there is already a similar "kern.init_path" sysctl(8) variable which
directly affects init(8), resurrect the init(8) shutdown timeout under
sysctl(8) variable "kern.init_shutdown_timeout". But this time document
it as being intentionally unused within the kernel and used by init(8).
Also document it in the manpages init(8) and rc.conf(5).
Reviewed by: phk
MFC after: 2 weeks
struct bufs that are persistently held by ext2fs. Ignore any buffers
with this flag in the code in boot() that counts "busy" and dirty
buffers and attempts to sync the dirty buffers, which is done before
attempting to unmount all the file systems during shutdown.
This fixes the problem caused by any ext2fs file systems that are
mounted at system shutdown time, which caused boot() to give up on
a non-zero number of buffers and skip the call to vfs_unmountall().
This left all the mounted file systems in a dirty state and caused
them to all require cleanup by fsck on reboot.
Move the two separate copies of the "busy" buffer test in boot()
to a separate function.
Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro.
Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with
this and previous flag changes.
PR: kern/56675, kern/85163
Tested by: "Matthias Andree" matthias.andree at gmx.de
Reviewed by: bde
MFC after: 3 days
Also introduce an aclinit function which will be used to create the UMA zone
for use by file systems at system start up.
MFC after: 1 month
Discussed with: rwatson
instead. Detailed changelist:
o Add a flags field to struct pollrec, to indicate that
a particular entry is being worked on.
o Define a macro PR_VALID() to check that a pollrec
is valid and pollable.
o Mark ISRs as mpsafe.
o ether_poll()
- Acquire poll_mtx while traversing pollrec array.
- Skip pollrecs that are being worked on.
- Conditionally acquire Giant when entering handler.
o netisr_pollmore()
- Conditionally assert Giant.
- Acquire poll_mtx while working with statistics.
o netisr_poll()
- Conditionally assert Giant.
- Acquire poll_mtx while working with statistics
and traversing pollrec array.
o ether_poll_register(), ether_poll_deregister()
- Conditionally assert Giant.
- Acquire poll_mtx while working with pollrec array.
o poll_idle()
- Remove all strange manipulations with Giant.
In collaboration with: ru, pjd
In collaboration with: Oleg Bulyzhin <oleg rinet.ru>
In collaboration with: dima <_pppp mail.ru>
remaining % arguments because the varargs are now out of sync and
there is a risk that we might for instance dereference an integer
in a %s argument.
Sponsored by: Napatech.com
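The hazard described above can be illustrated with a toy formatter. This
is a sketch only, assuming the intended behavior is to stop interpreting
conversions once an unknown one has been seen; mini_format() is a
hypothetical function that understands just %d, %s and %%, and it is not
the kernel's printf implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Minimal formatter.  After the first unknown conversion the rest of
 * the format string is copied verbatim and no further varargs are
 * consumed, so a later %s can never be fed a mismatched argument.
 * Returns the number of characters stored, excluding the NUL.
 */
int
mini_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;
    int stop = 0;

    assert(size > 0);
    va_start(ap, fmt);
    for (; *fmt != '\0' && n + 1 < size; fmt++) {
        if (*fmt != '%' || stop) {
            buf[n++] = *fmt;
            continue;
        }
        switch (fmt[1]) {
        case 'd':
            n += (size_t)snprintf(buf + n, size - n, "%d",
                va_arg(ap, int));
            fmt++;
            break;
        case 's':
            n += (size_t)snprintf(buf + n, size - n, "%s",
                va_arg(ap, const char *));
            fmt++;
            break;
        case '%':
            buf[n++] = '%';
            fmt++;
            break;
        default:
            /* Unknown conversion: the varargs are out of sync. */
            stop = 1;
            buf[n++] = '%';
            break;
        }
        if (n >= size)      /* snprintf() output was truncated */
            n = size - 1;
    }
    buf[n] = '\0';
    va_end(ap);
    return ((int)n);
}
```

With the format "a=%d %q b=%s" the %d is expanded normally, and from the
unknown %q onward the text, including the trailing %s, is emitted
literally instead of being interpreted.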
link proctree and allproc to Giant since that order is already implicitly
enforced.
- Use a goto to handle the case where we want to enforce a reversal before
calling isitmydescendant() in witness_checkorder() so that the logic is
easier to follow and so that it is easier to add more forced-reversal
cases in the future.
MFC after: 3 days
mutex.
- Don't panic if a spin lock is held too long inside _mtx_lock_spin() if
panicstr is set (meaning that we are already in a panic). Just keep
spinning forever instead.
o for() instead of while() looping over mbuf chain
o parens around all flag checks
o more verbose function and purpose description
o some more style changes
Based on feedback from: sam
m_demote(m->m_next) if they wish to start at the second mbuf in chain.
o Test m_type with == instead of &.
o Check m_nextpkt against NULL instead of implicit 0.
Based on feedback from: sam
1. Walk the absolute list in reverse to prefer duplicated levels that have
a lower absolute setting, i.e. 800 MHz/50% is better than 1600 MHz/25% even
though both have the same actual frequency. This also removes the need to
check for already-modified levels since by definition, those will be added
later in the sorted list.
2. Compare the absolute settings for derived levels and don't use the new
level if it's higher. For example, a level of 800 MHz/75% is preferable to
1600 MHz/25% even though the latter has a lower total frequency.
This work is based on a patch from the submitter but reworked by myself.
Submitted by: Tijl Coosemans (tijl/ulyssis.org)
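The two preference rules can be checked with a little arithmetic. The
sketch below is a model only: struct level_model and the three helper
functions are hypothetical names, and the effective frequency is assumed
to be the absolute setting scaled by the relative percentage.

```c
#include <assert.h>

/* Hypothetical representation of one frequency level. */
struct level_model {
    int abs_mhz;    /* absolute setting in MHz */
    int rel_pct;    /* relative setting in percent */
};

/* Effective frequency, assumed to be abs_mhz scaled by rel_pct. */
int
effective_mhz(const struct level_model *l)
{
    return (l->abs_mhz * l->rel_pct / 100);
}

/*
 * Rule 1: of two levels with the same effective frequency, prefer the
 * one with the lower absolute setting.
 */
const struct level_model *
prefer_duplicate(const struct level_model *a, const struct level_model *b)
{
    assert(effective_mhz(a) == effective_mhz(b));
    return (b->abs_mhz < a->abs_mhz ? b : a);
}

/*
 * Rule 2: a derived level is only usable after 'prev' if its absolute
 * setting is not higher than that of 'prev'.
 */
int
derived_level_usable(const struct level_model *prev,
    const struct level_model *cand)
{
    return (cand->abs_mhz <= prev->abs_mhz);
}
```

With these definitions 800 MHz/50% and 1600 MHz/25% both evaluate to
400 MHz and the 800 MHz entry wins, while 1600 MHz/25% is rejected as a
successor to 800 MHz/75% (600 MHz) because it would raise the absolute
setting.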
int prep, int how).
Copies the data portion of mbuf (chain) n starting from offset off
for length len to mbuf (chain) m. Depending on prep the copied
data will be appended or prepended. The function ensures that the
mbuf (chain) m will be fully writeable by making real (not refcnt)
copies of mbuf clusters. When prepending, the function returns
a pointer to the new start of mbuf chain m and leaves as much
leading space as possible in the new first mbuf.
Reviewed by: glebius
checking on mbufs and mbuf chains. Set sanitize to 1 to garble
illegal things and have them blow up later when used/accessed.
m_sanity()'s main purpose is for KASSERT()s and debugging of
non-kosher mbuf manipulation (of which we have a number).
Reviewed by: glebius
any tags and packet headers. If "all" is set then the first mbuf
in the chain will be cleaned too.
This function is used before an mbuf that arrived as a packet with
m->m_flags & M_PKTHDR is appended to an mbuf chain using m->m_next
(not m->m_nextpkt).
Reviewed by: glebius
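The effect of the "all" argument can be pictured with a small model of an
mbuf chain. Everything below is a hypothetical stand-in: struct
mbuf_model, MODEL_PKTHDR and demote_model() are not the kernel's
definitions, and the tag list is reduced to a single integer.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PKTHDR 0x1    /* hypothetical flag value */

/* Hypothetical, heavily reduced mbuf. */
struct mbuf_model {
    struct mbuf_model *m_next;  /* next mbuf in the same chain */
    int m_flags;
    int has_tags;               /* stands in for the tag list */
};

/*
 * Strip the packet header flag and tags from the chain starting at m0.
 * The first mbuf is only cleaned when 'all' is non-zero.
 */
void
demote_model(struct mbuf_model *m0, int all)
{
    struct mbuf_model *m;

    assert(m0 != NULL);
    for (m = all ? m0 : m0->m_next; m != NULL; m = m->m_next) {
        m->m_flags &= ~MODEL_PKTHDR;
        m->has_tags = 0;
    }
}
```

Calling demote_model(m, 0) leaves the head of the chain as a packet
header mbuf, which is what is wanted when the remaining mbufs are being
linked behind it through m_next.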
but vm_map_wire() fails, then a vm object, vm map entries, and kernel_map
free space are leaked and (2) unwiring is handled automatically by
vm_map_remove().
Suggested by: tegge
- if minfd < fd_freefile (as is most often the case, since minfd is
usually 0), set it to fd_freefile.
- remove a call to fd_first_free() which duplicates work already done
by fdused().
This change results in a small but measurable speedup for processes
with large numbers (several thousands) of open files.
PR: kern/85176
Submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
MFC after: 3 weeks
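The idea of starting the search at the free-slot hint can be sketched as
follows. This is a user-space model with hypothetical names
(struct fdtable_model, fdalloc_model(), fdfree_model()); the real
descriptor table uses a bitmap, and the hint update shown inside
fdalloc_model() merely stands in for the work the text attributes to
fdused().

```c
#include <assert.h>

#define NFILES_MODEL 16     /* hypothetical fixed table size */

struct fdtable_model {
    unsigned char used[NFILES_MODEL];
    int freefile;   /* every descriptor below this is in use */
};

/* Allocate the lowest free descriptor >= minfd, or return -1. */
int
fdalloc_model(struct fdtable_model *t, int minfd)
{
    int fd;

    /* Nothing below the hint can be free, so skip straight to it. */
    if (minfd < t->freefile)
        minfd = t->freefile;
    for (fd = minfd; fd < NFILES_MODEL; fd++)
        if (!t->used[fd])
            break;
    if (fd == NFILES_MODEL)
        return (-1);
    t->used[fd] = 1;
    /* Advance the hint past the slot that was just consumed. */
    if (fd == t->freefile)
        while (t->freefile < NFILES_MODEL && t->used[t->freefile])
            t->freefile++;
    return (fd);
}

/* Release a descriptor and pull the hint back if necessary. */
void
fdfree_model(struct fdtable_model *t, int fd)
{
    assert(fd >= 0 && fd < NFILES_MODEL && t->used[fd]);
    t->used[fd] = 0;
    if (fd < t->freefile)
        t->freefile = fd;
}
```

With thousands of descriptors open and minfd equal to 0, the loop in
this model begins at the hint instead of rescanning the fully used low
part of the table on every allocation.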
if an indirect relationship exists (keep both A->B->C and A->C).
This allows witness_checkorder() to use isitmychild() instead of
the much more expensive isitmydescendant() to check for valid lock
ordering.
Don't do an expensive tree walk to update the w_level values when
the tree is updated. Only update the w_level values when using the
debugger to display the tree.
Nuke the experimental "witness_watch > 1" mode that only compared
w_level for the two locks. This information is no longer maintained
at run time, and the use of isitmychild() in witness_checkorder()
should bring performance close enough to the acceptable level that
this hack is not needed.
Report witness data structure allocation statistics under the
debug.witness sysctl.
Reviewed by: jhb
MFC after: 30 days
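Why keeping the direct A->C edge matters can be seen in a small model of
the lock order graph. The names below (struct order_model,
add_edge_model(), is_child_model(), is_descendant_model()) are
hypothetical and the graph is a plain adjacency matrix, which is not how
witness stores its data; the sketch only contrasts a direct lookup with
a tree walk.

```c
#include <assert.h>

#define NLOCKS_MODEL 8      /* hypothetical number of lock classes */

struct order_model {
    unsigned char edge[NLOCKS_MODEL][NLOCKS_MODEL];
};

/* Record "parent is acquired before child" without any pruning. */
void
add_edge_model(struct order_model *o, int parent, int child)
{
    assert(parent >= 0 && parent < NLOCKS_MODEL);
    assert(child >= 0 && child < NLOCKS_MODEL);
    o->edge[parent][child] = 1;
}

/* Cheap test: is there a direct edge? */
int
is_child_model(const struct order_model *o, int parent, int child)
{
    return (o->edge[parent][child]);
}

/* Depth-bounded walk used by the expensive test below. */
int
descendant_walk(const struct order_model *o, int parent, int child,
    int depth)
{
    int i;

    if (depth > NLOCKS_MODEL)
        return (0);
    for (i = 0; i < NLOCKS_MODEL; i++)
        if (o->edge[parent][i] &&
            (i == child || descendant_walk(o, i, child, depth + 1)))
            return (1);
    return (0);
}

/* Expensive test: is child reachable from parent at any depth? */
int
is_descendant_model(const struct order_model *o, int parent, int child)
{
    return (descendant_walk(o, parent, child, 0));
}
```

With only A->B and B->C recorded, the cheap test misses the A/C
relationship and the walk is needed; once A->C is kept as well, the
direct lookup gives the same answer.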
vlrureclaim() in vfs_subr.c 1.636 because waiting for the vnode
lock aggravates an existing race condition. It is also undesirable
according to the commit log for 1.631.
Fix the tiny race condition that remains by rechecking the vnode
state after grabbing the vnode lock and grabbing the vnode interlock.
Fix the problem of other threads being starved (which 1.636 attempted
to fix by removing LK_NOWAIT) by calling uio_yield() periodically
in vlrureclaim(). This should be more deterministic than hoping
that VOP_LOCK() without LK_NOWAIT will block, which may not happen
in this loop.
Reviewed by: kan
MFC after: 5 days
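The resulting loop structure (no-wait lock attempt, recheck after the
lock is held, periodic yield) can be modelled in a single thread as shown
below. All names are hypothetical: struct vnode_model, try_lock_model(),
yields_model and reclaim_pass_model() only imitate the shape of the
logic, the "raced" field simulates another thread changing the vnode
state, and the counter stands in for calls to uio_yield().

```c
#include <assert.h>

/* Hypothetical, heavily reduced vnode. */
struct vnode_model {
    int busy;           /* lock held elsewhere: no-wait attempt fails */
    int reclaimable;    /* state seen before the lock is taken */
    int raced;          /* state changes while the lock is taken */
    int reclaimed;
};

/* Counts how often the loop would have yielded the CPU. */
int yields_model;

/* Non-blocking lock attempt, in the spirit of LK_NOWAIT. */
int
try_lock_model(struct vnode_model *vp)
{
    if (vp->busy)
        return (0);
    if (vp->raced)
        vp->reclaimable = 0;    /* simulated concurrent change */
    return (1);
}

/* One pass over n vnodes; returns how many were reclaimed. */
int
reclaim_pass_model(struct vnode_model *v, int n, int yield_every)
{
    int i, done = 0;

    assert(yield_every > 0);
    for (i = 0; i < n; i++) {
        /* Give other threads a chance at regular intervals. */
        if (i > 0 && i % yield_every == 0)
            yields_model++;
        if (!v[i].reclaimable)
            continue;           /* cheap check without the lock */
        if (!try_lock_model(&v[i]))
            continue;           /* skip instead of blocking */
        /* Recheck: the state may have changed before we got the lock. */
        if (v[i].reclaimable) {
            v[i].reclaimed = 1;
            done++;
        }
    }
    return (done);
}
```

In this model a busy vnode is skipped rather than waited for, a vnode
whose state changed during locking is left alone, and fairness comes
from the explicit yield rather than from hoping the lock call blocks.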