than keeping it locked until we exit the function to optimize the case
where the lock would be dropped and later reacquired. The optimization
was broken when kevents were moved from UFS to VFS and the knote list
lock for a vnode kevent became the lockmgr vnode lock. If one tried
to use a kqueue that contained events for a kqueue fd followed by a vnode,
then the kq global lock would end up being held when the vnode lock was
acquired, which could result in sleeping with a mutex held (and subsequent
panics) if the vnode lock was contested.
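For illustration, the registration order that triggered it looks roughly like
this from userspace (file path and descriptors are made up, headers and error
handling omitted):

    int kq = kqueue(), inner = kqueue();
    int vfd = open("/some/file", O_RDONLY);     /* any vnode-backed fd */
    struct kevent kev[2];

    /* kqueue fd first, vnode fd second: the vnode filter is attached
     * while the kq global lock is still held, as described above */
    EV_SET(&kev[0], inner, EVFILT_READ, EV_ADD, 0, 0, NULL);
    EV_SET(&kev[1], vfd, EVFILT_READ, EV_ADD, 0, 0, NULL);
    kevent(kq, kev, 2, NULL, 0, NULL);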
Reviewed by: jmg
Tested by: ps (on 6.x)
MFC after: 3 days
a race where data could come in before we clear the INFLUX flag, and get
skipped over by knote (and hence never be activated, though it should have
been)...
Found by: glebius & co.
Reviewed by: glebius
MFC after: 3 days
notifications when LIO operations completed. These were the problems
with LIO event complete notification:
- Move all LIO/AIO event notification into one general function
so we don't have bugs in different data paths. This unification
got rid of several notification bugs, one of which could cause a SIGILL
to be sent to the process when kqueue was used (see the sketch after
this list).
- Change the LIO event accounting to count all AIO requests that
could have been split across the fast path and daemon mode.
The prior accounting only kept track of the AIO ops in a single mode
and not the entire list of operations. This could cause a bogus LIO
completion notification when all of the fast-path AIO ops had completed
but the AIO ops queued for the daemon had not.
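For reference, the kqueue flavor of LIO completion notification that was
misbehaving is requested through the sigevent handed to lio_listio(), roughly
as follows (kq and the aiocb list are assumed to be set up elsewhere, error
handling omitted):

    struct sigevent sev;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_KEVENT;        /* deliver via kqueue, not a signal */
    sev.sigev_notify_kqueue = kq;           /* kqueue that receives the LIO event */
    sev.sigev_value.sival_ptr = list;       /* carried along with the event */
    lio_listio(LIO_NOWAIT, list, nent, &sev);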
Suggestions from: alc
- Introducing the possibility of using locks different from mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively (see the sketch after
this list).
- Using the vnode lock for the knlist locking when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex in filt_vfsread.
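A sketch of the resulting calls, assuming the five-argument knlist_init()
described above (sc is a placeholder softc; the vnode-lock wrapper names are
illustrative, the exact symbols live in sys/event.h and vfs_subr.c):

    /* mutex-backed list: NULL callbacks default to mtx_lock/mtx_unlock/mtx_owned */
    knlist_init(&sc->sc_note, &sc->sc_mtx, NULL, NULL, NULL);

    /* vnode-backed list: callbacks take, release and check the vnode lock */
    knlist_init(&vp->v_pollinfo->vpi_selinfo.si_note, vp,
        vfs_knllock, vfs_knlunlock, vfs_knllocked);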
Reviewed by: jmg
Approved by: re (scottl), scottl (mentor override)
Pointyhat to: ssouhlal
Will be happy: everyone
w/o problems than I was before... This simply brings back the knote_delete
as knlist_delete, which will also drop the knotes instead of just clearing
the list and seeing _ONESHOT...
Fix a race where if a note was _INFLUX and _DETACHED, it could end up being
modified... whoops.
MFC after: 1 week
Prodded by: ambrisko and dwhite
Use this in all the places where sleeping with the lock held is not
an issue.
The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.
happens when a proc exits, but needs to inform the user that this has
happened... This also means we can remove the check for detached from the
proc and sig f_detach functions, as this is done in kqueue now...
MFC after: 5 days
a more complete subsystem, and removes the knowledge of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using its filter ops.
Currently, it uses MTX_DUPOK even though it is not always safe to
acquire duplicate locks. Witness currently doesn't support the ability
to discover whether a duplicate lock is ok (in some cases).
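With filter ops registered through kqueue itself, a module such as aio does
roughly the following at load and unload time (see sys/event.h for the exact
prototypes):

    /* module load: make the filter available to kqueue */
    kqueue_add_filteropts(EVFILT_AIO, &aio_filtops);

    /* module unload: must be refused while knotes still reference the filter */
    kqueue_del_filteropts(EVFILT_AIO);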
Reviewed by: green, rwatson (both earlier versions)
Giant conditional on debug.mpsafenet in the socket soo_stat() routine,
unconditionally in vn_statfile() for VFS, and otherwise don't acquire
Giant. Accept an unlocked read in kqueue_stat(), and cryptof_stat() is
a no-op. Don't acquire Giant in fstat() system call.
Note: in fdescfs, fo_stat() is called while holding Giant due to the VFS
stack sitting on top, and therefore there will still be Giant recursion
in this case.
individual file object implementations can optionally acquire Giant if
they require it:
- soo_close(): depends on debug.mpsafenet (sketched after this list)
- pipe_close(): Giant not acquired
- kqueue_close(): Giant required
- vn_close(): Giant required
- cryptof_close(): Giant required (conservative)
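A rough sketch of the debug.mpsafenet-conditional case referenced above; the
real soo_close() does more, but the Giant handling reduces to:

    NET_LOCK_GIANT();               /* acquires Giant only if !debug.mpsafenet */
    error = soclose(so);
    NET_UNLOCK_GIANT();
    return (error);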
Notes:
Giant is still acquired in close() even when closing MPSAFE objects
due to kqueue requiring Giant in the calling closef() code.
Microbenchmarks indicate that this removal of Giant cuts roughly 3% off
of pipe create/destroy pairs from user space with SMP compiled into
the kernel.
The cryptodev and opencrypto code appears MPSAFE, but I'm unable to
test it extensively and so have left Giant over fo_close(). It can
probably be removed given some testing and review.
generic filesystem events to userspace. Currently only mount and unmount
of filesystems are signalled. Soon to be added, up/down status of NFS.
Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.
Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.
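A filesystem hooks in by filling the new vfsop slot; a minimal sketch assuming
the two-argument form named above (the handler name is hypothetical and the
real prototype in sys/mount.h should be checked):

    static int
    myfs_sysctl(struct mount *mp, struct sysctl_req *req)
    {
            /* copy per-mount state out to the requesting process */
            return (SYSCTL_OUT(req, &mp->mnt_stat, sizeof(mp->mnt_stat)));
    }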
a callout, and use the new callout_drain API to make sure that a callout
has finished before we deallocate memory it is using.
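The pattern this enables, roughly (sc and its callout field are placeholders):

    /* wait for any in-flight handler to return before freeing its memory */
    callout_drain(&sc->sc_callout);
    free(sc, M_DEVBUF);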
PR: kern/64121
Discussed with: gallatin
in exit1(), make sure the p_klist is empty after sending NOTE_EXIT.
The process won't report fork() or execve() and won't be able to handle
NOTE_SIGNAL knotes anyway.
This fixes some race conditions with do_tdsignal() calling knote() while
the process is exiting.
Reported by: Stefan Farfeleder <stefan@fafoe.narf.at>
MFC after: 1 week
thread being woken up. The woken-up thread can run at a priority as
high as after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities (see the example after this list).
- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.
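Call sites change along these lines (the selinfo field and priority are
illustrative):

    /* before: selwakeup(&sc->sc_rsel); */
    selwakeuppri(&sc->sc_rsel, PZERO + 1);  /* wake selecting threads at this priority */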
Not objected in: -arch, -current
Use zpfind() to check whether the process became a zombie when pfind() doesn't
find it and the caller wants to know about process death, so that the caller
learns of the death even if it happened before the kevent was actually
registered.
MFC after: 1 week
This fixes a race condition (specifically with signal events) that could
lead to the kn being re-inserted into the list after it has been destroyed,
which is not something we want to happen.
PR: kern/58258
table, acquiring the necessary locks as it works. It usually returns
two references to the new descriptor: one in the descriptor table
and one via a pointer argument.
As falloc releases the FILEDESC lock before returning, there is a
potential for a process to close the reference in the file descriptor
table before falloc's caller gets to use the file. I don't think this
can happen in practice at the moment, because Giant indirectly protects
closes.
To stop the file being completely closed in this situation, this change
makes falloc set the refcount to two when both references are returned.
This makes life easier for several of falloc's callers, because the
first thing they previously did was grab an extra reference on the
file.
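Callers that previously grabbed that extra reference themselves can now rely
on the second reference from falloc, e.g. (sketch, error handling trimmed):

    error = falloc(td, &fp, &fd);
    if (error)
            return (error);
    /* ... set up fp; the descriptor-table slot holds one reference ... */
    fdrop(fp, td);          /* release the extra reference falloc handed back */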
Reviewed by: iedowse
Idea run past: jhb
the target process exiting which causes attempts to register the kevent
to randomly fail depending on whether the target runs to completion before
the parent can call kevent(2). The bug actually affects EVFILT_PROC
events on any zombie process, but the most common manifestation is with
parents trying to monitor child processes.
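The registration that was randomly failing is the usual one (sketch, error
handling omitted); with the fix it works even if pid is already a zombie:

    struct kevent kev;

    EV_SET(&kev, pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, NULL);
    kevent(kq, &kev, 1, NULL, 0, NULL);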
MFC after: 2 weeks
Sponsored by: NTT Multimedia Communications Labs
barrier between freeing filedesc structures. Basically, if you want to
access another process's filedesc, you want to hold this mutex over the
entire operation.
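A sketch of the intended pattern (the mutex identifier here is a stand-in for
the one this change introduces):

    mtx_lock(&fdesc_mtx);           /* stand-in name for the new mutex */
    fdp = p->p_fd;
    if (fdp != NULL) {
            /* ... inspect the other process's descriptor table ... */
    }
    mtx_unlock(&fdesc_mtx);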
1. eliminate unnecessary loop which frees and re-allocates
the just allocated array
2. eliminate the newsize recomputation
3. eliminate unnecessary unlock and relock around free
4. correctly match the free with the malloc into M_KQUEUE instead of M_TEMP
5. eliminate conditional assignment of oldlist, which is equivalent to a
simple assignment
6. eliminate the oldlist temporary variable completely
Reviewed by: jhb
pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.
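The practical effect on consumers is just dropping the casts, e.g.:

    /* before: kq = (struct kqueue *)fp->f_data; */
    kq = fp->f_data;                /* void * now converts implicitly */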
this was causing filedesc work to be very painful.
In order to make this work, split out the sigio definitions into their own
header (sigio.h), which is included from proc.h for the time being.