Commit Graph

243 Commits

Author SHA1 Message Date
rwatson
53eaed07bc Use C99 initialization for struct filterops.
Obtained from:	Mac OS X
Sponsored by:	Apple Inc.
MFC after:	3 weeks
2009-09-12 20:03:45 +00:00
kib
e1cb2941d4 Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland
2009-06-10 20:59:32 +00:00
jhb
a1af9ecca4 Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
  locked.  It is not permissible to call soisconnected() with this lock
  held; however, so socket upcalls now return an integer value.  The two
  possible values are SU_OK and SU_ISCONNECTED.  If an upcall returns
  SU_ISCONNECTED, then the soisconnected() will be invoked on the
  socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls.  The
  API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
  the receive socket buffer automatically.  Note that a SO_SND upcall
  should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
  instead of calling soisconnected() directly.  They also no longer need
  to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
  state machine, but other accept filters no longer have any explicit
  knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
  while invoking soreceive() as a temporary band-aid.  The plan for
  the future is to add a new flag to allow soreceive() to be called with
  the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
  buffer locked.  Previously sowakeup() would drop the socket buffer
  lock only to call aio_swake() which immediately re-acquired the socket
  buffer lock for the duration of the function call.

Discussed with:	rwatson, rmacklem
2009-06-01 21:17:03 +00:00
jhb
d2c61e641d Use the correct type for the timeout parameter to the 32-bit
compat version aio_waitcomplete().

Reminded by:	bz
Submitted by:	jamie
MFC after:	3 days
2009-01-23 13:23:17 +00:00
jhb
f3dcc2d9e0 - Add 32-bit compat system calls for VFS_AIO. The system calls live in the
aio code and are registered via the recently added SYSCALL32_*() helpers.
- Since the aio code likes to invoke fuword and suword a lot down in the
  "bowels" of system calls, add a structure holding a set of operations for
  things like storing errors, copying in the aiocb structure, storing
  status, etc.  The 32-bit system calls use a separate operations vector to
  handle fuword32 vs fuword, etc.  Also, the oldsigevent handling is now
  done by having seperate operation vectors with different aiocb copyin
  routines.
- Split out kern_foo() functions for the various AIO system calls so the
  32-bit front ends can manage things like copying in and converting
  timespec structures, etc.
- For both the native and 32-bit aio_suspend() and lio_listio() calls,
  just use copyin() to read the array of aiocb pointers instead of using
  a for loop that iterated over fuword/fuword32.  The error handling in
  the old case was incomplete (lio_listio() just ignored any aiocb's that
  it got an EFAULT trying to read rather than reporting an error), and
  possibly slower.

MFC after:	1 month
2008-12-10 20:56:19 +00:00
gonzo
f0ffee5444 Use minimum of max_aio_procs and target_aio_procs when spawning new
aiod since there should be no more then max_aio_procs processes.
2008-06-21 11:34:34 +00:00
rwatson
95bb3acd1f Use FEATURE() macro to advertise aio availability. 2008-02-01 11:59:14 +00:00
dumbbell
ba3df23cb8 When asked to use kqueue, AIO stores its internal state in the
`kn_sdata' member of the newly registered knote. The problem is that
this member is overwritten by a call to kevent(2) with the EV_ADD flag,
targetted at the same kevent/knote. For instance, a userland application
may set the pointer to NULL, leading to a panic.

A testcase was provided by the submitter.

PR:	kern/118911
Submitted by:	MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after:	1 day
2008-01-24 17:10:19 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
julian
51d643caa6 Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
kib
e23c502c5b Destroy the kaio_mtx on the freeing the struct kaioinfo in the
aio_proc_rundown.

Do not allow for zero-length read to be passed to the fo_read file method
by aio.

Reported and tested by:	Peter Holm
Approved by:	re (kensmith)
2007-08-20 11:53:26 +00:00
mjacob
8d19daeade Remove unused variable. 2007-06-10 01:50:05 +00:00
jeff
a7a8bac81f - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
rwatson
69938bd196 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
trhodes
58cca8458a Merge posix4/* into normal kernel hierarchy.
Reviewed by:	glanced at by jhb
Approved by:	silence on -arch@ and -standards@
2006-11-11 16:26:58 +00:00
netchild
183bd5a34b MFP4 (with some minor changes):
Implement the linux_io_* syscalls (AIO). They are only enabled if the native
AIO code is available (either compiled in to the kernel or as a module) at
the time the functions are used. If the AIO stuff is not available there
will be a ENOSYS.

From the submitter:
---snip---
DESIGN NOTES:

1. Linux permits a process to own multiple AIO queues (distinguished by
   "context"), but FreeBSD creates only one single AIO queue per process.
   My code maintains a request queue (STAILQ of queue(3)) per "context",
   and throws all AIO requests of all contexts owned by a process into
   the single FreeBSD per-process AIO queue.

   When the process calls io_destroy(2), io_getevents(2), io_submit(2) and
   io_cancel(2), my code can pick out requests owned by the specified context
   from the single FreeBSD per-process AIO queue according to the per-context
   request queues maintained by my code.

2. The request queue maintained by my code stores contrast information between
   Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks
   (struct aiocb). FreeBSD IO control block actually exists in userland memory
   space, required by FreeBSD native aio_XXXXXX(2).

3. It is quite troubling that the function io_getevents() of libaio-0.3.105
   needs to use Linux-specific "struct aio_ring", which is a partial mirror
   of context in user space. I would rather take the address of context in
   kernel as the context ID, but the io_getevents() of libaio forces me to
   take the address of the "ring" in user space as the context ID.

   To my surprise, one comment line in the file "io_getevents.c" of
   libaio-0.3.105 reads:

             Ben will hate me for this

REFERENCE:

1. Linux kernel source code:   http://www.kernel.org/pub/linux/kernel/v2.6/
   (include/linux/aio_abi.h, fs/aio.c)

2. Linux manual pages:         http://www.kernel.org/pub/linux/docs/manpages/
   (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2))

3. Linux Scalability Effort:   http://lse.sourceforge.net/io/aio.html
   The design notes:           http://lse.sourceforge.net/io/aionotes.txt

4. The package libaio, both source and binary:
       http://rpmfind.net/linux/rpm2html/search.php?query=libaio
   Simple transparent interface to Linux AIO system calls.

5. Libaio-oracle:              http://oss.oracle.com/projects/libaio-oracle/
   POSIX AIO implementation based on Linux AIO system calls (depending on
   libaio).
---snip---

Submitted by:	Li, Xiao <intron@intron.ac>
2006-10-15 14:22:14 +00:00
jmg
28f00353ba hide kqueue_register from public view, and replace it w/ kqfd_register...
this eliminates a possible race in aio registering a kevent..
2006-09-24 04:47:47 +00:00
mp
f2b0030374 Remove call to fdfree() for the AIO daemons to prevent kernel panics
with linprocfs. This call is not needed since file descriptor sharing
was removed in v1.125.

Reviewed by:	alc, davidxu, ambrisko
MFC after:	3 days
2006-09-06 15:11:20 +00:00
netchild
b2a39f267a - Change process_exec function handlers prototype to include struct
image_params arg.
- Change struct image_params to include struct sysentvec pointer and
  initialize it.
- Change all consumers of process_exit/process_exec eventhandlers to
  new prototypes (includes splitting up into distinct exec/exit functions).
- Add eventhandler to userret.

Sponsored by:		Google SoC 2006
Submitted by:		rdivacky
Parts suggested by:	jhb (on hackers@)
2006-08-15 12:10:57 +00:00
ambrisko
f3fbf567ee Make lio ident more consistant with aio ident. 2006-06-02 17:45:48 +00:00
davidxu
a9ab4c311a Use a dedicated mutex to protect aio queues, the movation is to reduce
lock contention with other parts.
2006-05-09 00:10:11 +00:00
davidxu
60f31ebe10 1. Move code for scanning pending I/O from aio_fsync to aio_aqueue,
it has less overhead.
2. Avoid scheduling task if maximum number of I/O threads is reached.
2006-03-24 00:50:06 +00:00
davidxu
fa4b9b6f23 Implement aio_fsync() syscall. 2006-03-23 08:46:42 +00:00
davidxu
59079f19a5 1. Remove aio entry from lists earlier in aio_free_entry,
so other threads can not see it if we unlock the proc
   lock (this can happen in knlist_delete).  Don't do wakeup,
   it is not necessary.

2. Decrease kaio_buffer_count in biohelper rather than
   doing it in aio_bio_done_notify.

3. In aio_bio_done_notify, don't send notification if KAIO_RUNDOWN
   was set, because the process is already in single thread mode.

4. Use assignment to initialize aiothreadflags.

5. AIOCBLIST_RUNDOWN is not useful, axe the code using it.

6. use LIO_NOP instead of zero.
2006-02-26 12:56:23 +00:00
davidxu
44accb1d3b If block size is zero, use normal file operations to do I/O,
this eliminates a divided-by-zero fault.

Recommended by: phk
2006-02-22 00:05:12 +00:00
davidxu
a8cf0ae648 Just like dofilewrite(), call bwillwrite before fo_write. 2006-01-27 08:02:25 +00:00
davidxu
7a2375b9e4 return final error code in aio_return rather than a hardcoded 0. 2006-01-27 04:14:16 +00:00
davidxu
e2e1fa02bc in aio_aqueue, store same return code into job->_aiocb_private.error.
in aio_return, unlock proc lock before suword.
2006-01-26 08:37:02 +00:00
davidxu
1f5105d8a7 Add locking annotation and comments about socket, pipe, fifo problem.
Temporarily fix a locking problem for socket I/O.
2006-01-24 07:24:24 +00:00
davidxu
edbd69603d Er, rescure a deleted comment line. 2006-01-24 02:50:42 +00:00
davidxu
acb3d897fb More cleanup for aio code:
1) unregsiter kqueue filter for EVFILT_LIO.
2) free uma_zones.
3) call setsid directly to enter another session rather than
   implementing by itself.

Submitted by: jhb
2006-01-24 02:46:15 +00:00
davidxu
4aec3469f3 Add bracket. 2006-01-23 23:46:30 +00:00
davidxu
14cd6d7f49 Verify all supported notification types. 2006-01-23 10:27:15 +00:00
davidxu
47e7cba205 1) Merge _aio_aqueue and aio_aqueue, check quota in aio_aqueue,
so that lio_listio won't exceed the quota.
2) Remove lio_ref_count, it is no longer used.
2006-01-23 02:49:34 +00:00
davidxu
7c1b5ba95e Fix a bogus panic. 2006-01-22 09:39:59 +00:00
davidxu
362cb85fb6 Decrease kaio_active_count first, because user process may go away
after we notified it.
2006-01-22 09:25:52 +00:00
davidxu
72c8645faa Make aio code MP safe. 2006-01-22 05:59:27 +00:00
csjp
ef078951d6 Initialize ki to p->p_aioinfo after we know it's going to be referencing
a valid kaioinfo structure. This avoids a potential NULL pointer dereference.

Found with:	Coverity Prevent(tm)
MFC after:	2 weeks
2006-01-15 01:55:45 +00:00
jhb
0beb1bf77b Return error from fget_write() rather than hardcoding EBADF now that
fget_write() DTRT.

Requested by:	bde
2006-01-06 16:34:22 +00:00
davidxu
ce1172e446 In aio_waitcomplete, do not return EAGAIN if no other threads
have started aio, instead, initialize aio management structure
if it hasn't been done, the reason to adjust this behavior is
to make it a bit friendly for threaded program, consider two
threads, one submits aio_write, and another just calls
aio_waitcomplete to wait any I/O to be completed and recycle the
aio requests, before submitter doing any I/O, the recycler wants
to wait in kernel. This also fixes inconsistency with other aio
syscalls.
2005-11-08 23:48:32 +00:00
jhb
7b0555d459 Various and sundry cleanups:
- Use curthread for calls to knlist_delete() and add a big comment
  explaining why as well as appropriate assertions.
- Use TAILQ_FOREACH and TAILQ_FOREACH_SAFE instead of handrolling them.
- Use fget() family of functions to lookup file objects instead of
  grovelling around in file descriptor tables.
- Destroy the aio_freeproc mutex if we are unloaded.

Tested on:	i386
2005-11-08 17:43:05 +00:00
davidxu
ae161ac239 Fix name compatible problem with POSIX standard. the sigval_ptr and
sigval_int really should be sival_ptr and sival_int.
Also sigev_notify_function accepts a union sigval value but not a
pointer.
2005-11-04 09:41:00 +00:00
davidxu
064df4ad12 Support sending realtime signal information via signal queue, realtime
signal memory is pre-allocated, so kernel can always notify user code.
2005-11-03 05:25:26 +00:00
jhb
2eddf38ca6 Push down Giant into fdfree() and remove it from two of the callers.
Other callers such as some rfork() cases weren't locking Giant anyway.

Reviewed by:	csjp
MFC after:	1 week
2005-11-01 17:13:05 +00:00
davidxu
a27888aa88 Fix sigevent's POSIX incompatible problem by adding member fields
sigev_notify_function and sigev_notify_attributes. AIO syscalls
use sigevent, so they have to be adjusted.

Reviewed by:	alc
2005-10-30 02:12:49 +00:00
ambrisko
261193a68f Fix tinderbox box by removing incomplete/bad spl usage. Proper giant free
locking is required in for aio.

Pointed out by:	imp
2005-10-12 22:33:22 +00:00
ambrisko
e3a46811b7 Add in kqueue support to LIO event notification and fix how it handled
notifications when LIO operations completed.  These were the problems
with LIO event complete notification:
      -	Move all LIO/AIO event notification into one general function
	so we don't have bugs in different data paths.  This unification
	got rid of several notification bugs one of which if kqueue was
	used a SIGILL could get sent to the process.
      -	Change the LIO event accounting to count all AIO request that
	could have been split across the fast path and daemon mode.
	The prior accounting only kept track of AIO op's in that
	mode and not the entire list of operations.  This could cause
	a bogus LIO event complete notification to occur when all of
	the fast path AIO op's completed and not the AIO op's that
	ended up queued for the daemon.

Suggestions from:	alc
2005-10-12 17:51:31 +00:00
alc
38bf328ab8 Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function.  Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
2005-07-20 19:06:06 +00:00
ssouhlal
efe31cd3da Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by:	jmg
Approved by:	re (scottl), scottl (mentor override)
Pointyhat to:	ssouhlal
Will be happy:	everyone
2005-07-01 16:28:32 +00:00