Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subsystem now exports a library of routines to manipulate AIO
requests, as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types that do not include an
fo_aio_queue method queues requests to the "default" pool, invoking the
fo_read or fo_write methods as before.
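
As a sketch, a file type that simply wants its requests serviced by the
default pool could implement the hook as follows (the xxx_* names and
the xxx_do_io() helper are hypothetical, not part of this commit):

	static void
	xxx_aio_handle(struct kaiocb *job)
	{
		long done;
		int error;

		/* Runs in an AIO daemon with access to the user's buffers. */
		aio_switch_vmspace(job);
		error = xxx_do_io(job, &done);	/* hypothetical I/O helper */
		aio_complete(job, done, error);
	}

	static int
	xxx_aio_queue(struct file *fp, struct kaiocb *job)
	{

		/* Hand the request off to the "default" pool of AIO daemons. */
		aio_schedule(job, xxx_aio_handle);
		return (0);
	}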

The AIO subsystem allows file types to install a private "cancel"
routine when a request is queued, permitting safe dequeueing and cleanup
of cancelled requests.
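
Inside a backend's fo_aio_queue method, the queue side of that protocol
looks roughly like this (a sketch; xxx_lock, xxx_jobq, and
xxx_aio_cancel stand in for the backend's own lock, job queue, and
cancel routine, and the authoritative description of the protocol is in
the sys/aio.h comment below):

	mtx_lock(&xxx_lock);
	if (!aio_set_cancel_function(job, xxx_aio_cancel)) {
		/* The request already has a cancel pending; cancel it. */
		mtx_unlock(&xxx_lock);
		aio_cancel(job);
		return (0);
	}
	TAILQ_INSERT_TAIL(&xxx_jobq, job, list);
	mtx_unlock(&xxx_lock);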

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order.  Socket requests will not block indefinitely,
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels.  The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks, resulting in a hung AIO daemon.  This can in turn hang user
processes (when a process attempts to cancel all outstanding requests
during exit) or the entire system.  To protect against this, AIO
requests are only permitted for known "safe" files by default.  AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_unsafe sysctl to a non-zero value.  The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default.  aio_mlock() is also enabled by default.
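
For example, to opt in to AIO on all file types on a running system:

	# sysctl vfs.aio.enable_unsafe=1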

Reviewed by:	cem, jilles
Discussed with:	kib (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5289
commit f3215338ef (parent cbc4d2db75)
John Baldwin, 2016-03-01 18:12:14 +00:00
Notes: svn2git 2020-12-20 02:59:44 +00:00
	svn path=/head/; revision=296277

19 changed files with 1080 additions and 441 deletions


@ -27,24 +27,116 @@
.\"
.\" $FreeBSD$
.\"
.Dd October 24, 2002
.Dd March 1, 2016
.Dt AIO 4
.Os
.Sh NAME
.Nm aio
.Nd asynchronous I/O
.Sh SYNOPSIS
To link into the kernel:
.Cd "options VFS_AIO"
.Pp
To load as a kernel loadable module:
.Dl kldload aio
.Sh DESCRIPTION
The
.Nm
facility provides system calls for asynchronous I/O.
It is available both as a kernel option for static inclusion and as a
dynamic kernel module.
However, asynchronous I/O operations are only enabled for certain file
types by default.
Asynchronous I/O operations for other file types may block an AIO daemon
indefinitely, resulting in process and/or system hangs.
Asynchronous I/O operations can be enabled for all file types by setting
the
.Va vfs.aio.enable_unsafe
sysctl node to a non-zero value.
.Pp
Asynchronous I/O operations on sockets and raw disk devices do not block
indefinitely and are enabled by default.
.Pp
The
.Nm
facility uses kernel processes
(also known as AIO daemons)
to service most asynchronous I/O requests.
These processes are grouped into pools containing a variable number of
processes.
Processes are added to or removed from each pool based on load.
Pools can be configured by sysctl nodes that define the minimum
and maximum number of processes as well as the amount of time an idle
process will wait before exiting.
.Pp
One pool of AIO daemons is used to service asynchronous I/O requests for
sockets.
These processes are named
.Dq soaiod<N> .
The following sysctl nodes are used with this pool:
.Bl -tag -width indent
.It Va kern.ipc.aio.num_procs
The current number of processes in the pool.
.It Va kern.ipc.aio.target_procs
The minimum number of processes that should be present in the pool.
.It Va kern.ipc.aio.max_procs
The maximum number of processes permitted in the pool.
.It Va kern.ipc.aio.lifetime
The amount of time, in clock ticks, that a process is permitted to idle.
If a process is idle for this amount of time and there are more processes
in the pool than the target minimum,
the process will exit.
.El
.Pp
A second pool of AIO daemons is used to service all other asynchronous I/O
requests except for I/O requests to raw disks.
These processes are named
.Dq aiod<N> .
The following sysctl nodes are used with this pool:
.Bl -tag -width indent
.It Va vfs.aio.num_aio_procs
The current number of processes in the pool.
.It Va vfs.aio.target_aio_procs
The minimum number of processes that should be present in the pool.
.It Va vfs.aio.max_aio_procs
The maximum number of processes permitted in the pool.
.It Va vfs.aio.aiod_lifetime
The amount of time, in clock ticks, that a process is permitted to idle.
If a process is idle for this amount of time and there are more processes
in the pool than the target minimum,
the process will exit.
.El
.Pp
Asynchronous I/O requests for raw disks are queued directly to the disk
device layer after temporarily wiring the user pages associated with the
request.
These requests are not serviced by any of the AIO daemon pools.
.Pp
Several limits on the number of asynchronous I/O requests are imposed both
system-wide and per-process.
These limits are configured via the following sysctls:
.Bl -tag -width indent
.It Va vfs.aio.max_buf_aio
The maximum number of queued asynchronous I/O requests for raw disks permitted
for a single process.
Asynchronous I/O requests that have completed but whose status has not been
retrieved via
.Xr aio_return 2
or
.Xr aio_waitcomplete 2
are not counted against this limit.
.It Va vfs.aio.num_buf_aio
The number of queued asynchronous I/O requests for raw disks system-wide.
.It Va vfs.aio.max_aio_queue_per_proc
The maximum number of asynchronous I/O requests for a single process
serviced concurrently by the default AIO daemon pool.
.It Va vfs.aio.max_aio_per_proc
The maximum number of outstanding asynchronous I/O requests permitted for a
single process.
This includes requests that have not been serviced,
requests currently being serviced,
and requests that have completed but whose status has not been retrieved via
.Xr aio_return 2
or
.Xr aio_waitcomplete 2 .
.It Va vfs.aio.num_queue_count
The number of outstanding asynchronous I/O requests system-wide.
.It Va vfs.aio.max_aio_queue
The maximum number of outstanding asynchronous I/O requests permitted
system-wide.
.El
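.Pp
For example,
the system-wide limit on outstanding requests could be raised with
.Xr sysctl 8
(the value here is purely illustrative):
.Bd -literal -offset indent
# sysctl vfs.aio.max_aio_queue=2048
.Ed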
.Sh SEE ALSO
.Xr aio_cancel 2 ,
.Xr aio_error 2 ,
@ -54,9 +146,7 @@ dynamic kernel module.
.Xr aio_waitcomplete 2 ,
.Xr aio_write 2 ,
.Xr lio_listio 2 ,
.Xr config 8 ,
.Xr kldload 8 ,
.Xr kldunload 8
.Xr sysctl 8
.Sh HISTORY
The
.Nm
@ -66,3 +156,7 @@ The
.Nm
kernel module appeared in
.Fx 5.0 .
The
.Nm
facility was integrated into all kernels in
.Fx 11.0 .


@ -1130,11 +1130,6 @@ options EXT2FS
#
options REISERFS
# Use real implementations of the aio_* system calls. There are numerous
# stability and security issues in the current aio code that make it
# unsuitable for inclusion on machines with untrusted local users.
options VFS_AIO
# Cryptographically secure random number generator; /dev/random
device random


@ -3332,7 +3332,7 @@ kern/uipc_socket.c standard
kern/uipc_syscalls.c standard
kern/uipc_usrreq.c standard
kern/vfs_acl.c standard
kern/vfs_aio.c optional vfs_aio
kern/vfs_aio.c standard
kern/vfs_bio.c standard
kern/vfs_cache.c standard
kern/vfs_cluster.c standard


@ -213,7 +213,6 @@ SYSVSHM opt_sysvipc.h
SW_WATCHDOG opt_watchdog.h
TURNSTILE_PROFILING
UMTX_PROFILING
VFS_AIO
VERBOSE_SYSINIT
WLCACHE opt_wavelan.h
WLDEBUG opt_wavelan.h


@ -34,9 +34,12 @@ __FBSDID("$FreeBSD$");
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/aio.h>
#include <sys/domain.h>
#include <sys/file.h>
#include <sys/filedesc.h>
#include <sys/kernel.h>
#include <sys/kthread.h>
#include <sys/malloc.h>
#include <sys/proc.h>
#include <sys/protosw.h>
@ -48,6 +51,9 @@ __FBSDID("$FreeBSD$");
#include <sys/filio.h> /* XXX */
#include <sys/sockio.h>
#include <sys/stat.h>
#include <sys/sysctl.h>
#include <sys/sysproto.h>
#include <sys/taskqueue.h>
#include <sys/uio.h>
#include <sys/ucred.h>
#include <sys/un.h>
@ -64,6 +70,22 @@ __FBSDID("$FreeBSD$");
#include <security/mac/mac_framework.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_extern.h>
#include <vm/vm_map.h>
static SYSCTL_NODE(_kern_ipc, OID_AUTO, aio, CTLFLAG_RD, NULL,
"socket AIO stats");
static int empty_results;
SYSCTL_INT(_kern_ipc_aio, OID_AUTO, empty_results, CTLFLAG_RD, &empty_results,
0, "socket operation returned EAGAIN");
static int empty_retries;
SYSCTL_INT(_kern_ipc_aio, OID_AUTO, empty_retries, CTLFLAG_RD, &empty_retries,
0, "socket operation retries");
static fo_rdwr_t soo_read;
static fo_rdwr_t soo_write;
static fo_ioctl_t soo_ioctl;
@ -72,6 +94,9 @@ extern fo_kqfilter_t soo_kqfilter;
static fo_stat_t soo_stat;
static fo_close_t soo_close;
static fo_fill_kinfo_t soo_fill_kinfo;
static fo_aio_queue_t soo_aio_queue;
static void soo_aio_cancel(struct kaiocb *job);
struct fileops socketops = {
.fo_read = soo_read,
@ -86,6 +111,7 @@ struct fileops socketops = {
.fo_chown = invfo_chown,
.fo_sendfile = invfo_sendfile,
.fo_fill_kinfo = soo_fill_kinfo,
.fo_aio_queue = soo_aio_queue,
.fo_flags = DFLAG_PASSABLE
};
@ -363,3 +389,374 @@ soo_fill_kinfo(struct file *fp, struct kinfo_file *kif, struct filedesc *fdp)
sizeof(kif->kf_path));
return (0);
}
static STAILQ_HEAD(, task) soaio_jobs;
static struct mtx soaio_jobs_lock;
static struct task soaio_kproc_task;
static int soaio_starting, soaio_idle, soaio_queued;
static struct unrhdr *soaio_kproc_unr;
static int soaio_max_procs = MAX_AIO_PROCS;
SYSCTL_INT(_kern_ipc_aio, OID_AUTO, max_procs, CTLFLAG_RW, &soaio_max_procs, 0,
"Maximum number of kernel processes to use for async socket IO");
static int soaio_num_procs;
SYSCTL_INT(_kern_ipc_aio, OID_AUTO, num_procs, CTLFLAG_RD, &soaio_num_procs, 0,
"Number of active kernel processes for async socket IO");
static int soaio_target_procs = TARGET_AIO_PROCS;
SYSCTL_INT(_kern_ipc_aio, OID_AUTO, target_procs, CTLFLAG_RD,
&soaio_target_procs, 0,
"Preferred number of ready kernel processes for async socket IO");
static int soaio_lifetime;
SYSCTL_INT(_kern_ipc_aio, OID_AUTO, lifetime, CTLFLAG_RW, &soaio_lifetime, 0,
"Maximum lifetime for idle aiod");
static void
soaio_kproc_loop(void *arg)
{
struct proc *p;
struct vmspace *myvm;
struct task *task;
int error, id, pending;
id = (intptr_t)arg;
/*
* Grab an extra reference on the daemon's vmspace so that it
* doesn't get freed by jobs that switch to a different
* vmspace.
*/
p = curproc;
myvm = vmspace_acquire_ref(p);
mtx_lock(&soaio_jobs_lock);
MPASS(soaio_starting > 0);
soaio_starting--;
for (;;) {
while (!STAILQ_EMPTY(&soaio_jobs)) {
task = STAILQ_FIRST(&soaio_jobs);
STAILQ_REMOVE_HEAD(&soaio_jobs, ta_link);
soaio_queued--;
pending = task->ta_pending;
task->ta_pending = 0;
mtx_unlock(&soaio_jobs_lock);
task->ta_func(task->ta_context, pending);
mtx_lock(&soaio_jobs_lock);
}
MPASS(soaio_queued == 0);
if (p->p_vmspace != myvm) {
mtx_unlock(&soaio_jobs_lock);
vmspace_switch_aio(myvm);
mtx_lock(&soaio_jobs_lock);
continue;
}
soaio_idle++;
error = mtx_sleep(&soaio_idle, &soaio_jobs_lock, 0, "-",
soaio_lifetime);
soaio_idle--;
if (error == EWOULDBLOCK && STAILQ_EMPTY(&soaio_jobs) &&
soaio_num_procs > soaio_target_procs)
break;
}
soaio_num_procs--;
mtx_unlock(&soaio_jobs_lock);
free_unr(soaio_kproc_unr, id);
kproc_exit(0);
}
static void
soaio_kproc_create(void *context, int pending)
{
struct proc *p;
int error, id;
mtx_lock(&soaio_jobs_lock);
for (;;) {
if (soaio_num_procs < soaio_target_procs) {
/* Must create */
} else if (soaio_num_procs >= soaio_max_procs) {
/*
* Hit the limit on kernel processes, don't
* create another one.
*/
break;
} else if (soaio_queued <= soaio_idle + soaio_starting) {
/*
* No more AIO jobs waiting for a process to be
* created, so stop.
*/
break;
}
soaio_starting++;
mtx_unlock(&soaio_jobs_lock);
id = alloc_unr(soaio_kproc_unr);
error = kproc_create(soaio_kproc_loop, (void *)(intptr_t)id,
&p, 0, 0, "soaiod%d", id);
if (error != 0) {
free_unr(soaio_kproc_unr, id);
mtx_lock(&soaio_jobs_lock);
soaio_starting--;
break;
}
mtx_lock(&soaio_jobs_lock);
soaio_num_procs++;
}
mtx_unlock(&soaio_jobs_lock);
}
static void
soaio_enqueue(struct task *task)
{
mtx_lock(&soaio_jobs_lock);
MPASS(task->ta_pending == 0);
task->ta_pending++;
STAILQ_INSERT_TAIL(&soaio_jobs, task, ta_link);
soaio_queued++;
if (soaio_queued <= soaio_idle)
wakeup_one(&soaio_idle);
else if (soaio_num_procs < soaio_max_procs)
taskqueue_enqueue(taskqueue_thread, &soaio_kproc_task);
mtx_unlock(&soaio_jobs_lock);
}
static void
soaio_init(void)
{
soaio_lifetime = AIOD_LIFETIME_DEFAULT;
STAILQ_INIT(&soaio_jobs);
mtx_init(&soaio_jobs_lock, "soaio jobs", NULL, MTX_DEF);
soaio_kproc_unr = new_unrhdr(1, INT_MAX, NULL);
TASK_INIT(&soaio_kproc_task, 0, soaio_kproc_create, NULL);
if (soaio_target_procs > 0)
taskqueue_enqueue(taskqueue_thread, &soaio_kproc_task);
}
SYSINIT(soaio, SI_SUB_VFS, SI_ORDER_ANY, soaio_init, NULL);
static __inline int
soaio_ready(struct socket *so, struct sockbuf *sb)
{
return (sb == &so->so_rcv ? soreadable(so) : sowriteable(so));
}
static void
soaio_process_job(struct socket *so, struct sockbuf *sb, struct kaiocb *job)
{
struct ucred *td_savedcred;
struct thread *td;
struct file *fp;
struct uio uio;
struct iovec iov;
size_t cnt;
int error, flags;
SOCKBUF_UNLOCK(sb);
aio_switch_vmspace(job);
td = curthread;
fp = job->fd_file;
retry:
td_savedcred = td->td_ucred;
td->td_ucred = job->cred;
cnt = job->uaiocb.aio_nbytes;
iov.iov_base = (void *)(uintptr_t)job->uaiocb.aio_buf;
iov.iov_len = cnt;
uio.uio_iov = &iov;
uio.uio_iovcnt = 1;
uio.uio_offset = 0;
uio.uio_resid = cnt;
uio.uio_segflg = UIO_USERSPACE;
uio.uio_td = td;
flags = MSG_NBIO;
/* TODO: Charge ru_msg* to job. */
if (sb == &so->so_rcv) {
uio.uio_rw = UIO_READ;
#ifdef MAC
error = mac_socket_check_receive(fp->f_cred, so);
if (error == 0)
#endif
error = soreceive(so, NULL, &uio, NULL, NULL, &flags);
} else {
uio.uio_rw = UIO_WRITE;
#ifdef MAC
error = mac_socket_check_send(fp->f_cred, so);
if (error == 0)
#endif
error = sosend(so, NULL, &uio, NULL, NULL, flags, td);
if (error == EPIPE && (so->so_options & SO_NOSIGPIPE) == 0) {
PROC_LOCK(job->userproc);
kern_psignal(job->userproc, SIGPIPE);
PROC_UNLOCK(job->userproc);
}
}
cnt -= uio.uio_resid;
td->td_ucred = td_savedcred;
/* XXX: Not sure if this is needed? */
if (cnt != 0 && (error == ERESTART || error == EINTR ||
error == EWOULDBLOCK))
error = 0;
if (error == EWOULDBLOCK) {
/*
* A read() or write() on the socket raced with this
* request. If the socket is now ready, try again.
* If it is not, place this request at the head of the
* queue to try again when the socket is ready.
*/
SOCKBUF_LOCK(sb);
empty_results++;
if (soaio_ready(so, sb)) {
empty_retries++;
SOCKBUF_UNLOCK(sb);
goto retry;
}
if (!aio_set_cancel_function(job, soo_aio_cancel)) {
MPASS(cnt == 0);
SOCKBUF_UNLOCK(sb);
aio_cancel(job);
SOCKBUF_LOCK(sb);
} else {
TAILQ_INSERT_HEAD(&sb->sb_aiojobq, job, list);
}
} else {
aio_complete(job, cnt, error);
SOCKBUF_LOCK(sb);
}
}
static void
soaio_process_sb(struct socket *so, struct sockbuf *sb)
{
struct kaiocb *job;
SOCKBUF_LOCK(sb);
while (!TAILQ_EMPTY(&sb->sb_aiojobq) && soaio_ready(so, sb)) {
job = TAILQ_FIRST(&sb->sb_aiojobq);
TAILQ_REMOVE(&sb->sb_aiojobq, job, list);
if (!aio_clear_cancel_function(job))
continue;
soaio_process_job(so, sb, job);
}
/*
* If there are still pending requests, the socket must not be
* ready so set SB_AIO to request a wakeup when the socket
* becomes ready.
*/
if (!TAILQ_EMPTY(&sb->sb_aiojobq))
sb->sb_flags |= SB_AIO;
sb->sb_flags &= ~SB_AIO_RUNNING;
SOCKBUF_UNLOCK(sb);
ACCEPT_LOCK();
SOCK_LOCK(so);
sorele(so);
}
void
soaio_rcv(void *context, int pending)
{
struct socket *so;
so = context;
soaio_process_sb(so, &so->so_rcv);
}
void
soaio_snd(void *context, int pending)
{
struct socket *so;
so = context;
soaio_process_sb(so, &so->so_snd);
}
void
sowakeup_aio(struct socket *so, struct sockbuf *sb)
{
SOCKBUF_LOCK_ASSERT(sb);
sb->sb_flags &= ~SB_AIO;
if (sb->sb_flags & SB_AIO_RUNNING)
return;
sb->sb_flags |= SB_AIO_RUNNING;
if (sb == &so->so_snd)
SOCK_LOCK(so);
soref(so);
if (sb == &so->so_snd)
SOCK_UNLOCK(so);
soaio_enqueue(&sb->sb_aiotask);
}
static void
soo_aio_cancel(struct kaiocb *job)
{
struct socket *so;
struct sockbuf *sb;
int opcode;
so = job->fd_file->f_data;
opcode = job->uaiocb.aio_lio_opcode;
if (opcode == LIO_READ)
sb = &so->so_rcv;
else {
MPASS(opcode == LIO_WRITE);
sb = &so->so_snd;
}
SOCKBUF_LOCK(sb);
if (!aio_cancel_cleared(job))
TAILQ_REMOVE(&sb->sb_aiojobq, job, list);
if (TAILQ_EMPTY(&sb->sb_aiojobq))
sb->sb_flags &= ~SB_AIO;
SOCKBUF_UNLOCK(sb);
aio_cancel(job);
}
static int
soo_aio_queue(struct file *fp, struct kaiocb *job)
{
struct socket *so;
struct sockbuf *sb;
so = fp->f_data;
switch (job->uaiocb.aio_lio_opcode) {
case LIO_READ:
sb = &so->so_rcv;
break;
case LIO_WRITE:
sb = &so->so_snd;
break;
default:
return (EINVAL);
}
SOCKBUF_LOCK(sb);
if (!aio_set_cancel_function(job, soo_aio_cancel))
panic("new job was cancelled");
TAILQ_INSERT_TAIL(&sb->sb_aiojobq, job, list);
if (!(sb->sb_flags & SB_AIO_RUNNING)) {
if (soaio_ready(so, sb))
sowakeup_aio(so, sb);
else
sb->sb_flags |= SB_AIO;
}
SOCKBUF_UNLOCK(sb);
return (0);
}


@ -416,6 +416,9 @@ db_print_sockbuf(struct sockbuf *sb, const char *sockbufname, int indent)
db_printf("sb_flags: 0x%x (", sb->sb_flags);
db_print_sbflags(sb->sb_flags);
db_printf(")\n");
db_print_indent(indent);
db_printf("sb_aiojobq first: %p\n", TAILQ_FIRST(&sb->sb_aiojobq));
}
static void
@ -470,7 +473,6 @@ db_print_socket(struct socket *so, const char *socketname, int indent)
db_print_indent(indent);
db_printf("so_sigio: %p ", so->so_sigio);
db_printf("so_oobmark: %lu ", so->so_oobmark);
db_printf("so_aiojobq first: %p\n", TAILQ_FIRST(&so->so_aiojobq));
db_print_sockbuf(&so->so_rcv, "so_rcv", indent);
db_print_sockbuf(&so->so_snd, "so_snd", indent);


@ -332,7 +332,7 @@ sowakeup(struct socket *so, struct sockbuf *sb)
} else
ret = SU_OK;
if (sb->sb_flags & SB_AIO)
aio_swake(so, sb);
sowakeup_aio(so, sb);
SOCKBUF_UNLOCK(sb);
if (ret == SU_ISCONNECTED)
soisconnected(so);


@ -134,6 +134,7 @@ __FBSDID("$FreeBSD$");
#include <sys/stat.h>
#include <sys/sx.h>
#include <sys/sysctl.h>
#include <sys/taskqueue.h>
#include <sys/uio.h>
#include <sys/jail.h>
#include <sys/syslog.h>
@ -396,7 +397,10 @@ soalloc(struct vnet *vnet)
SOCKBUF_LOCK_INIT(&so->so_rcv, "so_rcv");
sx_init(&so->so_snd.sb_sx, "so_snd_sx");
sx_init(&so->so_rcv.sb_sx, "so_rcv_sx");
TAILQ_INIT(&so->so_aiojobq);
TAILQ_INIT(&so->so_snd.sb_aiojobq);
TAILQ_INIT(&so->so_rcv.sb_aiojobq);
TASK_INIT(&so->so_snd.sb_aiotask, 0, soaio_snd, so);
TASK_INIT(&so->so_rcv.sb_aiotask, 0, soaio_rcv, so);
#ifdef VIMAGE
VNET_ASSERT(vnet != NULL, ("%s:%d vnet is NULL, so=%p",
__func__, __LINE__, so));

File diff suppressed because it is too large.


@ -31,7 +31,6 @@ SUBDIR= \
ahci \
${_aic} \
aic7xxx \
aio \
alc \
ale \
alq \


@ -1,10 +0,0 @@
# $FreeBSD$
.PATH: ${.CURDIR}/../../kern
KMOD= aio
SRCS= vfs_aio.c opt_vfs_aio.h vnode_if.h opt_compat.h
EXPORT_SYMS= aio_init_aioinfo aio_aqueue
.include <bsd.kmod.mk>


@ -21,6 +21,11 @@
#include <sys/types.h>
#include <sys/signal.h>
#ifdef _KERNEL
#include <sys/queue.h>
#include <sys/event.h>
#include <sys/signalvar.h>
#endif
/*
* Returned by aio_cancel:
@ -51,6 +56,24 @@
*/
#define AIO_LISTIO_MAX 16
#ifdef _KERNEL
/* Default values of tunables for the AIO worker pool. */
#ifndef MAX_AIO_PROCS
#define MAX_AIO_PROCS 32
#endif
#ifndef TARGET_AIO_PROCS
#define TARGET_AIO_PROCS 4
#endif
#ifndef AIOD_LIFETIME_DEFAULT
#define AIOD_LIFETIME_DEFAULT (30 * hz)
#endif
#endif
/*
* Private members for aiocb -- don't access
* directly.
@ -77,7 +100,91 @@ typedef struct aiocb {
struct sigevent aio_sigevent; /* Signal to deliver */
} aiocb_t;
#ifndef _KERNEL
#ifdef _KERNEL
typedef void aio_cancel_fn_t(struct kaiocb *);
typedef void aio_handle_fn_t(struct kaiocb *);
/*
* Kernel version of an I/O control block.
*
* Locking key:
* - need not be protected
* a - locked by kaioinfo lock
* b - locked by backend lock
* c - locked by aio_job_mtx
*/
struct kaiocb {
TAILQ_ENTRY(kaiocb) list; /* (b) backend-specific list of jobs */
TAILQ_ENTRY(kaiocb) plist; /* (a) lists of pending / done jobs */
TAILQ_ENTRY(kaiocb) allist; /* (a) list of all jobs in proc */
int jobflags; /* (a) job flags */
int inputcharge; /* (*) input blocks */
int outputcharge; /* (*) output blocks */
struct bio *bp; /* (*) BIO backend BIO pointer */
struct buf *pbuf; /* (*) BIO backend buffer pointer */
struct vm_page *pages[btoc(MAXPHYS)+1]; /* BIO backend pages */
int npages; /* BIO backend number of pages */
struct proc *userproc; /* (*) user process */
struct ucred *cred; /* (*) active credential when created */
struct file *fd_file; /* (*) pointer to file structure */
struct aioliojob *lio; /* (*) optional lio job */
struct aiocb *ujob; /* (*) pointer in userspace of aiocb */
struct knlist klist; /* (a) list of knotes */
struct aiocb uaiocb; /* (*) copy of user I/O control block */
ksiginfo_t ksi; /* (a) realtime signal info */
uint64_t seqno; /* (*) job number */
int pending; /* (a) number of pending I/O, aio_fsync only */
aio_cancel_fn_t *cancel_fn; /* (a) backend cancel function */
aio_handle_fn_t *handle_fn; /* (c) backend handle function */
};
struct socket;
struct sockbuf;
/*
* AIO backends should permit cancellation of queued requests waiting to
* be serviced by installing a cancel routine while the request is
* queued. The cancellation routine should dequeue the request if
* necessary and cancel it. Care must be used to handle races between
* queueing and dequeueing requests and cancellation.
*
* When queueing a request somewhere such that it can be cancelled, the
* caller should:
*
* 1) Acquire lock that protects the associated queue.
* 2) Call aio_set_cancel_function() to install the cancel routine.
* 3) If that fails, the request has a pending cancel and should be
* cancelled via aio_cancel().
* 4) Queue the request.
*
* When dequeueing a request to service it or hand it off to somewhere else,
* the caller should:
*
* 1) Acquire the lock that protects the associated queue.
* 2) Dequeue the request.
* 3) Call aio_clear_cancel_function() to clear the cancel routine.
* 4) If that fails, the cancel routine is about to be called. The
* caller should ignore the request.
*
* The cancel routine should:
*
* 1) Acquire the lock that protects the associated queue.
* 2) Call aio_cancel_cleared() to determine if the request is already
* dequeued due to a race with the dequeueing thread.
* 3) If that fails, dequeue the request.
* 4) Cancel the request via aio_cancel().
*/
bool aio_cancel_cleared(struct kaiocb *job);
void aio_cancel(struct kaiocb *job);
bool aio_clear_cancel_function(struct kaiocb *job);
void aio_complete(struct kaiocb *job, long status, int error);
void aio_schedule(struct kaiocb *job, aio_handle_fn_t *func);
bool aio_set_cancel_function(struct kaiocb *job, aio_cancel_fn_t *func);
void aio_switch_vmspace(struct kaiocb *job);
#else /* !_KERNEL */
struct timespec;
@ -137,14 +244,6 @@ int aio_waitcomplete(struct aiocb **, struct timespec *);
int aio_fsync(int op, struct aiocb *aiocbp);
__END_DECLS
#else
#endif /* !_KERNEL */
/* Forward declarations for prototypes below. */
struct socket;
struct sockbuf;
extern void (*aio_swake)(struct socket *, struct sockbuf *);
#endif
#endif
#endif /* !_SYS_AIO_H_ */


@ -73,6 +73,7 @@ struct socket;
struct file;
struct filecaps;
struct kaiocb;
struct kinfo_file;
struct ucred;
@ -119,6 +120,7 @@ typedef int fo_fill_kinfo_t(struct file *fp, struct kinfo_file *kif,
typedef int fo_mmap_t(struct file *fp, vm_map_t map, vm_offset_t *addr,
vm_size_t size, vm_prot_t prot, vm_prot_t cap_maxprot,
int flags, vm_ooffset_t foff, struct thread *td);
typedef int fo_aio_queue_t(struct file *fp, struct kaiocb *job);
typedef int fo_flags_t;
struct fileops {
@ -136,6 +138,7 @@ struct fileops {
fo_seek_t *fo_seek;
fo_fill_kinfo_t *fo_fill_kinfo;
fo_mmap_t *fo_mmap;
fo_aio_queue_t *fo_aio_queue;
fo_flags_t fo_flags; /* DFLAG_* below */
};
@ -406,6 +409,13 @@ fo_mmap(struct file *fp, vm_map_t map, vm_offset_t *addr, vm_size_t size,
flags, foff, td));
}
static __inline int
fo_aio_queue(struct file *fp, struct kaiocb *job)
{
return ((*fp->f_ops->fo_aio_queue)(fp, job));
}
#endif /* _KERNEL */
#endif /* !SYS_FILE_H */


@ -36,6 +36,7 @@
#include <sys/_lock.h>
#include <sys/_mutex.h>
#include <sys/_sx.h>
#include <sys/_task.h>
#define SB_MAX (2*1024*1024) /* default for max chars in sockbuf */
@ -53,6 +54,7 @@
#define SB_IN_TOE 0x400 /* socket buffer is in the middle of an operation */
#define SB_AUTOSIZE 0x800 /* automatically size socket buffer */
#define SB_STOP 0x1000 /* backpressure indicator */
#define SB_AIO_RUNNING 0x2000 /* AIO operation running */
#define SBS_CANTSENDMORE 0x0010 /* can't send more data to peer */
#define SBS_CANTRCVMORE 0x0020 /* can't receive more data from peer */
@ -107,6 +109,8 @@ struct sockbuf {
short sb_flags; /* (a) flags, see below */
int (*sb_upcall)(struct socket *, void *, int); /* (a) */
void *sb_upcallarg; /* (a) */
TAILQ_HEAD(, kaiocb) sb_aiojobq; /* (a) pending AIO ops */
struct task sb_aiotask; /* AIO task */
};
#ifdef _KERNEL


@ -103,7 +103,6 @@ struct socket {
struct sigio *so_sigio; /* [sg] information for async I/O or
out of band data (SIGURG) */
u_long so_oobmark; /* (c) chars to oob mark */
TAILQ_HEAD(, kaiocb) so_aiojobq; /* AIO ops waiting on socket */
struct sockbuf so_rcv, so_snd;
@ -342,6 +341,8 @@ int getsock_cap(struct thread *td, int fd, cap_rights_t *rightsp,
struct file **fpp, u_int *fflagp);
void soabort(struct socket *so);
int soaccept(struct socket *so, struct sockaddr **nam);
void soaio_rcv(void *context, int pending);
void soaio_snd(void *context, int pending);
int socheckuid(struct socket *so, uid_t uid);
int sobind(struct socket *so, struct sockaddr *nam, struct thread *td);
int sobindat(int fd, struct socket *so, struct sockaddr *nam,
@ -396,6 +397,7 @@ void soupcall_clear(struct socket *so, int which);
void soupcall_set(struct socket *so, int which,
int (*func)(struct socket *, void *, int), void *arg);
void sowakeup(struct socket *so, struct sockbuf *sb);
void sowakeup_aio(struct socket *so, struct sockbuf *sb);
int selsocket(struct socket *so, int events, struct timeval *tv,
struct thread *td);


@ -47,6 +47,7 @@
#include <unistd.h>
#include "freebsd_test_suite/macros.h"
#include "local.h"
#define PATH_TEMPLATE "aio.XXXXXXXXXX"
@ -70,6 +71,7 @@ main (int argc, char *argv[])
unsigned i, j;
PLAIN_REQUIRE_KERNEL_MODULE("aio", 0);
PLAIN_REQUIRE_UNSAFE_AIO(0);
kq = kqueue();
if (kq < 0) {


@ -60,6 +60,7 @@
#include <atf-c.h>
#include "freebsd_test_suite/macros.h"
#include "local.h"
#define PATH_TEMPLATE "aio.XXXXXXXXXX"
@ -340,6 +341,7 @@ ATF_TC_BODY(aio_file_test, tc)
int fd;
ATF_REQUIRE_KERNEL_MODULE("aio");
ATF_REQUIRE_UNSAFE_AIO();
strcpy(pathname, PATH_TEMPLATE);
fd = mkstemp(pathname);
@ -386,6 +388,7 @@ ATF_TC_BODY(aio_fifo_test, tc)
struct aio_context ac;
ATF_REQUIRE_KERNEL_MODULE("aio");
ATF_REQUIRE_UNSAFE_AIO();
/*
* In theory, mkstemp() can return a name that is then collided with.
@ -497,6 +500,7 @@ ATF_TC_BODY(aio_pty_test, tc)
int error;
ATF_REQUIRE_KERNEL_MODULE("aio");
ATF_REQUIRE_UNSAFE_AIO();
ATF_REQUIRE_MSG(openpty(&read_fd, &write_fd, NULL, NULL, NULL) == 0,
"openpty failed: %s", strerror(errno));
@ -544,6 +548,7 @@ ATF_TC_BODY(aio_pipe_test, tc)
int pipes[2];
ATF_REQUIRE_KERNEL_MODULE("aio");
ATF_REQUIRE_UNSAFE_AIO();
ATF_REQUIRE_MSG(pipe(pipes) != -1,
"pipe failed: %s", strerror(errno));


@ -50,6 +50,7 @@
#include <unistd.h>
#include "freebsd_test_suite/macros.h"
#include "local.h"
#define PATH_TEMPLATE "aio.XXXXXXXXXX"
@ -74,6 +75,7 @@ main(int argc, char *argv[])
int tmp_file = 0, failed = 0;
PLAIN_REQUIRE_KERNEL_MODULE("aio", 0);
PLAIN_REQUIRE_UNSAFE_AIO(0);
kq = kqueue();
if (kq < 0)

tests/sys/aio/local.h (new file, 74 lines)

@ -0,0 +1,74 @@
/*-
* Copyright (c) 2016 Chelsio Communications, Inc.
* All rights reserved.
* Written by: John Baldwin <jhb@FreeBSD.org>
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*/
#ifndef _AIO_TEST_LOCAL_H_
#define _AIO_TEST_LOCAL_H_
#include <sys/types.h>
#include <sys/sysctl.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <atf-c.h>
#define ATF_REQUIRE_UNSAFE_AIO() do { \
size_t _len; \
int _unsafe; \
\
_len = sizeof(_unsafe); \
if (sysctlbyname("vfs.aio.enable_unsafe", &_unsafe, &_len, NULL,\
0) < 0) { \
if (errno != ENOENT) \
atf_libc_error(errno, \
"Failed to read vfs.aio.enable_unsafe"); \
} else if (_unsafe == 0) \
atf_tc_skip("Unsafe AIO is disabled"); \
} while (0)
#define PLAIN_REQUIRE_UNSAFE_AIO(_exit_code) do { \
size_t _len; \
int _unsafe; \
\
_len = sizeof(_unsafe); \
if (sysctlbyname("vfs.aio.enable_unsafe", &_unsafe, &_len, NULL,\
0) < 0) { \
if (errno != ENOENT) { \
printf("Failed to read vfs.aio.enable_unsafe: %s\n",\
strerror(errno)); \
_exit(1); \
} \
} else if (_unsafe == 0) { \
printf("Unsafe AIO is disabled"); \
_exit(_exit_code); \
} \
} while (0)
#endif /* !_AIO_TEST_LOCAL_H_ */