diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc index 37d42bbf547a..663f3ee5479b 100644 --- a/lib/libc/sys/Makefile.inc +++ b/lib/libc/sys/Makefile.inc @@ -87,7 +87,7 @@ MAN2+= _exit.2 accept.2 access.2 acct.2 adjtime.2 \ getsockopt.2 gettimeofday.2 getuid.2 \ intro.2 ioctl.2 issetugid.2 jail.2 kill.2 \ kldfind.2 kldfirstmod.2 kldload.2 kldnext.2 kldstat.2 kldunload.2 \ - ktrace.2 link.2 listen.2 lseek.2 \ + ktrace.2 kqueue.2 link.2 listen.2 lseek.2 \ madvise.2 mincore.2 minherit.2 mkdir.2 mkfifo.2 mknod.2 mlock.2 mmap.2 \ mount.2 mprotect.2 msync.2 munmap.2 nanosleep.2 \ nfssvc.2 open.2 pathconf.2 pipe.2 poll.2 profil.2 ptrace.2 quotactl.2 \ diff --git a/lib/libc/sys/kqueue.2 b/lib/libc/sys/kqueue.2 new file mode 100644 index 000000000000..e0b53d5919d0 --- /dev/null +++ b/lib/libc/sys/kqueue.2 @@ -0,0 +1,400 @@ +.\" Copyright (c) 2000 Jonathan Lemon +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd April 14, 2000 +.Dt KQUEUE 2 +.Os +.Sh NAME +.Nm kqueue, kevent +.Nd kernel event notification mechanism +.Sh SYNOPSIS +.Fd #include +.Ft int +.Fn kqueue "void" +.Ft int +.Fn kevent "int kq" "int nchanges" "struct kevent **changelist" \ +"int nevents" "struct kevent *eventlist" "struct timespec *timeout" +.Sh DESCRIPTION +.Fn kqueue +provides a generic method of notifying the user when an event +happens or a condition holds, based on the results of small +pieces of kernel code termed filters. +An kevent is identified by the (ident, filter) pair; there may only +be one unique kevent per kqueue. + +The filter is executed upon the initial registration of a kevent +in order to detect a preexisting condition is present, and is also +executed whenever an event is passed to the filter for evaluation. +If the filter determines that the condition should be reported, +then the kevent is placed on the kqueue for the user to retrieve. + +The filter is also run when the user attempts to retrieve the kevent +from the kqueue, and if the filter indicates the condition that triggered +the event no longer holds, the kevent is removed from the kqueue and +is not returned. + +Multiple events which trigger the filter do not result in multiple +kevents being placed on the kqueue; instead, the filter will aggregate +the events into a single struct kevent. +Calling +.Fn close +on a file descriptor will remove any kevents that reference the descriptor. +.Pp +.Fn kqueue +creates a new kernel event queue and returns a descriptor. +.Pp +.Fn kevent +is used to register events with the queue, and return any pending +events to the user. +.Fa changelist +is a pointer to an array of pointers to +.Va kevent +structures, as defined in +.Aq Pa event.h . +All changes contained in the +.Fa changelist +are applied before any pending events are read from the queue. +.Fa nchanges +gives the size of +.Fa changelist . +.Fa eventlist +is a pointer to an array of kevent structures. +.Fa nevents +determines the size of +.Fa eventlist . +If +.Fa timeout +is a non-NULL pointer, it specifies a maximum interval to wait +for an event. If +.Fa timeout +is a NULL pointer, +.Fn kevent +waits indefinitely. To effect a poll, the +.Fa timeout +argument should be non-NULL, pointing to a zero-valued +.Va timespec +structure. +.Pp +The +.Va kevent +structure is defined as: +.Bd -literal +struct kevent { + uintptr_t ident; /* identifier for this event */ + short filter; /* filter for event */ + u_short flags; /* action flags for kqueue */ + u_int fflags; /* filter flag value */ + intptr_t data; /* filter data value */ + void *udata; /* opaque user data identifier */ +}; +.Ed +.Pp +The fields of +.Fa struct kevent +are: +.Bl -tag -width XXXfilter +.It ident +Value used to identify this event. +The exact interpretation is determined by the attached filter, +but often is a file descriptor. +.It filter +Identifies the kernel filter used to process this event. The pre-defined +system filters are described below. +.It flags +Actions to perform on the event. +.It fflags +Filter-specific flags. +.It data +Filter-specific data value. +.It udata +Opaque user-defined value passed through the kernel unchanged. +.El +.Pp +The +.Va flags +field can contain the following values: +.Bl -tag -width XXXEV_ONESHOT +.It EV_ADD +Adds the event to the kqueue. Re-adding an existing event +will modify the parameters of the original event, and not result +in a duplicate entry. Adding an event automatically enables it, +unless overridden by the EV_DISABLE flag. +.It EV_ENABLE +Permit +.Fn kevent +to return the event if it is triggered. +.It EV_DISABLE +Disable the event so +.Fn kevent +will not return it. The filter itself is not disabled. +.It EV_DELETE +Removes the event from the kqueue. Events which are attached to +file descriptors are automatically deleted on the last close of +the descriptor. +.It EV_ONESHOT +Causes the event to return only the first occurrence of the filter +being triggered. After the user retrieves the event from the kqueue, +it is deleted. +.It EV_CLEAR +After the event is retrieved by the user, its state is reset. +This is useful for filters which report state transitions +instead of the current state. Note that some filters may automatically +set this flag internally. +.It EV_EOF +Filters may set this flag to indicate filter-specific EOF condition. +.It EV_ERROR +See +.Sx RETURN VALUES +below. +.El +.Pp +The predefined system filters are listed below. +Arguments may be passed to and from the filter via the +.Va fflags +and +.Va data +fields in the kevent structure. +.Bl -tag +.It EVFILT_READ +Takes a descriptor as the identifier, and returns whenever +there is data available to read. +The behavior of the filter is slightly different depending +on the descriptor type. +.Bl -tag +.It Sockets +Sockets which have previously been passed to +.Fn listen +return when there is an incoming connection pending. +.Va data +contains the size of the listen backlog. +.Pp +Other socket descriptors return when there is data to be read, +subject to the SO_RCVLOWAT value of the socket buffer. +.Va data +contains the number of bytes in the socket buffer. +.Pp +If the read direction of the socket has shutdown, then the filter +also sets EV_EOF in +.Va flags . +It is possible for EOF to be returned (indicating the connection is gone) +while there is still data pending in the socket buffer. +.El +.Bl -tag +.It Vnodes +Returns when the file pointer is not at the end of file. +.Va data +contains the offset from current position to end of file, +and may be negative. +.El +.Bl -tag +.It Fifos, Pipes +Returns when the there is data to read; +.Va data +contains the number of bytes available. +.Pp +When the last writer disconnects, the filter will set EV_EOF in +.Va flags . +This may be cleared by passing in EV_CLEAR, at which point the +filter will resume waiting for data to become available before +returning. +.El +.El +.Bl -tag +.It EVFILT_WRITE +Takes a descriptor as the identifier, and returns whenever +it is possible to write to the descriptor. For sockets, pipes +and fifos, +.Va data +will contain the amount of space remaining in the write buffer. +The filter will set EV_EOF when the reader disconnects, and for +the fifo case, this may be cleared by use of EV_CLEAR. +Note that this filter is not supported for vnodes. +.El +.Bl -tag +.It EVFILT_AIO +A kevent structure is initialized, with +.Va ident +containing the descriptor of the kqueue that the event should be +attached to. The address of the kevent structure is then placed in the +.Va aio_lio_opcode +field of the AIO request, and the aio_* function is then called. +The event will be registered with the specified kqueue, and the +.Va ident +argument set to the +.Fa struct aiocb +returned by the aio_* function. +The filter returns under the same conditions as aio_error. +.Pp +NOTE: this interface is unstable and subject to change. +.El +.Bl -tag +.It EVFILT_VNODE +Takes a file descriptor as the identifier and the events to watch for in +.Va fflags , +and returns when one or more of the requested events occurs on the descriptor. +The events to monitor are: +.Bl -tag -width XXNOTE_RENAME +.It NOTE_DELETE +.Fn unlink +was called on the file referenced by the descriptor. +.It NOTE_WRITE +A write occurred on the file referenced by the descriptor. +.It NOTE_EXTEND +The file referenced by the descriptor was extended. +.It NOTE_ATTRIB +The file referenced by the descriptor had its attributes changed. +.It NOTE_LINK +The link count on the file changed. +.It NOTE_RENAME +The file referenced by the descriptor was renamed. +.El +.Pp +On return, +.Va fflags +contains the events which triggered the filter. +.El +.Bl -tag +.It EVFILT_PROC +Takes the process ID to monitor as the identifier and the events to watch for +in +.Va fflags , +and returns when the process performs one or more of the requested events. +If a process can normally see another process, it can attach an event to it. +The events to monitor are: +.Bl -tag -width XXNOTE_TRACKERR +.It NOTE_EXIT +The process has exited. +.It NOTE_FORK +The process has called +.Fn fork . +.It NOTE_EXEC +The process has executed a new process via +.Xr execve 2 +or similar call. +.It NOTE_TRACK +Follow a process across +.Fn fork +calls. The parent process will return with NOTE_TRACK set in the +.Va fflags +field, while the child process will return with NOTE_CHILD set in +.Va fflags +and the parent PID in +.Va data . +.It NOTE_TRACKERR +This flag is returned if the system was unable to attach an event to +the child process, usually due to resource limitations. +.El +.Pp +On return, +.Va fflags +contains the events which triggered the filter. +.It EVFILT_SIGNAL +Takes the signal number to monitor as the identifier and returns +when the given signal is delivered to the process. +This coexists with the +.Fn signal +and +.Fn sigaction +facilities, and has a lower precedence. The filter will record +all attempts to deliver a signal to a process, even if the signal has +been marked as SIG_IGN. Event notification happens after normal +signal delivery processing. +.Va data +returns the number of times the signal has occurred since the last call to +.Fn kqueue . +This filter automatically sets the EV_CLEAR flag internally. +.El +.Sh RETURN VALUES +.Fn kevent +returns the number of events placed in the +.Ar eventlist , +up to the value given by +.Ar nevents . +If an error occurs while processing an element of the +.Ar changelist +and there is enough room in the +.Ar eventlist , +then the event will be placed in the +.Ar eventlist +with +.Dv EV_ERROR +set in +.Va flags +and the system error in +.Va data . +Otherwise, +.Dv -1 +will be returned, and +.Dv errno +will be set to indicate the error condition. +If the time limit expires, then +.Fn kevent +returns 0. +.Sh ERRORS +The +.Fn kevent +function fails if: +.Bl -tag -width Er +.It Bq Er EACCESS +The process does not have permission to register a filter. +.It Bq Er EFAULT +There was an error reading or writing the +.Va kevent +structure. +.It Bq Er EBADF +The specified descriptor is invalid. +.It Bq Er EINTR +A signal was delivered before the timeout expired and before any +events were placed on the kqueue for return. +.It Bq Er EINVAL +The specified time limit or filter is invalid. +.It Bq Er ENOMEM +No memory was available to register the event. +.It Bq Er ESRCH +The specified process to attach to does not exist. +.El +.Sh SEE ALSO +.Xr aio_error 2 , +.Xr aio_read 2 , +.Xr aio_return 2 , +.Xr poll 2 , +.Xr read 2 , +.Xr select 2 , +.Xr signal 3 , +.Xr sigaction 2 , +.Xr write 2 . +.Sh HISTORY +The +.Fn kqueue +and +.Fn kevent +functions first appeared in +.Fx 5.0 . +.Sh AUTHOR +The +.Fn kqueue +system and this manual page were written by +.An Jonathan Lemon Aq jlemon@freebsd.org .