freebsd-dev/lib/libc/sys/madvise.2
John Baldwin 936c09ac0f Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region.  It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types.  The first type provides hints about data access
patterns and is used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE().  These modes are
thus filesystem independent.  Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hints are used to hint to the OS that data will or
will not be used.  These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request.  This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf().  This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode.  This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue.  The second change adds
a new vm_object_page_cache() method.  This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by:	jilles, kib, arch@
Approved by:	re (kib)
MFC after:	1 month
2011-11-04 04:02:50 +00:00


.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)madvise.2 8.1 (Berkeley) 6/9/93
.\" $FreeBSD$
.\"
.Dd July 19, 1996
.Dt MADVISE 2
.Os
.Sh NAME
.Nm madvise , posix_madvise
.Nd give advice about use of memory
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/mman.h
.Ft int
.Fn madvise "void *addr" "size_t len" "int behav"
.Ft int
.Fn posix_madvise "void *addr" "size_t len" "int behav"
.Sh DESCRIPTION
The
.Fn madvise
system call
allows a process that has knowledge of its memory behavior
to describe it to the system.
The
.Fn posix_madvise
interface is identical and is provided for standards conformance.
.Pp
The known behaviors are:
.Bl -tag -width MADV_SEQUENTIAL
.It Dv MADV_NORMAL
Tells the system to revert to the default paging
behavior.
.It Dv MADV_RANDOM
Is a hint that pages will be accessed randomly, and prefetching
is likely not advantageous.
.It Dv MADV_SEQUENTIAL
Causes the VM system to depress the priority of
pages immediately preceding a given page when it is faulted in.
.It Dv MADV_WILLNEED
Causes pages that are in a given virtual address range
to temporarily have higher priority, and if they are in
memory, decrease the likelihood of them being freed.
Additionally,
the pages that are already in memory will be immediately mapped into
the process, thereby eliminating unnecessary overhead of going through
the entire process of faulting the pages in.
This does not fault
pages in from backing store; it only maps pages that are already in memory
into the calling process.
.It Dv MADV_DONTNEED
Allows the VM system to decrease the in-memory priority
of pages in the specified range.
Additionally, future references to
this address range will incur a page fault.
.It Dv MADV_FREE
Gives the VM system the freedom to free pages,
and tells the system that information in the specified page range
is no longer important.
This is an efficient way of allowing
.Xr malloc 3
to free pages anywhere in the address space, while keeping the address space
valid.
The next time that the page is referenced, the page might be demand
zeroed, or might contain the data that was there before the
.Dv MADV_FREE
call.
References made to that address space range will not make the VM system
page the information back in from backing store until the page is
modified again.
.It Dv MADV_NOSYNC
Request that the system not flush the data associated with this map to
physical backing store unless it needs to.
Typically this prevents the
file system update daemon from gratuitously writing pages dirtied
by the VM system to physical disk.
Note that VM/file system coherency is
always maintained; this feature simply ensures that the mapped data is
only flushed when it needs to be, usually by the system pager.
.Pp
This feature is typically used when you want to use a file-backed shared
memory area to communicate between processes (IPC) and do not particularly
need the data being stored in that area to be physically written to disk.
With this feature you get the equivalent performance with mmap that you
would expect to get with SysV shared memory calls, but in a more controllable
and less restrictive manner.
However, note that this feature is not portable
across UNIX platforms (though some may do the right thing by default).
For more information see the
.Dv MAP_NOSYNC
section of
.Xr mmap 2 .
.It Dv MADV_AUTOSYNC
Undoes the effects of
.Dv MADV_NOSYNC
for any future pages dirtied within the
address range.
The effect on pages already dirtied is indeterminate; they
may or may not be reverted.
You can guarantee reversion by using the
.Xr msync 2
or
.Xr fsync 2
system calls.
.It Dv MADV_NOCORE
The region is not included in a core file.
.It Dv MADV_CORE
The region is included in a core file.
.It Dv MADV_PROTECT
Informs the VM system that this process should not be killed when
swap space is exhausted.
The process must have superuser privileges.
This should be used judiciously in processes that must remain running
for the system to properly function.
.El
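.Pp
As a rough illustration of the
.Dv MADV_NOSYNC
case described above, the following sketch maps a scratch file as shared
memory and applies the advice to it.
The function name
.Fn nosync_scratch_demo
and the
.Pa /tmp
path are hypothetical, and the request is guarded by a preprocessor check
on the assumption that other platforms may not define
.Dv MADV_NOSYNC .
.Bd -literal -offset indent
```c
#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical demonstration: back a shared mapping with a scratch
 * file, ask the system not to flush it eagerly, and dirty one page.
 * Returns 0 on success and -1 on failure.
 */
int
nosync_scratch_demo(size_t len)
{
	char path[] = "/tmp/nosync.XXXXXX";	/* assumed scratch location */
	char *p;
	int fd, ok;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	(void)unlink(path);	/* drop the name; the data stays while in use */
	if (ftruncate(fd, (off_t)len) == -1) {
		close(fd);
		return (-1);
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping keeps the file alive */
	if (p == MAP_FAILED)
		return (-1);
#ifdef MADV_NOSYNC
	/* Only a hint: the mapping works whether or not it is honored. */
	(void)madvise(p, len, MADV_NOSYNC);
#endif
	p[0] = 'x';		/* dirty a page of the shared area */
	ok = (p[0] == 'x');
	munmap(p, len);
	return (ok ? 0 : -1);
}
```
.Ed
.Pp
Because the advice is only a hint, the sketch ignores the return value of
.Fn madvise
for this request.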
.Pp
Portable programs that call the
.Fn posix_madvise
interface should use the aliases
.Dv POSIX_MADV_NORMAL , POSIX_MADV_SEQUENTIAL ,
.Dv POSIX_MADV_RANDOM , POSIX_MADV_WILLNEED ,
and
.Dv POSIX_MADV_DONTNEED
rather than the flags described above.
.Sh RETURN VALUES
.Rv -std madvise
.Sh ERRORS
The
.Fn madvise
system call will fail if:
.Bl -tag -width Er
.It Bq Er EINVAL
The
.Fa behav
argument is not valid.
.It Bq Er ENOMEM
The virtual address range specified by the
.Fa addr
and
.Fa len
arguments is not valid.
.It Bq Er EPERM
.Dv MADV_PROTECT
was specified and the process does not have superuser privileges.
.El
.Sh SEE ALSO
.Xr mincore 2 ,
.Xr mprotect 2 ,
.Xr msync 2 ,
.Xr munmap 2 ,
.Xr posix_fadvise 2
.Sh STANDARDS
The
.Fn posix_madvise
interface conforms to
.St -p1003.1-2001 .
.Sh HISTORY
The
.Fn madvise
system call first appeared in
.Bx 4.4 .