freebsd-skq/lib/libc/sys/mlock.2
Mark Johnston 54a3a11421 Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes.  User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.

The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2).  Only user-wired pages are subject to the system-wide
limit, which helps provide some safety against deadlocks.  In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
short of killing the wiring process.  The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.

The choice to count virtual user-wired pages rather than physical
pages was made for simplicity.  There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.

The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded.  For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.

Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM.  Users that wish to exceed the limit must tune
vm.max_user_wired.

Reviewed by:	kib, ngie (mlock() test changes)
Tested by:	pho (earlier version)
MFC after:	45 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19908
2019-05-13 16:38:48 +00:00

.\" Copyright (c) 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)mlock.2 8.2 (Berkeley) 12/11/93
.\" $FreeBSD$
.\"
.Dd May 13, 2019
.Dt MLOCK 2
.Os
.Sh NAME
.Nm mlock ,
.Nm munlock
.Nd lock (unlock) physical pages in memory
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/mman.h
.Ft int
.Fn mlock "const void *addr" "size_t len"
.Ft int
.Fn munlock "const void *addr" "size_t len"
.Sh DESCRIPTION
The
.Fn mlock
system call
locks into memory the physical pages associated with the virtual address
range starting at
.Fa addr
for
.Fa len
bytes.
The
.Fn munlock
system call unlocks pages previously locked by one or more
.Fn mlock
calls.
For both, the
.Fa addr
argument should be aligned to a multiple of the page size.
If the
.Fa len
argument is not a multiple of the page size, it will be rounded up
to be so.
The entire range must be allocated.
.Pp
After an
.Fn mlock
system call, the indicated pages will cause neither a non-resident page
nor address-translation fault until they are unlocked.
They may still cause protection-violation faults or TLB-miss faults on
architectures with software-managed TLBs.
The physical pages remain in memory until all locked mappings for the pages
are removed.
Multiple processes may have the same physical pages locked via their own
virtual address mappings.
A single process may likewise have pages multiply-locked via different virtual
mappings of the same physical pages.
Unlocking is performed explicitly by
.Fn munlock
or implicitly by a call to
.Fn munmap ,
which deallocates the unmapped address range.
Locked mappings are not inherited by the child process after a
.Xr fork 2 .
.Pp
Since physical memory is a potentially scarce resource, processes are
limited in how much they can lock down.
The amount of memory that a single process can
.Fn mlock
is limited by both the per-process
.Dv RLIMIT_MEMLOCK
resource limit and the
system-wide
.Dq wired pages
limit
.Va vm.max_user_wired .
.Va vm.max_user_wired
applies to the system as a whole, so the amount available to a single
process at any given time is the difference between
.Va vm.max_user_wired
and
.Va vm.stats.vm.v_user_wire_count .
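.Pp
The difference described above can be sketched in C as follows.
This is an illustration under stated assumptions, not a definitive
implementation: both values are assumed to be page counts exported as
unsigned long, and the function names
.Fn user_wire_headroom
and
.Fn read_user_wire_headroom
are hypothetical.
The sysctl lookup is compiled only on
.Fx .
.Bd -literal
```c
#include <stddef.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Pages still available for user wiring.  Returns 0 when the count
 * already meets or exceeds the limit, so the result never wraps.
 */
unsigned long
user_wire_headroom(unsigned long max_user_wired,
    unsigned long user_wire_count)
{
	if (user_wire_count >= max_user_wired)
		return (0);
	return (max_user_wired - user_wire_count);
}

#ifdef __FreeBSD__
/*
 * Read both sysctls and store the headroom in *out.
 * Assumption: both values are exported as unsigned long.
 * Returns 0 on success and -1 on error.
 */
int
read_user_wire_headroom(unsigned long *out)
{
	unsigned long max = 0, cur = 0;
	size_t len;

	len = sizeof(max);
	if (sysctlbyname("vm.max_user_wired", &max, &len, NULL, 0) != 0)
		return (-1);
	len = sizeof(cur);
	if (sysctlbyname("vm.stats.vm.v_user_wire_count", &cur, &len,
	    NULL, 0) != 0)
		return (-1);
	*out = user_wire_headroom(max, cur);
	return (0);
}
#endif
```
.Ed
The result is only a snapshot, since other processes may wire or unwire
pages between the two reads and any later
.Fn mlock
call.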
.Pp
If
.Va security.bsd.unprivileged_mlock
is set to 0, these calls are only available to the super-user.
.Sh RETURN VALUES
.Rv -std
.Pp
If the call succeeds, all pages in the range become locked (unlocked);
otherwise the locked status of all pages in the range remains unchanged.
.Sh ERRORS
The
.Fn mlock
system call
will fail if:
.Bl -tag -width Er
.It Bq Er EPERM
.Va security.bsd.unprivileged_mlock
is set to 0 and the caller is not the super-user.
.It Bq Er EINVAL
The address range given wraps around zero.
.It Bq Er ENOMEM
Some portion of the indicated address range is not allocated.
There was an error faulting/mapping a page.
Locking the indicated range would exceed the per-process or system-wide limits
for locked memory.
.El
The
.Fn munlock
system call
will fail if:
.Bl -tag -width Er
.It Bq Er EPERM
.Va security.bsd.unprivileged_mlock
is set to 0 and the caller is not the super-user.
.It Bq Er EINVAL
The address range given wraps around zero.
.It Bq Er ENOMEM
Some or all of the address range specified by the
.Fa addr
and
.Fa len
arguments does not correspond to valid mapped pages in the address space
of the process.
.It Bq Er ENOMEM
Locking the pages mapped by the specified range would exceed a limit on
the amount of memory that the process may lock.
.El
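.Pp
As a rough sketch of how a caller might report the
.Fn mlock
errors listed above, the following C fragment maps an error number to a
short explanation.
The function name
.Fn mlock_errno_reason
and the message wording are hypothetical, and only the errors documented
in this page are covered.
.Bd -literal
```c
#include <errno.h>

/*
 * Map an errno value set by mlock() to a short explanation based on
 * the ERRORS list in this page.  Hypothetical helper.
 */
const char *
mlock_errno_reason(int error)
{
	switch (error) {
	case EPERM:
		return ("unprivileged locking is disabled");
	case EINVAL:
		return ("address range wraps around zero");
	case ENOMEM:
		return ("range unallocated, fault failed, or limit hit");
	default:
		return ("error not listed for mlock in this page");
	}
}
```
.Ed
A caller would typically pass the value of
.Va errno
immediately after
.Fn mlock
returns -1.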
.Sh "SEE ALSO"
.Xr fork 2 ,
.Xr mincore 2 ,
.Xr minherit 2 ,
.Xr mlockall 2 ,
.Xr mmap 2 ,
.Xr munlockall 2 ,
.Xr munmap 2 ,
.Xr setrlimit 2 ,
.Xr getpagesize 3
.Sh HISTORY
The
.Fn mlock
and
.Fn munlock
system calls first appeared in
.Bx 4.4 .
.Sh BUGS
Allocating too much wired memory can lead to a memory-allocation deadlock
which requires a reboot to recover from.
.Pp
The per-process and system-wide resource limits of locked memory apply
to the amount of virtual memory locked, not the amount of locked physical
pages.
Hence two distinct locked mappings of the same physical page count as
2 pages against the system limit, and also against the per-process limit
if both mappings belong to the same physical map.
.Pp
The per-process resource limit is not currently supported.