.\" Copyright (c) 1980, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the American National Standards Committee X3, on Information
.\" Processing Systems.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)malloc.3 8.1 (Berkeley) 6/4/93
.\" $FreeBSD$
.\"
.Dd February 17, 2008
.Dt MALLOC 3
.Os
.Sh NAME
.Nm malloc , calloc , realloc , free , reallocf , malloc_usable_size
.Nd general purpose memory allocation functions
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In stdlib.h
.Ft void *
.Fn malloc "size_t size"
.Ft void *
.Fn calloc "size_t number" "size_t size"
.Ft void *
.Fn realloc "void *ptr" "size_t size"
.Ft void *
.Fn reallocf "void *ptr" "size_t size"
.Ft void
.Fn free "void *ptr"
.Ft const char *
.Va _malloc_options ;
.Ft void
.Fo \*(lp*_malloc_message\*(rp
.Fa "const char *p1" "const char *p2" "const char *p3" "const char *p4"
.Fc
.In malloc_np.h
.Ft size_t
.Fn malloc_usable_size "const void *ptr"
.Sh DESCRIPTION
The
.Fn malloc
function allocates
.Fa size
bytes of uninitialized memory.
The allocated space is suitably aligned (after possible pointer coercion)
for storage of any type of object.
.Pp
The
.Fn calloc
function allocates space for
.Fa number
objects,
each
.Fa size
bytes in length.
The result is identical to calling
.Fn malloc
with an argument of
.Dq "number * size" ,
with the exception that the allocated memory is explicitly initialized
to zero bytes.
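.Pp
The relationship between
.Fn calloc
and
.Fn malloc
described above can be sketched roughly as follows.
This is an illustration only, not the library's implementation: the name
.Fn calloc_sketch
is hypothetical, and the overflow check on
.Dq "number * size"
is an assumption added for safety rather than behavior stated in this manual.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc(number * size) followed by zero-filling. */
static void *
calloc_sketch(size_t number, size_t size)
{
	void *p;

	/* Assumption: refuse requests whose byte count overflows size_t. */
	if (size != 0 && number > SIZE_MAX / size)
		return (NULL);
	p = malloc(number * size);
	if (p != NULL)
		memset(p, 0, number * size);
	return (p);
}
```
.Ed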
.Pp
The
.Fn realloc
function changes the size of the previously allocated memory referenced by
.Fa ptr
to
.Fa size
bytes.
The contents of the memory are unchanged up to the lesser of the new and
old sizes.
If the new size is larger,
the contents of the newly allocated portion of the memory are undefined.
Upon success, the memory referenced by
.Fa ptr
is freed and a pointer to the newly allocated memory is returned.
Note that
.Fn realloc
and
.Fn reallocf
may move the memory allocation, resulting in a different return value than
.Fa ptr .
If
.Fa ptr
is
.Dv NULL ,
the
.Fn realloc
function behaves identically to
.Fn malloc
for the specified size.
.Pp
The
.Fn reallocf
function is identical to the
.Fn realloc
function, except that it
will free the passed pointer when the requested memory cannot be allocated.
This is a
.Fx
specific API designed to ease the problems with traditional coding styles
for
.Fn realloc
causing memory leaks in libraries.
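.Pp
The difference between the two functions can be sketched with a portable
stand-in built on
.Fn realloc
and
.Fn free .
The name
.Fn reallocf_sketch
is hypothetical, and the block is meant only to illustrate the semantics
described above, not to reproduce the library source.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical portable stand-in for reallocf(). */
static void *
reallocf_sketch(void *ptr, size_t size)
{
	void *nptr;

	nptr = realloc(ptr, size);
	/*
	 * On failure realloc() leaves ptr allocated, so release it here.
	 * A zero size is skipped because realloc(ptr, 0) may already
	 * have freed ptr on some systems.
	 */
	if (nptr == NULL && size != 0)
		free(ptr);
	return (nptr);
}
```
.Ed
.Pp
With such a wrapper, the common idiom of assigning the result back to the
same pointer no longer leaks the old buffer when the resize fails.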
.Pp
The
.Fn free
function causes the allocated memory referenced by
.Fa ptr
to be made available for future allocations.
If
.Fa ptr
is
.Dv NULL ,
no action occurs.
.Pp
The
.Fn malloc_usable_size
function returns the usable size of the allocation pointed to by
.Fa ptr .
The return value may be larger than the size that was requested during
allocation.
The
.Fn malloc_usable_size
function is not a mechanism for in-place
.Fn realloc ;
rather it is provided solely as a tool for introspection purposes.
Any discrepancy between the requested allocation size and the size reported by
.Fn malloc_usable_size
should not be depended on, since such behavior is entirely
implementation-dependent.
.Sh TUNING
The first time one of these memory allocation
routines is called, various flags are set or reset, which affects the
workings of this allocator implementation.
.Pp
The
.Dq name
of the file referenced by the symbolic link named
.Pa /etc/malloc.conf ,
the value of the environment variable
.Ev MALLOC_OPTIONS ,
and the string pointed to by the global variable
.Va _malloc_options
will be interpreted, in that order, from left to right as flags.
.Pp
Each flag is a single letter, optionally prefixed by a non-negative base 10
integer repetition count.
For example,
.Dq 3N
is equivalent to
.Dq NNN .
Some flags control parameter magnitudes, where uppercase increases the
magnitude, and lowercase decreases the magnitude.
Other flags control boolean parameters, where uppercase indicates that a
behavior is set, or on, and lowercase means that a behavior is not set, or off.
.Bl -tag -width indent
.It A
All warnings (except for the warning about unknown
flags being set) become fatal.
The process will call
.Xr abort 3
in these cases.
.It B
Double/halve the per-arena lock contention threshold at which a thread is
randomly re-assigned to an arena.
This dynamic load balancing tends to push threads away from highly contended
arenas, which avoids worst case contention scenarios in which threads
disproportionately utilize arenas.
However, due to the highly dynamic load that applications may place on the
allocator, it is impossible for the allocator to know in advance how sensitive
it should be to contention over arenas.
Therefore, some applications may benefit from increasing or decreasing this
threshold parameter.
This option is not available for some configurations (non-PIC).
.It D
Use
.Xr sbrk 2
to acquire memory in the data storage segment (DSS).
This option is enabled by default.
See the
.Dq M
option for related information and interactions.
.It F
Double/halve the per-arena maximum number of dirty unused pages that are
allowed to accumulate before informing the kernel about at least half of those
pages via
.Xr madvise 2 .
This provides the kernel with sufficient information to recycle dirty pages if
physical memory becomes scarce and the pages remain unused.
The default is 512 pages per arena;
.Ev MALLOC_OPTIONS=10f
will prevent any dirty unused pages from accumulating.
.It J
Each byte of new memory allocated by
.Fn malloc ,
.Fn realloc
or
.Fn reallocf
will be initialized to 0xa5.
All memory returned by
.Fn free ,
.Fn realloc
or
.Fn reallocf
will be initialized to 0x5a.
This is intended for debugging and will impact performance negatively.
.It K
Double/halve the virtual memory chunk size.
The default chunk size is 1 MB.
.It M
Use
.Xr mmap 2
to acquire anonymously mapped memory.
This option is enabled by default.
If both the
.Dq D
and
.Dq M
options are enabled, the allocator prefers the DSS over anonymous mappings,
but allocation only fails if memory cannot be acquired via either method.
If neither option is enabled, then the
.Dq M
option is implicitly enabled in order to assure that there is a method for
acquiring memory.
.It N
Double/halve the number of arenas.
The default number of arenas is four times the number of CPUs, or one if there
is a single CPU.
.It P
Various statistics are printed at program exit via an
.Xr atexit 3
function.
This has the potential to cause deadlock for a multi-threaded process that exits
while one or more threads are executing in the memory allocation functions.
Therefore, this option should only be used with care; it is primarily intended
as a performance tuning aid during application development.
.It Q
Double/halve the size of the allocation quantum.
The default quantum is the minimum allowed by the architecture (typically 8 or
16 bytes).
.It S
Double/halve the size of the maximum size class that is a multiple of the
quantum.
Above this size, power-of-two spacing is used for size classes.
The default value is 512 bytes.
.It U
Generate
.Dq utrace
entries for
.Xr ktrace 1 ,
for all operations.
Consult the source for details on this option.
.It V
Attempting to allocate zero bytes will return a
.Dv NULL
pointer instead of
a valid pointer.
(The default behavior is to make a minimal allocation and return a
pointer to it.)
This option is provided for System V compatibility.
This option is incompatible with the
.Dq X
option.
.It X
Rather than return failure for any allocation function,
display a diagnostic message on
.Dv stderr
and cause the program to drop
core (using
.Xr abort 3 ) .
This option should be set at compile time by including the following in
the source code:
.Bd -literal -offset indent
_malloc_options = "X";
.Ed
.It Z
Each byte of new memory allocated by
.Fn malloc ,
.Fn realloc
or
.Fn reallocf
will be initialized to 0.
Note that this initialization only happens once for each byte, so
.Fn realloc
and
.Fn reallocf
calls do not zero memory that was previously allocated.
This is intended for debugging and will impact performance negatively.
.El
.Pp
The
.Dq J
and
.Dq Z
options are intended for testing and debugging.
An application which changes its behavior when these options are used
is flawed.
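.Pp
The repetition-count syntax for flags (for example,
.Dq 3N
being equivalent to
.Dq NNN )
can be illustrated with a small expander.
This is a sketch under stated assumptions, not the allocator's actual
option parser: the function name
.Fn expand_flags
and its buffer-based interface are hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: expand repetition counts in a flag string,
 * so that "3N" becomes "NNN".
 * Returns 0 on success, or -1 if out is too small.
 */
static int
expand_flags(const char *opts, char *out, size_t outlen)
{
	size_t n = 0;

	while (*opts != 0) {
		size_t count = 1;

		if (isdigit((unsigned char)*opts)) {
			count = 0;
			while (isdigit((unsigned char)*opts))
				count = count * 10 + (size_t)(*opts++ - '0');
			if (*opts == 0)
				break;	/* Trailing count with no flag. */
		}
		while (count-- > 0) {
			if (n + 1 >= outlen)
				return (-1);
			out[n++] = *opts;
		}
		opts++;
	}
	if (outlen == 0)
		return (-1);
	out[n] = 0;
	return (0);
}
```
.Ed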
.Sh IMPLEMENTATION NOTES
Traditionally, allocators have used
.Xr sbrk 2
to obtain memory, which is suboptimal for several reasons, including race
conditions, increased fragmentation, and artificial limitations on maximum
usable memory.
This allocator uses both
.Xr sbrk 2
and
.Xr mmap 2
by default, but it can be configured at run time to use only one or the other.
If resource limits are not a primary concern, the preferred configuration is
.Ev MALLOC_OPTIONS=dM
or
.Ev MALLOC_OPTIONS=DM .
When so configured, the
.Ar datasize
resource limit has little practical effect for typical applications; use
.Ev MALLOC_OPTIONS=Dm
if that is a concern.
Regardless of allocator configuration, the
.Ar vmemoryuse
resource limit can be used to bound the total virtual memory used by a
process, as described in
.Xr limits 1 .
.Pp
This allocator uses multiple arenas in order to reduce lock contention for
threaded programs on multi-processor systems.
This works well with regard to threading scalability, but incurs some costs.
There is a small fixed per-arena overhead, and additionally, arenas manage
memory completely independently of each other, which means a small fixed
increase in overall memory fragmentation.
These overheads are not generally an issue, given the number of arenas normally
used.
Note that using substantially more arenas than the default is not likely to
improve performance, mainly due to reduced cache performance.
However, it may make sense to reduce the number of arenas if an application
does not make much use of the allocation functions.
.Pp
Memory is conceptually broken into equal-sized chunks, where the chunk size is
a power of two that is greater than the page size.
Chunks are always aligned to multiples of the chunk size.
This alignment makes it possible to find metadata for user objects very
quickly.
.Pp
User objects are broken into three categories according to size: small, large,
and huge.
Small objects are no larger than one half of a page.
Large objects are smaller than the chunk size.
Huge objects are a multiple of the chunk size.
Small and large objects are managed by arenas; huge objects are managed
separately in a single data structure that is shared by all threads.
Huge objects are used by applications infrequently enough that this single
data structure is not a scalability issue.
.Pp
Each chunk that is managed by an arena tracks its contents as runs of
contiguous pages (unused, backing a set of small objects, or backing one large
object).
The combination of chunk alignment and chunk page maps makes it possible to
determine all metadata regarding small and large allocations in
constant and logarithmic time, respectively.
.Pp
Small objects are managed in groups by page runs.
Each run maintains a bitmap that tracks which regions are in use.
Allocation requests that are no more than half the quantum (see the
.Dq Q
option) are rounded up to the nearest power of two (typically 2, 4, or 8).
Allocation requests that are more than half the quantum, but no more than the
maximum quantum-multiple size class (see the
.Dq S
option) are rounded up to the nearest multiple of the quantum.
Allocation requests that are larger than the maximum quantum-multiple size
class, but no larger than one half of a page, are rounded up to the nearest
power of two.
Allocation requests that are larger than half of a page, but small enough to
fit in an arena-managed chunk (see the
.Dq K
option), are rounded up to the nearest run size.
Allocation requests that are too large to fit in an arena-managed chunk are
rounded up to the nearest multiple of the chunk size.
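.Pp
The rounding rules above can be modeled with a short function.
This is only a sketch of the description in this section, not the
allocator's code.
It assumes a 16 byte quantum, the default 512 byte quantum-multiple limit,
a 4096 byte page, the default 1 MB chunk, a minimum size class of 2 bytes,
run sizes that are whole pages, and that any request whose page-rounded size
reaches the chunk size is treated as huge.
All names in the block are hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Assumed parameters; real values depend on platform and options. */
#define SK_QUANTUM   ((size_t)16)        /* see the Q option */
#define SK_SMALL_MAX ((size_t)512)       /* see the S option */
#define SK_PAGE      ((size_t)4096)      /* assumed page size */
#define SK_CHUNK     ((size_t)1 << 20)   /* see the K option */

/* Round n up to the next power of two (n >= 1). */
static size_t
pow2_ceil(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return (p);
}

/* Round n up to a multiple of m, where m is a power of two. */
static size_t
round_up(size_t n, size_t m)
{
	return ((n + m - 1) & ~(m - 1));
}

/* Hypothetical model of the rounding rules described above. */
static size_t
size_class_sketch(size_t size)
{
	if (size <= SK_QUANTUM / 2)
		return (size <= 2 ? 2 : pow2_ceil(size));
	if (size <= SK_SMALL_MAX)
		return (round_up(size, SK_QUANTUM));
	if (size <= SK_PAGE / 2)
		return (pow2_ceil(size));
	/* Assumption: run sizes are whole pages. */
	if (round_up(size, SK_PAGE) < SK_CHUNK)
		return (round_up(size, SK_PAGE));
	return (round_up(size, SK_CHUNK));
}
```
.Ed
.Pp
Under these assumptions a 100 byte request lands in the 112 byte class,
and a 513 byte request in the 1024 byte class.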
.Pp
Allocations are packed tightly together, which can be an issue for
multi-threaded applications.
If you need to assure that allocations do not suffer from cache line sharing,
round your allocation requests up to the nearest multiple of the cache line
size.
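.Pp
The rounding suggested above might look like the following.
The 64 byte cache line size is an assumption (the real value is
hardware-specific), and the names
.Fn cacheline_round
and
.Fn malloc_padded
are hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines; check the value for your hardware. */
#define CACHE_LINE_SKETCH ((size_t)64)

/* Round a request up to the nearest multiple of the cache line size. */
static size_t
cacheline_round(size_t size)
{
	return ((size + CACHE_LINE_SKETCH - 1) / CACHE_LINE_SKETCH *
	    CACHE_LINE_SKETCH);
}

/* Hypothetical wrapper: allocate a padded request. */
static void *
malloc_padded(size_t size)
{
	return (malloc(cacheline_round(size)));
}
```
.Ed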
.Sh DEBUGGING MALLOC PROBLEMS
The first thing to do is to set the
.Dq A
option.
This option forces a core dump (if possible) at the first sign of trouble,
rather than the normal policy of trying to continue if at all possible.
.Pp
It is probably also a good idea to recompile the program with suitable
options and symbols for debugger support.
.Pp
If the program starts to give unusual results, dump core, or generally behave
differently without emitting any of the messages mentioned in the next
section, it is likely because it depends on the storage being filled with
zero bytes.
Try running it with the
.Dq Z
option set;
if that improves the situation, this diagnosis has been confirmed.
If the program still misbehaves,
the likely problem is accessing memory outside the allocated area.
.Pp
Alternatively, if the symptoms are not easy to reproduce, setting the
.Dq J
option may help provoke the problem.
.Pp
In truly difficult cases, the
.Dq U
option, if supported by the kernel, can provide a detailed trace of
all calls made to these functions.
.Pp
Unfortunately this implementation does not provide much detail about
the problems it detects; the performance impact for storing such information
would be prohibitive.
There are a number of allocator implementations available on the Internet
which focus on detecting and pinpointing problems by trading performance for
extra sanity checks and detailed diagnostics.
.Sh DIAGNOSTIC MESSAGES
If any of the memory allocation/deallocation functions detect an error or
warning condition, a message will be printed to file descriptor
.Dv STDERR_FILENO .
Errors will result in the process dumping core.
If the
.Dq A
option is set, all warnings are treated as errors.
.Pp
The
.Va _malloc_message
variable allows the programmer to override the function which emits
the text strings forming the errors and warnings if for some reason
the
.Dv STDERR_FILENO
file descriptor is not suitable for this.
Please note that doing anything which tries to allocate memory in
this function is likely to result in a crash or deadlock.
.Pp
All messages are prefixed by
.Dq Ao Ar progname Ac Ns Li : (malloc) .
.Sh RETURN VALUES
The
.Fn malloc
and
.Fn calloc
functions return a pointer to the allocated memory if successful; otherwise
a
.Dv NULL
pointer is returned and
.Va errno
is set to
.Er ENOMEM .
.Pp
The
.Fn realloc
and
.Fn reallocf
functions return a pointer, possibly identical to
.Fa ptr ,
to the allocated memory
if successful; otherwise a
.Dv NULL
pointer is returned, and
.Va errno
is set to
.Er ENOMEM
if the error was the result of an allocation failure.
The
.Fn realloc
function always leaves the original buffer intact
when an error occurs, whereas
.Fn reallocf
deallocates it in this case.
.Pp
The
.Fn free
function returns no value.
.Pp
The
.Fn malloc_usable_size
function returns the usable size of the allocation pointed to by
.Fa ptr .
.Sh ENVIRONMENT
The following environment variables affect the execution of the allocation
functions:
.Bl -tag -width ".Ev MALLOC_OPTIONS"
.It Ev MALLOC_OPTIONS
If the environment variable
.Ev MALLOC_OPTIONS
is set, the characters it contains will be interpreted as flags to the
allocation functions.
.El
.Sh EXAMPLES
To dump core whenever a problem occurs:
.Pp
.Bd -literal -offset indent
ln -s 'A' /etc/malloc.conf
.Ed
.Pp
To specify in the source that a program does no return value checking
on calls to these functions:
.Bd -literal -offset indent
_malloc_options = "X";
.Ed
.Sh SEE ALSO
.Xr limits 1 ,
.Xr madvise 2 ,
.Xr mmap 2 ,
.Xr sbrk 2 ,
.Xr alloca 3 ,
.Xr atexit 3 ,
.Xr getpagesize 3 ,
.Xr memory 3 ,
.Xr posix_memalign 3
.Sh STANDARDS
The
.Fn malloc ,
.Fn calloc ,
.Fn realloc
and
.Fn free
functions conform to
.St -isoC .
.Sh HISTORY
The
.Fn reallocf
function first appeared in
.Fx 3.0 .
.Pp
The
.Fn malloc_usable_size
function first appeared in
.Fx 7.0 .