/*-
 * Copyright (C) 2006-2010 Jason Evans <jasone@FreeBSD.org>.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice(s), this list of conditions and the following disclaimer as
 *    the first lines of this file unmodified other than the possible
 *    addition of one or more copyright notices.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice(s), this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
 * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
 * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
 * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 *******************************************************************************
 *
 * This allocator implementation is designed to provide scalable performance
 * for multi-threaded programs on multi-processor systems.  The following
 * features are included for this purpose:
 *
 *   + Multiple arenas are used if there are multiple CPUs, which reduces lock
 *     contention and cache sloshing.
 *
 *   + Thread-specific caching is used if there are multiple threads, which
 *     reduces the amount of locking.
 *
 *   + Cache line sharing between arenas is avoided for internal data
 *     structures.
 *
 *   + Memory is managed in chunks and runs (chunks can be split into runs),
 *     rather than as individual pages.  This provides a constant-time
 *     mechanism for associating allocations with particular arenas.
 *
 * Allocation requests are rounded up to the nearest size class, and no record
 * of the original request size is maintained.  Allocations are broken into
 * categories according to size class.  Assuming runtime defaults, 4 KiB pages
 * and a 16 byte quantum on a 32-bit system, the size classes in each category
 * are as follows:
 *
 *   |========================================|
 *   | Category | Subcategory      | Size     |
 *   |========================================|
 *   | Small    | Tiny             |        2 |
 *   |          |                  |        4 |
 *   |          |                  |        8 |
 *   |          |------------------+----------|
 *   |          | Quantum-spaced   |       16 |
 *   |          |                  |       32 |
 *   |          |                  |       48 |
 *   |          |                  |      ... |
 *   |          |                  |       96 |
 *   |          |                  |      112 |
 *   |          |                  |      128 |
 *   |          |------------------+----------|
 *   |          | Cacheline-spaced |      192 |
 *   |          |                  |      256 |
 *   |          |                  |      320 |
 *   |          |                  |      384 |
 *   |          |                  |      448 |
 *   |          |                  |      512 |
 *   |          |------------------+----------|
 *   |          | Sub-page         |      768 |
 *   |          |                  |     1024 |
 *   |          |                  |     1280 |
 *   |          |                  |      ... |
 *   |          |                  |     3328 |
 *   |          |                  |     3584 |
 *   |          |                  |     3840 |
 *   |========================================|
 *   | Medium                      |    4 KiB |
 *   |                             |    6 KiB |
 *   |                             |    8 KiB |
 *   |                             |      ... |
 *   |                             |   28 KiB |
 *   |                             |   30 KiB |
 *   |                             |   32 KiB |
 *   |========================================|
 *   | Large                       |   36 KiB |
 *   |                             |   40 KiB |
 *   |                             |   44 KiB |
 *   |                             |      ... |
 *   |                             | 1012 KiB |
 *   |                             | 1016 KiB |
 *   |                             | 1020 KiB |
 *   |========================================|
 *   | Huge                        |    1 MiB |
 *   |                             |    2 MiB |
 *   |                             |    3 MiB |
 *   |                             |      ... |
 *   |========================================|
 *
 * Different mechanisms are used according to category:
 *
 *   Small/medium : Each size class is segregated into its own set of runs.
 *                  Each run maintains a bitmap of which regions are
 *                  free/allocated.
 *
 *   Large : Each allocation is backed by a dedicated run.  Metadata are stored
 *           in the associated arena chunk header maps.
 *
 *   Huge : Each allocation is backed by a dedicated contiguous set of chunks.
 *          Metadata are stored in a separate red-black tree.
 *
 *******************************************************************************
 */

/*
 * MALLOC_PRODUCTION disables assertions and statistics gathering.  It also
 * defaults the A and J runtime options to off.  These settings are appropriate
 * for production systems.
 */
/* #define MALLOC_PRODUCTION */

#ifndef MALLOC_PRODUCTION
/*
 * MALLOC_DEBUG enables assertions and other sanity checks, and disables
 * inline functions.
 */
# define MALLOC_DEBUG

/* MALLOC_STATS enables statistics calculation. */
# define MALLOC_STATS
#endif

/*
 * MALLOC_TINY enables support for tiny objects, which are smaller than one
 * quantum.
 */
#define MALLOC_TINY

/*
 * MALLOC_TCACHE enables a thread-specific caching layer for small and medium
 * objects.  This makes it possible to allocate/deallocate objects without any
 * locking when the cache is in the steady state.
 */
#define MALLOC_TCACHE

/*
 * MALLOC_DSS enables use of sbrk(2) to allocate chunks from the data storage
 * segment (DSS).  In an ideal world, this functionality would be completely
 * unnecessary, but we are burdened by history and the lack of resource limits
 * for anonymous mapped memory.
 */
#define MALLOC_DSS

#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include "libc_private.h"
#ifdef MALLOC_DEBUG
# define _LOCK_DEBUG
#endif
#include "spinlock.h"
#include "namespace.h"
#include <sys/mman.h>
#include <sys/param.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/uio.h>
#include <sys/ktrace.h> /* Must come after several other sys/ includes. */

#include <machine/cpufunc.h>
#include <machine/param.h>
#include <machine/vmparam.h>

#include <errno.h>
#include <limits.h>
#include <link.h>
#include <pthread.h>
#include <sched.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>

#include "un-namespace.h"

#include "libc_private.h"

#define RB_COMPACT
#include "rb.h"
#if (defined(MALLOC_TCACHE) && defined(MALLOC_STATS))
#include "qr.h"
#include "ql.h"
#endif

#ifdef MALLOC_DEBUG
   /* Disable inlining to make debugging easier. */
# define inline
#endif

/* Size of stack-allocated buffer passed to strerror_r(). */
#define STRERROR_BUF 64

/*
 * Minimum alignment of allocations is 2^LG_QUANTUM bytes.
 */
#ifdef __i386__
# define LG_QUANTUM 4
# define LG_SIZEOF_PTR 2
# define CPU_SPINWAIT __asm__ volatile("pause")
# define TLS_MODEL __attribute__((tls_model("initial-exec")))
#endif

#ifdef __ia64__
# define LG_QUANTUM 4
# define LG_SIZEOF_PTR 3
# define TLS_MODEL /* default */
#endif

#ifdef __alpha__
# define LG_QUANTUM 4
# define LG_SIZEOF_PTR 3
# define NO_TLS
#endif

#ifdef __sparc64__
# define LG_QUANTUM 4
# define LG_SIZEOF_PTR 3
# define NO_TLS
#endif

#ifdef __amd64__
# define LG_QUANTUM 4
# define LG_SIZEOF_PTR 3
# define CPU_SPINWAIT __asm__ volatile("pause")
# define TLS_MODEL __attribute__((tls_model("initial-exec")))
#endif

#ifdef __arm__
# define LG_QUANTUM 3
# define LG_SIZEOF_PTR 2
# define NO_TLS
#endif

#ifdef __mips__
# define LG_QUANTUM 3
# define LG_SIZEOF_PTR 2
# define NO_TLS
#endif

#ifdef __powerpc64__
# define LG_QUANTUM 4
# define LG_SIZEOF_PTR 3
# define TLS_MODEL /* default */
#elif defined(__powerpc__)
# define LG_QUANTUM 4
# define LG_SIZEOF_PTR 2
# define TLS_MODEL /* default */
#endif

#ifdef __s390x__
# define LG_QUANTUM 4
#endif

#define QUANTUM ((size_t)(1U << LG_QUANTUM))
#define QUANTUM_MASK (QUANTUM - 1)

#define SIZEOF_PTR (1U << LG_SIZEOF_PTR)

/* sizeof(int) == (1U << LG_SIZEOF_INT). */
#ifndef LG_SIZEOF_INT
# define LG_SIZEOF_INT 2
#endif

/* We can't use TLS in non-PIC programs, since TLS relies on loader magic. */
#if (!defined(PIC) && !defined(NO_TLS))
# define NO_TLS
#endif

#ifdef NO_TLS
/* MALLOC_TCACHE requires TLS. */
# ifdef MALLOC_TCACHE
#  undef MALLOC_TCACHE
# endif
#endif

/*
 * Size and alignment of memory chunks that are allocated by the OS's virtual
 * memory system.
 */
#define LG_CHUNK_DEFAULT 22

/*
 * The minimum ratio of active:dirty pages per arena is computed as:
 *
 *   (nactive >> opt_lg_dirty_mult) >= ndirty
 *
 * So, supposing that opt_lg_dirty_mult is 5, there can be no less than 32
 * times as many active pages as dirty pages.
 */
#define LG_DIRTY_MULT_DEFAULT 5

/*
 * Maximum size of L1 cache line.  This is used to avoid cache line aliasing.
 * In addition, this controls the spacing of cacheline-spaced size classes.
 */
#define LG_CACHELINE 6
#define CACHELINE ((size_t)(1U << LG_CACHELINE))
#define CACHELINE_MASK (CACHELINE - 1)

Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
/*
 * Subpages are an artificially designated partitioning of pages.  Their only
 * purpose is to support subpage-spaced size classes.
 *
 * There must be at least 4 subpages per page, due to the way size classes are
 * handled.
 */
#define	LG_SUBPAGE		8
#define	SUBPAGE			((size_t)(1U << LG_SUBPAGE))
#define	SUBPAGE_MASK		(SUBPAGE - 1)

#ifdef MALLOC_TINY
   /* Smallest size class to support. */
#  define LG_TINY_MIN		1
#endif
/*
 * Maximum size class that is a multiple of the quantum, but not (necessarily)
 * a power of 2.  Above this size, allocations are rounded up to the nearest
 * power of 2.
 */
#define	LG_QSPACE_MAX_DEFAULT	7

/*
 * Maximum size class that is a multiple of the cacheline, but not (necessarily)
 * a power of 2.  Above this size, allocations are rounded up to the nearest
 * power of 2.
 */
#define	LG_CSPACE_MAX_DEFAULT	9

/*
 * Maximum medium size class.  This must not be more than 1/4 of a chunk
 * (LG_MEDIUM_MAX_DEFAULT <= LG_CHUNK_DEFAULT - 2).
 */
#define	LG_MEDIUM_MAX_DEFAULT	15
/*
 * RUN_MAX_OVRHD indicates maximum desired run header overhead.  Runs are sized
 * as small as possible such that this setting is still honored, without
 * violating other constraints.  The goal is to make runs as small as possible
 * without exceeding a per run external fragmentation threshold.
 *
 * We use binary fixed point math for overhead computations, where the binary
 * point is implicitly RUN_BFP bits to the left.
 *
 * Note that it is possible to set RUN_MAX_OVRHD low enough that it cannot be
 * honored for some/all object sizes, since there is one bit of header overhead
 * per object (plus a constant).  This constraint is relaxed (ignored) for runs
 * that are so small that the per-region overhead is greater than:
 *
 *   (RUN_MAX_OVRHD / (reg_size << (3+RUN_BFP)))
 */
#define	RUN_BFP			12
/*                                    \/   Implicit binary fixed point. */
#define	RUN_MAX_OVRHD		0x0000003dU
#define	RUN_MAX_OVRHD_RELAX	0x00001800U
/* Put a cap on small object run size.  This overrides RUN_MAX_OVRHD. */
#define	RUN_MAX_SMALL \
	(arena_maxclass <= (1U << (CHUNK_MAP_LG_PG_RANGE + PAGE_SHIFT)) \
	    ? arena_maxclass : (1U << (CHUNK_MAP_LG_PG_RANGE + \
	    PAGE_SHIFT)))
/*
 * Hyper-threaded CPUs may need a special instruction inside spin loops in
 * order to yield to another virtual CPU.  If no such instruction is defined
 * above, make CPU_SPINWAIT a no-op.
 */
#ifndef CPU_SPINWAIT
#  define CPU_SPINWAIT
#endif
/*
 * Adaptive spinning must eventually switch to blocking, in order to avoid the
 * potential for priority inversion deadlock.  Backing off past a certain point
 * can actually waste time.
 */
#define	LG_SPIN_LIMIT	11
#ifdef MALLOC_TCACHE
   /*
    * Default number of cache slots for each bin in the thread cache (0:
    * disabled).
    */
#  define LG_TCACHE_NSLOTS_DEFAULT	7
   /*
    * (1U << opt_lg_tcache_gc_sweep) is the approximate number of
    * allocation events between full GC sweeps (-1: disabled).  Integer
    * rounding may cause the actual number to be slightly higher, since GC is
    * performed incrementally.
    */
#  define LG_TCACHE_GC_SWEEP_DEFAULT	13
#endif
/******************************************************************************/

/*
 * Mutexes based on spinlocks.  We can't use normal pthread spinlocks in all
 * places, because they require malloc()ed memory, which causes bootstrapping
 * issues in some cases.
 */
typedef struct {
	spinlock_t	lock;
} malloc_mutex_t;

/* Set to true once the allocator has been initialized. */
static bool malloc_initialized = false;

/* Used to avoid initialization races. */
static malloc_mutex_t init_lock = {_SPINLOCK_INITIALIZER};
/******************************************************************************/
/*
 * Statistics data structures.
 */

#ifdef MALLOC_STATS

#ifdef MALLOC_TCACHE
typedef struct tcache_bin_stats_s tcache_bin_stats_t;
struct tcache_bin_stats_s {
	/*
	 * Number of allocation requests that corresponded to the size of this
	 * bin.
	 */
	uint64_t	nrequests;
};
#endif
typedef struct malloc_bin_stats_s malloc_bin_stats_t;
struct malloc_bin_stats_s {
	/*
	 * Number of allocation requests that corresponded to the size of this
	 * bin.
	 */
	uint64_t	nrequests;

#ifdef MALLOC_TCACHE
	/* Number of tcache fills from this bin. */
	uint64_t	nfills;

	/* Number of tcache flushes to this bin. */
	uint64_t	nflushes;
#endif

	/* Total number of runs created for this bin's size class. */
	uint64_t	nruns;

	/*
	 * Total number of runs reused by extracting them from the runs tree for
	 * this bin's size class.
	 */
	uint64_t	reruns;

	/* High-water mark for this bin. */
	size_t		highruns;

	/* Current number of runs in this bin. */
	size_t		curruns;
};

typedef struct malloc_large_stats_s malloc_large_stats_t;
struct malloc_large_stats_s {
	/*
	 * Number of allocation requests that corresponded to this size class.
	 */
	uint64_t	nrequests;

	/* High-water mark for this size class. */
	size_t		highruns;

	/* Current number of runs of this size class. */
	size_t		curruns;
};

typedef struct arena_stats_s arena_stats_t;
struct arena_stats_s {
	/* Number of bytes currently mapped. */
	size_t		mapped;

	/*
	 * Total number of purge sweeps, total number of madvise calls made,
	 * and total pages purged in order to keep dirty unused memory under
	 * control.
	 */
	uint64_t	npurge;
	uint64_t	nmadvise;
	uint64_t	purged;

	/* Per-size-category statistics. */
	size_t		allocated_small;
	uint64_t	nmalloc_small;
	uint64_t	ndalloc_small;

	size_t		allocated_medium;
	uint64_t	nmalloc_medium;
	uint64_t	ndalloc_medium;

	size_t		allocated_large;
	uint64_t	nmalloc_large;
	uint64_t	ndalloc_large;

	/*
	 * One element for each possible size class, including sizes that
	 * overlap with bin size classes.  This is necessary because ipalloc()
	 * sometimes has to use such large objects in order to assure proper
	 * alignment.
	 */
	malloc_large_stats_t	*lstats;
};

typedef struct chunk_stats_s chunk_stats_t;
struct chunk_stats_s {
	/* Number of chunks that were allocated. */
	uint64_t	nchunks;

	/* High-water mark for number of chunks allocated. */
	size_t		highchunks;

	/*
	 * Current number of chunks allocated.  This value isn't maintained for
	 * any other purpose, so keep track of it in order to be able to set
	 * highchunks.
	 */
	size_t		curchunks;
};

#endif /* #ifdef MALLOC_STATS */

/******************************************************************************/
/*
 * Extent data structures.
 */

/* Tree of extents. */
typedef struct extent_node_s extent_node_t;
struct extent_node_s {
#ifdef MALLOC_DSS
	/* Linkage for the size/address-ordered tree. */
	rb_node(extent_node_t) link_szad;
#endif

	/* Linkage for the address-ordered tree. */
	rb_node(extent_node_t) link_ad;

	/* Pointer to the extent that this tree node is responsible for. */
	void	*addr;

	/* Total region size. */
	size_t	size;
};
typedef rb_tree(extent_node_t) extent_tree_t;
/******************************************************************************/
/*
 * Arena data structures.
 */

typedef struct arena_s arena_t;
typedef struct arena_bin_s arena_bin_t;

/* Each element of the chunk map corresponds to one page within the chunk. */
typedef struct arena_chunk_map_s arena_chunk_map_t;
struct arena_chunk_map_s {
	/*
	 * Linkage for run trees.  There are two disjoint uses:
	 *
	 * 1) arena_t's runs_avail tree.
	 * 2) arena_run_t conceptually uses this linkage for in-use non-full
	 *    runs, rather than directly embedding linkage.
	 */
	rb_node(arena_chunk_map_t)	link;

	/*
	 * Run address (or size) and various flags are stored together.  The bit
	 * layout looks like (assuming 32-bit system):
	 *
	 *   ???????? ????cccc cccccccc ccccdzla
	 *
	 * ? : Unallocated: Run address for first/last pages, unset for internal
	 *     pages.
	 *     Small/medium: Don't care.
	 *     Large: Run size for first page, unset for trailing pages.
	 * - : Unused.
	 * c : refcount (could overflow for PAGE_SIZE >= 128 KiB)
	 * d : dirty?
	 * z : zeroed?
	 * l : large?
	 * a : allocated?
	 *
	 * Following are example bit patterns for the three types of runs.
	 *
	 * p : run page offset
	 * s : run size
	 * x : don't care
	 * - : 0
	 * [dzla] : bit set
	 *
	 *   Unallocated:
	 *     ssssssss ssssssss ssss---- --------
	 *     xxxxxxxx xxxxxxxx xxxx---- ----d---
	 *     ssssssss ssssssss ssss---- -----z--
	 *
	 *   Small/medium:
	 *     pppppppp ppppcccc cccccccc cccc---a
	 *     pppppppp ppppcccc cccccccc cccc---a
	 *     pppppppp ppppcccc cccccccc cccc---a
	 *
	 *   Large:
	 *     ssssssss ssssssss ssss---- ------la
	 *     -------- -------- -------- ------la
	 *     -------- -------- -------- ------la
	 */
	size_t				bits;
#define	CHUNK_MAP_PG_MASK	((size_t)0xfff00000U)
#define	CHUNK_MAP_PG_SHIFT	20
#define	CHUNK_MAP_LG_PG_RANGE	12

#define	CHUNK_MAP_RC_MASK	((size_t)0xffff0U)
#define	CHUNK_MAP_RC_ONE	((size_t)0x00010U)

#define	CHUNK_MAP_FLAGS_MASK	((size_t)0xfU)
#define	CHUNK_MAP_DIRTY		((size_t)0x8U)
#define	CHUNK_MAP_ZEROED	((size_t)0x4U)
#define	CHUNK_MAP_LARGE		((size_t)0x2U)
#define	CHUNK_MAP_ALLOCATED	((size_t)0x1U)
#define	CHUNK_MAP_KEY		(CHUNK_MAP_DIRTY | CHUNK_MAP_ALLOCATED)
};
typedef rb_tree(arena_chunk_map_t) arena_avail_tree_t;
typedef rb_tree(arena_chunk_map_t) arena_run_tree_t;
/* Arena chunk header. */
typedef struct arena_chunk_s arena_chunk_t;
struct arena_chunk_s {
	/* Arena that owns the chunk. */
	arena_t		*arena;

	/* Linkage for the arena's chunks_dirty tree. */
	rb_node(arena_chunk_t) link_dirty;

	/*
	 * True if the chunk is currently in the chunks_dirty tree, due to
	 * having at some point contained one or more dirty pages.  Removal
	 * from chunks_dirty is lazy, so (dirtied && ndirty == 0) is possible.
	 */
	bool		dirtied;

	/* Number of dirty pages. */
	size_t		ndirty;

	/* Map of pages within chunk that keeps track of free/large/small. */
	arena_chunk_map_t map[1]; /* Dynamically sized. */
};
typedef rb_tree(arena_chunk_t) arena_chunk_tree_t;
typedef struct arena_run_s arena_run_t;
struct arena_run_s {
#ifdef MALLOC_DEBUG
	uint32_t	magic;
#  define ARENA_RUN_MAGIC 0x384adf93
#endif

	/* Bin this run is associated with. */
	arena_bin_t	*bin;

	/* Index of first element that might have a free region. */
	unsigned	regs_minelm;

	/* Number of free regions in run. */
	unsigned	nfree;

	/* Bitmask of in-use regions (0: in use, 1: free). */
	unsigned	regs_mask[1]; /* Dynamically sized. */
};

struct arena_bin_s {
	/*
	 * Current run being used to service allocations of this bin's size
	 * class.
	 */
	arena_run_t	*runcur;

	/*
	 * Tree of non-full runs.  This tree is used when looking for an
	 * existing run when runcur is no longer usable.  We choose the
	 * non-full run that is lowest in memory; this policy tends to keep
	 * objects packed well, and it can also help reduce the number of
	 * almost-empty chunks.
	 */
	arena_run_tree_t runs;

	/* Size of regions in a run for this bin's size class. */
	size_t		reg_size;

	/* Total size of a run for this bin's size class. */
	size_t		run_size;

	/* Total number of regions in a run for this bin's size class. */
	uint32_t	nregs;

	/* Number of elements in a run's regs_mask for this bin's size class. */
	uint32_t	regs_mask_nelms;

	/* Offset of first region in a run for this bin's size class. */
	uint32_t	reg0_offset;

#ifdef MALLOC_STATS
	/* Bin statistics. */
	malloc_bin_stats_t stats;
#endif
};

#ifdef MALLOC_TCACHE
typedef struct tcache_s tcache_t;
#endif

struct arena_s {
#ifdef MALLOC_DEBUG
	uint32_t		magic;
#  define ARENA_MAGIC 0x947d3d24
#endif

	/* All operations on this arena require that lock be locked. */
	pthread_mutex_t		lock;

#ifdef MALLOC_STATS
	arena_stats_t		stats;
#  ifdef MALLOC_TCACHE
	/*
	 * List of tcaches for extant threads associated with this arena.
	 * Stats from these are merged incrementally, and at exit.
	 */
	ql_head(tcache_t)	tcache_ql;
#  endif
#endif

	/* Tree of dirty-page-containing chunks this arena manages. */
	arena_chunk_tree_t	chunks_dirty;

	/*
	 * In order to avoid rapid chunk allocation/deallocation when an arena
	 * oscillates right on the cusp of needing a new chunk, cache the most
	 * recently freed chunk.  The spare is left in the arena's chunk trees
	 * until it is deleted.
	 *
	 * There is one spare chunk per arena, rather than one spare total, in
	 * order to avoid interactions between multiple threads that could make
	 * a single spare inadequate.
	 */
	arena_chunk_t		*spare;

	/* Number of pages in active runs. */
	size_t			nactive;

	/*
	 * Current count of pages within unused runs that are potentially
	 * dirty, and for which madvise(... MADV_FREE) has not been called.  By
	 * tracking this, we can institute a limit on how much dirty unused
	 * memory is mapped for each arena.
	 */
	size_t			ndirty;

	/*
	 * Size/address-ordered tree of this arena's available runs.  This tree
	 * is used for first-best-fit run allocation.
	 */
	arena_avail_tree_t	runs_avail;

	/*
	 * bins is used to store trees of free regions of the following sizes,
	 * assuming a 16-byte quantum, 4 KiB page size, and default
	 * MALLOC_OPTIONS.
	 *
	 *   bins[i] |   size |
	 *   --------+--------+
	 *         0 |      2 |
	 *         1 |      4 |
	 *         2 |      8 |
	 *   --------+--------+
	 *         3 |     16 |
	 *         4 |     32 |
	 *         5 |     48 |
	 *         :        :
	 *         8 |     96 |
	 *         9 |    112 |
	 *        10 |    128 |
	 *   --------+--------+
	 *        11 |    192 |
	 *        12 |    256 |
	 *        13 |    320 |
	 *        14 |    384 |
	 *        15 |    448 |
	 *        16 |    512 |
	 *   --------+--------+
	 *        17 |    768 |
	 *        18 |   1024 |
	 *        19 |   1280 |
	 *         :        :
	 *        27 |   3328 |
	 *        28 |   3584 |
	 *        29 |   3840 |
	 *   --------+--------+
	 *        30 |  4 KiB |
	 *        31 |  6 KiB |
	 *        32 |  8 KiB |
	 *         :        :
	 *        42 | 28 KiB |
	 *        43 | 30 KiB |
	 *        44 | 32 KiB |
	 *   --------+--------+
	 */
	arena_bin_t		bins[1]; /* Dynamically sized. */
};
1994-05-27 05:00:24 +00:00
|
|
|
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
/******************************************************************************/
|
|
|
|
/*
|
2010-01-31 23:16:10 +00:00
|
|
|
* Thread cache data structures.
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
*/

#ifdef MALLOC_TCACHE
typedef struct tcache_bin_s tcache_bin_t;
struct tcache_bin_s {
# ifdef MALLOC_STATS
	tcache_bin_stats_t tstats;
# endif
	unsigned	low_water;	/* Min # cached since last GC. */
	unsigned	high_water;	/* Max # cached since last GC. */
	unsigned	ncached;	/* # of cached objects. */
	void		*slots[1];	/* Dynamically sized. */
};

struct tcache_s {
# ifdef MALLOC_STATS
	ql_elm(tcache_t) link;		/* Used for aggregating stats. */
# endif
	arena_t		*arena;		/* This thread's arena. */
	unsigned	ev_cnt;		/* Event count since incremental GC. */
	unsigned	next_gc_bin;	/* Next bin to GC. */
	tcache_bin_t	*tbins[1];	/* Dynamically sized. */
};
#endif
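As a standalone sketch (this is illustrative code, not the allocator's own, and all `toy_` names are invented here): a tcache_bin-style slot stack caches freed objects per thread, so allocation pops the most recently freed slot and deallocation pushes it back, with no locking needed because the cache is thread-local.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified analogue of tcache_bin_s; slots[] is fixed-size here, but
 * dynamically sized in the real structure. */
typedef struct {
	unsigned	ncached;	/* # of cached objects. */
	unsigned	nslots;		/* Capacity of slots[]. */
	void		*slots[8];
} toy_tcache_bin;

/* Pop the most recently cached object, or NULL on a cache miss (the real
 * allocator would then fall back to its arena). */
static void *
toy_cache_alloc(toy_tcache_bin *tbin)
{
	if (tbin->ncached == 0)
		return (NULL);
	return (tbin->slots[--tbin->ncached]);
}

/* Push an object back into the cache; returns 0 when full (the real
 * allocator would then flush some slots back to its arena). */
static int
toy_cache_dalloc(toy_tcache_bin *tbin, void *obj)
{
	if (tbin->ncached == tbin->nslots)
		return (0);
	tbin->slots[tbin->ncached++] = obj;
	return (1);
}
```

In the steady state every malloc/free pair is just this pointer push/pop, which is what makes the fast path lock-free.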

/******************************************************************************/
/*
 * Data.
 */

/* Number of CPUs. */
static unsigned		ncpus;

/* Various bin-related settings. */
#ifdef MALLOC_TINY		/* Number of (2^n)-spaced tiny bins. */
# define ntbins		((unsigned)(LG_QUANTUM - LG_TINY_MIN))
#else
# define ntbins		0
#endif
static unsigned		nqbins; /* Number of quantum-spaced bins. */
static unsigned		ncbins; /* Number of cacheline-spaced bins. */
static unsigned		nsbins; /* Number of subpage-spaced bins. */
static unsigned		nmbins; /* Number of medium bins. */
static unsigned		nbins;
static unsigned		mbin0; /* mbin offset (nbins - nmbins). */
#ifdef MALLOC_TINY
# define tspace_max	((size_t)(QUANTUM >> 1))
#endif
#define	qspace_min	QUANTUM
static size_t		qspace_max;
static size_t		cspace_min;
static size_t		cspace_max;
static size_t		sspace_min;
static size_t		sspace_max;
#define	small_maxclass	sspace_max
#define	medium_min	PAGE_SIZE
static size_t		medium_max;
#define	bin_maxclass	medium_max

/*
 * Soft limit on the number of medium size classes.  Spacing between medium
 * size classes never exceeds pagesize, which can force more than NMBINS_MAX
 * medium size classes.
 */
#define	NMBINS_MAX	16
/* Spacing between medium size classes. */
static size_t		lg_mspace;
static size_t		mspace_mask;
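A mask pair like lg_mspace/mspace_mask supports rounding a request up to the next medium class boundary with plain mask arithmetic. The sketch below is hypothetical: the `TOY_` names and the 2 KiB spacing are assumptions for illustration, not the allocator's runtime values.

```c
#include <assert.h>
#include <stddef.h>

#define	TOY_LG_MSPACE	11	/* Assumed 2 KiB medium class spacing. */
static const size_t toy_mspace_mask = ((size_t)1 << TOY_LG_MSPACE) - 1;

/* Round a request size up to a multiple of the medium class spacing. */
static size_t
toy_medium_ceil(size_t size)
{
	return ((size + toy_mspace_mask) & ~toy_mspace_mask);
}
```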

static uint8_t const	*small_size2bin;
/*
 * const_small_size2bin is a static constant lookup table that in the common
 * case can be used as-is for small_size2bin.  For dynamically linked programs,
 * this avoids a page of memory overhead per process.
 */
#define	S2B_1(i)	i,
#define	S2B_2(i)	S2B_1(i) S2B_1(i)
#define	S2B_4(i)	S2B_2(i) S2B_2(i)
#define	S2B_8(i)	S2B_4(i) S2B_4(i)
#define	S2B_16(i)	S2B_8(i) S2B_8(i)
#define	S2B_32(i)	S2B_16(i) S2B_16(i)
#define	S2B_64(i)	S2B_32(i) S2B_32(i)
#define	S2B_128(i)	S2B_64(i) S2B_64(i)
#define	S2B_256(i)	S2B_128(i) S2B_128(i)
static const uint8_t	const_small_size2bin[PAGE_SIZE - 255] = {
	S2B_1(0xffU)		/*    0 */
#if (LG_QUANTUM == 4)
/* 64-bit system ************************/
# ifdef MALLOC_TINY
	S2B_2(0)		/*    2 */
	S2B_2(1)		/*    4 */
	S2B_4(2)		/*    8 */
	S2B_8(3)		/*   16 */
#  define S2B_QMIN 3
# else
	S2B_16(0)		/*   16 */
#  define S2B_QMIN 0
# endif
	S2B_16(S2B_QMIN + 1)	/*   32 */
	S2B_16(S2B_QMIN + 2)	/*   48 */
	S2B_16(S2B_QMIN + 3)	/*   64 */
	S2B_16(S2B_QMIN + 4)	/*   80 */
	S2B_16(S2B_QMIN + 5)	/*   96 */
	S2B_16(S2B_QMIN + 6)	/*  112 */
	S2B_16(S2B_QMIN + 7)	/*  128 */
# define S2B_CMIN (S2B_QMIN + 8)
#else
/* 32-bit system ************************/
# ifdef MALLOC_TINY
	S2B_2(0)		/*    2 */
	S2B_2(1)		/*    4 */
	S2B_4(2)		/*    8 */
#  define S2B_QMIN 2
# else
	S2B_8(0)		/*    8 */
#  define S2B_QMIN 0
# endif
	S2B_8(S2B_QMIN + 1)	/*   16 */
	S2B_8(S2B_QMIN + 2)	/*   24 */
	S2B_8(S2B_QMIN + 3)	/*   32 */
	S2B_8(S2B_QMIN + 4)	/*   40 */
	S2B_8(S2B_QMIN + 5)	/*   48 */
	S2B_8(S2B_QMIN + 6)	/*   56 */
	S2B_8(S2B_QMIN + 7)	/*   64 */
	S2B_8(S2B_QMIN + 8)	/*   72 */
	S2B_8(S2B_QMIN + 9)	/*   80 */
	S2B_8(S2B_QMIN + 10)	/*   88 */
	S2B_8(S2B_QMIN + 11)	/*   96 */
	S2B_8(S2B_QMIN + 12)	/*  104 */
	S2B_8(S2B_QMIN + 13)	/*  112 */
	S2B_8(S2B_QMIN + 14)	/*  120 */
	S2B_8(S2B_QMIN + 15)	/*  128 */
# define S2B_CMIN (S2B_QMIN + 16)
#endif
/****************************************/
	S2B_64(S2B_CMIN + 0)	/*  192 */
	S2B_64(S2B_CMIN + 1)	/*  256 */
	S2B_64(S2B_CMIN + 2)	/*  320 */
	S2B_64(S2B_CMIN + 3)	/*  384 */
	S2B_64(S2B_CMIN + 4)	/*  448 */
	S2B_64(S2B_CMIN + 5)	/*  512 */
# define S2B_SMIN (S2B_CMIN + 6)
	S2B_256(S2B_SMIN + 0)	/*  768 */
	S2B_256(S2B_SMIN + 1)	/* 1024 */
	S2B_256(S2B_SMIN + 2)	/* 1280 */
	S2B_256(S2B_SMIN + 3)	/* 1536 */
	S2B_256(S2B_SMIN + 4)	/* 1792 */
	S2B_256(S2B_SMIN + 5)	/* 2048 */
	S2B_256(S2B_SMIN + 6)	/* 2304 */
	S2B_256(S2B_SMIN + 7)	/* 2560 */
	S2B_256(S2B_SMIN + 8)	/* 2816 */
	S2B_256(S2B_SMIN + 9)	/* 3072 */
	S2B_256(S2B_SMIN + 10)	/* 3328 */
	S2B_256(S2B_SMIN + 11)	/* 3584 */
	S2B_256(S2B_SMIN + 12)	/* 3840 */
#if (PAGE_SHIFT == 13)
	S2B_256(S2B_SMIN + 13)	/* 4096 */
	S2B_256(S2B_SMIN + 14)	/* 4352 */
	S2B_256(S2B_SMIN + 15)	/* 4608 */
	S2B_256(S2B_SMIN + 16)	/* 4864 */
	S2B_256(S2B_SMIN + 17)	/* 5120 */
	S2B_256(S2B_SMIN + 18)	/* 5376 */
	S2B_256(S2B_SMIN + 19)	/* 5632 */
	S2B_256(S2B_SMIN + 20)	/* 5888 */
	S2B_256(S2B_SMIN + 21)	/* 6144 */
	S2B_256(S2B_SMIN + 22)	/* 6400 */
	S2B_256(S2B_SMIN + 23)	/* 6656 */
	S2B_256(S2B_SMIN + 24)	/* 6912 */
	S2B_256(S2B_SMIN + 25)	/* 7168 */
	S2B_256(S2B_SMIN + 26)	/* 7424 */
	S2B_256(S2B_SMIN + 27)	/* 7680 */
	S2B_256(S2B_SMIN + 28)	/* 7936 */
#endif
};
#undef S2B_1
#undef S2B_2
#undef S2B_4
#undef S2B_8
#undef S2B_16
#undef S2B_32
#undef S2B_64
#undef S2B_128
#undef S2B_256
#undef S2B_QMIN
#undef S2B_CMIN
#undef S2B_SMIN
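To see what the S2B_n repetition macros buy: each one just repeats an initializer n times, so a size-indexed lookup table can be written as runs of identical bin indices. This is a tiny 9-entry analogue (an illustration only; the `toy_` table and its three size classes are invented here, while the real table covers sizes up to a page):

```c
#include <assert.h>
#include <stdint.h>

#define	S2B_1(i)	i,
#define	S2B_2(i)	S2B_1(i) S2B_1(i)
#define	S2B_4(i)	S2B_2(i) S2B_2(i)

/* toy_size2bin[size] yields the bin index for a request of `size` bytes. */
static const uint8_t	toy_size2bin[9] = {
	S2B_1(0xffU)	/* size 0: invalid */
	S2B_2(0)	/* sizes 1-2 -> bin 0 (2-byte class) */
	S2B_2(1)	/* sizes 3-4 -> bin 1 (4-byte class) */
	S2B_4(2)	/* sizes 5-8 -> bin 2 (8-byte class) */
};

#undef S2B_1
#undef S2B_2
#undef S2B_4
```

Translating a size to a size class is then a single array load, which is why the constant table is worth a page of read-only data.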

/* Various chunk-related settings. */
static size_t		chunksize;
static size_t		chunksize_mask; /* (chunksize - 1). */
static size_t		chunk_npages;
static size_t		arena_chunk_header_npages;
static size_t		arena_maxclass; /* Max size class for arenas. */

/********/
/*
 * Chunks.
 */

/* Protects chunk-related data structures. */
static malloc_mutex_t	huge_mtx;

/* Tree of chunks that are stand-alone huge allocations. */
static extent_tree_t	huge;

#ifdef MALLOC_DSS
/*
 * Protects sbrk() calls.  This avoids malloc races among threads, though it
 * does not protect against races with threads that call sbrk() directly.
 */
static malloc_mutex_t	dss_mtx;
/* Base address of the DSS. */
static void		*dss_base;
/* Current end of the DSS, or ((void *)-1) if the DSS is exhausted. */
static void		*dss_prev;
/* Current upper limit on DSS addresses. */
static void		*dss_max;

/*
 * Trees of chunks that were previously allocated (trees differ only in node
 * ordering).  These are used when allocating chunks, in an attempt to re-use
 * address space.  Depending on function, different tree orderings are needed,
 * which is why there are two trees with the same contents.
 */
static extent_tree_t	dss_chunks_szad;
static extent_tree_t	dss_chunks_ad;
#endif

#ifdef MALLOC_STATS
/* Huge allocation statistics. */
static uint64_t		huge_nmalloc;
static uint64_t		huge_ndalloc;
static size_t		huge_allocated;
#endif

/****************************/
/*
 * base (internal allocation).
 */

/*
 * Current pages that are being used for internal memory allocations.  These
 * pages are carved up in cacheline-size quanta, so that there is no chance of
 * false cache line sharing.
 */
static void		*base_pages;
static void		*base_next_addr;
static void		*base_past_addr; /* Addr immediately past base_pages. */
static extent_node_t	*base_nodes;
static malloc_mutex_t	base_mtx;
#ifdef MALLOC_STATS
static size_t		base_mapped;
#endif

/********/
/*
 * Arenas.
 */

/*
 * Arenas that are used to service external requests.  Not all elements of the
 * arenas array are necessarily used; arenas are created lazily as needed.
 */
static arena_t		**arenas;
static unsigned		narenas;
#ifndef NO_TLS
static unsigned		next_arena;
#endif
static pthread_mutex_t	arenas_lock; /* Protects arenas initialization. */

#ifndef NO_TLS
/*
 * Map of _pthread_self() --> arenas[???], used for selecting an arena to use
 * for allocations.
 */
static __thread arena_t	*arenas_map TLS_MODEL;
#endif

#ifdef MALLOC_TCACHE
/* Map of thread-specific caches. */
static __thread tcache_t	*tcache_tls TLS_MODEL;

/*
 * Number of cache slots for each bin in the thread cache, or 0 if tcache is
 * disabled.
 */
size_t		tcache_nslots;

/* Number of tcache allocation/deallocation events between incremental GCs. */
unsigned	tcache_gc_incr;
#endif

/*
 * Used by chunk_alloc_mmap() to decide whether to attempt the fast path and
 * potentially avoid some system calls.  We can get away without TLS here,
 * since the state of mmap_unaligned only affects performance, rather than
 * correct function.
 */
#ifndef NO_TLS
static __thread bool	mmap_unaligned TLS_MODEL;
#else
static bool		mmap_unaligned;
#endif

#ifdef MALLOC_STATS
static malloc_mutex_t	chunks_mtx;
/* Chunk statistics. */
static chunk_stats_t	stats_chunks;
#endif

/*******************************/
/*
 * Runtime configuration options.
 */
const char	*_malloc_options;

#ifndef MALLOC_PRODUCTION
static bool	opt_abort = true;
static bool	opt_junk = true;
#else
static bool	opt_abort = false;
static bool	opt_junk = false;
#endif
#ifdef MALLOC_TCACHE
static size_t	opt_lg_tcache_nslots = LG_TCACHE_NSLOTS_DEFAULT;
static ssize_t	opt_lg_tcache_gc_sweep = LG_TCACHE_GC_SWEEP_DEFAULT;
#endif
Add the 'D' and 'M' run time options, and use them to control whether
memory is acquired from the system via sbrk(2) and/or mmap(2). By default,
use sbrk(2) only, in order to support traditional use of resource limits.
Additionally, when both options are enabled, prefer the data segment to
anonymous mappings, in order to coexist better with large file mappings
in applications on 32-bit platforms. This change has the potential to
increase memory fragmentation due to the linear nature of the data
segment, but from a performance perspective this is mitigated by the use
of madvise(2). [1]
Add the ability to interpret integer prefixes in MALLOC_OPTIONS
processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as
MALLOC_OPTIONS=9l.
Reported by: [1] rwatson
Design review: [1] alc, peter, rwatson
2007-12-27 23:29:44 +00:00
|
|
|
#ifdef MALLOC_DSS
|
|
|
|
static bool opt_dss = true;
|
2008-01-03 23:22:13 +00:00
|
|
|
static bool opt_mmap = true;
|
Add the 'D' and 'M' run time options, and use them to control whether
memory is acquired from the system via sbrk(2) and/or mmap(2). By default,
use sbrk(2) only, in order to support traditional use of resource limits.
Additionally, when both options are enabled, prefer the data segment to
anonymous mappings, in order to coexist better with large file mappings
in applications on 32-bit platforms. This change has the potential to
increase memory fragmentation due to the linear nature of the data
segment, but from a performance perspective this is mitigated by the use
of madvise(2). [1]
Add the ability to interpret integer prefixes in MALLOC_OPTIONS
processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as
MALLOC_OPTIONS=9l.
Reported by: [1] rwatson
Design review: [1] alc, peter, rwatson
2007-12-27 23:29:44 +00:00
|
|
|
#endif
|
2010-01-31 23:16:10 +00:00
|
|
|
static ssize_t opt_lg_dirty_mult = LG_DIRTY_MULT_DEFAULT;
|
|
|
|
static bool opt_stats_print = false;
|
|
|
|
static size_t opt_lg_qspace_max = LG_QSPACE_MAX_DEFAULT;
|
|
|
|
static size_t opt_lg_cspace_max = LG_CSPACE_MAX_DEFAULT;
|
|
|
|
static size_t opt_lg_medium_max = LG_MEDIUM_MAX_DEFAULT;
|
|
|
|
static size_t opt_lg_chunk = LG_CHUNK_DEFAULT;
|
2006-01-13 18:38:56 +00:00
|
|
|
static bool opt_utrace = false;
|
|
|
|
static bool opt_sysv = false;
|
|
|
|
static bool opt_xmalloc = false;
|
|
|
|
static bool opt_zero = false;
|
2007-11-27 03:09:23 +00:00
|
|
|
static int opt_narenas_lshift = 0;
|

typedef struct {
	void	*p;
	size_t	s;
	void	*r;
} malloc_utrace_t;

#define	UTRACE(a, b, c)							\
	if (opt_utrace) {						\
		malloc_utrace_t ut;					\
		ut.p = (a);						\
		ut.s = (b);						\
		ut.r = (c);						\
		utrace(&ut, sizeof(ut));				\
	}

/******************************************************************************/
/*
 * Begin function prototypes for non-inline static functions.
 */

static void	malloc_mutex_init(malloc_mutex_t *mutex);
static bool	malloc_spin_init(pthread_mutex_t *lock);
#ifdef MALLOC_TINY
static size_t	pow2_ceil(size_t x);
#endif
static void	wrtmessage(const char *p1, const char *p2, const char *p3,
    const char *p4);
#ifdef MALLOC_STATS
static void	malloc_printf(const char *format, ...);
#endif
static char	*umax2s(uintmax_t x, unsigned base, char *s);
#ifdef MALLOC_DSS
static bool	base_pages_alloc_dss(size_t minsize);
#endif
static bool	base_pages_alloc_mmap(size_t minsize);
static bool	base_pages_alloc(size_t minsize);
static void	*base_alloc(size_t size);
static void	*base_calloc(size_t number, size_t size);
static extent_node_t *base_node_alloc(void);
static void	base_node_dealloc(extent_node_t *node);
static void	*pages_map(void *addr, size_t size);
static void	pages_unmap(void *addr, size_t size);
#ifdef MALLOC_DSS
static void	*chunk_alloc_dss(size_t size, bool *zero);
static void	*chunk_recycle_dss(size_t size, bool *zero);
#endif
static void	*chunk_alloc_mmap_slow(size_t size, bool unaligned);
static void	*chunk_alloc_mmap(size_t size);
static void	*chunk_alloc(size_t size, bool *zero);
#ifdef MALLOC_DSS
static extent_node_t *chunk_dealloc_dss_record(void *chunk, size_t size);
static bool	chunk_dealloc_dss(void *chunk, size_t size);
#endif
static void	chunk_dealloc_mmap(void *chunk, size_t size);
static void	chunk_dealloc(void *chunk, size_t size);
#ifndef NO_TLS
static arena_t	*choose_arena_hard(void);
#endif
static void	arena_run_split(arena_t *arena, arena_run_t *run, size_t size,
    bool large, bool zero);
static arena_chunk_t *arena_chunk_alloc(arena_t *arena);
static void	arena_chunk_dealloc(arena_t *arena, arena_chunk_t *chunk);
static arena_run_t *arena_run_alloc(arena_t *arena, size_t size, bool large,
    bool zero);
static void	arena_purge(arena_t *arena);
static void	arena_run_dalloc(arena_t *arena, arena_run_t *run, bool dirty);
static void	arena_run_trim_head(arena_t *arena, arena_chunk_t *chunk,
    arena_run_t *run, size_t oldsize, size_t newsize);
static void	arena_run_trim_tail(arena_t *arena, arena_chunk_t *chunk,
    arena_run_t *run, size_t oldsize, size_t newsize, bool dirty);
static arena_run_t *arena_bin_nonfull_run_get(arena_t *arena, arena_bin_t *bin);
static void	*arena_bin_malloc_hard(arena_t *arena, arena_bin_t *bin);
static size_t	arena_bin_run_size_calc(arena_bin_t *bin, size_t min_run_size);
#ifdef MALLOC_TCACHE
static void	tcache_bin_fill(tcache_t *tcache, tcache_bin_t *tbin,
    size_t binind);
static void	*tcache_alloc_hard(tcache_t *tcache, tcache_bin_t *tbin,
    size_t binind);
#endif
static void	*arena_malloc_medium(arena_t *arena, size_t size, bool zero);
static void	*arena_malloc_large(arena_t *arena, size_t size, bool zero);
static void	*arena_palloc(arena_t *arena, size_t alignment, size_t size,
    size_t alloc_size);
static bool	arena_is_large(const void *ptr);
static size_t	arena_salloc(const void *ptr);
static void
arena_dalloc_bin_run(arena_t *arena, arena_chunk_t *chunk, arena_run_t *run,
    arena_bin_t *bin);
#ifdef MALLOC_STATS
static void	arena_stats_print(arena_t *arena);
#endif
static void	stats_print_atexit(void);
#ifdef MALLOC_TCACHE
static void	tcache_bin_flush(tcache_bin_t *tbin, size_t binind,
    unsigned rem);
#endif
static void	arena_dalloc_large(arena_t *arena, arena_chunk_t *chunk,
    void *ptr);
#ifdef MALLOC_TCACHE
static void	arena_dalloc_hard(arena_t *arena, arena_chunk_t *chunk,
    void *ptr, arena_chunk_map_t *mapelm, tcache_t *tcache);
#endif
static void	arena_ralloc_large_shrink(arena_t *arena, arena_chunk_t *chunk,
    void *ptr, size_t size, size_t oldsize);
static bool	arena_ralloc_large_grow(arena_t *arena, arena_chunk_t *chunk,
    void *ptr, size_t size, size_t oldsize);
static bool	arena_ralloc_large(void *ptr, size_t size, size_t oldsize);
static void	*arena_ralloc(void *ptr, size_t size, size_t oldsize);
static bool	arena_new(arena_t *arena, unsigned ind);
static arena_t	*arenas_extend(unsigned ind);
#ifdef MALLOC_TCACHE
static tcache_bin_t *tcache_bin_create(arena_t *arena);
static void	tcache_bin_destroy(tcache_t *tcache, tcache_bin_t *tbin,
    unsigned binind);
#  ifdef MALLOC_STATS
static void	tcache_stats_merge(tcache_t *tcache, arena_t *arena);
#  endif
static tcache_t	*tcache_create(arena_t *arena);
static void	tcache_destroy(tcache_t *tcache);
#endif
static void	*huge_malloc(size_t size, bool zero);
static void	*huge_palloc(size_t alignment, size_t size);
static void	*huge_ralloc(void *ptr, size_t size, size_t oldsize);
static void	huge_dalloc(void *ptr);
static void	malloc_stats_print(void);
#ifdef MALLOC_DEBUG
static void	small_size2bin_validate(void);
#endif
static bool	small_size2bin_init(void);
static bool	small_size2bin_init_hard(void);
static unsigned	malloc_ncpus(void);
static bool	malloc_init_hard(void);

/*
 * End function prototypes.
 */
/******************************************************************************/

static void
wrtmessage(const char *p1, const char *p2, const char *p3, const char *p4)
{

	if (_write(STDERR_FILENO, p1, strlen(p1)) < 0
	    || _write(STDERR_FILENO, p2, strlen(p2)) < 0
	    || _write(STDERR_FILENO, p3, strlen(p3)) < 0
	    || _write(STDERR_FILENO, p4, strlen(p4)) < 0)
		return;
}

void	(*_malloc_message)(const char *p1, const char *p2, const char *p3,
	    const char *p4) = wrtmessage;

/*
 * We don't want to depend on vsnprintf() for production builds, since that can
 * cause unnecessary bloat for static binaries.  umax2s() provides minimal
 * integer printing functionality, so that malloc_printf() use can be limited
 * to MALLOC_STATS code.
 */
#define	UMAX2S_BUFSIZE	65
static char *
umax2s(uintmax_t x, unsigned base, char *s)
{
	unsigned i;

	i = UMAX2S_BUFSIZE - 1;
	s[i] = '\0';
	switch (base) {
	case 10:
		do {
			i--;
			s[i] = "0123456789"[x % 10];
			x /= 10;
		} while (x > 0);
		break;
	case 16:
		do {
			i--;
			s[i] = "0123456789abcdef"[x & 0xf];
			x >>= 4;
		} while (x > 0);
		break;
	default:
		do {
			i--;
			s[i] = "0123456789abcdefghijklmnopqrstuvwxyz"[x % base];
			x /= base;
		} while (x > 0);
	}

	return (&s[i]);
}

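The function above fills its buffer backward from the terminating NUL, so no digit-count computation is needed and the caller uses the returned pointer rather than the start of the buffer. A minimal standalone sketch (demo_umax2s is a hypothetical name; it collapses the three switch cases into the generic digit-table branch, which is equivalent for bases 10 and 16):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define	DEMO_UMAX2S_BUFSIZE	65	/* mirrors UMAX2S_BUFSIZE */

/*
 * Convert x to a string in the given base, writing digits from the end of
 * s backward.  Returns a pointer to the first digit, somewhere inside s.
 */
char *
demo_umax2s(uintmax_t x, unsigned base, char *s)
{
	unsigned i = DEMO_UMAX2S_BUFSIZE - 1;

	s[i] = '\0';
	do {
		i--;
		s[i] = "0123456789abcdefghijklmnopqrstuvwxyz"[x % base];
		x /= base;
	} while (x > 0);
	return (&s[i]);
}
```

Note that a do-while loop (rather than while) guarantees at least one digit, so zero prints as "0".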
/*
 * Define a custom assert() in order to reduce the chances of deadlock during
 * assertion failure.
 */
#ifdef MALLOC_DEBUG
#  define assert(e) do {						\
	if (!(e)) {							\
		char line_buf[UMAX2S_BUFSIZE];				\
		_malloc_message(_getprogname(), ": (malloc) ",		\
		    __FILE__, ":");					\
		_malloc_message(umax2s(__LINE__, 10, line_buf),		\
		    ": Failed assertion: ", "\"", #e);			\
		_malloc_message("\"\n", "", "", "");			\
		abort();						\
	}								\
} while (0)
#else
#define assert(e)
#endif

#ifdef MALLOC_STATS
/*
 * Print to stderr in such a way as to (hopefully) avoid memory allocation.
 */
static void
malloc_printf(const char *format, ...)
{
	char buf[4096];
	va_list ap;

	va_start(ap, format);
	vsnprintf(buf, sizeof(buf), format, ap);
	va_end(ap);
	_malloc_message(buf, "", "", "");
}
#endif

/******************************************************************************/
/*
 * Begin mutex.  We can't use normal pthread mutexes in all places, because
 * they require malloc()ed memory, which causes bootstrapping issues in some
 * cases.
 */

static void
malloc_mutex_init(malloc_mutex_t *mutex)
{
	static const spinlock_t lock = _SPINLOCK_INITIALIZER;

	mutex->lock = lock;
}

static inline void
malloc_mutex_lock(malloc_mutex_t *mutex)
{

	if (__isthreaded)
		_SPINLOCK(&mutex->lock);
}

static inline void
malloc_mutex_unlock(malloc_mutex_t *mutex)
{

	if (__isthreaded)
		_SPINUNLOCK(&mutex->lock);
}

/*
 * End mutex.
 */
/******************************************************************************/
/*
 * Begin spin lock.  Spin locks here are actually adaptive mutexes that block
 * after a period of spinning, because unbounded spinning would allow for
 * priority inversion.
 */

/*
 * We use an unpublished interface to initialize pthread mutexes with an
 * allocation callback, in order to avoid infinite recursion.
 */
int	_pthread_mutex_init_calloc_cb(pthread_mutex_t *mutex,
    void *(calloc_cb)(size_t, size_t));

__weak_reference(_pthread_mutex_init_calloc_cb_stub,
    _pthread_mutex_init_calloc_cb);

int
_pthread_mutex_init_calloc_cb_stub(pthread_mutex_t *mutex,
    void *(calloc_cb)(size_t, size_t))
{

	return (0);
}

static bool
malloc_spin_init(pthread_mutex_t *lock)
{

	if (_pthread_mutex_init_calloc_cb(lock, base_calloc) != 0)
		return (true);

	return (false);
}

static inline void
malloc_spin_lock(pthread_mutex_t *lock)
{

	if (__isthreaded) {
		if (_pthread_mutex_trylock(lock) != 0) {
			/* Exponentially back off if there are multiple CPUs. */
			if (ncpus > 1) {
				unsigned i;
				volatile unsigned j;

				for (i = 1; i <= LG_SPIN_LIMIT; i++) {
					for (j = 0; j < (1U << i); j++) {
						CPU_SPINWAIT;
					}

					if (_pthread_mutex_trylock(lock) == 0)
						return;
				}
			}

			/*
			 * Spinning failed.  Block until the lock becomes
			 * available, in order to avoid indefinite priority
			 * inversion.
			 */
			_pthread_mutex_lock(lock);
		}
	}
}

static inline void
malloc_spin_unlock(pthread_mutex_t *lock)
{

	if (__isthreaded)
		_pthread_mutex_unlock(lock);
}

/*
 * End spin lock.
 */
/******************************************************************************/
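The adaptive pattern in malloc_spin_lock() — trylock, spin with an exponentially growing backoff, then fall back to a blocking lock — can be sketched portably. This is an illustrative copy, not the allocator's code: DEMO_LG_SPIN_LIMIT stands in for LG_SPIN_LIMIT, and sched_yield() stands in for the CPU_SPINWAIT pause instruction.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define	DEMO_LG_SPIN_LIMIT	5	/* spin up to 2^1 + ... + 2^5 pauses */

/*
 * Acquire lock adaptively: spin briefly under contention, doubling the
 * backoff each round, then block to avoid unbounded priority inversion.
 */
void
demo_adaptive_lock(pthread_mutex_t *lock)
{
	if (pthread_mutex_trylock(lock) != 0) {
		unsigned i;
		volatile unsigned j;

		for (i = 1; i <= DEMO_LG_SPIN_LIMIT; i++) {
			for (j = 0; j < (1U << i); j++)
				sched_yield();	/* stand-in for CPU_SPINWAIT */
			if (pthread_mutex_trylock(lock) == 0)
				return;
		}
		/* Spinning failed; block until the lock is available. */
		pthread_mutex_lock(lock);
	}
}
```

The key property is that total spin time is bounded by the geometric sum of backoff rounds, after which the thread sleeps in the kernel instead of burning CPU.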
/*
 * Begin Utility functions/macros.
 */

/* Return the chunk address for allocation address a. */
#define	CHUNK_ADDR2BASE(a)						\
	((void *)((uintptr_t)(a) & ~chunksize_mask))

/* Return the chunk offset of address a. */
#define	CHUNK_ADDR2OFFSET(a)						\
	((size_t)((uintptr_t)(a) & chunksize_mask))

/* Return the smallest chunk multiple that is >= s. */
#define	CHUNK_CEILING(s)						\
	(((s) + chunksize_mask) & ~chunksize_mask)

/* Return the smallest quantum multiple that is >= a. */
#define	QUANTUM_CEILING(a)						\
	(((a) + QUANTUM_MASK) & ~QUANTUM_MASK)

/* Return the smallest cacheline multiple that is >= s. */
#define	CACHELINE_CEILING(s)						\
	(((s) + CACHELINE_MASK) & ~CACHELINE_MASK)

/* Return the smallest subpage multiple that is >= s. */
#define	SUBPAGE_CEILING(s)						\
	(((s) + SUBPAGE_MASK) & ~SUBPAGE_MASK)

/* Return the smallest medium size class that is >= s. */
#define	MEDIUM_CEILING(s)						\
	(((s) + mspace_mask) & ~mspace_mask)

/* Return the smallest pagesize multiple that is >= s. */
#define	PAGE_CEILING(s)							\
	(((s) + PAGE_MASK) & ~PAGE_MASK)

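All of the *_CEILING() macros rely on the same power-of-two rounding identity: for an alignment a = 2^k with mask = a - 1, the expression (s + mask) & ~mask yields the smallest multiple of a that is >= s, because adding the mask carries into bit k whenever any low bit is set, and the AND then clears the low bits. A minimal sketch with a hypothetical 4 KiB page size (the DEMO_* names are illustrative, not from this file, where PAGE_MASK is derived from the system page size):

```c
#include <assert.h>
#include <stddef.h>

#define	DEMO_PAGE_SIZE	((size_t)4096)		/* assumed 4 KiB pages */
#define	DEMO_PAGE_MASK	(DEMO_PAGE_SIZE - 1)

/* Same idiom as PAGE_CEILING(): round s up to a page multiple. */
#define	DEMO_PAGE_CEILING(s)						\
	(((s) + DEMO_PAGE_MASK) & ~DEMO_PAGE_MASK)
```

The identity only holds for power-of-two alignments; that is why each spacing class (quantum, cacheline, subpage, page, chunk) keeps its own precomputed mask.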
#ifdef MALLOC_TINY
/* Compute the smallest power of 2 that is >= x. */
static size_t
pow2_ceil(size_t x)
{

	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
#if (SIZEOF_PTR == 8)
	x |= x >> 32;
#endif
	x++;
	return (x);
}
#endif

/******************************************************************************/

Add the 'D' and 'M' run time options, and use them to control whether
memory is acquired from the system via sbrk(2) and/or mmap(2). By default,
use sbrk(2) only, in order to support traditional use of resource limits.
Additionally, when both options are enabled, prefer the data segment to
anonymous mappings, in order to coexist better with large file mappings
in applications on 32-bit platforms. This change has the potential to
increase memory fragmentation due to the linear nature of the data
segment, but from a performance perspective this is mitigated by the use
of madvise(2). [1]
Add the ability to interpret integer prefixes in MALLOC_OPTIONS
processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as
MALLOC_OPTIONS=9l.
Reported by: [1] rwatson
Design review: [1] alc, peter, rwatson
2007-12-27 23:29:44 +00:00
#ifdef MALLOC_DSS
static bool
base_pages_alloc_dss(size_t minsize)
{

	/*
	 * Do special DSS allocation here, since base allocations don't need to
	 * be chunk-aligned.
	 */
	malloc_mutex_lock(&dss_mtx);
	if (dss_prev != (void *)-1) {
		intptr_t incr;
		size_t csize = CHUNK_CEILING(minsize);

		do {
			/* Get the current end of the DSS. */
			dss_max = sbrk(0);

			/*
			 * Calculate how much padding is necessary to
			 * chunk-align the end of the DSS.  Don't worry about
			 * dss_max not being chunk-aligned though.
			 */
			incr = (intptr_t)chunksize
			    - (intptr_t)CHUNK_ADDR2OFFSET(dss_max);
			assert(incr >= 0);
			if ((size_t)incr < minsize)
				incr += csize;

			dss_prev = sbrk(incr);
			if (dss_prev == dss_max) {
				/* Success. */
				dss_max = (void *)((intptr_t)dss_prev + incr);
				base_pages = dss_prev;
				base_next_addr = base_pages;
				base_past_addr = dss_max;
#ifdef MALLOC_STATS
				base_mapped += incr;
#endif
				malloc_mutex_unlock(&dss_mtx);
				return (false);
			}
		} while (dss_prev != (void *)-1);
	}
	malloc_mutex_unlock(&dss_mtx);

	return (true);
}
#endif
static bool
base_pages_alloc_mmap(size_t minsize)
{
	size_t csize;

	assert(minsize != 0);
	csize = PAGE_CEILING(minsize);
	base_pages = pages_map(NULL, csize);
	if (base_pages == NULL)
		return (true);
	base_next_addr = base_pages;
	base_past_addr = (void *)((uintptr_t)base_pages + csize);
#ifdef MALLOC_STATS
	base_mapped += csize;
#endif

	return (false);
}

static bool
base_pages_alloc(size_t minsize)
{

#ifdef MALLOC_DSS
	if (opt_mmap && minsize != 0)
#endif
	{
		if (base_pages_alloc_mmap(minsize) == false)
			return (false);
	}

#ifdef MALLOC_DSS
	if (opt_dss) {
		if (base_pages_alloc_dss(minsize) == false)
			return (false);
	}
#endif

	return (true);
}

static void *
base_alloc(size_t size)
{
	void *ret;
	size_t csize;

	/* Round size up to nearest multiple of the cacheline size. */
	csize = CACHELINE_CEILING(size);

	malloc_mutex_lock(&base_mtx);
	/* Make sure there's enough space for the allocation. */
	if ((uintptr_t)base_next_addr + csize > (uintptr_t)base_past_addr) {
		if (base_pages_alloc(csize)) {
			malloc_mutex_unlock(&base_mtx);
			return (NULL);
		}
	}
	/* Allocate. */
	ret = base_next_addr;
	base_next_addr = (void *)((uintptr_t)base_next_addr + csize);
	malloc_mutex_unlock(&base_mtx);

	return (ret);
}

static void *
base_calloc(size_t number, size_t size)
{
	void *ret;

	ret = base_alloc(number * size);
	if (ret != NULL)
		memset(ret, 0, number * size);

	return (ret);
}

static extent_node_t *
base_node_alloc(void)
{
	extent_node_t *ret;

	malloc_mutex_lock(&base_mtx);
	if (base_nodes != NULL) {
		ret = base_nodes;
		base_nodes = *(extent_node_t **)ret;
		malloc_mutex_unlock(&base_mtx);
	} else {
		malloc_mutex_unlock(&base_mtx);
		ret = (extent_node_t *)base_alloc(sizeof(extent_node_t));
	}

	return (ret);
}

static void
base_node_dealloc(extent_node_t *node)
{

	malloc_mutex_lock(&base_mtx);
	*(extent_node_t **)node = base_nodes;
	base_nodes = node;
	malloc_mutex_unlock(&base_mtx);
}

/*
 * End Utility functions/macros.
 */
/******************************************************************************/
/*
 * Begin extent tree code.
 */

#ifdef MALLOC_DSS
static inline int
extent_szad_comp(extent_node_t *a, extent_node_t *b)
{
	int ret;
	size_t a_size = a->size;
	size_t b_size = b->size;

	ret = (a_size > b_size) - (a_size < b_size);
	if (ret == 0) {
		uintptr_t a_addr = (uintptr_t)a->addr;
		uintptr_t b_addr = (uintptr_t)b->addr;

		ret = (a_addr > b_addr) - (a_addr < b_addr);
	}

	return (ret);
}

/* Wrap red-black tree macros in functions. */
rb_gen(__unused static, extent_tree_szad_, extent_tree_t, extent_node_t,
    link_szad, extent_szad_comp)
#endif

static inline int
extent_ad_comp(extent_node_t *a, extent_node_t *b)
{
	uintptr_t a_addr = (uintptr_t)a->addr;
	uintptr_t b_addr = (uintptr_t)b->addr;

	return ((a_addr > b_addr) - (a_addr < b_addr));
}

/* Wrap red-black tree macros in functions. */
rb_gen(__unused static, extent_tree_ad_, extent_tree_t, extent_node_t, link_ad,
    extent_ad_comp)

/*
 * End extent tree code.
 */
/******************************************************************************/
/*
 * Begin chunk management functions.
 */

Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class.  Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound.  There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis.  Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object).  For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad.  It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB.  Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00

static void *
pages_map(void *addr, size_t size)
{
	void *ret;

	/*
	 * We don't use MAP_FIXED here, because it can cause the *replacement*
	 * of existing mappings, and we only want to create new mappings.
	 */
	ret = mmap(addr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON,
	    -1, 0);
	assert(ret != NULL);

	if (ret == MAP_FAILED)
		ret = NULL;
	else if (addr != NULL && ret != addr) {
		/*
		 * We succeeded in mapping memory, but not in the right place.
		 */
		if (munmap(ret, size) == -1) {
			char buf[STRERROR_BUF];

			strerror_r(errno, buf, sizeof(buf));
			_malloc_message(_getprogname(),
			    ": (malloc) Error in munmap(): ", buf, "\n");
			if (opt_abort)
				abort();
		}
		ret = NULL;
	}

	assert(ret == NULL || (addr == NULL && ret != addr)
	    || (addr != NULL && ret == addr));
	return (ret);
}

static void
pages_unmap(void *addr, size_t size)
{

	if (munmap(addr, size) == -1) {
		char buf[STRERROR_BUF];

		strerror_r(errno, buf, sizeof(buf));
		_malloc_message(_getprogname(),
		    ": (malloc) Error in munmap(): ", buf, "\n");
		if (opt_abort)
			abort();
	}
}
#ifdef MALLOC_DSS
|
2008-02-06 02:59:54 +00:00
|
|
|
static void *
|
2010-01-31 23:16:10 +00:00
|
|
|
chunk_alloc_dss(size_t size, bool *zero)
|
2006-01-13 18:38:56 +00:00
|
|
|
{
|
2010-01-31 23:16:10 +00:00
|
|
|
void *ret;
|
|
|
|
|
|
|
|
ret = chunk_recycle_dss(size, zero);
|
|
|
|
if (ret != NULL)
|
|
|
|
return (ret);
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2008-04-29 01:32:42 +00:00
|
|
|
/*
|
|
|
|
* sbrk() uses a signed increment argument, so take care not to
|
|
|
|
* interpret a huge allocation request as a negative increment.
|
|
|
|
*/
|
|
|
|
if ((intptr_t)size < 0)
|
|
|
|
return (NULL);
|
|
|
|
|
2007-12-31 06:19:48 +00:00
|
|
|
malloc_mutex_lock(&dss_mtx);
|
Add the 'D' and 'M' run time options, and use them to control whether
memory is acquired from the system via sbrk(2) and/or mmap(2). By default,
use sbrk(2) only, in order to support traditional use of resource limits.
Additionally, when both options are enabled, prefer the data segment to
anonymous mappings, in order to coexist better with large file mappings
in applications on 32-bit platforms. This change has the potential to
increase memory fragmentation due to the linear nature of the data
segment, but from a performance perspective this is mitigated by the use
of madvise(2). [1]
Add the ability to interpret integer prefixes in MALLOC_OPTIONS
processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as
MALLOC_OPTIONS=9l.
Reported by: [1] rwatson
Design review: [1] alc, peter, rwatson
    if (dss_prev != (void *)-1) {
        intptr_t incr;

        /*
         * The loop is necessary to recover from races with other
         * threads that are using the DSS for something other than
         * malloc.
         */
        do {
            /* Get the current end of the DSS. */
            dss_max = sbrk(0);

            /*
             * Calculate how much padding is necessary to
             * chunk-align the end of the DSS.
             */
            incr = (intptr_t)size
                - (intptr_t)CHUNK_ADDR2OFFSET(dss_max);
            if (incr == (intptr_t)size)
                ret = dss_max;
            else {
                ret = (void *)((intptr_t)dss_max + incr);
                incr += size;
            }

            dss_prev = sbrk(incr);
            if (dss_prev == dss_max) {
                /* Success. */
                dss_max = (void *)((intptr_t)dss_prev + incr);
                malloc_mutex_unlock(&dss_mtx);
                *zero = true;
                return (ret);
            }
        } while (dss_prev != (void *)-1);
    }
    malloc_mutex_unlock(&dss_mtx);

    return (NULL);
}

static void *
chunk_recycle_dss(size_t size, bool *zero)
{
    extent_node_t *node, key;

    key.addr = NULL;
    key.size = size;
    malloc_mutex_lock(&dss_mtx);
    node = extent_tree_szad_nsearch(&dss_chunks_szad, &key);
    if (node != NULL) {
        void *ret = node->addr;

        /* Remove node from the tree. */
        extent_tree_szad_remove(&dss_chunks_szad, node);
        if (node->size == size) {
            extent_tree_ad_remove(&dss_chunks_ad, node);
            base_node_dealloc(node);
        } else {
            /*
             * Insert the remainder of node's address range as a
             * smaller chunk.  Its position within dss_chunks_ad
             * does not change.
             */
            assert(node->size > size);
            node->addr = (void *)((uintptr_t)node->addr + size);
            node->size -= size;
            extent_tree_szad_insert(&dss_chunks_szad, node);
        }
        malloc_mutex_unlock(&dss_mtx);

        if (*zero)
            memset(ret, 0, size);
        return (ret);
    }
    malloc_mutex_unlock(&dss_mtx);

    return (NULL);
}
#endif

static void *
chunk_alloc_mmap_slow(size_t size, bool unaligned)
{
    void *ret;
    size_t offset;

    /* Beware size_t wrap-around. */
    if (size + chunksize <= size)
        return (NULL);

    ret = pages_map(NULL, size + chunksize);
    if (ret == NULL)
        return (NULL);

    /* Clean up unneeded leading/trailing space. */
    offset = CHUNK_ADDR2OFFSET(ret);
    if (offset != 0) {
        /* Note that mmap() returned an unaligned mapping. */
        unaligned = true;

        /* Leading space. */
        pages_unmap(ret, chunksize - offset);

        ret = (void *)((uintptr_t)ret +
            (chunksize - offset));

        /* Trailing space. */
        pages_unmap((void *)((uintptr_t)ret + size),
            offset);
    } else {
        /* Trailing space only. */
        pages_unmap((void *)((uintptr_t)ret + size),
            chunksize);
    }

    /*
     * If mmap() returned an aligned mapping, reset mmap_unaligned so that
     * the next chunk_alloc_mmap() execution tries the fast allocation
     * method.
     */
    if (unaligned == false)
        mmap_unaligned = false;

    return (ret);
}

static void *
chunk_alloc_mmap(size_t size)
{
    void *ret;

    /*
     * Ideally, there would be a way to specify alignment to mmap() (like
     * NetBSD has), but in the absence of such a feature, we have to work
     * hard to efficiently create aligned mappings.  The reliable, but
     * slow method is to create a mapping that is over-sized, then trim the
     * excess.  However, that always results in at least one call to
     * pages_unmap().
     *
     * A more optimistic approach is to try mapping precisely the right
     * amount, then try to append another mapping if alignment is off.  In
     * practice, this works out well as long as the application is not
     * interleaving mappings via direct mmap() calls.  If we do run into a
     * situation where there is an interleaved mapping and we are unable to
     * extend an unaligned mapping, our best option is to switch to the
     * slow method until mmap() returns another aligned mapping.  This will
     * tend to leave a gap in the memory map that is too small to cause
     * later problems for the optimistic method.
     *
     * Another possible confounding factor is address space layout
     * randomization (ASLR), which causes mmap(2) to disregard the
     * requested address.  mmap_unaligned tracks whether the previous
     * chunk_alloc_mmap() execution received any unaligned or relocated
     * mappings, and if so, the current execution will immediately fall
     * back to the slow method.  However, we keep track of whether the fast
     * method would have succeeded, and if so, we make a note to try the
     * fast method next time.
     */

    if (mmap_unaligned == false) {
        size_t offset;

        ret = pages_map(NULL, size);
        if (ret == NULL)
            return (NULL);

        offset = CHUNK_ADDR2OFFSET(ret);
        if (offset != 0) {
            mmap_unaligned = true;
            /* Try to extend chunk boundary. */
            if (pages_map((void *)((uintptr_t)ret + size),
                chunksize - offset) == NULL) {
                /*
                 * Extension failed.  Clean up, then revert to
                 * the reliable-but-expensive method.
                 */
                pages_unmap(ret, size);
                ret = chunk_alloc_mmap_slow(size, true);
            } else {
                /* Clean up unneeded leading space. */
                pages_unmap(ret, chunksize - offset);
                ret = (void *)((uintptr_t)ret + (chunksize -
                    offset));
            }
        }
    } else
        ret = chunk_alloc_mmap_slow(size, false);

    return (ret);
}

/*
 * If the caller specifies (*zero == false), it is still possible to receive
 * zeroed memory, in which case *zero is toggled to true.  arena_chunk_alloc()
 * takes advantage of this to avoid demanding zeroed chunks, but taking
 * advantage of them if they are returned.
 */
static void *
chunk_alloc(size_t size, bool *zero)
{
    void *ret;

    assert(size != 0);
    assert((size & chunksize_mask) == 0);

#ifdef MALLOC_DSS
    if (opt_mmap)
#endif
    {
        ret = chunk_alloc_mmap(size);
        if (ret != NULL) {
            *zero = true;
            goto RETURN;
        }
    }

#ifdef MALLOC_DSS
    if (opt_dss) {
        ret = chunk_alloc_dss(size, zero);
        if (ret != NULL)
            goto RETURN;
    }
#endif

    /* All strategies for allocation failed. */
    ret = NULL;
RETURN:
#ifdef MALLOC_STATS
    if (ret != NULL) {
        malloc_mutex_lock(&chunks_mtx);
        stats_chunks.nchunks += (size / chunksize);
        stats_chunks.curchunks += (size / chunksize);
        if (stats_chunks.curchunks > stats_chunks.highchunks)
            stats_chunks.highchunks = stats_chunks.curchunks;
        malloc_mutex_unlock(&chunks_mtx);
    }
#endif

    assert(CHUNK_ADDR2BASE(ret) == ret);
    return (ret);
}

#ifdef MALLOC_DSS
static extent_node_t *
chunk_dealloc_dss_record(void *chunk, size_t size)
{
    extent_node_t *node, *prev, key;

    key.addr = (void *)((uintptr_t)chunk + size);
    node = extent_tree_ad_nsearch(&dss_chunks_ad, &key);
    /* Try to coalesce forward. */
    if (node != NULL && node->addr == key.addr) {
        /*
         * Coalesce chunk with the following address range.  This does
         * not change the position within dss_chunks_ad, so only
         * remove/insert from/into dss_chunks_szad.
         */
        extent_tree_szad_remove(&dss_chunks_szad, node);
        node->addr = chunk;
        node->size += size;
        extent_tree_szad_insert(&dss_chunks_szad, node);
    } else {
        /*
         * Coalescing forward failed, so insert a new node.  Drop
         * dss_mtx during node allocation, since it is possible that a
         * new base chunk will be allocated.
         */
        malloc_mutex_unlock(&dss_mtx);
        node = base_node_alloc();
        malloc_mutex_lock(&dss_mtx);
        if (node == NULL)
            return (NULL);
        node->addr = chunk;
        node->size = size;
        extent_tree_ad_insert(&dss_chunks_ad, node);
        extent_tree_szad_insert(&dss_chunks_szad, node);
    }

    /* Try to coalesce backward. */
    prev = extent_tree_ad_prev(&dss_chunks_ad, node);
    if (prev != NULL && (void *)((uintptr_t)prev->addr + prev->size) ==
        chunk) {
        /*
         * Coalesce chunk with the previous address range.  This does
         * not change the position within dss_chunks_ad, so only
         * remove/insert node from/into dss_chunks_szad.
         */
        extent_tree_szad_remove(&dss_chunks_szad, prev);
        extent_tree_ad_remove(&dss_chunks_ad, prev);

        extent_tree_szad_remove(&dss_chunks_szad, node);
        node->addr = prev->addr;
        node->size += prev->size;
        extent_tree_szad_insert(&dss_chunks_szad, node);

        base_node_dealloc(prev);
    }

    return (node);
}
|
|
|
|
|
2008-02-06 02:59:54 +00:00
|
|
|
static bool
|
Add the 'D' and 'M' run time options, and use them to control whether
memory is acquired from the system via sbrk(2) and/or mmap(2). By default,
use sbrk(2) only, in order to support traditional use of resource limits.
Additionally, when both options are enabled, prefer the data segment to
anonymous mappings, in order to coexist better with large file mappings
in applications on 32-bit platforms. This change has the potential to
increase memory fragmentation due to the linear nature of the data
segment, but from a performance perspective this is mitigated by the use
of madvise(2). [1]
Add the ability to interpret integer prefixes in MALLOC_OPTIONS
processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as
MALLOC_OPTIONS=9l.
Reported by: [1] rwatson
Design review: [1] alc, peter, rwatson
2007-12-27 23:29:44 +00:00
|
|
|
chunk_dealloc_dss(void *chunk, size_t size)
|
2006-01-13 18:38:56 +00:00
|
|
|
{
|
2010-01-31 23:16:10 +00:00
|
|
|
bool ret;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2007-12-31 06:19:48 +00:00
|
|
|
malloc_mutex_lock(&dss_mtx);
|
Add the 'D' and 'M' run time options, and use them to control whether
memory is acquired from the system via sbrk(2) and/or mmap(2). By default,
use sbrk(2) only, in order to support traditional use of resource limits.
Additionally, when both options are enabled, prefer the data segment to
anonymous mappings, in order to coexist better with large file mappings
in applications on 32-bit platforms. This change has the potential to
increase memory fragmentation due to the linear nature of the data
segment, but from a performance perspective this is mitigated by the use
of madvise(2). [1]
Add the ability to interpret integer prefixes in MALLOC_OPTIONS
processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as
MALLOC_OPTIONS=9l.
Reported by: [1] rwatson
Design review: [1] alc, peter, rwatson
	if ((uintptr_t)chunk >= (uintptr_t)dss_base
	    && (uintptr_t)chunk < (uintptr_t)dss_max) {
		extent_node_t *node;

		/* Try to coalesce with other unused chunks. */
		node = chunk_dealloc_dss_record(chunk, size);
		if (node != NULL) {
			chunk = node->addr;
			size = node->size;
		}

		/* Get the current end of the DSS. */
		dss_max = sbrk(0);

		/*
		 * Try to shrink the DSS if this chunk is at the end of the
		 * DSS.  The sbrk() call here is subject to a race condition
		 * with threads that use brk(2) or sbrk(2) directly, but the
		 * alternative would be to leak memory for the sake of poorly
		 * designed multi-threaded programs.
		 */
		if ((void *)((uintptr_t)chunk + size) == dss_max
		    && (dss_prev = sbrk(-(intptr_t)size)) == dss_max) {
			/* Success. */
			dss_max = (void *)((intptr_t)dss_prev -
			    (intptr_t)size);

			if (node != NULL) {
				extent_tree_szad_remove(&dss_chunks_szad,
				    node);
				extent_tree_ad_remove(&dss_chunks_ad, node);
				base_node_dealloc(node);
			}
		} else
			madvise(chunk, size, MADV_FREE);

		ret = false;
		goto RETURN;
	}

	ret = true;
RETURN:
	malloc_mutex_unlock(&dss_mtx);
	return (ret);
}
#endif
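The shrink path above only returns memory to the kernel when the chunk being freed ends exactly at the current break; any interior chunk is instead handed to madvise(2). A minimal sketch of that end-of-DSS test, against a simulated break rather than the real sbrk(2) (the `model_*` names are hypothetical, for illustration only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated current break; stands in for the real dss_max. */
static uintptr_t model_dss_max = 0x400000;

/*
 * A chunk can be released by shrinking the segment only if it ends
 * exactly at the break; otherwise the caller would madvise() it.
 */
static bool
model_try_shrink(uintptr_t chunk, size_t size)
{

	if (chunk + size == model_dss_max) {
		model_dss_max -= size;	/* stands in for sbrk(-size) */
		return (true);
	}
	return (false);
}
```

This mirrors why the linear data segment fragments more easily than mmap'ed chunks: only the topmost chunk is ever eligible for release.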
static void
chunk_dealloc_mmap(void *chunk, size_t size)
{

	pages_unmap(chunk, size);
}

static void
chunk_dealloc(void *chunk, size_t size)
{

	assert(chunk != NULL);
	assert(CHUNK_ADDR2BASE(chunk) == chunk);
	assert(size != 0);
	assert((size & chunksize_mask) == 0);

#ifdef MALLOC_STATS
	malloc_mutex_lock(&chunks_mtx);
	stats_chunks.curchunks -= (size / chunksize);
	malloc_mutex_unlock(&chunks_mtx);
#endif

#ifdef MALLOC_DSS
	if (opt_dss) {
		if (chunk_dealloc_dss(chunk, size) == false)
			return;
	}

	if (opt_mmap)
#endif
		chunk_dealloc_mmap(chunk, size);
}

/*
 * End chunk management functions.
 */
/******************************************************************************/
/*
 * Begin arena.
 */

/*
 * Choose an arena based on a per-thread value (fast-path code, calls slow-path
 * code if necessary).
 */
static inline arena_t *
choose_arena(void)
{
	arena_t *ret;

	/*
	 * We can only use TLS if this is a PIC library, since for the static
	 * library version, libc's malloc is used by TLS allocation, which
	 * introduces a bootstrapping issue.
	 */
#ifndef NO_TLS
	if (__isthreaded == false) {
		/* Avoid the overhead of TLS for single-threaded operation. */
		return (arenas[0]);
	}

	ret = arenas_map;
	if (ret == NULL) {
		ret = choose_arena_hard();
		assert(ret != NULL);
	}
#else
	if (__isthreaded && narenas > 1) {
		unsigned long ind;

		/*
		 * Hash _pthread_self() to one of the arenas.  There is a prime
		 * number of arenas, so this has a reasonable chance of
		 * working.  Even so, the hashing can be easily thwarted by
		 * inconvenient _pthread_self() values.  Without specific
		 * knowledge of how _pthread_self() calculates values, we can't
		 * easily do much better than this.
		 */
		ind = (unsigned long) _pthread_self() % narenas;

		/*
		 * Optimistically assume that arenas[ind] has been initialized.
		 * At worst, we find out that some other thread has already
		 * done so, after acquiring the lock in preparation.  Note that
		 * this lazy locking also has the effect of lazily forcing
		 * cache coherency; without the lock acquisition, there's no
		 * guarantee that modification of arenas[ind] by another thread
		 * would be seen on this CPU for an arbitrary amount of time.
		 *
		 * In general, this approach to modifying a synchronized value
		 * isn't a good idea, but in this case we only ever modify the
		 * value once, so things work out well.
		 */
		ret = arenas[ind];
		if (ret == NULL) {
			/*
			 * Avoid races with another thread that may have already
			 * initialized arenas[ind].
			 */
			malloc_spin_lock(&arenas_lock);
			if (arenas[ind] == NULL)
				ret = arenas_extend((unsigned)ind);
			else
				ret = arenas[ind];
			malloc_spin_unlock(&arenas_lock);
		}
	} else
		ret = arenas[0];
#endif

	assert(ret != NULL);
	return (ret);
}
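The non-TLS path above reduces the thread handle modulo a prime arena count. A tiny sketch of that selection (the name `choose_arena_idx` is hypothetical; the real code feeds `_pthread_self()` in as the handle):

```c
#include <assert.h>

/*
 * Map an opaque thread handle to an arena index.  A prime arena count
 * makes regular strides in handle values less likely to collide on the
 * same arena.
 */
static unsigned long
choose_arena_idx(unsigned long thread_self, unsigned long narenas)
{

	return (thread_self % narenas);
}
```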

#ifndef NO_TLS
/*
 * Choose an arena based on a per-thread value (slow-path code only, called
 * only by choose_arena()).
 */
static arena_t *
choose_arena_hard(void)
{
	arena_t *ret;

	assert(__isthreaded);

	if (narenas > 1) {
		malloc_spin_lock(&arenas_lock);
		if ((ret = arenas[next_arena]) == NULL)
			ret = arenas_extend(next_arena);
		next_arena = (next_arena + 1) % narenas;
		malloc_spin_unlock(&arenas_lock);
	} else
		ret = arenas[0];

	arenas_map = ret;

	return (ret);
}
#endif

static inline int
arena_chunk_comp(arena_chunk_t *a, arena_chunk_t *b)
{
	uintptr_t a_chunk = (uintptr_t)a;
	uintptr_t b_chunk = (uintptr_t)b;

	assert(a != NULL);
	assert(b != NULL);

	return ((a_chunk > b_chunk) - (a_chunk < b_chunk));
}

/* Wrap red-black tree macros in functions. */
rb_gen(__unused static, arena_chunk_tree_dirty_, arena_chunk_tree_t,
    arena_chunk_t, link_dirty, arena_chunk_comp)

static inline int
arena_run_comp(arena_chunk_map_t *a, arena_chunk_map_t *b)
{
	uintptr_t a_mapelm = (uintptr_t)a;
	uintptr_t b_mapelm = (uintptr_t)b;

	assert(a != NULL);
	assert(b != NULL);

	return ((a_mapelm > b_mapelm) - (a_mapelm < b_mapelm));
}

/* Wrap red-black tree macros in functions. */
rb_gen(__unused static, arena_run_tree_, arena_run_tree_t, arena_chunk_map_t,
    link, arena_run_comp)

static inline int
arena_avail_comp(arena_chunk_map_t *a, arena_chunk_map_t *b)
{
	int ret;
	size_t a_size = a->bits & ~PAGE_MASK;
	size_t b_size = b->bits & ~PAGE_MASK;

	ret = (a_size > b_size) - (a_size < b_size);
	if (ret == 0) {
		uintptr_t a_mapelm, b_mapelm;

		if ((a->bits & CHUNK_MAP_KEY) != CHUNK_MAP_KEY)
			a_mapelm = (uintptr_t)a;
		else {
			/*
			 * Treat keys as though they are lower than anything
			 * else.
			 */
			a_mapelm = 0;
		}
		b_mapelm = (uintptr_t)b;

		ret = (a_mapelm > b_mapelm) - (a_mapelm < b_mapelm);
	}

	return (ret);
}

/* Wrap red-black tree macros in functions. */
rb_gen(__unused static, arena_avail_tree_, arena_avail_tree_t,
    arena_chunk_map_t, link, arena_avail_comp)
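arena_avail_comp orders runs by size first and address second, using the branchless `(a > b) - (a < b)` idiom that yields -1, 0, or 1 without risking integer overflow the way `a - b` would. A self-contained sketch of that two-key order on plain integers (`avail_comp_model` is a hypothetical stand-in, ignoring the CHUNK_MAP_KEY special case):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Total order on (size, address): smaller sizes sort first, and ties
 * are broken by address, so best-fit searches find the lowest-addressed
 * run of a given size.
 */
static int
avail_comp_model(size_t a_size, uintptr_t a_addr,
    size_t b_size, uintptr_t b_addr)
{
	int ret = (a_size > b_size) - (a_size < b_size);

	if (ret == 0)
		ret = (a_addr > b_addr) - (a_addr < b_addr);
	return (ret);
}
```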

static inline void
arena_run_rc_incr(arena_run_t *run, arena_bin_t *bin, const void *ptr)
{
	arena_chunk_t *chunk;
	arena_t *arena;
	size_t pagebeg, pageend, i;

	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
	arena = chunk->arena;
	pagebeg = ((uintptr_t)ptr - (uintptr_t)chunk) >> PAGE_SHIFT;
	pageend = ((uintptr_t)ptr + (uintptr_t)(bin->reg_size - 1) -
	    (uintptr_t)chunk) >> PAGE_SHIFT;

	for (i = pagebeg; i <= pageend; i++) {
		size_t mapbits = chunk->map[i].bits;

		if (mapbits & CHUNK_MAP_DIRTY) {
			assert((mapbits & CHUNK_MAP_RC_MASK) == 0);
			chunk->ndirty--;
			arena->ndirty--;
			mapbits ^= CHUNK_MAP_DIRTY;
		}
		assert((mapbits & CHUNK_MAP_RC_MASK) != CHUNK_MAP_RC_MASK);
		mapbits += CHUNK_MAP_RC_ONE;
		chunk->map[i].bits = mapbits;
	}
}

static inline void
arena_run_rc_decr(arena_run_t *run, arena_bin_t *bin, const void *ptr)
{
	arena_chunk_t *chunk;
	arena_t *arena;
	size_t pagebeg, pageend, mapbits, i;
	bool dirtier = false;

	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
	arena = chunk->arena;
	pagebeg = ((uintptr_t)ptr - (uintptr_t)chunk) >> PAGE_SHIFT;
	pageend = ((uintptr_t)ptr + (uintptr_t)(bin->reg_size - 1) -
	    (uintptr_t)chunk) >> PAGE_SHIFT;

	/* First page. */
	mapbits = chunk->map[pagebeg].bits;
	mapbits -= CHUNK_MAP_RC_ONE;
	if ((mapbits & CHUNK_MAP_RC_MASK) == 0) {
		dirtier = true;
		assert((mapbits & CHUNK_MAP_DIRTY) == 0);
		mapbits |= CHUNK_MAP_DIRTY;
		chunk->ndirty++;
		arena->ndirty++;
	}
	chunk->map[pagebeg].bits = mapbits;

	if (pageend - pagebeg >= 1) {
		/*
		 * Interior pages are completely consumed by the object being
		 * deallocated, which means that the pages can be
		 * unconditionally marked dirty.
		 */
		for (i = pagebeg + 1; i < pageend; i++) {
			mapbits = chunk->map[i].bits;
			mapbits -= CHUNK_MAP_RC_ONE;
			assert((mapbits & CHUNK_MAP_RC_MASK) == 0);
			dirtier = true;
			assert((mapbits & CHUNK_MAP_DIRTY) == 0);
			mapbits |= CHUNK_MAP_DIRTY;
			chunk->ndirty++;
			arena->ndirty++;
			chunk->map[i].bits = mapbits;
		}

		/* Last page. */
		mapbits = chunk->map[pageend].bits;
		mapbits -= CHUNK_MAP_RC_ONE;
		if ((mapbits & CHUNK_MAP_RC_MASK) == 0) {
			dirtier = true;
			assert((mapbits & CHUNK_MAP_DIRTY) == 0);
			mapbits |= CHUNK_MAP_DIRTY;
			chunk->ndirty++;
			arena->ndirty++;
		}
		chunk->map[pageend].bits = mapbits;
	}

	if (dirtier) {
		if (chunk->dirtied == false) {
			arena_chunk_tree_dirty_insert(&arena->chunks_dirty,
			    chunk);
			chunk->dirtied = true;
		}

		/* Enforce opt_lg_dirty_mult. */
		if (opt_lg_dirty_mult >= 0 && (arena->nactive >>
		    opt_lg_dirty_mult) < arena->ndirty)
			arena_purge(arena);
	}
}
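Both reference-count helpers above derive the first and last page indices touched by a region from its offset within the chunk. A minimal sketch of that arithmetic, assuming 4 KiB pages (`MODEL_PAGE_SHIFT` and `model_page_span` are hypothetical names for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	MODEL_PAGE_SHIFT	12	/* assumed 4 KiB pages */

/*
 * Compute the first and last chunk-relative page indices covered by a
 * region of reg_size bytes starting at ptr.  Using (reg_size - 1) for
 * the end keeps a region that stops exactly on a page boundary from
 * touching the following page.
 */
static void
model_page_span(uintptr_t chunk, uintptr_t ptr, size_t reg_size,
    size_t *pagebeg, size_t *pageend)
{

	*pagebeg = (ptr - chunk) >> MODEL_PAGE_SHIFT;
	*pageend = (ptr + (reg_size - 1) - chunk) >> MODEL_PAGE_SHIFT;
}
```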

static inline void *
arena_run_reg_alloc(arena_run_t *run, arena_bin_t *bin)
{
	void *ret;
	unsigned i, mask, bit, regind;

	assert(run->magic == ARENA_RUN_MAGIC);
	assert(run->regs_minelm < bin->regs_mask_nelms);

	/*
	 * Move the first check outside the loop, so that run->regs_minelm can
	 * be updated unconditionally, without the possibility of updating it
	 * multiple times.
	 */
	i = run->regs_minelm;
	mask = run->regs_mask[i];
	if (mask != 0) {
		/* Usable allocation found. */
		bit = ffs((int)mask) - 1;

		regind = ((i << (LG_SIZEOF_INT + 3)) + bit);
		assert(regind < bin->nregs);
		ret = (void *)(((uintptr_t)run) + bin->reg0_offset
		    + (bin->reg_size * regind));

		/* Clear bit. */
		mask ^= (1U << bit);
		run->regs_mask[i] = mask;

		arena_run_rc_incr(run, bin, ret);

		return (ret);
	}

	for (i++; i < bin->regs_mask_nelms; i++) {
		mask = run->regs_mask[i];
		if (mask != 0) {
			/* Usable allocation found. */
			bit = ffs((int)mask) - 1;

			regind = ((i << (LG_SIZEOF_INT + 3)) + bit);
			assert(regind < bin->nregs);
			ret = (void *)(((uintptr_t)run) + bin->reg0_offset
			    + (bin->reg_size * regind));

			/* Clear bit. */
			mask ^= (1U << bit);
			run->regs_mask[i] = mask;

			/*
			 * Make a note that nothing before this element
			 * contains a free region.
			 */
			run->regs_minelm = i; /* Low payoff: + (mask == 0); */

			arena_run_rc_incr(run, bin, ret);

			return (ret);
		}
	}
	/* Not reached. */
	assert(0);
	return (NULL);
}
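The scan above treats regs_mask[] as a bitmap of free regions: ffs() finds the lowest set bit in a word, and the region index combines the word index (shifted by LG_SIZEOF_INT + 3, i.e. 5 for 32-bit unsigned) with the bit index. A self-contained sketch of that lookup on a bare bitmap (`model_reg_alloc` is a hypothetical name; the real code also tracks regs_minelm and converts the index to a pointer):

```c
#include <assert.h>
#include <strings.h>	/* ffs() */

/*
 * Find, claim, and return the index of the lowest free region in a
 * bitmap of 32-bit mask words, or -1 if every region is in use.
 */
static int
model_reg_alloc(unsigned *regs_mask, unsigned nelms)
{
	unsigned i;

	for (i = 0; i < nelms; i++) {
		unsigned mask = regs_mask[i];

		if (mask != 0) {
			unsigned bit = ffs((int)mask) - 1;

			regs_mask[i] = mask ^ (1U << bit);	/* Clear bit. */
			return ((int)((i << 5) + bit));
		}
	}
	return (-1);
}
```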

static inline void
arena_run_reg_dalloc(arena_run_t *run, arena_bin_t *bin, void *ptr, size_t size)
{
	unsigned shift, diff, regind, elm, bit;

	assert(run->magic == ARENA_RUN_MAGIC);

	/*
	 * Avoid doing division with a variable divisor if possible.  Using
	 * actual division here can reduce allocator throughput by over 20%!
	 */
	diff = (unsigned)((uintptr_t)ptr - (uintptr_t)run - bin->reg0_offset);

	/* Rescale (factor powers of 2 out of the numerator and denominator). */
	shift = ffs(size) - 1;
	diff >>= shift;
	size >>= shift;

	if (size == 1) {
		/* The divisor was a power of 2. */
		regind = diff;
	} else {
		/*
		 * To divide by a number D that is not a power of two we
		 * multiply by (2^21 / D) and then right shift by 21 positions.
		 *
		 *   X / D
		 *
		 * becomes
		 *
		 *   (X * size_invs[D - 3]) >> SIZE_INV_SHIFT
		 *
		 * We can omit the first three elements, because we never
		 * divide by 0, and 1 and 2 are both powers of two, which are
		 * handled above.
		 */
#define	SIZE_INV_SHIFT 21
#define	SIZE_INV(s) (((1U << SIZE_INV_SHIFT) / (s)) + 1)
		static const unsigned size_invs[] = {
		    SIZE_INV(3),
		    SIZE_INV(4), SIZE_INV(5), SIZE_INV(6), SIZE_INV(7),
		    SIZE_INV(8), SIZE_INV(9), SIZE_INV(10), SIZE_INV(11),
		    SIZE_INV(12), SIZE_INV(13), SIZE_INV(14), SIZE_INV(15),
		    SIZE_INV(16), SIZE_INV(17), SIZE_INV(18), SIZE_INV(19),
		    SIZE_INV(20), SIZE_INV(21), SIZE_INV(22), SIZE_INV(23),
		    SIZE_INV(24), SIZE_INV(25), SIZE_INV(26), SIZE_INV(27),
		    SIZE_INV(28), SIZE_INV(29), SIZE_INV(30), SIZE_INV(31)
		};

		if (size <= ((sizeof(size_invs) / sizeof(unsigned)) + 2))
			regind = (diff * size_invs[size - 3]) >> SIZE_INV_SHIFT;
		else
			regind = diff / size;
#undef SIZE_INV
#undef SIZE_INV_SHIFT
	}
	assert(diff == regind * size);
	assert(regind < bin->nregs);

	elm = regind >> (LG_SIZEOF_INT + 3);
	if (elm < run->regs_minelm)
		run->regs_minelm = elm;
	bit = regind - (elm << (LG_SIZEOF_INT + 3));
	assert((run->regs_mask[elm] & (1U << bit)) == 0);
	run->regs_mask[elm] |= (1U << bit);

	arena_run_rc_decr(run, bin, ptr);
}
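The multiply-and-shift trick above is exact for the values the allocator feeds it: when X is an exact multiple n*D and n is small enough that the product fits in 32 bits, (X * (floor(2^21/D) + 1)) >> 21 recovers n precisely, because the error term n*(D - (2^21 mod D)) stays below 2^21. A standalone check of that identity (`div_by_inv` is a hypothetical name):

```c
#include <assert.h>

#define	SIZE_INV_SHIFT	21
#define	SIZE_INV(s)	(((1U << SIZE_INV_SHIFT) / (s)) + 1)

/*
 * Divide x by d using one multiply and one shift, valid for exact
 * multiples x = n*d with small n, as in arena_run_reg_dalloc().
 */
static unsigned
div_by_inv(unsigned x, unsigned d)
{

	return ((x * SIZE_INV(d)) >> SIZE_INV_SHIFT);
}
```

The +1 in SIZE_INV rounds the reciprocal up so truncation in the final shift never loses the last unit of the quotient.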

static void
arena_run_split(arena_t *arena, arena_run_t *run, size_t size, bool large,
    bool zero)
{
	arena_chunk_t *chunk;
	size_t old_ndirty, run_ind, total_pages, need_pages, rem_pages, i;

	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(run);
	old_ndirty = chunk->ndirty;
	run_ind = (unsigned)(((uintptr_t)run - (uintptr_t)chunk)
	    >> PAGE_SHIFT);
	total_pages = (chunk->map[run_ind].bits & ~PAGE_MASK) >>
	    PAGE_SHIFT;
	need_pages = (size >> PAGE_SHIFT);
	assert(need_pages > 0);
	assert(need_pages <= total_pages);
	rem_pages = total_pages - need_pages;

	arena_avail_tree_remove(&arena->runs_avail, &chunk->map[run_ind]);
	arena->nactive += need_pages;

	/* Keep track of trailing unused pages for later use. */
	if (rem_pages > 0) {
		chunk->map[run_ind+need_pages].bits = (rem_pages <<
		    PAGE_SHIFT) | (chunk->map[run_ind+need_pages].bits &
		    CHUNK_MAP_FLAGS_MASK);
		chunk->map[run_ind+total_pages-1].bits = (rem_pages <<
		    PAGE_SHIFT) | (chunk->map[run_ind+total_pages-1].bits &
		    CHUNK_MAP_FLAGS_MASK);
		arena_avail_tree_insert(&arena->runs_avail,
		    &chunk->map[run_ind+need_pages]);
	}

	for (i = 0; i < need_pages; i++) {
		/* Zero if necessary. */
		if (zero) {
			if ((chunk->map[run_ind + i].bits & CHUNK_MAP_ZEROED)
			    == 0) {
				memset((void *)((uintptr_t)chunk + ((run_ind
				    + i) << PAGE_SHIFT)), 0, PAGE_SIZE);
				/* CHUNK_MAP_ZEROED is cleared below. */
			}
		}

		/* Update dirty page accounting. */
		if (chunk->map[run_ind + i].bits & CHUNK_MAP_DIRTY) {
			chunk->ndirty--;
arena->ndirty--;
|
2008-07-18 19:35:44 +00:00
|
|
|
/* CHUNK_MAP_DIRTY is cleared below. */
|
2007-11-27 03:12:15 +00:00
|
|
|
}
|
2008-02-06 02:59:54 +00:00
|
|
|
|
|
|
|
/* Initialize the chunk map. */
|
2008-07-18 19:35:44 +00:00
|
|
|
if (large) {
|
|
|
|
chunk->map[run_ind + i].bits = CHUNK_MAP_LARGE
|
|
|
|
| CHUNK_MAP_ALLOCATED;
|
|
|
|
} else {
|
2010-01-31 23:16:10 +00:00
|
|
|
chunk->map[run_ind + i].bits = (i << CHUNK_MAP_PG_SHIFT)
|
2008-07-18 19:35:44 +00:00
|
|
|
| CHUNK_MAP_ALLOCATED;
|
|
|
|
}
|
2006-03-17 09:00:27 +00:00
|
|
|
}
|
|
|
|
|
2010-01-31 23:16:10 +00:00
|
|
|
if (large) {
|
|
|
|
/*
|
|
|
|
* Set the run size only in the first element for large runs.
|
|
|
|
* This is primarily a debugging aid, since the lack of size
|
|
|
|
* info for trailing pages only matters if the application
|
|
|
|
* tries to operate on an interior pointer.
|
|
|
|
*/
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[run_ind].bits |= size;
|
2010-01-31 23:16:10 +00:00
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* Initialize the first page's refcount to 1, so that the run
|
|
|
|
* header is protected from dirty page purging.
|
|
|
|
*/
|
|
|
|
chunk->map[run_ind].bits += CHUNK_MAP_RC_ONE;
|
|
|
|
}
|
2006-01-13 18:38:56 +00:00
|
|
|
}
|
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
static arena_chunk_t *
|
|
|
|
arena_chunk_alloc(arena_t *arena)
|
2006-01-13 18:38:56 +00:00
|
|
|
{
|
2006-03-17 09:00:27 +00:00
|
|
|
arena_chunk_t *chunk;
|
2008-07-18 19:35:44 +00:00
|
|
|
size_t i;
|
2006-03-17 09:00:27 +00:00
|
|
|
|
2006-12-23 00:18:51 +00:00
|
|
|
if (arena->spare != NULL) {
|
|
|
|
chunk = arena->spare;
|
|
|
|
arena->spare = NULL;
|
|
|
|
} else {
|
2010-01-31 23:16:10 +00:00
|
|
|
bool zero;
|
|
|
|
size_t zeroed;
|
|
|
|
|
|
|
|
zero = false;
|
|
|
|
chunk = (arena_chunk_t *)chunk_alloc(chunksize, &zero);
|
2006-12-23 00:18:51 +00:00
|
|
|
if (chunk == NULL)
|
|
|
|
return (NULL);
|
2007-03-23 05:05:48 +00:00
|
|
|
#ifdef MALLOC_STATS
|
2007-03-23 22:58:15 +00:00
|
|
|
arena->stats.mapped += chunksize;
|
2007-03-23 05:05:48 +00:00
|
|
|
#endif
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-12-23 00:18:51 +00:00
|
|
|
chunk->arena = arena;
|
2010-01-31 23:16:10 +00:00
|
|
|
chunk->dirtied = false;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-12-23 00:18:51 +00:00
|
|
|
/*
|
|
|
|
* Claim that no pages are in use, since the header is merely
|
|
|
|
* overhead.
|
|
|
|
*/
|
2008-02-06 02:59:54 +00:00
|
|
|
chunk->ndirty = 0;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-12-23 00:18:51 +00:00
|
|
|
/*
|
2008-07-18 19:35:44 +00:00
|
|
|
* Initialize the map to contain one maximal free untouched run.
|
2010-01-31 23:16:10 +00:00
|
|
|
* Mark the pages as zeroed iff chunk_alloc() returned a zeroed
|
|
|
|
* chunk.
|
2006-12-23 00:18:51 +00:00
|
|
|
*/
|
2010-01-31 23:16:10 +00:00
|
|
|
zeroed = zero ? CHUNK_MAP_ZEROED : 0;
|
2008-07-18 19:35:44 +00:00
|
|
|
for (i = 0; i < arena_chunk_header_npages; i++)
|
|
|
|
chunk->map[i].bits = 0;
|
2010-01-31 23:16:10 +00:00
|
|
|
chunk->map[i].bits = arena_maxclass | zeroed;
|
|
|
|
for (i++; i < chunk_npages-1; i++)
|
|
|
|
chunk->map[i].bits = zeroed;
|
|
|
|
chunk->map[chunk_npages-1].bits = arena_maxclass | zeroed;
|
2006-01-13 18:38:56 +00:00
|
|
|
}
|
|
|
|
|
2008-07-18 19:35:44 +00:00
|
|
|
/* Insert the run into the runs_avail tree. */
|
|
|
|
arena_avail_tree_insert(&arena->runs_avail,
|
|
|
|
&chunk->map[arena_chunk_header_npages]);
|
2008-02-06 02:59:54 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
return (chunk);
|
2006-01-13 18:38:56 +00:00
|
|
|
}
|
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
static void
|
2006-12-23 00:18:51 +00:00
|
|
|
arena_chunk_dealloc(arena_t *arena, arena_chunk_t *chunk)
|
2006-01-13 18:38:56 +00:00
|
|
|
{
|
|
|
|
|
2008-02-06 02:59:54 +00:00
|
|
|
if (arena->spare != NULL) {
|
2010-01-31 23:16:10 +00:00
|
|
|
if (arena->spare->dirtied) {
|
2008-05-01 17:25:55 +00:00
|
|
|
arena_chunk_tree_dirty_remove(
|
|
|
|
&chunk->arena->chunks_dirty, arena->spare);
|
|
|
|
arena->ndirty -= arena->spare->ndirty;
|
|
|
|
}
|
2008-02-06 02:59:54 +00:00
|
|
|
chunk_dealloc((void *)arena->spare, chunksize);
|
2007-03-23 05:05:48 +00:00
|
|
|
#ifdef MALLOC_STATS
|
2007-03-23 22:58:15 +00:00
|
|
|
arena->stats.mapped -= chunksize;
|
2007-03-23 05:05:48 +00:00
|
|
|
#endif
|
2006-12-23 00:18:51 +00:00
|
|
|
}
|
2008-02-06 02:59:54 +00:00
|
|
|
|
|
|
|
/*
|
2008-07-18 19:35:44 +00:00
|
|
|
* Remove run from runs_avail, regardless of whether this chunk
|
2008-02-06 02:59:54 +00:00
|
|
|
* will be cached, so that the arena does not use it. Dirty page
|
2008-05-01 17:25:55 +00:00
|
|
|
* flushing only uses the chunks_dirty tree, so leaving this chunk in
|
|
|
|
* the chunks_* trees is sufficient for that purpose.
|
2008-02-06 02:59:54 +00:00
|
|
|
*/
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_avail_tree_remove(&arena->runs_avail,
|
|
|
|
&chunk->map[arena_chunk_header_npages]);
|
2008-02-06 02:59:54 +00:00
|
|
|
|
|
|
|
arena->spare = chunk;
|
2006-01-13 18:38:56 +00:00
|
|
|
}
|
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
static arena_run_t *
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_run_alloc(arena_t *arena, size_t size, bool large, bool zero)
|
2006-03-17 09:00:27 +00:00
|
|
|
{
|
|
|
|
arena_chunk_t *chunk;
|
2007-03-23 05:05:48 +00:00
|
|
|
arena_run_t *run;
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_chunk_map_t *mapelm, key;
|
1995-09-16 09:28:13 +00:00
|
|
|
|
2008-07-18 19:35:44 +00:00
|
|
|
assert(size <= arena_maxclass);
|
2008-09-10 14:27:34 +00:00
|
|
|
assert((size & PAGE_MASK) == 0);
|
2006-03-17 09:00:27 +00:00
|
|
|
|
2008-02-06 02:59:54 +00:00
|
|
|
/* Search the arena's chunks for the lowest best fit. */
|
2008-07-18 19:35:44 +00:00
|
|
|
key.bits = size | CHUNK_MAP_KEY;
|
|
|
|
mapelm = arena_avail_tree_nsearch(&arena->runs_avail, &key);
|
|
|
|
if (mapelm != NULL) {
|
|
|
|
arena_chunk_t *run_chunk = CHUNK_ADDR2BASE(mapelm);
|
|
|
|
size_t pageind = ((uintptr_t)mapelm - (uintptr_t)run_chunk->map)
|
|
|
|
/ sizeof(arena_chunk_map_t);
|
|
|
|
|
|
|
|
run = (arena_run_t *)((uintptr_t)run_chunk + (pageind
|
2008-09-10 14:27:34 +00:00
|
|
|
<< PAGE_SHIFT));
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_run_split(arena, run, size, large, zero);
|
2008-02-06 02:59:54 +00:00
|
|
|
return (run);
|
2006-01-13 18:38:56 +00:00
|
|
|
}
|
2006-03-17 09:00:27 +00:00
|
|
|
|
2007-06-15 22:00:16 +00:00
|
|
|
/*
|
|
|
|
* No usable runs. Create a new chunk from which to allocate the run.
|
|
|
|
*/
|
2007-03-23 05:05:48 +00:00
|
|
|
chunk = arena_chunk_alloc(arena);
|
|
|
|
if (chunk == NULL)
|
2006-03-17 09:00:27 +00:00
|
|
|
return (NULL);
|
2007-03-23 05:05:48 +00:00
|
|
|
run = (arena_run_t *)((uintptr_t)chunk + (arena_chunk_header_npages <<
|
2008-09-10 14:27:34 +00:00
|
|
|
PAGE_SHIFT));
|
2007-03-23 05:05:48 +00:00
|
|
|
/* Update page map. */
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_run_split(arena, run, size, large, zero);
|
2007-03-23 05:05:48 +00:00
|
|
|
return (run);
|
2006-01-13 18:38:56 +00:00
|
|
|
}
|
1995-09-16 09:28:13 +00:00
|
|
|
|
2010-02-28 22:57:13 +00:00
|
|
|
#ifdef MALLOC_DEBUG
|
|
|
|
static arena_chunk_t *
|
|
|
|
chunks_dirty_iter_cb(arena_chunk_tree_t *tree, arena_chunk_t *chunk, void *arg)
|
|
|
|
{
|
|
|
|
size_t *ndirty = (size_t *)arg;
|
|
|
|
|
|
|
|
assert(chunk->dirtied);
|
|
|
|
*ndirty += chunk->ndirty;
|
|
|
|
return (NULL);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
static void
|
2008-02-06 02:59:54 +00:00
|
|
|
arena_purge(arena_t *arena)
|
2006-01-13 18:38:56 +00:00
|
|
|
{
|
2006-03-17 09:00:27 +00:00
|
|
|
arena_chunk_t *chunk;
|
2008-05-01 17:25:55 +00:00
|
|
|
size_t i, npages;
|
2008-02-06 02:59:54 +00:00
|
|
|
#ifdef MALLOC_DEBUG
|
2008-07-18 19:35:44 +00:00
|
|
|
size_t ndirty = 0;
|
1995-09-16 09:28:13 +00:00
|
|
|
|
2010-02-28 22:57:13 +00:00
|
|
|
arena_chunk_tree_dirty_iter(&arena->chunks_dirty, NULL,
|
|
|
|
chunks_dirty_iter_cb, (void *)&ndirty);
|
2008-02-06 02:59:54 +00:00
|
|
|
assert(ndirty == arena->ndirty);
|
|
|
|
#endif
|
2010-01-31 23:16:10 +00:00
|
|
|
assert((arena->nactive >> opt_lg_dirty_mult) < arena->ndirty);
|
2008-02-06 02:59:54 +00:00
|
|
|
|
|
|
|
#ifdef MALLOC_STATS
|
|
|
|
arena->stats.npurge++;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Iterate downward through chunks until enough dirty memory has been
|
2008-05-01 17:25:55 +00:00
|
|
|
* purged. Terminate as soon as possible in order to minimize the
|
|
|
|
* number of system calls, even if a chunk has only been partially
|
2008-02-06 02:59:54 +00:00
|
|
|
* purged.
|
|
|
|
*/
|
2010-01-31 23:16:10 +00:00
|
|
|
|
|
|
|
while ((arena->nactive >> (opt_lg_dirty_mult + 1)) < arena->ndirty) {
|
2008-05-01 17:25:55 +00:00
|
|
|
chunk = arena_chunk_tree_dirty_last(&arena->chunks_dirty);
|
|
|
|
assert(chunk != NULL);
|
|
|
|
|
|
|
|
for (i = chunk_npages - 1; chunk->ndirty > 0; i--) {
|
|
|
|
assert(i >= arena_chunk_header_npages);
|
2008-07-18 19:35:44 +00:00
|
|
|
if (chunk->map[i].bits & CHUNK_MAP_DIRTY) {
|
|
|
|
chunk->map[i].bits ^= CHUNK_MAP_DIRTY;
|
2008-05-01 17:25:55 +00:00
|
|
|
/* Find adjacent dirty run(s). */
|
|
|
|
for (npages = 1; i > arena_chunk_header_npages
|
2008-07-18 19:35:44 +00:00
|
|
|
&& (chunk->map[i - 1].bits &
|
|
|
|
CHUNK_MAP_DIRTY); npages++) {
|
2008-05-01 17:25:55 +00:00
|
|
|
i--;
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[i].bits ^= CHUNK_MAP_DIRTY;
|
2008-05-01 17:25:55 +00:00
|
|
|
}
|
|
|
|
chunk->ndirty -= npages;
|
|
|
|
arena->ndirty -= npages;
|
|
|
|
|
|
|
|
madvise((void *)((uintptr_t)chunk + (i <<
|
2008-09-10 14:27:34 +00:00
|
|
|
PAGE_SHIFT)), (npages << PAGE_SHIFT),
|
2008-05-01 17:25:55 +00:00
|
|
|
MADV_FREE);
|
2008-02-06 02:59:54 +00:00
|
|
|
#ifdef MALLOC_STATS
|
2008-05-01 17:25:55 +00:00
|
|
|
arena->stats.nmadvise++;
|
|
|
|
arena->stats.purged += npages;
|
2008-02-06 02:59:54 +00:00
|
|
|
#endif
|
2010-01-31 23:16:10 +00:00
|
|
|
if ((arena->nactive >> (opt_lg_dirty_mult + 1))
|
|
|
|
>= arena->ndirty)
|
2008-05-01 17:25:55 +00:00
|
|
|
break;
|
2008-02-06 02:59:54 +00:00
|
|
|
}
|
|
|
|
}
|
2008-05-01 17:25:55 +00:00
|
|
|
|
|
|
|
if (chunk->ndirty == 0) {
|
|
|
|
arena_chunk_tree_dirty_remove(&arena->chunks_dirty,
|
|
|
|
chunk);
|
2010-01-31 23:16:10 +00:00
|
|
|
chunk->dirtied = false;
|
2008-05-01 17:25:55 +00:00
|
|
|
}
|
|
|
|
}
|
2008-02-06 02:59:54 +00:00
|
|
|
}
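The threshold arithmetic that drives arena_purge() can be isolated as below. This is a sketch of the two shift expressions used above, not allocator code: purging triggers once ndirty exceeds nactive >> lg_dirty_mult, and the loop purges down to nactive >> (lg_dirty_mult + 1), i.e. half the trigger ratio, so that small fluctuations do not immediately re-trigger a purge.

```c
#include <stddef.h>

/* Nonzero if the dirty-page count exceeds the purge trigger. */
int
should_purge(size_t nactive, size_t ndirty, int lg_dirty_mult)
{
	return (lg_dirty_mult >= 0 &&
	    (nactive >> lg_dirty_mult) < ndirty);
}

/* Dirty-page count that purging drives ndirty down to. */
size_t
purge_target(size_t nactive, int lg_dirty_mult)
{
	return (nactive >> (lg_dirty_mult + 1));
}
```

With hypothetical numbers, 1024 active pages and lg_dirty_mult of 5 give a trigger at 33 dirty pages and a purge target of 16.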
|
|
|
|
|
|
|
|
static void
|
|
|
|
arena_run_dalloc(arena_t *arena, arena_run_t *run, bool dirty)
|
|
|
|
{
|
|
|
|
arena_chunk_t *chunk;
|
|
|
|
size_t size, run_ind, run_pages;
|
|
|
|
|
|
|
|
chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(run);
|
2008-07-18 19:35:44 +00:00
|
|
|
run_ind = (size_t)(((uintptr_t)run - (uintptr_t)chunk)
|
2008-09-10 14:27:34 +00:00
|
|
|
>> PAGE_SHIFT);
|
2007-03-23 22:58:15 +00:00
|
|
|
assert(run_ind >= arena_chunk_header_npages);
|
2008-07-18 19:35:44 +00:00
|
|
|
assert(run_ind < chunk_npages);
|
|
|
|
if ((chunk->map[run_ind].bits & CHUNK_MAP_LARGE) != 0)
|
2008-09-10 14:27:34 +00:00
|
|
|
size = chunk->map[run_ind].bits & ~PAGE_MASK;
|
2008-07-18 19:35:44 +00:00
|
|
|
else
|
|
|
|
size = run->bin->run_size;
|
2008-09-10 14:27:34 +00:00
|
|
|
run_pages = (size >> PAGE_SHIFT);
|
2010-01-31 23:16:10 +00:00
|
|
|
arena->nactive -= run_pages;
|
1995-09-16 09:28:13 +00:00
|
|
|
|
2008-07-18 19:35:44 +00:00
|
|
|
/* Mark pages as unallocated in the chunk map. */
|
2008-02-06 02:59:54 +00:00
|
|
|
if (dirty) {
|
|
|
|
size_t i;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2008-02-06 02:59:54 +00:00
|
|
|
for (i = 0; i < run_pages; i++) {
|
2010-01-31 23:16:10 +00:00
|
|
|
/*
|
|
|
|
* When (dirty == true), *all* pages within the run
|
|
|
|
* need to have their dirty bits set, because only
|
|
|
|
* small runs can create a mixture of clean/dirty
|
|
|
|
* pages, but such runs are passed to this function
|
|
|
|
* with (dirty == false).
|
|
|
|
*/
|
2008-07-18 19:35:44 +00:00
|
|
|
assert((chunk->map[run_ind + i].bits & CHUNK_MAP_DIRTY)
|
|
|
|
== 0);
|
2010-01-31 23:16:10 +00:00
|
|
|
chunk->ndirty++;
|
|
|
|
arena->ndirty++;
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[run_ind + i].bits = CHUNK_MAP_DIRTY;
|
2008-02-06 02:59:54 +00:00
|
|
|
}
|
2008-07-18 19:35:44 +00:00
|
|
|
} else {
|
2008-02-06 02:59:54 +00:00
|
|
|
size_t i;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2008-02-06 02:59:54 +00:00
|
|
|
for (i = 0; i < run_pages; i++) {
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[run_ind + i].bits &= ~(CHUNK_MAP_LARGE |
|
|
|
|
CHUNK_MAP_ALLOCATED);
|
2008-02-06 02:59:54 +00:00
|
|
|
}
|
|
|
|
}
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[run_ind].bits = size | (chunk->map[run_ind].bits &
|
2010-01-31 23:16:10 +00:00
|
|
|
CHUNK_MAP_FLAGS_MASK);
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[run_ind+run_pages-1].bits = size |
|
2010-01-31 23:16:10 +00:00
|
|
|
(chunk->map[run_ind+run_pages-1].bits & CHUNK_MAP_FLAGS_MASK);
|
2007-03-23 05:05:48 +00:00
|
|
|
|
2008-02-06 02:59:54 +00:00
|
|
|
/* Try to coalesce forward. */
|
2008-07-18 19:35:44 +00:00
|
|
|
if (run_ind + run_pages < chunk_npages &&
|
|
|
|
(chunk->map[run_ind+run_pages].bits & CHUNK_MAP_ALLOCATED) == 0) {
|
|
|
|
size_t nrun_size = chunk->map[run_ind+run_pages].bits &
|
2008-09-10 14:27:34 +00:00
|
|
|
~PAGE_MASK;
|
2008-07-18 19:35:44 +00:00
|
|
|
|
2007-11-27 03:12:15 +00:00
|
|
|
/*
|
2008-07-18 19:35:44 +00:00
|
|
|
* Remove successor from runs_avail; the coalesced run is
|
|
|
|
* inserted later.
|
2007-11-27 03:12:15 +00:00
|
|
|
*/
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_avail_tree_remove(&arena->runs_avail,
|
|
|
|
&chunk->map[run_ind+run_pages]);
|
|
|
|
|
|
|
|
size += nrun_size;
|
2008-09-10 14:27:34 +00:00
|
|
|
run_pages = size >> PAGE_SHIFT;
|
2008-07-18 19:35:44 +00:00
|
|
|
|
2008-09-10 14:27:34 +00:00
|
|
|
assert((chunk->map[run_ind+run_pages-1].bits & ~PAGE_MASK)
|
2008-07-18 19:35:44 +00:00
|
|
|
== nrun_size);
|
|
|
|
chunk->map[run_ind].bits = size | (chunk->map[run_ind].bits &
|
2010-01-31 23:16:10 +00:00
|
|
|
CHUNK_MAP_FLAGS_MASK);
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[run_ind+run_pages-1].bits = size |
|
2010-01-31 23:16:10 +00:00
|
|
|
(chunk->map[run_ind+run_pages-1].bits &
|
|
|
|
CHUNK_MAP_FLAGS_MASK);
|
2007-03-23 05:05:48 +00:00
|
|
|
}
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2008-02-06 02:59:54 +00:00
|
|
|
/* Try to coalesce backward. */
|
2008-07-18 19:35:44 +00:00
|
|
|
if (run_ind > arena_chunk_header_npages && (chunk->map[run_ind-1].bits &
|
|
|
|
CHUNK_MAP_ALLOCATED) == 0) {
|
2008-09-10 14:27:34 +00:00
|
|
|
size_t prun_size = chunk->map[run_ind-1].bits & ~PAGE_MASK;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2008-09-10 14:27:34 +00:00
|
|
|
run_ind -= prun_size >> PAGE_SHIFT;
|
2008-02-06 02:59:54 +00:00
|
|
|
|
2008-07-18 19:35:44 +00:00
|
|
|
/*
|
|
|
|
* Remove predecessor from runs_avail; the coalesced run is
|
|
|
|
* inserted later.
|
|
|
|
*/
|
|
|
|
arena_avail_tree_remove(&arena->runs_avail,
|
|
|
|
&chunk->map[run_ind]);
|
|
|
|
|
|
|
|
size += prun_size;
|
2008-09-10 14:27:34 +00:00
|
|
|
run_pages = size >> PAGE_SHIFT;
|
2008-07-18 19:35:44 +00:00
|
|
|
|
2010-01-31 23:16:10 +00:00
|
|
|
assert((chunk->map[run_ind].bits & ~PAGE_MASK) == prun_size);
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[run_ind].bits = size | (chunk->map[run_ind].bits &
|
2010-01-31 23:16:10 +00:00
|
|
|
CHUNK_MAP_FLAGS_MASK);
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[run_ind+run_pages-1].bits = size |
|
2010-01-31 23:16:10 +00:00
|
|
|
(chunk->map[run_ind+run_pages-1].bits &
|
|
|
|
CHUNK_MAP_FLAGS_MASK);
|
2008-02-06 02:59:54 +00:00
|
|
|
}
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2008-07-18 19:35:44 +00:00
|
|
|
/* Insert into runs_avail, now that coalescing is complete. */
|
|
|
|
arena_avail_tree_insert(&arena->runs_avail, &chunk->map[run_ind]);
|
|
|
|
|
2010-01-31 23:16:10 +00:00
|
|
|
/*
|
|
|
|
* Deallocate chunk if it is now completely unused. The bit
|
|
|
|
* manipulation checks whether the first run is unallocated and extends
|
|
|
|
* to the end of the chunk.
|
|
|
|
*/
|
2008-09-10 14:27:34 +00:00
|
|
|
if ((chunk->map[arena_chunk_header_npages].bits & (~PAGE_MASK |
|
2008-07-18 19:35:44 +00:00
|
|
|
CHUNK_MAP_ALLOCATED)) == arena_maxclass)
|
2006-12-23 00:18:51 +00:00
|
|
|
arena_chunk_dealloc(arena, chunk);
|
2008-02-06 02:59:54 +00:00
|
|
|
|
2010-01-31 23:16:10 +00:00
|
|
|
/*
|
|
|
|
* It is okay to do dirty page processing even if the chunk was
|
|
|
|
* deallocated above, since in that case it is the spare. Waiting
|
|
|
|
* until after possible chunk deallocation to do dirty processing
|
|
|
|
* allows for an old spare to be fully deallocated, thus decreasing the
|
|
|
|
* chances of spuriously crossing the dirty page purging threshold.
|
|
|
|
*/
|
|
|
|
if (dirty) {
|
|
|
|
if (chunk->dirtied == false) {
|
|
|
|
arena_chunk_tree_dirty_insert(&arena->chunks_dirty,
|
|
|
|
chunk);
|
|
|
|
chunk->dirtied = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Enforce opt_lg_dirty_mult. */
|
|
|
|
if (opt_lg_dirty_mult >= 0 && (arena->nactive >>
|
|
|
|
opt_lg_dirty_mult) < arena->ndirty)
|
|
|
|
arena_purge(arena);
|
|
|
|
}
|
2008-02-06 02:59:54 +00:00
|
|
|
}
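The coalescing in arena_run_dalloc() above depends on the free-run size being mirrored in both the first and the last map entry of the run, so a neighbor can be absorbed from either end in O(1). A minimal sketch of that invariant, using a hypothetical map of page counts rather than the real packed bits, is:

```c
#include <stddef.h>

/*
 * map[i] holds the size (in pages, for simplicity) of a free run
 * whose first or last page is i; interior entries may go stale,
 * as they are never consulted.
 */
void
mark_free_run(size_t *map, size_t run_ind, size_t run_pages)
{
	map[run_ind] = run_pages;
	map[run_ind + run_pages - 1] = run_pages;
}

/* Absorb an immediately preceding free run, if any. */
size_t
coalesce_backward(size_t *map, size_t *run_ind, size_t run_pages)
{
	size_t prev = (*run_ind > 0) ? map[*run_ind - 1] : 0;

	if (prev != 0) {
		*run_ind -= prev;
		run_pages += prev;
		mark_free_run(map, *run_ind, run_pages);
	}
	return (run_pages);
}
```

The real code does the same bookkeeping with size-in-bytes stored in the map bits, masked by CHUNK_MAP_FLAGS_MASK.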
|
|
|
|
|
|
|
|
static void
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_run_trim_head(arena_t *arena, arena_chunk_t *chunk, arena_run_t *run,
|
|
|
|
size_t oldsize, size_t newsize)
|
2008-02-06 02:59:54 +00:00
|
|
|
{
|
2008-09-10 14:27:34 +00:00
|
|
|
size_t pageind = ((uintptr_t)run - (uintptr_t)chunk) >> PAGE_SHIFT;
|
|
|
|
size_t head_npages = (oldsize - newsize) >> PAGE_SHIFT;
|
2008-02-06 02:59:54 +00:00
|
|
|
|
|
|
|
assert(oldsize > newsize);
|
|
|
|
|
|
|
|
/*
|
2008-07-18 19:35:44 +00:00
|
|
|
* Update the chunk map so that arena_run_dalloc() can treat the
|
|
|
|
* leading run as separately allocated.
|
2008-02-06 02:59:54 +00:00
|
|
|
*/
|
2010-01-31 23:16:10 +00:00
|
|
|
assert((chunk->map[pageind].bits & CHUNK_MAP_DIRTY) == 0);
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[pageind].bits = (oldsize - newsize) | CHUNK_MAP_LARGE |
|
|
|
|
CHUNK_MAP_ALLOCATED;
|
2010-01-31 23:16:10 +00:00
|
|
|
assert((chunk->map[pageind+head_npages].bits & CHUNK_MAP_DIRTY) == 0);
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[pageind+head_npages].bits = newsize | CHUNK_MAP_LARGE |
|
|
|
|
CHUNK_MAP_ALLOCATED;
|
2008-02-06 02:59:54 +00:00
|
|
|
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_run_dalloc(arena, run, false);
|
2008-02-06 02:59:54 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_run_trim_tail(arena_t *arena, arena_chunk_t *chunk, arena_run_t *run,
|
|
|
|
size_t oldsize, size_t newsize, bool dirty)
|
2008-02-06 02:59:54 +00:00
|
|
|
{
|
2008-09-10 14:27:34 +00:00
|
|
|
size_t pageind = ((uintptr_t)run - (uintptr_t)chunk) >> PAGE_SHIFT;
|
|
|
|
size_t npages = newsize >> PAGE_SHIFT;
|
2008-02-06 02:59:54 +00:00
|
|
|
|
|
|
|
assert(oldsize > newsize);
|
|
|
|
|
|
|
|
/*
|
2008-07-18 19:35:44 +00:00
|
|
|
* Update the chunk map so that arena_run_dalloc() can treat the
|
|
|
|
* trailing run as separately allocated.
|
2008-02-06 02:59:54 +00:00
|
|
|
*/
|
2010-01-31 23:16:10 +00:00
|
|
|
assert((chunk->map[pageind].bits & CHUNK_MAP_DIRTY) == 0);
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[pageind].bits = newsize | CHUNK_MAP_LARGE |
|
|
|
|
CHUNK_MAP_ALLOCATED;
|
2010-01-31 23:16:10 +00:00
|
|
|
assert((chunk->map[pageind+npages].bits & CHUNK_MAP_DIRTY) == 0);
|
2008-07-18 19:35:44 +00:00
|
|
|
chunk->map[pageind+npages].bits = (oldsize - newsize) | CHUNK_MAP_LARGE
|
|
|
|
| CHUNK_MAP_ALLOCATED;
|
2008-02-06 02:59:54 +00:00
|
|
|
|
|
|
|
arena_run_dalloc(arena, (arena_run_t *)((uintptr_t)run + newsize),
|
|
|
|
dirty);
|
2006-03-17 09:00:27 +00:00
|
|
|
}
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
static arena_run_t *
|
2006-07-27 04:00:12 +00:00
|
|
|
arena_bin_nonfull_run_get(arena_t *arena, arena_bin_t *bin)
|
2006-03-17 09:00:27 +00:00
|
|
|
{
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_chunk_map_t *mapelm;
|
2006-03-17 09:00:27 +00:00
|
|
|
arena_run_t *run;
|
|
|
|
unsigned i, remainder;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
/* Look for a usable run. */
|
2008-07-18 19:35:44 +00:00
|
|
|
mapelm = arena_run_tree_first(&bin->runs);
|
|
|
|
if (mapelm != NULL) {
|
2010-01-31 23:16:10 +00:00
|
|
|
arena_chunk_t *chunk;
|
|
|
|
size_t pageind;
|
|
|
|
|
2006-03-20 04:05:05 +00:00
|
|
|
/* run is guaranteed to have available space. */
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_run_tree_remove(&bin->runs, mapelm);
|
2010-01-31 23:16:10 +00:00
|
|
|
|
|
|
|
chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(mapelm);
|
|
|
|
pageind = (((uintptr_t)mapelm - (uintptr_t)chunk->map) /
|
|
|
|
sizeof(arena_chunk_map_t));
|
|
|
|
run = (arena_run_t *)((uintptr_t)chunk + (uintptr_t)((pageind -
|
|
|
|
((mapelm->bits & CHUNK_MAP_PG_MASK) >> CHUNK_MAP_PG_SHIFT))
|
|
|
|
<< PAGE_SHIFT));
|
2007-03-28 19:55:07 +00:00
|
|
|
#ifdef MALLOC_STATS
|
|
|
|
bin->stats.reruns++;
|
|
|
|
#endif
|
2006-03-17 09:00:27 +00:00
|
|
|
return (run);
|
|
|
|
}
|
2006-03-26 23:37:25 +00:00
|
|
|
/* No existing runs have any space available. */
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
/* Allocate a new run. */
|
2008-07-18 19:35:44 +00:00
|
|
|
run = arena_run_alloc(arena, bin->run_size, false, false);
|
2006-03-17 09:00:27 +00:00
|
|
|
if (run == NULL)
|
|
|
|
return (NULL);
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
/* Initialize run internals. */
|
|
|
|
run->bin = bin;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2008-06-10 15:46:18 +00:00
|
|
|
for (i = 0; i < bin->regs_mask_nelms - 1; i++)
|
2006-03-17 09:00:27 +00:00
|
|
|
run->regs_mask[i] = UINT_MAX;
|
2010-01-31 23:16:10 +00:00
|
|
|
remainder = bin->nregs & ((1U << (LG_SIZEOF_INT + 3)) - 1);
|
2008-06-10 15:46:18 +00:00
|
|
|
if (remainder == 0)
|
|
|
|
run->regs_mask[i] = UINT_MAX;
|
|
|
|
else {
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
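The per-size-class floor on header overhead quoted in the commit message above follows from one bitmask bit per object. Ignoring the constant-size part of the run header (which is why the quoted figures of {7.1, 4.3, 2.2, 1.2}% are somewhat higher), the lower bound is 1 bit per 8*size bits:

```c
/*
 * Lower bound on small-run header overhead implied by one bitmask
 * bit per object: 1 bit / (8 * size) bits.  The constant part of
 * the run header is deliberately ignored here.
 */
double
min_header_overhead(unsigned size)
{
	return (1.0 / (8.0 * size));
}
```

This gives {6.25, 3.125, 1.56, 0.78}% for {2, 4, 8, 16}-byte objects, so the 0.5% bound is unattainable for those classes and must be relaxed, exactly as the message states; 32-byte objects (0.39%) are the first class that can meet it.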
|
|
|
/* The last element has spare bits that need to be unset. */
|
2010-01-31 23:16:10 +00:00
|
|
|
run->regs_mask[i] = (UINT_MAX >> ((1U << (LG_SIZEOF_INT + 3))
|
2006-03-17 09:00:27 +00:00
|
|
|
- remainder));
|
|
|
|
}
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
run->regs_minelm = 0;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
run->nfree = bin->nregs;
|
|
|
|
#ifdef MALLOC_DEBUG
|
|
|
|
run->magic = ARENA_RUN_MAGIC;
|
2006-01-13 18:38:56 +00:00
|
|
|
#endif
|
1995-09-16 09:28:13 +00:00
|
|
|
|
2006-01-13 18:38:56 +00:00
|
|
|
#ifdef MALLOC_STATS
|
2006-03-17 09:00:27 +00:00
|
|
|
bin->stats.nruns++;
|
|
|
|
bin->stats.curruns++;
|
|
|
|
if (bin->stats.curruns > bin->stats.highruns)
|
|
|
|
bin->stats.highruns = bin->stats.curruns;
|
2003-10-25 12:56:51 +00:00
|
|
|
#endif
|
2006-03-17 09:00:27 +00:00
|
|
|
return (run);
|
|
|
|
}
|
1996-09-23 19:26:39 +00:00
|
|
|
|
2006-03-26 23:37:25 +00:00
|
|
|
/* bin->runcur must have space available before this function is called. */
|
2006-03-17 09:00:27 +00:00
|
|
|
static inline void *
|
2006-07-27 04:00:12 +00:00
|
|
|
arena_bin_malloc_easy(arena_t *arena, arena_bin_t *bin, arena_run_t *run)
|
2006-03-17 09:00:27 +00:00
|
|
|
{
|
|
|
|
void *ret;
|
1995-10-08 18:44:20 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
assert(run->magic == ARENA_RUN_MAGIC);
|
2006-03-20 04:05:05 +00:00
|
|
|
assert(run->nfree > 0);
|
1995-10-08 18:44:20 +00:00
|
|
|
|
2006-04-04 03:51:47 +00:00
|
|
|
ret = arena_run_reg_alloc(run, bin);
|
|
|
|
assert(ret != NULL);
|
2006-03-17 09:00:27 +00:00
|
|
|
run->nfree--;
|
1996-09-23 19:26:39 +00:00
|
|
|
|
2006-01-13 18:38:56 +00:00
|
|
|
return (ret);
|
|
|
|
}
|
1996-09-23 19:26:39 +00:00
|
|
|
|
2006-03-26 23:37:25 +00:00
|
|
|
/* Re-fill bin->runcur, then call arena_bin_malloc_easy(). */
|
2006-01-13 18:38:56 +00:00
|
|
|
static void *
|
2006-07-27 04:00:12 +00:00
|
|
|
arena_bin_malloc_hard(arena_t *arena, arena_bin_t *bin)
|
2006-03-17 09:00:27 +00:00
|
|
|
{
|
|
|
|
|
2006-07-27 04:00:12 +00:00
|
|
|
bin->runcur = arena_bin_nonfull_run_get(arena, bin);
|
2006-03-17 09:00:27 +00:00
|
|
|
if (bin->runcur == NULL)
|
|
|
|
return (NULL);
|
|
|
|
assert(bin->runcur->magic == ARENA_RUN_MAGIC);
|
2006-03-24 22:13:49 +00:00
|
|
|
assert(bin->runcur->nfree > 0);
|
2006-03-17 09:00:27 +00:00
|
|
|
|
2006-07-27 04:00:12 +00:00
|
|
|
return (arena_bin_malloc_easy(arena, bin, bin->runcur));
|
2006-03-17 09:00:27 +00:00
|
|
|
}
|
|
|
|
|
2007-03-20 03:44:10 +00:00
|
|
|
/*
|
|
|
|
* Calculate bin->run_size such that it meets the following constraints:
|
|
|
|
*
|
|
|
|
* *) bin->run_size >= min_run_size
|
|
|
|
* *) bin->run_size <= arena_maxclass
|
|
|
|
* *) bin->run_size <= RUN_MAX_SMALL
|
2007-03-23 05:05:48 +00:00
|
|
|
* *) run header overhead <= RUN_MAX_OVRHD (or header overhead relaxed).
|
2010-01-31 23:16:10 +00:00
|
|
|
* *) run header size < PAGE_SIZE
|
2007-03-20 03:44:10 +00:00
|
|
|
*
|
|
|
|
* bin->nregs, bin->regs_mask_nelms, and bin->reg0_offset are
|
|
|
|
* also calculated here, since these settings are all interdependent.
|
|
|
|
*/
|
|
|
|
static size_t
|
|
|
|
arena_bin_run_size_calc(arena_bin_t *bin, size_t min_run_size)
|
|
|
|
{
|
|
|
|
size_t try_run_size, good_run_size;
|
2007-03-23 22:58:15 +00:00
|
|
|
unsigned good_nregs, good_mask_nelms, good_reg0_offset;
|
|
|
|
unsigned try_nregs, try_mask_nelms, try_reg0_offset;
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
|
2008-09-10 14:27:34 +00:00
|
|
|
assert(min_run_size >= PAGE_SIZE);
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
assert(min_run_size <= arena_maxclass);
|
|
|
|
assert(min_run_size <= RUN_MAX_SMALL);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Calculate known-valid settings before entering the run_size
|
|
|
|
* expansion loop, so that the first part of the loop always copies
|
|
|
|
* valid settings.
|
|
|
|
*
|
|
|
|
* The do..while loop iteratively reduces the number of regions until
|
|
|
|
* the run header and the regions no longer overlap. A closed formula
|
|
|
|
* would be quite messy, since there is an interdependency between the
|
|
|
|
* header's mask length and the number of regions.
|
|
|
|
*/
|
|
|
|
try_run_size = min_run_size;
|
|
|
|
try_nregs = ((try_run_size - sizeof(arena_run_t)) / bin->reg_size)
|
2007-12-18 05:27:57 +00:00
|
|
|
+ 1; /* Counter-act try_nregs-- in loop. */
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
do {
|
|
|
|
try_nregs--;
|
2010-01-31 23:16:10 +00:00
|
|
|
try_mask_nelms = (try_nregs >> (LG_SIZEOF_INT + 3)) +
|
|
|
|
((try_nregs & ((1U << (LG_SIZEOF_INT + 3)) - 1)) ? 1 : 0);
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
try_reg0_offset = try_run_size - (try_nregs * bin->reg_size);
|
|
|
|
} while (sizeof(arena_run_t) + (sizeof(unsigned) * (try_mask_nelms - 1))
|
|
|
|
> try_reg0_offset);
|
|
|
|
|
|
|
|
/* run_size expansion loop. */
|
|
|
|
do {
|
|
|
|
/*
|
|
|
|
* Copy valid settings before trying more aggressive settings.
|
|
|
|
*/
|
|
|
|
good_run_size = try_run_size;
|
|
|
|
good_nregs = try_nregs;
|
|
|
|
good_mask_nelms = try_mask_nelms;
|
|
|
|
good_reg0_offset = try_reg0_offset;
|
|
|
|
|
|
|
|
/* Try more aggressive settings. */
|
2008-09-10 14:27:34 +00:00
|
|
|
try_run_size += PAGE_SIZE;
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
try_nregs = ((try_run_size - sizeof(arena_run_t)) /
|
|
|
|
bin->reg_size) + 1; /* Counter-act try_nregs-- in loop. */
|
|
|
|
do {
|
|
|
|
try_nregs--;
|
2010-01-31 23:16:10 +00:00
|
|
|
try_mask_nelms = (try_nregs >> (LG_SIZEOF_INT + 3)) +
|
|
|
|
((try_nregs & ((1U << (LG_SIZEOF_INT + 3)) - 1)) ?
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
1 : 0);
|
|
|
|
try_reg0_offset = try_run_size - (try_nregs *
|
|
|
|
bin->reg_size);
|
|
|
|
} while (sizeof(arena_run_t) + (sizeof(unsigned) *
|
|
|
|
(try_mask_nelms - 1)) > try_reg0_offset);
|
|
|
|
} while (try_run_size <= arena_maxclass && try_run_size <= RUN_MAX_SMALL
|
2007-12-18 05:27:57 +00:00
|
|
|
&& RUN_MAX_OVRHD * (bin->reg_size << 3) > RUN_MAX_OVRHD_RELAX
|
2010-01-31 23:16:10 +00:00
|
|
|
&& (try_reg0_offset << RUN_BFP) > RUN_MAX_OVRHD * try_run_size
|
|
|
|
&& (sizeof(arena_run_t) + (sizeof(unsigned) * (try_mask_nelms - 1)))
|
|
|
|
< PAGE_SIZE);
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
|
|
|
|
assert(sizeof(arena_run_t) + (sizeof(unsigned) * (good_mask_nelms - 1))
|
|
|
|
<= good_reg0_offset);
|
2010-01-31 23:16:10 +00:00
|
|
|
assert((good_mask_nelms << (LG_SIZEOF_INT + 3)) >= good_nregs);
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
|
|
|
|
/* Copy final settings. */
|
|
|
|
bin->run_size = good_run_size;
|
|
|
|
bin->nregs = good_nregs;
|
|
|
|
bin->regs_mask_nelms = good_mask_nelms;
|
|
|
|
bin->reg0_offset = good_reg0_offset;
|
|
|
|
|
|
|
|
return (good_run_size);
|
|
|
|
}
|
|
|
|
|

#ifdef MALLOC_TCACHE
static inline void
tcache_event(tcache_t *tcache)
{

	if (tcache_gc_incr == 0)
		return;

	tcache->ev_cnt++;
	assert(tcache->ev_cnt <= tcache_gc_incr);
	if (tcache->ev_cnt >= tcache_gc_incr) {
		size_t binind = tcache->next_gc_bin;
		tcache_bin_t *tbin = tcache->tbins[binind];

		if (tbin != NULL) {
			if (tbin->high_water == 0) {
				/*
				 * This bin went completely unused for an
				 * entire GC cycle, so throw away the tbin.
				 */
				assert(tbin->ncached == 0);
				tcache_bin_destroy(tcache, tbin, binind);
				tcache->tbins[binind] = NULL;
			} else {
				if (tbin->low_water > 0) {
					/*
					 * Flush (ceiling) half of the objects
					 * below the low water mark.
					 */
					tcache_bin_flush(tbin, binind,
					    tbin->ncached - (tbin->low_water >>
					    1) - (tbin->low_water & 1));
				}
				tbin->low_water = tbin->ncached;
				tbin->high_water = tbin->ncached;
			}
		}

		tcache->next_gc_bin++;
		if (tcache->next_gc_bin == nbins)
			tcache->next_gc_bin = 0;
		tcache->ev_cnt = 0;
	}
}

static inline void *
tcache_bin_alloc(tcache_bin_t *tbin)
{

	if (tbin->ncached == 0)
		return (NULL);
	tbin->ncached--;
	if (tbin->ncached < tbin->low_water)
		tbin->low_water = tbin->ncached;
	return (tbin->slots[tbin->ncached]);
}
|
|
|
|
|
|
|
|
static void
|
2010-01-31 23:16:10 +00:00
|
|
|
tcache_bin_fill(tcache_t *tcache, tcache_bin_t *tbin, size_t binind)
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
{
|
|
|
|
arena_t *arena;
|
|
|
|
arena_bin_t *bin;
|
|
|
|
arena_run_t *run;
|
2010-01-31 23:16:10 +00:00
|
|
|
void *ptr;
|
|
|
|
unsigned i;
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
|
2010-01-31 23:16:10 +00:00
|
|
|
assert(tbin->ncached == 0);
|
|
|
|
|
|
|
|
arena = tcache->arena;
|
|
|
|
bin = &arena->bins[binind];
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
malloc_spin_lock(&arena->lock);
|
2010-01-31 23:16:10 +00:00
|
|
|
for (i = 0; i < (tcache_nslots >> 1); i++) {
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
if ((run = bin->runcur) != NULL && run->nfree > 0)
|
2010-01-31 23:16:10 +00:00
|
|
|
ptr = arena_bin_malloc_easy(arena, bin, run);
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
else
|
2010-01-31 23:16:10 +00:00
|
|
|
ptr = arena_bin_malloc_hard(arena, bin);
|
|
|
|
if (ptr == NULL)
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
break;
|
2010-01-31 23:16:10 +00:00
|
|
|
/*
|
|
|
|
* Fill tbin such that the objects lowest in memory are used
|
|
|
|
* first.
|
|
|
|
*/
|
|
|
|
tbin->slots[(tcache_nslots >> 1) - 1 - i] = ptr;
|
	}
#ifdef MALLOC_STATS
	bin->stats.nfills++;
	bin->stats.nrequests += tbin->tstats.nrequests;
	if (bin->reg_size <= small_maxclass) {
		arena->stats.nmalloc_small += (i - tbin->ncached);
		arena->stats.allocated_small += (i - tbin->ncached) *
		    bin->reg_size;
		arena->stats.nmalloc_small += tbin->tstats.nrequests;
	} else {
		arena->stats.nmalloc_medium += (i - tbin->ncached);
		arena->stats.allocated_medium += (i - tbin->ncached) *
		    bin->reg_size;
		arena->stats.nmalloc_medium += tbin->tstats.nrequests;
	}
	tbin->tstats.nrequests = 0;
#endif
	malloc_spin_unlock(&arena->lock);
	tbin->ncached = i;
	if (tbin->ncached > tbin->high_water)
		tbin->high_water = tbin->ncached;
}

static inline void *
tcache_alloc(tcache_t *tcache, size_t size, bool zero)
{
	void *ret;
	tcache_bin_t *tbin;
	size_t binind;

	if (size <= small_maxclass)
		binind = small_size2bin[size];
	else {
		binind = mbin0 + ((MEDIUM_CEILING(size) - medium_min) >>
		    lg_mspace);
	}
	assert(binind < nbins);
	tbin = tcache->tbins[binind];
	if (tbin == NULL) {
		tbin = tcache_bin_create(tcache->arena);
		if (tbin == NULL)
			return (NULL);
		tcache->tbins[binind] = tbin;
	}

	ret = tcache_bin_alloc(tbin);
	if (ret == NULL) {
		ret = tcache_alloc_hard(tcache, tbin, binind);
		if (ret == NULL)
			return (NULL);
	}

	if (zero == false) {
		if (opt_junk)
			memset(ret, 0xa5, size);
		else if (opt_zero)
			memset(ret, 0, size);
	} else
		memset(ret, 0, size);

#ifdef MALLOC_STATS
	tbin->tstats.nrequests++;
#endif
	tcache_event(tcache);
	return (ret);
}

static void *
tcache_alloc_hard(tcache_t *tcache, tcache_bin_t *tbin, size_t binind)
{
	void *ret;

	tcache_bin_fill(tcache, tbin, binind);
	ret = tcache_bin_alloc(tbin);
	return (ret);
}
#endif

static inline void *
arena_malloc_small(arena_t *arena, size_t size, bool zero)
{
	void *ret;
	arena_bin_t *bin;
	arena_run_t *run;
	size_t binind;

	binind = small_size2bin[size];
	assert(binind < mbin0);
	bin = &arena->bins[binind];
	size = bin->reg_size;

	malloc_spin_lock(&arena->lock);
	if ((run = bin->runcur) != NULL && run->nfree > 0)
		ret = arena_bin_malloc_easy(arena, bin, run);
	else
		ret = arena_bin_malloc_hard(arena, bin);

	if (ret == NULL) {
		malloc_spin_unlock(&arena->lock);
		return (NULL);
	}

#ifdef MALLOC_STATS
# ifdef MALLOC_TCACHE
	if (__isthreaded == false) {
# endif
		bin->stats.nrequests++;
		arena->stats.nmalloc_small++;
# ifdef MALLOC_TCACHE
	}
# endif
	arena->stats.allocated_small += size;
#endif
	malloc_spin_unlock(&arena->lock);

	if (zero == false) {
		if (opt_junk)
			memset(ret, 0xa5, size);
		else if (opt_zero)
			memset(ret, 0, size);
	} else
		memset(ret, 0, size);

	return (ret);
}

static void *
arena_malloc_medium(arena_t *arena, size_t size, bool zero)
{
	void *ret;
	arena_bin_t *bin;
	arena_run_t *run;
	size_t binind;

	size = MEDIUM_CEILING(size);
	binind = mbin0 + ((size - medium_min) >> lg_mspace);
	assert(binind < nbins);
	bin = &arena->bins[binind];
	assert(bin->reg_size == size);

	malloc_spin_lock(&arena->lock);
	if ((run = bin->runcur) != NULL && run->nfree > 0)
		ret = arena_bin_malloc_easy(arena, bin, run);
	else
		ret = arena_bin_malloc_hard(arena, bin);

	if (ret == NULL) {
		malloc_spin_unlock(&arena->lock);
		return (NULL);
	}

#ifdef MALLOC_STATS
# ifdef MALLOC_TCACHE
	if (__isthreaded == false) {
# endif
		bin->stats.nrequests++;
		arena->stats.nmalloc_medium++;
# ifdef MALLOC_TCACHE
	}
# endif
	arena->stats.allocated_medium += size;
#endif
	malloc_spin_unlock(&arena->lock);

	if (zero == false) {
		if (opt_junk)
			memset(ret, 0xa5, size);
		else if (opt_zero)
			memset(ret, 0, size);
	} else
		memset(ret, 0, size);

	return (ret);
}

static void *
arena_malloc_large(arena_t *arena, size_t size, bool zero)
{
	void *ret;

	/* Large allocation. */
	size = PAGE_CEILING(size);
	malloc_spin_lock(&arena->lock);
	ret = (void *)arena_run_alloc(arena, size, true, zero);
	if (ret == NULL) {
		malloc_spin_unlock(&arena->lock);
		return (NULL);
	}
#ifdef MALLOC_STATS
	arena->stats.nmalloc_large++;
	arena->stats.allocated_large += size;
	arena->stats.lstats[(size >> PAGE_SHIFT) - 1].nrequests++;
	arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns++;
	if (arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns >
	    arena->stats.lstats[(size >> PAGE_SHIFT) - 1].highruns) {
		arena->stats.lstats[(size >> PAGE_SHIFT) - 1].highruns =
		    arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns;
	}
#endif
	malloc_spin_unlock(&arena->lock);

	if (zero == false) {
		if (opt_junk)
			memset(ret, 0xa5, size);
		else if (opt_zero)
			memset(ret, 0, size);
	}

	return (ret);
}

static inline void *
arena_malloc(size_t size, bool zero)
{

	assert(size != 0);
	assert(QUANTUM_CEILING(size) <= arena_maxclass);

	if (size <= bin_maxclass) {
#ifdef MALLOC_TCACHE
		if (__isthreaded && tcache_nslots) {
			tcache_t *tcache = tcache_tls;
			if ((uintptr_t)tcache > (uintptr_t)1)
				return (tcache_alloc(tcache, size, zero));
			else if (tcache == NULL) {
				tcache = tcache_create(choose_arena());
				if (tcache == NULL)
					return (NULL);
				return (tcache_alloc(tcache, size, zero));
			}
		}
#endif
		if (size <= small_maxclass) {
			return (arena_malloc_small(choose_arena(), size,
			    zero));
		} else {
			return (arena_malloc_medium(choose_arena(),
			    size, zero));
		}
	} else
		return (arena_malloc_large(choose_arena(), size, zero));
}

static inline void *
imalloc(size_t size)
{

	assert(size != 0);

	if (size <= arena_maxclass)
		return (arena_malloc(size, false));
	else
		return (huge_malloc(size, false));
}

static inline void *
icalloc(size_t size)
{

	if (size <= arena_maxclass)
		return (arena_malloc(size, true));
	else
		return (huge_malloc(size, true));
}

/* Only handles large allocations that require more than page alignment. */
static void *
arena_palloc(arena_t *arena, size_t alignment, size_t size, size_t alloc_size)
{
	void *ret;
	size_t offset;
	arena_chunk_t *chunk;

	assert((size & PAGE_MASK) == 0);
	assert((alignment & PAGE_MASK) == 0);

	malloc_spin_lock(&arena->lock);
	ret = (void *)arena_run_alloc(arena, alloc_size, true, false);
	if (ret == NULL) {
		malloc_spin_unlock(&arena->lock);
		return (NULL);
	}

	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ret);

	offset = (uintptr_t)ret & (alignment - 1);
	assert((offset & PAGE_MASK) == 0);
	assert(offset < alloc_size);
	if (offset == 0)
		arena_run_trim_tail(arena, chunk, ret, alloc_size, size, false);
	else {
		size_t leadsize, trailsize;

		leadsize = alignment - offset;
		if (leadsize > 0) {
			arena_run_trim_head(arena, chunk, ret, alloc_size,
			    alloc_size - leadsize);
			ret = (void *)((uintptr_t)ret + leadsize);
		}

		trailsize = alloc_size - leadsize - size;
		if (trailsize != 0) {
			/* Trim trailing space. */
			assert(trailsize < alloc_size);
			arena_run_trim_tail(arena, chunk, ret, size + trailsize,
			    size, false);
		}
	}

#ifdef MALLOC_STATS
	arena->stats.nmalloc_large++;
	arena->stats.allocated_large += size;
	arena->stats.lstats[(size >> PAGE_SHIFT) - 1].nrequests++;
	arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns++;
	if (arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns >
	    arena->stats.lstats[(size >> PAGE_SHIFT) - 1].highruns) {
		arena->stats.lstats[(size >> PAGE_SHIFT) - 1].highruns =
		    arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns;
	}
#endif
	malloc_spin_unlock(&arena->lock);

	if (opt_junk)
		memset(ret, 0xa5, size);
	else if (opt_zero)
		memset(ret, 0, size);
	return (ret);
}

static inline void *
ipalloc(size_t alignment, size_t size)
{
	void *ret;
	size_t ceil_size;

	/*
	 * Round size up to the nearest multiple of alignment.
	 *
	 * This done, we can take advantage of the fact that for each small
	 * size class, every object is aligned at the smallest power of two
	 * that is non-zero in the base two representation of the size.  For
	 * example:
	 *
	 *   Size |   Base 2 | Minimum alignment
	 *   -----+----------+------------------
	 *     96 |  1100000 |                32
	 *    144 | 10100000 |                32
	 *    192 | 11000000 |                64
	 *
	 * Depending on runtime settings, it is possible that arena_malloc()
	 * will further round up to a power of two, but that never causes
	 * correctness issues.
	 */
	ceil_size = (size + (alignment - 1)) & (-alignment);
	/*
	 * (ceil_size < size) protects against the combination of maximal
	 * alignment and size greater than maximal alignment.
	 */
	if (ceil_size < size) {
		/* size_t overflow. */
		return (NULL);
	}

	if (ceil_size <= PAGE_SIZE || (alignment <= PAGE_SIZE
	    && ceil_size <= arena_maxclass))
		ret = arena_malloc(ceil_size, false);
	else {
		size_t run_size;

		/*
|
|
|
* We can't achieve subpage alignment, so round up alignment
|
2008-02-08 00:35:56 +00:00
|
|
|
* permanently; it makes later calculations simpler.
|
|
|
|
*/
|
|
|
|
alignment = PAGE_CEILING(alignment);
|
|
|
|
ceil_size = PAGE_CEILING(size);
|
|
|
|
/*
|
|
|
|
* (ceil_size < size) protects against very large sizes within
|
2008-09-10 14:27:34 +00:00
|
|
|
* PAGE_SIZE of SIZE_T_MAX.
|
2008-02-08 00:35:56 +00:00
|
|
|
*
|
|
|
|
* (ceil_size + alignment < ceil_size) protects against the
|
|
|
|
* combination of maximal alignment and ceil_size large enough
|
|
|
|
* to cause overflow. This is similar to the first overflow
|
|
|
|
* check above, but it needs to be repeated due to the new
|
|
|
|
* ceil_size value, which may now be *equal* to maximal
|
|
|
|
* alignment, whereas before we only detected overflow if the
|
|
|
|
* original size was *greater* than maximal alignment.
|
|
|
|
*/
|
|
|
|
if (ceil_size < size || ceil_size + alignment < ceil_size) {
|
|
|
|
/* size_t overflow. */
|
|
|
|
return (NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Calculate the size of the over-size run that arena_palloc()
|
|
|
|
* would need to allocate in order to guarantee the alignment.
|
|
|
|
*/
|
|
|
|
if (ceil_size >= alignment)
|
2008-09-10 14:27:34 +00:00
|
|
|
run_size = ceil_size + alignment - PAGE_SIZE;
|
2008-02-08 00:35:56 +00:00
|
|
|
else {
|
|
|
|
/*
|
|
|
|
* It is possible that (alignment << 1) will cause
|
|
|
|
* overflow, but it doesn't matter because we also
|
2008-09-10 14:27:34 +00:00
|
|
|
* subtract PAGE_SIZE, which in the case of overflow
|
2008-02-08 00:35:56 +00:00
|
|
|
* leaves us with a very large run_size. That causes
|
|
|
|
* the first conditional below to fail, which means
|
|
|
|
* that the bogus run_size value never gets used for
|
|
|
|
* anything important.
|
|
|
|
*/
|
2008-09-10 14:27:34 +00:00
|
|
|
run_size = (alignment << 1) - PAGE_SIZE;
|
2008-02-08 00:35:56 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if (run_size <= arena_maxclass) {
|
|
|
|
ret = arena_palloc(choose_arena(), alignment, ceil_size,
|
|
|
|
run_size);
|
|
|
|
} else if (alignment <= chunksize)
|
|
|
|
ret = huge_malloc(ceil_size, false);
|
|
|
|
else
|
|
|
|
ret = huge_palloc(alignment, ceil_size);
|
|
|
|
}
|
|
|
|
|
|
|
|
assert(((uintptr_t)ret & (alignment - 1)) == 0);
|
|
|
|
return (ret);
|
|
|
|
}
|
|
|
|
|
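The two overflow guards above can be distilled into a standalone sketch. This is a hypothetical helper (`aligned_run_size` is not part of the allocator), assuming a 4 KiB page; a zero return stands in for the `NULL` return above:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE ((size_t)4096)
#define PAGE_CEILING(s) (((s) + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1))

/*
 * Return the padded run size needed to guarantee `alignment` for a
 * request of `size` bytes, or 0 on size_t overflow.
 */
size_t
aligned_run_size(size_t size, size_t alignment)
{
	size_t ceil_size = PAGE_CEILING(size);

	/* Guard both the page rounding and the padding addition. */
	if (ceil_size < size || ceil_size + alignment < ceil_size)
		return (0);	/* size_t overflow. */
	if (ceil_size >= alignment)
		return (ceil_size + alignment - PAGE_SIZE);
	/* Overflow of (alignment << 1) is benign, as explained above. */
	return ((alignment << 1) - PAGE_SIZE);
}
```

For example, a 4096-byte request with 8192-byte alignment needs a 12288-byte run: any 12 KiB region must contain an 8 KiB-aligned 4 KiB subregion.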
static bool
arena_is_large(const void *ptr)
{
	arena_chunk_t *chunk;
	size_t pageind, mapbits;

	assert(ptr != NULL);
	assert(CHUNK_ADDR2BASE(ptr) != ptr);

	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
	pageind = (((uintptr_t)ptr - (uintptr_t)chunk) >> PAGE_SHIFT);
	mapbits = chunk->map[pageind].bits;
	assert((mapbits & CHUNK_MAP_ALLOCATED) != 0);
	return ((mapbits & CHUNK_MAP_LARGE) != 0);
}

/* Return the size of the allocation pointed to by ptr. */
static size_t
arena_salloc(const void *ptr)
{
	size_t ret;
	arena_chunk_t *chunk;
	size_t pageind, mapbits;

	assert(ptr != NULL);
	assert(CHUNK_ADDR2BASE(ptr) != ptr);

	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
	pageind = (((uintptr_t)ptr - (uintptr_t)chunk) >> PAGE_SHIFT);
	mapbits = chunk->map[pageind].bits;
	assert((mapbits & CHUNK_MAP_ALLOCATED) != 0);
	if ((mapbits & CHUNK_MAP_LARGE) == 0) {
		arena_run_t *run = (arena_run_t *)((uintptr_t)chunk +
		    (uintptr_t)((pageind - ((mapbits & CHUNK_MAP_PG_MASK) >>
		    CHUNK_MAP_PG_SHIFT)) << PAGE_SHIFT));
		assert(run->magic == ARENA_RUN_MAGIC);
		ret = run->bin->reg_size;
	} else {
		ret = mapbits & ~PAGE_MASK;
		assert(ret != 0);
	}

	return (ret);
}

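The small/large dispatch above keys off per-page map bits: large runs store their size directly in the high bits of the first page's map entry, while small regions must be resolved through the run header. A minimal sketch of the large-size decoding, using hypothetical `MAP_*` constants in place of the real `CHUNK_MAP_*` layout (the actual bit assignments are an assumption here):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE  ((size_t)4096)
#define TOY_PAGE_MASK  (TOY_PAGE_SIZE - 1)
#define MAP_ALLOCATED_ ((size_t)0x01)	/* hypothetical flag bits */
#define MAP_LARGE_     ((size_t)0x02)

/*
 * For a large run, the size occupies the page-aligned high bits of the
 * map entry; return 0 for small (the caller must consult the run's bin).
 */
size_t
large_size_from_mapbits(size_t mapbits)
{
	assert(mapbits & MAP_ALLOCATED_);
	if ((mapbits & MAP_LARGE_) == 0)
		return (0);
	return (mapbits & ~TOY_PAGE_MASK);
}
```

Packing flags into the low, page-offset bits and the size into the high bits lets one word per page answer both "is this large?" and "how large?" without touching the run header.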
static inline size_t
isalloc(const void *ptr)
{
	size_t ret;
	arena_chunk_t *chunk;

	assert(ptr != NULL);

	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
	if (chunk != ptr) {
		/* Region. */
		assert(chunk->arena->magic == ARENA_MAGIC);

		ret = arena_salloc(ptr);
	} else {
		extent_node_t *node, key;

		/* Chunk (huge allocation). */

		malloc_mutex_lock(&huge_mtx);

		/* Extract from tree of huge allocations. */
		key.addr = __DECONST(void *, ptr);
		node = extent_tree_ad_search(&huge, &key);
		assert(node != NULL);

		ret = node->size;

		malloc_mutex_unlock(&huge_mtx);
	}

	return (ret);
}

static inline void
arena_dalloc_bin(arena_t *arena, arena_chunk_t *chunk, void *ptr,
    arena_chunk_map_t *mapelm)
{
	size_t pageind;
	arena_run_t *run;
	arena_bin_t *bin;
	size_t size;

	pageind = (((uintptr_t)ptr - (uintptr_t)chunk) >> PAGE_SHIFT);
	run = (arena_run_t *)((uintptr_t)chunk + (uintptr_t)((pageind -
	    ((mapelm->bits & CHUNK_MAP_PG_MASK) >> CHUNK_MAP_PG_SHIFT)) <<
	    PAGE_SHIFT));
	assert(run->magic == ARENA_RUN_MAGIC);
	bin = run->bin;
	size = bin->reg_size;

	if (opt_junk)
		memset(ptr, 0x5a, size);

	arena_run_reg_dalloc(run, bin, ptr, size);
	run->nfree++;

	if (run->nfree == bin->nregs)
		arena_dalloc_bin_run(arena, chunk, run, bin);
	else if (run->nfree == 1 && run != bin->runcur) {
		/*
		 * Make sure that bin->runcur always refers to the lowest
		 * non-full run, if one exists.
		 */
		if (bin->runcur == NULL)
			bin->runcur = run;
		else if ((uintptr_t)run < (uintptr_t)bin->runcur) {
			/* Switch runcur. */
			if (bin->runcur->nfree > 0) {
				arena_chunk_t *runcur_chunk =
				    CHUNK_ADDR2BASE(bin->runcur);
				size_t runcur_pageind =
				    (((uintptr_t)bin->runcur -
				    (uintptr_t)runcur_chunk)) >> PAGE_SHIFT;
				arena_chunk_map_t *runcur_mapelm =
				    &runcur_chunk->map[runcur_pageind];

				/* Insert runcur. */
				arena_run_tree_insert(&bin->runs,
				    runcur_mapelm);
			}
			bin->runcur = run;
		} else {
			size_t run_pageind = (((uintptr_t)run -
			    (uintptr_t)chunk)) >> PAGE_SHIFT;
			arena_chunk_map_t *run_mapelm =
			    &chunk->map[run_pageind];

			assert(arena_run_tree_search(&bin->runs, run_mapelm) ==
			    NULL);
			arena_run_tree_insert(&bin->runs, run_mapelm);
		}
	}

#ifdef MALLOC_STATS
	if (size <= small_maxclass) {
		arena->stats.allocated_small -= size;
		arena->stats.ndalloc_small++;
	} else {
		arena->stats.allocated_medium -= size;
		arena->stats.ndalloc_medium++;
	}
#endif
}

static void
arena_dalloc_bin_run(arena_t *arena, arena_chunk_t *chunk, arena_run_t *run,
    arena_bin_t *bin)
{
	size_t run_ind;

	/* Deallocate run. */
	if (run == bin->runcur)
		bin->runcur = NULL;
	else if (bin->nregs != 1) {
		size_t run_pageind = (((uintptr_t)run -
		    (uintptr_t)chunk)) >> PAGE_SHIFT;
		arena_chunk_map_t *run_mapelm =
		    &chunk->map[run_pageind];
		/*
		 * This block's conditional is necessary because if the
		 * run only contains one region, then it never gets
		 * inserted into the non-full runs tree.
		 */
		arena_run_tree_remove(&bin->runs, run_mapelm);
	}
	/*
	 * Mark the first page as dirty.  The dirty bit for every other page in
	 * the run is already properly set, which means we can call
	 * arena_run_dalloc(..., false), thus potentially avoiding the needless
	 * creation of many dirty pages.
	 */
	run_ind = (size_t)(((uintptr_t)run - (uintptr_t)chunk) >> PAGE_SHIFT);
	assert((chunk->map[run_ind].bits & CHUNK_MAP_DIRTY) == 0);
	chunk->map[run_ind].bits |= CHUNK_MAP_DIRTY;
	chunk->ndirty++;
	arena->ndirty++;

#ifdef MALLOC_DEBUG
	run->magic = 0;
#endif
	arena_run_dalloc(arena, run, false);
#ifdef MALLOC_STATS
	bin->stats.curruns--;
#endif

	if (chunk->dirtied == false) {
		arena_chunk_tree_dirty_insert(&arena->chunks_dirty, chunk);
		chunk->dirtied = true;
	}
	/* Enforce opt_lg_dirty_mult. */
	if (opt_lg_dirty_mult >= 0 && (arena->nactive >> opt_lg_dirty_mult) <
	    arena->ndirty)
		arena_purge(arena);
}

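The final check enforces the dirty-page ratio: purge once dirty pages exceed active pages scaled down by 2^opt_lg_dirty_mult, with a negative exponent disabling purging. A standalone restatement of just that predicate (hypothetical helper name, not part of the allocator):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Purge policy sketch: purge when ndirty > nactive / 2^lg_dirty_mult.
 * lg_dirty_mult < 0 disables purging entirely.
 */
bool
should_purge(ssize_t lg_dirty_mult, size_t nactive, size_t ndirty)
{
	return (lg_dirty_mult >= 0 &&
	    (nactive >> lg_dirty_mult) < ndirty);
}
```

With lg_dirty_mult = 3, for instance, an arena with 1024 active pages tolerates up to 128 dirty pages before the next deallocation triggers a purge.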
#ifdef MALLOC_STATS
static void
arena_stats_print(arena_t *arena)
{

	malloc_printf("dirty pages: %zu:%zu active:dirty, %"PRIu64" sweep%s,"
	    " %"PRIu64" madvise%s, %"PRIu64" purged\n",
	    arena->nactive, arena->ndirty,
	    arena->stats.npurge, arena->stats.npurge == 1 ? "" : "s",
	    arena->stats.nmadvise, arena->stats.nmadvise == 1 ? "" : "s",
	    arena->stats.purged);

	malloc_printf("            allocated      nmalloc      ndalloc\n");
	malloc_printf("small:   %12zu %12"PRIu64" %12"PRIu64"\n",
	    arena->stats.allocated_small, arena->stats.nmalloc_small,
	    arena->stats.ndalloc_small);
	malloc_printf("medium:  %12zu %12"PRIu64" %12"PRIu64"\n",
	    arena->stats.allocated_medium, arena->stats.nmalloc_medium,
	    arena->stats.ndalloc_medium);
	malloc_printf("large:   %12zu %12"PRIu64" %12"PRIu64"\n",
	    arena->stats.allocated_large, arena->stats.nmalloc_large,
	    arena->stats.ndalloc_large);
	malloc_printf("total:   %12zu %12"PRIu64" %12"PRIu64"\n",
	    arena->stats.allocated_small + arena->stats.allocated_medium +
	    arena->stats.allocated_large, arena->stats.nmalloc_small +
	    arena->stats.nmalloc_medium + arena->stats.nmalloc_large,
	    arena->stats.ndalloc_small + arena->stats.ndalloc_medium +
	    arena->stats.ndalloc_large);
	malloc_printf("mapped:  %12zu\n", arena->stats.mapped);

	if (arena->stats.nmalloc_small + arena->stats.nmalloc_medium > 0) {
		unsigned i, gap_start;
#ifdef MALLOC_TCACHE
		malloc_printf("bins:     bin    size regs pgs  requests    "
		    "nfills  nflushes   newruns    reruns maxruns curruns\n");
#else
		malloc_printf("bins:     bin    size regs pgs  requests   "
		    "newruns    reruns maxruns curruns\n");
#endif
		for (i = 0, gap_start = UINT_MAX; i < nbins; i++) {
			if (arena->bins[i].stats.nruns == 0) {
				if (gap_start == UINT_MAX)
					gap_start = i;
			} else {
				if (gap_start != UINT_MAX) {
					if (i > gap_start + 1) {
						/*
						 * Gap of more than one size
						 * class.
						 */
						malloc_printf("[%u..%u]\n",
						    gap_start, i - 1);
					} else {
						/* Gap of one size class. */
						malloc_printf("[%u]\n",
						    gap_start);
					}
					gap_start = UINT_MAX;
				}
				malloc_printf(
				    "%13u %1s %5u %4u %3u %9"PRIu64" %9"PRIu64
#ifdef MALLOC_TCACHE
				    " %9"PRIu64" %9"PRIu64
#endif
				    " %9"PRIu64" %7zu %7zu\n",
				    i,
				    i < ntbins ? "T" : i < ntbins + nqbins ?
				    "Q" : i < ntbins + nqbins + ncbins ? "C" :
				    i < ntbins + nqbins + ncbins + nsbins ? "S"
				    : "M",
				    arena->bins[i].reg_size,
				    arena->bins[i].nregs,
				    arena->bins[i].run_size >> PAGE_SHIFT,
				    arena->bins[i].stats.nrequests,
#ifdef MALLOC_TCACHE
				    arena->bins[i].stats.nfills,
				    arena->bins[i].stats.nflushes,
#endif
				    arena->bins[i].stats.nruns,
				    arena->bins[i].stats.reruns,
				    arena->bins[i].stats.highruns,
				    arena->bins[i].stats.curruns);
			}
		}
		if (gap_start != UINT_MAX) {
			if (i > gap_start + 1) {
				/* Gap of more than one size class. */
				malloc_printf("[%u..%u]\n", gap_start, i - 1);
			} else {
				/* Gap of one size class. */
				malloc_printf("[%u]\n", gap_start);
			}
		}
	}

	if (arena->stats.nmalloc_large > 0) {
		size_t i;
		ssize_t gap_start;
		size_t nlclasses = (chunksize - PAGE_SIZE) >> PAGE_SHIFT;

		malloc_printf(
		    "large:    size pages  nrequests    maxruns    curruns\n");

		for (i = 0, gap_start = -1; i < nlclasses; i++) {
			if (arena->stats.lstats[i].nrequests == 0) {
				if (gap_start == -1)
					gap_start = i;
			} else {
				if (gap_start != -1) {
					malloc_printf("[%zu]\n", i - gap_start);
					gap_start = -1;
				}
				malloc_printf(
				    "%13zu %5zu %9"PRIu64" %9zu %9zu\n",
				    (i+1) << PAGE_SHIFT, i+1,
				    arena->stats.lstats[i].nrequests,
				    arena->stats.lstats[i].highruns,
				    arena->stats.lstats[i].curruns);
			}
		}
		if (gap_start != -1)
			malloc_printf("[%zu]\n", i - gap_start);
	}
}
#endif

static void
stats_print_atexit(void)
{

#if (defined(MALLOC_TCACHE) && defined(MALLOC_STATS))
	unsigned i;

	/*
	 * Merge stats from extant threads.  This is racy, since individual
	 * threads do not lock when recording tcache stats events.  As a
	 * consequence, the final stats may be slightly out of date by the time
	 * they are reported, if other threads continue to allocate.
	 */
	for (i = 0; i < narenas; i++) {
		arena_t *arena = arenas[i];
		if (arena != NULL) {
			tcache_t *tcache;

			malloc_spin_lock(&arena->lock);
			ql_foreach(tcache, &arena->tcache_ql, link) {
				tcache_stats_merge(tcache, arena);
			}
			malloc_spin_unlock(&arena->lock);
		}
	}
#endif
	malloc_stats_print();
}

/*
 * Add thread-specific caching for small size classes, based on magazines.
 * This caching allows for completely lock-free allocation/deallocation in
 * the steady state, at the expense of likely increased memory use and
 * fragmentation.
 *
 * Reduce the default number of arenas to 2*ncpus, since thread-specific
 * caching typically reduces arena contention.
 *
 * Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
 * cacheline-spaced, and subpage-spaced size classes.  The advantages are:
 * fewer size classes, reduced false cacheline sharing, and reduced internal
 * fragmentation for allocations that are slightly over 512, 1024, etc.
 *
 * Increase RUN_MAX_SMALL, in order to limit fragmentation for the
 * subpage-spaced size classes.
 *
 * Add a size-->bin lookup table for small sizes to simplify translating
 * sizes to size classes.  Include a hard-coded constant table that is used
 * unless custom size class spacing is specified at run time.
 *
 * Add the ability to disable tiny size classes at compile time via
 * MALLOC_TINY.
 */
#ifdef MALLOC_TCACHE
static void
tcache_bin_flush(tcache_bin_t *tbin, size_t binind, unsigned rem)
{
	arena_chunk_t *chunk;
	arena_t *arena;
	void *ptr;
	unsigned i, ndeferred, ncached;

	for (ndeferred = tbin->ncached - rem; ndeferred > 0;) {
		ncached = ndeferred;
		/* Lock the arena associated with the first object. */
		chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(tbin->slots[0]);
		arena = chunk->arena;
		malloc_spin_lock(&arena->lock);
		/* Deallocate every object that belongs to the locked arena. */
		for (i = ndeferred = 0; i < ncached; i++) {
			ptr = tbin->slots[i];
			chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
			if (chunk->arena == arena) {
				size_t pageind = (((uintptr_t)ptr -
				    (uintptr_t)chunk) >> PAGE_SHIFT);
				arena_chunk_map_t *mapelm =
				    &chunk->map[pageind];
				arena_dalloc_bin(arena, chunk, ptr, mapelm);
			} else {
				/*
				 * This object was allocated via a different
				 * arena than the one that is currently locked.
				 * Stash the object, so that it can be handled
				 * in a future pass.
				 */
				tbin->slots[ndeferred] = ptr;
				ndeferred++;
			}
		}
#ifdef MALLOC_STATS
		arena->bins[binind].stats.nflushes++;
		{
			arena_bin_t *bin = &arena->bins[binind];
			bin->stats.nrequests += tbin->tstats.nrequests;
			if (bin->reg_size <= small_maxclass) {
				arena->stats.nmalloc_small +=
				    tbin->tstats.nrequests;
			} else {
				arena->stats.nmalloc_medium +=
				    tbin->tstats.nrequests;
			}
			tbin->tstats.nrequests = 0;
		}
#endif
		malloc_spin_unlock(&arena->lock);
	}

	if (rem > 0) {
		/*
		 * Shift the remaining valid pointers to the base of the slots
		 * array.
		 */
		memmove(&tbin->slots[0], &tbin->slots[tbin->ncached - rem],
		    rem * sizeof(void *));
	}
	tbin->ncached = rem;
}

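The flush loop above takes one lock acquisition per distinct owning arena: each pass locks the arena of the first cached object, frees everything belonging to it, and compacts the rest to the front of the slots array for the next pass. A toy model of that control flow, with plain integers standing in for arenas (hypothetical helper, not allocator code), counting lock acquisitions:

```c
#include <assert.h>

/*
 * Toy model of tcache_bin_flush: repeatedly take the owner of slot 0,
 * "free" every entry with that owner, and compact the remainder to the
 * front for the next pass.  Returns the number of passes, i.e. the
 * number of lock acquisitions the real loop would make.
 */
unsigned
flush_passes(int owner[], unsigned ncached, unsigned rem)
{
	unsigned passes = 0;
	unsigned ndeferred, ncur, i;

	for (ndeferred = ncached - rem; ndeferred > 0;) {
		int cur = owner[0];	/* lock cur's arena */
		ncur = ndeferred;
		passes++;
		for (i = ndeferred = 0; i < ncur; i++) {
			if (owner[i] == cur)
				continue;	/* freed under cur's lock */
			owner[ndeferred++] = owner[i];	/* stash for later */
		}
	}
	return (passes);
}
```

For owners {1, 1, 2, 2, 1, 3} the loop makes three passes, one per distinct arena, regardless of how the objects are interleaved.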
static inline void
tcache_dalloc(tcache_t *tcache, void *ptr)
{
	arena_t *arena;
	arena_chunk_t *chunk;
	arena_run_t *run;
	arena_bin_t *bin;
	tcache_bin_t *tbin;
	size_t pageind, binind;
	arena_chunk_map_t *mapelm;

	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
	arena = chunk->arena;
	pageind = (((uintptr_t)ptr - (uintptr_t)chunk) >> PAGE_SHIFT);
	mapelm = &chunk->map[pageind];
	run = (arena_run_t *)((uintptr_t)chunk + (uintptr_t)((pageind -
	    ((mapelm->bits & CHUNK_MAP_PG_MASK) >> CHUNK_MAP_PG_SHIFT)) <<
	    PAGE_SHIFT));
	assert(run->magic == ARENA_RUN_MAGIC);
	bin = run->bin;
	binind = ((uintptr_t)bin - (uintptr_t)&arena->bins) /
	    sizeof(arena_bin_t);
	assert(binind < nbins);

	if (opt_junk)
		memset(ptr, 0x5a, arena->bins[binind].reg_size);
|
|
|
|
|
2010-01-31 23:16:10 +00:00
|
|
|
tbin = tcache->tbins[binind];
|
|
|
|
if (tbin == NULL) {
|
|
|
|
tbin = tcache_bin_create(choose_arena());
|
|
|
|
if (tbin == NULL) {
|
|
|
|
malloc_spin_lock(&arena->lock);
|
|
|
|
arena_dalloc_bin(arena, chunk, ptr, mapelm);
|
|
|
|
malloc_spin_unlock(&arena->lock);
|
|
|
|
return;
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
}
|
2010-01-31 23:16:10 +00:00
|
|
|
tcache->tbins[binind] = tbin;
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
}
|
2010-01-31 23:16:10 +00:00
|
|
|
|
|
|
|
if (tbin->ncached == tcache_nslots)
|
|
|
|
tcache_bin_flush(tbin, binind, (tcache_nslots >> 1));
|
|
|
|
assert(tbin->ncached < tcache_nslots);
|
|
|
|
tbin->slots[tbin->ncached] = ptr;
|
|
|
|
tbin->ncached++;
|
|
|
|
if (tbin->ncached > tbin->high_water)
|
|
|
|
tbin->high_water = tbin->ncached;
|
|
|
|
|
|
|
|
tcache_event(tcache);
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
}
|
|
|
|
#endif

static void
arena_dalloc_large(arena_t *arena, arena_chunk_t *chunk, void *ptr)
{

	/* Large allocation. */
	malloc_spin_lock(&arena->lock);

#ifndef MALLOC_STATS
	if (opt_junk)
#endif
	{
		size_t pageind = ((uintptr_t)ptr - (uintptr_t)chunk) >>
		    PAGE_SHIFT;
		size_t size = chunk->map[pageind].bits & ~PAGE_MASK;

#ifdef MALLOC_STATS
		if (opt_junk)
#endif
			memset(ptr, 0x5a, size);
#ifdef MALLOC_STATS
		arena->stats.ndalloc_large++;
		arena->stats.allocated_large -= size;
		arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns--;
#endif
	}

	arena_run_dalloc(arena, (arena_run_t *)ptr, true);
	malloc_spin_unlock(&arena->lock);
}

static inline void
arena_dalloc(arena_t *arena, arena_chunk_t *chunk, void *ptr)
{
	size_t pageind;
	arena_chunk_map_t *mapelm;

	assert(arena != NULL);
	assert(arena->magic == ARENA_MAGIC);
	assert(chunk->arena == arena);
	assert(ptr != NULL);
	assert(CHUNK_ADDR2BASE(ptr) != ptr);

	pageind = (((uintptr_t)ptr - (uintptr_t)chunk) >> PAGE_SHIFT);
	mapelm = &chunk->map[pageind];
	assert((mapelm->bits & CHUNK_MAP_ALLOCATED) != 0);
	if ((mapelm->bits & CHUNK_MAP_LARGE) == 0) {
		/* Small allocation. */
#ifdef MALLOC_TCACHE
		if (__isthreaded && tcache_nslots) {
			tcache_t *tcache = tcache_tls;
			if ((uintptr_t)tcache > (uintptr_t)1)
				tcache_dalloc(tcache, ptr);
			else {
				arena_dalloc_hard(arena, chunk, ptr, mapelm,
				    tcache);
			}
		} else {
#endif
			malloc_spin_lock(&arena->lock);
			arena_dalloc_bin(arena, chunk, ptr, mapelm);
			malloc_spin_unlock(&arena->lock);
#ifdef MALLOC_TCACHE
		}
#endif
	} else
		arena_dalloc_large(arena, chunk, ptr);
}

#ifdef MALLOC_TCACHE
static void
arena_dalloc_hard(arena_t *arena, arena_chunk_t *chunk, void *ptr,
    arena_chunk_map_t *mapelm, tcache_t *tcache)
{

	if (tcache == NULL) {
		tcache = tcache_create(arena);
		if (tcache == NULL) {
			malloc_spin_lock(&arena->lock);
			arena_dalloc_bin(arena, chunk, ptr, mapelm);
			malloc_spin_unlock(&arena->lock);
		} else
			tcache_dalloc(tcache, ptr);
	} else {
		/* This thread is currently exiting, so directly deallocate. */
		assert(tcache == (void *)(uintptr_t)1);
		malloc_spin_lock(&arena->lock);
		arena_dalloc_bin(arena, chunk, ptr, mapelm);
		malloc_spin_unlock(&arena->lock);
	}
}
#endif

static inline void
idalloc(void *ptr)
{
	arena_chunk_t *chunk;

	assert(ptr != NULL);

	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
	if (chunk != ptr)
		arena_dalloc(chunk->arena, chunk, ptr);
	else
		huge_dalloc(ptr);
}

static void
arena_ralloc_large_shrink(arena_t *arena, arena_chunk_t *chunk, void *ptr,
    size_t size, size_t oldsize)
{

	assert(size < oldsize);

	/*
	 * Shrink the run, and make trailing pages available for other
	 * allocations.
	 */
	malloc_spin_lock(&arena->lock);
	arena_run_trim_tail(arena, chunk, (arena_run_t *)ptr, oldsize, size,
	    true);
#ifdef MALLOC_STATS
	arena->stats.ndalloc_large++;
	arena->stats.allocated_large -= oldsize;
	arena->stats.lstats[(oldsize >> PAGE_SHIFT) - 1].curruns--;

	arena->stats.nmalloc_large++;
	arena->stats.allocated_large += size;
	arena->stats.lstats[(size >> PAGE_SHIFT) - 1].nrequests++;
	arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns++;
	if (arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns >
	    arena->stats.lstats[(size >> PAGE_SHIFT) - 1].highruns) {
		arena->stats.lstats[(size >> PAGE_SHIFT) - 1].highruns =
		    arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns;
	}
#endif
	malloc_spin_unlock(&arena->lock);
}

static bool
arena_ralloc_large_grow(arena_t *arena, arena_chunk_t *chunk, void *ptr,
    size_t size, size_t oldsize)
{
	size_t pageind = ((uintptr_t)ptr - (uintptr_t)chunk) >> PAGE_SHIFT;
	size_t npages = oldsize >> PAGE_SHIFT;

	assert(oldsize == (chunk->map[pageind].bits & ~PAGE_MASK));

	/* Try to extend the run. */
	assert(size > oldsize);
	malloc_spin_lock(&arena->lock);
	if (pageind + npages < chunk_npages && (chunk->map[pageind+npages].bits
	    & CHUNK_MAP_ALLOCATED) == 0 && (chunk->map[pageind+npages].bits &
	    ~PAGE_MASK) >= size - oldsize) {
		/*
		 * The next run is available and sufficiently large.  Split the
		 * following run, then merge the first part with the existing
		 * allocation.
		 */
		arena_run_split(arena, (arena_run_t *)((uintptr_t)chunk +
		    ((pageind+npages) << PAGE_SHIFT)), size - oldsize, true,
		    false);

		chunk->map[pageind].bits = size | CHUNK_MAP_LARGE |
		    CHUNK_MAP_ALLOCATED;
		chunk->map[pageind+npages].bits = CHUNK_MAP_LARGE |
		    CHUNK_MAP_ALLOCATED;

#ifdef MALLOC_STATS
		arena->stats.ndalloc_large++;
		arena->stats.allocated_large -= oldsize;
		arena->stats.lstats[(oldsize >> PAGE_SHIFT) - 1].curruns--;

		arena->stats.nmalloc_large++;
		arena->stats.allocated_large += size;
		arena->stats.lstats[(size >> PAGE_SHIFT) - 1].nrequests++;
		arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns++;
		if (arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns >
		    arena->stats.lstats[(size >> PAGE_SHIFT) - 1].highruns) {
			arena->stats.lstats[(size >> PAGE_SHIFT) - 1].highruns =
			    arena->stats.lstats[(size >> PAGE_SHIFT) - 1].curruns;
		}
#endif
		malloc_spin_unlock(&arena->lock);
		return (false);
	}
	malloc_spin_unlock(&arena->lock);

	return (true);
}

/*
 * Try to resize a large allocation, in order to avoid copying.  This will
 * always fail if the object is growing and the following run is already in
 * use.
 */
static bool
arena_ralloc_large(void *ptr, size_t size, size_t oldsize)
{
	size_t psize;

	psize = PAGE_CEILING(size);
	if (psize == oldsize) {
		/* Same size class. */
		if (opt_junk && size < oldsize) {
			memset((void *)((uintptr_t)ptr + size), 0x5a, oldsize -
			    size);
		}
		return (false);
	} else {
		arena_chunk_t *chunk;
		arena_t *arena;

		chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
		arena = chunk->arena;
		assert(arena->magic == ARENA_MAGIC);

		if (psize < oldsize) {
			/* Fill before shrinking in order to avoid a race. */
			if (opt_junk) {
				memset((void *)((uintptr_t)ptr + size), 0x5a,
				    oldsize - size);
			}
			arena_ralloc_large_shrink(arena, chunk, ptr, psize,
			    oldsize);
			return (false);
		} else {
			bool ret = arena_ralloc_large_grow(arena, chunk, ptr,
			    psize, oldsize);
			if (ret == false && opt_zero) {
				memset((void *)((uintptr_t)ptr + oldsize), 0,
				    size - oldsize);
			}
			return (ret);
		}
	}
}

static void *
arena_ralloc(void *ptr, size_t size, size_t oldsize)
{
	void *ret;
	size_t copysize;

	/*
	 * Try to avoid moving the allocation.
	 *
	 * posix_memalign() can cause allocation of "large" objects that are
	 * smaller than bin_maxclass (in order to meet alignment requirements).
	 * Therefore, do not assume that (oldsize <= bin_maxclass) indicates
	 * ptr refers to a bin-allocated object.
	 */
	if (oldsize <= arena_maxclass) {
		if (arena_is_large(ptr) == false) {
			if (size <= small_maxclass) {
				if (oldsize <= small_maxclass &&
				    small_size2bin[size] ==
				    small_size2bin[oldsize])
					goto IN_PLACE;
			} else if (size <= bin_maxclass) {
				if (small_maxclass < oldsize && oldsize <=
				    bin_maxclass && MEDIUM_CEILING(size) ==
				    MEDIUM_CEILING(oldsize))
					goto IN_PLACE;
			}
		} else {
			assert(size <= arena_maxclass);
			if (size > bin_maxclass) {
				if (arena_ralloc_large(ptr, size, oldsize) ==
				    false)
					return (ptr);
			}
		}
	}

	/*
	 * If we get here, then size and oldsize are different enough that we
	 * need to move the object.  In that case, fall back to allocating new
	 * space and copying.
	 */
	ret = arena_malloc(size, false);
	if (ret == NULL)
		return (NULL);

	/* Junk/zero-filling were already done by arena_malloc(). */
	copysize = (size < oldsize) ? size : oldsize;
	memcpy(ret, ptr, copysize);
	idalloc(ptr);
	return (ret);
IN_PLACE:
	if (opt_junk && size < oldsize)
		memset((void *)((uintptr_t)ptr + size), 0x5a, oldsize - size);
	else if (opt_zero && size > oldsize)
		memset((void *)((uintptr_t)ptr + oldsize), 0, size - oldsize);
	return (ptr);
}

static inline void *
iralloc(void *ptr, size_t size)
{
	size_t oldsize;

	assert(ptr != NULL);
	assert(size != 0);

	oldsize = isalloc(ptr);

	if (size <= arena_maxclass)
		return (arena_ralloc(ptr, size, oldsize));
	else
		return (huge_ralloc(ptr, size, oldsize));
}
|
|
|
|
|
2006-01-16 05:13:49 +00:00
|
|
|
static bool
|
2010-01-31 23:16:10 +00:00
|
|
|
arena_new(arena_t *arena, unsigned ind)
|
1994-05-27 05:00:24 +00:00
|
|
|
{
|
2006-01-13 18:38:56 +00:00
|
|
|
unsigned i;
|
2006-03-17 09:00:27 +00:00
|
|
|
arena_bin_t *bin;
|
2008-08-08 20:42:42 +00:00
|
|
|
size_t prev_run_size;
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2007-11-27 03:17:30 +00:00
|
|
|
if (malloc_spin_init(&arena->lock))
|
|
|
|
return (true);
|
1995-09-16 09:28:13 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
#ifdef MALLOC_STATS
|
|
|
|
memset(&arena->stats, 0, sizeof(arena_stats_t));
|
2010-01-31 23:16:10 +00:00
|
|
|
arena->stats.lstats = (malloc_large_stats_t *)base_alloc(
|
|
|
|
sizeof(malloc_large_stats_t) * ((chunksize - PAGE_SIZE) >>
|
|
|
|
PAGE_SHIFT));
|
|
|
|
if (arena->stats.lstats == NULL)
|
|
|
|
return (true);
|
|
|
|
memset(arena->stats.lstats, 0, sizeof(malloc_large_stats_t) *
|
|
|
|
((chunksize - PAGE_SIZE) >> PAGE_SHIFT));
|
|
|
|
# ifdef MALLOC_TCACHE
|
|
|
|
ql_new(&arena->tcache_ql);
|
|
|
|
# endif
|
2006-03-17 09:00:27 +00:00
|
|
|
#endif
|
1995-10-08 18:44:20 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
/* Initialize chunks. */
|
2008-05-01 17:25:55 +00:00
|
|
|
arena_chunk_tree_dirty_new(&arena->chunks_dirty);
|
2006-12-23 00:18:51 +00:00
|
|
|
arena->spare = NULL;
|
1997-05-30 20:39:32 +00:00
|
|
|
|
2010-01-31 23:16:10 +00:00
|
|
|
arena->nactive = 0;
|
2008-02-06 02:59:54 +00:00
|
|
|
arena->ndirty = 0;
|
|
|
|
|
2008-07-18 19:35:44 +00:00
|
|
|
arena_avail_tree_new(&arena->runs_avail);
|
2008-02-06 02:59:54 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
/* Initialize bins. */
|
2008-09-10 14:27:34 +00:00
|
|
|
prev_run_size = PAGE_SIZE;
|
2006-03-17 09:00:27 +00:00
|
|
|
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
i = 0;
|
|
|
|
#ifdef MALLOC_TINY
|
2006-03-17 09:00:27 +00:00
|
|
|
/* (2^n)-spaced tiny bins. */
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
for (; i < ntbins; i++) {
|
2006-03-17 09:00:27 +00:00
|
|
|
bin = &arena->bins[i];
|
|
|
|
bin->runcur = NULL;
|
2008-05-01 17:25:55 +00:00
|
|
|
arena_run_tree_new(&bin->runs);
|
2006-03-17 09:00:27 +00:00
|
|
|
|
2010-01-31 23:16:10 +00:00
|
|
|
bin->reg_size = (1U << (LG_TINY_MIN + i));
|
2006-03-17 09:00:27 +00:00
|
|
|
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
prev_run_size = arena_bin_run_size_calc(bin, prev_run_size);
|
2006-01-16 05:13:49 +00:00
|
|
|
|
2006-01-13 18:38:56 +00:00
|
|
|
#ifdef MALLOC_STATS
|
2006-03-17 09:00:27 +00:00
|
|
|
memset(&bin->stats, 0, sizeof(malloc_bin_stats_t));
|
|
|
|
#endif
|
|
|
|
}
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
#endif
|
1995-09-16 09:28:13 +00:00
|
|
|
|
2006-03-17 09:00:27 +00:00
|
|
|
	/* Quantum-spaced bins. */
	for (; i < ntbins + nqbins; i++) {
		bin = &arena->bins[i];
		bin->runcur = NULL;
		arena_run_tree_new(&bin->runs);

		bin->reg_size = (i - ntbins + 1) << LG_QUANTUM;

		prev_run_size = arena_bin_run_size_calc(bin, prev_run_size);

#ifdef MALLOC_STATS
		memset(&bin->stats, 0, sizeof(malloc_bin_stats_t));
#endif
	}
	/* Cacheline-spaced bins. */
	for (; i < ntbins + nqbins + ncbins; i++) {
		bin = &arena->bins[i];
		bin->runcur = NULL;
		arena_run_tree_new(&bin->runs);

		bin->reg_size = cspace_min + ((i - (ntbins + nqbins)) <<
		    LG_CACHELINE);

		prev_run_size = arena_bin_run_size_calc(bin, prev_run_size);

#ifdef MALLOC_STATS
		memset(&bin->stats, 0, sizeof(malloc_bin_stats_t));
#endif
	}
	/* Subpage-spaced bins. */
	for (; i < ntbins + nqbins + ncbins + nsbins; i++) {
		bin = &arena->bins[i];
		bin->runcur = NULL;
		arena_run_tree_new(&bin->runs);

		bin->reg_size = sspace_min + ((i - (ntbins + nqbins + ncbins))
		    << LG_SUBPAGE);

		prev_run_size = arena_bin_run_size_calc(bin, prev_run_size);

#ifdef MALLOC_STATS
		memset(&bin->stats, 0, sizeof(malloc_bin_stats_t));
#endif
	}
	/* Medium bins. */
	for (; i < nbins; i++) {
		bin = &arena->bins[i];
		bin->runcur = NULL;
		arena_run_tree_new(&bin->runs);

		bin->reg_size = medium_min + ((i - (ntbins + nqbins + ncbins +
		    nsbins)) << lg_mspace);

		prev_run_size = arena_bin_run_size_calc(bin, prev_run_size);

#ifdef MALLOC_STATS
		memset(&bin->stats, 0, sizeof(malloc_bin_stats_t));
#endif
	}

#ifdef MALLOC_DEBUG
	arena->magic = ARENA_MAGIC;
#endif

	return (false);
}

/* Create a new arena and insert it into the arenas array at index ind. */
static arena_t *
arenas_extend(unsigned ind)
{
	arena_t *ret;

	/* Allocate enough space for trailing bins. */
	ret = (arena_t *)base_alloc(sizeof(arena_t)
	    + (sizeof(arena_bin_t) * (nbins - 1)));
	if (ret != NULL && arena_new(ret, ind) == false) {
		arenas[ind] = ret;
		return (ret);
	}
	/* Only reached if there is an OOM error. */

	/*
	 * OOM here is quite inconvenient to propagate, since dealing with it
	 * would require a check for failure in the fast path.  Instead, punt
	 * by using arenas[0].  In practice, this is an extremely unlikely
	 * failure.
	 */
	_malloc_message(_getprogname(),
	    ": (malloc) Error initializing arena\n", "", "");
	if (opt_abort)
		abort();

	return (arenas[0]);
}

#ifdef MALLOC_TCACHE
static tcache_bin_t *
tcache_bin_create(arena_t *arena)
{
	tcache_bin_t *ret;
	size_t tsize;

	tsize = sizeof(tcache_bin_t) + (sizeof(void *) * (tcache_nslots - 1));
	if (tsize <= small_maxclass)
		ret = (tcache_bin_t *)arena_malloc_small(arena, tsize, false);
	else if (tsize <= bin_maxclass)
		ret = (tcache_bin_t *)arena_malloc_medium(arena, tsize, false);
	else
		ret = (tcache_bin_t *)imalloc(tsize);
	if (ret == NULL)
		return (NULL);
#ifdef MALLOC_STATS
	memset(&ret->tstats, 0, sizeof(tcache_bin_stats_t));
#endif
	ret->low_water = 0;
	ret->high_water = 0;
	ret->ncached = 0;

	return (ret);
}

static void
tcache_bin_destroy(tcache_t *tcache, tcache_bin_t *tbin, unsigned binind)
{
	arena_t *arena;
	arena_chunk_t *chunk;
	size_t pageind, tsize;
	arena_chunk_map_t *mapelm;

	chunk = CHUNK_ADDR2BASE(tbin);
	arena = chunk->arena;
	pageind = (((uintptr_t)tbin - (uintptr_t)chunk) >> PAGE_SHIFT);
	mapelm = &chunk->map[pageind];

#ifdef MALLOC_STATS
	if (tbin->tstats.nrequests != 0) {
		arena_t *arena = tcache->arena;
		arena_bin_t *bin = &arena->bins[binind];
		malloc_spin_lock(&arena->lock);
		bin->stats.nrequests += tbin->tstats.nrequests;
		if (bin->reg_size <= small_maxclass)
			arena->stats.nmalloc_small += tbin->tstats.nrequests;
		else
			arena->stats.nmalloc_medium += tbin->tstats.nrequests;
		malloc_spin_unlock(&arena->lock);
	}
#endif

	assert(tbin->ncached == 0);
	tsize = sizeof(tcache_bin_t) + (sizeof(void *) * (tcache_nslots - 1));
	if (tsize <= bin_maxclass) {
		malloc_spin_lock(&arena->lock);
		arena_dalloc_bin(arena, chunk, tbin, mapelm);
		malloc_spin_unlock(&arena->lock);
	} else
		idalloc(tbin);
}

#ifdef MALLOC_STATS
static void
tcache_stats_merge(tcache_t *tcache, arena_t *arena)
{
	unsigned i;

	/* Merge and reset tcache stats. */
	for (i = 0; i < mbin0; i++) {
		arena_bin_t *bin = &arena->bins[i];
		tcache_bin_t *tbin = tcache->tbins[i];
		if (tbin != NULL) {
			bin->stats.nrequests += tbin->tstats.nrequests;
			arena->stats.nmalloc_small += tbin->tstats.nrequests;
			tbin->tstats.nrequests = 0;
		}
	}
	for (; i < nbins; i++) {
		arena_bin_t *bin = &arena->bins[i];
		tcache_bin_t *tbin = tcache->tbins[i];
		if (tbin != NULL) {
			bin->stats.nrequests += tbin->tstats.nrequests;
			arena->stats.nmalloc_medium += tbin->tstats.nrequests;
			tbin->tstats.nrequests = 0;
		}
	}
}
#endif

static tcache_t *
tcache_create(arena_t *arena)
{
	tcache_t *tcache;

	if (sizeof(tcache_t) + (sizeof(tcache_bin_t *) * (nbins - 1)) <=
	    small_maxclass) {
		tcache = (tcache_t *)arena_malloc_small(arena, sizeof(tcache_t)
		    + (sizeof(tcache_bin_t *) * (nbins - 1)), true);
	} else if (sizeof(tcache_t) + (sizeof(tcache_bin_t *) * (nbins - 1)) <=
	    bin_maxclass) {
		tcache = (tcache_t *)arena_malloc_medium(arena, sizeof(tcache_t)
		    + (sizeof(tcache_bin_t *) * (nbins - 1)), true);
	} else {
		tcache = (tcache_t *)icalloc(sizeof(tcache_t) +
		    (sizeof(tcache_bin_t *) * (nbins - 1)));
	}

	if (tcache == NULL)
		return (NULL);

#ifdef MALLOC_STATS
	/* Link into list of extant tcaches. */
	malloc_spin_lock(&arena->lock);
	ql_elm_new(tcache, link);
	ql_tail_insert(&arena->tcache_ql, tcache, link);
	malloc_spin_unlock(&arena->lock);
#endif

	tcache->arena = arena;

	tcache_tls = tcache;

	return (tcache);
}
|
|
|
|
|
|
|
|
static void
|
2010-01-31 23:16:10 +00:00
|
|
|
tcache_destroy(tcache_t *tcache)
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
{
|
2010-01-31 23:16:10 +00:00
|
|
|
unsigned i;
|
|
|
|
|
|
|
|
#ifdef MALLOC_STATS
|
|
|
|
/* Unlink from list of extant tcaches. */
|
|
|
|
malloc_spin_lock(&tcache->arena->lock);
|
|
|
|
ql_remove(&tcache->arena->tcache_ql, tcache, link);
|
|
|
|
tcache_stats_merge(tcache, tcache->arena);
|
|
|
|
malloc_spin_unlock(&tcache->arena->lock);
|
|
|
|
#endif
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
|
|
|
|
for (i = 0; i < nbins; i++) {
|
2010-01-31 23:16:10 +00:00
|
|
|
tcache_bin_t *tbin = tcache->tbins[i];
|
|
|
|
if (tbin != NULL) {
|
|
|
|
tcache_bin_flush(tbin, i, 0);
|
|
|
|
tcache_bin_destroy(tcache, tbin, i);
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
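	/*
	 * Illustrative note: the tcache structure itself was allocated
	 * through the normal allocation paths, so it is released the same
	 * way.  The bin_maxclass check below decides which path applies: a
	 * small tcache goes straight back to its arena bin, anything larger
	 * goes through idalloc().
	 */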
	if (arena_salloc(tcache) <= bin_maxclass) {
		arena_chunk_t *chunk = CHUNK_ADDR2BASE(tcache);
		arena_t *arena = chunk->arena;
		size_t pageind = (((uintptr_t)tcache - (uintptr_t)chunk) >>
		    PAGE_SHIFT);
		arena_chunk_map_t *mapelm = &chunk->map[pageind];

		malloc_spin_lock(&arena->lock);
		arena_dalloc_bin(arena, chunk, tcache, mapelm);
		malloc_spin_unlock(&arena->lock);
	} else
		idalloc(tcache);
}
#endif

/*
 * End arena.
 */
/******************************************************************************/
/*
 * Begin general internal functions.
 */

static void *
huge_malloc(size_t size, bool zero)
{
	void *ret;
	size_t csize;
	extent_node_t *node;

	/* Allocate one or more contiguous chunks for this request. */

	csize = CHUNK_CEILING(size);
	if (csize == 0) {
		/* size is large enough to cause size_t wrap-around. */
		return (NULL);
	}
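	/*
	 * Illustrative note (hypothetical numbers, not from this file):
	 * CHUNK_CEILING() rounds size up to a multiple of chunksize using
	 * unsigned arithmetic, so any request within chunksize-1 bytes of
	 * SIZE_MAX wraps around to 0.  With a 1 MiB chunksize, for example,
	 * CHUNK_CEILING(SIZE_MAX) == 0, which is why csize == 0 is treated
	 * as overflow rather than as a legitimate size.
	 */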

	/* Allocate an extent node with which to track the chunk. */
	node = base_node_alloc();
	if (node == NULL)
		return (NULL);

	ret = chunk_alloc(csize, &zero);
	if (ret == NULL) {
		base_node_dealloc(node);
		return (NULL);
	}

	/* Insert node into huge. */
	node->addr = ret;
	node->size = csize;

	malloc_mutex_lock(&huge_mtx);
	extent_tree_ad_insert(&huge, node);
#ifdef MALLOC_STATS
	huge_nmalloc++;
	huge_allocated += csize;
#endif
	malloc_mutex_unlock(&huge_mtx);

	if (zero == false) {
		if (opt_junk)
			memset(ret, 0xa5, csize);
		else if (opt_zero)
			memset(ret, 0, csize);
	}

	return (ret);
}

/* Only handles large allocations that require more than chunk alignment. */
static void *
huge_palloc(size_t alignment, size_t size)
{
	void *ret;
	size_t alloc_size, chunk_size, offset;
	extent_node_t *node;
	bool zero;

	/*
	 * This allocation requires alignment that is even larger than chunk
	 * alignment.  This means that huge_malloc() isn't good enough.
	 *
	 * Allocate almost twice as many chunks as are demanded by the size or
	 * alignment, in order to assure the alignment can be achieved, then
	 * unmap leading and trailing chunks.
	 */
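	/*
	 * Worked example with hypothetical numbers: for chunksize = 1 MiB,
	 * size = 1 MiB, and alignment = 4 MiB, alloc_size below becomes
	 * (4 MiB << 1) - 1 MiB = 7 MiB.  Since chunk_alloc() returns a
	 * chunksize-aligned region, the first 4 MiB-aligned address lies at
	 * most 3 MiB into that region, leaving at least 4 MiB >= chunk_size
	 * of usable space; the excess at both ends is then unmapped.
	 */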
	assert(alignment >= chunksize);

	chunk_size = CHUNK_CEILING(size);

	if (size >= alignment)
		alloc_size = chunk_size + alignment - chunksize;
	else
		alloc_size = (alignment << 1) - chunksize;

	/* Allocate an extent node with which to track the chunk. */
	node = base_node_alloc();
	if (node == NULL)
		return (NULL);

	zero = false;
	ret = chunk_alloc(alloc_size, &zero);
	if (ret == NULL) {
		base_node_dealloc(node);
		return (NULL);
	}

	offset = (uintptr_t)ret & (alignment - 1);
	assert((offset & chunksize_mask) == 0);
	assert(offset < alloc_size);
	if (offset == 0) {
		/* Trim trailing space. */
		chunk_dealloc((void *)((uintptr_t)ret + chunk_size), alloc_size
		    - chunk_size);
	} else {
		size_t trailsize;

		/* Trim leading space. */
		chunk_dealloc(ret, alignment - offset);

		ret = (void *)((uintptr_t)ret + (alignment - offset));

		trailsize = alloc_size - (alignment - offset) - chunk_size;
		if (trailsize != 0) {
			/* Trim trailing space. */
			assert(trailsize < alloc_size);
			chunk_dealloc((void *)((uintptr_t)ret + chunk_size),
			    trailsize);
		}
	}

	/* Insert node into huge. */
	node->addr = ret;
	node->size = chunk_size;

	malloc_mutex_lock(&huge_mtx);
	extent_tree_ad_insert(&huge, node);
#ifdef MALLOC_STATS
	huge_nmalloc++;
	huge_allocated += chunk_size;
#endif
	malloc_mutex_unlock(&huge_mtx);

	if (opt_junk)
		memset(ret, 0xa5, chunk_size);
	else if (opt_zero)
		memset(ret, 0, chunk_size);

	return (ret);
}

static void *
huge_ralloc(void *ptr, size_t size, size_t oldsize)
{
	void *ret;
	size_t copysize;

	/* Avoid moving the allocation if the size class would not change. */
	if (oldsize > arena_maxclass &&
	    CHUNK_CEILING(size) == CHUNK_CEILING(oldsize)) {
		if (opt_junk && size < oldsize) {
			memset((void *)((uintptr_t)ptr + size), 0x5a, oldsize
			    - size);
		} else if (opt_zero && size > oldsize) {
			memset((void *)((uintptr_t)ptr + oldsize), 0, size
			    - oldsize);
		}
		return (ptr);
	}
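	/*
	 * Illustrative note (hypothetical numbers): with a 1 MiB chunksize,
	 * resizing a huge allocation from 1.2 MiB to 1.9 MiB keeps
	 * CHUNK_CEILING() at 2 MiB, so the region is junk-filled or zeroed
	 * in place as needed and returned unmoved; only a change in the
	 * number of backing chunks forces the allocate-and-copy fallback.
	 */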

	/*
	 * If we get here, then size and oldsize are different enough that we
	 * need to use a different size class.  In that case, fall back to
	 * allocating new space and copying.
	 */
	ret = huge_malloc(size, false);
	if (ret == NULL)
		return (NULL);

	copysize = (size < oldsize) ? size : oldsize;
	memcpy(ret, ptr, copysize);
	idalloc(ptr);
	return (ret);
}

static void
huge_dalloc(void *ptr)
{
	extent_node_t *node, key;

	malloc_mutex_lock(&huge_mtx);

	/* Extract from tree of huge allocations. */
	key.addr = ptr;
	node = extent_tree_ad_search(&huge, &key);
	assert(node != NULL);
	assert(node->addr == ptr);
	extent_tree_ad_remove(&huge, node);

#ifdef MALLOC_STATS
	huge_ndalloc++;
	huge_allocated -= node->size;
#endif

	malloc_mutex_unlock(&huge_mtx);

	/* Unmap chunk. */
#ifdef MALLOC_DSS
	if (opt_dss && opt_junk)
		memset(node->addr, 0x5a, node->size);
#endif
	chunk_dealloc(node->addr, node->size);

	base_node_dealloc(node);
}

static void
malloc_stats_print(void)
{
	char s[UMAX2S_BUFSIZE];

	_malloc_message("___ Begin malloc statistics ___\n", "", "", "");
	_malloc_message("Assertions ",
#ifdef NDEBUG
	    "disabled",
#else
	    "enabled",
#endif
	    "\n", "");
	_malloc_message("Boolean MALLOC_OPTIONS: ", opt_abort ? "A" : "a",
	    "", "");
#ifdef MALLOC_DSS
	_malloc_message(opt_dss ? "D" : "d", "", "", "");
#endif
	_malloc_message(opt_junk ? "J" : "j", "", "", "");
#ifdef MALLOC_DSS
	_malloc_message(opt_mmap ? "M" : "m", "", "", "");
#endif
	_malloc_message("P", "", "", "");
	_malloc_message(opt_utrace ? "U" : "u", "", "", "");
	_malloc_message(opt_sysv ? "V" : "v", "", "", "");
	_malloc_message(opt_xmalloc ? "X" : "x", "", "", "");
	_malloc_message(opt_zero ? "Z" : "z", "", "", "");
	_malloc_message("\n", "", "", "");

	_malloc_message("CPUs: ", umax2s(ncpus, 10, s), "\n", "");
	_malloc_message("Max arenas: ", umax2s(narenas, 10, s), "\n", "");
	_malloc_message("Pointer size: ", umax2s(sizeof(void *), 10, s),
	    "\n", "");
	_malloc_message("Quantum size: ", umax2s(QUANTUM, 10, s), "\n", "");
	_malloc_message("Cacheline size (assumed): ",
	    umax2s(CACHELINE, 10, s), "\n", "");
	_malloc_message("Subpage spacing: ", umax2s(SUBPAGE, 10, s), "\n", "");
	_malloc_message("Medium spacing: ", umax2s((1U << lg_mspace), 10, s),
	    "\n", "");
#ifdef MALLOC_TINY
	_malloc_message("Tiny 2^n-spaced sizes: [", umax2s((1U << LG_TINY_MIN),
	    10, s), "..", "");
	_malloc_message(umax2s((qspace_min >> 1), 10, s), "]\n", "", "");
#endif
	_malloc_message("Quantum-spaced sizes: [", umax2s(qspace_min, 10, s),
	    "..", "");
	_malloc_message(umax2s(qspace_max, 10, s), "]\n", "", "");
	_malloc_message("Cacheline-spaced sizes: [",
	    umax2s(cspace_min, 10, s), "..", "");
	_malloc_message(umax2s(cspace_max, 10, s), "]\n", "", "");
	_malloc_message("Subpage-spaced sizes: [", umax2s(sspace_min, 10, s),
	    "..", "");
	_malloc_message(umax2s(sspace_max, 10, s), "]\n", "", "");
	_malloc_message("Medium sizes: [", umax2s(medium_min, 10, s), "..", "");
	_malloc_message(umax2s(medium_max, 10, s), "]\n", "", "");
	if (opt_lg_dirty_mult >= 0) {
		_malloc_message("Min active:dirty page ratio per arena: ",
		    umax2s((1U << opt_lg_dirty_mult), 10, s), ":1\n", "");
	} else {
		_malloc_message("Min active:dirty page ratio per arena: N/A\n",
		    "", "", "");
	}
#ifdef MALLOC_TCACHE
	_malloc_message("Thread cache slots per size class: ",
	    tcache_nslots ? umax2s(tcache_nslots, 10, s) : "N/A", "\n", "");
	_malloc_message("Thread cache GC sweep interval: ",
	    (tcache_nslots && tcache_gc_incr > 0) ?
	    umax2s((1U << opt_lg_tcache_gc_sweep), 10, s) : "N/A", "", "");
	_malloc_message(" (increment interval: ",
	    (tcache_nslots && tcache_gc_incr > 0) ?
	    umax2s(tcache_gc_incr, 10, s) : "N/A", ")\n", "");
#endif
	_malloc_message("Chunk size: ", umax2s(chunksize, 10, s), "", "");
	_malloc_message(" (2^", umax2s(opt_lg_chunk, 10, s), ")\n", "");

#ifdef MALLOC_STATS
	{
		size_t allocated, mapped;
		unsigned i;
		arena_t *arena;

		/* Calculate and print allocated/mapped stats. */

		/* arenas. */
		for (i = 0, allocated = 0; i < narenas; i++) {
			if (arenas[i] != NULL) {
				malloc_spin_lock(&arenas[i]->lock);
				allocated += arenas[i]->stats.allocated_small;
				allocated += arenas[i]->stats.allocated_large;
				malloc_spin_unlock(&arenas[i]->lock);
			}
		}

		/* huge/base. */
		malloc_mutex_lock(&huge_mtx);
		allocated += huge_allocated;
		mapped = stats_chunks.curchunks * chunksize;
		malloc_mutex_unlock(&huge_mtx);

		malloc_mutex_lock(&base_mtx);
		mapped += base_mapped;
		malloc_mutex_unlock(&base_mtx);

		malloc_printf("Allocated: %zu, mapped: %zu\n", allocated,
		    mapped);

		/* Print chunk stats. */
		{
			chunk_stats_t chunks_stats;

			malloc_mutex_lock(&huge_mtx);
			chunks_stats = stats_chunks;
			malloc_mutex_unlock(&huge_mtx);

			malloc_printf("chunks: nchunks   "
			    "highchunks    curchunks\n");
			malloc_printf("  %13"PRIu64"%13zu%13zu\n",
			    chunks_stats.nchunks, chunks_stats.highchunks,
			    chunks_stats.curchunks);
		}

		/* Print huge allocation stats. */
		malloc_printf(
		    "huge: nmalloc      ndalloc    allocated\n");
		malloc_printf(" %12"PRIu64" %12"PRIu64" %12zu\n", huge_nmalloc,
		    huge_ndalloc, huge_allocated);

		/* Print stats for each arena. */
		for (i = 0; i < narenas; i++) {
			arena = arenas[i];
			if (arena != NULL) {
				malloc_printf("\narenas[%u]:\n", i);
				malloc_spin_lock(&arena->lock);
				arena_stats_print(arena);
				malloc_spin_unlock(&arena->lock);
			}
		}
	}
#endif /* #ifdef MALLOC_STATS */
	_malloc_message("--- End malloc statistics ---\n", "", "", "");
}

#ifdef MALLOC_DEBUG
static void
small_size2bin_validate(void)
{
	size_t i, size, binind;

	assert(small_size2bin[0] == 0xffU);
	i = 1;
#  ifdef MALLOC_TINY
	/* Tiny. */
	for (; i < (1U << LG_TINY_MIN); i++) {
		size = pow2_ceil(1U << LG_TINY_MIN);
		binind = ffs((int)(size >> (LG_TINY_MIN + 1)));
		assert(small_size2bin[i] == binind);
	}
	for (; i < qspace_min; i++) {
		size = pow2_ceil(i);
		binind = ffs((int)(size >> (LG_TINY_MIN + 1)));
		assert(small_size2bin[i] == binind);
	}
#  endif
	/* Quantum-spaced. */
	for (; i <= qspace_max; i++) {
		size = QUANTUM_CEILING(i);
		binind = ntbins + (size >> LG_QUANTUM) - 1;
		assert(small_size2bin[i] == binind);
	}
	/* Cacheline-spaced. */
	for (; i <= cspace_max; i++) {
		size = CACHELINE_CEILING(i);
		binind = ntbins + nqbins + ((size - cspace_min) >>
		    LG_CACHELINE);
		assert(small_size2bin[i] == binind);
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
}
|
|
|
|
/* Sub-page. */
|
|
|
|
for (; i <= sspace_max; i++) {
|
|
|
|
size = SUBPAGE_CEILING(i);
|
|
|
|
binind = ntbins + nqbins + ncbins + ((size - sspace_min)
|
2010-01-31 23:16:10 +00:00
|
|
|
>> LG_SUBPAGE);
|
|
|
|
assert(small_size2bin[i] == binind);
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
static bool
small_size2bin_init(void)
{

	if (opt_lg_qspace_max != LG_QSPACE_MAX_DEFAULT
	    || opt_lg_cspace_max != LG_CSPACE_MAX_DEFAULT
	    || sizeof(const_small_size2bin) != small_maxclass + 1)
		return (small_size2bin_init_hard());

	small_size2bin = const_small_size2bin;
#ifdef MALLOC_DEBUG
	assert(sizeof(const_small_size2bin) == small_maxclass + 1);
	small_size2bin_validate();
#endif
	return (false);
}

static bool
small_size2bin_init_hard(void)
{
	size_t i, size, binind;
	uint8_t *custom_small_size2bin;

	assert(opt_lg_qspace_max != LG_QSPACE_MAX_DEFAULT
	    || opt_lg_cspace_max != LG_CSPACE_MAX_DEFAULT
	    || sizeof(const_small_size2bin) != small_maxclass + 1);

	custom_small_size2bin = (uint8_t *)base_alloc(small_maxclass + 1);
	if (custom_small_size2bin == NULL)
		return (true);

	custom_small_size2bin[0] = 0xffU;
	i = 1;
#ifdef MALLOC_TINY
	/* Tiny. */
	for (; i < (1U << LG_TINY_MIN); i++) {
		size = pow2_ceil(1U << LG_TINY_MIN);
		binind = ffs((int)(size >> (LG_TINY_MIN + 1)));
		custom_small_size2bin[i] = binind;
	}
	for (; i < qspace_min; i++) {
		size = pow2_ceil(i);
		binind = ffs((int)(size >> (LG_TINY_MIN + 1)));
		custom_small_size2bin[i] = binind;
	}
#endif
	/* Quantum-spaced. */
	for (; i <= qspace_max; i++) {
		size = QUANTUM_CEILING(i);
		binind = ntbins + (size >> LG_QUANTUM) - 1;
		custom_small_size2bin[i] = binind;
	}
	/* Cacheline-spaced. */
	for (; i <= cspace_max; i++) {
		size = CACHELINE_CEILING(i);
		binind = ntbins + nqbins + ((size - cspace_min) >>
		    LG_CACHELINE);
		custom_small_size2bin[i] = binind;
	}
	/* Sub-page. */
	for (; i <= sspace_max; i++) {
		size = SUBPAGE_CEILING(i);
		binind = ntbins + nqbins + ncbins + ((size - sspace_min) >>
		    LG_SUBPAGE);
		custom_small_size2bin[i] = binind;
	}

	small_size2bin = custom_small_size2bin;
#ifdef MALLOC_DEBUG
	small_size2bin_validate();
#endif
	return (false);
}
static unsigned
malloc_ncpus(void)
{
	int mib[2];
	unsigned ret;
	int error;
	size_t len;

	error = _elf_aux_info(AT_NCPUS, &ret, sizeof(ret));
	if (error != 0 || ret == 0) {
		mib[0] = CTL_HW;
		mib[1] = HW_NCPU;
		len = sizeof(ret);
		if (sysctl(mib, 2, &ret, &len, (void *)NULL, 0) == -1) {
			/* Error. */
			ret = 1;
		}
	}

	return (ret);
}

/*
 * FreeBSD's pthreads implementation calls malloc(3), so the malloc
 * implementation has to take pains to avoid infinite recursion during
 * initialization.
 */
static inline bool
malloc_init(void)
{

	if (malloc_initialized == false)
		return (malloc_init_hard());

	return (false);
}

static bool
malloc_init_hard(void)
{
	unsigned i;
	int linklen;
	char buf[PATH_MAX + 1];
	const char *opts;

	malloc_mutex_lock(&init_lock);
	if (malloc_initialized) {
		/*
		 * Another thread initialized the allocator before this one
		 * acquired init_lock.
		 */
		malloc_mutex_unlock(&init_lock);
		return (false);
	}

	/* Get number of CPUs. */
	ncpus = malloc_ncpus();

	/*
	 * Increase the chunk size to the largest page size that is greater
	 * than the default chunk size and less than or equal to 4MB.
	 */
	{
		size_t pagesizes[MAXPAGESIZES];
		int k, nsizes;

		nsizes = getpagesizes(pagesizes, MAXPAGESIZES);
		for (k = 0; k < nsizes; k++)
			if (pagesizes[k] <= (1LU << 22))
				while ((1LU << opt_lg_chunk) < pagesizes[k])
					opt_lg_chunk++;
	}

	for (i = 0; i < 3; i++) {
		unsigned j;

		/* Get runtime configuration. */
		switch (i) {
		case 0:
			if ((linklen = readlink("/etc/malloc.conf", buf,
			    sizeof(buf) - 1)) != -1) {
				/*
				 * Use the contents of the "/etc/malloc.conf"
				 * symbolic link's name.
				 */
				buf[linklen] = '\0';
				opts = buf;
			} else {
				/* No configuration specified. */
				buf[0] = '\0';
				opts = buf;
			}
			break;
		case 1:
			if (issetugid() == 0 && (opts =
			    getenv("MALLOC_OPTIONS")) != NULL) {
				/*
				 * Do nothing; opts is already initialized to
				 * the value of the MALLOC_OPTIONS environment
				 * variable.
				 */
			} else {
				/* No configuration specified. */
				buf[0] = '\0';
				opts = buf;
			}
			break;
		case 2:
			if (_malloc_options != NULL) {
				/*
				 * Use options that were compiled into the
				 * program.
				 */
				opts = _malloc_options;
			} else {
				/* No configuration specified. */
				buf[0] = '\0';
				opts = buf;
			}
			break;
		default:
			/* NOTREACHED */
			assert(false);
			buf[0] = '\0';
			opts = buf;
		}

|
|
|
|
|
|
|
|
for (j = 0; opts[j] != '\0'; j++) {
			unsigned k, nreps;
			bool nseen;

			/* Parse repetition count, if any. */
			for (nreps = 0, nseen = false;; j++, nseen = true) {
				switch (opts[j]) {
				case '0': case '1': case '2': case '3':
				case '4': case '5': case '6': case '7':
				case '8': case '9':
					nreps *= 10;
					nreps += opts[j] - '0';
					break;
				default:
					goto MALLOC_OUT;
				}
			}
MALLOC_OUT:
			if (nseen == false)
				nreps = 1;
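
			/*
			 * Example: the repetition prefix lets a run of
			 * identical flags be abbreviated, so
			 * MALLOC_OPTIONS=9l is equivalent to
			 * MALLOC_OPTIONS=lllllllll.
			 */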

			for (k = 0; k < nreps; k++) {
				switch (opts[j]) {
				case 'a':
					opt_abort = false;
					break;
				case 'A':
					opt_abort = true;
					break;
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
case 'c':
|
2010-01-31 23:16:10 +00:00
|
|
|
if (opt_lg_cspace_max - 1 >
|
|
|
|
opt_lg_qspace_max &&
|
|
|
|
opt_lg_cspace_max >
|
|
|
|
LG_CACHELINE)
|
|
|
|
opt_lg_cspace_max--;
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
break;
|
|
|
|
case 'C':
|
2010-01-31 23:16:10 +00:00
|
|
|
if (opt_lg_cspace_max < PAGE_SHIFT
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
- 1)
|
2010-01-31 23:16:10 +00:00
|
|
|
opt_lg_cspace_max++;
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
break;
				case 'd':
#ifdef MALLOC_DSS
					opt_dss = false;
#endif
					break;
				case 'D':
#ifdef MALLOC_DSS
					opt_dss = true;
#endif
					break;
				case 'e':
					if (opt_lg_medium_max > PAGE_SHIFT)
						opt_lg_medium_max--;
					break;
				case 'E':
					if (opt_lg_medium_max + 1 <
					    opt_lg_chunk)
						opt_lg_medium_max++;
					break;
				case 'f':
					if (opt_lg_dirty_mult + 1 <
					    (sizeof(size_t) << 3))
						opt_lg_dirty_mult++;
					break;
				case 'F':
					if (opt_lg_dirty_mult >= 0)
						opt_lg_dirty_mult--;
					break;
#ifdef MALLOC_TCACHE
				case 'g':
					if (opt_lg_tcache_gc_sweep >= 0)
						opt_lg_tcache_gc_sweep--;
					break;
				case 'G':
					if (opt_lg_tcache_gc_sweep + 1 <
					    (sizeof(size_t) << 3))
						opt_lg_tcache_gc_sweep++;
					break;
				case 'h':
					if (opt_lg_tcache_nslots > 0)
						opt_lg_tcache_nslots--;
					break;
				case 'H':
					if (opt_lg_tcache_nslots + 1 <
					    (sizeof(size_t) << 3))
						opt_lg_tcache_nslots++;
					break;
#endif
				case 'j':
					opt_junk = false;
					break;
				case 'J':
					opt_junk = true;
					break;
				case 'k':
					/*
					 * Chunks always require at least one
					 * header page, plus enough room to
					 * hold a run for the largest medium
					 * size class (one page more than the
					 * size).
					 */
					if ((1U << (opt_lg_chunk - 1)) >=
					    (2U << PAGE_SHIFT) + (1U <<
					    opt_lg_medium_max))
						opt_lg_chunk--;
					break;
				case 'K':
					if (opt_lg_chunk + 1 <
					    (sizeof(size_t) << 3))
						opt_lg_chunk++;
					break;
				case 'm':
#ifdef MALLOC_DSS
					opt_mmap = false;
#endif
					break;
				case 'M':
#ifdef MALLOC_DSS
					opt_mmap = true;
#endif
					break;
				case 'n':
					opt_narenas_lshift--;
					break;
				case 'N':
					opt_narenas_lshift++;
					break;
				case 'p':
					opt_stats_print = false;
					break;
				case 'P':
					opt_stats_print = true;
					break;
				case 'q':
					if (opt_lg_qspace_max > LG_QUANTUM)
						opt_lg_qspace_max--;
					break;
				case 'Q':
					if (opt_lg_qspace_max + 1 <
					    opt_lg_cspace_max)
						opt_lg_qspace_max++;
					break;
				case 'u':
					opt_utrace = false;
					break;
				case 'U':
					opt_utrace = true;
					break;
				case 'v':
					opt_sysv = false;
					break;
				case 'V':
					opt_sysv = true;
					break;
				case 'x':
					opt_xmalloc = false;
					break;
				case 'X':
					opt_xmalloc = true;
					break;
				case 'z':
					opt_zero = false;
					break;
				case 'Z':
					opt_zero = true;
					break;
				default: {
					char cbuf[2];

					cbuf[0] = opts[j];
					cbuf[1] = '\0';
					_malloc_message(_getprogname(),
					    ": (malloc) Unsupported character "
					    "in malloc options: '", cbuf,
					    "'\n");
				}
				}
			}
		}
	}

#ifdef MALLOC_DSS
	/* Make sure that there is some method for acquiring memory. */
	if (opt_dss == false && opt_mmap == false)
		opt_mmap = true;
#endif

	if (opt_stats_print) {
		/* Print statistics at exit. */
		atexit(stats_print_atexit);
	}
|
1995-09-16 09:28:13 +00:00
|
|
|
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
|
2010-01-31 23:16:10 +00:00
|
|
|
/* Set variables according to the value of opt_lg_[qc]space_max. */
|
|
|
|
qspace_max = (1U << opt_lg_qspace_max);
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
cspace_min = CACHELINE_CEILING(qspace_max);
|
|
|
|
if (cspace_min == qspace_max)
|
|
|
|
cspace_min += CACHELINE;
|
2010-01-31 23:16:10 +00:00
|
|
|
cspace_max = (1U << opt_lg_cspace_max);
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
2008-08-27 02:00:53 +00:00
|
|
|
sspace_min = SUBPAGE_CEILING(cspace_max);
|
|
|
|
if (sspace_min == cspace_max)
|
|
|
|
sspace_min += SUBPAGE;
|
2008-09-10 14:27:34 +00:00
|
|
|
assert(sspace_min < PAGE_SIZE);
|
|
|
|
sspace_max = PAGE_SIZE - SUBPAGE;
|
2010-01-31 23:16:10 +00:00
|
|
|
medium_max = (1U << opt_lg_medium_max);
|
Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
#ifdef MALLOC_TINY
	assert(LG_QUANTUM >= LG_TINY_MIN);
#endif
	assert(ntbins <= LG_QUANTUM);
	nqbins = qspace_max >> LG_QUANTUM;
	ncbins = ((cspace_max - cspace_min) >> LG_CACHELINE) + 1;
	nsbins = ((sspace_max - sspace_min) >> LG_SUBPAGE) + 1;

	/*
	 * Compute medium size class spacing and the number of medium size
	 * classes. Limit spacing to no more than pagesize, but if possible
	 * use the smallest spacing that does not exceed NMBINS_MAX medium size
	 * classes.
	 */
	lg_mspace = LG_SUBPAGE;
	nmbins = ((medium_max - medium_min) >> lg_mspace) + 1;
	while (lg_mspace < PAGE_SHIFT && nmbins > NMBINS_MAX) {
		lg_mspace = lg_mspace + 1;
		nmbins = ((medium_max - medium_min) >> lg_mspace) + 1;
	}
	mspace_mask = (1U << lg_mspace) - 1U;

	mbin0 = ntbins + nqbins + ncbins + nsbins;
	nbins = mbin0 + nmbins;
	/*
	 * The small_size2bin lookup table uses uint8_t to encode each bin
	 * index, so we cannot support more than 256 small size classes. This
	 * limit is difficult to exceed (not even possible with 16B quantum and
	 * 4KiB pages), and such configurations are impractical, but
	 * nonetheless we need to protect against this case in order to avoid
	 * undefined behavior.
	 */
	if (mbin0 > 256) {
		char line_buf[UMAX2S_BUFSIZE];

		_malloc_message(_getprogname(),
		    ": (malloc) Too many small size classes (",
		    umax2s(mbin0, 10, line_buf), " > max 256)\n");
		abort();
	}
	if (small_size2bin_init()) {
		malloc_mutex_unlock(&init_lock);
		return (true);
	}
#ifdef MALLOC_TCACHE
	if (opt_lg_tcache_nslots > 0) {
		tcache_nslots = (1U << opt_lg_tcache_nslots);

		/* Compute incremental GC event threshold. */
		if (opt_lg_tcache_gc_sweep >= 0) {
			tcache_gc_incr = ((1U << opt_lg_tcache_gc_sweep) /
			    nbins) + (((1U << opt_lg_tcache_gc_sweep) % nbins ==
			    0) ? 0 : 1);
		} else
			tcache_gc_incr = 0;
	} else
		tcache_nslots = 0;
#endif
	/* Set variables according to the value of opt_lg_chunk. */
	chunksize = (1LU << opt_lg_chunk);
	chunksize_mask = chunksize - 1;
	chunk_npages = (chunksize >> PAGE_SHIFT);
	{
		size_t header_size;

		/*
		 * Compute the header size such that it is large enough to
		 * contain the page map.
		 */
		header_size = sizeof(arena_chunk_t) +
		    (sizeof(arena_chunk_map_t) * (chunk_npages - 1));
		arena_chunk_header_npages = (header_size >> PAGE_SHIFT) +
		    ((header_size & PAGE_MASK) != 0);
	}
	arena_maxclass = chunksize - (arena_chunk_header_npages <<
	    PAGE_SHIFT);
	UTRACE((void *)(intptr_t)(-1), 0, 0);

#ifdef MALLOC_STATS
	malloc_mutex_init(&chunks_mtx);
	memset(&stats_chunks, 0, sizeof(chunk_stats_t));
#endif

	/* Various sanity checks that regard configuration. */
	assert(chunksize >= PAGE_SIZE);

	/* Initialize chunks data. */
	malloc_mutex_init(&huge_mtx);
	extent_tree_ad_new(&huge);
Add the 'D' and 'M' run time options, and use them to control whether
memory is acquired from the system via sbrk(2) and/or mmap(2). By default,
use sbrk(2) only, in order to support traditional use of resource limits.
Additionally, when both options are enabled, prefer the data segment to
anonymous mappings, in order to coexist better with large file mappings
in applications on 32-bit platforms. This change has the potential to
increase memory fragmentation due to the linear nature of the data
segment, but from a performance perspective this is mitigated by the use
of madvise(2). [1]
Add the ability to interpret integer prefixes in MALLOC_OPTIONS
processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as
MALLOC_OPTIONS=9l.
Reported by: [1] rwatson
Design review: [1] alc, peter, rwatson
#ifdef MALLOC_DSS
	malloc_mutex_init(&dss_mtx);
	dss_base = sbrk(0);
	dss_prev = dss_base;
	dss_max = dss_base;
	extent_tree_szad_new(&dss_chunks_szad);
	extent_tree_ad_new(&dss_chunks_ad);
#endif
#ifdef MALLOC_STATS
	huge_nmalloc = 0;
	huge_ndalloc = 0;
	huge_allocated = 0;
#endif

	/* Initialize base allocation data structures. */
#ifdef MALLOC_STATS
	base_mapped = 0;
#endif
#ifdef MALLOC_DSS
	/*
	 * Allocate a base chunk here, since it doesn't actually have to be
	 * chunk-aligned. Doing this before allocating any other chunks allows
	 * the use of space that would otherwise be wasted.
	 */
	if (opt_dss)
		base_pages_alloc(0);
#endif
	base_nodes = NULL;
	malloc_mutex_init(&base_mtx);

	if (ncpus > 1) {
		/*
		 * For SMP systems, create more than one arena per CPU by
		 * default.
		 */
#ifdef MALLOC_TCACHE
		if (tcache_nslots) {
			/*
			 * Only large object allocation/deallocation is
			 * guaranteed to acquire an arena mutex, so we can get
			 * away with fewer arenas than without thread caching.
			 */
			opt_narenas_lshift += 1;
		} else {
#endif
			/*
			 * All allocations must acquire an arena mutex, so use
			 * plenty of arenas.
			 */
			opt_narenas_lshift += 2;
#ifdef MALLOC_TCACHE
		}
#endif
	}
	/* Determine how many arenas to use. */
	narenas = ncpus;
	if (opt_narenas_lshift > 0) {
		if ((narenas << opt_narenas_lshift) > narenas)
			narenas <<= opt_narenas_lshift;
		/*
		 * Make sure not to exceed the limits of what base_alloc() can
		 * handle.
		 */
		if (narenas * sizeof(arena_t *) > chunksize)
			narenas = chunksize / sizeof(arena_t *);
	} else if (opt_narenas_lshift < 0) {
		if ((narenas >> -opt_narenas_lshift) < narenas)
			narenas >>= -opt_narenas_lshift;
		/* Make sure there is at least one arena. */
		if (narenas == 0)
			narenas = 1;
	}
#ifdef NO_TLS
	if (narenas > 1) {
		static const unsigned primes[] = {1, 3, 5, 7, 11, 13, 17, 19,
		    23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83,
		    89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
		    151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211,
		    223, 227, 229, 233, 239, 241, 251, 257, 263};
		unsigned nprimes, parenas;

		/*
		 * Pick a prime number of hash arenas that is more than narenas
		 * so that direct hashing of pthread_self() pointers tends to
		 * spread allocations evenly among the arenas.
		 */
		assert((narenas & 1) == 0); /* narenas must be even. */
		nprimes = (sizeof(primes) >> LG_SIZEOF_INT);
		parenas = primes[nprimes - 1]; /* In case not enough primes. */
		for (i = 1; i < nprimes; i++) {
			if (primes[i] > narenas) {
				parenas = primes[i];
				break;
			}
		}
		narenas = parenas;
	}
#endif
#ifndef NO_TLS
	next_arena = 0;
#endif

	/* Allocate and initialize arenas. */
	arenas = (arena_t **)base_alloc(sizeof(arena_t *) * narenas);
	if (arenas == NULL) {
		malloc_mutex_unlock(&init_lock);
		return (true);
	}
	/*
	 * Zero the array. In practice, this should always be pre-zeroed,
	 * since it was just mmap()ed, but let's be sure.
	 */
	memset(arenas, 0, sizeof(arena_t *) * narenas);

	/*
	 * Initialize one arena here. The rest are lazily created in
	 * choose_arena_hard().
	 */
	arenas_extend(0);
	if (arenas[0] == NULL) {
		malloc_mutex_unlock(&init_lock);
		return (true);
	}
#ifndef NO_TLS
	/*
	 * Assign the initial arena to the initial thread, in order to avoid
	 * spurious creation of an extra arena if the application switches to
	 * threaded mode.
	 */
	arenas_map = arenas[0];
#endif
	malloc_spin_init(&arenas_lock);

	malloc_initialized = true;
	malloc_mutex_unlock(&init_lock);
	return (false);
}
/*
 * End general internal functions.
 */
/******************************************************************************/
/*
 * Begin malloc(3)-compatible functions.
 */
void *
malloc(size_t size)
{
	void *ret;

	if (malloc_init()) {
		ret = NULL;
		goto OOM;
	}

	if (size == 0) {
		if (opt_sysv == false)
			size = 1;
		else {
			if (opt_xmalloc) {
				_malloc_message(_getprogname(),
				    ": (malloc) Error in malloc(): "
				    "invalid size 0\n", "", "");
				abort();
			}
			ret = NULL;
			goto RETURN;
		}
	}

	ret = imalloc(size);

OOM:
	if (ret == NULL) {
		if (opt_xmalloc) {
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
			_malloc_message(_getprogname(),
			    ": (malloc) Error in malloc(): out of memory\n", "",
			    "");
			abort();
		}
		errno = ENOMEM;
	}

RETURN:
	UTRACE(0, size, ret);
	return (ret);
}
int
posix_memalign(void **memptr, size_t alignment, size_t size)
{
	int ret;
	void *result;

	if (malloc_init())
		result = NULL;
	else {
		if (size == 0) {
			if (opt_sysv == false)
				size = 1;
			else {
				if (opt_xmalloc) {
					_malloc_message(_getprogname(),
					    ": (malloc) Error in "
					    "posix_memalign(): invalid "
					    "size 0\n", "", "");
					abort();
				}
				result = NULL;
				*memptr = NULL;
				ret = 0;
				goto RETURN;
			}
		}

		/* Make sure that alignment is a large enough power of 2. */
		if (((alignment - 1) & alignment) != 0
		    || alignment < sizeof(void *)) {
			if (opt_xmalloc) {
				_malloc_message(_getprogname(),
				    ": (malloc) Error in posix_memalign(): "
				    "invalid alignment\n", "", "");
				abort();
			}
			result = NULL;
			ret = EINVAL;
			goto RETURN;
		}

		result = ipalloc(alignment, size);
	}

	if (result == NULL) {
		if (opt_xmalloc) {
			_malloc_message(_getprogname(),
			    ": (malloc) Error in posix_memalign(): out of memory\n",
			    "", "");
			abort();
		}
		ret = ENOMEM;
		goto RETURN;
	}

	*memptr = result;
	ret = 0;

RETURN:
	UTRACE(0, size, result);
	return (ret);
}
void *
calloc(size_t num, size_t size)
{
	void *ret;
	size_t num_size;

	if (malloc_init()) {
		num_size = 0;
		ret = NULL;
		goto RETURN;
	}

	num_size = num * size;
	if (num_size == 0) {
		if ((opt_sysv == false) && ((num == 0) || (size == 0)))
			num_size = 1;
		else {
			ret = NULL;
			goto RETURN;
		}
	/*
	 * Try to avoid division here. We know that it isn't possible to
	 * overflow during multiplication if neither operand uses any of the
	 * most significant half of the bits in a size_t.
	 */
	} else if (((num | size) & (SIZE_T_MAX << (sizeof(size_t) << 2)))
	    && (num_size / size != num)) {
		/* size_t overflow. */
		ret = NULL;
		goto RETURN;
	}

	ret = icalloc(num_size);

RETURN:
	if (ret == NULL) {
		if (opt_xmalloc) {
			_malloc_message(_getprogname(),
			    ": (malloc) Error in calloc(): out of memory\n", "",
			    "");
			abort();
		}
		errno = ENOMEM;
	}

	UTRACE(0, num_size, ret);
	return (ret);
}
void *
realloc(void *ptr, size_t size)
{
	void *ret;

	if (size == 0) {
		if (opt_sysv == false)
			size = 1;
		else {
			if (ptr != NULL)
				idalloc(ptr);
			ret = NULL;
			goto RETURN;
		}
	}

	if (ptr != NULL) {
		assert(malloc_initialized);

		ret = iralloc(ptr, size);

		if (ret == NULL) {
			if (opt_xmalloc) {
|
|
|
_malloc_message(_getprogname(),
|
|
|
|
": (malloc) Error in realloc(): out of "
|
|
|
|
"memory\n", "", "");
|
2006-06-30 20:54:15 +00:00
|
|
|
abort();
|
2006-03-19 18:28:06 +00:00
|
|
|
}
|
2006-06-30 20:54:15 +00:00
|
|
|
errno = ENOMEM;
|
2006-01-13 18:38:56 +00:00
|
|
|
}
|
|
|
|
} else {
|
2006-06-30 20:54:15 +00:00
|
|
|
if (malloc_init())
|
|
|
|
ret = NULL;
|
|
|
|
else
|
|
|
|
ret = imalloc(size);
|
2006-01-13 18:38:56 +00:00
|
|
|
|
2006-06-30 20:54:15 +00:00
|
|
|
if (ret == NULL) {
|
|
|
|
if (opt_xmalloc) {
|
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
2007-03-20 03:44:10 +00:00
|
|
|
_malloc_message(_getprogname(),
|
|
|
|
": (malloc) Error in realloc(): out of "
|
|
|
|
"memory\n", "", "");
|
2006-06-30 20:54:15 +00:00
|
|
|
abort();
|
|
|
|
}
|
|
|
|
errno = ENOMEM;
|
|
|
|
}
|
2006-01-13 18:38:56 +00:00
|
|
|
}
|
2004-02-21 09:14:38 +00:00
|
|
|
|
2006-06-30 20:54:15 +00:00
|
|
|
RETURN:
|
2006-01-13 18:38:56 +00:00
|
|
|
UTRACE(ptr, size, ret);
|
|
|
|
return (ret);
|
2004-02-21 09:14:38 +00:00
|
|
|
}

void
free(void *ptr)
{

	UTRACE(ptr, 0, 0);
	if (ptr != NULL) {
		assert(malloc_initialized);

		idalloc(ptr);
	}
}

/*
 * End malloc(3)-compatible functions.
 */
/******************************************************************************/
/*
 * Begin non-standard functions.
 */

size_t
malloc_usable_size(const void *ptr)
{

	assert(ptr != NULL);

	return (isalloc(ptr));
}

/*
 * End non-standard functions.
 */
/******************************************************************************/
/*
 * Begin library-private functions.
 */

/*
 * We provide an unpublished interface in order to receive notifications from
 * the pthreads library whenever a thread exits.  This allows us to clean up
 * thread caches.
 */
void
_malloc_thread_cleanup(void)
{

#ifdef MALLOC_TCACHE
	tcache_t *tcache = tcache_tls;

	if (tcache != NULL) {
		assert(tcache != (void *)(uintptr_t)1);
		tcache_destroy(tcache);
		tcache_tls = (void *)(uintptr_t)1;
	}
#endif
}

/*
 * The following functions are used by threading libraries for protection of
 * malloc during fork().  These functions are only called if the program is
 * running in threaded mode, so there is no need to check whether the program
 * is threaded here.
 */

void
_malloc_prefork(void)
{
	unsigned i;

	/* Acquire all mutexes in a safe order. */
	malloc_spin_lock(&arenas_lock);
	for (i = 0; i < narenas; i++) {
		if (arenas[i] != NULL)
			malloc_spin_lock(&arenas[i]->lock);
	}

	malloc_mutex_lock(&base_mtx);

	malloc_mutex_lock(&huge_mtx);

#ifdef MALLOC_DSS
	malloc_mutex_lock(&dss_mtx);
#endif
}

void
_malloc_postfork(void)
{
	unsigned i;

	/* Release all mutexes, now that fork() has completed. */

#ifdef MALLOC_DSS
	malloc_mutex_unlock(&dss_mtx);
#endif

	malloc_mutex_unlock(&huge_mtx);

	malloc_mutex_unlock(&base_mtx);

	for (i = 0; i < narenas; i++) {
		if (arenas[i] != NULL)
			malloc_spin_unlock(&arenas[i]->lock);
	}
	malloc_spin_unlock(&arenas_lock);
}

/*
 * End library-private functions.
 */
/******************************************************************************/