124 Commits

Author SHA1 Message Date
jasone
2f4016efa3 Add an unreachable return statement, in order to avoid a compiler warning
for non-standard optimization levels.

Reported by:	Michael Zach <zach@webges.com>
2006-04-05 18:46:24 +00:00
jasone
53b0978842 Only initialize the first per-chunk page map element for free runs. This
makes run split/coalesce operations of complexity lg(n) rather than n.
2006-04-05 04:15:12 +00:00
jasone
b2f560b56d Add init_lock, and use it to protect against allocator initialization
races.  This isn't currently necessary for libpthread or libthr, but
without it external threads libraries like the linuxthreads port are
not safe to use.

Reported by:	ganbold@micom.mng.net
2006-04-04 19:46:28 +00:00
jasone
ecc5750010 Refactor per-run bitmap manipulation functions so that bitmap offsets only
have to be calculated once per allocator operation.

Make nil const.

Update various comments.

Remove/avoid division where possible.

For the one division operation that remains in the critical path, add a
switch statement that has a case for each small size class, and do division
with a constant divisor in each case.  This allows the compiler to generate
optimized code that does not use hardware division [1].

Obtained from:	peter [1]
2006-04-04 03:51:47 +00:00
jasone
996bd9246d Optimize runtime performance, primary using the following techniques:
* Avoid choosing an arena until it's certain that an arena is needed
    for allocation.

  * Convert division/multiplication to bitshifting where possible.

  * Avoid accessing TLS variables in single-threaded code.

  * Reduce the amount of pointer dereferencing.

  * Move lock acquisition in critical paths to only protect the the code
    that requires synchronization, and completely remove locking where
    possible.
2006-03-30 20:25:52 +00:00
jasone
1a854b0cf7 Add malloc_usable_size(3).
Discussed with:		arch@
2006-03-28 22:16:04 +00:00
jasone
0ed4b6d88e Allow the 'n' option to decrease the number of arenas below the default,
to as little as one arena.  Also, limit the number of arenas to avoid a
potential invariant violation in base_alloc().
2006-03-26 23:41:35 +00:00
jasone
9b6cd0a1ee Add comments and reformat/rearrange code. There are no significant
functional changes in this commit.
2006-03-26 23:37:25 +00:00
jasone
443b2d32bc Convert TINY_MIN_2POW from a cpp macro to tiny_min_2pow (a variable), and
determine its value at run time according to other relevant values.  This
avoids the creation of runs that are incompletely utilized, as long as
pagesize isn't too large (>32kB, given the current RUN_MIN_REGS_2POW
setting).

Increase the size of several structure bitfields in arena_run_t in order
to avoid integer overflow in the case that a run's header does not overlap
with the space that is usable as application allocation regions.  Given
the tiny_min_2pow change, this fix has no additional impact unless
pagesize is >32kB.

Reported by:	kris
2006-03-24 22:13:49 +00:00
jasone
c5cf5122a1 Add USE_BRK-specific code in malloc_init_hard() to allow the first
internally used chunk to start at the beginning of the heap, rather
than at a chunk-aligned address.  This reduces mapped memory somewhat
for 32-bit architectures.

Add the arena_run_link_t type and use it wherever a run object is only
used as a ring 'header'.  This saves approximately 40 kB of memory per
arena.

Remove an obsolete (no longer used) code path from base_alloc(), which
supported the internal allocation of objects larger than the chunk
size.

Enhance chunk_dealloc() to cache chunk addresses for all deallocated
chunks.  This has no impact for most programs, but has the potential
to reduce VM map fragmentation for programs that use huge
allocations.
2006-03-24 00:28:08 +00:00
jasone
8a77abffbc Separate completely full runs from runs that are merely almost full, so
that no linear searching is necessary if we resort to allocating from a
run that is known to be mostly full.  There are pathological edge cases
that could have caused severely degraded performance, and this change
fixes that.
2006-03-20 04:05:05 +00:00
jasone
6ab124975f Optimize realloc() to reallocate in place if the old and new sizes are
close enough to each other that reallocation would allocate a new region
of the same size.  This improves the performance of repeated incremental
reallocations by up to three orders of magnitude. [1]

Fix arena_new() to properly constrain run size if a small chunk size was
specified during runtime configuration.

Suggested by:	se [1]
2006-03-19 18:28:06 +00:00
jasone
1759b378e2 Modify allocation policy, in order to avoid excessive fragmentation for
allocation patterns that involve a relatively even mixture of many
different size classes.

Reduce the chunk size from 16 MB to 2 MB.  Since chunks are now carved up
using an address-ordered first best fit policy, VM map fragmentation is
much less likely, which makes smaller chunks not as much of a risk.  This
reduces the virtual memory size of most applications.

Remove redzones, since program buffer overruns are no longer as likely to
corrupt malloc data structures.

Remove the C MALLOC_OPTIONS flag, and add H and S.
2006-03-17 09:00:27 +00:00
jasone
c3550ff9ae Fix calculation of the number of arenas to use on multi-processor systems. 2006-02-04 01:11:30 +00:00
jasone
4018cc9c9b Remove unwarranted uses of 'goto'. 2006-01-27 07:46:22 +00:00
jasone
7d53fc1d66 Add NO_MALLOC_EXTRAS, so that various extra features that can cause
performance degradation can be disabled via something like the following
in /etc/malloc.conf:

	CFLAGS+=-DNO_MALLOC_EXTRAS

Suggested by:	deischen
2006-01-27 04:42:10 +00:00
jasone
94601bfc12 Fix the type of a statistics counter (unsigned --> unsigned long). 2006-01-27 04:36:39 +00:00
jasone
efa3cf2bb3 Clean up statistics gathering and printing. 2006-01-27 02:36:44 +00:00
jasone
6f41342041 Optimize arena_bin_pop() to reduce the number of separator operations.
Remove the block of code that tries to use delayed regions in LIFO order,
since from a policy perspective, it conflicts with LRU caching of newly
coalesced regions in arena_undelay().  There are numerous policy
alternatives, and it isn't readily obvious which (if any) is superior;
this change at least has the virtue of being consistent with policy.
2006-01-26 08:11:23 +00:00
jasone
f7b6cf045e Remove a redundant variable assignment in arena_reg_frag_alloc(). 2006-01-25 05:41:02 +00:00
jasone
30e6373190 If no coalesced exact-fit small regions are available, but delayed exact-
fit regions are available, use the delayed regions in LIFO order, in order
to increase locality of reference.  We might expect this to cause delayed
regions to be removed from the delay ring buffer more often (since we're
now re-using more recently buffered regions), but numerous tests indicate
that the overall impact on memory usage tends to be good (reduced
fragmentation).

Re-work arena_frag_reg_alloc() so that when large free regions are
exhausted, it uses small regions in a way that favors contiguous allocation
of sequentially allocated small regions.  Use arena_frag_reg_alloc() in
this capacity, rather than directly attempting over-fitting of small
requests when no large regions are available.

Remove the bin overfit statistic, since it is no longer relevant due to
the arena_frag_reg_alloc() changes.

Do not specify arena_frag_reg_alloc() as an inline function.  It is too
large to benefit much from being inlined, and it is also called in two
places, only one of which is in the critical path (the other call bloated
arena_reg_alloc()).

Call arena_coalesce() for a region before caching it with
arena_mru_cache().

Add assertions that detect the attempted caching of adjacent free regions,
so that we notice this problem when it is first created, rather than in
arena_coalesce(), when it's too late to know how the problem arose.

Reported by:    Hans Blancke
2006-01-25 04:21:22 +00:00
jasone
f26da23003 Make the 'C' and 'c' malloc options consistent with other options; 'C'
doubles the cache size, and 'c' halves the cache size.
2006-01-23 03:32:38 +00:00
jasone
cd2c530a43 In arena_chunk_reg_alloc(), try to avoid touching the last page in the
chunk during initialization, in order to avoid physically backing the
page unless data are allocated there.
2006-01-23 03:19:01 +00:00
jasone
cc036a2601 Use uintptr_t rather than size_t when casting pointers to integers. Also,
fix the few remaining casting style(9) errors that remained after the
functional change.

Reported by:	jmallett
2006-01-20 03:11:11 +00:00
jasone
92d4e46d35 Revert addtion of assertions in revision 1.99. These assertions cause
problems in cases where regions are faked up for the purposes of red-black
tree searches, since those faked region headers reside on the stack, rather
than in a malloc chunk.
2006-01-19 19:20:42 +00:00
jasone
583abbb79b Add assertions that detect some forms of region separator corruption. 2006-01-19 19:08:11 +00:00
jasone
0d0573f36b Remove loops in arena_coalesce(). They are no longer necessary, now that
internal allocation does not rely on recursive arena use (base_arena was
removed in revision 1.95).
2006-01-19 18:37:30 +00:00
jasone
4634244628 Make all internal variables and functions static.
Reported by:	ache
2006-01-19 07:23:13 +00:00
jasone
2622b7ee77 Return NULL if there is an OOM error during initialization, rather than
allowing the error to be fatal.

Move a label in order to make sure to properly handle errors in malloc(0).

Reported by:	Alastair D'Silva, Saneto Takanori
2006-01-19 02:11:05 +00:00
jasone
4179e6a903 Add a separate simple internal base allocator and remove base_arena, so that
there is never any need to recursively call the main allocation functions.

Remove recursive spinlock support, since it is no longer needed.

Allow chunks to be as small as the page size.

Correctly propagate OOM errors from arena_new().
2006-01-16 05:13:49 +00:00
marcel
7a98306144 Define NO_TLS on ia64. The dynamic TLS implementation on ia64 is
broken for non-threaded shared processes in that __tls_get_addr()
assumes the thread pointer is always initialized. This is not the
case. When arenas_map is referenced in choose_arena() and it is
defined as a thread-local variable, it will result in a SIGSEGV.

PR: ia64/91846 (describes the TLS/ia64 bug).
2006-01-16 00:32:46 +00:00
jasone
46a2bcd398 Replace malloc(), calloc(), posix_memalign(), realloc(), and free() with
a scalable concurrent allocator implementation.

Reviewed by:	current@
Approved by:	phk, markm (mentor)
2006-01-13 18:38:56 +00:00
jasone
658b743414 Fix a bitwise logic error in posix_memalign().
Reported by:	glebius
2006-01-12 18:09:25 +00:00
jasone
3668a2e494 In preparation for a new malloc implementation:
* Add posix_memalign().

  * Move calloc() from calloc.c to malloc.c.  Add a calloc() implementation in
    rtld-elf in order to make the loader happy (even though calloc() isn't
    used in rtld-elf).

  * Add _malloc_prefork() and _malloc_postfork(), and use them instead of
    directly manipulating __malloc_lock.

Approved by:	phk, markm (mentor)
2006-01-12 07:28:21 +00:00
delphij
116d1ffddd Remove the check about whether MALLOC_EXTRA_SANITY is defined,
surrounding the undef'ing it.  It does not seem necessary to
undef some symbol that is not exist, and gcc does not complain
about whether a symbol is exist before #undef'ing it out.

Spotted by:	mingyanguo via ChinaUnix.net forum
Reviewed by:	phk
2005-02-27 17:16:16 +00:00
stefanf
9dea8aeba1 Consistently use __inline instead of __inline__ as the former is an empty macro
in <sys/cdefs.h> for compilers without support for inline.
2004-07-04 16:11:03 +00:00
cognet
d57448a392 Define malloc_pageshift and malloc_minsize for arm. 2004-05-14 11:50:51 +00:00
phk
a545780c51 Rearrange (centralize) initialization of mallocs internals to always be
done before the first call, even if this is a malloc(0) call.

PR:	62859
2004-03-07 20:41:27 +00:00
phk
80fe0dbbdf Remove the triplicity in the public functions by vectoring them all
through a realloc like function.

Make the malloc_active variable a local static to this new function.

Don't warn about recursion more than once per base call.

constify malloc_func.
2004-02-21 09:14:38 +00:00
phk
9942edab59 Move the check for sensitive processes to the point where the exception
has been hit, this makes it cover more cases.

Call the message function directly rather than fiddle with flag-saving
when we find an unknown character in our options.

The 'A' flag should not trigger on legal out of memory conditions.
2004-02-21 08:55:38 +00:00
marcel
769360c440 Do not adjust to the pagesize at runtime. Besides for the one-time
initialization overhead, there's a problem in that we never call
imalloc() and thus malloc_init() for zero-sized allocations. As a
result, malloc(0) returns NULL when it's the first or only malloc in
the program. Any non-zero allocation will initialize the malloc code
with the side-effect that subsequent zero-sized allocations return a
non-NULL pointer. This is because the pointer we return for zero-
sized allocations is calculated from malloc_pageshift, which needs
to be initialized at runtime on ia64.

The result of the inconsistent behaviour described above is that
configure scripts failed the test for a GNU compatible malloc. This
resulted in a lot of broken ports.

Other, even simpler, solutions were possible as well:
1.  initialize malloc_pageshift with some non-zero value (say 13 for
    8KB pages) and keep the runtime adjustment.
2.  Stop using malloc_pageshift to calculate ZEROSIZEPTR.

Removal of the runtime adjustment was chosen because then ia64 is the
same as any other platform. It is not to say that using a page size
obtained at runtime is bad per se. It's that there's currently a high
level of gratuity for its existence and the moment it causes problems
is the moment you need to get rid of it. Hence, it's not unthinkable
that this commit is (partially) reverted some time in the future when
we do have a good reason for it and a good way to achieve it.

Approved by: re@ (rwatson)
Reported by: kris (portmgr@) -- may the ports be with you
2003-11-28 18:03:22 +00:00
deischen
96918b9811 Externalize malloc's spinlock so that a thread library can take
it around an application's fork() call.  Our new thread libraries
(libthr, libpthread) can now have threads running while another
thread calls fork().  In this case, it is possible for malloc
to be left in an inconsistent state in the child.  Our thread
libraries, libpthread in particular, need to use malloc internally
after a fork (in the child).

Reviewed by:	davidxu
2003-11-04 19:49:56 +00:00
tjr
8366c4708a Remove incomplete support for running FreeBSD userland on old NetBSD kernels
lacking the issetugid() and utrace() syscalls.
2003-10-29 10:45:01 +00:00
phk
a98bdabe34 Consistently cast to (u_char *) when filling with junk. 2003-10-25 23:47:33 +00:00
phk
9c4255e21c Style changes. Inching closer to convergence with OpenBSD. 2003-10-25 12:56:51 +00:00
phk
732b6aad5c More style fixes to improve diffability with OpenBSD.
Pull 'A' evilness for realloc(3) from OpenBSD.
2003-09-27 18:58:26 +00:00
phk
edc864517b Style changes to improve diffability against OpenBSD version. 2003-09-27 17:29:03 +00:00
phk
ebd6a7dd85 Minor constification. 2003-07-29 11:16:14 +00:00
phk
3f94d08046 Clarify the code a bit.
Submitted by:	Nadav Eiron <nadav@TheEirons.org>
2003-06-01 09:16:50 +00:00
peter
3ecb61f317 Tell malloc.c that AMD64 uses the same pagesize as i386. 2003-04-30 19:30:34 +00:00