chunk per arena, rather than immediately deallocating all unused chunks.
This fixes a potential performance issue when allocating/deallocating
an object of size (4kB..1MB] in a loop.
Reported by: davidxu
don't be greedy on the GNU "::" extension when arg separated by whitespace
and POSIX_CORRECTLY is set. From POSIX point of view this is unclear
situation, so minimal assumption looks right.
(size_t)(num * size) == 0
but both num and size are nonzero.
Reported by: Ilja van Sprundel
Approved by: jasone
Security: Integer overflow; calloc was allocating 1 byte in
response to a request for a multiple of 2^32 (or 2^64)
bytes instead of returning NULL.
well as avoiding a switch statement. This change has no significant impact
to performance when branch prediction is successful at predicting the sizes
of objects passed to free(), but in the case that the object sizes are
semi-random, this change has the potential to prevent many branch prediction
misses, thus improving performance substantially.
Take advantage of alignment guarantees in ipalloc(), and pad object sizes to
something less than a power of two when possible. This has the potential
to substantially reduce internal fragmentation for objects allocated via
posix_memalign().
Avoid an unnecessary pow2_ceil() call in arena_ralloc().
Submitted by: djam8193ah@hotmail.com
and instead creating a small allocation for each malloc(0) call. The
optional SysV compatibility behavior remains unchanged.
Add a couple of assertions.
Fix a couple of typos in error message strings.
The text is correct in the "DESCRIPTION" section, so fix "SYNOPSIS"
to use the correct name.
PR: docs/90498
Submitted by: Vasil Dimov
MFC after: 3 days
4kB pages), in order to avoid dangerous rounding error when calculating
fullness limits during run promotion/demotion.
Convert a structure bitfield to a normal field in areana_run_t. This should
have been changed along with the other fields in revision 1.120.
bounds. [1]
Modify logic for utilizing the data segment, such that it is possible to
create huge allocations there.
Shrink the data segment when deallocating a chunk, if it is at the end of
the data segment.
Rename chunk_size to csize in huge_malloc(), in order to avoid masking a
static variable of the same name. [1]
Reported by: Paul Allen <nospam@ugcs.caltech.edu>
races. This isn't currently necessary for libpthread or libthr, but
without it external threads libraries like the linuxthreads port are
not safe to use.
Reported by: ganbold@micom.mng.net
have to be calculated once per allocator operation.
Make nil const.
Update various comments.
Remove/avoid division where possible.
For the one division operation that remains in the critical path, add a
switch statement that has a case for each small size class, and do division
with a constant divisor in each case. This allows the compiler to generate
optimized code that does not use hardware division [1].
Obtained from: peter [1]
* Avoid choosing an arena until it's certain that an arena is needed
for allocation.
* Convert division/multiplication to bitshifting where possible.
* Avoid accessing TLS variables in single-threaded code.
* Reduce the amount of pointer dereferencing.
* Move lock acquisition in critical paths to only protect the the code
that requires synchronization, and completely remove locking where
possible.
determine its value at run time according to other relevant values. This
avoids the creation of runs that are incompletely utilized, as long as
pagesize isn't too large (>32kB, given the current RUN_MIN_REGS_2POW
setting).
Increase the size of several structure bitfields in arena_run_t in order
to avoid integer overflow in the case that a run's header does not overlap
with the space that is usable as application allocation regions. Given
the tiny_min_2pow change, this fix has no additional impact unless
pagesize is >32kB.
Reported by: kris
internally used chunk to start at the beginning of the heap, rather
than at a chunk-aligned address. This reduces mapped memory somewhat
for 32-bit architectures.
Add the arena_run_link_t type and use it wherever a run object is only
used as a ring 'header'. This saves approximately 40 kB of memory per
arena.
Remove an obsolete (no longer used) code path from base_alloc(), which
supported the internal allocation of objects larger than the chunk
size.
Enhance chunk_dealloc() to cache chunk addresses for all deallocated
chunks. This has no impact for most programs, but has the potential
to reduce VM map fragmentation for programs that use huge
allocations.
that no linear searching is necessary if we resort to allocating from a
run that is known to be mostly full. There are pathological edge cases
that could have caused severely degraded performance, and this change
fixes that.
close enough to each other that reallocation would allocate a new region
of the same size. This improves the performance of repeated incremental
reallocations by up to three orders of magnitude. [1]
Fix arena_new() to properly constrain run size if a small chunk size was
specified during runtime configuration.
Suggested by: se [1]
allocation patterns that involve a relatively even mixture of many
different size classes.
Reduce the chunk size from 16 MB to 2 MB. Since chunks are now carved up
using an address-ordered first best fit policy, VM map fragmentation is
much less likely, which makes smaller chunks not as much of a risk. This
reduces the virtual memory size of most applications.
Remove redzones, since program buffer overruns are no longer as likely to
corrupt malloc data structures.
Remove the C MALLOC_OPTIONS flag, and add H and S.