contention. The intent is to dynamically adjust to load imbalances, which
can cause severe contention.
Use pthread mutexes where possible instead of libc "spinlocks" (they aren't
actually spin locks). Conceptually, this change is meant only to support
the dynamic load balancing code by enabling the use of spin locks, but it
has the added apparent benefit of substantially improving performance due to
reduced context switches when there is moderate arena lock contention.
Proper tuning parameter configuration for this change is a finicky business,
and it is very much machine-dependent. One seemingly promising solution
would be to run a tuning program during operating system installation that
computes appropriate settings for load balancing. (The pthreads adaptive
spin locks should probably be similarly tuned.)
vector of slots for lazily freed objects. For each deallocation, before
doing the hard work of locking the arena and deallocating, try several times
to randomly insert the object into the vector using atomic operations.
This approach is particularly effective at reducing contention for
multi-threaded applications that use the producer-consumer model, wherein
one producer thread allocates objects, then multiple consumer threads
deallocate those objects.
allocations. [1]
Fix calculation of the number of arenas when 'n' is specified via
MALLOC_OPTIONS.
Clean up various style inconsistencies.
Obtained from: [1] NetBSD
to an int to remove the warning from using a size_t variable on 64-bit
platforms.
Submitted by: Xin LI <delphij@FreeBSD.org>
Approved by: wes
Approved by: re (kensmith)
inactive variables should cause a rebuild of environ, otherwise, exec()'d
processes will be missing a variable in environ that has been unset then
set.
Submitted by: Taku Yamamoto <taku@tackymt.homeip.net>
Reviewed by: ache
Approved by: wes (mentor)
Approved by: re (kensmith)
or replace (i.e., zdump) the environment after a call to setenv(), putenv()
or unsetenv() has been made, a few changes were made.
- getenv() will return the value from the new environ array.
- setenv() was split into two functions: __setenv() which is most of the
previous setenv() without checks on the name and setenv() which
contains the checks before calling __setenv().
- setenv(), putenv() and unsetenv() will unset all previous values and
call __setenv() on all entries in the new environ array which in turn
adds them to the end of the envVars array. Calling __setenv() instead
of setenv() is done to avoid the temporary replacement of the '=' in a
string with a NUL byte. Some strings may be read-only data.
Added more regression checks for clearing the environment array.
Replaced gettimeofday() with getrusage() in timing regression check for
better accuracy.
Fixed an off-by-one bug in __remove_putenv() in the use of memmove(). This
went unnoticed due to the allocation of double the number of environ
entries when building envVars.
Fixed a few spelling mistakes in the comments.
Reviewed by: ache
Approved by: wes
Approved by: re (kensmith)
setenv(3) by tracking the size of the memory allocated instead of using
strlen() on the current value.
Convert all calls to POSIX from historic BSD API:
- unsetenv returns an int.
- putenv takes a char * instead of const char *.
- putenv no longer makes a copy of the input string.
- errno is set appropriately for POSIX. Exceptions involve bad environ
variable and internal initialization code. These both set errno to
EFAULT.
Several patches to base utilities to handle the POSIX changes from
Andrey Chernov's previous commit. A few I re-wrote to use setenv()
instead of putenv().
New regression module for tools/regression/environ to test these
functions. It also can be used to test the performance.
Bump __FreeBSD_version to 700050 due to API change.
PR: kern/99826
Approved by: wes
Approved by: re (kensmith)
Not because I admit they are technically wrong and not because of bug
reports (I receive nothing). But because I surprisingly meets so
strong opposition and resistance so lost any desire to continue that.
Anyone who interested in POSIX can dig out what changes and how
through cvs diffs.
(also IEEE Std 1003.1-2001)
The specs explicitly says that altering passed string
should change the environment, i.e. putenv() directly puts its arg
into environment (unlike setenv() which just copies it there).
It means that putenv() can't be implemented via setenv()
(like we have before) at all. Putenv() value lives (allows modifying)
up to the next putenv() or setenv() call.
compatibility with the different environment conventions" (man page).
With the standards, we don't have them different anymore and
IEEE Std 1003.1-2001 says that
"The values that the environment variables may be assigned are not
restricted except that they are considered to end with a null byte"
Issue 6 (also IEEE Std 1003.1-2001) in following areas:
args, return, errors.
Putenv still needs rewriting because specs explicitly says that
altering passed string later should change the environment (currently we
copy the string so can't provide that).
avoid downcasting issues. In particular, this change fixes
posix_memalign(3) for alignments greater than 2^31 on LP64 systems.
Make sure that NDEBUG is always set to be compatible with MALLOC_DEBUG. [1]
Reported by: [1] Lee Hyo geol <hyogeollee@gmail.com>
trees that track all non-full runs for each bin. Use the red-black
trees to be able to guarantee that each new allocation is placed in the
lowest address available in any non-full run. This change completes the
transition to allocating from low addresses in order to reduce the
retention of sparsely used chunks.
If the run in current use by a bin becomes empty, deallocate the run
rather than retaining it for later use. The previous behavior had the
tendency to spread empty runs across multiple chunks, thus preventing
the release of chunks that were completely unused.
Generalize base_chunk_alloc() (and rename it to base_pages_alloc()) to
handle allocation sizes larger than the chunk size, so that it is
possible to support chunk sizes that are smaller than an arena object.
Reduce the minimum chunk size from 64kB to 8kB.
Optimize tracking of addresses for deleted chunks.
Fix a statistics bug for huge allocations.
rounding and overflow. Carefully document what the various overflow
tests actually detect.
The bugs mostly canceled out, such that the worst possible failure
cases resulted in non-fatal over-allocations.
than binary buddies, the alignment guarantees are weaker, which requires
a more complex aligned allocation algorithm, similar to that used for
alignment greater than the chunk size.
Reported by: matteo
chunks. This allows runs to be any multiple of the page size. The
primary advantage is that large objects are no longer constrained to be
2^n pages, which can dramatically decrease internal fragmentation for
large objects. This also allows the sizes for runs that back small
objects to be more finely tuned.
Free runs are searched for linearly using the chunk page map (with the
help of some heuristic optimizations). This changes the allocation
policy from "first best fit" to "first fit". A prototype red-black tree
implementation for tracking free runs that implemented "first best fit"
did not cause a measurable speed or memory usage difference for
realistic chunk sizes (though of course it is possible to construct
benchmarks that favor one allocation policy over another).
Refine the handling of fullness constraints for small runs to be more
tunable.
Restructure the per chunk page map to contain only two fields per entry,
rather than four. Also, increase each entry from 4 to 8 bytes, since it
allows for 32-bit integers, without increasing the number of chunk
header pages.
Relax the maximum chunk size constraint. This is of no practical
interest; it is merely fallout from the chunk page map restructuring.
Revamp statistics gathering and reporting to be faster, clearer and more
informative. Statistics gathering is fast enough now to have little
to no impact on application speed, but it still requires approximately
two extra pages of memory per arena (per process). This memory overhead
may be acceptable for most systems, but we still need to leave
statistics gathering disabled by default in RELENG branches.
Rename NO_MALLOC_EXTRAS to MALLOC_PRODUCTION in order to make its intent
clearer (i.e. it should be defined in RELENG branches).
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
Reported by: [1] kientzle
This has no impact unless USE_BRK is defined (32-bit platforms), in
which case user allocations are allocated via mmap() if at all possible,
in order to avoid the possibility of unreclaimable chunks in the data
segment.
Fix an obscure bug in base_alloc() that could have allowed undefined
behavior if an application were to use sbrk() in conjunction with a
USE_BRK-enabled malloc.
chunk per arena, rather than immediately deallocating all unused chunks.
This fixes a potential performance issue when allocating/deallocating
an object of size (4kB..1MB] in a loop.
Reported by: davidxu
don't be greedy on the GNU "::" extension when arg separated by whitespace
and POSIX_CORRECTLY is set. From POSIX point of view this is unclear
situation, so minimal assumption looks right.
(size_t)(num * size) == 0
but both num and size are nonzero.
Reported by: Ilja van Sprundel
Approved by: jasone
Security: Integer overflow; calloc was allocating 1 byte in
response to a request for a multiple of 2^32 (or 2^64)
bytes instead of returning NULL.
well as avoiding a switch statement. This change has no significant impact
to performance when branch prediction is successful at predicting the sizes
of objects passed to free(), but in the case that the object sizes are
semi-random, this change has the potential to prevent many branch prediction
misses, thus improving performance substantially.
Take advantage of alignment guarantees in ipalloc(), and pad object sizes to
something less than a power of two when possible. This has the potential
to substantially reduce internal fragmentation for objects allocated via
posix_memalign().
Avoid an unnecessary pow2_ceil() call in arena_ralloc().
Submitted by: djam8193ah@hotmail.com
and instead creating a small allocation for each malloc(0) call. The
optional SysV compatibility behavior remains unchanged.
Add a couple of assertions.
Fix a couple of typos in error message strings.
The text is correct in the "DESCRIPTION" section, so fix "SYNOPSIS"
to use the correct name.
PR: docs/90498
Submitted by: Vasil Dimov
MFC after: 3 days
4kB pages), in order to avoid dangerous rounding error when calculating
fullness limits during run promotion/demotion.
Convert a structure bitfield to a normal field in areana_run_t. This should
have been changed along with the other fields in revision 1.120.
bounds. [1]
Modify logic for utilizing the data segment, such that it is possible to
create huge allocations there.
Shrink the data segment when deallocating a chunk, if it is at the end of
the data segment.
Rename chunk_size to csize in huge_malloc(), in order to avoid masking a
static variable of the same name. [1]
Reported by: Paul Allen <nospam@ugcs.caltech.edu>
races. This isn't currently necessary for libpthread or libthr, but
without it external threads libraries like the linuxthreads port are
not safe to use.
Reported by: ganbold@micom.mng.net
have to be calculated once per allocator operation.
Make nil const.
Update various comments.
Remove/avoid division where possible.
For the one division operation that remains in the critical path, add a
switch statement that has a case for each small size class, and do division
with a constant divisor in each case. This allows the compiler to generate
optimized code that does not use hardware division [1].
Obtained from: peter [1]
* Avoid choosing an arena until it's certain that an arena is needed
for allocation.
* Convert division/multiplication to bitshifting where possible.
* Avoid accessing TLS variables in single-threaded code.
* Reduce the amount of pointer dereferencing.
* Move lock acquisition in critical paths to only protect the the code
that requires synchronization, and completely remove locking where
possible.
determine its value at run time according to other relevant values. This
avoids the creation of runs that are incompletely utilized, as long as
pagesize isn't too large (>32kB, given the current RUN_MIN_REGS_2POW
setting).
Increase the size of several structure bitfields in arena_run_t in order
to avoid integer overflow in the case that a run's header does not overlap
with the space that is usable as application allocation regions. Given
the tiny_min_2pow change, this fix has no additional impact unless
pagesize is >32kB.
Reported by: kris
internally used chunk to start at the beginning of the heap, rather
than at a chunk-aligned address. This reduces mapped memory somewhat
for 32-bit architectures.
Add the arena_run_link_t type and use it wherever a run object is only
used as a ring 'header'. This saves approximately 40 kB of memory per
arena.
Remove an obsolete (no longer used) code path from base_alloc(), which
supported the internal allocation of objects larger than the chunk
size.
Enhance chunk_dealloc() to cache chunk addresses for all deallocated
chunks. This has no impact for most programs, but has the potential
to reduce VM map fragmentation for programs that use huge
allocations.
that no linear searching is necessary if we resort to allocating from a
run that is known to be mostly full. There are pathological edge cases
that could have caused severely degraded performance, and this change
fixes that.
close enough to each other that reallocation would allocate a new region
of the same size. This improves the performance of repeated incremental
reallocations by up to three orders of magnitude. [1]
Fix arena_new() to properly constrain run size if a small chunk size was
specified during runtime configuration.
Suggested by: se [1]
allocation patterns that involve a relatively even mixture of many
different size classes.
Reduce the chunk size from 16 MB to 2 MB. Since chunks are now carved up
using an address-ordered first best fit policy, VM map fragmentation is
much less likely, which makes smaller chunks not as much of a risk. This
reduces the virtual memory size of most applications.
Remove redzones, since program buffer overruns are no longer as likely to
corrupt malloc data structures.
Remove the C MALLOC_OPTIONS flag, and add H and S.
Remove the block of code that tries to use delayed regions in LIFO order,
since from a policy perspective, it conflicts with LRU caching of newly
coalesced regions in arena_undelay(). There are numerous policy
alternatives, and it isn't readily obvious which (if any) is superior;
this change at least has the virtue of being consistent with policy.
fit regions are available, use the delayed regions in LIFO order, in order
to increase locality of reference. We might expect this to cause delayed
regions to be removed from the delay ring buffer more often (since we're
now re-using more recently buffered regions), but numerous tests indicate
that the overall impact on memory usage tends to be good (reduced
fragmentation).
Re-work arena_frag_reg_alloc() so that when large free regions are
exhausted, it uses small regions in a way that favors contiguous allocation
of sequentially allocated small regions. Use arena_frag_reg_alloc() in
this capacity, rather than directly attempting over-fitting of small
requests when no large regions are available.
Remove the bin overfit statistic, since it is no longer relevant due to
the arena_frag_reg_alloc() changes.
Do not specify arena_frag_reg_alloc() as an inline function. It is too
large to benefit much from being inlined, and it is also called in two
places, only one of which is in the critical path (the other call bloated
arena_reg_alloc()).
Call arena_coalesce() for a region before caching it with
arena_mru_cache().
Add assertions that detect the attempted caching of adjacent free regions,
so that we notice this problem when it is first created, rather than in
arena_coalesce(), when it's too late to know how the problem arose.
Reported by: Hans Blancke
problems in cases where regions are faked up for the purposes of red-black
tree searches, since those faked region headers reside on the stack, rather
than in a malloc chunk.
allowing the error to be fatal.
Move a label in order to make sure to properly handle errors in malloc(0).
Reported by: Alastair D'Silva, Saneto Takanori
there is never any need to recursively call the main allocation functions.
Remove recursive spinlock support, since it is no longer needed.
Allow chunks to be as small as the page size.
Correctly propagate OOM errors from arena_new().
broken for non-threaded shared processes in that __tls_get_addr()
assumes the thread pointer is always initialized. This is not the
case. When arenas_map is referenced in choose_arena() and it is
defined as a thread-local variable, it will result in a SIGSEGV.
PR: ia64/91846 (describes the TLS/ia64 bug).
* Add posix_memalign().
* Move calloc() from calloc.c to malloc.c. Add a calloc() implementation in
rtld-elf in order to make the loader happy (even though calloc() isn't
used in rtld-elf).
* Add _malloc_prefork() and _malloc_postfork(), and use them instead of
directly manipulating __malloc_lock.
Approved by: phk, markm (mentor)
between a 32-bit integer and a radix-64 ASCII string. The l64a_r() function
is a NetBSD addition.
PR: 51209 (based on submission, but very different)
Reviewed by: bde, ru
a tty device instead of the legacy minor number approach. This is known to
fix gnome-vfs' sftp module as well as kio_sftp and kdesu on -CURRENT.
Thanks to scottl for the snprintf() approach idea.
Reviewed by: phk
Tested by: pav
mich
Approved by: re (scottl)
surrounding the undef'ing it. It does not seem necessary to
undef some symbol that is not exist, and gcc does not complain
about whether a symbol is exist before #undef'ing it out.
Spotted by: mingyanguo via ChinaUnix.net forum
Reviewed by: phk
really so.
"If the value of base is 16, the characters 0x or 0X may optionally
precede the sequence of letters and digits, following the sign if
present."
Found by: joerg
seed, the random number generator rand(3) still sucks and is unlikely
sufficient for crypto use. Correct what appears to be a cut and paste
error from the srandomdev() man page.
Submitted by: Ben Mesander
example. The externs haven't been needed in about 10 years, so
there's no reason to have them other than for hysterical raisins. And
the California Rasins haven't been around for a long time...
under the RETURN VALUES section so it is consistent with others.
Cleanup the return value text for getenv(3) a little while I am here.
PR: docs/58033
MFC after: 3 days
cleanups, handling 'ls -l-', handling '--*'
Note this is in the same time back out of our v1.3
"Don't print an error message if the bad option is '?'"
because it directly violates POSIX.
through a realloc like function.
Make the malloc_active variable a local static to this new function.
Don't warn about recursion more than once per base call.
constify malloc_func.
has been hit, this makes it cover more cases.
Call the message function directly rather than fiddle with flag-saving
when we find an unknown character in our options.
The 'A' flag should not trigger on legal out of memory conditions.
These files had tags after the copyright notice,
inside the comment block (incorrect, removed),
and outside the comment block (correct).
Approved by: rwatson (mentor)
This results in no functional change, aside from fixing a data
corruption bug on LP64 platforms. The code here could still use a
significant amount of cleanup.
PR: 56502
Submitted by: hrs (earlier version)
ó++ ABI document at http://www.codesourcery.com/cxx-abi/abi.html#dso-dtor
The ABI was initially defined for ia64, but GCC3 and Intel compilers
have adopted it on other platforms.
This is the patch from PR bin/59552 with a number of changes by
me.
PR: bin/59552
Submitted by: Bradley T Hughes (bhughes at trolltech dot com)
C++ ABI document at http://www.codesourcery.com/cxx-abi/abi.html#dso-dtor
The ABI was initially defined for ia64, but GCC3 and Intel compilers
have adopted it on other platforms.
This is the patch from PR bin/59552 with a number of changes by
me.
PR: bin/59552
Submitted by: Bradley T Hughes (bhughes at trolltech dot com)
initialization overhead, there's a problem in that we never call
imalloc() and thus malloc_init() for zero-sized allocations. As a
result, malloc(0) returns NULL when it's the first or only malloc in
the program. Any non-zero allocation will initialize the malloc code
with the side-effect that subsequent zero-sized allocations return a
non-NULL pointer. This is because the pointer we return for zero-
sized allocations is calculated from malloc_pageshift, which needs
to be initialized at runtime on ia64.
The result of the inconsistent behaviour described above is that
configure scripts failed the test for a GNU compatible malloc. This
resulted in a lot of broken ports.
Other, even simpler, solutions were possible as well:
1. initialize malloc_pageshift with some non-zero value (say 13 for
8KB pages) and keep the runtime adjustment.
2. Stop using malloc_pageshift to calculate ZEROSIZEPTR.
Removal of the runtime adjustment was chosen because then ia64 is the
same as any other platform. It is not to say that using a page size
obtained at runtime is bad per se. It's that there's currently a high
level of gratuity for its existence and the moment it causes problems
is the moment you need to get rid of it. Hence, it's not unthinkable
that this commit is (partially) reverted some time in the future when
we do have a good reason for it and a good way to achieve it.
Approved by: re@ (rwatson)
Reported by: kris (portmgr@) -- may the ports be with you
sorting strings with common prefixes by noting
when all the strings land in just one bin.
Testing shows significant speedups (on the order of
30%) on strings with common prefixes and no slowdowns on any
of my test cases.
Submitted by: Markus Bjartveit Kruger <markusk@pvv.ntnu.no>
PR: 58860
Approved by: gordon (mentor)
it around an application's fork() call. Our new thread libraries
(libthr, libpthread) can now have threads running while another
thread calls fork(). In this case, it is possible for malloc
to be left in an inconsistent state in the child. Our thread
libraries, libpthread in particular, need to use malloc internally
after a fork (in the child).
Reviewed by: davidxu
send strhash(3) off to sleep with the fishes. Nothing in our tree uses it.
It has no documentation. It is nonstandard and in spite of the filename
strhash.c and strhash.h, it lives in application namespace by providing
compulsory global symbols hash_create()/hash_destroy()/hash_search()/
hash_traverse()/hash_purge()/hash_stats() regardless of whether you
#include <strhash.h> or not. If it turns out that there is a huge
application for this after all, I can repocopy it somewhere safer and
we can revive it elsewhere. But please, not in libc!
technique) so that we don't wind up calling into an application's
version if the application defines them.
Inspired by: qpopper's interfering and buggy version of strlcpy
package, a more recent, generalized set of routines. Among the
changes:
- Declare strtof() and strtold() in stdlib.h.
- Add glue to libc to support these routines for all kinds
of ``long double''.
- Update printf() to reflect the fact that dtoa works slightly
differently now.
As soon as I see that nothing has blown up, I will kill
src/lib/libc/stdlib/strtod.c. Soon printf() will be able
to use the new routines to output long doubles without loss
of precision, but numerous bugs in the existing code must
be addressed first.
Reviewed by: bde (briefly), mike (mentor), obrien
seed->first value correlation. It breaks rand_r()... Other possible methods
like shuffling inside aray will breaks rand_r() too, because it assumes
only one word state, i.e. nothing extra can be added after seed assignment
in srand().
BTW, for old formulae seed->first value correlation is not so monotonically
increased as with other Linear Congruential Generators of this type only
becase arithmetic overflow happens. But overflow affects distribution
and lower bits very badly, as many articles says, such type of overflow
not improves PRNG.
So, monotonically increased seed->first value correlation problem remains...