Commit Graph

560 Commits

Author SHA1 Message Date
ache
7ec3418f8e Fix typo in the comment 2007-12-11 20:39:32 +00:00
jasone
f7cf554c6b Only zero large allocations when necessary (for calloc()). 2007-11-28 00:17:34 +00:00
jasone
afb1c25b9f Document the B and L MALLOC_OPTIONS. 2007-11-27 03:18:26 +00:00
jasone
4db288d2b3 Implement dynamic load balancing of thread-->arena mapping, based on lock
contention.  The intent is to dynamically adjust to load imbalances, which
can cause severe contention.

Use pthread mutexes where possible instead of libc "spinlocks" (they aren't
actually spin locks).  Conceptually, this change is meant only to support
the dynamic load balancing code by enabling the use of spin locks, but it
has the added apparent benefit of substantially improving performance due to
reduced context switches when there is moderate arena lock contention.

Proper tuning parameter configuration for this change is a finicky business,
and it is very much machine-dependent.  One seemingly promising solution
would be to run a tuning program during operating system installation that
computes appropriate settings for load balancing.  (The pthreads adaptive
spin locks should probably be similarly tuned.)
2007-11-27 03:17:30 +00:00
jasone
2dd595aefe Implement lazy deallocation of small objects. For each arena, maintain a
vector of slots for lazily freed objects.  For each deallocation, before
doing the hard work of locking the arena and deallocating, try several times
to randomly insert the object into the vector using atomic operations.

This approach is particularly effective at reducing contention for
multi-threaded applications that use the producer-consumer model, wherein
one producer thread allocates objects, then multiple consumer threads
deallocate those objects.
2007-11-27 03:13:15 +00:00
jasone
305f155400 Avoid re-zeroing memory in calloc() when possible. 2007-11-27 03:12:15 +00:00
jasone
6aa187a4bf Fix stats printing of the amount of memory currently consumed by huge
allocations. [1]

Fix calculation of the number of arenas when 'n' is specified via
MALLOC_OPTIONS.

Clean up various style inconsistencies.

Obtained from:	[1] NetBSD
2007-11-27 03:09:23 +00:00
davidxu
728449ebd6 Remove out of date notes, the atoi code is thread-safe and async-cancel
safe.

Discussed with: desichen
2007-10-19 06:23:39 +00:00
scf
c96f51e4b8 The precision for a string argument in a call to warnx() needs to be cast
to an int to remove the warning from using a size_t variable on 64-bit
platforms.

Submitted by:	Xin LI <delphij@FreeBSD.org>
Approved by:	wes
Approved by:	re (kensmith)
2007-09-22 02:30:44 +00:00
scf
1de7e4d9a4 Skip rebuilding environ in setenv() only upon reuse of an active variable;
inactive variables should cause a rebuild of environ, otherwise, exec()'d
processes will be missing a variable in environ that has been unset then
set.

Submitted by:	Taku Yamamoto <taku@tackymt.homeip.net>
Reviewed by:	ache
Approved by:	wes (mentor)
Approved by:	re (kensmith)
2007-09-15 21:48:54 +00:00
scf
8db32e8a1e Added environ-replacement detection. For programs that "clean" (i.e., su)
or replace (i.e., zdump) the environment after a call to setenv(), putenv()
or unsetenv() has been made, a few changes were made.
  - getenv() will return the value from the new environ array.
  - setenv() was split into two functions:  __setenv() which is most of the
    previous setenv() without checks on the name and setenv() which
    contains the checks before calling __setenv().
  - setenv(), putenv() and unsetenv() will unset all previous values and
    call __setenv() on all entries in the new environ array which in turn
    adds them to the end of the envVars array.  Calling __setenv() instead
    of setenv() is done to avoid the temporary replacement of the '=' in a
    string with a NUL byte.  Some strings may be read-only data.

Added more regression checks for clearing the environment array.

Replaced gettimeofday() with getrusage() in timing regression check for
better accuracy.

Fixed an off-by-one bug in __remove_putenv() in the use of memmove().  This
went unnoticed due to the allocation of double the number of environ
entries when building envVars.

Fixed a few spelling mistakes in the comments.

Reviewed by:	ache
Approved by:	wes
Approved by:	re (kensmith)
2007-07-20 23:30:13 +00:00
scf
196b6346ba Significantly reduce the memory leak as noted in BUGS section for
setenv(3) by tracking the size of the memory allocated instead of using
strlen() on the current value.

Convert all calls to POSIX from historic BSD API:
 - unsetenv returns an int.
 - putenv takes a char * instead of const char *.
 - putenv no longer makes a copy of the input string.
 - errno is set appropriately for POSIX.  Exceptions involve bad environ
   variable and internal initialization code.  These both set errno to
   EFAULT.

Several patches to base utilities to handle the POSIX changes from
Andrey Chernov's previous commit.  A few I re-wrote to use setenv()
instead of putenv().

New regression module for tools/regression/environ to test these
functions.  It also can be used to test the performance.

Bump __FreeBSD_version to 700050 due to API change.

PR:		kern/99826
Approved by:	wes
Approved by:	re (kensmith)
2007-07-04 00:00:41 +00:00
jasone
3f7b9ec5f3 Add information about the implications of using mmap(2) instead of sbrk(2).
Submitted by:	bmah, jhb
2007-06-15 22:32:33 +00:00
jasone
a3eda1cfc4 Fix junk/zero filling for realloc(). Junk filling was missing in one case,
and zero filling was broken in a way that could cause memory corruption.

Update comments.
2007-06-15 22:00:16 +00:00
jon
50945671cb Backout 1.5 as requested by deischen 2007-05-22 05:28:40 +00:00
jon
f4c0271704 __cleanup() is needed for ports/devel/valgrind, export it. 2007-05-22 03:03:28 +00:00
ache
6ccaf050cc Back out all POSIXified *env() changes.
Not because I admit they are technically wrong and not because of bug
reports (I receive nothing). But because I surprisingly meets so
strong opposition and resistance so lost any desire to continue that.

Anyone who interested in POSIX can dig out what changes and how
through cvs diffs.
2007-05-01 16:02:44 +00:00
ache
55677ed894 Bump .Dd
Suggested by: Henrik Brix Andersen <henrik@brixandersen.dk>
2007-04-30 19:37:10 +00:00
ache
c0dd4e798d Add phrase
"so altering the argument shall change the environment."
into putenv description.
2007-04-30 18:01:51 +00:00
ache
d85104099a Make putenv() fully conforms to Open Group specs Issue 6
(also IEEE Std 1003.1-2001)

The specs explicitly says that altering passed string
should change the environment, i.e. putenv() directly puts its arg
into environment (unlike setenv() which just copies it there).
It means that putenv() can't be implemented via setenv()
(like we have before) at all. Putenv() value lives (allows modifying)
up to the next putenv() or setenv() call.
2007-04-30 16:56:18 +00:00
ache
c323ce85c0 Remove special case skipping initial '=' of the setenv() value "for
compatibility with the different environment conventions" (man page).
With the standards, we don't have them different anymore and
IEEE Std 1003.1-2001 says that

"The values that the environment variables may be assigned are not
restricted except that they are considered to end with a null byte"
2007-04-30 03:47:31 +00:00
ache
df3730247f Make setenv, putenv, getenv and unsetenv conforming to Open Group specs
Issue 6 (also IEEE Std 1003.1-2001) in following areas:
args, return, errors.

Putenv still needs rewriting because specs explicitly says that
altering passed string later should change the environment (currently we
copy the string so can't provide that).
2007-04-30 02:25:02 +00:00
deischen
2a7306fdc5 Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
ru
2cb30c04fd Swap "underflow"/"overflow" in the table header.
Submitted by:	Ricardo Nabinger Sanchez
MFC after:	3 days
2007-04-10 11:17:00 +00:00
jasone
26e0e354b0 Use size_t instead of unsigned for pagesize-related values, in order to
avoid downcasting issues.  In particular, this change fixes
posix_memalign(3) for alignments greater than 2^31 on LP64 systems.

Make sure that NDEBUG is always set to be compatible with MALLOC_DEBUG. [1]

Reported by:	[1] Lee Hyo geol <hyogeollee@gmail.com>
2007-03-29 21:07:17 +00:00
jasone
ca1ed8272e Remove the run promotion/demotion machinery. Replace it with red-black
trees that track all non-full runs for each bin.  Use the red-black
trees to be able to guarantee that each new allocation is placed in the
lowest address available in any non-full run.  This change completes the
transition to allocating from low addresses in order to reduce the
retention of sparsely used chunks.

If the run in current use by a bin becomes empty, deallocate the run
rather than retaining it for later use.  The previous behavior had the
tendency to spread empty runs across multiple chunks, thus preventing
the release of chunks that were completely unused.

Generalize base_chunk_alloc() (and rename it to base_pages_alloc()) to
handle allocation sizes larger than the chunk size, so that it is
possible to support chunk sizes that are smaller than an arena object.

Reduce the minimum chunk size from 64kB to 8kB.

Optimize tracking of addresses for deleted chunks.

Fix a statistics bug for huge allocations.
2007-03-28 19:55:07 +00:00
jasone
07d14320a3 Update the IMPLEMENTATION NOTES section to reflect recent malloc
enhancements.
2007-03-28 04:34:19 +00:00
jasone
a011d43e23 Add a HISTORY section. 2007-03-28 04:32:51 +00:00
jasone
8561f36aa3 Fix some subtle bugs for posix_memalign() having to do with integer
rounding and overflow.  Carefully document what the various overflow
tests actually detect.

The bugs mostly canceled out, such that the worst possible failure
cases resulted in non-fatal over-allocations.
2007-03-24 20:44:06 +00:00
jasone
b63dd57bc9 Fix posix_memalign() for large objects. Now that runs are extents rather
than binary buddies, the alignment guarantees are weaker, which requires
a more complex aligned allocation algorithm, similar to that used for
alignment greater than the chunk size.

Reported by:	matteo
2007-03-23 22:58:15 +00:00
jasone
a1325e04ba Use extents rather than binary buddies to track free pages within
chunks.  This allows runs to be any multiple of the page size.  The
primary advantage is that large objects are no longer constrained to be
2^n pages, which can dramatically decrease internal fragmentation for
large objects.  This also allows the sizes for runs that back small
objects to be more finely tuned.

Free runs are searched for linearly using the chunk page map (with the
help of some heuristic optimizations).  This changes the allocation
policy from "first best fit" to "first fit".  A prototype red-black tree
implementation for tracking free runs that implemented "first best fit"
did not cause a measurable speed or memory usage difference for
realistic chunk sizes (though of course it is possible to construct
benchmarks that favor one allocation policy over another).

Refine the handling of fullness constraints for small runs to be more
tunable.

Restructure the per chunk page map to contain only two fields per entry,
rather than four.  Also, increase each entry from 4 to 8 bytes, since it
allows for 32-bit integers, without increasing the number of chunk
header pages.

Relax the maximum chunk size constraint.  This is of no practical
interest; it is merely fallout from the chunk page map restructuring.

Revamp statistics gathering and reporting to be faster, clearer and more
informative.  Statistics gathering is fast enough now to have little
to no impact on application speed, but it still requires approximately
two extra pages of memory per arena (per process).  This memory overhead
may be acceptable for most systems, but we still need to leave
statistics gathering disabled by default in RELENG branches.

Rename NO_MALLOC_EXTRAS to MALLOC_PRODUCTION in order to make its intent
clearer (i.e. it should be defined in RELENG branches).
2007-03-23 05:05:48 +00:00
jasone
ef1856bb83 Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]

Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class.  Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.

Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound.  There are two exceptions that override the header
overhead bound:

	1) If the bound is impossible to honor, it is relaxed on a
	   per-size-class basis.  Since there is one bit of header
	   overhead per object (plus a constant), it is impossible to
	   achieve a header overhead less than or equal to 1/(# of bits
	   per object).  For the current setting of maximum 0.5% header
	   overhead, this relaxation comes into play for {2, 4, 8,
	   16}-byte objects, for which header overhead is (on 64-bit
	   systems) {7.1, 4.3, 2.2, 1.2}%, respectively.

	2) There is still a cap on small run size, still set to 64kB.
	   This comes into play for {1024, 2048}-byte objects, for which
	   header overhead is {1.6, 3.1}%, respectively.

In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad.  It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).

Reduce the default chunk size from 2MB to 1MB.  Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).

Reported by:	[1] kientzle
2007-03-20 03:44:10 +00:00
jasone
a1e21ebd26 Modify chunk_alloc() to prefer mmap()ed memory over sbrk()ed memory.
This has no impact unless USE_BRK is defined (32-bit platforms), in
which case user allocations are allocated via mmap() if at all possible,
in order to avoid the possibility of unreclaimable chunks in the data
segment.

Fix an obscure bug in base_alloc() that could have allowed undefined
behavior if an application were to use sbrk() in conjunction with a
USE_BRK-enabled malloc.
2007-02-22 19:10:30 +00:00
jasone
5918b89319 Fix a utrace(2)-related bug in calloc(3).
Integrate various pedantic cleanups.

Submitted by:	Andrew Doran <ad@netbsd.org>
2007-01-31 22:54:19 +00:00
imp
cd1f140ae4 Per Regents of the University of Calfornia letter, remove advertising
clause.

# If I've done so improperly on a file, please let me know.
2007-01-09 00:28:16 +00:00
jasone
9667303e99 Implement chunk allocation/deallocation hysteresis by caching one spare
chunk per arena, rather than immediately deallocating all unused chunks.
This fixes a potential performance issue when allocating/deallocating
an object of size (4kB..1MB] in a loop.

Reported by:	davidxu
2006-12-23 00:18:51 +00:00
trhodes
784213457b Note that the value from getenv() should not be modified by applications.
PR:		60544
Reviewed by:	ru
2006-10-12 08:39:24 +00:00
trhodes
1f27dd98e0 getenv.3: Put "is" on a line with other words
getobjformat.3: "takes precedence over" is not an envrionment variable.

PR:		75545
Submitted by:	n-kogane@syd.odn.ne.jp
MFC after:	3 days
2006-10-07 21:27:21 +00:00
ru
81bed6b884 Revise markup in recently added manpages. 2006-09-30 10:34:13 +00:00
ache
b8fd741213 Keep compatible parts in sync with OpenBSD v1.21, add some comments.
No functional changes.
2006-09-23 14:48:31 +00:00
ache
b4df5c3aa1 Remove code #ifndef'ed in prev. commit to stay in sync with OpenBSD
v1.21 which just do that.
2006-09-22 18:59:03 +00:00
ache
eb7bc007cc Be more GNU compatible:
don't be greedy on the GNU "::" extension when arg separated by whitespace
and POSIX_CORRECTLY is set. From POSIX point of view this is unclear
situation, so minimal assumption looks right.
2006-09-22 17:01:38 +00:00
ru
f4eec08060 Markup fixes. 2006-09-17 21:27:35 +00:00
jasone
ce0ab81797 Change the way base allocation is done for internal malloc data
structures, in order to avoid the possibility of attempted recursive
lock acquisition for chunks_mtx.

Reported by:	Slawa Olhovchenkov <slw@zxy.spb.ru>
2006-09-08 17:52:15 +00:00
ru
cb0ad18d63 alloca() cannot check if the allocation is valid; mention the consequences.
Obtained from:	OpenBSD
2006-09-05 16:30:11 +00:00
marcel
d9435a56c2 Enable TLS on PowerPC. 2006-09-01 19:14:14 +00:00
marcel
aa70489a8b Enable TLS on ia64. 2006-09-01 06:18:43 +00:00
cperciva
230593e64f Correctly handle the case in calloc(num, size) where
(size_t)(num * size) == 0
but both num and size are nonzero.

Reported by:	Ilja van Sprundel
Approved by:	jasone
Security:	Integer overflow; calloc was allocating 1 byte in
		response to a request for a multiple of 2^32 (or 2^64)
		bytes instead of returning NULL.
2006-08-13 21:54:47 +00:00
marcel
bf73c5645f Define NO_TLS on PowerPC.
See also: PR ia64/91846
2006-08-09 19:01:27 +00:00
jasone
c606303b8c Conditionally expand the size_invs lookup table in arena_run_reg_dalloc()
so that architectures with a quantum of 8 (rather than 16) work.

Restore arm's quantum to 8.

Submitted by:	jmg
2006-07-27 19:09:32 +00:00