According to style(9):
> normally, include <sys/types.h> OR <sys/param.h>, but not both.
(<sys/param.h> already includes <sys/types.h> when LOCORE is not defined).
Add missing Symbol.map entry for __aligned_alloc.
Add weak-->strong symbol binding for
{malloc_stats_print,mallctl,mallctlnametomib,mallctlbymib} -->
{__malloc_stats_print,__mallctl,__mallctlnametomib,__mallctlbymib}. These
bindings complete the set necessary to allow applications to replace all
malloc-related symbols.
tdelete() is supposed to return the address of the parent node that has
been deleted. We already keep track of this node in the loop between
lines 94-107. The GO_LEFT()/GO_RIGHT() macros are used later on as well,
so we must make sure not to change it to something else.
Traditionally the hcreate() function creates a hash table that uses
chaining, using a fixed user-provided size. The problem with this
approach is that this often either wastes memory (table too big) or
yields bad performance (table too small). For applications it may not
always be easy to estimate the right hash table size. A fixed number
only increases performance compared to a linked list by a constant
factor.
This problem can be solved easily by dynamically resizing the hash
table. If the size of the hash table is at least doubled, this has no
negative on the running time complexity. If a dynamically sized hash
table is used, we can also switch to using open addressing instead of
chaining, which has the advantage of just using a single allocation for
the entire table, instead of allocating many small objects.
Finally, a problem with the existing implementation is that its
deterministic algorithm for hashing makes it possible to come up with
fixed patterns to trigger an excessive number of collisions. We can
easily solve this by using FNV-1a as a hashing algorithm in combination
with a randomly generated offset basis.
Measurements have shown that this implementation is about 20-25% faster
than the existing implementation (even if the existing implementation is
given an excessive number of buckets). Though it allocates more memory
through malloc() than the old implementation (between 4-8 pointers per
used entry instead of 3), process memory use is similar to the old
implementation as if the estimated size was underestimated by a factor
10. This is due to the fact that malloc() needs to perform less
bookkeeping.
Reviewed by: jilles, pfg
Obtained from: https://github.com/NuxiNL/cloudlibc
Differential Revision: https://reviews.freebsd.org/D4644
The existing implementations of POSIX tsearch() and tdelete() don't
attempt to perform any balancing at all. Testing reveals that inserting
100k nodes into a tree sequentially takes approximately one minute on my
system.
Though most other BSDs also don't use any balanced tree internally, C
libraries like glibc and musl do provide better implementations. glibc
uses a red-black tree and musl uses an AVL tree.
Red-black trees have the advantage over AVL trees that they only require
O(1) rotations after insertion and deletion, but have the disadvantage
that the tree has a maximum depth of 2*log2(n) instead of 1.44*log2(n).
My take is that it's better to focus on having a lower maximum depth,
for the reason that in the case of tsearch() the invocation of the
comparator likely dominates the running time.
This change replaces the tsearch() and tdelete() functions by versions
that create an AVL tree. Compared to musl's implementation, this version
is different in two different ways:
- We don't keep track of heights; just balances. This is sufficient.
This has the advantage that it reduces the number of nodes that are
being accessed. Storing heights requires us to also access all of the
siblings along the path.
- Don't use any recursion at all. We know that the tree cannot 2^64
elements in size, so the height of the tree can never be larger than
96. Use a 128-bit bitmask to keep track of the path that is computed.
This allows us to iterate over the same path twice, meaning we can
apply rotations from top to bottom.
Inserting 100k nodes into a tree now only takes 0.015 seconds. Insertion
seems to be twice as fast as glibc, whereas deletion has about the same
performance. Unlike glibc, it uses a fixed amount of memory.
I also experimented with both recursive and iterative bottom-up
implementations of the same algorithm. This iterative top-down version
performs similar to the recursive bottom-up version in terms of speed
and code size.
For some reason, the iterative bottom-up algorithm was actually 30%
faster for deletion, but has a quadratic memory complexity to keep track
of all the parent pointers.
Reviewed by: jilles
Obtained from: https://github.com/NuxiNL/cloudlibc
Differential Revision: https://reviews.freebsd.org/D4412
Tracking these leads to situations where meta mode will consider the
file to be out of date if /bin/sh or /bin/ln are newer than the source
file. There's no reason for meta mode to do this as make is already
handling the rebuild dependency fine.
Sponsored by: EMC / Isilon Storage Division
are aliases for the syscall stubs and are plt-interposed, to the
libc-private aliases of internally interposed sigprocmask() etc.
Since e.g. _sigaction is not interposed by libthr, calling signal()
removes thr_sighandler() from the handler slot etc. The result was
breaking signal semantic and rtld locking.
The added __libc_sigprocmask and other symbols are hidden, they are
not exported and cannot be called through PLT. The setjmp/longjmp
functions for x86 were changed to use direct calls, and since
PIC_PROLOGUE only needed for functional PLT indirection on i386, it is
removed as well.
The PowerPC bug of calling the syscall directly in the setjmp/longjmp
implementation is kept as is.
Reported by: Pete French <petefrench@ingresso.co.uk>
Tested by: Michiel Boland <boland37@xs4all.nl>
Reviewed by: jilles (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Add a manpage for it, assign the copyright to the OpenBSD project on it since it
is mostly copy/paste from OpenBSD manpage.
style(9) fixes
Differential Revision: https://reviews.freebsd.org/D2420
Reviewed by: kib
Implement a small enhancement to the original qsort implementation:
If the data is 32 bit aligned we can side-step the long type
version and use int instead.
The change brings a modest but significant improvement in
32 bit workloads.
Relnotes: yes
PR: 135718
Taken from: ache
any applications which need unpredictable random numbers, not merely those
which are cryptographic in nature.
If you work for a lottery and you're using random(3) to select the winning
numbers, please let me know.
(or loading a dso linked to libthr.so into process which was not
linked against threading library).
- Remove libthr interposers of the libc functions, including
__error(). Instead, functions calls are indirected through the
interposing table, similar to how pthread stubs in libc are already
done. Libc by default points either to syscall trampolines or to
existing libc implementations. On libthr load, libthr rewrites the
pointers to the cancellable implementations already in libthr. The
interposition table is separate from pthreads stubs indirection
table to not pull pthreads stubs into static binaries.
- Postpone the malloc(3) internal mutexes initialization until libthr
is loaded. This avoids recursion between calloc(3) and static
pthread_mutex_t initialization.
- Reinstall signal handlers with wrapper on libthr load. The
_rtld_is_dlopened(3) is used to avoid useless calls to sigaction(2)
when libthr is statically referenced from the main binary.
In the process, fix openat(2), swapcontext(2) and setcontext(2)
interposing. The libc symbols were exported at different versions
than libthr interposers. Export both libc and libthr versions from
libc now, with default set to the higher version from libthr.
Remove unused and disconnected swapcontext(3) userspace implementation
from libc/gen.
No objections from: deischen
Tested by: pho, antoine (exp-run) (previous versions)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
bsearch_b is the Apple blocks enabled version of bsearch(3).
This was added to libc in Revision 264042 but the commit
missed the declaration required to make use of it.
While here move some other block-related functions to the
BSD_VISIBLE block as these are non-standard.
Phabric: D638
Reviewed by: theraven, wollman
The hcreate(3) implementation and related functions we inherited
from NetBSD used to free() the key value, something that is not
supported by the standard implementation.
This would cause a segmentation fault when attempting to run
the examples from the opengroup and linux manpages. NetBSD
has added non-standard calls to provide the previous
behaviour but hdestroy is not very commonly used so at this
time it seems excessive to bring those to FreeBSD.
Bump the __FreeBSD_version as this is an ABI change.
Reference:
http://bugs.dragonflybsd.org/issues/1398
MFC after: 2 weeks
While testing this I found a conformance issue in hdestroy()
that will be fixed in a subsequent commit.
Obtained from: NetBSD (hcreate.c, CVS Rev. 1.7)
Also ANSIfy a function declaration.
While here update the OpenBSD patch level in getopt_long.c as we
already have the corresponding change.
Obtained from: NetBSD
MFC after: 2 weeks
If realpath() is called on pathnames like "/dev/null/." or "/dev/null/..",
it should fail with [ENOTDIR]. Pathnames like "/dev/null/" already failed as
they should.
Also, put the check for non-directories after lstatting the previous
component instead of when the empty component (consecutive or trailing
slashes) is detected, saving an lstat() call and some lines of code.
PR: kern/82980
MFC after: 2 weeks
if not already defined. This allows building libc from outside of
lib/libc using a reach-over makefile.
A typical use-case is to build a standard ILP32 version and a COMPAT32
version in a single iteration by building the COMPAT32 version using a
reach-over makefile.
Obtained from: Juniper Networks, Inc.