Commit Graph

11 Commits

Author SHA1 Message Date
Ed Schouten
4ef9bd22ed Improve typing of POSIX search tree functions.
Back in 2015 when I reimplemented these functions to use an AVL tree, I
was annoyed by the weakness of the typing of these functions. Both tree
nodes and keys are represented by 'void *', meaning that things like the
documentation for these functions are an absolute train wreck.

To make things worse, users of these functions need to cast the return
value of tfind()/tsearch() from 'void *' to 'type_of_key **' in order to
access the key. Technically speaking such casts violate aliasing rules.
I've observed actual breakages as a result of this by enabling features
like LTO.

I've filed a bug report at the Austin Group. Looking at the way the bug
got resolved, they made a pretty good step in the right direction. A new
type 'posix_tnode' has been added to correspond to tree nodes. It is
still defined as 'void' for source-level compatibility, but in the very
far future it could be replaced by a proper structure type containing a
key pointer.

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8205
2016-10-13 18:25:40 +00:00
Ed Schouten
3196923796 Remove an unneeded assignment of the return value.
tdelete() is supposed to return the address of the parent node that has
been deleted. We already keep track of this node in the loop between
lines 94-107. The GO_LEFT()/GO_RIGHT() macros are used later on as well,
so we must make sure not to change it to something else.
2016-01-14 07:27:42 +00:00
Ed Schouten
459d04a5ee Let tsearch()/tdelete() use an AVL tree.
The existing implementations of POSIX tsearch() and tdelete() don't
attempt to perform any balancing at all. Testing reveals that inserting
100k nodes into a tree sequentially takes approximately one minute on my
system.

Though most other BSDs also don't use any balanced tree internally, C
libraries like glibc and musl do provide better implementations. glibc
uses a red-black tree and musl uses an AVL tree.

Red-black trees have the advantage over AVL trees that they only require
O(1) rotations after insertion and deletion, but have the disadvantage
that the tree has a maximum depth of 2*log2(n) instead of 1.44*log2(n).
My take is that it's better to focus on having a lower maximum depth,
for the reason that in the case of tsearch() the invocation of the
comparator likely dominates the running time.

This change replaces the tsearch() and tdelete() functions by versions
that create an AVL tree. Compared to musl's implementation, this version
is different in two different ways:

- We don't keep track of heights; just balances. This is sufficient.
  This has the advantage that it reduces the number of nodes that are
  being accessed. Storing heights requires us to also access all of the
  siblings along the path.

- Don't use any recursion at all. We know that the tree cannot 2^64
  elements in size, so the height of the tree can never be larger than
  96. Use a 128-bit bitmask to keep track of the path that is computed.
  This allows us to iterate over the same path twice, meaning we can
  apply rotations from top to bottom.

Inserting 100k nodes into a tree now only takes 0.015 seconds. Insertion
seems to be twice as fast as glibc, whereas deletion has about the same
performance. Unlike glibc, it uses a fixed amount of memory.

I also experimented with both recursive and iterative bottom-up
implementations of the same algorithm. This iterative top-down version
performs similar to the recursive bottom-up version in terms of speed
and code size.

For some reason, the iterative bottom-up algorithm was actually 30%
faster for deletion, but has a quadratic memory complexity to keep track
of all the parent pointers.

Reviewed by:	jilles
Obtained from:	https://github.com/NuxiNL/cloudlibc
Differential Revision:	https://reviews.freebsd.org/D4412
2015-12-22 18:12:11 +00:00
Pedro F. Giffuni
02aa7d7b57 Update comment and NetBSD ID tag.
The NetBSD revisions correspond to changes we have already done
like __P() removal and ANSI-fication of definitions.
2015-02-06 14:22:00 +00:00
Pedro F. Giffuni
b20592de1b tdelete(3): don't delete the node we are about to return.
CID:		272528
Obtained from:	NetBSD (CVS rev. 1.4)
MFC after:	2 weeks
2015-02-05 23:02:43 +00:00
Tim J. Robbins
051900864f No need to include <assert.h> here. 2003-01-05 02:43:18 +00:00
Tim J. Robbins
58d38e2520 Style: One space between "restrict" qualifier and "*". 2002-09-06 11:24:06 +00:00
Robert Drehmel
840b798c83 - Add the 'restrict' qualifier to match the IEEE Std 1003.1-2001
prototype of the tdelete(3) function.
 - Remove duplicated space.
 - Use an ANSI-C function definition for tdelete(3).
 - Update the manual page.
2002-08-14 21:16:41 +00:00
David E. O'Brien
333fc21e3c Fix the style of the SCM ID's.
I believe have made all of libc .c's as consistent as possible.
2002-03-22 21:53:29 +00:00
David E. O'Brien
c05ac53b8b Remove __P() usage. 2002-03-21 22:49:10 +00:00
Alfred Perlstein
64566a3e2a bring in binary search tree code.
Obtained from: NetBSD
2000-07-01 06:55:11 +00:00