354 lines
11 KiB
Plaintext
354 lines
11 KiB
Plaintext
.\" tbl | readme.ms | [tn]roff -ms | ...
|
|
.\" note the "C" (courier) and "CB" fonts: you will probably have to
|
|
.\" change these.
|
|
.\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $
|
|
|
|
.de P1
|
|
.br
|
|
.nr dT 4
|
|
.nf
|
|
.ft C
|
|
.sp .5
|
|
.nr t \\n(dT*\\w'x'u
|
|
.ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu
|
|
..
|
|
.de P2
|
|
.br
|
|
.ft 1
|
|
.br
|
|
.sp .5
|
|
.br
|
|
.fi
|
|
..
|
|
.\" CW uses the typewriter/courier font.
|
|
.de CW
|
|
\fC\\$1\\fP\\$2
|
|
..
|
|
|
|
.\" Footnote numbering [by Henry Spencer]
|
|
.\" <text>\*f for a footnote number..
|
|
.\" .FS
|
|
.\" \*F <footnote text>
|
|
.\" .FE
|
|
.\"
|
|
.ds f \\u\\s-2\\n+f\\s+2\\d
|
|
.nr f 0 1
|
|
.ds F \\n+F.
|
|
.nr F 0 1
|
|
|
|
.ND
|
|
.LP
|
|
.TL
|
|
\fIsdbm\fP \(em Substitute DBM
|
|
.br
|
|
or
|
|
.br
|
|
Berkeley \fIndbm\fP for Every UN*X\** Made Simple
|
|
.AU
|
|
Ozan (oz) Yigit
|
|
.AI
|
|
The Guild of PD Software Toolmakers
|
|
Toronto - Canada
|
|
.sp
|
|
oz@nexus.yorku.ca
|
|
.LP
|
|
.FS
|
|
UN*X is not a trademark of any (dis)organization.
|
|
.FE
|
|
.sp 2
|
|
\fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP
|
|
.SH
|
|
A The Clone of the \fIndbm\fP library
|
|
.PP
|
|
The sources accompanying this notice \(em \fIsdbm\fP \(em constitute
|
|
the first public release (Dec. 1990) of a complete clone of
|
|
the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to
|
|
clone the proven functionality of \fIndbm\fP as closely as possible,
|
|
including a few improvements. It is practical, easy to understand, and
|
|
compatible.
|
|
The \fIsdbm\fP library is not derived from any licensed, proprietary or
|
|
copyrighted software.
|
|
.PP
|
|
The \fIsdbm\fP implementation is based on a 1978 algorithm
|
|
[Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''.
|
|
In the course of searching for a substitute for \fIndbm\fP, I
|
|
prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80]
|
|
and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP
|
|
implementation. The Bell Labs
|
|
\fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by
|
|
Ken Thompson, [Tho90, Tor87] and predates Larson's work.
|
|
.PP
|
|
The \fIsdbm\fR programming interface is totally compatible
|
|
with \fIndbm\fP and includes a slight improvement in database initialization.
|
|
It is also expected to be binary-compatible under most UN*X versions that
|
|
support the \fIndbm\fP library.
|
|
.PP
|
|
The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP
|
|
library, as a side effect of various simplifications to the original Larson
|
|
algorithm. It does produce \fIholes\fP in the page file as it writes
|
|
pages past the end of file. (Larson's paper include a clever solution to
|
|
this problem that is a result of using the hash value directly as a block
|
|
address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP
|
|
creates fewer holes in general, and the resulting pagefiles are
|
|
smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP
|
|
in database creation.
|
|
Unlike the \fIndbm\fP, the \fIsdbm\fP
|
|
.CW store
|
|
operation will not ``wander away'' trying to split its
|
|
data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case
|
|
situations) be inserted. (It will fail after a pre-defined number of attempts.)
|
|
.SH
|
|
Important Compatibility Warning
|
|
.PP
|
|
The \fIsdbm\fP and \fIndbm\fP
|
|
libraries \fIcannot\fP share databases: one cannot read the (dir/pag)
|
|
database created by the other. This is due to the differences
|
|
between the \fIndbm\fP and \fIsdbm\fP algorithms\**,
|
|
.FS
|
|
Torek's discussion [Tor87]
|
|
indicates that \fIdbm/ndbm\fP implementations use the hash
|
|
value to traverse the radix trie differently than \fIsdbm\fP
|
|
and as a result, the page indexes are generated in \fIdifferent\fP order.
|
|
For more information, send e-mail to the author.
|
|
.FE
|
|
and the hash functions
|
|
used.
|
|
It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP
|
|
by ignoring the index completely: see
|
|
.CW dbd ,
|
|
.CW dbu
|
|
etc.
|
|
.R
|
|
.LP
|
|
.SH
|
|
Notice of Intellectual Property
|
|
.LP
|
|
\fIThe entire\fP sdbm \fIlibrary package, as authored by me,\fP Ozan S. Yigit,
|
|
\fIis hereby placed in the public domain.\fP As such, the author is not
|
|
responsible for the consequences of use of this software, no matter how
|
|
awful, even if they arise from defects in it. There is no expressed or
|
|
implied warranty for the \fIsdbm\fP library.
|
|
.PP
|
|
Since the \fIsdbm\fP
|
|
library package is in the public domain, this \fIoriginal\fP
|
|
release or any additional public-domain releases of the modified original
|
|
cannot possibly (by definition) be withheld from you. Also by definition,
|
|
You (singular) have all the rights to this code (including the right to
|
|
sell without permission, the right to hoard\**
|
|
.FS
|
|
You cannot really hoard something that is available to the public at
|
|
large, but try if it makes you feel any better.
|
|
.FE
|
|
and the right to do other icky things as
|
|
you see fit) but those rights are also granted to everyone else.
|
|
.PP
|
|
Please note that all previous distributions of this software contained
|
|
a copyright (which is now dropped) to protect its
|
|
origins and its current public domain status against any possible claims
|
|
and/or challenges.
|
|
.SH
|
|
Acknowledgments
|
|
.PP
|
|
Many people have been very helpful and supportive. A partial list would
|
|
necessarily include Rayan Zacherissen (who contributed the man page,
|
|
and also hacked a MMAP version of \fIsdbm\fP),
|
|
Arnold Robbins, Chris Lewis,
|
|
Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started
|
|
in the first place), Johannes Ruschein
|
|
(who did the minix port) and David Tilbrook. I thank you all.
|
|
.SH
|
|
Distribution Manifest and Notes
|
|
.LP
|
|
This distribution of \fIsdbm\fP includes (at least) the following:
|
|
.P1
|
|
CHANGES change log
|
|
README this file.
|
|
biblio a small bibliography on external hashing
|
|
dba.c a crude (n/s)dbm page file analyzer
|
|
dbd.c a crude (n/s)dbm page file dumper (for conversion)
|
|
dbe.1 man page for dbe.c
|
|
dbe.c Janick's database editor
|
|
dbm.c a dbm library emulation wrapper for ndbm/sdbm
|
|
dbm.h header file for the above
|
|
dbu.c a crude db management utility
|
|
hash.c hashing function
|
|
makefile guess.
|
|
pair.c page-level routines (posted earlier)
|
|
pair.h header file for the above
|
|
readme.ms troff source for the README file
|
|
sdbm.3 man page
|
|
sdbm.c the real thing
|
|
sdbm.h header file for the above
|
|
tune.h place for tuning & portability thingies
|
|
util.c miscellaneous
|
|
.P2
|
|
.PP
|
|
.CW dbu
|
|
is a simple database manipulation program\** that tries to look
|
|
.FS
|
|
The
|
|
.CW dbd ,
|
|
.CW dba ,
|
|
.CW dbu
|
|
utilities are quick hacks and are not fit for production use. They were
|
|
developed late one night, just to test out \fIsdbm\fP, and convert some
|
|
databases.
|
|
.FE
|
|
like Bell Labs'
|
|
.CW cbt
|
|
utility. It is currently incomplete in functionality.
|
|
I use
|
|
.CW dbu
|
|
to test out the routines: it takes (from stdin) tab separated
|
|
key/value pairs for commands like
|
|
.CW build
|
|
or
|
|
.CW insert
|
|
or takes keys for
|
|
commands like
|
|
.CW delete
|
|
or
|
|
.CW look .
|
|
.P1
|
|
dbu <build|creat|look|insert|cat|delete> dbmfile
|
|
.P2
|
|
.PP
|
|
.CW dba
|
|
is a crude analyzer of \fIdbm/sdbm/ndbm\fP
|
|
page files. It scans the entire
|
|
page file, reporting page level statistics, and totals at the end.
|
|
.PP
|
|
.CW dbd
|
|
is a crude dump program for \fIdbm/ndbm/sdbm\fP
|
|
databases. It ignores the
|
|
bitmap, and dumps the data pages in sequence. It can be used to create
|
|
input for the
|
|
.CW dbu
|
|
utility.
|
|
Note that
|
|
.CW dbd
|
|
will skip any NULLs in the key and data
|
|
fields, thus is unsuitable to convert some peculiar databases that
|
|
insist in including the terminating null.
|
|
.PP
|
|
I have also included a copy of the
|
|
.CW dbe
|
|
(\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for
|
|
your pleasure. You may find it more useful than the little
|
|
.CW dbu
|
|
utility.
|
|
.PP
|
|
.CW dbm.[ch]
|
|
is a \fIdbm\fP library emulation on top of \fIndbm\fP
|
|
(and hence suitable for \fIsdbm\fP). Written by Robert Elz.
|
|
.PP
|
|
The \fIsdbm\fP
|
|
library has been around in beta test for quite a long time, and from whatever
|
|
little feedback I received (maybe no news is good news), I believe it has been
|
|
functioning without any significant problems. I would, of course, appreciate
|
|
all fixes and/or improvements. Portability enhancements would especially be
|
|
useful.
|
|
.SH
|
|
Implementation Issues
|
|
.PP
|
|
Hash functions:
|
|
The algorithm behind \fIsdbm\fP implementation needs a good bit-scrambling
|
|
hash function to be effective. I ran into a set of constants for a simple
|
|
hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP
|
|
for various inputs:
|
|
.P1
|
|
/*
|
|
* polynomial conversion ignoring overflows
|
|
* 65599 nice. 65587 even better.
|
|
*/
|
|
long
|
|
dbm_hash(char *str, int len) {
|
|
register unsigned long n = 0;
|
|
|
|
while (len--)
|
|
n = n * 65599 + *str++;
|
|
return n;
|
|
}
|
|
.P2
|
|
.PP
|
|
There may be better hash functions for the purposes of dynamic hashing.
|
|
Try your favorite, and check the pagefile. If it contains too many pages
|
|
with too many holes, (in relation to this one for example) or if
|
|
\fIsdbm\fP
|
|
simply stops working (fails after
|
|
.CW SPLTMAX
|
|
attempts to split) when you feed your
|
|
NEWS
|
|
.CW history
|
|
file to it, you probably do not have a good hashing function.
|
|
If you do better (for different types of input), I would like to know
|
|
about the function you use.
|
|
.PP
|
|
Block sizes: It seems (from various tests on a few machines) that a page
|
|
file block size
|
|
.CW PBLKSIZ
|
|
of 1024 is by far the best for performance, but
|
|
this also happens to limit the size of a key/value pair. Depending on your
|
|
needs, you may wish to increase the page size, and also adjust
|
|
.CW PAIRMAX
|
|
(the maximum size of a key/value pair allowed: should always be at least
|
|
three words smaller than
|
|
.CW PBLKSIZ .)
|
|
accordingly. The system-wide version of the library
|
|
should probably be
|
|
configured with 1024 (distribution default), as this appears to be sufficient
|
|
for most common uses of \fIsdbm\fP.
|
|
.SH
|
|
Portability
|
|
.PP
|
|
This package has been tested in many different UN*Xes even including minix,
|
|
and appears to be reasonably portable. This does not mean it will port
|
|
easily to non-UN*X systems.
|
|
.SH
|
|
Notes and Miscellaneous
|
|
.PP
|
|
The \fIsdbm\fP is not a very complicated package, at least not after you
|
|
familiarize yourself with the literature on external hashing. There are
|
|
other interesting algorithms in existence that ensure (approximately)
|
|
single-read access to a data value associated with any key. These are
|
|
directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson
|
|
variations), \fIspiral storage\fP [Mar79] or directory schemes such as
|
|
\fIextensible hashing\fP [Fag79] by Fagin et al. I do hope these sources
|
|
provide a reasonable playground for experimentation with other algorithms.
|
|
See the June 1988 issue of ACM Computing Surveys [Enb88] for an
|
|
excellent overview of the field.
|
|
.PG
|
|
.SH
|
|
References
|
|
.LP
|
|
.IP [Lar78] 4m
|
|
P.-A. Larson,
|
|
``Dynamic Hashing'', \fIBIT\fP, vol. 18, pp. 184-201, 1978.
|
|
.IP [Tho90] 4m
|
|
Ken Thompson, \fIprivate communication\fP, Nov. 1990
|
|
.IP [Lit80] 4m
|
|
W. Litwin,
|
|
`` Linear Hashing: A new tool for file and table addressing'',
|
|
\fIProceedings of the 6th Conference on Very Large Dabatases (Montreal)\fP,
|
|
pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980.
|
|
.IP [Fag79] 4m
|
|
R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong,
|
|
``Extendible Hashing - A Fast Access Method for Dynamic Files'',
|
|
\fIACM Trans. Database Syst.\fP, vol. 4, no.3, pp. 315-344, Sept. 1979.
|
|
.IP [Wal84] 4m
|
|
Rich Wales,
|
|
``Discussion of "dbm" data base system'', \fIUSENET newsgroup unix.wizards\fP,
|
|
Jan. 1984.
|
|
.IP [Tor87] 4m
|
|
Chris Torek,
|
|
``Re: dbm.a and ndbm.a archives'', \fIUSENET newsgroup comp.unix\fP,
|
|
1987.
|
|
.IP [Mar79] 4m
|
|
G. N. Martin,
|
|
``Spiral Storage: Incrementally Augmentable Hash Addressed Storage'',
|
|
\fITechnical Report #27\fP, University of Varwick, Coventry, U.K., 1979.
|
|
.IP [Enb88] 4m
|
|
R. J. Enbody and H. C. Du,
|
|
``Dynamic Hashing Schemes'',\fIACM Computing Surveys\fP,
|
|
vol. 20, no. 2, pp. 85-113, June 1988.
|