freebsd-dev/gnu/usr.bin/ptx/examples/ignore
1994-05-06 07:54:54 +00:00
..
bix ptx: permuted index generator 1994-05-06 07:54:54 +00:00
eign ptx: permuted index generator 1994-05-06 07:54:54 +00:00
README ptx: permuted index generator 1994-05-06 07:54:54 +00:00

From beebe@math.utah.edu Wed Oct 27 19:37:22 1993
Date: Tue, 26 Oct 93 15:43:19 MDT
From: "Nelson H. F. Beebe" <beebe@math.utah.edu>
To: pinard@iro.umontreal.ca
Subject: Re: Another short comment on gptx 0.2

/usr/lib/eign:		DECstation 5000, ULTRIX 4.3
			HP 9000/735, HP-UX 9.0
			IBM RS/6000, AIX 2.3
			IBM 3090, AIX MP370 2.1
			Stardent 1520, OS 2.2
			Sun SPARCstation, SunOS 4.x

No eign anywhere on:	HP 375, BSD 4.3 (ptx.c is in /usr/src/usr.bin,
				and the source code refers to /usr/lib/eign,
				but I could not find it in the source tree)
			NeXT, Mach 3.0 (though documented in man pages)
			Sun SPARCstation, Solaris 2.x
			SGI Indigo, IRIX 4.0.x
			
The contents of the eign files that I found on the above machines were
almost identical.  With the exception of the Stardent and the IBM
3090, there were only two such files, one with 150 words, and the
other with 133, with only a few differences between them (some words
in the 133-word file were not in the 150-word file).  I found the
133-word variant in groff-1.06/src/indxbib.  I used archie to search
for eign, and it found 7 sites, all with the groff versions.

The Stardent and IBM 3090 eign files have the same contents as the
150-word version, but have a multiline copyright comment at the
beginning.  None of the others contains a copyright.

I recently had occasion to build a similar list of words for bibindex,
which indexes a BibTeX .bib file, and for which omission of common
words, like articles and prepositions, helps to reduce the size of the
index.  I didn't use eign to build that list, but instead, went
through the word lists from 3.8MB of .bib files in the tuglib
collection on ftp.math.utah.edu:pub/tex/bib, and collected words to be
ignored.  That list includes words from several languages.  I'll leave
it up to you to decide whether you wish to merge them or not; I
suspect it may be a better design choice to keep a separate eign file
for each language, although in my own application of ptx-ing
bibliographies, the titles do occur in multiple languages, so a
mixed-language eign is appropriate.  Since there are standard ISO
2-letter abbreviations for every country, perhaps one could have
eign.xy for country xy (of course, only approximately is country ==
language).  The exact list of words in eign is not so critical; its
only purpose is to reduce the size of the output by not indexing words
that occur very frequently and have little content in themselves.

I'm enclosing a shar bundle at the end of this message with the merger
of the multiple eign versions (duplicates eliminated, and the list
sorted into 179 unique words), followed by the bibindex list.



========================================================================
Nelson H. F. Beebe                      Tel: +1 801 581 5254
Center for Scientific Computing         FAX: +1 801 581 4148
Department of Mathematics, 105 JWB      Internet: beebe@math.utah.edu
University of Utah
Salt Lake City, UT 84112, USA
========================================================================