$Id:$
This is release 1.1 of the sgmls SGML parser, written by James Clark
<jjc@jclark.com> and repackaged for FreeBSD. The original source may be
obtained from ftp://ftp.jclark.com/.
Pieces removed include:
* Test documents: Compiled on FreeBSD, sgmls passes all tests.
* sgml-mode.el: The sole file covered by the GNU GPL. This is not
installed anyway, and anyone wishing to do serious SGML editing
would do better to get the psgml package.
* Makefiles and config files for other operating systems (vms, dos,
cms).
* Formatted versions of the man pages.
20-Apr-1995 John Fieber <jfieber@freebsd.org>
The original README and TODO follow.
----------------------------------------------------------------------
This is sgmls, an SGML parser derived from the ARCSGML parser
materials which were written by Charles F. Goldfarb. (These are
available for anonymous ftp from ftp.ifi.uio.no [128.240.88.1] in the
directory SIGhyper/SGMLUG/distrib.)
The version number is given in the file version.c.
The file INSTALL contains installation instructions.
The file NEWS describes recent user-visible changes.
The file sgmls.man contains a Unix manual page; sgmls.txt is the
formatted version of this.
The file sgml-mode.el contains a very simple SGML mode for GNU Emacs.
The files sgmls.c and sgmls.h contain a small library for parsing the
output of sgmls. This is used by sgmlsasp, which translates the
output of sgmls using an ASP replacement file, and by rast, which
translates the output of sgmls to the format of a RAST result. The
files sgmlsasp.man and rast.man contain Unix manual pages for sgmlsasp
and rast; sgmlsasp.txt and rast.txt are the formatted versions of
these.
The file LICENSE contains the license which applies to arcsgml and
accordingly to those parts of sgmls derived from arcsgml. See also
the copyright notice at the beginning of sgmlxtrn.c. The parts that
were written by me are in the public domain (any files that were
written entirely by me contain a comment to that effect.) The file
sgml-mode.el is covered by the GNU GPL.
Please report any bugs to me. When reporting bugs, please include the
version number, details of your machine, OS and compiler, and a
complete self-contained file that will allow me to reproduce the bug.
James Clark
jjc@jclark.com
----------------------------------------------------------------------
Warn about mixed content models where #PCDATA can't occur everywhere.
Perhaps there should be a configuration option saying what a control
character is for the purpose of SHUNCHAR CONTROLS.
Should the current character that is printed in error messages be
taken from the file entity or the current entity?
Refine SYS_ action. If we distinguish DELNONCH in lexmark, lexgrp,
lexsd, we can have separate action that ignores the following
character as well.
Should RSs in CDATA/SDATA entities be ignored as specified in 322:1-2?
Similarly, do the rules on REs in 322:3-11 apply to CDATA/SDATA
entities? (I don't think they count as being `in content'.)
What should the entity manager do when it encounters code 13 in an
input file? (Currently it treats it as an RE.)
Document when invalid exclusions are detected.
Option not to perform capacity checking.
Give a warning if the recommendation of 422:1-3 is contravened.
Should an empty CDATA/RCDATA marked section be allowed in the document
type declaration subset?
Include example of use of SGML_PATH in documentation.
Try to detect the situation in 310:8-10 (but see 282:1-2).
Resize hash tables if they become too full.
Say something in the man page about message catalogues.
Consider whether support for SHORTREF NONE requires further changes
(other than disallowing short reference mapping declaration).
Fake /dev/fd/N and /dev/stdin for systems that don't provide them.
Improve the efficiency of the entity manager by not closing and
reopening files. If we run out of FILEs choose the stream with the
fewest bytes remaining to be read, and read the rest of it into
memory. Each entity level will have its own read buffer.
Support multi-line error messages: automatically indent after
newline. (We could output to a temporary file first, then copy to
stderr replacing newlines by newline+indent).
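As an illustration of the indent-after-newline idea, here is a minimal
sketch in C. It takes the direct route (indenting while writing) rather
than the temporary-file route mentioned above. The function name
put_indented and its parameters are hypothetical and do not appear in
the sgmls sources.

```c
#include <stdio.h>

/* Hypothetical helper, not part of sgmls: write msg to fp and emit
   `indent` spaces after every newline that is followed by more text,
   so that continuation lines of a message line up. */
static void put_indented(FILE *fp, const char *msg, int indent)
{
    const char *p;
    int i;

    for (p = msg; *p != '\0'; p++) {
        putc(*p, fp);
        /* Do not indent after a trailing newline. */
        if (*p == '\n' && p[1] != '\0')
            for (i = 0; i < indent; i++)
                putc(' ', fp);
    }
}
```

A caller would pass stderr and the width of the message prefix, for
example put_indented(stderr, text, 4).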
Option that says to output out-of-context things.
Divide up formal public identifier errors. Give these errors their
own type code.
Consider whether, when OMITTAG is NO, we need to change interpretation
of an empty start-tag (7.4.1.1).
Possibly turn errors 70 and 136 into warnings.
Make things work with NORMSEP > 2. Would need to keep track of number
of CDATA and SDATA entities in CDATA attributes.
Handle `SCOPE INSTANCE'.
In entgen.c, truncate filenames for OSs that don't do this themselves.
Provide an option that specifies the maximum number of errors; when
this limit is exceeded, sgmls would exit.
Document non-portable assumptions in the code.
Option to write out SGML declaration. In this case make it write out
APPINFO parameter.
Allow there to be catalogs mapping public ids to filenames.
Environment variable SGML_CATALOG containing list of filenames of
catalogs.
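One way the SGML_CATALOG list might be walked is sketched below in C.
This is only an illustration: the TODO does not say how the filenames
are separated, so the ':' separator (as in PATH) is an assumption, and
the helper name catalog_entry is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of sgmls: copy the index-th entry of a
   ':'-separated list into buf, which holds bufsize bytes.  Returns 1 on
   success, 0 if there is no such entry or it does not fit in buf.
   The ':' separator is an assumption. */
static int catalog_entry(const char *list, int index, char *buf,
                         size_t bufsize)
{
    const char *start = list;
    const char *end;
    size_t len;

    if (list == NULL)
        return 0;
    for (;;) {
        end = strchr(start, ':');
        len = end ? (size_t)(end - start) : strlen(start);
        if (index == 0)
            break;
        if (end == NULL)
            return 0;           /* ran out of entries */
        start = end + 1;
        index--;
    }
    if (len + 1 > bufsize)
        return 0;               /* entry too long for buf */
    memcpy(buf, start, len);
    buf[len] = '\0';
    return 1;
}
```

A caller could pass the result of getenv("SGML_CATALOG") as the list and
increase the index from 0 until the function returns 0, trying each
catalog file in turn.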