$Id:$

This is the sgmls release 1.1 SGML parser, written by James Clark
(jjc@jclark.com) and repackaged for FreeBSD. The original source may be
obtained from ftp://ftp.jclark.com/.

Pieces removed include:
  * Test documents: Compiled on FreeBSD, sgmls passes all tests.
  * sgml-mode.el: The sole file covered by the GNU GPL. This is not
    installed anyway, and anyone wishing to do serious SGML editing
    would do best to get the psgml package.
  * Makefiles and config files for other operating systems (vms, dos,
    cms).
  * Formatted versions of the man pages.


20-Apr-1995 John Fieber <jfieber@freebsd.org>


The original README and TODO follow.
----------------------------------------------------------------------
This is sgmls, an SGML parser derived from the ARCSGML parser
materials which were written by Charles F. Goldfarb. (These are
available for anonymous ftp from ftp.ifi.uio.no [128.240.88.1] in the
directory SIGhyper/SGMLUG/distrib.)

The version number is given in the file version.c.

The file INSTALL contains installation instructions.

The file NEWS describes recent user-visible changes.

The file sgmls.man contains a Unix manual page; sgmls.txt is the
formatted version of this.

The file sgml-mode.el contains a very simple SGML mode for GNU Emacs.

The files sgmls.c and sgmls.h contain a small library for parsing the
output of sgmls. This is used by sgmlsasp, which translates the
output of sgmls using an ASP replacement file, and by rast, which
translates the output of sgmls to the format of a RAST result. The
files sgmlsasp.man and rast.man contain Unix manual pages for sgmlsasp
and rast; sgmlsasp.txt and rast.txt are the formatted versions of
these.

The file LICENSE contains the license which applies to arcsgml and
accordingly to those parts of sgmls derived from arcsgml. See also
the copyright notice at the beginning of sgmlxtrn.c. The parts that
were written by me are in the public domain (any files that were
written entirely by me contain a comment to that effect). The file
sgml-mode.el is covered by the GNU GPL.

Please report any bugs to me. When reporting bugs, please include the
version number, details of your machine, OS and compiler, and a
complete self-contained file that will allow me to reproduce the bug.

James Clark
jjc@jclark.com

----------------------------------------------------------------------
Warn about mixed content models where #PCDATA can't occur everywhere.

Perhaps there should be a configuration option saying what a control
character is for the purpose of SHUNCHAR CONTROLS.

Should the current character that is printed in error messages be
taken from the file entity or the current entity?

Refine SYS_ action. If we distinguish DELNONCH in lexmark, lexgrp,
lexsd, we can have a separate action that ignores the following
character as well.

Should RSs in CDATA/SDATA entities be ignored as specified in 322:1-2?
Similarly, do the rules on REs in 322:3-11 apply to CDATA/SDATA
entities? (I don't think they count as being `in content'.)

What should the entity manager do when it encounters code 13 in an
input file? (Currently it treats it as an RE.)

Document when invalid exclusions are detected.

Option not to perform capacity checking.

Give a warning if the recommendation of 422:1-3 is contravened.

Should an empty CDATA/RCDATA marked section be allowed in the document
type declaration subset?

Include an example of the use of SGML_PATH in the documentation.

Try to detect the situation in 310:8-10 (but see 282:1-2).

Resize hash tables if they become too full.

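As an illustration of the hash table item above, here is a minimal
sketch of load-based resizing for a chained hash table. It is not
taken from the sgmls sources: the struct layout, the function names
(table_init, table_insert, table_contains), the hash function and the
growth threshold of two entries per bucket are all assumptions made
for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chained hash table of strings; not the sgmls structures. */
struct node {
    struct node *next;
    const char *key;
};

struct table {
    struct node **bucket;
    size_t nbuckets;
    size_t count;
};

static size_t hash_string(const char *s)
{
    size_t h = 0;

    while (*s != '\0')
        h = h * 31 + (unsigned char)*s++;
    return h;
}

/* calloc is assumed to yield null pointers (true on common platforms). */
int table_init(struct table *t, size_t nbuckets)
{
    assert(t != NULL && nbuckets > 0);
    t->bucket = calloc(nbuckets, sizeof *t->bucket);
    t->nbuckets = nbuckets;
    t->count = 0;
    return t->bucket != NULL ? 0 : -1;
}

/* Double the bucket array and relink every node into its new chain.
   On allocation failure the table is left unchanged. */
static void table_grow(struct table *t)
{
    size_t i, n = t->nbuckets * 2;
    struct node **b = calloc(n, sizeof *b);

    if (b == NULL)
        return;
    for (i = 0; i < t->nbuckets; i++) {
        struct node *p = t->bucket[i];
        while (p != NULL) {
            struct node *next = p->next;
            size_t h = hash_string(p->key) % n;
            p->next = b[h];
            b[h] = p;
            p = next;
        }
    }
    free(t->bucket);
    t->bucket = b;
    t->nbuckets = n;
}

int table_contains(const struct table *t, const char *key)
{
    const struct node *p = t->bucket[hash_string(key) % t->nbuckets];

    for (; p != NULL; p = p->next)
        if (strcmp(p->key, key) == 0)
            return 1;
    return 0;
}

/* Returns 1 if key was added, 0 if already present, -1 if out of memory.
   The table stores the pointer, so key must outlive the table. */
int table_insert(struct table *t, const char *key)
{
    struct node *p;
    size_t h;

    if (table_contains(t, key))
        return 0;
    if (t->count >= 2 * t->nbuckets)   /* average chain length reached 2 */
        table_grow(t);
    p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->key = key;
    h = hash_string(key) % t->nbuckets;
    p->next = t->bucket[h];
    t->bucket[h] = p;
    t->count++;
    return 1;
}
```

Growing before the insert keeps the average chain length bounded, and
a failed allocation only costs speed, since the old bucket array stays
in use.
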
Say something in the man page about message catalogues.

Consider whether support for SHORTREF NONE requires further changes
(other than disallowing short reference mapping declarations).

Fake /dev/fd/N and /dev/stdin for systems that don't provide them.

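One way the /dev/fd/N item might be approached is to recognize these
names before trying to open them. The sketch below only does the
recognition step; dev_fd_number is a hypothetical helper that does not
exist in sgmls, and how the entity manager would use its result is
left open.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: map "/dev/stdin" or "/dev/fd/N" to a descriptor
   number.  Returns -1 if name is not of that form or N overflows int. */
int dev_fd_number(const char *name)
{
    static const char prefix[] = "/dev/fd/";
    const char *p;
    int fd = 0;

    assert(name != NULL);
    if (strcmp(name, "/dev/stdin") == 0)
        return 0;
    if (strncmp(name, prefix, sizeof prefix - 1) != 0)
        return -1;
    p = name + sizeof prefix - 1;
    if (*p == '\0')
        return -1;              /* no digits after the prefix */
    for (; *p != '\0'; p++) {
        int digit;

        if (*p < '0' || *p > '9')
            return -1;          /* only decimal digits are accepted */
        digit = *p - '0';
        if (fd > (INT_MAX - digit) / 10)
            return -1;          /* fd * 10 + digit would overflow */
        fd = fd * 10 + digit;
    }
    return fd;
}
```

On a POSIX system a non-negative result could then be handed to
fdopen() rather than passing the name to fopen(); whether that fits
the entity manager is a separate question.
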
Improve the efficiency of the entity manager by not closing and
reopening files. If we run out of FILEs, choose the stream with the
fewest bytes remaining to be read, and read the rest of it into
memory. Each entity level will have its own read buffer.

Support multi-line error messages: automatically indent after
newline. (We could output to a temporary file first, then copy to
stderr replacing newlines by newline+indent.)

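The newline+indent replacement for multi-line messages can be
sketched as a single pass over an already formatted message. The
function name put_indented and its interface are assumptions made for
this example, not existing sgmls code, and the sketch writes straight
to a stream instead of going through a temporary file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy msg to out, writing `indent` spaces after
   every newline that is followed by more text.  Returns the number of
   characters written, or -1 on a write error. */
long put_indented(FILE *out, const char *msg, int indent)
{
    long written = 0;
    const char *p;
    int i;

    assert(out != NULL && msg != NULL && indent >= 0);
    for (p = msg; *p != '\0'; p++) {
        if (putc(*p, out) == EOF)
            return -1;
        written++;
        /* A trailing newline is not followed by an indent. */
        if (*p == '\n' && p[1] != '\0') {
            for (i = 0; i < indent; i++) {
                if (putc(' ', out) == EOF)
                    return -1;
                written++;
            }
        }
    }
    return written;
}
```

A call such as put_indented(stderr, message, 4) would indent every
continuation line of the message by four spaces.
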
Option that says to output out-of-context things.

Divide up formal public identifier errors. Give these errors their
own type code.

Consider whether, when OMITTAG is NO, we need to change interpretation
of an empty start-tag (7.4.1.1).

Possibly turn errors 70 and 136 into warnings.

Make things work with NORMSEP > 2. Would need to keep track of number
of CDATA and SDATA entities in CDATA attributes.

Handle `SCOPE INSTANCE'.

In entgen.c, truncate filenames for OSs that don't do this themselves.

Provide an option that specifies the maximum number of errors; when
this limit is exceeded sgmls would exit.

Document non-portable assumptions in the code.

Option to write out SGML declaration. In this case make it write out
APPINFO parameter.

Allow there to be catalogs mapping public ids to filenames.
Environment variable SGML_CATALOG containing list of filenames of
catalogs.
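
The SGML_CATALOG item does not say how the list of filenames would be
written. Assuming a colon-separated list in the style of PATH (an
assumption, as are the function names list_field and catalog_file),
picking out the individual catalog filenames might look like this:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the n-th (0-based) sep-separated field of list into buf.
   Returns 1 on success, 0 if there is no such field or it does not
   fit in size bytes including the terminating null. */
int list_field(const char *list, char sep, size_t n, char *buf, size_t size)
{
    const char *start = list;
    const char *end;
    size_t len;

    assert(list != NULL && buf != NULL && size > 0 && sep != '\0');
    for (;;) {
        end = strchr(start, sep);
        len = end != NULL ? (size_t)(end - start) : strlen(start);
        if (n == 0)
            break;
        if (end == NULL)
            return 0;           /* ran out of fields */
        start = end + 1;
        n--;
    }
    if (len >= size)
        return 0;
    memcpy(buf, start, len);
    buf[len] = '\0';
    return 1;
}

/* Fetch the n-th catalog filename from SGML_CATALOG.  The variable is
   only proposed in the TODO; the ':' separator is an assumption. */
int catalog_file(size_t n, char *buf, size_t size)
{
    const char *list = getenv("SGML_CATALOG");

    return list != NULL && list_field(list, ':', n, buf, size);
}
```

A caller would loop with n = 0, 1, 2, ... until catalog_file returns
0, opening each catalog in turn.
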