.nr PS 12
.nr VS 14
.LP
.TL
Design of grohtml
.sp 1i
.SH
What is grohtml
.LP
Grohtml is a back end for groff which generates html.
The aim of grohtml is to produce respectable html given
fairly typical groff input.
.SH
Limitations of grohtml
.LP
Although basic text can be translated
in a straightforward fashion, there are some areas where grohtml
has to guess how pieces of text relate to one another. In particular, whenever
grohtml encounters text tables, indented paragraphs or
two column mode it will try to use the html table construct
to preserve columns. Grohtml also attempts to work out which
lines should be automatically formatted by the browser.
Because it relies on guesses that are reasonable most of the time,
it will occasionally make mistakes.
.PP
The output of tbl, pic and eqn is rendered as images, which may be
considered a limitation.
.SH
Overview of html.cc
.LP
This file gives a brief overview of how html.cc operates.
The html device driver works as follows:
.IP (i) .5i
firstly it creates a linked list of all words on a page.
.IP (ii) .5i
it runs through the page and finds the leftmost margin. Later
on, when generating the page, it removes the margin.
.IP (iii) .5i
it scans the page and builds two kinds of regions: ascii text and graphical.
The graphical regions consist of tbl's, eqn's, pic's
(basically anything that cannot be textually displayed).
It will scan through a page to find lines (such as a footer line)
and place these into tiny graphical regions. Text in certain fonts
is also treated as a graphical region, as html has no easy
equivalent; Greek math symbols are one example.
.LP
Finally all graphical regions are translated into png files and
all text regions into html text.
.PP
To give grohtml a sporting chance of accurately deciding which
is a graphical region and which is text, the front end programs
tbl, eqn and pic have all been tweaked to encapsulate pictures, tables
and equations with the following lines:
.sp
.nf
\f[CR]\&.if '\\*(.T'html' \\X(graphic-start(\c

\&.if '\\*(.T'html' \\X(graphic-end(\c
\fP
.fi
.sp
These appear to grohtml as:
.sp
.nf
\f[CR]\&x X graphic-start

\&...

\&x X graphic-end\fP
.fi
.sp
.LP
In addition to graphic-start and graphic-end there are two
other "special characters" which are used. The first is:
.sp
\f[CR]\&x X index:N\fP
.sp
where N is a number. The purpose of this sequence is to stop
devhtml from automatically producing links to headings whose
header level is greater than N.
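.LP
For example, a document could emit the sequence itself with the
\f[CR]\\X\fP escape. This is only a sketch: it assumes the special
can be written by hand in this way, and the level 2 is an arbitrary choice:
.sp
.nf
\f[CR]\\X'index:2'\fP
.fi
.sp
.LP
With this line in the input, headings at level 3 and deeper would not be
given automatic links.
.LP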
The second is the line:
.sp
\f[CR]\&x X html:STRING\fP
.sp
.LP
allows a STRING to be passed through to the output file with
no processing whatsoever. That is, it allows users to include html
commands, via a macro, such as:
.sp
\f[CR]\&.URL "Latest Emacs" "ftp://somewonderful.gnu.software"\fP
.sp
.LP
Here the URL macro bundles its arguments into the STRING above.
For more information consult \f[CR]tmac/tmac.arkup\fP.
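.LP
As a rough sketch of how such a macro could build STRING, consider the
hypothetical definition below. The name MYURL and its body are
illustrative assumptions and are not taken from tmac.arkup:
.sp
.nf
\f[CR]\&.de MYURL
\\X'html:<a href="\\\\$2">\\\\$1</a>'
\&..
\&.MYURL "Latest Emacs" "ftp://somewonderful.gnu.software"
\fP
.fi
.sp
.LP
Under these assumptions the first argument becomes the link text and the
second the target, and the driver would receive a single line beginning
\f[CR]x X html:\fP which it copies to the output unchanged.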
.PP
While scanning through a page the html device copies headings and titles
into a list of links, which is later written to the beginning
of the html document.
.SH
Table handling code
.LP
Provided that the -t option is not present when grohtml is run, the grohtml
driver will attempt to find textual tables and generate html tables.
This allows .RS and .RE commands to operate with auto formatting. It
should also allow grohtml to process .2C correctly. However, the table handling code
has to examine the troff output and \fIguess\fR when a table starts and
finishes. It is worth knowing the limitations of this approach, as it
sometimes makes the wrong decision.
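.LP
For instance, input along the following lines (a made-up fragment using
the ms macros) contains an indented block of the kind that the table
handling code would try to preserve with an html table:
.sp
.nf
\f[CR]\&.LP
An ordinary paragraph at the left margin.
\&.RS
\&.LP
This paragraph is indented relative to the one above.
\&.RE
\fP
.fi
.sp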
.LP
Here are some of the rules that grohtml uses for terminating an html table:
.LP
.IP "(i)" .5i
A table will be terminated when grohtml finds a line which is all in bold
font (it believes that this is a header which is outside of a table).
This might be considered incorrect behaviour, especially if you use .2C
which generates a heading on the left column when the corresponding
right row is blank.
.IP "(ii)" .5i
A table is terminated when grohtml sees that the complete line
has been spanned by words, that is, no gaps exist.
.IP "(nb)" .5i
the documentation about these rules is particularly incomplete and needs finishing
when time permits.
.SH
To do
.LP
.IP (i) .5i
finish working out the max and min x, y extents for splines.
.IP (ii) .5i
check and test thoroughly all the character descriptions in devhtml
(originally taken from devX100)
.IP (iii) .5i
improve tmac.arkup
.IP (iv) .5i
also improve documentation.
.IP (v) .5i
fix the bugs which are exposed by Eric Raymond's pic guide,
\fBMaking Pictures With GNU PIC\fR. It appears that grohtml becomes confused
about which sections of the document are text and which sections need
to be rendered as an image.
.IP (vi) .5i
it would be nice to modularise the source. A natural division might be
to extract the table handling code from html.cc into table.cc.
table.cc could then be expanded to recognise output from tbl and try
to generate html tables with lines/rules/boxes. The code as it stands
should cope with very simple plain text tables. But of course at present
it does not get a chance to do this because the output of gtbl is
bracketed by \f[CR]graphic-start\fP and \f[CR]graphic-end\fP.
.IP (vii) .5i
introduce anti-aliasing for the images as mentioned by Werner.
.SH
Dependencies
.LP
Grohtml depends upon grops and gs, which are invoked to
generate all png files. Png files are generated whenever a table, picture,
equation or line is encountered.