.\" $Id: yacc.1,v 1.23 2014/04/09 09:48:50 tom Exp $
.\"
.\" .TH YACC 1 "July\ 15,\ 1990"
.\" .UC 6
.de ES
.ne 8
.nf
.sp
.in +4
..
.de XE
.in -4
.fi
..
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds AQ \(aq
.el .ds AQ '
.ie \n(.g .ds `` \(lq
.el .ds `` ``
.ie \n(.g .ds '' \(rq
.el .ds '' ''
.\" Bulleted paragraph
.de bP
.IP \(bu 4
..
.TH YACC 1 "January 1, 2014" "Berkeley Yacc" "User Commands"
.SH NAME
Yacc \- an LALR(1) parser generator
.SH SYNOPSIS
.B yacc [ \-BdgilLPrstvVy ] [ \-b
.I file_prefix
.B ] [ \-o
.I output_file
.B ] [ \-p
.I symbol_prefix
.B ]
.I filename
.SH DESCRIPTION
.B Yacc
reads the grammar specification in the file
.I filename
and generates an LALR(1) parser for it.
The parsers consist of a set of LALR(1) parsing tables and a driver routine
written in the C programming language.
.B Yacc
normally writes the parse tables and the driver routine to the file
.IR y.tab.c.
.PP
The following options are available:
.TP 5
\fB\-b \fP\fIfile_prefix\fR
The
.B \-b
option changes the prefix prepended to the output file names to
the string denoted by
.IR file_prefix.
The default prefix is the character
.IR y.
.TP
.B \-B
create a backtracking parser (compile-time configuration for \fBbtyacc\fP).
.TP
.B \-d
The \fB-d\fR option causes the header file
.BR y.tab.h
to be written.
It contains #define's for the token identifiers.
.TP
.B \-g
The
.B \-g
option causes a graphical description of the generated LALR(1) parser to
be written to the file
.BR y.dot
in graphviz format, ready to be processed by dot(1).
.TP
.B \-i
The \fB-i\fR option causes a supplementary header file
.BR y.tab.i
to be written.
It contains extern declarations
and supplementary #define's as needed to map the conventional \fIyacc\fP
\fByy\fP-prefixed names to whatever the \fB-p\fP option may specify.
The code file, e.g., \fBy.tab.c\fP is modified to #include this file
as well as the \fBy.tab.h\fP file, enforcing consistent usage of the
symbols defined in those files.
.IP
The supplementary header file makes it simpler to separate compilation
of lex- and yacc-files.
.TP
.B \-l
If the
.B \-l
option is not specified,
.B yacc
will insert \fI#line\fP directives in the generated code.
The \fI#line\fP directives let the C compiler relate errors in the
generated code to the user's original code.
If the \fB-l\fR option is specified,
.B yacc
will not insert the \fI#line\fP directives.
\&\fI#line\fP directives specified by the user will be retained.
.TP
.B \-L
enable position processing, e.g., \*(``%locations\*('' (compile-time configuration for \fBbtyacc\fP).
.TP
\fB\-o \fP\fIoutput_file\fR
specify the filename for the parser file.
If this option is not given, the output filename is
the file prefix concatenated with the file suffix, e.g., \fBy.tab.c\fP.
This overrides the \fB-b\fP option.
.TP
\fB\-p \fP\fIsymbol_prefix\fR
The
.B \-p
option changes the prefix prepended to yacc-generated symbols to
the string denoted by
.IR symbol_prefix.
The default prefix is the string
.BR yy.
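.IP
As a sketch, assuming a hypothetical grammar file \fIcalc.y\fP
and the prefix \*(``calc_\*('' chosen only for illustration:
.ES
yacc \-d \-p calc_ calc.y
.XE
.IP
The generated parser then exports \fBcalc_parse\fP in place of \fByyparse\fP,
and expects the lexer and error routine to be named
\fBcalc_lex\fP and \fBcalc_error\fP.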
.TP
.B \-P
create a reentrant parser, e.g., \*(``%pure-parser\*(''.
.TP
.B \-r
The
.B \-r
option causes
.B yacc
to produce separate files for code and tables.
The code file is named
.IR y.code.c,
and the tables file is named
.IR y.tab.c.
The prefix \*(``\fIy.\fP\*('' can be overridden using the \fB\-b\fP option.
.TP
.B \-s
suppress \*(``\fB#define\fP\*('' statements generated for string literals in
a \*(``\fB%token\fP\*('' statement,
to more closely match original \fByacc\fP behavior.
.IP
Normally when \fByacc\fP sees a line such as
.ES
%token OP_ADD "ADD"
.XE
.IP
it notices that the quoted \*(``ADD\*('' is a valid C identifier,
and generates a #define not only for OP_ADD,
but for ADD as well,
e.g.,
.ES
#define OP_ADD 257
.br
#define ADD 258
.XE
.IP
The original \fByacc\fP does not generate the second \*(``\fB#define\fP\*(''.
The \fB\-s\fP option suppresses this \*(``\fB#define\fP\*(''.
.IP
POSIX (IEEE 1003.1 2004) documents only names and numbers
for \*(``\fB%token\fP\*('',
though original \fByacc\fP and bison also accept string literals.
.TP
.B \-t
The
.B \-t
option changes the preprocessor directives generated by
.B yacc
so that debugging statements will be incorporated in the compiled code.
.TP
.B \-v
The
.B \-v
option causes a human-readable description of the generated parser to
be written to the file
.IR y.output.
.TP
.B \-V
print the version number to the standard output.
.TP
.B \-y
\fByacc\fP ignores this option,
which bison supports for ostensible POSIX compatibility.
.SH EXTENSIONS
.B yacc
provides some extensions for
compatibility with bison and other implementations of yacc.
The \fB%destructor\fP and \fB%locations\fP features are available
only if \fByacc\fP has been configured and compiled to support the
back-tracking (\fBbtyacc\fP) functionality.
The remaining features are always available:
.TP
\fB %destructor\fP { \fIcode\fP } \fIsymbol+\fP
defines code that is invoked when a symbol is automatically
discarded during error recovery.
This code can be used to
reclaim dynamically allocated memory associated with the corresponding
semantic value for cases where user actions cannot manage the memory
explicitly.
.IP
On encountering a parse error, the generated parser
discards symbols on the stack and input tokens until it reaches a state
that will allow parsing to continue.
This error recovery approach results in a memory leak
if the \fBYYSTYPE\fP value is, or contains,
pointers to dynamically allocated memory.
.IP
The bracketed \fIcode\fP is invoked whenever the parser discards one of
the symbols. Within \fIcode\fP, \*(``\fB$$\fP\*('' or
\*(``\fB$<tag>$\fP\*('' designates the semantic value associated with the
discarded symbol, and \*(``\fB@$\fP\*('' designates its location (see
\fB%locations\fP directive).
.IP
A per-symbol destructor is defined by listing a grammar symbol
in \fIsymbol+\fP. A per-type destructor is defined by listing
a semantic type tag (e.g., \*(``<some_tag>\*('') in \fIsymbol+\fP; in this
case, the parser will invoke \fIcode\fP whenever it discards any grammar
symbol that has that semantic type tag, unless that symbol has its own
per-symbol destructor.
.IP
Two categories of default destructor are supported that are
invoked when discarding any grammar symbol that has no per-symbol and no
per-type destructor:
.RS
.bP
the code for \*(``\fB<*>\fP\*('' is used
for grammar symbols that have an explicitly declared semantic type tag
(via \*(``\fB%type\fP\*('');
.bP
the code for \*(``\fB<>\fP\*('' is used
for grammar symbols that have no declared semantic type tag.
.RE
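.IP
The per-symbol and per-type forms might be combined as in the following sketch.
The tags, the token \fBSTRING\fP, the nonterminal \fBexpr\fP and the
function \fBfree_node\fP are hypothetical names chosen for illustration:
.ES
%union { char *str; struct node *node; }
%token <str> STRING
%type <node> expr
%destructor { free($$); } STRING
%destructor { free_node($$); } <node>
.XE
.IP
Here \fBSTRING\fP has a per-symbol destructor,
while any symbol tagged \fB<node>\fP, such as \fBexpr\fP,
uses the per-type destructor.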
.TP
\fB %expect\fP \fInumber\fP
tells \fByacc\fP the expected number of shift/reduce conflicts.
That makes it only report the number if it differs.
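.IP
For instance, a grammar whose only conflict is the single shift/reduce
conflict of a dangling \*(``else\*('' could declare it as expected:
.ES
%expect 1
.XE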
.TP
\fB %expect-rr\fP \fInumber\fP
tells \fByacc\fP the expected number of reduce/reduce conflicts.
That makes it only report the number if it differs.
This is (unlike bison) allowable in LALR parsers.
.TP
\fB %locations\fP
tells \fByacc\fP to enable management of position information associated
with each token, provided by the lexer in the global variable \fByylloc\fP,
similar to management of semantic value information provided in \fByylval\fP.
.IP
As for semantic values, locations can be referenced within actions using
\fB@$\fP to refer to the location of the left hand side symbol, and \fB@N\fP
(\fBN\fP an integer) to refer to the location of one of the right hand side
symbols. Also as for semantic values, when a rule is matched, a default
action is used to compute the location represented by \fB@$\fP as the
beginning of the first symbol and the end of the last symbol in the right
hand side of the rule. This default computation can be overridden by
explicit assignment to \fB@$\fP in a rule action.
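.IP
As a sketch of such an override, a rule for a parenthesized expression
(the nonterminal \fBexpr\fP is hypothetical) could report the location
of the inner expression alone:
.ES
expr : \*(AQ(\*(AQ expr \*(AQ)\*(AQ
        { $$ = $2; @$ = @2; }
.XE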
.IP
The type of \fByylloc\fP is \fBYYLTYPE\fP, which is defined by default as:
.ES
typedef struct YYLTYPE {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} YYLTYPE;
.XE
.IP
\fBYYLTYPE\fP can be redefined by the user
(\fBYYLTYPE_IS_DEFINED\fP must be defined, to inhibit the default)
in the declarations section of the specification file.
As in bison, the macro \fBYYLLOC_DEFAULT\fP is invoked
each time a rule is matched to calculate a position for the left hand side of
the rule, before the associated action is executed; this macro can be
redefined by the user.
.IP
This directive adds a \fBYYLTYPE\fP parameter to \fByyerror()\fP.
If the \fB%pure-parser\fP directive is present,
a \fBYYLTYPE\fP parameter is added to \fByylex()\fP calls.
.TP
\fB %lex-param\fP { \fIargument-declaration\fP }
By default, the lexer accepts no parameters, e.g., \fByylex()\fP.
Use this directive to add parameter declarations for your customized lexer.
.TP
\fB %parse-param\fP { \fIargument-declaration\fP }
By default, the parser accepts no parameters, e.g., \fByyparse()\fP.
Use this directive to add parameter declarations for your customized parser.
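.IP
For instance, a grammar might hand a caller-supplied context to both
routines, as in this sketch
(\fBstruct ctx\fP and \fBcx\fP are hypothetical names):
.ES
%parse-param { struct ctx *cx }
%lex-param { struct ctx *cx }
.XE
.IP
The parser entry point then takes the extra argument,
e.g., \fByyparse(cx)\fP,
and the parser passes \fBcx\fP along when it calls \fByylex\fP.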
.TP
\fB %pure-parser\fP
Most variables (other than \fByydebug\fP and \fByynerrs\fP) are
allocated on the stack within \fByyparse\fP, making the parser reasonably
reentrant.
.TP
\fB %token-table\fP
Make the parser's names for tokens available in the \fByytname\fP array.
However,
.B yacc
does not predefine \*(``$end\*('', \*(``$error\*(''
or \*(``$undefined\*('' in this array.
.SH PORTABILITY
According to Robert Corbett,
.ES
Berkeley Yacc is an LALR(1) parser generator. Berkeley Yacc has been made
as compatible as possible with AT&T Yacc. Berkeley Yacc can accept any input
specification that conforms to the AT&T Yacc documentation. Specifications
that take advantage of undocumented features of AT&T Yacc will probably be
rejected.
.XE
.PP
The rationale in
.ES
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
.XE
.PP
documents some features of AT&T yacc which are no longer required for POSIX
compliance.
.PP
That said, you may be interested in reusing grammar files with some
other implementation which is not strictly compatible with AT&T yacc.
For instance, there is bison.
Here are a few differences:
.bP
\fBYacc\fP accepts an equals sign preceding the left curly brace
of an action (as in the original grammar file \fBftp.y\fP):
.ES
    | STAT CRLF
        = {
            statcmd();
        }
.XE
.bP
\fBYacc\fP and bison emit code in different order, and in particular bison
makes forward reference to common functions such as yylex, yyparse and
yyerror without providing prototypes.
.bP
Bison's support for \*(``%expect\*('' is broken in more than one release.
For best results using bison, delete that directive.
.bP
Bison has no equivalent for some of \fByacc\fP's command-line options,
relying on directives embedded in the grammar file.
.bP
Bison's \*(``\fB\-y\fP\*('' option does not affect bison's lack of support for
features of AT&T yacc which were deemed obsolescent.
.
.SH DIAGNOSTICS
If there are rules that are never reduced, the number of such rules is
reported on standard error.
If there are any LALR(1) conflicts, the number of conflicts is reported
on standard error.