This commit was manufactured by cvs2svn to create tag 'grep_2_0'.
This commit is contained in:
parent
717f769197
commit
c0712724bc
Notes:
svn2git
2020-12-20 02:59:44 +00:00
svn path=/cvs2svn/tags/grep_2_0/; revision=95
29
gnu/usr.bin/grep/AUTHORS
Normal file
29
gnu/usr.bin/grep/AUTHORS
Normal file
@ -0,0 +1,29 @@
|
||||
Mike Haertel wrote the main program and the dfa and kwset matchers.
|
||||
|
||||
Arthur David Olson contributed the heuristics for finding fixed substrings
|
||||
at the end of dfa.c.
|
||||
|
||||
Richard Stallman and Karl Berry wrote the regex backtracking matcher.
|
||||
|
||||
Henry Spencer wrote the original test suite from which grep's was derived.
|
||||
|
||||
Scott Anderson invented the Khadafy test.
|
||||
|
||||
David MacKenzie wrote the automatic configuration software use to
|
||||
produce the configure script.
|
||||
|
||||
Authors of the replacements for standard library routines are identified
|
||||
in the corresponding source files.
|
||||
|
||||
The idea of using Boyer-Moore type algorithms to quickly filter out
|
||||
non-matching text before calling the regexp matcher was originally due
|
||||
to James Woods. He also contributed some code to early versions of
|
||||
GNU grep.
|
||||
|
||||
Finally, I would like to thank Andrew Hume for many fascinating discussions
|
||||
of string searching issues over the years. Hume & Sunday's excellent
|
||||
paper on fast string searching (AT&T Bell Laboratories CSTR #156)
|
||||
describes some of the history of the subject, as well as providing
|
||||
exhaustive performance analysis of various implementation alternatives.
|
||||
The inner loop of GNU grep is similar to Hume & Sunday's recommended
|
||||
"Tuned Boyer Moore" inner loop.
|
@ -1,7 +1,13 @@
|
||||
|
||||
PROG= grep
|
||||
SRCS= dfa.c regex.o grep.o
|
||||
CFLAGS+=-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1
|
||||
MLINKS= grep.1 egrep.1
|
||||
SRCS= dfa.c grep.c getopt.c kwset.c obstack.c regex.c search.c
|
||||
|
||||
CFLAGS+=-DGREP -DHAVE_STRING_H=1 -DHAVE_SYS_PARAM_H=1 -DHAVE_UNISTD_H=1 \
|
||||
-DHAVE_GETPAGESIZE=1 -DHAVE_MEMCHR=1 -DHAVE_STRERROR=1 \
|
||||
-DHAVE_VALLOC=1
|
||||
|
||||
|
||||
#check: ${.CURDIR}/grep
|
||||
check: all
|
||||
awk sh ${.CURDIR}/tests/check.sh ${.CURDIR}/tests
|
||||
|
||||
.include <bsd.prog.mk>
|
||||
|
35
gnu/usr.bin/grep/NEWS
Normal file
35
gnu/usr.bin/grep/NEWS
Normal file
@ -0,0 +1,35 @@
|
||||
Version 2.0:
|
||||
|
||||
The most important user visible change is that egrep and fgrep have
|
||||
disappeared as separate programs into the single grep program mandated
|
||||
by POSIX 1003.2. New options -G, -E, and -F have been added,
|
||||
selecting grep, egrep, and fgrep behavior respectively. For
|
||||
compatibility with historical practice, hard links named egrep and
|
||||
fgrep are also provided. See the manual page for details.
|
||||
|
||||
In addition, the regular expression facilities described in Posix
|
||||
draft 11.2 are now supported, except for internationalization features
|
||||
related to locale-dependent collating sequence information.
|
||||
|
||||
There is a new option, -L, which is like -l except it lists
|
||||
files which don't contain matches. The reason this option was
|
||||
added is because '-l -v' doesn't do what you expect.
|
||||
|
||||
Performance has been improved; the amount of improvement is platform
|
||||
dependent, but (for example) grep 2.0 typically runs at least 30% faster
|
||||
than grep 1.6 on a DECstation using the MIPS compiler. Where possible,
|
||||
grep now uses mmap() for file input; on a Sun 4 running SunOS 4.1 this
|
||||
may cut system time by as much as half, for a total reduction in running
|
||||
time by nearly 50%. On machines that don't use mmap(), the buffering
|
||||
code has been rewritten to choose more favorable alignments and buffer
|
||||
sizes for read().
|
||||
|
||||
Portability has been substantially cleaned up, and an automatic
|
||||
configure script is now provided.
|
||||
|
||||
The internals have changed in ways too numerous to mention.
|
||||
People brave enough to reuse the DFA matcher in other programs
|
||||
will now have their bravery amply "rewarded", for the interface
|
||||
to that file has been completely changed. Some changes were
|
||||
necessary to track the evolution of the regex package, and since
|
||||
I was changing it anyway I decided to do a general cleanup.
|
15
gnu/usr.bin/grep/PROJECTS
Normal file
15
gnu/usr.bin/grep/PROJECTS
Normal file
@ -0,0 +1,15 @@
|
||||
Write Texinfo documentation for grep. The manual page would be a good
|
||||
place to start, but Info documents are also supposed to contain a
|
||||
tutorial and examples.
|
||||
|
||||
Fix the DFA matcher to never use exponential space. (Fortunately, these
|
||||
cases are rare.)
|
||||
|
||||
Improve the performance of the regex backtracking matcher. This matcher
|
||||
is agonizingly slow, and is responsible for grep sometimes being slower
|
||||
than Unix grep when backreferences are used.
|
||||
|
||||
Provide support for the Posix [= =] and [. .] constructs. This is
|
||||
difficult because it requires locale-dependent details of the character
|
||||
set and collating sequence, but Posix does not standardize any method
|
||||
for accessing this information!
|
@ -1,70 +1,28 @@
|
||||
This README documents GNU e?grep version 1.6. All bugs reported for
|
||||
previous versions have been fixed.
|
||||
This is GNU grep 2.0, the "fastest grep in the west" (we hope). All
|
||||
bugs reported in previous releases have been fixed. Many exciting new
|
||||
bugs have probably been introduced in this major revision.
|
||||
|
||||
See the file INSTALL for compilation and installation instructions.
|
||||
|
||||
Send bug reports to bug-gnu-utils@prep.ai.mit.edu.
|
||||
|
||||
GNU e?grep is provided "as is" with no warranty. The exact terms
|
||||
GNU grep is provided "as is" with no warranty. The exact terms
|
||||
under which you may use and (re)distribute this program are detailed
|
||||
in the GNU General Public License, in the file COPYING.
|
||||
|
||||
GNU e?grep is based on a fast lazy-state deterministic matcher (about
|
||||
GNU grep is based on a fast lazy-state deterministic matcher (about
|
||||
twice as fast as stock Unix egrep) hybridized with a Boyer-Moore-Gosper
|
||||
search for a fixed string that eliminates impossible text from being
|
||||
considered by the full regexp matcher without necessarily having to
|
||||
look at every character. The result is typically many times faster
|
||||
than Unix grep or egrep. (Regular expressions containing backreferencing
|
||||
may run more slowly, however.)
|
||||
will run more slowly, however.)
|
||||
|
||||
GNU e?grep is brought to you by the efforts of several people:
|
||||
See the file AUTHORS for a list of authors and other contributors.
|
||||
|
||||
Mike Haertel wrote the deterministic regexp code and the bulk
|
||||
of the program.
|
||||
See the file INSTALL for compilation and installation instructions.
|
||||
|
||||
James A. Woods is responsible for the hybridized search strategy
|
||||
of using Boyer-Moore-Gosper fixed-string search as a filter
|
||||
before calling the general regexp matcher.
|
||||
See the file MANIFEST for a list of files in this distribution.
|
||||
|
||||
Arthur David Olson contributed code that finds fixed strings for
|
||||
the aforementioned BMG search for a large class of regexps.
|
||||
See the file NEWS for a description of major changes in this release.
|
||||
|
||||
Richard Stallman wrote the backtracking regexp matcher that is
|
||||
used for \<digit> backreferences, as well as the getopt that
|
||||
is provided for 4.2BSD sites. The backtracking matcher was
|
||||
originally written for GNU Emacs.
|
||||
See the file PROJECTS if you want to be mentioned in AUTHORS.
|
||||
|
||||
D. A. Gwyn wrote the C alloca emulation that is provided so
|
||||
System V machines can run this program. (Alloca is used only
|
||||
by RMS' backtracking matcher, and then only rarely, so there
|
||||
is no loss if your machine doesn't have a "real" alloca.)
|
||||
|
||||
Scott Anderson and Henry Spencer designed the regression tests
|
||||
used in the "regress" script.
|
||||
|
||||
Paul Placeway wrote the manual page, based on this README.
|
||||
|
||||
If you are interested in improving this program, you may wish to try
|
||||
any of the following:
|
||||
|
||||
1. Replace the fast search loop with a faster search loop.
|
||||
There are several things that could be improved, the most notable
|
||||
of which would be to calculate a minimal delta2 to use.
|
||||
|
||||
2. Make backreferencing \<digit> faster. Right now, backreferencing is
|
||||
handled by calling the Emacs backtracking matcher to verify the partial
|
||||
match. This is slow; if the DFA routines could handle backreferencing
|
||||
themselves a speedup on the order of three to four times might occur
|
||||
in those cases where the backtracking matcher is called to verify nearly
|
||||
every line. Also, some portability problems due to the inclusion of the
|
||||
emacs matcher would be solved because it could then be eliminated.
|
||||
Note that expressions with backreferencing are not true regular
|
||||
expressions, and thus are not equivalent to any DFA. So this is hard.
|
||||
|
||||
3. Handle POSIX style regexps. I'm not sure if this could be called an
|
||||
improvement; some of the things on regexps in the POSIX draft I have
|
||||
seen are pretty sickening. But it would be useful in the interests of
|
||||
conforming to the standard.
|
||||
|
||||
4. Replace the main driver program grep.c with the much cleaner main driver
|
||||
program used in GNU fgrep.
|
||||
Send bug reports to bug-gnu-utils@prep.ai.mit.edu. Be sure to
|
||||
include the word "grep" in your Subject: header field.
|
||||
|
File diff suppressed because it is too large
Load Diff
@ -16,210 +16,115 @@
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||
|
||||
/* Written June, 1988 by Mike Haertel */
|
||||
|
||||
#ifdef STDC_HEADERS
|
||||
|
||||
#include <stddef.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
#else /* !STDC_HEADERS */
|
||||
|
||||
#define const
|
||||
#include <sys/types.h> /* For size_t. */
|
||||
extern char *calloc(), *malloc(), *realloc();
|
||||
extern void free();
|
||||
|
||||
#ifndef NULL
|
||||
#define NULL 0
|
||||
#endif
|
||||
|
||||
#endif /* ! STDC_HEADERS */
|
||||
|
||||
#include <ctype.h>
|
||||
#ifndef isascii
|
||||
#define ISALNUM(c) isalnum(c)
|
||||
#define ISALPHA(c) isalpha(c)
|
||||
#define ISUPPER(c) isupper(c)
|
||||
#define ISLOWER(c) islower(c)
|
||||
#else
|
||||
#define ISALNUM(c) (isascii(c) && isalnum(c))
|
||||
#define ISALPHA(c) (isascii(c) && isalpha(c))
|
||||
#define ISUPPER(c) (isascii(c) && isupper(c))
|
||||
#define ISLOWER(c) (isascii(c) && islower(c))
|
||||
#endif
|
||||
|
||||
/* 1 means plain parentheses serve as grouping, and backslash
|
||||
parentheses are needed for literal searching.
|
||||
0 means backslash-parentheses are grouping, and plain parentheses
|
||||
are for literal searching. */
|
||||
#define RE_NO_BK_PARENS 1
|
||||
|
||||
/* 1 means plain | serves as the "or"-operator, and \| is a literal.
|
||||
0 means \| serves as the "or"-operator, and | is a literal. */
|
||||
#define RE_NO_BK_VBAR 2
|
||||
|
||||
/* 0 means plain + or ? serves as an operator, and \+, \? are literals.
|
||||
1 means \+, \? are operators and plain +, ? are literals. */
|
||||
#define RE_BK_PLUS_QM 4
|
||||
|
||||
/* 1 means | binds tighter than ^ or $.
|
||||
0 means the contrary. */
|
||||
#define RE_TIGHT_VBAR 8
|
||||
|
||||
/* 1 means treat \n as an _OR operator
|
||||
0 means treat it as a normal character */
|
||||
#define RE_NEWLINE_OR 16
|
||||
|
||||
/* 0 means that a special characters (such as *, ^, and $) always have
|
||||
their special meaning regardless of the surrounding context.
|
||||
1 means that special characters may act as normal characters in some
|
||||
contexts. Specifically, this applies to:
|
||||
^ - only special at the beginning, or after ( or |
|
||||
$ - only special at the end, or before ) or |
|
||||
*, +, ? - only special when not after the beginning, (, or | */
|
||||
#define RE_CONTEXT_INDEP_OPS 32
|
||||
|
||||
/* Now define combinations of bits for the standard possibilities. */
|
||||
#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
|
||||
#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
|
||||
#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
|
||||
#define RE_SYNTAX_EMACS 0
|
||||
/* FIXME:
|
||||
2. We should not export so much of the DFA internals.
|
||||
In addition to clobbering modularity, we eat up valuable
|
||||
name space. */
|
||||
|
||||
/* Number of bits in an unsigned char. */
|
||||
#define CHARBITS 8
|
||||
|
||||
/* First integer value that is greater than any character code. */
|
||||
#define _NOTCHAR (1 << CHARBITS)
|
||||
#define NOTCHAR (1 << CHARBITS)
|
||||
|
||||
/* INTBITS need not be exact, just a lower bound. */
|
||||
#define INTBITS (CHARBITS * sizeof (int))
|
||||
|
||||
/* Number of ints required to hold a bit for every character. */
|
||||
#define _CHARSET_INTS ((_NOTCHAR + INTBITS - 1) / INTBITS)
|
||||
#define CHARCLASS_INTS ((NOTCHAR + INTBITS - 1) / INTBITS)
|
||||
|
||||
/* Sets of unsigned characters are stored as bit vectors in arrays of ints. */
|
||||
typedef int _charset[_CHARSET_INTS];
|
||||
typedef int charclass[CHARCLASS_INTS];
|
||||
|
||||
/* The regexp is parsed into an array of tokens in postfix form. Some tokens
|
||||
are operators and others are terminal symbols. Most (but not all) of these
|
||||
codes are returned by the lexical analyzer. */
|
||||
#if __STDC__
|
||||
|
||||
typedef enum
|
||||
{
|
||||
_END = -1, /* _END is a terminal symbol that matches the
|
||||
end of input; any value of _END or less in
|
||||
END = -1, /* END is a terminal symbol that matches the
|
||||
end of input; any value of END or less in
|
||||
the parse tree is such a symbol. Accepting
|
||||
states of the DFA are those that would have
|
||||
a transition on _END. */
|
||||
a transition on END. */
|
||||
|
||||
/* Ordinary character values are terminal symbols that match themselves. */
|
||||
|
||||
_EMPTY = _NOTCHAR, /* _EMPTY is a terminal symbol that matches
|
||||
EMPTY = NOTCHAR, /* EMPTY is a terminal symbol that matches
|
||||
the empty string. */
|
||||
|
||||
_BACKREF, /* _BACKREF is generated by \<digit>; it
|
||||
BACKREF, /* BACKREF is generated by \<digit>; it
|
||||
it not completely handled. If the scanner
|
||||
detects a transition on backref, it returns
|
||||
a kind of "semi-success" indicating that
|
||||
the match will have to be verified with
|
||||
a backtracking matcher. */
|
||||
|
||||
_BEGLINE, /* _BEGLINE is a terminal symbol that matches
|
||||
BEGLINE, /* BEGLINE is a terminal symbol that matches
|
||||
the empty string if it is at the beginning
|
||||
of a line. */
|
||||
|
||||
_ALLBEGLINE, /* _ALLBEGLINE is a terminal symbol that
|
||||
matches the empty string if it is at the
|
||||
beginning of a line; _ALLBEGLINE applies
|
||||
to the entire regexp and can only occur
|
||||
as the first token thereof. _ALLBEGLINE
|
||||
never appears in the parse tree; a _BEGLINE
|
||||
is prepended with _CAT to the entire
|
||||
regexp instead. */
|
||||
|
||||
_ENDLINE, /* _ENDLINE is a terminal symbol that matches
|
||||
ENDLINE, /* ENDLINE is a terminal symbol that matches
|
||||
the empty string if it is at the end of
|
||||
a line. */
|
||||
|
||||
_ALLENDLINE, /* _ALLENDLINE is to _ENDLINE as _ALLBEGLINE
|
||||
is to _BEGLINE. */
|
||||
|
||||
_BEGWORD, /* _BEGWORD is a terminal symbol that matches
|
||||
BEGWORD, /* BEGWORD is a terminal symbol that matches
|
||||
the empty string if it is at the beginning
|
||||
of a word. */
|
||||
|
||||
_ENDWORD, /* _ENDWORD is a terminal symbol that matches
|
||||
ENDWORD, /* ENDWORD is a terminal symbol that matches
|
||||
the empty string if it is at the end of
|
||||
a word. */
|
||||
|
||||
_LIMWORD, /* _LIMWORD is a terminal symbol that matches
|
||||
LIMWORD, /* LIMWORD is a terminal symbol that matches
|
||||
the empty string if it is at the beginning
|
||||
or the end of a word. */
|
||||
|
||||
_NOTLIMWORD, /* _NOTLIMWORD is a terminal symbol that
|
||||
NOTLIMWORD, /* NOTLIMWORD is a terminal symbol that
|
||||
matches the empty string if it is not at
|
||||
the beginning or end of a word. */
|
||||
|
||||
_QMARK, /* _QMARK is an operator of one argument that
|
||||
QMARK, /* QMARK is an operator of one argument that
|
||||
matches zero or one occurences of its
|
||||
argument. */
|
||||
|
||||
_STAR, /* _STAR is an operator of one argument that
|
||||
STAR, /* STAR is an operator of one argument that
|
||||
matches the Kleene closure (zero or more
|
||||
occurrences) of its argument. */
|
||||
|
||||
_PLUS, /* _PLUS is an operator of one argument that
|
||||
PLUS, /* PLUS is an operator of one argument that
|
||||
matches the positive closure (one or more
|
||||
occurrences) of its argument. */
|
||||
|
||||
_CAT, /* _CAT is an operator of two arguments that
|
||||
REPMN, /* REPMN is a lexical token corresponding
|
||||
to the {m,n} construct. REPMN never
|
||||
appears in the compiled token vector. */
|
||||
|
||||
CAT, /* CAT is an operator of two arguments that
|
||||
matches the concatenation of its
|
||||
arguments. _CAT is never returned by the
|
||||
arguments. CAT is never returned by the
|
||||
lexical analyzer. */
|
||||
|
||||
_OR, /* _OR is an operator of two arguments that
|
||||
OR, /* OR is an operator of two arguments that
|
||||
matches either of its arguments. */
|
||||
|
||||
_LPAREN, /* _LPAREN never appears in the parse tree,
|
||||
ORTOP, /* OR at the toplevel in the parse tree.
|
||||
This is used for a boyer-moore heuristic. */
|
||||
|
||||
LPAREN, /* LPAREN never appears in the parse tree,
|
||||
it is only a lexeme. */
|
||||
|
||||
_RPAREN, /* _RPAREN never appears in the parse tree. */
|
||||
RPAREN, /* RPAREN never appears in the parse tree. */
|
||||
|
||||
_SET /* _SET and (and any value greater) is a
|
||||
CSET /* CSET and (and any value greater) is a
|
||||
terminal symbol that matches any of a
|
||||
class of characters. */
|
||||
} _token;
|
||||
} token;
|
||||
|
||||
#else /* ! __STDC__ */
|
||||
|
||||
typedef short _token;
|
||||
|
||||
#define _END -1
|
||||
#define _EMPTY _NOTCHAR
|
||||
#define _BACKREF (_EMPTY + 1)
|
||||
#define _BEGLINE (_EMPTY + 2)
|
||||
#define _ALLBEGLINE (_EMPTY + 3)
|
||||
#define _ENDLINE (_EMPTY + 4)
|
||||
#define _ALLENDLINE (_EMPTY + 5)
|
||||
#define _BEGWORD (_EMPTY + 6)
|
||||
#define _ENDWORD (_EMPTY + 7)
|
||||
#define _LIMWORD (_EMPTY + 8)
|
||||
#define _NOTLIMWORD (_EMPTY + 9)
|
||||
#define _QMARK (_EMPTY + 10)
|
||||
#define _STAR (_EMPTY + 11)
|
||||
#define _PLUS (_EMPTY + 12)
|
||||
#define _CAT (_EMPTY + 13)
|
||||
#define _OR (_EMPTY + 14)
|
||||
#define _LPAREN (_EMPTY + 15)
|
||||
#define _RPAREN (_EMPTY + 16)
|
||||
#define _SET (_EMPTY + 17)
|
||||
|
||||
#endif /* ! __STDC__ */
|
||||
|
||||
/* Sets are stored in an array in the compiled regexp; the index of the
|
||||
array corresponding to a given set token is given by _SET_INDEX(t). */
|
||||
#define _SET_INDEX(t) ((t) - _SET)
|
||||
/* Sets are stored in an array in the compiled dfa; the index of the
|
||||
array corresponding to a given set token is given by SET_INDEX(t). */
|
||||
#define SET_INDEX(t) ((t) - CSET)
|
||||
|
||||
/* Sometimes characters can only be matched depending on the surrounding
|
||||
context. Such context decisions depend on what the previous character
|
||||
@ -239,36 +144,36 @@ typedef short _token;
|
||||
|
||||
Word-constituent characters are those that satisfy isalnum().
|
||||
|
||||
The macro _SUCCEEDS_IN_CONTEXT determines whether a a given constraint
|
||||
The macro SUCCEEDS_IN_CONTEXT determines whether a a given constraint
|
||||
succeeds in a particular context. Prevn is true if the previous character
|
||||
was a newline, currn is true if the lookahead character is a newline.
|
||||
Prevl and currl similarly depend upon whether the previous and current
|
||||
characters are word-constituent letters. */
|
||||
#define _MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
|
||||
((constraint) & 1 << ((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4)
|
||||
#define _MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \
|
||||
((constraint) & 1 << ((prevl) ? 2 : 0) + ((currl) ? 1 : 0))
|
||||
#define _SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \
|
||||
(_MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
|
||||
&& _MATCHES_LETTER_CONTEXT(constraint, prevl, currl))
|
||||
#define MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
|
||||
((constraint) & 1 << (((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4))
|
||||
#define MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \
|
||||
((constraint) & 1 << (((prevl) ? 2 : 0) + ((currl) ? 1 : 0)))
|
||||
#define SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \
|
||||
(MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
|
||||
&& MATCHES_LETTER_CONTEXT(constraint, prevl, currl))
|
||||
|
||||
/* The following macros give information about what a constraint depends on. */
|
||||
#define _PREV_NEWLINE_DEPENDENT(constraint) \
|
||||
#define PREV_NEWLINE_DEPENDENT(constraint) \
|
||||
(((constraint) & 0xc0) >> 2 != ((constraint) & 0x30))
|
||||
#define _PREV_LETTER_DEPENDENT(constraint) \
|
||||
#define PREV_LETTER_DEPENDENT(constraint) \
|
||||
(((constraint) & 0x0c) >> 2 != ((constraint) & 0x03))
|
||||
|
||||
/* Tokens that match the empty string subject to some constraint actually
|
||||
work by applying that constraint to determine what may follow them,
|
||||
taking into account what has gone before. The following values are
|
||||
the constraints corresponding to the special tokens previously defined. */
|
||||
#define _NO_CONSTRAINT 0xff
|
||||
#define _BEGLINE_CONSTRAINT 0xcf
|
||||
#define _ENDLINE_CONSTRAINT 0xaf
|
||||
#define _BEGWORD_CONSTRAINT 0xf2
|
||||
#define _ENDWORD_CONSTRAINT 0xf4
|
||||
#define _LIMWORD_CONSTRAINT 0xf6
|
||||
#define _NOTLIMWORD_CONSTRAINT 0xf9
|
||||
#define NO_CONSTRAINT 0xff
|
||||
#define BEGLINE_CONSTRAINT 0xcf
|
||||
#define ENDLINE_CONSTRAINT 0xaf
|
||||
#define BEGWORD_CONSTRAINT 0xf2
|
||||
#define ENDWORD_CONSTRAINT 0xf4
|
||||
#define LIMWORD_CONSTRAINT 0xf6
|
||||
#define NOTLIMWORD_CONSTRAINT 0xf9
|
||||
|
||||
/* States of the recognizer correspond to sets of positions in the parse
|
||||
tree, together with the constraints under which they may be matched.
|
||||
@ -278,44 +183,48 @@ typedef struct
|
||||
{
|
||||
unsigned index; /* Index into the parse array. */
|
||||
unsigned constraint; /* Constraint for matching this position. */
|
||||
} _position;
|
||||
} position;
|
||||
|
||||
/* Sets of positions are stored as arrays. */
|
||||
typedef struct
|
||||
{
|
||||
_position *elems; /* Elements of this position set. */
|
||||
position *elems; /* Elements of this position set. */
|
||||
int nelem; /* Number of elements in this set. */
|
||||
} _position_set;
|
||||
} position_set;
|
||||
|
||||
/* A state of the regexp consists of a set of positions, some flags,
|
||||
/* A state of the dfa consists of a set of positions, some flags,
|
||||
and the token value of the lowest-numbered position of the state that
|
||||
contains an _END token. */
|
||||
contains an END token. */
|
||||
typedef struct
|
||||
{
|
||||
int hash; /* Hash of the positions of this state. */
|
||||
_position_set elems; /* Positions this state could match. */
|
||||
position_set elems; /* Positions this state could match. */
|
||||
char newline; /* True if previous state matched newline. */
|
||||
char letter; /* True if previous state matched a letter. */
|
||||
char backref; /* True if this state matches a \<digit>. */
|
||||
unsigned char constraint; /* Constraint for this state to accept. */
|
||||
int first_end; /* Token value of the first _END in elems. */
|
||||
} _dfa_state;
|
||||
int first_end; /* Token value of the first END in elems. */
|
||||
} dfa_state;
|
||||
|
||||
/* If an r.e. is at most MUST_MAX characters long, we look for a string which
|
||||
must appear in it; whatever's found is dropped into the struct reg. */
|
||||
|
||||
#define MUST_MAX 50
|
||||
/* Element of a list of strings, at least one of which is known to
|
||||
appear in any R.E. matching the DFA. */
|
||||
struct dfamust
|
||||
{
|
||||
int exact;
|
||||
char *must;
|
||||
struct dfamust *next;
|
||||
};
|
||||
|
||||
/* A compiled regular expression. */
|
||||
struct regexp
|
||||
struct dfa
|
||||
{
|
||||
/* Stuff built by the scanner. */
|
||||
_charset *charsets; /* Array of character sets for _SET tokens. */
|
||||
int cindex; /* Index for adding new charsets. */
|
||||
int calloc; /* Number of charsets currently allocated. */
|
||||
charclass *charclasses; /* Array of character sets for CSET tokens. */
|
||||
int cindex; /* Index for adding new charclasses. */
|
||||
int calloc; /* Number of charclasses currently allocated. */
|
||||
|
||||
/* Stuff built by the parser. */
|
||||
_token *tokens; /* Postfix parse array. */
|
||||
token *tokens; /* Postfix parse array. */
|
||||
int tindex; /* Index for adding new tokens. */
|
||||
int talloc; /* Number of tokens currently allocated. */
|
||||
int depth; /* Depth required of an evaluation stack
|
||||
@ -323,15 +232,15 @@ struct regexp
|
||||
parse tree. */
|
||||
int nleaves; /* Number of leaves on the parse tree. */
|
||||
int nregexps; /* Count of parallel regexps being built
|
||||
with regparse(). */
|
||||
with dfaparse(). */
|
||||
|
||||
/* Stuff owned by the state builder. */
|
||||
_dfa_state *states; /* States of the regexp. */
|
||||
dfa_state *states; /* States of the dfa. */
|
||||
int sindex; /* Index for adding new states. */
|
||||
int salloc; /* Number of states currently allocated. */
|
||||
|
||||
/* Stuff built by the structure analyzer. */
|
||||
_position_set *follows; /* Array of follow sets, indexed by position
|
||||
position_set *follows; /* Array of follow sets, indexed by position
|
||||
index. The follow of a position is the set
|
||||
of positions containing characters that
|
||||
could conceivably follow a character
|
||||
@ -361,7 +270,7 @@ struct regexp
|
||||
int **fails; /* Transition tables after failing to accept
|
||||
on a state that potentially could do so. */
|
||||
int *success; /* Table of acceptance conditions used in
|
||||
regexecute and computed in build_state. */
|
||||
dfaexec and computed in build_state. */
|
||||
int *newlines; /* Transitions on newlines. The entry for a
|
||||
newline in any transition table is always
|
||||
-1 so we can count lines without wasting
|
||||
@ -369,40 +278,41 @@ struct regexp
|
||||
newline is stored separately and handled
|
||||
as a special case. Newline is also used
|
||||
as a sentinel at the end of the buffer. */
|
||||
char must[MUST_MAX];
|
||||
int mustn;
|
||||
struct dfamust *musts; /* List of strings, at least one of which
|
||||
is known to appear in any r.e. matching
|
||||
the dfa. */
|
||||
};
|
||||
|
||||
/* Some macros for user access to regexp internals. */
|
||||
/* Some macros for user access to dfa internals. */
|
||||
|
||||
/* ACCEPTING returns true if s could possibly be an accepting state of r. */
|
||||
#define ACCEPTING(s, r) ((r).states[s].constraint)
|
||||
|
||||
/* ACCEPTS_IN_CONTEXT returns true if the given state accepts in the
|
||||
specified context. */
|
||||
#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, reg) \
|
||||
_SUCCEEDS_IN_CONTEXT((reg).states[state].constraint, \
|
||||
#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, dfa) \
|
||||
SUCCEEDS_IN_CONTEXT((dfa).states[state].constraint, \
|
||||
prevn, currn, prevl, currl)
|
||||
|
||||
/* FIRST_MATCHING_REGEXP returns the index number of the first of parallel
|
||||
regexps that a given state could accept. Parallel regexps are numbered
|
||||
starting at 1. */
|
||||
#define FIRST_MATCHING_REGEXP(state, reg) (-(reg).states[state].first_end)
|
||||
#define FIRST_MATCHING_REGEXP(state, dfa) (-(dfa).states[state].first_end)
|
||||
|
||||
/* Entry points. */
|
||||
|
||||
#if __STDC__
|
||||
|
||||
/* Regsyntax() takes two arguments; the first sets the syntax bits described
|
||||
/* dfasyntax() takes two arguments; the first sets the syntax bits described
|
||||
earlier in this file, and the second sets the case-folding flag. */
|
||||
extern void regsyntax(int, int);
|
||||
extern void dfasyntax(int, int);
|
||||
|
||||
/* Compile the given string of the given length into the given struct regexp.
|
||||
/* Compile the given string of the given length into the given struct dfa.
|
||||
Final argument is a flag specifying whether to build a searching or an
|
||||
exact matcher. */
|
||||
extern void regcompile(const char *, size_t, struct regexp *, int);
|
||||
extern void dfacomp(char *, size_t, struct dfa *, int);
|
||||
|
||||
/* Execute the given struct regexp on the buffer of characters. The
|
||||
/* Execute the given struct dfa on the buffer of characters. The
|
||||
first char * points to the beginning, and the second points to the
|
||||
first character after the end of the buffer, which must be a writable
|
||||
place so a sentinel end-of-buffer marker can be stored there. The
|
||||
@ -414,37 +324,37 @@ extern void regcompile(const char *, size_t, struct regexp *, int);
|
||||
order to verify backreferencing; otherwise the flag will be cleared.
|
||||
Returns NULL if no match is found, or a pointer to the first
|
||||
character after the first & shortest matching string in the buffer. */
|
||||
extern char *regexecute(struct regexp *, char *, char *, int, int *, int *);
|
||||
extern char *dfaexec(struct dfa *, char *, char *, int, int *, int *);
|
||||
|
||||
/* Free the storage held by the components of a struct regexp. */
|
||||
extern void regfree(struct regexp *);
|
||||
/* Free the storage held by the components of a struct dfa. */
|
||||
extern void dfafree(struct dfa *);
|
||||
|
||||
/* Entry points for people who know what they're doing. */
|
||||
|
||||
/* Initialize the components of a struct regexp. */
|
||||
extern void reginit(struct regexp *);
|
||||
/* Initialize the components of a struct dfa. */
|
||||
extern void dfainit(struct dfa *);
|
||||
|
||||
/* Incrementally parse a string of given length into a struct regexp. */
|
||||
extern void regparse(const char *, size_t, struct regexp *);
|
||||
/* Incrementally parse a string of given length into a struct dfa. */
|
||||
extern void dfaparse(char *, size_t, struct dfa *);
|
||||
|
||||
/* Analyze a parsed regexp; second argument tells whether to build a searching
|
||||
or an exact matcher. */
|
||||
extern void reganalyze(struct regexp *, int);
|
||||
extern void dfaanalyze(struct dfa *, int);
|
||||
|
||||
/* Compute, for each possible character, the transitions out of a given
|
||||
state, storing them in an array of integers. */
|
||||
extern void regstate(int, struct regexp *, int []);
|
||||
extern void dfastate(int, struct dfa *, int []);
|
||||
|
||||
/* Error handling. */
|
||||
|
||||
/* Regerror() is called by the regexp routines whenever an error occurs. It
|
||||
/* dfaerror() is called by the regexp routines whenever an error occurs. It
|
||||
takes a single argument, a NUL-terminated string describing the error.
|
||||
The default regerror() prints the error message to stderr and exits.
|
||||
The user can provide a different regfree() if so desired. */
|
||||
extern void regerror(const char *);
|
||||
The default dfaerror() prints the error message to stderr and exits.
|
||||
The user can provide a different dfafree() if so desired. */
|
||||
extern void dfaerror(char *);
|
||||
|
||||
#else /* ! __STDC__ */
|
||||
extern void regsyntax(), regcompile(), regfree(), reginit(), regparse();
|
||||
extern void reganalyze(), regstate(), regerror();
|
||||
extern char *regexecute();
|
||||
extern void dfasyntax(), dfacomp(), dfafree(), dfainit(), dfaparse();
|
||||
extern void dfaanalyze(), dfastate(), dfaerror();
|
||||
extern char *dfaexec();
|
||||
#endif /* ! __STDC__ */
|
||||
|
@ -3,12 +3,13 @@
|
||||
"Keep this file name-space clean" means, talk to roland@gnu.ai.mit.edu
|
||||
before changing it!
|
||||
|
||||
Copyright (C) 1987, 88, 89, 90, 91, 1992 Free Software Foundation, Inc.
|
||||
Copyright (C) 1987, 88, 89, 90, 91, 92, 1993
|
||||
Free Software Foundation, Inc.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2, or (at your option)
|
||||
any later version.
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the terms of the GNU General Public License as published by the
|
||||
Free Software Foundation; either version 2, or (at your option) any
|
||||
later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
@ -17,49 +18,67 @@
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||
|
||||
/* AIX requires this to be the first thing in the file. */
|
||||
/* NOTE!!! AIX requires this to be the first thing in the file.
|
||||
Do not put ANYTHING before it! */
|
||||
#if !defined (__GNUC__) && defined (_AIX)
|
||||
#pragma alloca
|
||||
#endif
|
||||
|
||||
#ifdef HAVE_CONFIG_H
|
||||
#include "config.h"
|
||||
#endif
|
||||
|
||||
#ifdef __GNUC__
|
||||
#define alloca __builtin_alloca
|
||||
#else /* not __GNUC__ */
|
||||
#if defined(sparc) && !defined(USG) && !defined(SVR4) && !defined(__svr4__)
|
||||
#if defined (HAVE_ALLOCA_H) || (defined(sparc) && (defined(sun) || (!defined(USG) && !defined(SVR4) && !defined(__svr4__))))
|
||||
#include <alloca.h>
|
||||
#else
|
||||
#ifdef _AIX
|
||||
#pragma alloca
|
||||
#else
|
||||
#ifndef _AIX
|
||||
char *alloca ();
|
||||
#endif
|
||||
#endif /* sparc */
|
||||
#endif /* alloca.h */
|
||||
#endif /* not __GNUC__ */
|
||||
|
||||
#ifdef LIBC
|
||||
/* For when compiled as part of the GNU C library. */
|
||||
#include <ansidecl.h>
|
||||
#if !__STDC__ && !defined(const) && IN_GCC
|
||||
#define const
|
||||
#endif
|
||||
|
||||
/* This tells Alpha OSF/1 not to define a getopt prototype in <stdio.h>. */
|
||||
#ifndef _NO_PROTO
|
||||
#define _NO_PROTO
|
||||
#endif
|
||||
|
||||
#include <stdio.h>
|
||||
|
||||
/* Comment out all this code if we are using the GNU C Library, and are not
|
||||
actually compiling the library itself. This code is part of the GNU C
|
||||
Library, but also included in many other GNU distributions. Compiling
|
||||
and linking in this code is a waste when using the GNU C library
|
||||
(especially if it is a shared library). Rather than having every GNU
|
||||
program understand `configure --with-gnu-libc' and omit the object files,
|
||||
it is simpler to just do this in the source for each such file. */
|
||||
|
||||
#if defined (_LIBC) || !defined (__GNU_LIBRARY__)
|
||||
|
||||
|
||||
/* This needs to come after some library #include
|
||||
to get __GNU_LIBRARY__ defined. */
|
||||
#ifdef __GNU_LIBRARY__
|
||||
#undef alloca
|
||||
/* Don't include stdlib.h for non-GNU C libraries because some of them
|
||||
contain conflicting prototypes for getopt. */
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#else /* Not GNU C library. */
|
||||
#define __alloca alloca
|
||||
#endif /* GNU C library. */
|
||||
|
||||
|
||||
#ifndef __STDC__
|
||||
#define const
|
||||
#endif
|
||||
|
||||
/* If GETOPT_COMPAT is defined, `+' as well as `--' can introduce a
|
||||
long-named option. Because this is not POSIX.2 compliant, it is
|
||||
being phased out. */
|
||||
#define GETOPT_COMPAT
|
||||
/* #define GETOPT_COMPAT */
|
||||
|
||||
/* This version of `getopt' appears to the caller like standard Unix `getopt'
|
||||
but it behaves differently for the user, since it allows the user
|
||||
@ -97,6 +116,7 @@ char *optarg = 0;
|
||||
Otherwise, `optind' communicates from one call to the next
|
||||
how much of ARGV has been scanned so far. */
|
||||
|
||||
/* XXX 1003.2 says this must be 1 before any call. */
|
||||
int optind = 0;
|
||||
|
||||
/* The next char to be scanned in the option-element
|
||||
@ -113,6 +133,12 @@ static char *nextchar;
|
||||
|
||||
int opterr = 1;
|
||||
|
||||
/* Set to an option character which was unrecognized.
|
||||
This must be initialized on some systems to avoid linking in the
|
||||
system's own getopt implementation. */
|
||||
|
||||
int optopt = '?';
|
||||
|
||||
/* Describe how to deal with options that follow non-option ARGV-elements.
|
||||
|
||||
If the caller did not specify anything,
|
||||
@ -148,6 +174,10 @@ static enum
|
||||
} ordering;
|
||||
|
||||
#ifdef __GNU_LIBRARY__
|
||||
/* We want to avoid inclusion of string.h with non-GNU libraries
|
||||
because there are many ways it can cause trouble.
|
||||
On some systems, it contains special magic macros that don't work
|
||||
in GCC. */
|
||||
#include <string.h>
|
||||
#define my_index strchr
|
||||
#define my_bcopy(src, dst, n) memcpy ((dst), (src), (n))
|
||||
@ -159,22 +189,23 @@ static enum
|
||||
char *getenv ();
|
||||
|
||||
static char *
|
||||
my_index (string, chr)
|
||||
char *string;
|
||||
my_index (str, chr)
|
||||
const char *str;
|
||||
int chr;
|
||||
{
|
||||
while (*string)
|
||||
while (*str)
|
||||
{
|
||||
if (*string == chr)
|
||||
return string;
|
||||
string++;
|
||||
if (*str == chr)
|
||||
return (char *) str;
|
||||
str++;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void
|
||||
my_bcopy (from, to, size)
|
||||
char *from, *to;
|
||||
const char *from;
|
||||
char *to;
|
||||
int size;
|
||||
{
|
||||
int i;
|
||||
@ -210,10 +241,12 @@ exchange (argv)
|
||||
|
||||
/* Interchange the two blocks of data in ARGV. */
|
||||
|
||||
my_bcopy (&argv[first_nonopt], temp, nonopts_size);
|
||||
my_bcopy (&argv[last_nonopt], &argv[first_nonopt],
|
||||
my_bcopy ((char *) &argv[first_nonopt], (char *) temp, nonopts_size);
|
||||
my_bcopy ((char *) &argv[last_nonopt], (char *) &argv[first_nonopt],
|
||||
(optind - last_nonopt) * sizeof (char *));
|
||||
my_bcopy (temp, &argv[first_nonopt + optind - last_nonopt], nonopts_size);
|
||||
my_bcopy ((char *) temp,
|
||||
(char *) &argv[first_nonopt + optind - last_nonopt],
|
||||
nonopts_size);
|
||||
|
||||
/* Update records for the slots the non-options now occupy. */
|
||||
|
||||
@ -489,7 +522,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
||||
fprintf (stderr, "%s: option `%s' requires an argument\n",
|
||||
argv[0], argv[optind - 1]);
|
||||
nextchar += strlen (nextchar);
|
||||
return '?';
|
||||
return optstring[0] == ':' ? ':' : '?';
|
||||
}
|
||||
}
|
||||
nextchar += strlen (nextchar);
|
||||
@ -523,7 +556,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
||||
fprintf (stderr, "%s: unrecognized option `%c%s'\n",
|
||||
argv[0], argv[optind][0], nextchar);
|
||||
}
|
||||
nextchar += strlen (nextchar);
|
||||
nextchar = (char *) "";
|
||||
optind++;
|
||||
return '?';
|
||||
}
|
||||
@ -537,18 +570,24 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
||||
|
||||
/* Increment `optind' when we start to process its last character. */
|
||||
if (*nextchar == '\0')
|
||||
optind++;
|
||||
++optind;
|
||||
|
||||
if (temp == NULL || c == ':')
|
||||
{
|
||||
if (opterr)
|
||||
{
|
||||
#if 0
|
||||
if (c < 040 || c >= 0177)
|
||||
fprintf (stderr, "%s: unrecognized option, character code 0%o\n",
|
||||
argv[0], c);
|
||||
else
|
||||
fprintf (stderr, "%s: unrecognized option `-%c'\n", argv[0], c);
|
||||
#else
|
||||
/* 1003.2 specifies the format of this message. */
|
||||
fprintf (stderr, "%s: illegal option -- %c\n", argv[0], c);
|
||||
#endif
|
||||
}
|
||||
optopt = c;
|
||||
return '?';
|
||||
}
|
||||
if (temp[1] == ':')
|
||||
@ -568,7 +607,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
||||
else
|
||||
{
|
||||
/* This is an option that requires an argument. */
|
||||
if (*nextchar != 0)
|
||||
if (*nextchar != '\0')
|
||||
{
|
||||
optarg = nextchar;
|
||||
/* If we end this ARGV-element by taking the rest as an arg,
|
||||
@ -578,8 +617,20 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
||||
else if (optind == argc)
|
||||
{
|
||||
if (opterr)
|
||||
{
|
||||
#if 0
|
||||
fprintf (stderr, "%s: option `-%c' requires an argument\n",
|
||||
argv[0], c);
|
||||
#else
|
||||
/* 1003.2 specifies the format of this message. */
|
||||
fprintf (stderr, "%s: option requires an argument -- %c\n",
|
||||
argv[0], c);
|
||||
#endif
|
||||
}
|
||||
optopt = c;
|
||||
if (optstring[0] == ':')
|
||||
c = ':';
|
||||
else
|
||||
c = '?';
|
||||
}
|
||||
else
|
||||
@ -604,6 +655,8 @@ getopt (argc, argv, optstring)
|
||||
(int *) 0,
|
||||
0);
|
||||
}
|
||||
|
||||
#endif /* _LIBC or not __GNU_LIBRARY__. */
|
||||
|
||||
#ifdef TEST
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/* Declarations for getopt.
|
||||
Copyright (C) 1989, 1990, 1991, 1992 Free Software Foundation, Inc.
|
||||
Copyright (C) 1989, 1990, 1991, 1992, 1993 Free Software Foundation, Inc.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2, or (at your option)
|
||||
any later version.
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the terms of the GNU General Public License as published by the
|
||||
Free Software Foundation; either version 2, or (at your option) any
|
||||
later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
@ -13,11 +13,15 @@
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||
|
||||
#ifndef _GETOPT_H
|
||||
#define _GETOPT_H 1
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/* For communication from `getopt' to the caller.
|
||||
When `getopt' finds an option that takes an argument,
|
||||
the argument value is returned here.
|
||||
@ -45,6 +49,10 @@ extern int optind;
|
||||
|
||||
extern int opterr;
|
||||
|
||||
/* Set to an option character which was unrecognized. */
|
||||
|
||||
extern int optopt;
|
||||
|
||||
/* Describe the long-named options requested by the application.
|
||||
The LONG_OPTIONS argument to getopt_long or getopt_long_only is a vector
|
||||
of `struct option' terminated by an element containing a name which is
|
||||
@ -82,15 +90,19 @@ struct option
|
||||
|
||||
/* Names for the values of the `has_arg' field of `struct option'. */
|
||||
|
||||
enum _argtype
|
||||
{
|
||||
no_argument,
|
||||
required_argument,
|
||||
optional_argument
|
||||
};
|
||||
#define no_argument 0
|
||||
#define required_argument 1
|
||||
#define optional_argument 2
|
||||
|
||||
#if __STDC__
|
||||
#if defined(__GNU_LIBRARY__)
|
||||
/* Many other libraries have conflicting prototypes for getopt, with
|
||||
differences in the consts, in stdlib.h. To avoid compilation
|
||||
errors, only prototype getopt for the GNU C library. */
|
||||
extern int getopt (int argc, char *const *argv, const char *shortopts);
|
||||
#else /* not __GNU_LIBRARY__ */
|
||||
extern int getopt ();
|
||||
#endif /* not __GNU_LIBRARY__ */
|
||||
extern int getopt_long (int argc, char *const *argv, const char *shortopts,
|
||||
const struct option *longopts, int *longind);
|
||||
extern int getopt_long_only (int argc, char *const *argv,
|
||||
@ -110,4 +122,8 @@ extern int getopt_long_only ();
|
||||
extern int _getopt_internal ();
|
||||
#endif /* not __STDC__ */
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _GETOPT_H */
|
||||
|
42
gnu/usr.bin/grep/getpagesize.h
Normal file
42
gnu/usr.bin/grep/getpagesize.h
Normal file
@ -0,0 +1,42 @@
|
||||
#ifdef BSD
|
||||
#ifndef BSD4_1
|
||||
#define HAVE_GETPAGESIZE
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#ifndef HAVE_GETPAGESIZE
|
||||
|
||||
#ifdef VMS
|
||||
#define getpagesize() 512
|
||||
#endif
|
||||
|
||||
#ifdef HAVE_UNISTD_H
|
||||
#include <unistd.h>
|
||||
#endif
|
||||
|
||||
#ifdef _SC_PAGESIZE
|
||||
#define getpagesize() sysconf(_SC_PAGESIZE)
|
||||
#else
|
||||
|
||||
#ifdef HAVE_SYS_PARAM_H
|
||||
#include <sys/param.h>
|
||||
|
||||
#ifdef EXEC_PAGESIZE
|
||||
#define getpagesize() EXEC_PAGESIZE
|
||||
#else
|
||||
#ifdef NBPG
|
||||
#define getpagesize() NBPG * CLSIZE
|
||||
#ifndef CLSIZE
|
||||
#define CLSIZE 1
|
||||
#endif /* no CLSIZE */
|
||||
#else /* no NBPG */
|
||||
#define getpagesize() NBPC
|
||||
#endif /* no NBPG */
|
||||
#endif /* no EXEC_PAGESIZE */
|
||||
#else /* !HAVE_SYS_PARAM_H */
|
||||
#define getpagesize() 8192 /* punt totally */
|
||||
#endif /* !HAVE_SYS_PARAM_H */
|
||||
#endif /* no _SC_PAGESIZE */
|
||||
|
||||
#endif /* not HAVE_GETPAGESIZE */
|
||||
|
@ -1,234 +1,375 @@
|
||||
.TH GREP 1 "1988 December 13" "GNU Project" \" -*- nroff -*-
|
||||
.UC 4
|
||||
.TH GREP 1 "1992 September 10" "GNU Project"
|
||||
.SH NAME
|
||||
grep, egrep \- print lines matching a regular expression
|
||||
.SH SYNOPSIS
|
||||
grep, egrep, fgrep \- print lines matching a pattern
|
||||
.SH SYNOPOSIS
|
||||
.B grep
|
||||
[
|
||||
.B \-CVbchilnsvwx
|
||||
] [
|
||||
.BI \- num
|
||||
] [
|
||||
.B \-AB
|
||||
.I num
|
||||
] [ [
|
||||
.BR \- [[ AB "] ]\c"
|
||||
.I "num"
|
||||
]
|
||||
[
|
||||
.BR \- [ CEFGVBchilnsvwx ]
|
||||
]
|
||||
[
|
||||
.B \-e
|
||||
]
|
||||
.I expr
|
||||
.I pattern
|
||||
|
|
||||
.B \-f
|
||||
.I file
|
||||
.BI \-f file
|
||||
] [
|
||||
.I "files ..."
|
||||
.I files...
|
||||
]
|
||||
.SH DESCRIPTION
|
||||
.I Grep
|
||||
searches the files listed in the arguments (or standard
|
||||
input if no files are given) for all lines that contain a match for
|
||||
the given
|
||||
.IR expr .
|
||||
If any lines match, they are printed.
|
||||
.PP
|
||||
Also, if any matches were found,
|
||||
.I grep
|
||||
exits with a status of 0, but if no matches were found it exits
|
||||
with a status of 1. This is useful for building shell scripts that
|
||||
use
|
||||
.I grep
|
||||
as a condition for, for example, the
|
||||
.I if
|
||||
statement.
|
||||
.B Grep
|
||||
searches the named input
|
||||
.I files
|
||||
(or standard input if no files are named, or
|
||||
the file name
|
||||
.B \-
|
||||
is given)
|
||||
for lines containing a match to the given
|
||||
.IR pattern .
|
||||
By default,
|
||||
.B grep
|
||||
prints the matching lines.
|
||||
.PP
|
||||
When invoked as
|
||||
.I egrep
|
||||
the syntax of the
|
||||
.I expr
|
||||
is slightly different; See below.
|
||||
.br
|
||||
.SH "REGULAR EXPRESSIONS"
|
||||
.RS 2.5i
|
||||
.ta 1i 2i
|
||||
.sp
|
||||
.ti -2.0i
|
||||
(grep) (egrep) (explanation)
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\fIc\fP \fIc\fP a single (non-meta) character matches itself.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\&. . matches any single character except newline.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\? ? postfix operator; preceeding item is optional.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\(** \(** postfix operator; preceeding item 0 or
|
||||
more times.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\+ + postfix operator; preceeding item 1 or
|
||||
more times.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\| | infix operator; matches either
|
||||
argument.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
^ ^ matches the empty string at the beginning of a line.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
$ $ matches the empty string at the end of a line.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\< \\< matches the empty string at the beginning of a word.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\> \\> matches the empty string at the end of a word.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
[\fIchars\fP] [\fIchars\fP] match any character in the given class; if the
|
||||
first character after [ is ^, match any character
|
||||
not in the given class; a range of characters may
|
||||
be specified by \fIfirst\-last\fP; for example, \\W
|
||||
(below) is equivalent to the class [^A\-Za\-z0\-9]
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\( \\) ( ) parentheses are used to override operator precedence.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\\fIdigit\fP \\\fIdigit\fP \\\fIn\fP matches a repeat of the text
|
||||
matched earlier in the regexp by the subexpression inside the nth
|
||||
opening parenthesis.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\ \\ any special character may be preceded
|
||||
by a backslash to match it literally.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
(the following are for compatibility with GNU Emacs)
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\b \\b matches the empty string at the edge of a word.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\B \\B matches the empty string if not at the edge of a word.
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\w \\w matches word-constituent characters (letters & digits).
|
||||
.sp
|
||||
.ti -2.0i
|
||||
\\W \\W matches characters that are not word-constituent.
|
||||
.RE
|
||||
.PP
|
||||
Operator precedence is (highest to lowest) ?, \(**, and +, concatenation,
|
||||
and finally |. All other constructs are syntactically identical to
|
||||
normal characters. For the truly interested, the file dfa.c describes
|
||||
(and implements) the exact grammar understood by the parser.
|
||||
.SH OPTIONS
|
||||
There are three major variants of
|
||||
.BR grep ,
|
||||
controlled by the following options.
|
||||
.PD 0
|
||||
.TP
|
||||
.BI \-A " num"
|
||||
print <num> lines of context after every matching line
|
||||
.B \-G
|
||||
Interpret
|
||||
.I pattern
|
||||
as a basic regular expression (see below). This is the default.
|
||||
.TP
|
||||
.BI \-B " num"
|
||||
print
|
||||
.I num
|
||||
lines of context before every matching line
|
||||
.B \-E
|
||||
Interpret
|
||||
.I pattern
|
||||
as an extended regular expression (see below).
|
||||
.TP
|
||||
.B \-C
|
||||
print 2 lines of context on each side of every match
|
||||
.B \-F
|
||||
Interpret
|
||||
.I pattern
|
||||
as a list of fixed strings, separated by newlines,
|
||||
any of which is to be matched.
|
||||
.LP
|
||||
In addition, two variant programs
|
||||
.B egrep
|
||||
and
|
||||
.B fgrep
|
||||
are available.
|
||||
.B Egrep
|
||||
is similiar (but not identical) to
|
||||
.BR "grep\ \-E" ,
|
||||
and is compatible with the historical Unix
|
||||
.BR egrep .
|
||||
.B Fgrep
|
||||
is the same as
|
||||
.BR "grep\ \-F" .
|
||||
.PD
|
||||
.LP
|
||||
All variants of
|
||||
.B grep
|
||||
understand the following options:
|
||||
.PD 0
|
||||
.TP
|
||||
.BI \- num
|
||||
print
|
||||
Matches will be printed with
|
||||
.I num
|
||||
lines of context on each side of every match
|
||||
lines of leading and trailing context. However,
|
||||
.B grep
|
||||
will never print any given line more than once.
|
||||
.TP
|
||||
.BI \-A " num"
|
||||
Print
|
||||
.I num
|
||||
lines of trailing context after matching lines.
|
||||
.TP
|
||||
.BI \-B " num"
|
||||
Print
|
||||
.I num
|
||||
lines of leading context before matching lines.
|
||||
.TP
|
||||
.B \-C
|
||||
Equivalent to
|
||||
.BR \-2 .
|
||||
.TP
|
||||
.B \-V
|
||||
print the version number on the diagnostic output
|
||||
Print the version number of
|
||||
.B grep
|
||||
to standard error. This version number should
|
||||
be included in all bug reports (see below).
|
||||
.TP
|
||||
.B \-b
|
||||
print every match preceded by its byte offset
|
||||
Print the byte offset within the input file before
|
||||
each line of output.
|
||||
.TP
|
||||
.B \-c
|
||||
print a total count of matching lines only
|
||||
Suppress normal output; instead print a count of
|
||||
matching lines for each input file.
|
||||
With the
|
||||
.B \-v
|
||||
option (see below), count non-matching lines.
|
||||
.TP
|
||||
.BI \-e " expr"
|
||||
search for
|
||||
.IR expr ;
|
||||
useful if
|
||||
.I expr
|
||||
begins with \-
|
||||
.BI \-e " pattern"
|
||||
Use
|
||||
.I pattern
|
||||
as the pattern; useful to protect patterns beginning with
|
||||
.BR \- .
|
||||
.TP
|
||||
.BI \-f " file"
|
||||
search for the expression contained in
|
||||
.I file
|
||||
Obtain the pattern from
|
||||
.IR file .
|
||||
.TP
|
||||
.B \-h
|
||||
don't display filenames on matches
|
||||
Suppress the prefixing of filenames on output
|
||||
when multiple files are searched.
|
||||
.TP
|
||||
.B \-i
|
||||
ignore case difference when comparing strings
|
||||
Ignore case distinctions in both the
|
||||
.I pattern
|
||||
and the input files.
|
||||
.TP
|
||||
.B \-L
|
||||
Suppress normal output; instead print the name
|
||||
of each input file from which no output would
|
||||
normally have been printed.
|
||||
.TP
|
||||
.B \-l
|
||||
list files containing matches only
|
||||
Suppress normal output; instead print
|
||||
the name of each input file from which output
|
||||
would normally have been printed.
|
||||
.TP
|
||||
.B \-n
|
||||
print each match preceded by its line number
|
||||
Prefix each line of output with the line number
|
||||
within its input file.
|
||||
.TP
|
||||
.B \-q
|
||||
Quiet; suppress normal output.
|
||||
.TP
|
||||
.B \-s
|
||||
run silently producing no output except error messages
|
||||
Suppress error messages about nonexistent or unreadable files.
|
||||
.TP
|
||||
.B \-v
|
||||
print only lines that contain no matches for the <expr>
|
||||
Invert the sense of matching, to select non-matching lines.
|
||||
.TP
|
||||
.B \-w
|
||||
print only lines where the match is a complete word
|
||||
Select only those lines containing matches that form whole words.
|
||||
The test is that the matching substring must either be at the
|
||||
beginning of the line, or preceded by a non-word constituent
|
||||
character. Similarly, it must be either at the end of the line
|
||||
or followed by a non-word constituent character. Word-constituent
|
||||
characters are letters, digits, and the underscore.
|
||||
.TP
|
||||
.B \-x
|
||||
print only lines where the match is a whole line
|
||||
.SH "SEE ALSO"
|
||||
emacs(1), ed(1), sh(1),
|
||||
.I "GNU Emacs Manual"
|
||||
.SH INCOMPATIBILITIES
|
||||
The following incompatibilities with UNIX
|
||||
.I grep
|
||||
exist:
|
||||
Select only those matches that exactly match the whole line.
|
||||
.PD
|
||||
.SH "REGULAR EXPRESSIONS"
|
||||
.PP
|
||||
.RS 0.5i
|
||||
The context-dependent meaning of \(** is not quite the same (grep only).
|
||||
A regular expression is a pattern that describes a set of strings.
|
||||
Regular expressions are constructed analagously to arithmetic
|
||||
expressions, by using various operators to combine smaller expressions.
|
||||
.PP
|
||||
.B \-b
|
||||
prints a byte offset instead of a block offset.
|
||||
.B Grep
|
||||
understands two different versions of regular expression syntax:
|
||||
``basic'' and ``extended.'' In
|
||||
.RB "GNU\ " grep ,
|
||||
there is no difference in available functionality using either syntax.
|
||||
In other implementations, basic regular expressions are less powerful.
|
||||
The following description applies to extended regular expressions;
|
||||
differences for basic regular expressions are summarized afterwards.
|
||||
.PP
|
||||
The {\fIm,n\fP} construct of System V grep is not implemented.
|
||||
The fundamental building blocks are the regular expressions that match
|
||||
a single character. Most characters, including all letters and digits,
|
||||
are regular expressions that match themselves. Any metacharacter with
|
||||
special meaning may be quoted by preceding it with a backslash.
|
||||
.PP
|
||||
A list of characters enclosed by
|
||||
.B [
|
||||
and
|
||||
.B ]
|
||||
matches any single
|
||||
character in that list; if the first character of the list
|
||||
is the caret
|
||||
.B ^
|
||||
then it matches any character
|
||||
.I not
|
||||
in the list.
|
||||
For example, the regular expression
|
||||
.B [0123456789]
|
||||
matches any single digit. A range of ASCII characters
|
||||
may be specified by giving the first and last characters, separated
|
||||
by a hyphen.
|
||||
Finally, certain named classes of characters are predefined.
|
||||
Their names are self explanatory, and they are
|
||||
.BR [:alnum:] ,
|
||||
.BR [:alpha:] ,
|
||||
.BR [:cntrl:] ,
|
||||
.BR [:digit:] ,
|
||||
.BR [:graph:] ,
|
||||
.BR [:lower:] ,
|
||||
.BR [:print:] ,
|
||||
.BR [:punct:] ,
|
||||
.BR [:space:] ,
|
||||
.BR [:upper:] ,
|
||||
and
|
||||
.BR [:xdigit:].
|
||||
For example,
|
||||
.B [[:alnum:]]
|
||||
means
|
||||
.BR [0-9A-Za-z] ,
|
||||
except the latter form is dependent upon the ASCII character encoding,
|
||||
whereas the former is portable.
|
||||
(Note that the brackets in these class names are part of the symbolic
|
||||
names, and must be included in addition to the brackets delimiting
|
||||
the bracket list.) Most metacharacters lose their special meaning
|
||||
inside lists. To include a literal
|
||||
.B ]
|
||||
place it first in the list. Similarly, to include a literal
|
||||
.B ^
|
||||
place it anywhere but first. Finally, to include a literal
|
||||
.B \-
|
||||
place it last.
|
||||
.PP
|
||||
The period
|
||||
.B .
|
||||
matches any single character.
|
||||
The symbol
|
||||
.B \ew
|
||||
is a synonym for
|
||||
.B [[:alnum:]]
|
||||
and
|
||||
.B \eW
|
||||
is a synonym for
|
||||
.BR [^[:alnum]] .
|
||||
.PP
|
||||
The caret
|
||||
.B ^
|
||||
and the dollar sign
|
||||
.B $
|
||||
are metacharacters that respectively match the empty string at the
|
||||
beginning and end of a line.
|
||||
The symbols
|
||||
.B \e<
|
||||
and
|
||||
.B \e>
|
||||
respectively match the empty string at the beginning and end of a word.
|
||||
The symbol
|
||||
.B \eb
|
||||
matches the empty string at the edge of a word,
|
||||
and
|
||||
.B \eB
|
||||
matches the empty string provided it's
|
||||
.I not
|
||||
at the edge of a word.
|
||||
.PP
|
||||
A regular expression matching a single character may be followed
|
||||
by one of several repetition operators:
|
||||
.PD 0
|
||||
.TP
|
||||
.B ?
|
||||
The preceding item is optional and matched at most once.
|
||||
.TP
|
||||
.B *
|
||||
The preceding item will be matched zero or more times.
|
||||
.TP
|
||||
.B +
|
||||
The preceding item will be matched one or more times.
|
||||
.TP
|
||||
.BI { n }
|
||||
The preceding item is matched exactly
|
||||
.I n
|
||||
times.
|
||||
.TP
|
||||
.BI { n ,}
|
||||
The preceding item is matched
|
||||
.I n
|
||||
or more times.
|
||||
.TP
|
||||
.BI {, m }
|
||||
The preceding item is optional and is matched at most
|
||||
.I m
|
||||
times.
|
||||
.TP
|
||||
.BI { n , m }
|
||||
The preceding item is matched at least
|
||||
.I n
|
||||
times, but not more than
|
||||
.I m
|
||||
times.
|
||||
.PD
|
||||
.PP
|
||||
Two regular expressions may be concatenated; the resulting
|
||||
regular expression matches any string formed by concatenating
|
||||
two substrings that respectively match the concatenated
|
||||
subexpressions.
|
||||
.PP
|
||||
Two regular expressions may be joined by the infix operator
|
||||
.BR | ;
|
||||
the resulting regular expression matches any string matching
|
||||
either subexpression.
|
||||
.PP
|
||||
Repetition takes precedence over concatenation, which in turn
|
||||
takes precedence over alternation. A whole subexpression may be
|
||||
enclosed in parentheses to override these precedence rules.
|
||||
.PP
|
||||
The backreference
|
||||
.BI \e n\c
|
||||
\&, where
|
||||
.I n
|
||||
is a single digit, matches the substring
|
||||
previously matched by the
|
||||
.IR n th
|
||||
parenthesized subexpression of the regular expression.
|
||||
.PP
|
||||
In basic regular expressions the metacharacters
|
||||
.BR ? ,
|
||||
.BR + ,
|
||||
.BR { ,
|
||||
.BR | ,
|
||||
.BR ( ,
|
||||
and
|
||||
.BR )
|
||||
lose their special meaning; instead use the backslashed
|
||||
versions
|
||||
.BR \e? ,
|
||||
.BR \e+ ,
|
||||
.BR \e{ ,
|
||||
.BR \e| ,
|
||||
.BR \e( ,
|
||||
and
|
||||
.BR \e) .
|
||||
.PP
|
||||
In
|
||||
.B egrep
|
||||
the metacharacter
|
||||
.B {
|
||||
loses its special meaning; instead use
|
||||
.BR \e{ .
|
||||
.SH DIAGNOSTICS
|
||||
.PP
|
||||
Normally, exit status is 0 if matches were found,
|
||||
and 1 if no matches were found. (The
|
||||
.B \-v
|
||||
option inverts the sense of the exit status.)
|
||||
Exit status is 2 if there were syntax errors
|
||||
in the pattern, inaccessible input files, or
|
||||
other system errors.
|
||||
.SH BUGS
|
||||
GNU \fIe?grep\fP has been thoroughly debugged and tested over a period
|
||||
of several years; we think it's a reliable beast or we wouldn't
|
||||
distribute it. If by some fluke of the universe you discover a bug,
|
||||
send a detailed description (including options, regular expressions,
|
||||
and a copy of an input file that can reproduce it) to mike@ai.mit.edu.
|
||||
.PP
|
||||
.SH AUTHORS
|
||||
Mike Haertel wrote the deterministic regexp code and the bulk
|
||||
of the program.
|
||||
Email bug reports to
|
||||
.BR bug-gnu-utils@prep.ai.mit.edu .
|
||||
Be sure to include the word ``grep'' somewhere in the ``Subject:'' field.
|
||||
.PP
|
||||
James A. Woods is responsible for the hybridized search strategy
|
||||
of using Boyer-Moore-Gosper fixed-string search as a filter
|
||||
before calling the general regexp matcher.
|
||||
Large repetition counts in the
|
||||
.BI { m , n }
|
||||
construct may cause grep to use lots of memory.
|
||||
In addition,
|
||||
certain other obscure regular expressions require exponential time
|
||||
and space, and may cause
|
||||
.B grep
|
||||
to run out of memory.
|
||||
.PP
|
||||
Arthur David Olson contributed code that finds fixed strings for
|
||||
the aforementioned BMG search for a large class of regexps.
|
||||
.PP
|
||||
Richard Stallman wrote the backtracking regexp matcher that is used
|
||||
for \\\fIdigit\fP backreferences, as well as the GNU getopt. The
|
||||
backtracking matcher was originally written for GNU Emacs.
|
||||
.PP
|
||||
D. A. Gwyn wrote the C alloca emulation that is provided so
|
||||
System V machines can run this program. (Alloca is used only
|
||||
by RMS' backtracking matcher, and then only rarely, so there
|
||||
is no loss if your machine doesn't have a "real" alloca.)
|
||||
.PP
|
||||
Scott Anderson and Henry Spencer designed the regression tests
|
||||
used in the "regress" script.
|
||||
.PP
|
||||
Paul Placeway wrote the original version of this manual page.
|
||||
Backreferences are very slow, and may require exponential time.
|
||||
|
File diff suppressed because it is too large
Load Diff
53
gnu/usr.bin/grep/grep.h
Normal file
53
gnu/usr.bin/grep/grep.h
Normal file
@ -0,0 +1,53 @@
|
||||
/* grep.h - interface to grep driver for searching subroutines.
|
||||
Copyright (C) 1992 Free Software Foundation, Inc.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2, or (at your option)
|
||||
any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||
|
||||
#if __STDC__
|
||||
|
||||
extern void fatal(const char *, int);
|
||||
|
||||
/* Grep.c expects the matchers vector to be terminated
|
||||
by an entry with a NULL name, and to contain at least
|
||||
an entry named "default". */
|
||||
|
||||
extern struct matcher
|
||||
{
|
||||
char *name;
|
||||
void (*compile)(char *, size_t);
|
||||
char *(*execute)(char *, size_t, char **);
|
||||
} matchers[];
|
||||
|
||||
#else
|
||||
|
||||
extern void fatal();
|
||||
|
||||
extern struct matcher
|
||||
{
|
||||
char *name;
|
||||
void (*compile)();
|
||||
char *(*execute)();
|
||||
} matchers[];
|
||||
|
||||
#endif
|
||||
|
||||
/* Exported from grep.c. */
|
||||
extern char *matcher;
|
||||
|
||||
/* The following flags are exported from grep for the matchers
|
||||
to look at. */
|
||||
extern int match_icase; /* -i */
|
||||
extern int match_words; /* -w */
|
||||
extern int match_lines; /* -x */
|
805
gnu/usr.bin/grep/kwset.c
Normal file
805
gnu/usr.bin/grep/kwset.c
Normal file
@ -0,0 +1,805 @@
|
||||
/* kwset.c - search for any of a set of keywords.
|
||||
Copyright 1989 Free Software Foundation
|
||||
Written August 1989 by Mike Haertel.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 1, or (at your option)
|
||||
any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
|
||||
The author may be reached (Email) at the address mike@ai.mit.edu,
|
||||
or (US mail) as Mike Haertel c/o Free Software Foundation. */
|
||||
|
||||
/* The algorithm implemented by these routines bears a startling resemblence
|
||||
to one discovered by Beate Commentz-Walter, although it is not identical.
|
||||
See "A String Matching Algorithm Fast on the Average," Technical Report,
|
||||
IBM-Germany, Scientific Center Heidelberg, Tiergartenstrasse 15, D-6900
|
||||
Heidelberg, Germany. See also Aho, A.V., and M. Corasick, "Efficient
|
||||
String Matching: An Aid to Bibliographic Search," CACM June 1975,
|
||||
Vol. 18, No. 6, which describes the failure function used below. */
|
||||
|
||||
|
||||
#ifdef STDC_HEADERS
|
||||
#include <limits.h>
|
||||
#include <stdlib.h>
|
||||
#else
|
||||
#define INT_MAX 2147483647
|
||||
#define UCHAR_MAX 255
|
||||
#ifdef __STDC__
|
||||
#include <stddef.h>
|
||||
#else
|
||||
#include <sys/types.h>
|
||||
#endif
|
||||
extern char *malloc();
|
||||
extern void free();
|
||||
#endif
|
||||
|
||||
#ifdef HAVE_MEMCHR
|
||||
#include <string.h>
|
||||
#ifdef NEED_MEMORY_H
|
||||
#include <memory.h>
|
||||
#endif
|
||||
#else
|
||||
#ifdef __STDC__
|
||||
extern void *memchr();
|
||||
#else
|
||||
extern char *memchr();
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#ifdef GREP
|
||||
extern char *xmalloc();
|
||||
#define malloc xmalloc
|
||||
#endif
|
||||
|
||||
#include "kwset.h"
|
||||
#include "obstack.h"
|
||||
|
||||
#define NCHAR (UCHAR_MAX + 1)
|
||||
#define obstack_chunk_alloc malloc
|
||||
#define obstack_chunk_free free
|
||||
|
||||
/* Balanced tree of edges and labels leaving a given trie node. */
|
||||
struct tree
|
||||
{
|
||||
struct tree *llink; /* Left link; MUST be first field. */
|
||||
struct tree *rlink; /* Right link (to larger labels). */
|
||||
struct trie *trie; /* Trie node pointed to by this edge. */
|
||||
unsigned char label; /* Label on this edge. */
|
||||
char balance; /* Difference in depths of subtrees. */
|
||||
};
|
||||
|
||||
/* Node of a trie representing a set of reversed keywords. */
|
||||
struct trie
|
||||
{
|
||||
unsigned int accepting; /* Word index of accepted word, or zero. */
|
||||
struct tree *links; /* Tree of edges leaving this node. */
|
||||
struct trie *parent; /* Parent of this node. */
|
||||
struct trie *next; /* List of all trie nodes in level order. */
|
||||
struct trie *fail; /* Aho-Corasick failure function. */
|
||||
int depth; /* Depth of this node from the root. */
|
||||
int shift; /* Shift function for search failures. */
|
||||
int maxshift; /* Max shift of self and descendents. */
|
||||
};
|
||||
|
||||
/* Structure returned opaquely to the caller, containing everything. */
|
||||
struct kwset
|
||||
{
|
||||
struct obstack obstack; /* Obstack for node allocation. */
|
||||
int words; /* Number of words in the trie. */
|
||||
struct trie *trie; /* The trie itself. */
|
||||
int mind; /* Minimum depth of an accepting node. */
|
||||
int maxd; /* Maximum depth of any node. */
|
||||
unsigned char delta[NCHAR]; /* Delta table for rapid search. */
|
||||
struct trie *next[NCHAR]; /* Table of children of the root. */
|
||||
char *target; /* Target string if there's only one. */
|
||||
int mind2; /* Used in Boyer-Moore search for one string. */
|
||||
char *trans; /* Character translation table. */
|
||||
};
|
||||
|
||||
/* Allocate and initialize a keyword set object, returning an opaque
|
||||
pointer to it. Return NULL if memory is not available. */
|
||||
kwset_t
|
||||
kwsalloc(trans)
|
||||
char *trans;
|
||||
{
|
||||
struct kwset *kwset;
|
||||
|
||||
kwset = (struct kwset *) malloc(sizeof (struct kwset));
|
||||
if (!kwset)
|
||||
return 0;
|
||||
|
||||
obstack_init(&kwset->obstack);
|
||||
kwset->words = 0;
|
||||
kwset->trie
|
||||
= (struct trie *) obstack_alloc(&kwset->obstack, sizeof (struct trie));
|
||||
if (!kwset->trie)
|
||||
{
|
||||
kwsfree((kwset_t) kwset);
|
||||
return 0;
|
||||
}
|
||||
kwset->trie->accepting = 0;
|
||||
kwset->trie->links = 0;
|
||||
kwset->trie->parent = 0;
|
||||
kwset->trie->next = 0;
|
||||
kwset->trie->fail = 0;
|
||||
kwset->trie->depth = 0;
|
||||
kwset->trie->shift = 0;
|
||||
kwset->mind = INT_MAX;
|
||||
kwset->maxd = -1;
|
||||
kwset->target = 0;
|
||||
kwset->trans = trans;
|
||||
|
||||
return (kwset_t) kwset;
|
||||
}
|
||||
|
||||
/* Add the given string to the contents of the keyword set. Return NULL
|
||||
for success, an error message otherwise. */
|
||||
char *
|
||||
kwsincr(kws, text, len)
|
||||
kwset_t kws;
|
||||
char *text;
|
||||
size_t len;
|
||||
{
|
||||
struct kwset *kwset;
|
||||
register struct trie *trie;
|
||||
register unsigned char label;
|
||||
register struct tree *link;
|
||||
register int depth;
|
||||
struct tree *links[12];
|
||||
enum { L, R } dirs[12];
|
||||
struct tree *t, *r, *l, *rl, *lr;
|
||||
|
||||
kwset = (struct kwset *) kws;
|
||||
trie = kwset->trie;
|
||||
text += len;
|
||||
|
||||
/* Descend the trie (built of reversed keywords) character-by-character,
|
||||
installing new nodes when necessary. */
|
||||
while (len--)
|
||||
{
|
||||
label = kwset->trans ? kwset->trans[(unsigned char) *--text] : *--text;
|
||||
|
||||
/* Descend the tree of outgoing links for this trie node,
|
||||
looking for the current character and keeping track
|
||||
of the path followed. */
|
||||
link = trie->links;
|
||||
links[0] = (struct tree *) &trie->links;
|
||||
dirs[0] = L;
|
||||
depth = 1;
|
||||
|
||||
while (link && label != link->label)
|
||||
{
|
||||
links[depth] = link;
|
||||
if (label < link->label)
|
||||
dirs[depth++] = L, link = link->llink;
|
||||
else
|
||||
dirs[depth++] = R, link = link->rlink;
|
||||
}
|
||||
|
||||
/* The current character doesn't have an outgoing link at
|
||||
this trie node, so build a new trie node and install
|
||||
a link in the current trie node's tree. */
|
||||
if (!link)
|
||||
{
|
||||
link = (struct tree *) obstack_alloc(&kwset->obstack,
|
||||
sizeof (struct tree));
|
||||
if (!link)
|
||||
return "memory exhausted";
|
||||
link->llink = 0;
|
||||
link->rlink = 0;
|
||||
link->trie = (struct trie *) obstack_alloc(&kwset->obstack,
|
||||
sizeof (struct trie));
|
||||
if (!link->trie)
|
||||
return "memory exhausted";
|
||||
link->trie->accepting = 0;
|
||||
link->trie->links = 0;
|
||||
link->trie->parent = trie;
|
||||
link->trie->next = 0;
|
||||
link->trie->fail = 0;
|
||||
link->trie->depth = trie->depth + 1;
|
||||
link->trie->shift = 0;
|
||||
link->label = label;
|
||||
link->balance = 0;
|
||||
|
||||
/* Install the new tree node in its parent. */
|
||||
if (dirs[--depth] == L)
|
||||
links[depth]->llink = link;
|
||||
else
|
||||
links[depth]->rlink = link;
|
||||
|
||||
/* Back up the tree fixing the balance flags. */
|
||||
while (depth && !links[depth]->balance)
|
||||
{
|
||||
if (dirs[depth] == L)
|
||||
--links[depth]->balance;
|
||||
else
|
||||
++links[depth]->balance;
|
||||
--depth;
|
||||
}
|
||||
|
||||
/* Rebalance the tree by pointer rotations if necessary. */
|
||||
if (depth && ((dirs[depth] == L && --links[depth]->balance)
|
||||
|| (dirs[depth] == R && ++links[depth]->balance)))
|
||||
{
|
||||
switch (links[depth]->balance)
|
||||
{
|
||||
case (char) -2:
|
||||
switch (dirs[depth + 1])
|
||||
{
|
||||
case L:
|
||||
r = links[depth], t = r->llink, rl = t->rlink;
|
||||
t->rlink = r, r->llink = rl;
|
||||
t->balance = r->balance = 0;
|
||||
break;
|
||||
case R:
|
||||
r = links[depth], l = r->llink, t = l->rlink;
|
||||
rl = t->rlink, lr = t->llink;
|
||||
t->llink = l, l->rlink = lr, t->rlink = r, r->llink = rl;
|
||||
l->balance = t->balance != 1 ? 0 : -1;
|
||||
r->balance = t->balance != (char) -1 ? 0 : 1;
|
||||
t->balance = 0;
|
||||
break;
|
||||
}
|
||||
break;
|
||||
case 2:
|
||||
switch (dirs[depth + 1])
|
||||
{
|
||||
case R:
|
||||
l = links[depth], t = l->rlink, lr = t->llink;
|
||||
t->llink = l, l->rlink = lr;
|
||||
t->balance = l->balance = 0;
|
||||
break;
|
||||
case L:
|
||||
l = links[depth], r = l->rlink, t = r->llink;
|
||||
lr = t->llink, rl = t->rlink;
|
||||
t->llink = l, l->rlink = lr, t->rlink = r, r->llink = rl;
|
||||
l->balance = t->balance != 1 ? 0 : -1;
|
||||
r->balance = t->balance != (char) -1 ? 0 : 1;
|
||||
t->balance = 0;
|
||||
break;
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
if (dirs[depth - 1] == L)
|
||||
links[depth - 1]->llink = t;
|
||||
else
|
||||
links[depth - 1]->rlink = t;
|
||||
}
|
||||
}
|
||||
|
||||
trie = link->trie;
|
||||
}
|
||||
|
||||
/* Mark the node we finally reached as accepting, encoding the
|
||||
index number of this word in the keyword set so far. */
|
||||
if (!trie->accepting)
|
||||
trie->accepting = 1 + 2 * kwset->words;
|
||||
++kwset->words;
|
||||
|
||||
/* Keep track of the longest and shortest string of the keyword set. */
|
||||
if (trie->depth < kwset->mind)
|
||||
kwset->mind = trie->depth;
|
||||
if (trie->depth > kwset->maxd)
|
||||
kwset->maxd = trie->depth;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Enqueue the trie nodes referenced from the given tree in the
|
||||
given queue. */
|
||||
static void
|
||||
enqueue(tree, last)
|
||||
struct tree *tree;
|
||||
struct trie **last;
|
||||
{
|
||||
if (!tree)
|
||||
return;
|
||||
enqueue(tree->llink, last);
|
||||
enqueue(tree->rlink, last);
|
||||
(*last) = (*last)->next = tree->trie;
|
||||
}
|
||||
|
||||
/* Compute the Aho-Corasick failure function for the trie nodes referenced
|
||||
from the given tree, given the failure function for their parent as
|
||||
well as a last resort failure node. */
|
||||
static void
|
||||
treefails(tree, fail, recourse)
|
||||
register struct tree *tree;
|
||||
struct trie *fail;
|
||||
struct trie *recourse;
|
||||
{
|
||||
register struct tree *link;
|
||||
|
||||
if (!tree)
|
||||
return;
|
||||
|
||||
treefails(tree->llink, fail, recourse);
|
||||
treefails(tree->rlink, fail, recourse);
|
||||
|
||||
/* Find, in the chain of fails going back to the root, the first
|
||||
node that has a descendent on the current label. */
|
||||
while (fail)
|
||||
{
|
||||
link = fail->links;
|
||||
while (link && tree->label != link->label)
|
||||
if (tree->label < link->label)
|
||||
link = link->llink;
|
||||
else
|
||||
link = link->rlink;
|
||||
if (link)
|
||||
{
|
||||
tree->trie->fail = link->trie;
|
||||
return;
|
||||
}
|
||||
fail = fail->fail;
|
||||
}
|
||||
|
||||
tree->trie->fail = recourse;
|
||||
}
|
||||
|
||||
/* Set delta entries for the links of the given tree such that
|
||||
the preexisting delta value is larger than the current depth. */
|
||||
static void
|
||||
treedelta(tree, depth, delta)
|
||||
register struct tree *tree;
|
||||
register unsigned int depth;
|
||||
unsigned char delta[];
|
||||
{
|
||||
if (!tree)
|
||||
return;
|
||||
treedelta(tree->llink, depth, delta);
|
||||
treedelta(tree->rlink, depth, delta);
|
||||
if (depth < delta[tree->label])
|
||||
delta[tree->label] = depth;
|
||||
}
|
||||
|
||||
/* Return true if A has every label in B. */
|
||||
static int
|
||||
hasevery(a, b)
|
||||
register struct tree *a;
|
||||
register struct tree *b;
|
||||
{
|
||||
if (!b)
|
||||
return 1;
|
||||
if (!hasevery(a, b->llink))
|
||||
return 0;
|
||||
if (!hasevery(a, b->rlink))
|
||||
return 0;
|
||||
while (a && b->label != a->label)
|
||||
if (b->label < a->label)
|
||||
a = a->llink;
|
||||
else
|
||||
a = a->rlink;
|
||||
return !!a;
|
||||
}
|
||||
|
||||
/* Compute a vector, indexed by character code, of the trie nodes
|
||||
referenced from the given tree. */
|
||||
static void
|
||||
treenext(tree, next)
|
||||
struct tree *tree;
|
||||
struct trie *next[];
|
||||
{
|
||||
if (!tree)
|
||||
return;
|
||||
treenext(tree->llink, next);
|
||||
treenext(tree->rlink, next);
|
||||
next[tree->label] = tree->trie;
|
||||
}
|
||||
|
||||
/* Compute the shift for each trie node, as well as the delta
|
||||
table and next cache for the given keyword set. */
|
||||
char *
|
||||
kwsprep(kws)
|
||||
kwset_t kws;
|
||||
{
|
||||
register struct kwset *kwset;
|
||||
register int i;
|
||||
register struct trie *curr, *fail;
|
||||
register char *trans;
|
||||
unsigned char delta[NCHAR];
|
||||
struct trie *last, *next[NCHAR];
|
||||
|
||||
kwset = (struct kwset *) kws;
|
||||
|
||||
/* Initial values for the delta table; will be changed later. The
|
||||
delta entry for a given character is the smallest depth of any
|
||||
node at which an outgoing edge is labeled by that character. */
|
||||
if (kwset->mind < 256)
|
||||
for (i = 0; i < NCHAR; ++i)
|
||||
delta[i] = kwset->mind;
|
||||
else
|
||||
for (i = 0; i < NCHAR; ++i)
|
||||
delta[i] = 255;
|
||||
|
||||
/* Check if we can use the simple boyer-moore algorithm, instead
|
||||
of the hairy commentz-walter algorithm. */
|
||||
if (kwset->words == 1 && kwset->trans == 0)
|
||||
{
|
||||
/* Looking for just one string. Extract it from the trie. */
|
||||
kwset->target = obstack_alloc(&kwset->obstack, kwset->mind);
|
||||
for (i = kwset->mind - 1, curr = kwset->trie; i >= 0; --i)
|
||||
{
|
||||
kwset->target[i] = curr->links->label;
|
||||
curr = curr->links->trie;
|
||||
}
|
||||
/* Build the Boyer Moore delta. Boy that's easy compared to CW. */
|
||||
for (i = 0; i < kwset->mind; ++i)
|
||||
delta[(unsigned char) kwset->target[i]] = kwset->mind - (i + 1);
|
||||
kwset->mind2 = kwset->mind;
|
||||
/* Find the minimal delta2 shift that we might make after
|
||||
a backwards match has failed. */
|
||||
for (i = 0; i < kwset->mind - 1; ++i)
|
||||
if (kwset->target[i] == kwset->target[kwset->mind - 1])
|
||||
kwset->mind2 = kwset->mind - (i + 1);
|
||||
}
|
||||
else
|
||||
{
|
||||
/* Traverse the nodes of the trie in level order, simultaneously
|
||||
computing the delta table, failure function, and shift function. */
|
||||
for (curr = last = kwset->trie; curr; curr = curr->next)
|
||||
{
|
||||
/* Enqueue the immediate descendents in the level order queue. */
|
||||
enqueue(curr->links, &last);
|
||||
|
||||
curr->shift = kwset->mind;
|
||||
curr->maxshift = kwset->mind;
|
||||
|
||||
/* Update the delta table for the descendents of this node. */
|
||||
treedelta(curr->links, curr->depth, delta);
|
||||
|
||||
/* Compute the failure function for the decendents of this node. */
|
||||
treefails(curr->links, curr->fail, kwset->trie);
|
||||
|
||||
/* Update the shifts at each node in the current node's chain
|
||||
of fails back to the root. */
|
||||
for (fail = curr->fail; fail; fail = fail->fail)
|
||||
{
|
||||
/* If the current node has some outgoing edge that the fail
|
||||
doesn't, then the shift at the fail should be no larger
|
||||
than the difference of their depths. */
|
||||
if (!hasevery(fail->links, curr->links))
|
||||
if (curr->depth - fail->depth < fail->shift)
|
||||
fail->shift = curr->depth - fail->depth;
|
||||
|
||||
/* If the current node is accepting then the shift at the
|
||||
fail and its descendents should be no larger than the
|
||||
difference of their depths. */
|
||||
if (curr->accepting && fail->maxshift > curr->depth - fail->depth)
|
||||
fail->maxshift = curr->depth - fail->depth;
|
||||
}
|
||||
}
|
||||
|
||||
/* Traverse the trie in level order again, fixing up all nodes whose
|
||||
shift exceeds their inherited maxshift. */
|
||||
for (curr = kwset->trie->next; curr; curr = curr->next)
|
||||
{
|
||||
if (curr->maxshift > curr->parent->maxshift)
|
||||
curr->maxshift = curr->parent->maxshift;
|
||||
if (curr->shift > curr->maxshift)
|
||||
curr->shift = curr->maxshift;
|
||||
}
|
||||
|
||||
/* Create a vector, indexed by character code, of the outgoing links
|
||||
from the root node. */
|
||||
for (i = 0; i < NCHAR; ++i)
|
||||
next[i] = 0;
|
||||
treenext(kwset->trie->links, next);
|
||||
|
||||
if ((trans = kwset->trans) != 0)
|
||||
for (i = 0; i < NCHAR; ++i)
|
||||
kwset->next[i] = next[(unsigned char) trans[i]];
|
||||
else
|
||||
for (i = 0; i < NCHAR; ++i)
|
||||
kwset->next[i] = next[i];
|
||||
}
|
||||
|
||||
/* Fix things up for any translation table. */
|
||||
if ((trans = kwset->trans) != 0)
|
||||
for (i = 0; i < NCHAR; ++i)
|
||||
kwset->delta[i] = delta[(unsigned char) trans[i]];
|
||||
else
|
||||
for (i = 0; i < NCHAR; ++i)
|
||||
kwset->delta[i] = delta[i];
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
#define U(C) ((unsigned char) (C))
|
||||
|
||||
/* Fast boyer-moore search. */
|
||||
static char *
|
||||
bmexec(kws, text, size)
|
||||
kwset_t kws;
|
||||
char *text;
|
||||
size_t size;
|
||||
{
|
||||
struct kwset *kwset;
|
||||
register unsigned char *d1;
|
||||
register char *ep, *sp, *tp;
|
||||
register int d, gc, i, len, md2;
|
||||
|
||||
kwset = (struct kwset *) kws;
|
||||
len = kwset->mind;
|
||||
|
||||
if (len == 0)
|
||||
return text;
|
||||
if (len > size)
|
||||
return 0;
|
||||
if (len == 1)
|
||||
return memchr(text, kwset->target[0], size);
|
||||
|
||||
d1 = kwset->delta;
|
||||
sp = kwset->target + len;
|
||||
gc = U(sp[-2]);
|
||||
md2 = kwset->mind2;
|
||||
tp = text + len;
|
||||
|
||||
/* Significance of 12: 1 (initial offset) + 10 (skip loop) + 1 (md2). */
|
||||
if (size > 12 * len)
|
||||
/* 11 is not a bug, the initial offset happens only once. */
|
||||
for (ep = text + size - 11 * len;;)
|
||||
{
|
||||
while (tp <= ep)
|
||||
{
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
if (d == 0)
|
||||
goto found;
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
if (d == 0)
|
||||
goto found;
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
if (d == 0)
|
||||
goto found;
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
d = d1[U(tp[-1])], tp += d;
|
||||
}
|
||||
break;
|
||||
found:
|
||||
if (U(tp[-2]) == gc)
|
||||
{
|
||||
for (i = 3; i <= len && U(tp[-i]) == U(sp[-i]); ++i)
|
||||
;
|
||||
if (i > len)
|
||||
return tp - len;
|
||||
}
|
||||
tp += md2;
|
||||
}
|
||||
|
||||
/* Now we have only a few characters left to search. We
|
||||
carefully avoid ever producing an out-of-bounds pointer. */
|
||||
ep = text + size;
|
||||
d = d1[U(tp[-1])];
|
||||
while (d <= ep - tp)
|
||||
{
|
||||
d = d1[U((tp += d)[-1])];
|
||||
if (d != 0)
|
||||
continue;
|
||||
if (tp[-2] == gc)
|
||||
{
|
||||
for (i = 3; i <= len && U(tp[-i]) == U(sp[-i]); ++i)
|
||||
;
|
||||
if (i > len)
|
||||
return tp - len;
|
||||
}
|
||||
d = md2;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Hairy multiple string search. */
|
||||
static char *
|
||||
cwexec(kws, text, len, kwsmatch)
|
||||
kwset_t kws;
|
||||
char *text;
|
||||
size_t len;
|
||||
struct kwsmatch *kwsmatch;
|
||||
{
|
||||
struct kwset *kwset;
|
||||
struct trie **next, *trie, *accept;
|
||||
char *beg, *lim, *mch, *lmch;
|
||||
register unsigned char c, *delta;
|
||||
register int d;
|
||||
register char *end, *qlim;
|
||||
register struct tree *tree;
|
||||
register char *trans;
|
||||
|
||||
/* Initialize register copies and look for easy ways out. */
|
||||
kwset = (struct kwset *) kws;
|
||||
if (len < kwset->mind)
|
||||
return 0;
|
||||
next = kwset->next;
|
||||
delta = kwset->delta;
|
||||
trans = kwset->trans;
|
||||
lim = text + len;
|
||||
end = text;
|
||||
if ((d = kwset->mind) != 0)
|
||||
mch = 0;
|
||||
else
|
||||
{
|
||||
mch = text, accept = kwset->trie;
|
||||
goto match;
|
||||
}
|
||||
|
||||
if (len >= 4 * kwset->mind)
|
||||
qlim = lim - 4 * kwset->mind;
|
||||
else
|
||||
qlim = 0;
|
||||
|
||||
while (lim - end >= d)
|
||||
{
|
||||
if (qlim && end <= qlim)
|
||||
{
|
||||
end += d - 1;
|
||||
while ((d = delta[c = *end]) && end < qlim)
|
||||
{
|
||||
end += d;
|
||||
end += delta[(unsigned char) *end];
|
||||
end += delta[(unsigned char) *end];
|
||||
}
|
||||
++end;
|
||||
}
|
||||
else
|
||||
d = delta[c = (end += d)[-1]];
|
||||
if (d)
|
||||
continue;
|
||||
beg = end - 1;
|
||||
trie = next[c];
|
||||
if (trie->accepting)
|
||||
{
|
||||
mch = beg;
|
||||
accept = trie;
|
||||
}
|
||||
d = trie->shift;
|
||||
while (beg > text)
|
||||
{
|
||||
c = trans ? trans[(unsigned char) *--beg] : *--beg;
|
||||
tree = trie->links;
|
||||
while (tree && c != tree->label)
|
||||
if (c < tree->label)
|
||||
tree = tree->llink;
|
||||
else
|
||||
tree = tree->rlink;
|
||||
if (tree)
|
||||
{
|
||||
trie = tree->trie;
|
||||
if (trie->accepting)
|
||||
{
|
||||
mch = beg;
|
||||
accept = trie;
|
||||
}
|
||||
}
|
||||
else
|
||||
break;
|
||||
d = trie->shift;
|
||||
}
|
||||
if (mch)
|
||||
goto match;
|
||||
}
|
||||
return 0;
|
||||
|
||||
match:
|
||||
/* Given a known match, find the longest possible match anchored
|
||||
at or before its starting point. This is nearly a verbatim
|
||||
copy of the preceding main search loops. */
|
||||
if (lim - mch > kwset->maxd)
|
||||
lim = mch + kwset->maxd;
|
||||
lmch = 0;
|
||||
d = 1;
|
||||
while (lim - end >= d)
|
||||
{
|
||||
if ((d = delta[c = (end += d)[-1]]) != 0)
|
||||
continue;
|
||||
beg = end - 1;
|
||||
if (!(trie = next[c]))
|
||||
{
|
||||
d = 1;
|
||||
continue;
|
||||
}
|
||||
if (trie->accepting && beg <= mch)
|
||||
{
|
||||
lmch = beg;
|
||||
accept = trie;
|
||||
}
|
||||
d = trie->shift;
|
||||
while (beg > text)
|
||||
{
|
||||
c = trans ? trans[(unsigned char) *--beg] : *--beg;
|
||||
tree = trie->links;
|
||||
while (tree && c != tree->label)
|
||||
if (c < tree->label)
|
||||
tree = tree->llink;
|
||||
else
|
||||
tree = tree->rlink;
|
||||
if (tree)
|
||||
{
|
||||
trie = tree->trie;
|
||||
if (trie->accepting && beg <= mch)
|
||||
{
|
||||
lmch = beg;
|
||||
accept = trie;
|
||||
}
|
||||
}
|
||||
else
|
||||
break;
|
||||
d = trie->shift;
|
||||
}
|
||||
if (lmch)
|
||||
{
|
||||
mch = lmch;
|
||||
goto match;
|
||||
}
|
||||
if (!d)
|
||||
d = 1;
|
||||
}
|
||||
|
||||
if (kwsmatch)
|
||||
{
|
||||
kwsmatch->index = accept->accepting / 2;
|
||||
kwsmatch->beg[0] = mch;
|
||||
kwsmatch->size[0] = accept->depth;
|
||||
}
|
||||
return mch;
|
||||
}
|
||||
|
||||
/* Search through the given text for a match of any member of the
|
||||
given keyword set. Return a pointer to the first character of
|
||||
the matching substring, or NULL if no match is found. If FOUNDLEN
|
||||
is non-NULL store in the referenced location the length of the
|
||||
matching substring. Similarly, if FOUNDIDX is non-NULL, store
|
||||
in the referenced location the index number of the particular
|
||||
keyword matched. */
|
||||
char *
|
||||
kwsexec(kws, text, size, kwsmatch)
|
||||
kwset_t kws;
|
||||
char *text;
|
||||
size_t size;
|
||||
struct kwsmatch *kwsmatch;
|
||||
{
|
||||
struct kwset *kwset;
|
||||
char *ret;
|
||||
|
||||
kwset = (struct kwset *) kws;
|
||||
if (kwset->words == 1 && kwset->trans == 0)
|
||||
{
|
||||
ret = bmexec(kws, text, size);
|
||||
if (kwsmatch != 0 && ret != 0)
|
||||
{
|
||||
kwsmatch->index = 0;
|
||||
kwsmatch->beg[0] = ret;
|
||||
kwsmatch->size[0] = kwset->mind;
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
else
|
||||
return cwexec(kws, text, size, kwsmatch);
|
||||
}
|
||||
|
||||
/* Free the components of the given keyword set. */
|
||||
void
|
||||
kwsfree(kws)
|
||||
kwset_t kws;
|
||||
{
|
||||
struct kwset *kwset;
|
||||
|
||||
kwset = (struct kwset *) kws;
|
||||
obstack_free(&kwset->obstack, 0);
|
||||
free(kws);
|
||||
}
|
69
gnu/usr.bin/grep/kwset.h
Normal file
69
gnu/usr.bin/grep/kwset.h
Normal file
@ -0,0 +1,69 @@
|
||||
/* kwset.h - header declaring the keyword set library.
|
||||
Copyright 1989 Free Software Foundation
|
||||
Written August 1989 by Mike Haertel.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 1, or (at your option)
|
||||
any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
|
||||
The author may be reached (Email) at the address mike@ai.mit.edu,
|
||||
or (US mail) as Mike Haertel c/o Free Software Foundation. */
|
||||
|
||||
struct kwsmatch
|
||||
{
|
||||
int index; /* Index number of matching keyword. */
|
||||
char *beg[1]; /* Begin pointer for each submatch. */
|
||||
size_t size[1]; /* Length of each submatch. */
|
||||
};
|
||||
|
||||
#if __STDC__
|
||||
|
||||
typedef void *kwset_t;
|
||||
|
||||
/* Return an opaque pointer to a newly allocated keyword set, or NULL
|
||||
if enough memory cannot be obtained. The argument if non-NULL
|
||||
specifies a table of character translations to be applied to all
|
||||
pattern and search text. */
|
||||
extern kwset_t kwsalloc(char *);
|
||||
|
||||
/* Incrementally extend the keyword set to include the given string.
|
||||
Return NULL for success, or an error message. Remember an index
|
||||
number for each keyword included in the set. */
|
||||
extern char *kwsincr(kwset_t, char *, size_t);
|
||||
|
||||
/* When the keyword set has been completely built, prepare it for
|
||||
use. Return NULL for success, or an error message. */
|
||||
extern char *kwsprep(kwset_t);
|
||||
|
||||
/* Search through the given buffer for a member of the keyword set.
|
||||
Return a pointer to the leftmost longest match found, or NULL if
|
||||
no match is found. If foundlen is non-NULL, store the length of
|
||||
the matching substring in the integer it points to. Similarly,
|
||||
if foundindex is non-NULL, store the index of the particular
|
||||
keyword found therein. */
|
||||
extern char *kwsexec(kwset_t, char *, size_t, struct kwsmatch *);
|
||||
|
||||
/* Deallocate the given keyword set and all its associated storage. */
|
||||
extern void kwsfree(kwset_t);
|
||||
|
||||
#else
|
||||
|
||||
typedef char *kwset_t;
|
||||
|
||||
extern kwset_t kwsalloc();
|
||||
extern char *kwsincr();
|
||||
extern char *kwsprep();
|
||||
extern char *kwsexec();
|
||||
extern void kwsfree();
|
||||
|
||||
#endif
|
454
gnu/usr.bin/grep/obstack.c
Normal file
454
gnu/usr.bin/grep/obstack.c
Normal file
@ -0,0 +1,454 @@
|
||||
/* obstack.c - subroutines used implicitly by object stack macros
|
||||
Copyright (C) 1988, 1993 Free Software Foundation, Inc.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the terms of the GNU General Public License as published by the
|
||||
Free Software Foundation; either version 2, or (at your option) any
|
||||
later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||
|
||||
#include "obstack.h"
|
||||
|
||||
/* This is just to get __GNU_LIBRARY__ defined. */
|
||||
#include <stdio.h>
|
||||
|
||||
/* Comment out all this code if we are using the GNU C Library, and are not
|
||||
actually compiling the library itself. This code is part of the GNU C
|
||||
Library, but also included in many other GNU distributions. Compiling
|
||||
and linking in this code is a waste when using the GNU C library
|
||||
(especially if it is a shared library). Rather than having every GNU
|
||||
program understand `configure --with-gnu-libc' and omit the object files,
|
||||
it is simpler to just do this in the source for each such file. */
|
||||
|
||||
#if defined (_LIBC) || !defined (__GNU_LIBRARY__)
|
||||
|
||||
|
||||
#ifdef __STDC__
|
||||
#define POINTER void *
|
||||
#else
|
||||
#define POINTER char *
|
||||
#endif
|
||||
|
||||
/* Determine default alignment. */
|
||||
struct fooalign {char x; double d;};
|
||||
#define DEFAULT_ALIGNMENT \
|
||||
((PTR_INT_TYPE) ((char *)&((struct fooalign *) 0)->d - (char *)0))
|
||||
/* If malloc were really smart, it would round addresses to DEFAULT_ALIGNMENT.
|
||||
But in fact it might be less smart and round addresses to as much as
|
||||
DEFAULT_ROUNDING. So we prepare for it to do that. */
|
||||
union fooround {long x; double d;};
|
||||
#define DEFAULT_ROUNDING (sizeof (union fooround))
|
||||
|
||||
/* When we copy a long block of data, this is the unit to do it with.
|
||||
On some machines, copying successive ints does not work;
|
||||
in such a case, redefine COPYING_UNIT to `long' (if that works)
|
||||
or `char' as a last resort. */
|
||||
#ifndef COPYING_UNIT
|
||||
#define COPYING_UNIT int
|
||||
#endif
|
||||
|
||||
/* The non-GNU-C macros copy the obstack into this global variable
|
||||
to avoid multiple evaluation. */
|
||||
|
||||
struct obstack *_obstack;
|
||||
|
||||
/* Define a macro that either calls functions with the traditional malloc/free
|
||||
calling interface, or calls functions with the mmalloc/mfree interface
|
||||
(that adds an extra first argument), based on the state of use_extra_arg.
|
||||
For free, do not use ?:, since some compilers, like the MIPS compilers,
|
||||
do not allow (expr) ? void : void. */
|
||||
|
||||
#define CALL_CHUNKFUN(h, size) \
|
||||
(((h) -> use_extra_arg) \
|
||||
? (*(h)->chunkfun) ((h)->extra_arg, (size)) \
|
||||
: (*(h)->chunkfun) ((size)))
|
||||
|
||||
#define CALL_FREEFUN(h, old_chunk) \
|
||||
do { \
|
||||
if ((h) -> use_extra_arg) \
|
||||
(*(h)->freefun) ((h)->extra_arg, (old_chunk)); \
|
||||
else \
|
||||
(*(h)->freefun) ((old_chunk)); \
|
||||
} while (0)
|
||||
|
||||
|
||||
/* Initialize an obstack H for use. Specify chunk size SIZE (0 means default).
|
||||
Objects start on multiples of ALIGNMENT (0 means use default).
|
||||
CHUNKFUN is the function to use to allocate chunks,
|
||||
and FREEFUN the function to free them. */
|
||||
|
||||
void
|
||||
_obstack_begin (h, size, alignment, chunkfun, freefun)
|
||||
struct obstack *h;
|
||||
int size;
|
||||
int alignment;
|
||||
POINTER (*chunkfun) ();
|
||||
void (*freefun) ();
|
||||
{
|
||||
register struct _obstack_chunk* chunk; /* points to new chunk */
|
||||
|
||||
if (alignment == 0)
|
||||
alignment = DEFAULT_ALIGNMENT;
|
||||
if (size == 0)
|
||||
/* Default size is what GNU malloc can fit in a 4096-byte block. */
|
||||
{
|
||||
/* 12 is sizeof (mhead) and 4 is EXTRA from GNU malloc.
|
||||
Use the values for range checking, because if range checking is off,
|
||||
the extra bytes won't be missed terribly, but if range checking is on
|
||||
and we used a larger request, a whole extra 4096 bytes would be
|
||||
allocated.
|
||||
|
||||
These number are irrelevant to the new GNU malloc. I suspect it is
|
||||
less sensitive to the size of the request. */
|
||||
int extra = ((((12 + DEFAULT_ROUNDING - 1) & ~(DEFAULT_ROUNDING - 1))
|
||||
+ 4 + DEFAULT_ROUNDING - 1)
|
||||
& ~(DEFAULT_ROUNDING - 1));
|
||||
size = 4096 - extra;
|
||||
}
|
||||
|
||||
h->chunkfun = (struct _obstack_chunk * (*)()) chunkfun;
|
||||
h->freefun = freefun;
|
||||
h->chunk_size = size;
|
||||
h->alignment_mask = alignment - 1;
|
||||
h->use_extra_arg = 0;
|
||||
|
||||
chunk = h->chunk = CALL_CHUNKFUN (h, h -> chunk_size);
|
||||
h->next_free = h->object_base = chunk->contents;
|
||||
h->chunk_limit = chunk->limit
|
||||
= (char *) chunk + h->chunk_size;
|
||||
chunk->prev = 0;
|
||||
/* The initial chunk now contains no empty object. */
|
||||
h->maybe_empty_object = 0;
|
||||
}
|
||||
|
||||
void
|
||||
_obstack_begin_1 (h, size, alignment, chunkfun, freefun, arg)
|
||||
struct obstack *h;
|
||||
int size;
|
||||
int alignment;
|
||||
POINTER (*chunkfun) ();
|
||||
void (*freefun) ();
|
||||
POINTER arg;
|
||||
{
|
||||
register struct _obstack_chunk* chunk; /* points to new chunk */
|
||||
|
||||
if (alignment == 0)
|
||||
alignment = DEFAULT_ALIGNMENT;
|
||||
if (size == 0)
|
||||
/* Default size is what GNU malloc can fit in a 4096-byte block. */
|
||||
{
|
||||
/* 12 is sizeof (mhead) and 4 is EXTRA from GNU malloc.
|
||||
Use the values for range checking, because if range checking is off,
|
||||
the extra bytes won't be missed terribly, but if range checking is on
|
||||
and we used a larger request, a whole extra 4096 bytes would be
|
||||
allocated.
|
||||
|
||||
These number are irrelevant to the new GNU malloc. I suspect it is
|
||||
less sensitive to the size of the request. */
|
||||
int extra = ((((12 + DEFAULT_ROUNDING - 1) & ~(DEFAULT_ROUNDING - 1))
|
||||
+ 4 + DEFAULT_ROUNDING - 1)
|
||||
& ~(DEFAULT_ROUNDING - 1));
|
||||
size = 4096 - extra;
|
||||
}
|
||||
|
||||
h->chunkfun = (struct _obstack_chunk * (*)()) chunkfun;
|
||||
h->freefun = freefun;
|
||||
h->chunk_size = size;
|
||||
h->alignment_mask = alignment - 1;
|
||||
h->extra_arg = arg;
|
||||
h->use_extra_arg = 1;
|
||||
|
||||
chunk = h->chunk = CALL_CHUNKFUN (h, h -> chunk_size);
|
||||
h->next_free = h->object_base = chunk->contents;
|
||||
h->chunk_limit = chunk->limit
|
||||
= (char *) chunk + h->chunk_size;
|
||||
chunk->prev = 0;
|
||||
/* The initial chunk now contains no empty object. */
|
||||
h->maybe_empty_object = 0;
|
||||
}
|
||||
|
||||
/* Allocate a new current chunk for the obstack *H
|
||||
on the assumption that LENGTH bytes need to be added
|
||||
to the current object, or a new object of length LENGTH allocated.
|
||||
Copies any partial object from the end of the old chunk
|
||||
to the beginning of the new one. */
|
||||
|
||||
void
|
||||
_obstack_newchunk (h, length)
|
||||
struct obstack *h;
|
||||
int length;
|
||||
{
|
||||
register struct _obstack_chunk* old_chunk = h->chunk;
|
||||
register struct _obstack_chunk* new_chunk;
|
||||
register long new_size;
|
||||
register int obj_size = h->next_free - h->object_base;
|
||||
register int i;
|
||||
int already;
|
||||
|
||||
/* Compute size for new chunk. */
|
||||
new_size = (obj_size + length) + (obj_size >> 3) + 100;
|
||||
if (new_size < h->chunk_size)
|
||||
new_size = h->chunk_size;
|
||||
|
||||
/* Allocate and initialize the new chunk. */
|
||||
new_chunk = h->chunk = CALL_CHUNKFUN (h, new_size);
|
||||
new_chunk->prev = old_chunk;
|
||||
new_chunk->limit = h->chunk_limit = (char *) new_chunk + new_size;
|
||||
|
||||
/* Move the existing object to the new chunk.
|
||||
Word at a time is fast and is safe if the object
|
||||
is sufficiently aligned. */
|
||||
if (h->alignment_mask + 1 >= DEFAULT_ALIGNMENT)
|
||||
{
|
||||
for (i = obj_size / sizeof (COPYING_UNIT) - 1;
|
||||
i >= 0; i--)
|
||||
((COPYING_UNIT *)new_chunk->contents)[i]
|
||||
= ((COPYING_UNIT *)h->object_base)[i];
|
||||
/* We used to copy the odd few remaining bytes as one extra COPYING_UNIT,
|
||||
but that can cross a page boundary on a machine
|
||||
which does not do strict alignment for COPYING_UNITS. */
|
||||
already = obj_size / sizeof (COPYING_UNIT) * sizeof (COPYING_UNIT);
|
||||
}
|
||||
else
|
||||
already = 0;
|
||||
/* Copy remaining bytes one by one. */
|
||||
for (i = already; i < obj_size; i++)
|
||||
new_chunk->contents[i] = h->object_base[i];
|
||||
|
||||
/* If the object just copied was the only data in OLD_CHUNK,
|
||||
free that chunk and remove it from the chain.
|
||||
But not if that chunk might contain an empty object. */
|
||||
if (h->object_base == old_chunk->contents && ! h->maybe_empty_object)
|
||||
{
|
||||
new_chunk->prev = old_chunk->prev;
|
||||
CALL_FREEFUN (h, old_chunk);
|
||||
}
|
||||
|
||||
h->object_base = new_chunk->contents;
|
||||
h->next_free = h->object_base + obj_size;
|
||||
/* The new chunk certainly contains no empty object yet. */
|
||||
h->maybe_empty_object = 0;
|
||||
}
|
||||
|
||||
/* Return nonzero if object OBJ has been allocated from obstack H.
|
||||
This is here for debugging.
|
||||
If you use it in a program, you are probably losing. */
|
||||
|
||||
int
|
||||
_obstack_allocated_p (h, obj)
|
||||
struct obstack *h;
|
||||
POINTER obj;
|
||||
{
|
||||
register struct _obstack_chunk* lp; /* below addr of any objects in this chunk */
|
||||
register struct _obstack_chunk* plp; /* point to previous chunk if any */
|
||||
|
||||
lp = (h)->chunk;
|
||||
/* We use >= rather than > since the object cannot be exactly at
|
||||
the beginning of the chunk but might be an empty object exactly
|
||||
at the end of an adjacent chunk. */
|
||||
while (lp != 0 && ((POINTER)lp >= obj || (POINTER)(lp)->limit < obj))
|
||||
{
|
||||
plp = lp->prev;
|
||||
lp = plp;
|
||||
}
|
||||
return lp != 0;
|
||||
}
|
||||
|
||||
/* Free objects in obstack H, including OBJ and everything allocate
|
||||
more recently than OBJ. If OBJ is zero, free everything in H. */
|
||||
|
||||
#undef obstack_free
|
||||
|
||||
/* This function has two names with identical definitions.
|
||||
This is the first one, called from non-ANSI code. */
|
||||
|
||||
void
|
||||
_obstack_free (h, obj)
|
||||
struct obstack *h;
|
||||
POINTER obj;
|
||||
{
|
||||
register struct _obstack_chunk* lp; /* below addr of any objects in this chunk */
|
||||
register struct _obstack_chunk* plp; /* point to previous chunk if any */
|
||||
|
||||
lp = h->chunk;
|
||||
/* We use >= because there cannot be an object at the beginning of a chunk.
|
||||
But there can be an empty object at that address
|
||||
at the end of another chunk. */
|
||||
while (lp != 0 && ((POINTER)lp >= obj || (POINTER)(lp)->limit < obj))
|
||||
{
|
||||
plp = lp->prev;
|
||||
CALL_FREEFUN (h, lp);
|
||||
lp = plp;
|
||||
/* If we switch chunks, we can't tell whether the new current
|
||||
chunk contains an empty object, so assume that it may. */
|
||||
h->maybe_empty_object = 1;
|
||||
}
|
||||
if (lp)
|
||||
{
|
||||
h->object_base = h->next_free = (char *)(obj);
|
||||
h->chunk_limit = lp->limit;
|
||||
h->chunk = lp;
|
||||
}
|
||||
else if (obj != 0)
|
||||
/* obj is not in any of the chunks! */
|
||||
abort ();
|
||||
}
|
||||
|
||||
/* This function is used from ANSI code. */
|
||||
|
||||
void
|
||||
obstack_free (h, obj)
|
||||
struct obstack *h;
|
||||
POINTER obj;
|
||||
{
|
||||
register struct _obstack_chunk* lp; /* below addr of any objects in this chunk */
|
||||
register struct _obstack_chunk* plp; /* point to previous chunk if any */
|
||||
|
||||
lp = h->chunk;
|
||||
/* We use >= because there cannot be an object at the beginning of a chunk.
|
||||
But there can be an empty object at that address
|
||||
at the end of another chunk. */
|
||||
while (lp != 0 && ((POINTER)lp >= obj || (POINTER)(lp)->limit < obj))
|
||||
{
|
||||
plp = lp->prev;
|
||||
CALL_FREEFUN (h, lp);
|
||||
lp = plp;
|
||||
/* If we switch chunks, we can't tell whether the new current
|
||||
chunk contains an empty object, so assume that it may. */
|
||||
h->maybe_empty_object = 1;
|
||||
}
|
||||
if (lp)
|
||||
{
|
||||
h->object_base = h->next_free = (char *)(obj);
|
||||
h->chunk_limit = lp->limit;
|
||||
h->chunk = lp;
|
||||
}
|
||||
else if (obj != 0)
|
||||
/* obj is not in any of the chunks! */
|
||||
abort ();
|
||||
}
|
||||
|
||||
#if 0
|
||||
/* These are now turned off because the applications do not use it
|
||||
and it uses bcopy via obstack_grow, which causes trouble on sysV. */
|
||||
|
||||
/* Now define the functional versions of the obstack macros.
|
||||
Define them to simply use the corresponding macros to do the job. */
|
||||
|
||||
#ifdef __STDC__
|
||||
/* These function definitions do not work with non-ANSI preprocessors;
|
||||
they won't pass through the macro names in parentheses. */
|
||||
|
||||
/* The function names appear in parentheses in order to prevent
|
||||
the macro-definitions of the names from being expanded there. */
|
||||
|
||||
POINTER (obstack_base) (obstack)
|
||||
struct obstack *obstack;
|
||||
{
|
||||
return obstack_base (obstack);
|
||||
}
|
||||
|
||||
POINTER (obstack_next_free) (obstack)
|
||||
struct obstack *obstack;
|
||||
{
|
||||
return obstack_next_free (obstack);
|
||||
}
|
||||
|
||||
int (obstack_object_size) (obstack)
|
||||
struct obstack *obstack;
|
||||
{
|
||||
return obstack_object_size (obstack);
|
||||
}
|
||||
|
||||
int (obstack_room) (obstack)
|
||||
struct obstack *obstack;
|
||||
{
|
||||
return obstack_room (obstack);
|
||||
}
|
||||
|
||||
void (obstack_grow) (obstack, pointer, length)
|
||||
struct obstack *obstack;
|
||||
POINTER pointer;
|
||||
int length;
|
||||
{
|
||||
obstack_grow (obstack, pointer, length);
|
||||
}
|
||||
|
||||
void (obstack_grow0) (obstack, pointer, length)
|
||||
struct obstack *obstack;
|
||||
POINTER pointer;
|
||||
int length;
|
||||
{
|
||||
obstack_grow0 (obstack, pointer, length);
|
||||
}
|
||||
|
||||
void (obstack_1grow) (obstack, character)
|
||||
struct obstack *obstack;
|
||||
int character;
|
||||
{
|
||||
obstack_1grow (obstack, character);
|
||||
}
|
||||
|
||||
void (obstack_blank) (obstack, length)
|
||||
struct obstack *obstack;
|
||||
int length;
|
||||
{
|
||||
obstack_blank (obstack, length);
|
||||
}
|
||||
|
||||
void (obstack_1grow_fast) (obstack, character)
|
||||
struct obstack *obstack;
|
||||
int character;
|
||||
{
|
||||
obstack_1grow_fast (obstack, character);
|
||||
}
|
||||
|
||||
void (obstack_blank_fast) (obstack, length)
|
||||
struct obstack *obstack;
|
||||
int length;
|
||||
{
|
||||
obstack_blank_fast (obstack, length);
|
||||
}
|
||||
|
||||
POINTER (obstack_finish) (obstack)
|
||||
struct obstack *obstack;
|
||||
{
|
||||
return obstack_finish (obstack);
|
||||
}
|
||||
|
||||
POINTER (obstack_alloc) (obstack, length)
|
||||
struct obstack *obstack;
|
||||
int length;
|
||||
{
|
||||
return obstack_alloc (obstack, length);
|
||||
}
|
||||
|
||||
POINTER (obstack_copy) (obstack, pointer, length)
|
||||
struct obstack *obstack;
|
||||
POINTER pointer;
|
||||
int length;
|
||||
{
|
||||
return obstack_copy (obstack, pointer, length);
|
||||
}
|
||||
|
||||
POINTER (obstack_copy0) (obstack, pointer, length)
|
||||
struct obstack *obstack;
|
||||
POINTER pointer;
|
||||
int length;
|
||||
{
|
||||
return obstack_copy0 (obstack, pointer, length);
|
||||
}
|
||||
|
||||
#endif /* __STDC__ */
|
||||
|
||||
#endif /* 0 */
|
||||
|
||||
#endif /* _LIBC or not __GNU_LIBRARY__. */
|
484
gnu/usr.bin/grep/obstack.h
Normal file
484
gnu/usr.bin/grep/obstack.h
Normal file
@ -0,0 +1,484 @@
|
||||
/* obstack.h - object stack macros
|
||||
Copyright (C) 1988, 1992 Free Software Foundation, Inc.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the terms of the GNU General Public License as published by the
|
||||
Free Software Foundation; either version 2, or (at your option) any
|
||||
later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||
|
||||
/* Summary:
|
||||
|
||||
All the apparent functions defined here are macros. The idea
|
||||
is that you would use these pre-tested macros to solve a
|
||||
very specific set of problems, and they would run fast.
|
||||
Caution: no side-effects in arguments please!! They may be
|
||||
evaluated MANY times!!
|
||||
|
||||
These macros operate a stack of objects. Each object starts life
|
||||
small, and may grow to maturity. (Consider building a word syllable
|
||||
by syllable.) An object can move while it is growing. Once it has
|
||||
been "finished" it never changes address again. So the "top of the
|
||||
stack" is typically an immature growing object, while the rest of the
|
||||
stack is of mature, fixed size and fixed address objects.
|
||||
|
||||
These routines grab large chunks of memory, using a function you
|
||||
supply, called `obstack_chunk_alloc'. On occasion, they free chunks,
|
||||
by calling `obstack_chunk_free'. You must define them and declare
|
||||
them before using any obstack macros.
|
||||
|
||||
Each independent stack is represented by a `struct obstack'.
|
||||
Each of the obstack macros expects a pointer to such a structure
|
||||
as the first argument.
|
||||
|
||||
One motivation for this package is the problem of growing char strings
|
||||
in symbol tables. Unless you are "fascist pig with a read-only mind"
|
||||
--Gosper's immortal quote from HAKMEM item 154, out of context--you
|
||||
would not like to put any arbitrary upper limit on the length of your
|
||||
symbols.
|
||||
|
||||
In practice this often means you will build many short symbols and a
|
||||
few long symbols. At the time you are reading a symbol you don't know
|
||||
how long it is. One traditional method is to read a symbol into a
|
||||
buffer, realloc()ating the buffer every time you try to read a symbol
|
||||
that is longer than the buffer. This is beaut, but you still will
|
||||
want to copy the symbol from the buffer to a more permanent
|
||||
symbol-table entry say about half the time.
|
||||
|
||||
With obstacks, you can work differently. Use one obstack for all symbol
|
||||
names. As you read a symbol, grow the name in the obstack gradually.
|
||||
When the name is complete, finalize it. Then, if the symbol exists already,
|
||||
free the newly read name.
|
||||
|
||||
The way we do this is to take a large chunk, allocating memory from
|
||||
low addresses. When you want to build a symbol in the chunk you just
|
||||
add chars above the current "high water mark" in the chunk. When you
|
||||
have finished adding chars, because you got to the end of the symbol,
|
||||
you know how long the chars are, and you can create a new object.
|
||||
Mostly the chars will not burst over the highest address of the chunk,
|
||||
because you would typically expect a chunk to be (say) 100 times as
|
||||
long as an average object.
|
||||
|
||||
In case that isn't clear, when we have enough chars to make up
|
||||
the object, THEY ARE ALREADY CONTIGUOUS IN THE CHUNK (guaranteed)
|
||||
so we just point to it where it lies. No moving of chars is
|
||||
needed and this is the second win: potentially long strings need
|
||||
never be explicitly shuffled. Once an object is formed, it does not
|
||||
change its address during its lifetime.
|
||||
|
||||
When the chars burst over a chunk boundary, we allocate a larger
|
||||
chunk, and then copy the partly formed object from the end of the old
|
||||
chunk to the beginning of the new larger chunk. We then carry on
|
||||
accreting characters to the end of the object as we normally would.
|
||||
|
||||
A special macro is provided to add a single char at a time to a
|
||||
growing object. This allows the use of register variables, which
|
||||
break the ordinary 'growth' macro.
|
||||
|
||||
Summary:
|
||||
We allocate large chunks.
|
||||
We carve out one object at a time from the current chunk.
|
||||
Once carved, an object never moves.
|
||||
We are free to append data of any size to the currently
|
||||
growing object.
|
||||
Exactly one object is growing in an obstack at any one time.
|
||||
You can run one obstack per control block.
|
||||
You may have as many control blocks as you dare.
|
||||
Because of the way we do it, you can `unwind' an obstack
|
||||
back to a previous state. (You may remove objects much
|
||||
as you would with a stack.)
|
||||
*/
|
||||
|
||||
|
||||
/* Don't do the contents of this file more than once. */
|
||||
|
||||
#ifndef __OBSTACKS__
|
||||
#define __OBSTACKS__
|
||||
|
||||
/* We use subtraction of (char *)0 instead of casting to int
|
||||
because on word-addressable machines a simple cast to int
|
||||
may ignore the byte-within-word field of the pointer. */
|
||||
|
||||
#ifndef __PTR_TO_INT
|
||||
#define __PTR_TO_INT(P) ((P) - (char *)0)
|
||||
#endif
|
||||
|
||||
#ifndef __INT_TO_PTR
|
||||
#define __INT_TO_PTR(P) ((P) + (char *)0)
|
||||
#endif
|
||||
|
||||
/* We need the type of the resulting object. In ANSI C it is ptrdiff_t
|
||||
but in traditional C it is usually long. If we are in ANSI C and
|
||||
don't already have ptrdiff_t get it. */
|
||||
|
||||
#if defined (__STDC__) && ! defined (offsetof)
|
||||
#if defined (__GNUC__) && defined (IN_GCC)
|
||||
/* On Next machine, the system's stddef.h screws up if included
|
||||
after we have defined just ptrdiff_t, so include all of gstddef.h.
|
||||
Otherwise, define just ptrdiff_t, which is all we need. */
|
||||
#ifndef __NeXT__
|
||||
#define __need_ptrdiff_t
|
||||
#endif
|
||||
|
||||
/* While building GCC, the stddef.h that goes with GCC has this name. */
|
||||
#include "gstddef.h"
|
||||
#else
|
||||
#include <stddef.h>
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#ifdef __STDC__
|
||||
#define PTR_INT_TYPE ptrdiff_t
|
||||
#else
|
||||
#define PTR_INT_TYPE long
|
||||
#endif
|
||||
|
||||
struct _obstack_chunk /* Lives at front of each chunk. */
|
||||
{
|
||||
char *limit; /* 1 past end of this chunk */
|
||||
struct _obstack_chunk *prev; /* address of prior chunk or NULL */
|
||||
char contents[4]; /* objects begin here */
|
||||
};
|
||||
|
||||
struct obstack /* control current object in current chunk */
|
||||
{
|
||||
long chunk_size; /* preferred size to allocate chunks in */
|
||||
struct _obstack_chunk* chunk; /* address of current struct obstack_chunk */
|
||||
char *object_base; /* address of object we are building */
|
||||
char *next_free; /* where to add next char to current object */
|
||||
char *chunk_limit; /* address of char after current chunk */
|
||||
PTR_INT_TYPE temp; /* Temporary for some macros. */
|
||||
int alignment_mask; /* Mask of alignment for each object. */
|
||||
struct _obstack_chunk *(*chunkfun) (); /* User's fcn to allocate a chunk. */
|
||||
void (*freefun) (); /* User's function to free a chunk. */
|
||||
char *extra_arg; /* first arg for chunk alloc/dealloc funcs */
|
||||
unsigned use_extra_arg:1; /* chunk alloc/dealloc funcs take extra arg */
|
||||
unsigned maybe_empty_object:1;/* There is a possibility that the current
|
||||
chunk contains a zero-length object. This
|
||||
prevents freeing the chunk if we allocate
|
||||
a bigger chunk to replace it. */
|
||||
};
|
||||
|
||||
/* Declare the external functions we use; they are in obstack.c. */
|
||||
|
||||
#ifdef __STDC__
|
||||
extern void _obstack_newchunk (struct obstack *, int);
|
||||
extern void _obstack_free (struct obstack *, void *);
|
||||
extern void _obstack_begin (struct obstack *, int, int,
|
||||
void *(*) (), void (*) ());
|
||||
extern void _obstack_begin_1 (struct obstack *, int, int,
|
||||
void *(*) (), void (*) (), void *);
|
||||
#else
|
||||
extern void _obstack_newchunk ();
|
||||
extern void _obstack_free ();
|
||||
extern void _obstack_begin ();
|
||||
extern void _obstack_begin_1 ();
|
||||
#endif
|
||||
|
||||
#ifdef __STDC__
|
||||
|
||||
/* Do the function-declarations after the structs
|
||||
but before defining the macros. */
|
||||
|
||||
void obstack_init (struct obstack *obstack);
|
||||
|
||||
void * obstack_alloc (struct obstack *obstack, int size);
|
||||
|
||||
void * obstack_copy (struct obstack *obstack, void *address, int size);
|
||||
void * obstack_copy0 (struct obstack *obstack, void *address, int size);
|
||||
|
||||
void obstack_free (struct obstack *obstack, void *block);
|
||||
|
||||
void obstack_blank (struct obstack *obstack, int size);
|
||||
|
||||
void obstack_grow (struct obstack *obstack, void *data, int size);
|
||||
void obstack_grow0 (struct obstack *obstack, void *data, int size);
|
||||
|
||||
void obstack_1grow (struct obstack *obstack, int data_char);
|
||||
void obstack_ptr_grow (struct obstack *obstack, void *data);
|
||||
void obstack_int_grow (struct obstack *obstack, int data);
|
||||
|
||||
void * obstack_finish (struct obstack *obstack);
|
||||
|
||||
int obstack_object_size (struct obstack *obstack);
|
||||
|
||||
int obstack_room (struct obstack *obstack);
|
||||
void obstack_1grow_fast (struct obstack *obstack, int data_char);
|
||||
void obstack_ptr_grow_fast (struct obstack *obstack, void *data);
|
||||
void obstack_int_grow_fast (struct obstack *obstack, int data);
|
||||
void obstack_blank_fast (struct obstack *obstack, int size);
|
||||
|
||||
void * obstack_base (struct obstack *obstack);
|
||||
void * obstack_next_free (struct obstack *obstack);
|
||||
int obstack_alignment_mask (struct obstack *obstack);
|
||||
int obstack_chunk_size (struct obstack *obstack);
|
||||
|
||||
#endif /* __STDC__ */
|
||||
|
||||
/* Non-ANSI C cannot really support alternative functions for these macros,
|
||||
so we do not declare them. */
|
||||
|
||||
/* Pointer to beginning of object being allocated or to be allocated next.
|
||||
Note that this might not be the final address of the object
|
||||
because a new chunk might be needed to hold the final size. */
|
||||
|
||||
#define obstack_base(h) ((h)->object_base)
|
||||
|
||||
/* Size for allocating ordinary chunks. */
|
||||
|
||||
#define obstack_chunk_size(h) ((h)->chunk_size)
|
||||
|
||||
/* Pointer to next byte not yet allocated in current chunk. */
|
||||
|
||||
#define obstack_next_free(h) ((h)->next_free)
|
||||
|
||||
/* Mask specifying low bits that should be clear in address of an object. */
|
||||
|
||||
#define obstack_alignment_mask(h) ((h)->alignment_mask)
|
||||
|
||||
#define obstack_init(h) \
|
||||
_obstack_begin ((h), 0, 0, \
|
||||
(void *(*) ()) obstack_chunk_alloc, (void (*) ()) obstack_chunk_free)
|
||||
|
||||
#define obstack_begin(h, size) \
|
||||
_obstack_begin ((h), (size), 0, \
|
||||
(void *(*) ()) obstack_chunk_alloc, (void (*) ()) obstack_chunk_free)
|
||||
|
||||
#define obstack_specify_allocation(h, size, alignment, chunkfun, freefun) \
|
||||
_obstack_begin ((h), (size), (alignment), \
|
||||
(void *(*) ()) (chunkfun), (void (*) ()) (freefun))
|
||||
|
||||
#define obstack_specify_allocation_with_arg(h, size, alignment, chunkfun, freefun, arg) \
|
||||
_obstack_begin_1 ((h), (size), (alignment), \
|
||||
(void *(*) ()) (chunkfun), (void (*) ()) (freefun), (arg))
|
||||
|
||||
#define obstack_1grow_fast(h,achar) (*((h)->next_free)++ = achar)
|
||||
|
||||
#define obstack_blank_fast(h,n) ((h)->next_free += (n))
|
||||
|
||||
#if defined (__GNUC__) && defined (__STDC__)
|
||||
#if __GNUC__ < 2 || defined(NeXT)
|
||||
#define __extension__
|
||||
#endif
|
||||
|
||||
/* For GNU C, if not -traditional,
|
||||
we can define these macros to compute all args only once
|
||||
without using a global variable.
|
||||
Also, we can avoid using the `temp' slot, to make faster code. */
|
||||
|
||||
#define obstack_object_size(OBSTACK) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o = (OBSTACK); \
|
||||
(unsigned) (__o->next_free - __o->object_base); })
|
||||
|
||||
#define obstack_room(OBSTACK) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o = (OBSTACK); \
|
||||
(unsigned) (__o->chunk_limit - __o->next_free); })
|
||||
|
||||
/* Note that the call to _obstack_newchunk is enclosed in (..., 0)
|
||||
so that we can avoid having void expressions
|
||||
in the arms of the conditional expression.
|
||||
Casting the third operand to void was tried before,
|
||||
but some compilers won't accept it. */
|
||||
#define obstack_grow(OBSTACK,where,length) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o = (OBSTACK); \
|
||||
int __len = (length); \
|
||||
((__o->next_free + __len > __o->chunk_limit) \
|
||||
? (_obstack_newchunk (__o, __len), 0) : 0); \
|
||||
bcopy (where, __o->next_free, __len); \
|
||||
__o->next_free += __len; \
|
||||
(void) 0; })
|
||||
|
||||
#define obstack_grow0(OBSTACK,where,length) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o = (OBSTACK); \
|
||||
int __len = (length); \
|
||||
((__o->next_free + __len + 1 > __o->chunk_limit) \
|
||||
? (_obstack_newchunk (__o, __len + 1), 0) : 0), \
|
||||
bcopy (where, __o->next_free, __len), \
|
||||
__o->next_free += __len, \
|
||||
*(__o->next_free)++ = 0; \
|
||||
(void) 0; })
|
||||
|
||||
#define obstack_1grow(OBSTACK,datum) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o = (OBSTACK); \
|
||||
((__o->next_free + 1 > __o->chunk_limit) \
|
||||
? (_obstack_newchunk (__o, 1), 0) : 0), \
|
||||
*(__o->next_free)++ = (datum); \
|
||||
(void) 0; })
|
||||
|
||||
/* These assume that the obstack alignment is good enough for pointers or ints,
|
||||
and that the data added so far to the current object
|
||||
shares that much alignment. */
|
||||
|
||||
#define obstack_ptr_grow(OBSTACK,datum) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o = (OBSTACK); \
|
||||
((__o->next_free + sizeof (void *) > __o->chunk_limit) \
|
||||
? (_obstack_newchunk (__o, sizeof (void *)), 0) : 0), \
|
||||
*((void **)__o->next_free)++ = ((void *)datum); \
|
||||
(void) 0; })
|
||||
|
||||
#define obstack_int_grow(OBSTACK,datum) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o = (OBSTACK); \
|
||||
((__o->next_free + sizeof (int) > __o->chunk_limit) \
|
||||
? (_obstack_newchunk (__o, sizeof (int)), 0) : 0), \
|
||||
*((int *)__o->next_free)++ = ((int)datum); \
|
||||
(void) 0; })
|
||||
|
||||
#define obstack_ptr_grow_fast(h,aptr) (*((void **)(h)->next_free)++ = (void *)aptr)
|
||||
#define obstack_int_grow_fast(h,aint) (*((int *)(h)->next_free)++ = (int)aint)
|
||||
|
||||
#define obstack_blank(OBSTACK,length) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o = (OBSTACK); \
|
||||
int __len = (length); \
|
||||
((__o->chunk_limit - __o->next_free < __len) \
|
||||
? (_obstack_newchunk (__o, __len), 0) : 0); \
|
||||
__o->next_free += __len; \
|
||||
(void) 0; })
|
||||
|
||||
#define obstack_alloc(OBSTACK,length) \
|
||||
__extension__ \
|
||||
({ struct obstack *__h = (OBSTACK); \
|
||||
obstack_blank (__h, (length)); \
|
||||
obstack_finish (__h); })
|
||||
|
||||
#define obstack_copy(OBSTACK,where,length) \
|
||||
__extension__ \
|
||||
({ struct obstack *__h = (OBSTACK); \
|
||||
obstack_grow (__h, (where), (length)); \
|
||||
obstack_finish (__h); })
|
||||
|
||||
#define obstack_copy0(OBSTACK,where,length) \
|
||||
__extension__ \
|
||||
({ struct obstack *__h = (OBSTACK); \
|
||||
obstack_grow0 (__h, (where), (length)); \
|
||||
obstack_finish (__h); })
|
||||
|
||||
/* The local variable is named __o1 to avoid a name conflict
|
||||
when obstack_blank is called. */
|
||||
#define obstack_finish(OBSTACK) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o1 = (OBSTACK); \
|
||||
void *value = (void *) __o1->object_base; \
|
||||
if (__o1->next_free == value) \
|
||||
__o1->maybe_empty_object = 1; \
|
||||
__o1->next_free \
|
||||
= __INT_TO_PTR ((__PTR_TO_INT (__o1->next_free)+__o1->alignment_mask)\
|
||||
& ~ (__o1->alignment_mask)); \
|
||||
((__o1->next_free - (char *)__o1->chunk \
|
||||
> __o1->chunk_limit - (char *)__o1->chunk) \
|
||||
? (__o1->next_free = __o1->chunk_limit) : 0); \
|
||||
__o1->object_base = __o1->next_free; \
|
||||
value; })
|
||||
|
||||
#define obstack_free(OBSTACK, OBJ) \
|
||||
__extension__ \
|
||||
({ struct obstack *__o = (OBSTACK); \
|
||||
void *__obj = (OBJ); \
|
||||
if (__obj > (void *)__o->chunk && __obj < (void *)__o->chunk_limit) \
|
||||
__o->next_free = __o->object_base = __obj; \
|
||||
else (obstack_free) (__o, __obj); })
|
||||
|
||||
#else /* not __GNUC__ or not __STDC__ */
|
||||
|
||||
#define obstack_object_size(h) \
|
||||
(unsigned) ((h)->next_free - (h)->object_base)
|
||||
|
||||
#define obstack_room(h) \
|
||||
(unsigned) ((h)->chunk_limit - (h)->next_free)
|
||||
|
||||
#define obstack_grow(h,where,length) \
|
||||
( (h)->temp = (length), \
|
||||
(((h)->next_free + (h)->temp > (h)->chunk_limit) \
|
||||
? (_obstack_newchunk ((h), (h)->temp), 0) : 0), \
|
||||
bcopy (where, (h)->next_free, (h)->temp), \
|
||||
(h)->next_free += (h)->temp)
|
||||
|
||||
#define obstack_grow0(h,where,length) \
|
||||
( (h)->temp = (length), \
|
||||
(((h)->next_free + (h)->temp + 1 > (h)->chunk_limit) \
|
||||
? (_obstack_newchunk ((h), (h)->temp + 1), 0) : 0), \
|
||||
bcopy (where, (h)->next_free, (h)->temp), \
|
||||
(h)->next_free += (h)->temp, \
|
||||
*((h)->next_free)++ = 0)
|
||||
|
||||
#define obstack_1grow(h,datum) \
|
||||
( (((h)->next_free + 1 > (h)->chunk_limit) \
|
||||
? (_obstack_newchunk ((h), 1), 0) : 0), \
|
||||
*((h)->next_free)++ = (datum))
|
||||
|
||||
#define obstack_ptr_grow(h,datum) \
|
||||
( (((h)->next_free + sizeof (char *) > (h)->chunk_limit) \
|
||||
? (_obstack_newchunk ((h), sizeof (char *)), 0) : 0), \
|
||||
*((char **)(((h)->next_free+=sizeof(char *))-sizeof(char *))) = ((char *)datum))
|
||||
|
||||
#define obstack_int_grow(h,datum) \
|
||||
( (((h)->next_free + sizeof (int) > (h)->chunk_limit) \
|
||||
? (_obstack_newchunk ((h), sizeof (int)), 0) : 0), \
|
||||
*((int *)(((h)->next_free+=sizeof(int))-sizeof(int))) = ((int)datum))
|
||||
|
||||
#define obstack_ptr_grow_fast(h,aptr) (*((char **)(h)->next_free)++ = (char *)aptr)
|
||||
#define obstack_int_grow_fast(h,aint) (*((int *)(h)->next_free)++ = (int)aint)
|
||||
|
||||
#define obstack_blank(h,length) \
|
||||
( (h)->temp = (length), \
|
||||
(((h)->chunk_limit - (h)->next_free < (h)->temp) \
|
||||
? (_obstack_newchunk ((h), (h)->temp), 0) : 0), \
|
||||
(h)->next_free += (h)->temp)
|
||||
|
||||
#define obstack_alloc(h,length) \
|
||||
(obstack_blank ((h), (length)), obstack_finish ((h)))
|
||||
|
||||
#define obstack_copy(h,where,length) \
|
||||
(obstack_grow ((h), (where), (length)), obstack_finish ((h)))
|
||||
|
||||
#define obstack_copy0(h,where,length) \
|
||||
(obstack_grow0 ((h), (where), (length)), obstack_finish ((h)))
|
||||
|
||||
#define obstack_finish(h) \
|
||||
( ((h)->next_free == (h)->object_base \
|
||||
? (((h)->maybe_empty_object = 1), 0) \
|
||||
: 0), \
|
||||
(h)->temp = __PTR_TO_INT ((h)->object_base), \
|
||||
(h)->next_free \
|
||||
= __INT_TO_PTR ((__PTR_TO_INT ((h)->next_free)+(h)->alignment_mask) \
|
||||
& ~ ((h)->alignment_mask)), \
|
||||
(((h)->next_free - (char *)(h)->chunk \
|
||||
> (h)->chunk_limit - (char *)(h)->chunk) \
|
||||
? ((h)->next_free = (h)->chunk_limit) : 0), \
|
||||
(h)->object_base = (h)->next_free, \
|
||||
__INT_TO_PTR ((h)->temp))
|
||||
|
||||
#ifdef __STDC__
|
||||
#define obstack_free(h,obj) \
|
||||
( (h)->temp = (char *)(obj) - (char *) (h)->chunk, \
|
||||
(((h)->temp > 0 && (h)->temp < (h)->chunk_limit - (char *) (h)->chunk)\
|
||||
? (int) ((h)->next_free = (h)->object_base \
|
||||
= (h)->temp + (char *) (h)->chunk) \
|
||||
: (((obstack_free) ((h), (h)->temp + (char *) (h)->chunk), 0), 0)))
|
||||
#else
|
||||
#define obstack_free(h,obj) \
|
||||
( (h)->temp = (char *)(obj) - (char *) (h)->chunk, \
|
||||
(((h)->temp > 0 && (h)->temp < (h)->chunk_limit - (char *) (h)->chunk)\
|
||||
? (int) ((h)->next_free = (h)->object_base \
|
||||
= (h)->temp + (char *) (h)->chunk) \
|
||||
: (_obstack_free ((h), (h)->temp + (char *) (h)->chunk), 0)))
|
||||
#endif
|
||||
|
||||
#endif /* not __GNUC__ or not __STDC__ */
|
||||
|
||||
#endif /* not __OBSTACKS__ */
|
File diff suppressed because it is too large
Load Diff
@ -1,5 +1,7 @@
|
||||
/* Definitions for data structures callers pass the regex library.
|
||||
Copyright (C) 1985, 1989 Free Software Foundation, Inc.
|
||||
/* Definitions for data structures and routines for the regular
|
||||
expression library, version 0.12.
|
||||
|
||||
Copyright (C) 1985, 1989, 1990, 1991, 1992, 1993 Free Software Foundation, Inc.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
@ -13,173 +15,476 @@
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||
|
||||
#ifndef __REGEXP_LIBRARY_H__
|
||||
#define __REGEXP_LIBRARY_H__
|
||||
|
||||
In other words, you are welcome to use, share and improve this program.
|
||||
You are forbidden to forbid anyone else to use, share and improve
|
||||
what you give them. Help stamp out software-hoarding! */
|
||||
/* POSIX says that <sys/types.h> must be included (by the caller) before
|
||||
<regex.h>. */
|
||||
|
||||
|
||||
/* Define number of parens for which we record the beginnings and ends.
|
||||
This affects how much space the `struct re_registers' type takes up. */
|
||||
#ifndef RE_NREGS
|
||||
#define RE_NREGS 10
|
||||
#ifdef VMS
|
||||
/* VMS doesn't have `size_t' in <sys/types.h>, even though POSIX says it
|
||||
should be there. */
|
||||
#include <stddef.h>
|
||||
#endif
|
||||
|
||||
/* These bits are used in the obscure_syntax variable to choose among
|
||||
alternative regexp syntaxes. */
|
||||
|
||||
/* 1 means plain parentheses serve as grouping, and backslash
|
||||
parentheses are needed for literal searching.
|
||||
0 means backslash-parentheses are grouping, and plain parentheses
|
||||
are for literal searching. */
|
||||
#define RE_NO_BK_PARENS 1
|
||||
/* The following bits are used to determine the regexp syntax we
|
||||
recognize. The set/not-set meanings are chosen so that Emacs syntax
|
||||
remains the value 0. The bits are given in alphabetical order, and
|
||||
the definitions shifted by one from the previous bit; thus, when we
|
||||
add or remove a bit, only one other definition need change. */
|
||||
typedef unsigned reg_syntax_t;
|
||||
|
||||
/* 1 means plain | serves as the "or"-operator, and \| is a literal.
|
||||
0 means \| serves as the "or"-operator, and | is a literal. */
|
||||
#define RE_NO_BK_VBAR 2
|
||||
/* If this bit is not set, then \ inside a bracket expression is literal.
|
||||
If set, then such a \ quotes the following character. */
|
||||
#define RE_BACKSLASH_ESCAPE_IN_LISTS (1)
|
||||
|
||||
/* 0 means plain + or ? serves as an operator, and \+, \? are literals.
|
||||
1 means \+, \? are operators and plain +, ? are literals. */
|
||||
#define RE_BK_PLUS_QM 4
|
||||
/* If this bit is not set, then + and ? are operators, and \+ and \? are
|
||||
literals.
|
||||
If set, then \+ and \? are operators and + and ? are literals. */
|
||||
#define RE_BK_PLUS_QM (RE_BACKSLASH_ESCAPE_IN_LISTS << 1)
|
||||
|
||||
/* 1 means | binds tighter than ^ or $.
|
||||
0 means the contrary. */
|
||||
#define RE_TIGHT_VBAR 8
|
||||
/* If this bit is set, then character classes are supported. They are:
|
||||
[:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
|
||||
[:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
|
||||
If not set, then character classes are not supported. */
|
||||
#define RE_CHAR_CLASSES (RE_BK_PLUS_QM << 1)
|
||||
|
||||
/* 1 means treat \n as an _OR operator
|
||||
0 means treat it as a normal character */
|
||||
#define RE_NEWLINE_OR 16
|
||||
/* If this bit is set, then ^ and $ are always anchors (outside bracket
|
||||
expressions, of course).
|
||||
If this bit is not set, then it depends:
|
||||
^ is an anchor if it is at the beginning of a regular
|
||||
expression or after an open-group or an alternation operator;
|
||||
$ is an anchor if it is at the end of a regular expression, or
|
||||
before a close-group or an alternation operator.
|
||||
|
||||
/* 0 means that a special characters (such as *, ^, and $) always have
|
||||
their special meaning regardless of the surrounding context.
|
||||
1 means that special characters may act as normal characters in some
|
||||
contexts. Specifically, this applies to:
|
||||
^ - only special at the beginning, or after ( or |
|
||||
$ - only special at the end, or before ) or |
|
||||
*, +, ? - only special when not after the beginning, (, or | */
|
||||
#define RE_CONTEXT_INDEP_OPS 32
|
||||
This bit could be (re)combined with RE_CONTEXT_INDEP_OPS, because
|
||||
POSIX draft 11.2 says that * etc. in leading positions is undefined.
|
||||
We already implemented a previous draft which made those constructs
|
||||
invalid, though, so we haven't changed the code back. */
|
||||
#define RE_CONTEXT_INDEP_ANCHORS (RE_CHAR_CLASSES << 1)
|
||||
|
||||
/* Now define combinations of bits for the standard possibilities. */
|
||||
#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
|
||||
#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
|
||||
#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
|
||||
/* If this bit is set, then special characters are always special
|
||||
regardless of where they are in the pattern.
|
||||
If this bit is not set, then special characters are special only in
|
||||
some contexts; otherwise they are ordinary. Specifically,
|
||||
* + ? and intervals are only special when not after the beginning,
|
||||
open-group, or alternation operator. */
|
||||
#define RE_CONTEXT_INDEP_OPS (RE_CONTEXT_INDEP_ANCHORS << 1)
|
||||
|
||||
/* If this bit is set, then *, +, ?, and { cannot be first in an re or
|
||||
immediately after an alternation or begin-group operator. */
|
||||
#define RE_CONTEXT_INVALID_OPS (RE_CONTEXT_INDEP_OPS << 1)
|
||||
|
||||
/* If this bit is set, then . matches newline.
|
||||
If not set, then it doesn't. */
|
||||
#define RE_DOT_NEWLINE (RE_CONTEXT_INVALID_OPS << 1)
|
||||
|
||||
/* If this bit is set, then . doesn't match NUL.
|
||||
If not set, then it does. */
|
||||
#define RE_DOT_NOT_NULL (RE_DOT_NEWLINE << 1)
|
||||
|
||||
/* If this bit is set, nonmatching lists [^...] do not match newline.
|
||||
If not set, they do. */
|
||||
#define RE_HAT_LISTS_NOT_NEWLINE (RE_DOT_NOT_NULL << 1)
|
||||
|
||||
/* If this bit is set, either \{...\} or {...} defines an
|
||||
interval, depending on RE_NO_BK_BRACES.
|
||||
If not set, \{, \}, {, and } are literals. */
|
||||
#define RE_INTERVALS (RE_HAT_LISTS_NOT_NEWLINE << 1)
|
||||
|
||||
/* If this bit is set, +, ? and | aren't recognized as operators.
|
||||
If not set, they are. */
|
||||
#define RE_LIMITED_OPS (RE_INTERVALS << 1)
|
||||
|
||||
/* If this bit is set, newline is an alternation operator.
|
||||
If not set, newline is literal. */
|
||||
#define RE_NEWLINE_ALT (RE_LIMITED_OPS << 1)
|
||||
|
||||
/* If this bit is set, then `{...}' defines an interval, and \{ and \}
|
||||
are literals.
|
||||
If not set, then `\{...\}' defines an interval. */
|
||||
#define RE_NO_BK_BRACES (RE_NEWLINE_ALT << 1)
|
||||
|
||||
/* If this bit is set, (...) defines a group, and \( and \) are literals.
|
||||
If not set, \(...\) defines a group, and ( and ) are literals. */
|
||||
#define RE_NO_BK_PARENS (RE_NO_BK_BRACES << 1)
|
||||
|
||||
/* If this bit is set, then \<digit> matches <digit>.
|
||||
If not set, then \<digit> is a back-reference. */
|
||||
#define RE_NO_BK_REFS (RE_NO_BK_PARENS << 1)
|
||||
|
||||
/* If this bit is set, then | is an alternation operator, and \| is literal.
|
||||
If not set, then \| is an alternation operator, and | is literal. */
|
||||
#define RE_NO_BK_VBAR (RE_NO_BK_REFS << 1)
|
||||
|
||||
/* If this bit is set, then an ending range point collating higher
|
||||
than the starting range point, as in [z-a], is invalid.
|
||||
If not set, then when ending range point collates higher than the
|
||||
starting range point, the range is ignored. */
|
||||
#define RE_NO_EMPTY_RANGES (RE_NO_BK_VBAR << 1)
|
||||
|
||||
/* If this bit is set, then an unmatched ) is ordinary.
|
||||
If not set, then an unmatched ) is invalid. */
|
||||
#define RE_UNMATCHED_RIGHT_PAREN_ORD (RE_NO_EMPTY_RANGES << 1)
|
||||
|
||||
/* This global variable defines the particular regexp syntax to use (for
|
||||
some interfaces). When a regexp is compiled, the syntax used is
|
||||
stored in the pattern buffer, so changing this does not affect
|
||||
already-compiled regexps. */
|
||||
extern reg_syntax_t re_syntax_options;
|
||||
|
||||
/* Define combinations of the above bits for the standard possibilities.
|
||||
(The [[[ comments delimit what gets put into the Texinfo file, so
|
||||
don't delete them!) */
|
||||
/* [[[begin syntaxes]]] */
|
||||
#define RE_SYNTAX_EMACS 0
|
||||
|
||||
/* This data structure is used to represent a compiled pattern. */
|
||||
#define RE_SYNTAX_AWK \
|
||||
(RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \
|
||||
| RE_NO_BK_PARENS | RE_NO_BK_REFS \
|
||||
| RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \
|
||||
| RE_UNMATCHED_RIGHT_PAREN_ORD)
|
||||
|
||||
#define RE_SYNTAX_POSIX_AWK \
|
||||
(RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS)
|
||||
|
||||
#define RE_SYNTAX_GREP \
|
||||
(RE_BK_PLUS_QM | RE_CHAR_CLASSES \
|
||||
| RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \
|
||||
| RE_NEWLINE_ALT)
|
||||
|
||||
#define RE_SYNTAX_EGREP \
|
||||
(RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \
|
||||
| RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \
|
||||
| RE_NEWLINE_ALT | RE_NO_BK_PARENS \
|
||||
| RE_NO_BK_VBAR)
|
||||
|
||||
#define RE_SYNTAX_POSIX_EGREP \
|
||||
(RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES)
|
||||
|
||||
/* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */
|
||||
#define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC
|
||||
|
||||
#define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC
|
||||
|
||||
/* Syntax bits common to both basic and extended POSIX regex syntax. */
|
||||
#define _RE_SYNTAX_POSIX_COMMON \
|
||||
(RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \
|
||||
| RE_INTERVALS | RE_NO_EMPTY_RANGES)
|
||||
|
||||
#define RE_SYNTAX_POSIX_BASIC \
|
||||
(_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)
|
||||
|
||||
/* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
|
||||
RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this
|
||||
isn't minimal, since other operators, such as \`, aren't disabled. */
|
||||
#define RE_SYNTAX_POSIX_MINIMAL_BASIC \
|
||||
(_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)
|
||||
|
||||
#define RE_SYNTAX_POSIX_EXTENDED \
|
||||
(_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
|
||||
| RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \
|
||||
| RE_NO_BK_PARENS | RE_NO_BK_VBAR \
|
||||
| RE_UNMATCHED_RIGHT_PAREN_ORD)
|
||||
|
||||
/* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS
|
||||
replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added. */
|
||||
#define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \
|
||||
(_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
|
||||
| RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \
|
||||
| RE_NO_BK_PARENS | RE_NO_BK_REFS \
|
||||
| RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD)
|
||||
/* [[[end syntaxes]]] */
|
||||
|
||||
/* Maximum number of duplicates an interval can allow. Some systems
|
||||
(erroneously) define this in other header files, but we want our
|
||||
value, so remove any previous define. */
|
||||
#ifdef RE_DUP_MAX
|
||||
#undef RE_DUP_MAX
|
||||
#endif
|
||||
#define RE_DUP_MAX ((1 << 15) - 1)
|
||||
|
||||
|
||||
/* POSIX `cflags' bits (i.e., information for `regcomp'). */
|
||||
|
||||
/* If this bit is set, then use extended regular expression syntax.
|
||||
If not set, then use basic regular expression syntax. */
|
||||
#define REG_EXTENDED 1
|
||||
|
||||
/* If this bit is set, then ignore case when matching.
|
||||
If not set, then case is significant. */
|
||||
#define REG_ICASE (REG_EXTENDED << 1)
|
||||
|
||||
/* If this bit is set, then anchors do not match at newline
|
||||
characters in the string.
|
||||
If not set, then anchors do match at newlines. */
|
||||
#define REG_NEWLINE (REG_ICASE << 1)
|
||||
|
||||
/* If this bit is set, then report only success or fail in regexec.
|
||||
If not set, then returns differ between not matching and errors. */
|
||||
#define REG_NOSUB (REG_NEWLINE << 1)
|
||||
|
||||
|
||||
/* POSIX `eflags' bits (i.e., information for regexec). */
|
||||
|
||||
/* If this bit is set, then the beginning-of-line operator doesn't match
|
||||
the beginning of the string (presumably because it's not the
|
||||
beginning of a line).
|
||||
If not set, then the beginning-of-line operator does match the
|
||||
beginning of the string. */
|
||||
#define REG_NOTBOL 1
|
||||
|
||||
/* Like REG_NOTBOL, except for the end-of-line. */
|
||||
#define REG_NOTEOL (1 << 1)
|
||||
|
||||
|
||||
/* If any error codes are removed, changed, or added, update the
|
||||
`re_error_msg' table in regex.c. */
|
||||
typedef enum
|
||||
{
|
||||
REG_NOERROR = 0, /* Success. */
|
||||
REG_NOMATCH, /* Didn't find a match (for regexec). */
|
||||
|
||||
/* POSIX regcomp return error codes. (In the order listed in the
|
||||
standard.) */
|
||||
REG_BADPAT, /* Invalid pattern. */
|
||||
REG_ECOLLATE, /* Not implemented. */
|
||||
REG_ECTYPE, /* Invalid character class name. */
|
||||
REG_EESCAPE, /* Trailing backslash. */
|
||||
REG_ESUBREG, /* Invalid back reference. */
|
||||
REG_EBRACK, /* Unmatched left bracket. */
|
||||
REG_EPAREN, /* Parenthesis imbalance. */
|
||||
REG_EBRACE, /* Unmatched \{. */
|
||||
REG_BADBR, /* Invalid contents of \{\}. */
|
||||
REG_ERANGE, /* Invalid range end. */
|
||||
REG_ESPACE, /* Ran out of memory. */
|
||||
REG_BADRPT, /* No preceding re for repetition op. */
|
||||
|
||||
/* Error codes we've added. */
|
||||
REG_EEND, /* Premature end. */
|
||||
REG_ESIZE, /* Compiled pattern bigger than 2^16 bytes. */
|
||||
REG_ERPAREN /* Unmatched ) or \); not returned from regcomp. */
|
||||
} reg_errcode_t;
|
||||
|
||||
/* This data structure represents a compiled pattern. Before calling
|
||||
the pattern compiler, the fields `buffer', `allocated', `fastmap',
|
||||
`translate', and `no_sub' can be set. After the pattern has been
|
||||
compiled, the `re_nsub' field is available. All other fields are
|
||||
private to the regex routines. */
|
||||
|
||||
struct re_pattern_buffer
|
||||
{
|
||||
char *buffer; /* Space holding the compiled pattern commands. */
|
||||
int allocated; /* Size of space that buffer points to */
|
||||
int used; /* Length of portion of buffer actually occupied */
|
||||
char *fastmap; /* Pointer to fastmap, if any, or zero if none. */
|
||||
/* re_search uses the fastmap, if there is one,
|
||||
to skip quickly over totally implausible characters */
|
||||
char *translate; /* Translate table to apply to all characters before comparing.
|
||||
Or zero for no translation.
|
||||
The translation is applied to a pattern when it is compiled
|
||||
and to data when it is matched. */
|
||||
char fastmap_accurate;
|
||||
/* Set to zero when a new pattern is stored,
|
||||
set to one when the fastmap is updated from it. */
|
||||
char can_be_null; /* Set to one by compiling fastmap
|
||||
if this pattern might match the null string.
|
||||
It does not necessarily match the null string
|
||||
in that case, but if this is zero, it cannot.
|
||||
2 as value means can match null string
|
||||
but at end of range or before a character
|
||||
listed in the fastmap. */
|
||||
/* [[[begin pattern_buffer]]] */
|
||||
/* Space that holds the compiled pattern. It is declared as
|
||||
`unsigned char *' because its elements are
|
||||
sometimes used as array indexes. */
|
||||
unsigned char *buffer;
|
||||
|
||||
/* Number of bytes to which `buffer' points. */
|
||||
unsigned long allocated;
|
||||
|
||||
/* Number of bytes actually used in `buffer'. */
|
||||
unsigned long used;
|
||||
|
||||
/* Syntax setting with which the pattern was compiled. */
|
||||
reg_syntax_t syntax;
|
||||
|
||||
/* Pointer to a fastmap, if any, otherwise zero. re_search uses
|
||||
the fastmap, if there is one, to skip over impossible
|
||||
starting points for matches. */
|
||||
char *fastmap;
|
||||
|
||||
/* Either a translate table to apply to all characters before
|
||||
comparing them, or zero for no translation. The translation
|
||||
is applied to a pattern when it is compiled and to a string
|
||||
when it is matched. */
|
||||
char *translate;
|
||||
|
||||
/* Number of subexpressions found by the compiler. */
|
||||
size_t re_nsub;
|
||||
|
||||
/* Zero if this pattern cannot match the empty string, one else.
|
||||
Well, in truth it's used only in `re_search_2', to see
|
||||
whether or not we should use the fastmap, so we don't set
|
||||
this absolutely perfectly; see `re_compile_fastmap' (the
|
||||
`duplicate' case). */
|
||||
unsigned can_be_null : 1;
|
||||
|
||||
/* If REGS_UNALLOCATED, allocate space in the `regs' structure
|
||||
for `max (RE_NREGS, re_nsub + 1)' groups.
|
||||
If REGS_REALLOCATE, reallocate space if necessary.
|
||||
If REGS_FIXED, use what's there. */
|
||||
#define REGS_UNALLOCATED 0
|
||||
#define REGS_REALLOCATE 1
|
||||
#define REGS_FIXED 2
|
||||
unsigned regs_allocated : 2;
|
||||
|
||||
/* Set to zero when `regex_compile' compiles a pattern; set to one
|
||||
by `re_compile_fastmap' if it updates the fastmap. */
|
||||
unsigned fastmap_accurate : 1;
|
||||
|
||||
/* If set, `re_match_2' does not return information about
|
||||
subexpressions. */
|
||||
unsigned no_sub : 1;
|
||||
|
||||
/* If set, a beginning-of-line anchor doesn't match at the
|
||||
beginning of the string. */
|
||||
unsigned not_bol : 1;
|
||||
|
||||
/* Similarly for an end-of-line anchor. */
|
||||
unsigned not_eol : 1;
|
||||
|
||||
/* If true, an anchor at a newline matches. */
|
||||
unsigned newline_anchor : 1;
|
||||
|
||||
/* [[[end pattern_buffer]]] */
|
||||
};
|
||||
|
||||
/* Structure to store "register" contents data in.
|
||||
typedef struct re_pattern_buffer regex_t;
|
||||
|
||||
Pass the address of such a structure as an argument to re_match, etc.,
|
||||
if you want this information back.
|
||||
|
||||
start[i] and end[i] record the string matched by \( ... \) grouping i,
|
||||
for i from 1 to RE_NREGS - 1.
|
||||
start[0] and end[0] record the entire string matched. */
|
||||
/* search.c (search_buffer) in Emacs needs this one opcode value. It is
|
||||
defined both in `regex.c' and here. */
|
||||
#define RE_EXACTN_VALUE 1
|
||||
|
||||
/* Type for byte offsets within the string. POSIX mandates this. */
|
||||
typedef int regoff_t;
|
||||
|
||||
|
||||
/* This is the structure we store register match data in. See
|
||||
regex.texinfo for a full description of what registers match. */
|
||||
struct re_registers
|
||||
{
|
||||
int start[RE_NREGS];
|
||||
int end[RE_NREGS];
|
||||
unsigned num_regs;
|
||||
regoff_t *start;
|
||||
regoff_t *end;
|
||||
};
|
||||
|
||||
/* These are the command codes that appear in compiled regular expressions, one per byte.
|
||||
Some command codes are followed by argument bytes.
|
||||
A command code can specify any interpretation whatever for its arguments.
|
||||
Zero-bytes may appear in the compiled regular expression. */
|
||||
|
||||
enum regexpcode
|
||||
{
|
||||
unused,
|
||||
exactn, /* followed by one byte giving n, and then by n literal bytes */
|
||||
begline, /* fails unless at beginning of line */
|
||||
endline, /* fails unless at end of line */
|
||||
jump, /* followed by two bytes giving relative address to jump to */
|
||||
on_failure_jump, /* followed by two bytes giving relative address of place
|
||||
to resume at in case of failure. */
|
||||
finalize_jump, /* Throw away latest failure point and then jump to address. */
|
||||
maybe_finalize_jump, /* Like jump but finalize if safe to do so.
|
||||
This is used to jump back to the beginning
|
||||
of a repeat. If the command that follows
|
||||
this jump is clearly incompatible with the
|
||||
one at the beginning of the repeat, such that
|
||||
we can be sure that there is no use backtracking
|
||||
out of repetitions already completed,
|
||||
then we finalize. */
|
||||
dummy_failure_jump, /* jump, and push a dummy failure point.
|
||||
This failure point will be thrown away
|
||||
if an attempt is made to use it for a failure.
|
||||
A + construct makes this before the first repeat. */
|
||||
anychar, /* matches any one character */
|
||||
charset, /* matches any one char belonging to specified set.
|
||||
First following byte is # bitmap bytes.
|
||||
Then come bytes for a bit-map saying which chars are in.
|
||||
Bits in each byte are ordered low-bit-first.
|
||||
A character is in the set if its bit is 1.
|
||||
A character too large to have a bit in the map
|
||||
is automatically not in the set */
|
||||
charset_not, /* similar but match any character that is NOT one of those specified */
|
||||
start_memory, /* starts remembering the text that is matched
|
||||
and stores it in a memory register.
|
||||
followed by one byte containing the register number.
|
||||
Register numbers must be in the range 0 through NREGS. */
|
||||
stop_memory, /* stops remembering the text that is matched
|
||||
and stores it in a memory register.
|
||||
followed by one byte containing the register number.
|
||||
Register numbers must be in the range 0 through NREGS. */
|
||||
duplicate, /* match a duplicate of something remembered.
|
||||
Followed by one byte containing the index of the memory register. */
|
||||
before_dot, /* Succeeds if before dot */
|
||||
at_dot, /* Succeeds if at dot */
|
||||
after_dot, /* Succeeds if after dot */
|
||||
begbuf, /* Succeeds if at beginning of buffer */
|
||||
endbuf, /* Succeeds if at end of buffer */
|
||||
wordchar, /* Matches any word-constituent character */
|
||||
notwordchar, /* Matches any char that is not a word-constituent */
|
||||
wordbeg, /* Succeeds if at word beginning */
|
||||
wordend, /* Succeeds if at word end */
|
||||
wordbound, /* Succeeds if at a word boundary */
|
||||
notwordbound, /* Succeeds if not at a word boundary */
|
||||
syntaxspec, /* Matches any character whose syntax is specified.
|
||||
followed by a byte which contains a syntax code, Sword or such like */
|
||||
notsyntaxspec /* Matches any character whose syntax differs from the specified. */
|
||||
};
|
||||
|
||||
extern char *re_compile_pattern ();
|
||||
/* Is this really advertised? */
|
||||
extern void re_compile_fastmap ();
|
||||
extern int re_search (), re_search_2 ();
|
||||
extern int re_match (), re_match_2 ();
|
||||
|
||||
/* 4.2 bsd compatibility (yuck) */
|
||||
extern char *re_comp ();
|
||||
extern int re_exec ();
|
||||
|
||||
#ifdef SYNTAX_TABLE
|
||||
extern char *re_syntax_table;
|
||||
/* If `regs_allocated' is REGS_UNALLOCATED in the pattern buffer,
|
||||
`re_match_2' returns information about at least this many registers
|
||||
the first time a `regs' structure is passed. */
|
||||
#ifndef RE_NREGS
|
||||
#define RE_NREGS 30
|
||||
#endif
|
||||
|
||||
|
||||
/* POSIX specification for registers. Aside from the different names than
|
||||
`re_registers', POSIX uses an array of structures, instead of a
|
||||
structure of arrays. */
|
||||
typedef struct
|
||||
{
|
||||
regoff_t rm_so; /* Byte offset from string's start to substring's start. */
|
||||
regoff_t rm_eo; /* Byte offset from string's start to substring's end. */
|
||||
} regmatch_t;
|
||||
|
||||
/* Declarations for routines. */
|
||||
|
||||
/* To avoid duplicating every routine declaration -- once with a
|
||||
prototype (if we are ANSI), and once without (if we aren't) -- we
|
||||
use the following macro to declare argument types. This
|
||||
unfortunately clutters up the declarations a bit, but I think it's
|
||||
worth it. */
|
||||
|
||||
#if __STDC__
|
||||
|
||||
#define _RE_ARGS(args) args
|
||||
|
||||
#else /* not __STDC__ */
|
||||
|
||||
#define _RE_ARGS(args) ()
|
||||
|
||||
#endif /* not __STDC__ */
|
||||
|
||||
/* Sets the current default syntax to SYNTAX, and return the old syntax.
|
||||
You can also simply assign to the `re_syntax_options' variable. */
|
||||
extern reg_syntax_t re_set_syntax _RE_ARGS ((reg_syntax_t syntax));
|
||||
|
||||
/* Compile the regular expression PATTERN, with length LENGTH
|
||||
and syntax given by the global `re_syntax_options', into the buffer
|
||||
BUFFER. Return NULL if successful, and an error string if not. */
|
||||
extern const char *re_compile_pattern
|
||||
_RE_ARGS ((const char *pattern, int length,
|
||||
struct re_pattern_buffer *buffer));
|
||||
|
||||
|
||||
/* Compile a fastmap for the compiled pattern in BUFFER; used to
|
||||
accelerate searches. Return 0 if successful and -2 if was an
|
||||
internal error. */
|
||||
extern int re_compile_fastmap _RE_ARGS ((struct re_pattern_buffer *buffer));
|
||||
|
||||
|
||||
/* Search in the string STRING (with length LENGTH) for the pattern
|
||||
compiled into BUFFER. Start searching at position START, for RANGE
|
||||
characters. Return the starting position of the match, -1 for no
|
||||
match, or -2 for an internal error. Also return register
|
||||
information in REGS (if REGS and BUFFER->no_sub are nonzero). */
|
||||
extern int re_search
|
||||
_RE_ARGS ((struct re_pattern_buffer *buffer, const char *string,
|
||||
int length, int start, int range, struct re_registers *regs));
|
||||
|
||||
|
||||
/* Like `re_search', but search in the concatenation of STRING1 and
|
||||
STRING2. Also, stop searching at index START + STOP. */
|
||||
extern int re_search_2
|
||||
_RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1,
|
||||
int length1, const char *string2, int length2,
|
||||
int start, int range, struct re_registers *regs, int stop));
|
||||
|
||||
|
||||
/* Like `re_search', but return how many characters in STRING the regexp
|
||||
in BUFFER matched, starting at position START. */
|
||||
extern int re_match
|
||||
_RE_ARGS ((struct re_pattern_buffer *buffer, const char *string,
|
||||
int length, int start, struct re_registers *regs));
|
||||
|
||||
|
||||
/* Relates to `re_match' as `re_search_2' relates to `re_search'. */
|
||||
extern int re_match_2
|
||||
_RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1,
|
||||
int length1, const char *string2, int length2,
|
||||
int start, struct re_registers *regs, int stop));
|
||||
|
||||
|
||||
/* Set REGS to hold NUM_REGS registers, storing them in STARTS and
|
||||
ENDS. Subsequent matches using BUFFER and REGS will use this memory
|
||||
for recording register information. STARTS and ENDS must be
|
||||
allocated with malloc, and must each be at least `NUM_REGS * sizeof
|
||||
(regoff_t)' bytes long.
|
||||
|
||||
If NUM_REGS == 0, then subsequent matches should allocate their own
|
||||
register data.
|
||||
|
||||
Unless this function is called, the first search or match using
|
||||
PATTERN_BUFFER will allocate its own register data, without
|
||||
freeing the old data. */
|
||||
extern void re_set_registers
|
||||
_RE_ARGS ((struct re_pattern_buffer *buffer, struct re_registers *regs,
|
||||
unsigned num_regs, regoff_t *starts, regoff_t *ends));
|
||||
|
||||
/* 4.2 bsd compatibility. */
|
||||
extern char *re_comp _RE_ARGS ((const char *));
|
||||
extern int re_exec _RE_ARGS ((const char *));
|
||||
|
||||
/* POSIX compatibility. */
|
||||
extern int regcomp _RE_ARGS ((regex_t *preg, const char *pattern, int cflags));
|
||||
extern int regexec
|
||||
_RE_ARGS ((const regex_t *preg, const char *string, size_t nmatch,
|
||||
regmatch_t pmatch[], int eflags));
|
||||
extern size_t regerror
|
||||
_RE_ARGS ((int errcode, const regex_t *preg, char *errbuf,
|
||||
size_t errbuf_size));
|
||||
extern void regfree _RE_ARGS ((regex_t *preg));
|
||||
|
||||
#endif /* not __REGEXP_LIBRARY_H__ */
|
||||
|
||||
/*
|
||||
Local variables:
|
||||
make-backup-files: t
|
||||
version-control: t
|
||||
trim-versions-without-asking: nil
|
||||
End:
|
||||
*/
|
||||
|
481
gnu/usr.bin/grep/search.c
Normal file
481
gnu/usr.bin/grep/search.c
Normal file
@ -0,0 +1,481 @@
|
||||
/* search.c - searching subroutines using dfa, kwset and regex for grep.
|
||||
Copyright (C) 1992 Free Software Foundation, Inc.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2, or (at your option)
|
||||
any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
|
||||
Written August 1992 by Mike Haertel. */
|
||||
|
||||
#include <ctype.h>
|
||||
|
||||
#ifdef STDC_HEADERS
|
||||
#include <limits.h>
|
||||
#include <stdlib.h>
|
||||
#else
|
||||
#define UCHAR_MAX 255
|
||||
#include <sys/types.h>
|
||||
extern char *malloc();
|
||||
#endif
|
||||
|
||||
#ifdef HAVE_MEMCHR
|
||||
#include <string.h>
|
||||
#ifdef NEED_MEMORY_H
|
||||
#include <memory.h>
|
||||
#endif
|
||||
#else
|
||||
#ifdef __STDC__
|
||||
extern void *memchr();
|
||||
#else
|
||||
extern char *memchr();
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#if defined(HAVE_STRING_H) || defined(STDC_HEADERS)
|
||||
#undef bcopy
|
||||
#define bcopy(s, d, n) memcpy((d), (s), (n))
|
||||
#endif
|
||||
|
||||
#ifdef isascii
|
||||
#define ISALNUM(C) (isascii(C) && isalnum(C))
|
||||
#define ISUPPER(C) (isascii(C) && isupper(C))
|
||||
#else
|
||||
#define ISALNUM(C) isalnum(C)
|
||||
#define ISUPPER(C) isupper(C)
|
||||
#endif
|
||||
|
||||
#define TOLOWER(C) (ISUPPER(C) ? tolower(C) : (C))
|
||||
|
||||
#include "grep.h"
|
||||
#include "dfa.h"
|
||||
#include "kwset.h"
|
||||
#include "regex.h"
|
||||
|
||||
#define NCHAR (UCHAR_MAX + 1)
|
||||
|
||||
#if __STDC__
|
||||
static void Gcompile(char *, size_t);
|
||||
static void Ecompile(char *, size_t);
|
||||
static char *EGexecute(char *, size_t, char **);
|
||||
static void Fcompile(char *, size_t);
|
||||
static char *Fexecute(char *, size_t, char **);
|
||||
#else
|
||||
static void Gcompile();
|
||||
static void Ecompile();
|
||||
static char *EGexecute();
|
||||
static void Fcompile();
|
||||
static char *Fexecute();
|
||||
#endif
|
||||
|
||||
/* Here is the matchers vector for the main program. */
|
||||
struct matcher matchers[] = {
|
||||
{ "default", Gcompile, EGexecute },
|
||||
{ "grep", Gcompile, EGexecute },
|
||||
{ "ggrep", Gcompile, EGexecute },
|
||||
{ "egrep", Ecompile, EGexecute },
|
||||
{ "posix-egrep", Ecompile, EGexecute },
|
||||
{ "gegrep", Ecompile, EGexecute },
|
||||
{ "fgrep", Fcompile, Fexecute },
|
||||
{ "gfgrep", Fcompile, Fexecute },
|
||||
{ 0, 0, 0 },
|
||||
};
|
||||
|
||||
/* For -w, we also consider _ to be word constituent. */
|
||||
#define WCHAR(C) (ISALNUM(C) || (C) == '_')
|
||||
|
||||
/* DFA compiled regexp. */
|
||||
static struct dfa dfa;
|
||||
|
||||
/* Regex compiled regexp. */
|
||||
static struct re_pattern_buffer regex;
|
||||
|
||||
/* KWset compiled pattern. For Ecompile and Gcompile, we compile
|
||||
a list of strings, at least one of which is known to occur in
|
||||
any string matching the regexp. */
|
||||
static kwset_t kwset;
|
||||
|
||||
/* Last compiled fixed string known to exactly match the regexp.
|
||||
If kwsexec() returns < lastexact, then we don't need to
|
||||
call the regexp matcher at all. */
|
||||
static int lastexact;
|
||||
|
||||
void
|
||||
dfaerror(mesg)
|
||||
char *mesg;
|
||||
{
|
||||
fatal(mesg, 0);
|
||||
}
|
||||
|
||||
static void
|
||||
kwsinit()
|
||||
{
|
||||
static char trans[NCHAR];
|
||||
int i;
|
||||
|
||||
if (match_icase)
|
||||
for (i = 0; i < NCHAR; ++i)
|
||||
trans[i] = TOLOWER(i);
|
||||
|
||||
if (!(kwset = kwsalloc(match_icase ? trans : (char *) 0)))
|
||||
fatal("memory exhausted", 0);
|
||||
}
|
||||
|
||||
/* If the DFA turns out to have some set of fixed strings one of
|
||||
which must occur in the match, then we build a kwset matcher
|
||||
to find those strings, and thus quickly filter out impossible
|
||||
matches. */
|
||||
static void
|
||||
kwsmusts()
|
||||
{
|
||||
struct dfamust *dm;
|
||||
char *err;
|
||||
|
||||
if (dfa.musts)
|
||||
{
|
||||
kwsinit();
|
||||
/* First, we compile in the substrings known to be exact
|
||||
matches. The kwset matcher will return the index
|
||||
of the matching string that it chooses. */
|
||||
for (dm = dfa.musts; dm; dm = dm->next)
|
||||
{
|
||||
if (!dm->exact)
|
||||
continue;
|
||||
++lastexact;
|
||||
if ((err = kwsincr(kwset, dm->must, strlen(dm->must))) != 0)
|
||||
fatal(err, 0);
|
||||
}
|
||||
/* Now, we compile the substrings that will require
|
||||
the use of the regexp matcher. */
|
||||
for (dm = dfa.musts; dm; dm = dm->next)
|
||||
{
|
||||
if (dm->exact)
|
||||
continue;
|
||||
if ((err = kwsincr(kwset, dm->must, strlen(dm->must))) != 0)
|
||||
fatal(err, 0);
|
||||
}
|
||||
if ((err = kwsprep(kwset)) != 0)
|
||||
fatal(err, 0);
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
Gcompile(pattern, size)
|
||||
char *pattern;
|
||||
size_t size;
|
||||
{
|
||||
#ifdef __STDC__
|
||||
const
|
||||
#endif
|
||||
char *err;
|
||||
|
||||
re_set_syntax(RE_SYNTAX_GREP | RE_HAT_LISTS_NOT_NEWLINE);
|
||||
dfasyntax(RE_SYNTAX_GREP | RE_HAT_LISTS_NOT_NEWLINE, match_icase);
|
||||
|
||||
if ((err = re_compile_pattern(pattern, size, ®ex)) != 0)
|
||||
fatal(err, 0);
|
||||
|
||||
dfainit(&dfa);
|
||||
|
||||
/* In the match_words and match_lines cases, we use a different pattern
|
||||
for the DFA matcher that will quickly throw out cases that won't work.
|
||||
Then if DFA succeeds we do some hairy stuff using the regex matcher
|
||||
to decide whether the match should really count. */
|
||||
if (match_words || match_lines)
|
||||
{
|
||||
/* In the whole-word case, we use the pattern:
|
||||
(^|[^A-Za-z_])(userpattern)([^A-Za-z_]|$).
|
||||
In the whole-line case, we use the pattern:
|
||||
^(userpattern)$.
|
||||
BUG: Using [A-Za-z_] is locale-dependent! */
|
||||
|
||||
char *n = malloc(size + 50);
|
||||
int i = 0;
|
||||
|
||||
strcpy(n, "");
|
||||
|
||||
if (match_lines)
|
||||
strcpy(n, "^\\(");
|
||||
if (match_words)
|
||||
strcpy(n, "\\(^\\|[^0-9A-Za-z_]\\)\\(");
|
||||
|
||||
i = strlen(n);
|
||||
bcopy(pattern, n + i, size);
|
||||
i += size;
|
||||
|
||||
if (match_words)
|
||||
strcpy(n + i, "\\)\\([^0-9A-Za-z_]\\|$\\)");
|
||||
if (match_lines)
|
||||
strcpy(n + i, "\\)$");
|
||||
|
||||
i += strlen(n + i);
|
||||
dfacomp(n, i, &dfa, 1);
|
||||
}
|
||||
else
|
||||
dfacomp(pattern, size, &dfa, 1);
|
||||
|
||||
kwsmusts();
|
||||
}
|
||||
|
||||
static void
|
||||
Ecompile(pattern, size)
|
||||
char *pattern;
|
||||
size_t size;
|
||||
{
|
||||
#ifdef __STDC__
|
||||
const
|
||||
#endif
|
||||
char *err;
|
||||
|
||||
if (strcmp(matcher, "posix-egrep") == 0)
|
||||
{
|
||||
re_set_syntax(RE_SYNTAX_POSIX_EGREP);
|
||||
dfasyntax(RE_SYNTAX_POSIX_EGREP, match_icase);
|
||||
}
|
||||
else
|
||||
{
|
||||
re_set_syntax(RE_SYNTAX_EGREP);
|
||||
dfasyntax(RE_SYNTAX_EGREP, match_icase);
|
||||
}
|
||||
|
||||
if ((err = re_compile_pattern(pattern, size, ®ex)) != 0)
|
||||
fatal(err, 0);
|
||||
|
||||
dfainit(&dfa);
|
||||
|
||||
/* In the match_words and match_lines cases, we use a different pattern
|
||||
for the DFA matcher that will quickly throw out cases that won't work.
|
||||
Then if DFA succeeds we do some hairy stuff using the regex matcher
|
||||
to decide whether the match should really count. */
|
||||
if (match_words || match_lines)
|
||||
{
|
||||
/* In the whole-word case, we use the pattern:
|
||||
(^|[^A-Za-z_])(userpattern)([^A-Za-z_]|$).
|
||||
In the whole-line case, we use the pattern:
|
||||
^(userpattern)$.
|
||||
BUG: Using [A-Za-z_] is locale-dependent! */
|
||||
|
||||
char *n = malloc(size + 50);
|
||||
int i = 0;
|
||||
|
||||
strcpy(n, "");
|
||||
|
||||
if (match_lines)
|
||||
strcpy(n, "^(");
|
||||
if (match_words)
|
||||
strcpy(n, "(^|[^0-9A-Za-z_])(");
|
||||
|
||||
i = strlen(n);
|
||||
bcopy(pattern, n + i, size);
|
||||
i += size;
|
||||
|
||||
if (match_words)
|
||||
strcpy(n + i, ")([^0-9A-Za-z_]|$)");
|
||||
if (match_lines)
|
||||
strcpy(n + i, ")$");
|
||||
|
||||
i += strlen(n + i);
|
||||
dfacomp(n, i, &dfa, 1);
|
||||
}
|
||||
else
|
||||
dfacomp(pattern, size, &dfa, 1);
|
||||
|
||||
kwsmusts();
|
||||
}
|
||||
|
||||
static char *
|
||||
EGexecute(buf, size, endp)
|
||||
char *buf;
|
||||
size_t size;
|
||||
char **endp;
|
||||
{
|
||||
register char *buflim, *beg, *end, save;
|
||||
int backref, start, len;
|
||||
struct kwsmatch kwsm;
|
||||
static struct re_registers regs; /* This is static on account of a BRAIN-DEAD
|
||||
Q@#%!# library interface in regex.c. */
|
||||
|
||||
buflim = buf + size;
|
||||
|
||||
for (beg = end = buf; end < buflim; beg = end + 1)
|
||||
{
|
||||
if (kwset)
|
||||
{
|
||||
/* Find a possible match using the KWset matcher. */
|
||||
beg = kwsexec(kwset, beg, buflim - beg, &kwsm);
|
||||
if (!beg)
|
||||
goto failure;
|
||||
/* Narrow down to the line containing the candidate, and
|
||||
run it through DFA. */
|
||||
end = memchr(beg, '\n', buflim - beg);
|
||||
if (!end)
|
||||
end = buflim;
|
||||
while (beg > buf && beg[-1] != '\n')
|
||||
--beg;
|
||||
save = *end;
|
||||
if (kwsm.index < lastexact)
|
||||
goto success;
|
||||
if (!dfaexec(&dfa, beg, end, 0, (int *) 0, &backref))
|
||||
{
|
||||
*end = save;
|
||||
continue;
|
||||
}
|
||||
*end = save;
|
||||
/* Successful, no backreferences encountered. */
|
||||
if (!backref)
|
||||
goto success;
|
||||
}
|
||||
else
|
||||
{
|
||||
/* No good fixed strings; start with DFA. */
|
||||
save = *buflim;
|
||||
beg = dfaexec(&dfa, beg, buflim, 0, (int *) 0, &backref);
|
||||
*buflim = save;
|
||||
if (!beg)
|
||||
goto failure;
|
||||
/* Narrow down to the line we've found. */
|
||||
end = memchr(beg, '\n', buflim - beg);
|
||||
if (!end)
|
||||
end = buflim;
|
||||
while (beg > buf && beg[-1] != '\n')
|
||||
--beg;
|
||||
/* Successful, no backreferences encountered! */
|
||||
if (!backref)
|
||||
goto success;
|
||||
}
|
||||
/* If we've made it to this point, this means DFA has seen
|
||||
a probable match, and we need to run it through Regex. */
|
||||
regex.not_eol = 0;
|
||||
if ((start = re_search(®ex, beg, end - beg, 0, end - beg, ®s)) >= 0)
|
||||
{
|
||||
len = regs.end[0] - start;
|
||||
if (!match_lines && !match_words || match_lines && len == end - beg)
|
||||
goto success;
|
||||
/* If -w, check if the match aligns with word boundaries.
|
||||
We do this iteratively because:
|
||||
(a) the line may contain more than one occurence of the pattern, and
|
||||
(b) Several alternatives in the pattern might be valid at a given
|
||||
point, and we may need to consider a shorter one to find a word
|
||||
boundary. */
|
||||
if (match_words)
|
||||
while (start >= 0)
|
||||
{
|
||||
if ((start == 0 || !WCHAR(beg[start - 1]))
|
||||
&& (len == end - beg || !WCHAR(beg[start + len])))
|
||||
goto success;
|
||||
if (len > 0)
|
||||
{
|
||||
/* Try a shorter length anchored at the same place. */
|
||||
--len;
|
||||
regex.not_eol = 1;
|
||||
len = re_match(®ex, beg, start + len, start, ®s);
|
||||
}
|
||||
if (len <= 0)
|
||||
{
|
||||
/* Try looking further on. */
|
||||
if (start == end - beg)
|
||||
break;
|
||||
++start;
|
||||
regex.not_eol = 0;
|
||||
start = re_search(®ex, beg, end - beg,
|
||||
start, end - beg - start, ®s);
|
||||
len = regs.end[0] - start;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
failure:
|
||||
return 0;
|
||||
|
||||
success:
|
||||
*endp = end < buflim ? end + 1 : end;
|
||||
return beg;
|
||||
}
|
||||
|
||||
static void
|
||||
Fcompile(pattern, size)
|
||||
char *pattern;
|
||||
size_t size;
|
||||
{
|
||||
char *beg, *lim, *err;
|
||||
|
||||
kwsinit();
|
||||
beg = pattern;
|
||||
do
|
||||
{
|
||||
for (lim = beg; lim < pattern + size && *lim != '\n'; ++lim)
|
||||
;
|
||||
if ((err = kwsincr(kwset, beg, lim - beg)) != 0)
|
||||
fatal(err, 0);
|
||||
if (lim < pattern + size)
|
||||
++lim;
|
||||
beg = lim;
|
||||
}
|
||||
while (beg < pattern + size);
|
||||
|
||||
if ((err = kwsprep(kwset)) != 0)
|
||||
fatal(err, 0);
|
||||
}
|
||||
|
||||
static char *
|
||||
Fexecute(buf, size, endp)
|
||||
char *buf;
|
||||
size_t size;
|
||||
char **endp;
|
||||
{
|
||||
register char *beg, *try, *end;
|
||||
register size_t len;
|
||||
struct kwsmatch kwsmatch;
|
||||
|
||||
for (beg = buf; beg <= buf + size; ++beg)
|
||||
{
|
||||
if (!(beg = kwsexec(kwset, beg, buf + size - beg, &kwsmatch)))
|
||||
return 0;
|
||||
len = kwsmatch.size[0];
|
||||
if (match_lines)
|
||||
{
|
||||
if (beg > buf && beg[-1] != '\n')
|
||||
continue;
|
||||
if (beg + len < buf + size && beg[len] != '\n')
|
||||
continue;
|
||||
goto success;
|
||||
}
|
||||
else if (match_words)
|
||||
for (try = beg; len && try;)
|
||||
{
|
||||
if (try > buf && WCHAR((unsigned char) try[-1]))
|
||||
break;
|
||||
if (try + len < buf + size && WCHAR((unsigned char) try[len]))
|
||||
{
|
||||
try = kwsexec(kwset, beg, --len, &kwsmatch);
|
||||
len = kwsmatch.size[0];
|
||||
}
|
||||
else
|
||||
goto success;
|
||||
}
|
||||
else
|
||||
goto success;
|
||||
}
|
||||
|
||||
return 0;
|
||||
|
||||
success:
|
||||
if ((end = memchr(beg + len, '\n', (buf + size) - (beg + len))) != 0)
|
||||
++end;
|
||||
else
|
||||
end = buf + size;
|
||||
*endp = end;
|
||||
while (beg > buf && beg[-1] != '\n')
|
||||
--beg;
|
||||
return beg;
|
||||
}
|
24
gnu/usr.bin/grep/tests/check.sh
Normal file
24
gnu/usr.bin/grep/tests/check.sh
Normal file
@ -0,0 +1,24 @@
|
||||
#! /bin/sh
|
||||
# Regression test for GNU grep.
|
||||
# Usage: regress.sh [testdir]
|
||||
|
||||
testdir=${1-tests}
|
||||
|
||||
failures=0
|
||||
|
||||
# The Khadafy test is brought to you by Scott Anderson . . .
|
||||
./grep -E -f $testdir/khadafy.regexp $testdir/khadafy.lines > khadafy.out
|
||||
if cmp $testdir/khadafy.lines khadafy.out
|
||||
then
|
||||
:
|
||||
else
|
||||
echo Khadafy test failed -- output left on khadafy.out
|
||||
failures=1
|
||||
fi
|
||||
|
||||
# . . . and the following by Henry Spencer.
|
||||
|
||||
${AWK-awk} -F: -f $testdir/scriptgen.awk $testdir/spencer.tests > tmp.script
|
||||
|
||||
sh tmp.script && exit $failures
|
||||
exit 1
|
@ -1,6 +1,6 @@
|
||||
BEGIN { print "failures=0"; }
|
||||
!/^#/ && NF == 3 {
|
||||
print "echo '" $3 "' | $1/egrep -e '" $2 "' > /dev/null 2>&1";
|
||||
$0 !~ /^#/ && NF == 3 {
|
||||
print "echo '" $3 "' | ./grep -E -e '" $2 "' > /dev/null 2>&1";
|
||||
print "if [ $? != " $1 " ]"
|
||||
print "then"
|
||||
printf "\techo Spencer test \\#%d failed\n", ++n
|
||||
|
@ -33,7 +33,7 @@
|
||||
0:a[b-d]e:ace
|
||||
0:a[b-d]:aac
|
||||
0:a[-b]:a-
|
||||
2:a[b-]:a-
|
||||
0:a[b-]:a-
|
||||
1:a[b-a]:-
|
||||
2:a[]b:-
|
||||
2:a[:-
|
||||
|
Loading…
Reference in New Issue
Block a user