This commit was manufactured by cvs2svn to create tag 'grep_2_0'.

parent 717f769197
commit c0712724bc

Notes:
    svn2git 2020-12-20 02:59:44 +00:00
    svn path=/cvs2svn/tags/grep_2_0/; revision=95

gnu/usr.bin/grep/AUTHORS  Normal file  +29
@@ -0,0 +1,29 @@
+Mike Haertel wrote the main program and the dfa and kwset matchers.
+
+Arthur David Olson contributed the heuristics for finding fixed substrings
+at the end of dfa.c.
+
+Richard Stallman and Karl Berry wrote the regex backtracking matcher.
+
+Henry Spencer wrote the original test suite from which grep's was derived.
+
+Scott Anderson invented the Khadafy test.
+
+David MacKenzie wrote the automatic configuration software used to
+produce the configure script.
+
+Authors of the replacements for standard library routines are identified
+in the corresponding source files.
+
+The idea of using Boyer-Moore type algorithms to quickly filter out
+non-matching text before calling the regexp matcher was originally due
+to James Woods. He also contributed some code to early versions of
+GNU grep.
+
+Finally, I would like to thank Andrew Hume for many fascinating discussions
+of string searching issues over the years. Hume & Sunday's excellent
+paper on fast string searching (AT&T Bell Laboratories CSTR #156)
+describes some of the history of the subject, as well as providing
+exhaustive performance analysis of various implementation alternatives.
+The inner loop of GNU grep is similar to Hume & Sunday's recommended
+"Tuned Boyer Moore" inner loop.

@@ -1,7 +1,13 @@
 PROG= grep
-SRCS= dfa.c regex.o grep.o
-CFLAGS+=-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1
-MLINKS= grep.1 egrep.1
+SRCS= dfa.c grep.c getopt.c kwset.c obstack.c regex.c search.c
+
+CFLAGS+=-DGREP -DHAVE_STRING_H=1 -DHAVE_SYS_PARAM_H=1 -DHAVE_UNISTD_H=1 \
+	-DHAVE_GETPAGESIZE=1 -DHAVE_MEMCHR=1 -DHAVE_STRERROR=1 \
+	-DHAVE_VALLOC=1


+#check: ${.CURDIR}/grep
+check: all
+	awk sh ${.CURDIR}/tests/check.sh ${.CURDIR}/tests
+
 .include <bsd.prog.mk>

gnu/usr.bin/grep/NEWS  Normal file  +35
@@ -0,0 +1,35 @@
+Version 2.0:
+
+The most important user-visible change is that egrep and fgrep have
+disappeared as separate programs into the single grep program mandated
+by POSIX 1003.2. New options -G, -E, and -F have been added,
+selecting grep, egrep, and fgrep behavior respectively. For
+compatibility with historical practice, hard links named egrep and
+fgrep are also provided. See the manual page for details.
+
+In addition, the regular expression facilities described in POSIX
+draft 11.2 are now supported, except for internationalization features
+related to locale-dependent collating sequence information.
+
+There is a new option, -L, which is like -l except it lists
+files which don't contain matches. This option was added
+because '-l -v' doesn't do what you expect.
+
+Performance has been improved; the amount of improvement is platform
+dependent, but (for example) grep 2.0 typically runs at least 30% faster
+than grep 1.6 on a DECstation using the MIPS compiler. Where possible,
+grep now uses mmap() for file input; on a Sun 4 running SunOS 4.1 this
+may cut system time by as much as half, for a total reduction in running
+time of nearly 50%. On machines that don't use mmap(), the buffering
+code has been rewritten to choose more favorable alignments and buffer
+sizes for read().
+
+Portability has been substantially cleaned up, and an automatic
+configure script is now provided.
+
+The internals have changed in ways too numerous to mention.
+People brave enough to reuse the DFA matcher in other programs
+will now have their bravery amply "rewarded", for the interface
+to that file has been completely changed. Some changes were
+necessary to track the evolution of the regex package, and since
+I was changing it anyway I decided to do a general cleanup.

gnu/usr.bin/grep/PROJECTS  Normal file  +15
@@ -0,0 +1,15 @@
+Write Texinfo documentation for grep. The manual page would be a good
+place to start, but Info documents are also supposed to contain a
+tutorial and examples.
+
+Fix the DFA matcher to never use exponential space. (Fortunately, these
+cases are rare.)
+
+Improve the performance of the regex backtracking matcher. This matcher
+is agonizingly slow, and is responsible for grep sometimes being slower
+than Unix grep when backreferences are used.
+
+Provide support for the Posix [= =] and [. .] constructs. This is
+difficult because it requires locale-dependent details of the character
+set and collating sequence, but Posix does not standardize any method
+for accessing this information!

@@ -1,70 +1,28 @@
-This README documents GNU e?grep version 1.6. All bugs reported for
-previous versions have been fixed.
+This is GNU grep 2.0, the "fastest grep in the west" (we hope). All
+bugs reported in previous releases have been fixed. Many exciting new
+bugs have probably been introduced in this major revision.

-See the file INSTALL for compilation and installation instructions.
-
-Send bug reports to bug-gnu-utils@prep.ai.mit.edu.
-
-GNU e?grep is provided "as is" with no warranty. The exact terms
+GNU grep is provided "as is" with no warranty. The exact terms
 under which you may use and (re)distribute this program are detailed
 in the GNU General Public License, in the file COPYING.

-GNU e?grep is based on a fast lazy-state deterministic matcher (about
+GNU grep is based on a fast lazy-state deterministic matcher (about
 twice as fast as stock Unix egrep) hybridized with a Boyer-Moore-Gosper
 search for a fixed string that eliminates impossible text from being
 considered by the full regexp matcher without necessarily having to
 look at every character. The result is typically many times faster
 than Unix grep or egrep. (Regular expressions containing backreferencing
-may run more slowly, however.)
+will run more slowly, however.)

-GNU e?grep is brought to you by the efforts of several people:
+See the file AUTHORS for a list of authors and other contributors.

-Mike Haertel wrote the deterministic regexp code and the bulk
-of the program.
-
-James A. Woods is responsible for the hybridized search strategy
-of using Boyer-Moore-Gosper fixed-string search as a filter
-before calling the general regexp matcher.
-
-Arthur David Olson contributed code that finds fixed strings for
-the aforementioned BMG search for a large class of regexps.
-
-Richard Stallman wrote the backtracking regexp matcher that is
-used for \<digit> backreferences, as well as the getopt that
-is provided for 4.2BSD sites. The backtracking matcher was
-originally written for GNU Emacs.
-
-D. A. Gwyn wrote the C alloca emulation that is provided so
-System V machines can run this program. (Alloca is used only
-by RMS' backtracking matcher, and then only rarely, so there
-is no loss if your machine doesn't have a "real" alloca.)
-
-Scott Anderson and Henry Spencer designed the regression tests
-used in the "regress" script.
-
-Paul Placeway wrote the manual page, based on this README.
-
-If you are interested in improving this program, you may wish to try
-any of the following:
-
-1. Replace the fast search loop with a faster search loop.
-   There are several things that could be improved, the most notable
-   of which would be to calculate a minimal delta2 to use.
-
-2. Make backreferencing \<digit> faster. Right now, backreferencing is
-   handled by calling the Emacs backtracking matcher to verify the partial
-   match. This is slow; if the DFA routines could handle backreferencing
-   themselves a speedup on the order of three to four times might occur
-   in those cases where the backtracking matcher is called to verify nearly
-   every line. Also, some portability problems due to the inclusion of the
-   emacs matcher would be solved because it could then be eliminated.
-   Note that expressions with backreferencing are not true regular
-   expressions, and thus are not equivalent to any DFA. So this is hard.
-
-3. Handle POSIX style regexps. I'm not sure if this could be called an
-   improvement; some of the things on regexps in the POSIX draft I have
-   seen are pretty sickening. But it would be useful in the interests of
-   conforming to the standard.
-
-4. Replace the main driver program grep.c with the much cleaner main driver
-   program used in GNU fgrep.
+See the file INSTALL for compilation and installation instructions.
+
+See the file MANIFEST for a list of files in this distribution.
+
+See the file NEWS for a description of major changes in this release.
+
+See the file PROJECTS if you want to be mentioned in AUTHORS.
+
+Send bug reports to bug-gnu-utils@prep.ai.mit.edu. Be sure to
+include the word "grep" in your Subject: header field.
|
File diff suppressed because it is too large

@@ -16,210 +16,115 @@
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */

 /* Written June, 1988 by Mike Haertel */

-#ifdef STDC_HEADERS
-
-#include <stddef.h>
-#include <stdlib.h>
-
-#else /* !STDC_HEADERS */
-
-#define const
-#include <sys/types.h> /* For size_t. */
-extern char *calloc(), *malloc(), *realloc();
-extern void free();
-
-#ifndef NULL
-#define NULL 0
-#endif
-
-#endif /* ! STDC_HEADERS */
-
-#include <ctype.h>
-#ifndef isascii
-#define ISALNUM(c) isalnum(c)
-#define ISALPHA(c) isalpha(c)
-#define ISUPPER(c) isupper(c)
-#define ISLOWER(c) islower(c)
-#else
-#define ISALNUM(c) (isascii(c) && isalnum(c))
-#define ISALPHA(c) (isascii(c) && isalpha(c))
-#define ISUPPER(c) (isascii(c) && isupper(c))
-#define ISLOWER(c) (isascii(c) && islower(c))
-#endif
-
-/* 1 means plain parentheses serve as grouping, and backslash
-   parentheses are needed for literal searching.
-   0 means backslash-parentheses are grouping, and plain parentheses
-   are for literal searching. */
-#define RE_NO_BK_PARENS 1
-
-/* 1 means plain | serves as the "or"-operator, and \| is a literal.
-   0 means \| serves as the "or"-operator, and | is a literal. */
-#define RE_NO_BK_VBAR 2
-
-/* 0 means plain + or ? serves as an operator, and \+, \? are literals.
-   1 means \+, \? are operators and plain +, ? are literals. */
-#define RE_BK_PLUS_QM 4
-
-/* 1 means | binds tighter than ^ or $.
-   0 means the contrary. */
-#define RE_TIGHT_VBAR 8
-
-/* 1 means treat \n as an _OR operator
-   0 means treat it as a normal character */
-#define RE_NEWLINE_OR 16
-
-/* 0 means that special characters (such as *, ^, and $) always have
-   their special meaning regardless of the surrounding context.
-   1 means that special characters may act as normal characters in some
-   contexts. Specifically, this applies to:
-	^ - only special at the beginning, or after ( or |
-	$ - only special at the end, or before ) or |
-	*, +, ? - only special when not after the beginning, (, or | */
-#define RE_CONTEXT_INDEP_OPS 32
-
-/* Now define combinations of bits for the standard possibilities. */
-#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
-#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
-#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
-#define RE_SYNTAX_EMACS 0
+/* FIXME:
+   2. We should not export so much of the DFA internals.
+   In addition to clobbering modularity, we eat up valuable
+   name space. */

 /* Number of bits in an unsigned char. */
 #define CHARBITS 8

 /* First integer value that is greater than any character code. */
-#define _NOTCHAR (1 << CHARBITS)
+#define NOTCHAR (1 << CHARBITS)

 /* INTBITS need not be exact, just a lower bound. */
 #define INTBITS (CHARBITS * sizeof (int))

 /* Number of ints required to hold a bit for every character. */
-#define _CHARSET_INTS ((_NOTCHAR + INTBITS - 1) / INTBITS)
+#define CHARCLASS_INTS ((NOTCHAR + INTBITS - 1) / INTBITS)

 /* Sets of unsigned characters are stored as bit vectors in arrays of ints. */
-typedef int _charset[_CHARSET_INTS];
+typedef int charclass[CHARCLASS_INTS];

 /* The regexp is parsed into an array of tokens in postfix form. Some tokens
    are operators and others are terminal symbols. Most (but not all) of these
    codes are returned by the lexical analyzer. */
-#if __STDC__

 typedef enum
 {
-  _END = -1,            /* _END is a terminal symbol that matches the
-                           end of input; any value of _END or less in
+  END = -1,             /* END is a terminal symbol that matches the
+                           end of input; any value of END or less in
                            the parse tree is such a symbol. Accepting
                            states of the DFA are those that would have
-                           a transition on _END. */
+                           a transition on END. */

   /* Ordinary character values are terminal symbols that match themselves. */

-  _EMPTY = _NOTCHAR,    /* _EMPTY is a terminal symbol that matches
+  EMPTY = NOTCHAR,      /* EMPTY is a terminal symbol that matches
                            the empty string. */

-  _BACKREF,             /* _BACKREF is generated by \<digit>; it
+  BACKREF,              /* BACKREF is generated by \<digit>; it
                            is not completely handled. If the scanner
                            detects a transition on backref, it returns
                            a kind of "semi-success" indicating that
                            the match will have to be verified with
                            a backtracking matcher. */

-  _BEGLINE,             /* _BEGLINE is a terminal symbol that matches
+  BEGLINE,              /* BEGLINE is a terminal symbol that matches
                            the empty string if it is at the beginning
                            of a line. */

-  _ALLBEGLINE,          /* _ALLBEGLINE is a terminal symbol that
-                           matches the empty string if it is at the
-                           beginning of a line; _ALLBEGLINE applies
-                           to the entire regexp and can only occur
-                           as the first token thereof. _ALLBEGLINE
-                           never appears in the parse tree; a _BEGLINE
-                           is prepended with _CAT to the entire
-                           regexp instead. */
-
-  _ENDLINE,             /* _ENDLINE is a terminal symbol that matches
+  ENDLINE,              /* ENDLINE is a terminal symbol that matches
                            the empty string if it is at the end of
                            a line. */

-  _ALLENDLINE,          /* _ALLENDLINE is to _ENDLINE as _ALLBEGLINE
-                           is to _BEGLINE. */
-
-  _BEGWORD,             /* _BEGWORD is a terminal symbol that matches
+  BEGWORD,              /* BEGWORD is a terminal symbol that matches
                            the empty string if it is at the beginning
                            of a word. */

-  _ENDWORD,             /* _ENDWORD is a terminal symbol that matches
+  ENDWORD,              /* ENDWORD is a terminal symbol that matches
                            the empty string if it is at the end of
                            a word. */

-  _LIMWORD,             /* _LIMWORD is a terminal symbol that matches
+  LIMWORD,              /* LIMWORD is a terminal symbol that matches
                            the empty string if it is at the beginning
                            or the end of a word. */

-  _NOTLIMWORD,          /* _NOTLIMWORD is a terminal symbol that
+  NOTLIMWORD,           /* NOTLIMWORD is a terminal symbol that
                            matches the empty string if it is not at
                            the beginning or end of a word. */

-  _QMARK,               /* _QMARK is an operator of one argument that
+  QMARK,                /* QMARK is an operator of one argument that
                            matches zero or one occurrences of its
                            argument. */

-  _STAR,                /* _STAR is an operator of one argument that
+  STAR,                 /* STAR is an operator of one argument that
                            matches the Kleene closure (zero or more
                            occurrences) of its argument. */

-  _PLUS,                /* _PLUS is an operator of one argument that
+  PLUS,                 /* PLUS is an operator of one argument that
                            matches the positive closure (one or more
                            occurrences) of its argument. */

-  _CAT,                 /* _CAT is an operator of two arguments that
+  REPMN,                /* REPMN is a lexical token corresponding
+                           to the {m,n} construct. REPMN never
+                           appears in the compiled token vector. */
+
+  CAT,                  /* CAT is an operator of two arguments that
                            matches the concatenation of its
-                           arguments. _CAT is never returned by the
+                           arguments. CAT is never returned by the
                            lexical analyzer. */

-  _OR,                  /* _OR is an operator of two arguments that
+  OR,                   /* OR is an operator of two arguments that
                            matches either of its arguments. */

-  _LPAREN,              /* _LPAREN never appears in the parse tree,
+  ORTOP,                /* OR at the toplevel in the parse tree.
+                           This is used for a boyer-moore heuristic. */
+
+  LPAREN,               /* LPAREN never appears in the parse tree,
                            it is only a lexeme. */

-  _RPAREN,              /* _RPAREN never appears in the parse tree. */
+  RPAREN,               /* RPAREN never appears in the parse tree. */

-  _SET                  /* _SET (and any value greater) is a
+  CSET                  /* CSET (and any value greater) is a
                            terminal symbol that matches any of a
                            class of characters. */
-} _token;
+} token;

-#else /* ! __STDC__ */
-
-typedef short _token;
-
-#define _END -1
-#define _EMPTY _NOTCHAR
-#define _BACKREF (_EMPTY + 1)
-#define _BEGLINE (_EMPTY + 2)
-#define _ALLBEGLINE (_EMPTY + 3)
-#define _ENDLINE (_EMPTY + 4)
-#define _ALLENDLINE (_EMPTY + 5)
-#define _BEGWORD (_EMPTY + 6)
-#define _ENDWORD (_EMPTY + 7)
-#define _LIMWORD (_EMPTY + 8)
-#define _NOTLIMWORD (_EMPTY + 9)
-#define _QMARK (_EMPTY + 10)
-#define _STAR (_EMPTY + 11)
-#define _PLUS (_EMPTY + 12)
-#define _CAT (_EMPTY + 13)
-#define _OR (_EMPTY + 14)
-#define _LPAREN (_EMPTY + 15)
-#define _RPAREN (_EMPTY + 16)
-#define _SET (_EMPTY + 17)
-
-#endif /* ! __STDC__ */
-
-/* Sets are stored in an array in the compiled regexp; the index of the
-   array corresponding to a given set token is given by _SET_INDEX(t). */
-#define _SET_INDEX(t) ((t) - _SET)
+/* Sets are stored in an array in the compiled dfa; the index of the
+   array corresponding to a given set token is given by SET_INDEX(t). */
+#define SET_INDEX(t) ((t) - CSET)

 /* Sometimes characters can only be matched depending on the surrounding
    context. Such context decisions depend on what the previous character
@@ -239,36 +144,36 @@ typedef short _token;

    Word-constituent characters are those that satisfy isalnum().

-   The macro _SUCCEEDS_IN_CONTEXT determines whether a given constraint
+   The macro SUCCEEDS_IN_CONTEXT determines whether a given constraint
    succeeds in a particular context. Prevn is true if the previous character
    was a newline, currn is true if the lookahead character is a newline.
    Prevl and currl similarly depend upon whether the previous and current
    characters are word-constituent letters. */
-#define _MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
-  ((constraint) & 1 << ((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4)
-#define _MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \
-  ((constraint) & 1 << ((prevl) ? 2 : 0) + ((currl) ? 1 : 0))
-#define _SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \
-  (_MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
-   && _MATCHES_LETTER_CONTEXT(constraint, prevl, currl))
+#define MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
+  ((constraint) & 1 << (((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4))
+#define MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \
+  ((constraint) & 1 << (((prevl) ? 2 : 0) + ((currl) ? 1 : 0)))
+#define SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \
+  (MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
+   && MATCHES_LETTER_CONTEXT(constraint, prevl, currl))

 /* The following macros give information about what a constraint depends on. */
-#define _PREV_NEWLINE_DEPENDENT(constraint) \
+#define PREV_NEWLINE_DEPENDENT(constraint) \
   (((constraint) & 0xc0) >> 2 != ((constraint) & 0x30))
-#define _PREV_LETTER_DEPENDENT(constraint) \
+#define PREV_LETTER_DEPENDENT(constraint) \
   (((constraint) & 0x0c) >> 2 != ((constraint) & 0x03))

 /* Tokens that match the empty string subject to some constraint actually
    work by applying that constraint to determine what may follow them,
    taking into account what has gone before. The following values are
    the constraints corresponding to the special tokens previously defined. */
-#define _NO_CONSTRAINT 0xff
-#define _BEGLINE_CONSTRAINT 0xcf
-#define _ENDLINE_CONSTRAINT 0xaf
-#define _BEGWORD_CONSTRAINT 0xf2
-#define _ENDWORD_CONSTRAINT 0xf4
-#define _LIMWORD_CONSTRAINT 0xf6
-#define _NOTLIMWORD_CONSTRAINT 0xf9
+#define NO_CONSTRAINT 0xff
+#define BEGLINE_CONSTRAINT 0xcf
+#define ENDLINE_CONSTRAINT 0xaf
+#define BEGWORD_CONSTRAINT 0xf2
+#define ENDWORD_CONSTRAINT 0xf4
+#define LIMWORD_CONSTRAINT 0xf6
+#define NOTLIMWORD_CONSTRAINT 0xf9

 /* States of the recognizer correspond to sets of positions in the parse
    tree, together with the constraints under which they may be matched.
@@ -278,44 +183,48 @@ typedef struct
 {
   unsigned index; /* Index into the parse array. */
   unsigned constraint; /* Constraint for matching this position. */
-} _position;
+} position;

 /* Sets of positions are stored as arrays. */
 typedef struct
 {
-  _position *elems; /* Elements of this position set. */
+  position *elems; /* Elements of this position set. */
   int nelem; /* Number of elements in this set. */
-} _position_set;
+} position_set;

-/* A state of the regexp consists of a set of positions, some flags,
+/* A state of the dfa consists of a set of positions, some flags,
    and the token value of the lowest-numbered position of the state that
-   contains an _END token. */
+   contains an END token. */
 typedef struct
 {
   int hash; /* Hash of the positions of this state. */
-  _position_set elems; /* Positions this state could match. */
+  position_set elems; /* Positions this state could match. */
   char newline; /* True if previous state matched newline. */
   char letter; /* True if previous state matched a letter. */
   char backref; /* True if this state matches a \<digit>. */
   unsigned char constraint; /* Constraint for this state to accept. */
-  int first_end; /* Token value of the first _END in elems. */
-} _dfa_state;
+  int first_end; /* Token value of the first END in elems. */
+} dfa_state;

-/* If an r.e. is at most MUST_MAX characters long, we look for a string which
-   must appear in it; whatever's found is dropped into the struct reg. */
-
-#define MUST_MAX 50
+/* Element of a list of strings, at least one of which is known to
+   appear in any R.E. matching the DFA. */
+struct dfamust
+{
+  int exact;
+  char *must;
+  struct dfamust *next;
+};

 /* A compiled regular expression. */
-struct regexp
+struct dfa
 {
   /* Stuff built by the scanner. */
-  _charset *charsets; /* Array of character sets for _SET tokens. */
-  int cindex; /* Index for adding new charsets. */
-  int calloc; /* Number of charsets currently allocated. */
+  charclass *charclasses; /* Array of character sets for CSET tokens. */
+  int cindex; /* Index for adding new charclasses. */
+  int calloc; /* Number of charclasses currently allocated. */

   /* Stuff built by the parser. */
-  _token *tokens; /* Postfix parse array. */
+  token *tokens; /* Postfix parse array. */
   int tindex; /* Index for adding new tokens. */
   int talloc; /* Number of tokens currently allocated. */
   int depth; /* Depth required of an evaluation stack
@@ -323,15 +232,15 @@ struct regexp
                 parse tree. */
   int nleaves; /* Number of leaves on the parse tree. */
   int nregexps; /* Count of parallel regexps being built
-                   with regparse(). */
+                   with dfaparse(). */

   /* Stuff owned by the state builder. */
-  _dfa_state *states; /* States of the regexp. */
+  dfa_state *states; /* States of the dfa. */
   int sindex; /* Index for adding new states. */
   int salloc; /* Number of states currently allocated. */

   /* Stuff built by the structure analyzer. */
-  _position_set *follows; /* Array of follow sets, indexed by position
+  position_set *follows; /* Array of follow sets, indexed by position
                             index. The follow of a position is the set
                             of positions containing characters that
                             could conceivably follow a character
@@ -361,7 +270,7 @@ struct regexp
|
|||||||
int **fails; /* Transition tables after failing to accept
|
int **fails; /* Transition tables after failing to accept
|
||||||
on a state that potentially could do so. */
|
on a state that potentially could do so. */
|
||||||
int *success; /* Table of acceptance conditions used in
|
int *success; /* Table of acceptance conditions used in
|
||||||
regexecute and computed in build_state. */
|
dfaexec and computed in build_state. */
|
||||||
int *newlines; /* Transitions on newlines. The entry for a
|
int *newlines; /* Transitions on newlines. The entry for a
|
||||||
newline in any transition table is always
|
newline in any transition table is always
|
||||||
-1 so we can count lines without wasting
|
-1 so we can count lines without wasting
|
||||||
@@ -369,40 +278,41 @@ struct regexp
|
|||||||
newline is stored separately and handled
|
newline is stored separately and handled
|
||||||
as a special case. Newline is also used
|
as a special case. Newline is also used
|
||||||
as a sentinel at the end of the buffer. */
|
as a sentinel at the end of the buffer. */
|
||||||
char must[MUST_MAX];
|
struct dfamust *musts; /* List of strings, at least one of which
|
||||||
int mustn;
|
is known to appear in any r.e. matching
|
||||||
|
the dfa. */
|
||||||
};
|
};
|
||||||
|
|
||||||
/* Some macros for user access to regexp internals. */
|
/* Some macros for user access to dfa internals. */
|
||||||
|
|
||||||
/* ACCEPTING returns true if s could possibly be an accepting state of r. */
|
/* ACCEPTING returns true if s could possibly be an accepting state of r. */
|
||||||
#define ACCEPTING(s, r) ((r).states[s].constraint)
|
#define ACCEPTING(s, r) ((r).states[s].constraint)
|
||||||
|
|
||||||
/* ACCEPTS_IN_CONTEXT returns true if the given state accepts in the
|
/* ACCEPTS_IN_CONTEXT returns true if the given state accepts in the
|
||||||
specified context. */
|
specified context. */
|
||||||
#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, reg) \
|
#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, dfa) \
|
||||||
_SUCCEEDS_IN_CONTEXT((reg).states[state].constraint, \
|
SUCCEEDS_IN_CONTEXT((dfa).states[state].constraint, \
|
||||||
prevn, currn, prevl, currl)
|
prevn, currn, prevl, currl)
|
||||||
|
|
||||||
/* FIRST_MATCHING_REGEXP returns the index number of the first of parallel
|
/* FIRST_MATCHING_REGEXP returns the index number of the first of parallel
|
||||||
regexps that a given state could accept. Parallel regexps are numbered
|
regexps that a given state could accept. Parallel regexps are numbered
|
||||||
starting at 1. */
|
starting at 1. */
|
||||||
#define FIRST_MATCHING_REGEXP(state, reg) (-(reg).states[state].first_end)
|
#define FIRST_MATCHING_REGEXP(state, dfa) (-(dfa).states[state].first_end)
|
||||||
|
|
||||||
/* Entry points. */
|
/* Entry points. */
|
||||||
|
|
||||||
#if __STDC__
|
#if __STDC__
|
||||||
|
|
||||||
/* Regsyntax() takes two arguments; the first sets the syntax bits described
|
/* dfasyntax() takes two arguments; the first sets the syntax bits described
|
||||||
earlier in this file, and the second sets the case-folding flag. */
|
earlier in this file, and the second sets the case-folding flag. */
|
||||||
extern void regsyntax(int, int);
|
extern void dfasyntax(int, int);
|
||||||
|
|
||||||
/* Compile the given string of the given length into the given struct regexp.
|
/* Compile the given string of the given length into the given struct dfa.
|
||||||
Final argument is a flag specifying whether to build a searching or an
|
Final argument is a flag specifying whether to build a searching or an
|
||||||
exact matcher. */
|
exact matcher. */
|
||||||
extern void regcompile(const char *, size_t, struct regexp *, int);
|
extern void dfacomp(char *, size_t, struct dfa *, int);
|
||||||
|
|
||||||
/* Execute the given struct regexp on the buffer of characters. The
|
/* Execute the given struct dfa on the buffer of characters. The
|
||||||
first char * points to the beginning, and the second points to the
|
first char * points to the beginning, and the second points to the
|
||||||
first character after the end of the buffer, which must be a writable
|
first character after the end of the buffer, which must be a writable
|
||||||
place so a sentinel end-of-buffer marker can be stored there. The
|
place so a sentinel end-of-buffer marker can be stored there. The
|
||||||
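The dfa.h comments say that newline doubles as a sentinel at the end of the buffer, and that dfaexec() needs one writable byte past the end so the marker can be stored there. A minimal sketch of why that helps, assuming a plain byte buffer with spare capacity (`count_lines_with_sentinel` is a hypothetical name, not part of dfa.c): once `'\n'` is stored at `buf[len]`, the inner loop only has to look for newlines and needs no separate end-of-buffer test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from dfa.c: count the '\n' characters in
   buf[0..len).  The caller must leave at least one spare byte after the
   data (len < cap), because buf[len] is overwritten with a '\n' sentinel. */
static size_t count_lines_with_sentinel(char *buf, size_t len, size_t cap)
{
    char *p = buf;
    char *end = buf + len;
    size_t lines = 0;

    assert(len < cap);      /* room for the sentinel byte */
    *end = '\n';            /* guarantees the scan below terminates */

    for (;;) {
        while (*p != '\n')  /* no "p < end" test in the hot loop */
            p++;
        if (p == end)       /* reached the sentinel, not real data */
            break;
        lines++;
        p++;
    }
    return lines;
}
```

After the call `buf[len]` holds `'\n'`, so a caller that cares about the previous contents of that byte has to save and restore it.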
@@ -414,37 +324,37 @@ extern void regcompile(const char *, size_t, struct regexp *, int);
|
|||||||
order to verify backreferencing; otherwise the flag will be cleared.
|
order to verify backreferencing; otherwise the flag will be cleared.
|
||||||
Returns NULL if no match is found, or a pointer to the first
|
Returns NULL if no match is found, or a pointer to the first
|
||||||
character after the first & shortest matching string in the buffer. */
|
character after the first & shortest matching string in the buffer. */
|
||||||
extern char *regexecute(struct regexp *, char *, char *, int, int *, int *);
|
extern char *dfaexec(struct dfa *, char *, char *, int, int *, int *);
|
||||||
|
|
||||||
/* Free the storage held by the components of a struct regexp. */
|
/* Free the storage held by the components of a struct dfa. */
|
||||||
extern void regfree(struct regexp *);
|
extern void dfafree(struct dfa *);
|
||||||
|
|
||||||
/* Entry points for people who know what they're doing. */
|
/* Entry points for people who know what they're doing. */
|
||||||
|
|
||||||
/* Initialize the components of a struct regexp. */
|
/* Initialize the components of a struct dfa. */
|
||||||
extern void reginit(struct regexp *);
|
extern void dfainit(struct dfa *);
|
||||||
|
|
||||||
/* Incrementally parse a string of given length into a struct regexp. */
|
/* Incrementally parse a string of given length into a struct dfa. */
|
||||||
extern void regparse(const char *, size_t, struct regexp *);
|
extern void dfaparse(char *, size_t, struct dfa *);
|
||||||
|
|
||||||
/* Analyze a parsed regexp; second argument tells whether to build a searching
|
/* Analyze a parsed regexp; second argument tells whether to build a searching
|
||||||
or an exact matcher. */
|
or an exact matcher. */
|
||||||
extern void reganalyze(struct regexp *, int);
|
extern void dfaanalyze(struct dfa *, int);
|
||||||
|
|
||||||
/* Compute, for each possible character, the transitions out of a given
|
/* Compute, for each possible character, the transitions out of a given
|
||||||
state, storing them in an array of integers. */
|
state, storing them in an array of integers. */
|
||||||
extern void regstate(int, struct regexp *, int []);
|
extern void dfastate(int, struct dfa *, int []);
|
||||||
|
|
||||||
/* Error handling. */
|
/* Error handling. */
|
||||||
|
|
||||||
/* Regerror() is called by the regexp routines whenever an error occurs. It
|
/* dfaerror() is called by the regexp routines whenever an error occurs. It
|
||||||
takes a single argument, a NUL-terminated string describing the error.
|
takes a single argument, a NUL-terminated string describing the error.
|
||||||
The default regerror() prints the error message to stderr and exits.
|
The default dfaerror() prints the error message to stderr and exits.
|
||||||
The user can provide a different regerror() if so desired. */
|
The user can provide a different dfaerror() if so desired. */
|
||||||
extern void regerror(const char *);
|
extern void dfaerror(char *);
|
||||||
|
|
||||||
#else /* ! __STDC__ */
|
#else /* ! __STDC__ */
|
||||||
extern void regsyntax(), regcompile(), regfree(), reginit(), regparse();
|
extern void dfasyntax(), dfacomp(), dfafree(), dfainit(), dfaparse();
|
||||||
extern void reganalyze(), regstate(), regerror();
|
extern void dfaanalyze(), dfastate(), dfaerror();
|
||||||
extern char *regexecute();
|
extern char *dfaexec();
|
||||||
#endif /* ! __STDC__ */
|
#endif /* ! __STDC__ */
|
||||||
|
@@ -3,12 +3,13 @@
|
|||||||
"Keep this file name-space clean" means, talk to roland@gnu.ai.mit.edu
|
"Keep this file name-space clean" means, talk to roland@gnu.ai.mit.edu
|
||||||
before changing it!
|
before changing it!
|
||||||
|
|
||||||
Copyright (C) 1987, 88, 89, 90, 91, 1992 Free Software Foundation, Inc.
|
Copyright (C) 1987, 88, 89, 90, 91, 92, 1993
|
||||||
|
Free Software Foundation, Inc.
|
||||||
|
|
||||||
This program is free software; you can redistribute it and/or modify
|
This program is free software; you can redistribute it and/or modify it
|
||||||
it under the terms of the GNU General Public License as published by
|
under the terms of the GNU General Public License as published by the
|
||||||
the Free Software Foundation; either version 2, or (at your option)
|
Free Software Foundation; either version 2, or (at your option) any
|
||||||
any later version.
|
later version.
|
||||||
|
|
||||||
This program is distributed in the hope that it will be useful,
|
This program is distributed in the hope that it will be useful,
|
||||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
@@ -17,49 +18,67 @@
|
|||||||
|
|
||||||
You should have received a copy of the GNU General Public License
|
You should have received a copy of the GNU General Public License
|
||||||
along with this program; if not, write to the Free Software
|
along with this program; if not, write to the Free Software
|
||||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
|
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||||
|
|
||||||
/* AIX requires this to be the first thing in the file. */
|
/* NOTE!!! AIX requires this to be the first thing in the file.
|
||||||
|
Do not put ANYTHING before it! */
|
||||||
|
#if !defined (__GNUC__) && defined (_AIX)
|
||||||
|
#pragma alloca
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef HAVE_CONFIG_H
|
||||||
|
#include "config.h"
|
||||||
|
#endif
|
||||||
|
|
||||||
#ifdef __GNUC__
|
#ifdef __GNUC__
|
||||||
#define alloca __builtin_alloca
|
#define alloca __builtin_alloca
|
||||||
#else /* not __GNUC__ */
|
#else /* not __GNUC__ */
|
||||||
#if defined(sparc) && !defined(USG) && !defined(SVR4) && !defined(__svr4__)
|
#if defined (HAVE_ALLOCA_H) || (defined(sparc) && (defined(sun) || (!defined(USG) && !defined(SVR4) && !defined(__svr4__))))
|
||||||
#include <alloca.h>
|
#include <alloca.h>
|
||||||
#else
|
#else
|
||||||
#ifdef _AIX
|
#ifndef _AIX
|
||||||
#pragma alloca
|
|
||||||
#else
|
|
||||||
char *alloca ();
|
char *alloca ();
|
||||||
#endif
|
#endif
|
||||||
#endif /* sparc */
|
#endif /* alloca.h */
|
||||||
#endif /* not __GNUC__ */
|
#endif /* not __GNUC__ */
|
||||||
|
|
||||||
#ifdef LIBC
|
#if !__STDC__ && !defined(const) && IN_GCC
|
||||||
/* For when compiled as part of the GNU C library. */
|
#define const
|
||||||
#include <ansidecl.h>
|
#endif
|
||||||
|
|
||||||
|
/* This tells Alpha OSF/1 not to define a getopt prototype in <stdio.h>. */
|
||||||
|
#ifndef _NO_PROTO
|
||||||
|
#define _NO_PROTO
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#include <stdio.h>
|
#include <stdio.h>
|
||||||
|
|
||||||
|
/* Comment out all this code if we are using the GNU C Library, and are not
|
||||||
|
actually compiling the library itself. This code is part of the GNU C
|
||||||
|
Library, but also included in many other GNU distributions. Compiling
|
||||||
|
and linking in this code is a waste when using the GNU C library
|
||||||
|
(especially if it is a shared library). Rather than having every GNU
|
||||||
|
program understand `configure --with-gnu-libc' and omit the object files,
|
||||||
|
it is simpler to just do this in the source for each such file. */
|
||||||
|
|
||||||
|
#if defined (_LIBC) || !defined (__GNU_LIBRARY__)
|
||||||
|
|
||||||
|
|
||||||
/* This needs to come after some library #include
|
/* This needs to come after some library #include
|
||||||
to get __GNU_LIBRARY__ defined. */
|
to get __GNU_LIBRARY__ defined. */
|
||||||
#ifdef __GNU_LIBRARY__
|
#ifdef __GNU_LIBRARY__
|
||||||
#undef alloca
|
#undef alloca
|
||||||
|
/* Don't include stdlib.h for non-GNU C libraries because some of them
|
||||||
|
contain conflicting prototypes for getopt. */
|
||||||
#include <stdlib.h>
|
#include <stdlib.h>
|
||||||
#include <string.h>
|
|
||||||
#else /* Not GNU C library. */
|
#else /* Not GNU C library. */
|
||||||
#define __alloca alloca
|
#define __alloca alloca
|
||||||
#endif /* GNU C library. */
|
#endif /* GNU C library. */
|
||||||
|
|
||||||
|
|
||||||
#ifndef __STDC__
|
|
||||||
#define const
|
|
||||||
#endif
|
|
||||||
|
|
||||||
/* If GETOPT_COMPAT is defined, `+' as well as `--' can introduce a
|
/* If GETOPT_COMPAT is defined, `+' as well as `--' can introduce a
|
||||||
long-named option. Because this is not POSIX.2 compliant, it is
|
long-named option. Because this is not POSIX.2 compliant, it is
|
||||||
being phased out. */
|
being phased out. */
|
||||||
#define GETOPT_COMPAT
|
/* #define GETOPT_COMPAT */
|
||||||
|
|
||||||
/* This version of `getopt' appears to the caller like standard Unix `getopt'
|
/* This version of `getopt' appears to the caller like standard Unix `getopt'
|
||||||
but it behaves differently for the user, since it allows the user
|
but it behaves differently for the user, since it allows the user
|
||||||
@@ -97,6 +116,7 @@ char *optarg = 0;
|
|||||||
Otherwise, `optind' communicates from one call to the next
|
Otherwise, `optind' communicates from one call to the next
|
||||||
how much of ARGV has been scanned so far. */
|
how much of ARGV has been scanned so far. */
|
||||||
|
|
||||||
|
/* XXX 1003.2 says this must be 1 before any call. */
|
||||||
int optind = 0;
|
int optind = 0;
|
||||||
|
|
||||||
/* The next char to be scanned in the option-element
|
/* The next char to be scanned in the option-element
|
||||||
@@ -113,6 +133,12 @@ static char *nextchar;
|
|||||||
|
|
||||||
int opterr = 1;
|
int opterr = 1;
|
||||||
|
|
||||||
|
/* Set to an option character which was unrecognized.
|
||||||
|
This must be initialized on some systems to avoid linking in the
|
||||||
|
system's own getopt implementation. */
|
||||||
|
|
||||||
|
int optopt = '?';
|
||||||
|
|
||||||
/* Describe how to deal with options that follow non-option ARGV-elements.
|
/* Describe how to deal with options that follow non-option ARGV-elements.
|
||||||
|
|
||||||
If the caller did not specify anything,
|
If the caller did not specify anything,
|
||||||
@@ -148,6 +174,10 @@ static enum
|
|||||||
} ordering;
|
} ordering;
|
||||||
|
|
||||||
#ifdef __GNU_LIBRARY__
|
#ifdef __GNU_LIBRARY__
|
||||||
|
/* We want to avoid inclusion of string.h with non-GNU libraries
|
||||||
|
because there are many ways it can cause trouble.
|
||||||
|
On some systems, it contains special magic macros that don't work
|
||||||
|
in GCC. */
|
||||||
#include <string.h>
|
#include <string.h>
|
||||||
#define my_index strchr
|
#define my_index strchr
|
||||||
#define my_bcopy(src, dst, n) memcpy ((dst), (src), (n))
|
#define my_bcopy(src, dst, n) memcpy ((dst), (src), (n))
|
||||||
@@ -159,22 +189,23 @@ static enum
|
|||||||
char *getenv ();
|
char *getenv ();
|
||||||
|
|
||||||
static char *
|
static char *
|
||||||
my_index (string, chr)
|
my_index (str, chr)
|
||||||
char *string;
|
const char *str;
|
||||||
int chr;
|
int chr;
|
||||||
{
|
{
|
||||||
while (*string)
|
while (*str)
|
||||||
{
|
{
|
||||||
if (*string == chr)
|
if (*str == chr)
|
||||||
return string;
|
return (char *) str;
|
||||||
string++;
|
str++;
|
||||||
}
|
}
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
static void
|
static void
|
||||||
my_bcopy (from, to, size)
|
my_bcopy (from, to, size)
|
||||||
char *from, *to;
|
const char *from;
|
||||||
|
char *to;
|
||||||
int size;
|
int size;
|
||||||
{
|
{
|
||||||
int i;
|
int i;
|
||||||
@@ -210,10 +241,12 @@ exchange (argv)
|
|||||||
|
|
||||||
/* Interchange the two blocks of data in ARGV. */
|
/* Interchange the two blocks of data in ARGV. */
|
||||||
|
|
||||||
my_bcopy (&argv[first_nonopt], temp, nonopts_size);
|
my_bcopy ((char *) &argv[first_nonopt], (char *) temp, nonopts_size);
|
||||||
my_bcopy (&argv[last_nonopt], &argv[first_nonopt],
|
my_bcopy ((char *) &argv[last_nonopt], (char *) &argv[first_nonopt],
|
||||||
(optind - last_nonopt) * sizeof (char *));
|
(optind - last_nonopt) * sizeof (char *));
|
||||||
my_bcopy (temp, &argv[first_nonopt + optind - last_nonopt], nonopts_size);
|
my_bcopy ((char *) temp,
|
||||||
|
(char *) &argv[first_nonopt + optind - last_nonopt],
|
||||||
|
nonopts_size);
|
||||||
|
|
||||||
/* Update records for the slots the non-options now occupy. */
|
/* Update records for the slots the non-options now occupy. */
|
||||||
|
|
||||||
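exchange() moves the block of non-options that getopt has already skipped so that it follows the options scanned since then, using a temporary copy. A self-contained sketch of the same rotation, assuming the standard `memcpy`/`memmove` instead of `my_bcopy` (`swap_adjacent_blocks` and its parameters are hypothetical names): `argv[first..last)` holds the non-options and `argv[last..end)` the options; afterwards the options come first and each block keeps its internal order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration of the exchange() step: argv[first..last)
   holds non-options, argv[last..end) holds options.  Afterwards the
   options come first, and each block keeps its internal order.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
static int swap_adjacent_blocks(char **argv, int first, int last, int end)
{
    size_t nonopts_size = (size_t) (last - first) * sizeof (char *);
    size_t opts_size = (size_t) (end - last) * sizeof (char *);
    char **temp;

    assert(0 <= first && first <= last && last <= end);
    temp = malloc(nonopts_size > 0 ? nonopts_size : 1);
    if (temp == NULL)
        return -1;

    memcpy(temp, &argv[first], nonopts_size);         /* save non-options */
    /* Source and destination may overlap here, so memmove is required. */
    memmove(&argv[first], &argv[last], opts_size);    /* slide options down */
    memcpy(&argv[first + (end - last)], temp, nonopts_size); /* re-append */
    free(temp);
    return 0;
}
```

The diff's GNU-library branch maps `my_bcopy` to `memcpy`; because the middle copy's source and destination ranges can overlap, the sketch uses `memmove`, which is defined for overlapping ranges.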
@@ -457,7 +490,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
|||||||
if (*s)
|
if (*s)
|
||||||
{
|
{
|
||||||
/* Don't test has_arg with >, because some C compilers don't
|
/* Don't test has_arg with >, because some C compilers don't
|
||||||
allow it to be used on enums. */
|
allow it to be used on enums. */
|
||||||
if (pfound->has_arg)
|
if (pfound->has_arg)
|
||||||
optarg = s + 1;
|
optarg = s + 1;
|
||||||
else
|
else
|
||||||
@@ -489,7 +522,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
|||||||
fprintf (stderr, "%s: option `%s' requires an argument\n",
|
fprintf (stderr, "%s: option `%s' requires an argument\n",
|
||||||
argv[0], argv[optind - 1]);
|
argv[0], argv[optind - 1]);
|
||||||
nextchar += strlen (nextchar);
|
nextchar += strlen (nextchar);
|
||||||
return '?';
|
return optstring[0] == ':' ? ':' : '?';
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
nextchar += strlen (nextchar);
|
nextchar += strlen (nextchar);
|
||||||
@@ -505,7 +538,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
|||||||
/* Can't find it as a long option. If this is not getopt_long_only,
|
/* Can't find it as a long option. If this is not getopt_long_only,
|
||||||
or the option starts with '--' or is not a valid short
|
or the option starts with '--' or is not a valid short
|
||||||
option, then it's an error.
|
option, then it's an error.
|
||||||
Otherwise interpret it as a short option. */
|
Otherwise interpret it as a short option. */
|
||||||
if (!long_only || argv[optind][1] == '-'
|
if (!long_only || argv[optind][1] == '-'
|
||||||
#ifdef GETOPT_COMPAT
|
#ifdef GETOPT_COMPAT
|
||||||
|| argv[optind][0] == '+'
|
|| argv[optind][0] == '+'
|
||||||
@@ -523,7 +556,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
|||||||
fprintf (stderr, "%s: unrecognized option `%c%s'\n",
|
fprintf (stderr, "%s: unrecognized option `%c%s'\n",
|
||||||
argv[0], argv[optind][0], nextchar);
|
argv[0], argv[optind][0], nextchar);
|
||||||
}
|
}
|
||||||
nextchar += strlen (nextchar);
|
nextchar = (char *) "";
|
||||||
optind++;
|
optind++;
|
||||||
return '?';
|
return '?';
|
||||||
}
|
}
|
||||||
@@ -537,18 +570,24 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
|||||||
|
|
||||||
/* Increment `optind' when we start to process its last character. */
|
/* Increment `optind' when we start to process its last character. */
|
||||||
if (*nextchar == '\0')
|
if (*nextchar == '\0')
|
||||||
optind++;
|
++optind;
|
||||||
|
|
||||||
if (temp == NULL || c == ':')
|
if (temp == NULL || c == ':')
|
||||||
{
|
{
|
||||||
if (opterr)
|
if (opterr)
|
||||||
{
|
{
|
||||||
|
#if 0
|
||||||
if (c < 040 || c >= 0177)
|
if (c < 040 || c >= 0177)
|
||||||
fprintf (stderr, "%s: unrecognized option, character code 0%o\n",
|
fprintf (stderr, "%s: unrecognized option, character code 0%o\n",
|
||||||
argv[0], c);
|
argv[0], c);
|
||||||
else
|
else
|
||||||
fprintf (stderr, "%s: unrecognized option `-%c'\n", argv[0], c);
|
fprintf (stderr, "%s: unrecognized option `-%c'\n", argv[0], c);
|
||||||
|
#else
|
||||||
|
/* 1003.2 specifies the format of this message. */
|
||||||
|
fprintf (stderr, "%s: illegal option -- %c\n", argv[0], c);
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
|
optopt = c;
|
||||||
return '?';
|
return '?';
|
||||||
}
|
}
|
||||||
if (temp[1] == ':')
|
if (temp[1] == ':')
|
||||||
@@ -568,7 +607,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
|||||||
else
|
else
|
||||||
{
|
{
|
||||||
/* This is an option that requires an argument. */
|
/* This is an option that requires an argument. */
|
||||||
if (*nextchar != 0)
|
if (*nextchar != '\0')
|
||||||
{
|
{
|
||||||
optarg = nextchar;
|
optarg = nextchar;
|
||||||
/* If we end this ARGV-element by taking the rest as an arg,
|
/* If we end this ARGV-element by taking the rest as an arg,
|
||||||
@@ -578,9 +617,21 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
|
|||||||
else if (optind == argc)
|
else if (optind == argc)
|
||||||
{
|
{
|
||||||
if (opterr)
|
if (opterr)
|
||||||
fprintf (stderr, "%s: option `-%c' requires an argument\n",
|
{
|
||||||
argv[0], c);
|
#if 0
|
||||||
c = '?';
|
fprintf (stderr, "%s: option `-%c' requires an argument\n",
|
||||||
|
argv[0], c);
|
||||||
|
#else
|
||||||
|
/* 1003.2 specifies the format of this message. */
|
||||||
|
fprintf (stderr, "%s: option requires an argument -- %c\n",
|
||||||
|
argv[0], c);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
optopt = c;
|
||||||
|
if (optstring[0] == ':')
|
||||||
|
c = ':';
|
||||||
|
else
|
||||||
|
c = '?';
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
/* We already incremented `optind' once;
|
/* We already incremented `optind' once;
|
||||||
@@ -604,6 +655,8 @@ getopt (argc, argv, optstring)
|
|||||||
(int *) 0,
|
(int *) 0,
|
||||||
0);
|
0);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#endif /* _LIBC or not __GNU_LIBRARY__. */
|
||||||
|
|
||||||
#ifdef TEST
|
#ifdef TEST
|
||||||
|
|
||||||
|
@@ -1,10 +1,10 @@
|
|||||||
/* Declarations for getopt.
|
/* Declarations for getopt.
|
||||||
Copyright (C) 1989, 1990, 1991, 1992 Free Software Foundation, Inc.
|
Copyright (C) 1989, 1990, 1991, 1992, 1993 Free Software Foundation, Inc.
|
||||||
|
|
||||||
This program is free software; you can redistribute it and/or modify
|
This program is free software; you can redistribute it and/or modify it
|
||||||
it under the terms of the GNU General Public License as published by
|
under the terms of the GNU General Public License as published by the
|
||||||
the Free Software Foundation; either version 2, or (at your option)
|
Free Software Foundation; either version 2, or (at your option) any
|
||||||
any later version.
|
later version.
|
||||||
|
|
||||||
This program is distributed in the hope that it will be useful,
|
This program is distributed in the hope that it will be useful,
|
||||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
@@ -13,11 +13,15 @@
|
|||||||
|
|
||||||
You should have received a copy of the GNU General Public License
|
You should have received a copy of the GNU General Public License
|
||||||
along with this program; if not, write to the Free Software
|
along with this program; if not, write to the Free Software
|
||||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
|
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||||
|
|
||||||
#ifndef _GETOPT_H
|
#ifndef _GETOPT_H
|
||||||
#define _GETOPT_H 1
|
#define _GETOPT_H 1
|
||||||
|
|
||||||
|
#ifdef __cplusplus
|
||||||
|
extern "C" {
|
||||||
|
#endif
|
||||||
|
|
||||||
/* For communication from `getopt' to the caller.
|
/* For communication from `getopt' to the caller.
|
||||||
When `getopt' finds an option that takes an argument,
|
When `getopt' finds an option that takes an argument,
|
||||||
the argument value is returned here.
|
the argument value is returned here.
|
||||||
@@ -45,6 +49,10 @@ extern int optind;
|
|||||||
|
|
||||||
extern int opterr;
|
extern int opterr;
|
||||||
|
|
||||||
|
/* Set to an option character which was unrecognized. */
|
||||||
|
|
||||||
|
extern int optopt;
|
||||||
|
|
||||||
/* Describe the long-named options requested by the application.
|
/* Describe the long-named options requested by the application.
|
||||||
The LONG_OPTIONS argument to getopt_long or getopt_long_only is a vector
|
The LONG_OPTIONS argument to getopt_long or getopt_long_only is a vector
|
||||||
of `struct option' terminated by an element containing a name which is
|
of `struct option' terminated by an element containing a name which is
|
||||||
@@ -82,15 +90,19 @@ struct option
|
|||||||
|
|
||||||
/* Names for the values of the `has_arg' field of `struct option'. */
|
/* Names for the values of the `has_arg' field of `struct option'. */
|
||||||
|
|
||||||
enum _argtype
|
#define no_argument 0
|
||||||
{
|
#define required_argument 1
|
||||||
no_argument,
|
#define optional_argument 2
|
||||||
required_argument,
|
|
||||||
optional_argument
|
|
||||||
};
|
|
||||||
|
|
||||||
#if __STDC__
|
#if __STDC__
|
||||||
|
#if defined(__GNU_LIBRARY__)
|
||||||
|
/* Many other libraries have conflicting prototypes for getopt, with
|
||||||
|
differences in the consts, in stdlib.h. To avoid compilation
|
||||||
|
errors, only prototype getopt for the GNU C library. */
|
||||||
extern int getopt (int argc, char *const *argv, const char *shortopts);
|
extern int getopt (int argc, char *const *argv, const char *shortopts);
|
||||||
|
#else /* not __GNU_LIBRARY__ */
|
||||||
|
extern int getopt ();
|
||||||
|
#endif /* not __GNU_LIBRARY__ */
|
||||||
extern int getopt_long (int argc, char *const *argv, const char *shortopts,
|
extern int getopt_long (int argc, char *const *argv, const char *shortopts,
|
||||||
const struct option *longopts, int *longind);
|
const struct option *longopts, int *longind);
|
||||||
extern int getopt_long_only (int argc, char *const *argv,
|
extern int getopt_long_only (int argc, char *const *argv,
|
||||||
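This getopt.h hunk replaces `enum _argtype` with the plain macros `no_argument`, `required_argument` and `optional_argument` for the `has_arg` field of `struct option`. A sketch of a long-option table built from those names, assuming a C library whose `<getopt.h>` provides `getopt_long()` (glibc does); the options `--verbose` and `--count` and the helper `parse_long_opts` are hypothetical:

```c
#include <getopt.h>   /* getopt_long(), struct option, no_argument, ... */
#include <stdlib.h>   /* atoi(), NULL */

/* Hypothetical option set: --verbose (no argument) and --count=N
   (required argument).  Returns the parsed count, or -1 on an
   unrecognized option; *verbose is set if --verbose was seen. */
static int parse_long_opts(int argc, char *argv[], int *verbose)
{
    static const struct option longopts[] = {
        { "verbose", no_argument,       NULL, 'v' },
        { "count",   required_argument, NULL, 'c' },
        { NULL,      0,                 NULL, 0 }    /* terminator */
    };
    int c;
    int count = 0;

    *verbose = 0;
    optind = 1;   /* restart scanning */
    opterr = 0;   /* no diagnostics on stderr */
    while ((c = getopt_long(argc, argv, "vc:", longopts, NULL)) != -1) {
        switch (c) {
        case 'v': *verbose = 1; break;
        case 'c': count = atoi(optarg); break;
        default:  return -1;      /* '?' for unknown options */
        }
    }
    return count;
}
```

Because the values are now ordinary integer macros, `has_arg` can be compared or stored in an `int` without relying on how a particular compiler treats enums.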
@@ -110,4 +122,8 @@ extern int getopt_long_only ();
|
|||||||
extern int _getopt_internal ();
|
extern int _getopt_internal ();
|
||||||
#endif /* not __STDC__ */
|
#endif /* not __STDC__ */
|
||||||
|
|
||||||
|
#ifdef __cplusplus
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
#endif /* _GETOPT_H */
|
#endif /* _GETOPT_H */
|
||||||
|
42
gnu/usr.bin/grep/getpagesize.h
Normal file
@@ -0,0 +1,42 @@
|
|||||||
|
#ifdef BSD
|
||||||
|
#ifndef BSD4_1
|
||||||
|
#define HAVE_GETPAGESIZE
|
||||||
|
#endif
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef HAVE_GETPAGESIZE
|
||||||
|
|
||||||
|
#ifdef VMS
|
||||||
|
#define getpagesize() 512
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef HAVE_UNISTD_H
|
||||||
|
#include <unistd.h>
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef _SC_PAGESIZE
|
||||||
|
#define getpagesize() sysconf(_SC_PAGESIZE)
|
||||||
|
#else
|
||||||
|
|
||||||
|
#ifdef HAVE_SYS_PARAM_H
|
||||||
|
#include <sys/param.h>
|
||||||
|
|
||||||
|
#ifdef EXEC_PAGESIZE
|
||||||
|
#define getpagesize() EXEC_PAGESIZE
|
||||||
|
#else
|
||||||
|
#ifdef NBPG
|
||||||
|
#define getpagesize() (NBPG * CLSIZE)
|
||||||
|
#ifndef CLSIZE
|
||||||
|
#define CLSIZE 1
|
||||||
|
#endif /* no CLSIZE */
|
||||||
|
#else /* no NBPG */
|
||||||
|
#define getpagesize() NBPC
|
||||||
|
#endif /* no NBPG */
|
||||||
|
#endif /* no EXEC_PAGESIZE */
|
||||||
|
#else /* !HAVE_SYS_PARAM_H */
|
||||||
|
#define getpagesize() 8192 /* punt totally */
|
||||||
|
#endif /* !HAVE_SYS_PARAM_H */
|
||||||
|
#endif /* no _SC_PAGESIZE */
|
||||||
|
|
||||||
|
#endif /* not HAVE_GETPAGESIZE */
|
||||||
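getpagesize.h falls back through several compile-time sources (`sysconf(_SC_PAGESIZE)`, `EXEC_PAGESIZE`, `NBPG * CLSIZE`, `NBPC`) and finally a hard-coded 8192. On a POSIX system the first of these is normally the one that applies. A minimal run-time sketch, assuming `<unistd.h>` is available (`page_size_or_default` is a hypothetical name), keeps the same last-resort default:

```c
#include <unistd.h>   /* sysconf(), _SC_PAGESIZE */

/* Hypothetical helper: ask the system for its page size, falling back to
   the same 8192-byte guess that getpagesize.h uses as a last resort. */
static long page_size_or_default(void)
{
#ifdef _SC_PAGESIZE
    long n = sysconf(_SC_PAGESIZE);   /* -1 if the value is unavailable */

    if (n > 0)
        return n;
#endif
    return 8192;
}
```

Unlike the macro chain in the header, this checks the value at run time, so a `sysconf` failure still yields a usable size.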
|
|
@@ -1,234 +1,375 @@
|
|||||||
.TH GREP 1 "1988 December 13" "GNU Project" \" -*- nroff -*-
|
.TH GREP 1 "1992 September 10" "GNU Project"
|
||||||
.UC 4
|
|
||||||
.SH NAME
|
.SH NAME
|
||||||
grep, egrep \- print lines matching a regular expression
|
grep, egrep, fgrep \- print lines matching a pattern
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
.B grep
|
.B grep
|
||||||
[
|
[
|
||||||
.B \-CVbchilnsvwx
|
.BR \- [[ AB "] ]\c"
|
||||||
] [
|
.I "num"
|
||||||
.BI \- num
|
]
|
||||||
] [
|
[
|
||||||
.B \-AB
|
.BR \- [ CEFGVBchilnsvwx ]
|
||||||
.I num
|
]
|
||||||
] [ [
|
[
|
||||||
.B \-e
|
.B \-e
|
||||||
]
|
]
|
||||||
.I expr
|
.I pattern
|
||||||
|
|
|
|
||||||
.B \-f
|
.BI \-f file
|
||||||
.I file
|
|
||||||
] [
|
] [
|
||||||
.I "files ..."
|
.I files...
|
||||||
]
|
]
|
||||||
.SH DESCRIPTION
|
.SH DESCRIPTION
|
||||||
.I Grep
|
|
||||||
searches the files listed in the arguments (or standard
|
|
||||||
input if no files are given) for all lines that contain a match for
|
|
||||||
the given
|
|
||||||
.IR expr .
|
|
||||||
If any lines match, they are printed.
|
|
||||||
.PP
|
.PP
|
||||||
Also, if any matches were found,
|
.B Grep
|
||||||
.I grep
|
searches the named input
|
||||||
exits with a status of 0, but if no matches were found it exits
|
.I files
|
||||||
with a status of 1. This is useful for building shell scripts that
|
(or standard input if no files are named, or
|
||||||
use
|
the file name
|
||||||
.I grep
|
.B \-
|
||||||
as a condition for, for example, the
|
is given)
|
||||||
.I if
|
for lines containing a match to the given
|
||||||
statement.
|
.IR pattern .
|
||||||
|
By default,
|
||||||
|
.B grep
|
||||||
|
prints the matching lines.
|
||||||
.PP
|
.PP
|
||||||
When invoked as
|
There are three major variants of
|
||||||
.I egrep
|
.BR grep ,
|
||||||
the syntax of the
|
controlled by the following options.
|
||||||
.I expr
|
.PD 0
|
||||||
is slightly different; See below.
|
|
||||||
.br
|
|
||||||
.SH "REGULAR EXPRESSIONS"
|
|
||||||
.RS 2.5i
|
|
||||||
.ta 1i 2i
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
(grep) (egrep) (explanation)
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\fIc\fP \fIc\fP a single (non-meta) character matches itself.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\&. . matches any single character except newline.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\? ? postfix operator; preceding item is optional.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\(** \(** postfix operator; preceding item 0 or
|
|
||||||
more times.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\+ + postfix operator; preceding item 1 or
|
|
||||||
more times.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\| | infix operator; matches either
|
|
||||||
argument.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
^ ^ matches the empty string at the beginning of a line.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
$ $ matches the empty string at the end of a line.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\< \\< matches the empty string at the beginning of a word.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\> \\> matches the empty string at the end of a word.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
[\fIchars\fP] [\fIchars\fP] match any character in the given class; if the
|
|
||||||
first character after [ is ^, match any character
|
|
||||||
not in the given class; a range of characters may
|
|
||||||
be specified by \fIfirst\-last\fP; for example, \\W
|
|
||||||
(below) is equivalent to the class [^A\-Za\-z0\-9]
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\( \\) ( ) parentheses are used to override operator precedence.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\\fIdigit\fP \\\fIdigit\fP \\\fIn\fP matches a repeat of the text
|
|
||||||
matched earlier in the regexp by the subexpression inside the nth
|
|
||||||
opening parenthesis.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\ \\ any special character may be preceded
|
|
||||||
by a backslash to match it literally.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
(the following are for compatibility with GNU Emacs)
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\b \\b matches the empty string at the edge of a word.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\B \\B matches the empty string if not at the edge of a word.
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\w \\w matches word-constituent characters (letters & digits).
|
|
||||||
.sp
|
|
||||||
.ti -2.0i
|
|
||||||
\\W \\W matches characters that are not word-constituent.
|
|
||||||
.RE
|
|
||||||
.PP
|
|
||||||
Operator precedence is (highest to lowest) ?, \(**, and +, concatenation,
|
|
||||||
and finally |. All other constructs are syntactically identical to
|
|
||||||
normal characters. For the truly interested, the file dfa.c describes
|
|
||||||
(and implements) the exact grammar understood by the parser.
|
|
||||||
.SH OPTIONS
|
|
||||||
.TP
|
.TP
|
||||||
.BI \-A " num"
|
.B \-G
|
||||||
print <num> lines of context after every matching line
|
Interpret
|
||||||
|
.I pattern
|
||||||
|
as a basic regular expression (see below). This is the default.
|
||||||
.TP
|
.TP
|
||||||
.BI \-B " num"
|
.B \-E
|
||||||
print
|
Interpret
|
||||||
.I num
|
.I pattern
|
||||||
lines of context before every matching line
|
as an extended regular expression (see below).
|
||||||
.TP
|
.TP
|
||||||
.B \-C
|
.B \-F
|
||||||
print 2 lines of context on each side of every match
|
Interpret
|
||||||
|
.I pattern
|
||||||
|
as a list of fixed strings, separated by newlines,
|
||||||
|
any of which is to be matched.
|
||||||
|
.LP
|
||||||
|
In addition, two variant programs
|
||||||
|
.B egrep
|
||||||
|
and
|
||||||
|
.B fgrep
|
||||||
|
are available.
|
||||||
|
.B Egrep
|
||||||
|
is similar (but not identical) to
|
||||||
|
.BR "grep\ \-E" ,
|
||||||
|
and is compatible with the historical Unix
|
||||||
|
.BR egrep .
|
||||||
|
.B Fgrep
|
||||||
|
is the same as
|
||||||
|
.BR "grep\ \-F" .
|
||||||
|
.PD
|
||||||
|
.LP
|
||||||
|
All variants of
|
||||||
|
.B grep
|
||||||
|
understand the following options:
|
||||||
|
.PD 0
|
||||||
.TP
|
.TP
|
||||||
.BI \- num
|
.BI \- num
|
||||||
print
|
Matches will be printed with
|
||||||
.I num
|
.I num
|
||||||
lines of context on each side of every match
|
lines of leading and trailing context. However,
|
||||||
|
.B grep
|
||||||
|
will never print any given line more than once.
|
||||||
|
.TP
|
||||||
|
.BI \-A " num"
|
||||||
|
Print
|
||||||
|
.I num
|
||||||
|
lines of trailing context after matching lines.
|
||||||
|
.TP
|
||||||
|
.BI \-B " num"
|
||||||
|
Print
|
||||||
|
.I num
|
||||||
|
lines of leading context before matching lines.
|
||||||
|
.TP
|
||||||
|
.B \-C
|
||||||
|
Equivalent to
|
||||||
|
.BR \-2 .
|
||||||
.TP
|
.TP
|
||||||
.B \-V
|
.B \-V
|
||||||
print the version number on the diagnostic output
|
Print the version number of
|
||||||
|
.B grep
|
||||||
|
to standard error. This version number should
|
||||||
|
be included in all bug reports (see below).
|
||||||
.TP
|
.TP
|
||||||
.B \-b
|
.B \-b
|
||||||
print every match preceded by its byte offset
|
Print the byte offset within the input file before
|
||||||
|
each line of output.
|
||||||
.TP
|
.TP
|
||||||
.B \-c
|
.B \-c
|
||||||
print a total count of matching lines only
|
Suppress normal output; instead print a count of
|
||||||
|
matching lines for each input file.
|
||||||
|
With the
|
||||||
|
.B \-v
|
||||||
|
option (see below), count non-matching lines.
|
||||||
.TP
|
.TP
|
||||||
.BI \-e " expr"
|
.BI \-e " pattern"
|
||||||
search for
|
Use
|
||||||
.IR expr ;
|
.I pattern
|
||||||
useful if
|
as the pattern; useful to protect patterns beginning with
|
||||||
.I expr
|
.BR \- .
|
||||||
begins with \-
|
|
||||||
.TP
|
.TP
|
||||||
.BI \-f " file"
|
.BI \-f " file"
|
||||||
search for the expression contained in
|
Obtain the pattern from
|
||||||
.I file
|
.IR file .
|
||||||
.TP
|
.TP
|
||||||
.B \-h
|
.B \-h
|
||||||
don't display filenames on matches
|
Suppress the prefixing of filenames on output
|
||||||
|
when multiple files are searched.
|
||||||
.TP
|
.TP
|
||||||
.B \-i
|
.B \-i
|
||||||
ignore case difference when comparing strings
|
Ignore case distinctions in both the
|
||||||
|
.I pattern
|
||||||
|
and the input files.
|
||||||
|
.TP
|
||||||
|
.B \-L
|
||||||
|
Suppress normal output; instead print the name
|
||||||
|
of each input file from which no output would
|
||||||
|
normally have been printed.
|
||||||
.TP
|
.TP
|
||||||
.B \-l
|
.B \-l
|
||||||
list files containing matches only
|
Suppress normal output; instead print
|
||||||
|
the name of each input file from which output
|
||||||
|
would normally have been printed.
|
||||||
.TP
|
.TP
|
||||||
.B \-n
|
.B \-n
|
||||||
print each match preceded by its line number
|
Prefix each line of output with the line number
|
||||||
|
within its input file.
|
||||||
|
.TP
|
||||||
|
.B \-q
|
||||||
|
Quiet; suppress normal output.
|
||||||
.TP
|
.TP
|
||||||
.B \-s
|
.B \-s
|
||||||
run silently producing no output except error messages
|
Suppress error messages about nonexistent or unreadable files.
|
||||||
.TP
|
.TP
|
||||||
.B \-v
|
.B \-v
|
||||||
print only lines that contain no matches for the <expr>
|
Invert the sense of matching, to select non-matching lines.
|
||||||
.TP
|
.TP
|
||||||
.B \-w
|
.B \-w
|
||||||
print only lines where the match is a complete word
|
Select only those lines containing matches that form whole words.
|
||||||
|
The test is that the matching substring must either be at the
|
||||||
|
beginning of the line, or preceded by a non-word constituent
|
||||||
|
character. Similarly, it must be either at the end of the line
|
||||||
|
or followed by a non-word constituent character. Word-constituent
|
||||||
|
characters are letters, digits, and the underscore.
|
||||||
.TP
|
.TP
|
||||||
.B \-x
|
.B \-x
|
||||||
print only lines where the match is a whole line
|
Select only those matches that exactly match the whole line.
|
||||||
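The -w test described above can be expressed as a predicate on the byte offsets of a candidate match. The C sketch below is illustrative only: the helper names is_word_char and is_whole_word are hypothetical, it assumes single-byte characters in the C locale, and it is not taken from grep's own source.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Word constituents as listed for -w: letters, digits, and underscore.
   Assumes single-byte characters in the C locale. */
static int is_word_char(unsigned char c)
{
  return isalnum(c) || c == '_';
}

/* Hypothetical helper: nonzero if the match occupying LINE[START, END)
   forms a whole word.  The match must start at the beginning of the line
   or after a non-word character, and end at the end of the line or before
   a non-word character. */
static int is_whole_word(const char *line, size_t len, size_t start, size_t end)
{
  assert(start <= end && end <= len);
  if (start > 0 && is_word_char((unsigned char) line[start - 1]))
    return 0;   /* glued to a word character on the left */
  if (end < len && is_word_char((unsigned char) line[end]))
    return 0;   /* glued to a word character on the right */
  return 1;
}
```

Because -w selects lines containing a whole-word match, a caller would presumably apply this check to each candidate match on a line rather than only to the first one.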
.SH "SEE ALSO"
|
.PD
|
||||||
emacs(1), ed(1), sh(1),
|
.SH "REGULAR EXPRESSIONS"
|
||||||
.I "GNU Emacs Manual"
|
|
||||||
.SH INCOMPATIBILITIES
|
|
||||||
The following incompatibilities with UNIX
|
|
||||||
.I grep
|
|
||||||
exist:
|
|
||||||
.PP
|
.PP
|
||||||
.RS 0.5i
|
A regular expression is a pattern that describes a set of strings.
|
||||||
The context-dependent meaning of \(** is not quite the same (grep only).
|
Regular expressions are constructed analogously to arithmetic
|
||||||
|
expressions, by using various operators to combine smaller expressions.
|
||||||
.PP
|
.PP
|
||||||
.B \-b
|
.B Grep
|
||||||
prints a byte offset instead of a block offset.
|
understands two different versions of regular expression syntax:
|
||||||
|
``basic'' and ``extended.'' In
|
||||||
|
.RB "GNU\ " grep ,
|
||||||
|
there is no difference in available functionality using either syntax.
|
||||||
|
In other implementations, basic regular expressions are less powerful.
|
||||||
|
The following description applies to extended regular expressions;
|
||||||
|
differences for basic regular expressions are summarized afterwards.
|
||||||
.PP
|
.PP
|
||||||
The {\fIm,n\fP} construct of System V grep is not implemented.
|
The fundamental building blocks are the regular expressions that match
|
||||||
|
a single character. Most characters, including all letters and digits,
|
||||||
|
are regular expressions that match themselves. Any metacharacter with
|
||||||
|
special meaning may be quoted by preceding it with a backslash.
|
||||||
.PP
|
.PP
|
||||||
|
A list of characters enclosed by
|
||||||
|
.B [
|
||||||
|
and
|
||||||
|
.B ]
|
||||||
|
matches any single
|
||||||
|
character in that list; if the first character of the list
|
||||||
|
is the caret
|
||||||
|
.B ^
|
||||||
|
then it matches any character
|
||||||
|
.I not
|
||||||
|
in the list.
|
||||||
|
For example, the regular expression
|
||||||
|
.B [0123456789]
|
||||||
|
matches any single digit. A range of ASCII characters
|
||||||
|
may be specified by giving the first and last characters, separated
|
||||||
|
by a hyphen.
|
||||||
|
Finally, certain named classes of characters are predefined.
|
||||||
|
Their names are self-explanatory, and they are
|
||||||
|
.BR [:alnum:] ,
|
||||||
|
.BR [:alpha:] ,
|
||||||
|
.BR [:cntrl:] ,
|
||||||
|
.BR [:digit:] ,
|
||||||
|
.BR [:graph:] ,
|
||||||
|
.BR [:lower:] ,
|
||||||
|
.BR [:print:] ,
|
||||||
|
.BR [:punct:] ,
|
||||||
|
.BR [:space:] ,
|
||||||
|
.BR [:upper:] ,
|
||||||
|
and
|
||||||
|
.BR [:xdigit:] .
|
||||||
|
For example,
|
||||||
|
.B [[:alnum:]]
|
||||||
|
means
|
||||||
|
.BR [0-9A-Za-z] ,
|
||||||
|
except the latter form is dependent upon the ASCII character encoding,
|
||||||
|
whereas the former is portable.
|
||||||
|
(Note that the brackets in these class names are part of the symbolic
|
||||||
|
names, and must be included in addition to the brackets delimiting
|
||||||
|
the bracket list.) Most metacharacters lose their special meaning
|
||||||
|
inside lists. To include a literal
|
||||||
|
.B ]
|
||||||
|
place it first in the list. Similarly, to include a literal
|
||||||
|
.B ^
|
||||||
|
place it anywhere but first. Finally, to include a literal
|
||||||
|
.B \-
|
||||||
|
place it last.
|
||||||
|
.PP
|
||||||
|
The period
|
||||||
|
.B .
|
||||||
|
matches any single character.
|
||||||
|
The symbol
|
||||||
|
.B \ew
|
||||||
|
is a synonym for
|
||||||
|
.B [[:alnum:]]
|
||||||
|
and
|
||||||
|
.B \eW
|
||||||
|
is a synonym for
|
||||||
|
.BR [^[:alnum:]] .
|
||||||
|
.PP
|
||||||
|
The caret
|
||||||
|
.B ^
|
||||||
|
and the dollar sign
|
||||||
|
.B $
|
||||||
|
are metacharacters that respectively match the empty string at the
|
||||||
|
beginning and end of a line.
|
||||||
|
The symbols
|
||||||
|
.B \e<
|
||||||
|
and
|
||||||
|
.B \e>
|
||||||
|
respectively match the empty string at the beginning and end of a word.
|
||||||
|
The symbol
|
||||||
|
.B \eb
|
||||||
|
matches the empty string at the edge of a word,
|
||||||
|
and
|
||||||
|
.B \eB
|
||||||
|
matches the empty string provided it's
|
||||||
|
.I not
|
||||||
|
at the edge of a word.
|
||||||
|
.PP
|
||||||
|
A regular expression matching a single character may be followed
|
||||||
|
by one of several repetition operators:
|
||||||
|
.PD 0
|
||||||
|
.TP
|
||||||
|
.B ?
|
||||||
|
The preceding item is optional and matched at most once.
|
||||||
|
.TP
|
||||||
|
.B *
|
||||||
|
The preceding item will be matched zero or more times.
|
||||||
|
.TP
|
||||||
|
.B +
|
||||||
|
The preceding item will be matched one or more times.
|
||||||
|
.TP
|
||||||
|
.BI { n }
|
||||||
|
The preceding item is matched exactly
|
||||||
|
.I n
|
||||||
|
times.
|
||||||
|
.TP
|
||||||
|
.BI { n ,}
|
||||||
|
The preceding item is matched
|
||||||
|
.I n
|
||||||
|
or more times.
|
||||||
|
.TP
|
||||||
|
.BI {, m }
|
||||||
|
The preceding item is optional and is matched at most
|
||||||
|
.I m
|
||||||
|
times.
|
||||||
|
.TP
|
||||||
|
.BI { n , m }
|
||||||
|
The preceding item is matched at least
|
||||||
|
.I n
|
||||||
|
times, but not more than
|
||||||
|
.I m
|
||||||
|
times.
|
||||||
|
.PD
|
||||||
|
.PP
|
||||||
|
Two regular expressions may be concatenated; the resulting
|
||||||
|
regular expression matches any string formed by concatenating
|
||||||
|
two substrings that respectively match the concatenated
|
||||||
|
subexpressions.
|
||||||
|
.PP
|
||||||
|
Two regular expressions may be joined by the infix operator
|
||||||
|
.BR | ;
|
||||||
|
the resulting regular expression matches any string matching
|
||||||
|
either subexpression.
|
||||||
|
.PP
|
||||||
|
Repetition takes precedence over concatenation, which in turn
|
||||||
|
takes precedence over alternation. A whole subexpression may be
|
||||||
|
enclosed in parentheses to override these precedence rules.
|
||||||
|
.PP
|
||||||
|
The backreference
|
||||||
|
.BI \e n\c
|
||||||
|
\&, where
|
||||||
|
.I n
|
||||||
|
is a single digit, matches the substring
|
||||||
|
previously matched by the
|
||||||
|
.IR n th
|
||||||
|
parenthesized subexpression of the regular expression.
|
||||||
|
.PP
|
||||||
|
In basic regular expressions the metacharacters
|
||||||
|
.BR ? ,
|
||||||
|
.BR + ,
|
||||||
|
.BR { ,
|
||||||
|
.BR | ,
|
||||||
|
.BR ( ,
|
||||||
|
and
|
||||||
|
.BR )
|
||||||
|
lose their special meaning; instead use the backslashed
|
||||||
|
versions
|
||||||
|
.BR \e? ,
|
||||||
|
.BR \e+ ,
|
||||||
|
.BR \e{ ,
|
||||||
|
.BR \e| ,
|
||||||
|
.BR \e( ,
|
||||||
|
and
|
||||||
|
.BR \e) .
|
||||||
|
.PP
|
||||||
|
In
|
||||||
|
.B egrep
|
||||||
|
the metacharacter
|
||||||
|
.B {
|
||||||
|
loses its special meaning; instead use
|
||||||
|
.BR \e{ .
|
||||||
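To see the basic/extended distinction in practice, the sketch below uses the POSIX regcomp/regexec interface from <regex.h> as a stand-in. Grep itself compiles patterns with its own dfa.c and regex.c matchers, so this approximates the syntax rules above rather than grep's implementation. The helper name matches is hypothetical, and only POSIX-specified constructs are exercised; GNU extensions such as \< \>, \w, and backslashed ? + | in basic syntax may not be available in every C library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if TEXT contains a match for PATTERN, 0 if not,
   -1 if PATTERN does not compile.  CFLAGS is 0 for basic syntax or
   REG_EXTENDED for extended syntax. */
static int matches(const char *pattern, int cflags, const char *text)
{
  regex_t re;
  int rc;

  assert(pattern != NULL && text != NULL);
  if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  rc = regexec(&re, text, 0, NULL, 0);
  regfree(&re);
  return rc == 0;
}
```

In basic syntax "a+" matches only the literal string a+, while with REG_EXTENDED it matches one or more a's; likewise the basic interval ab\{2\}c corresponds to the extended ab{2}c, and grouping for a backreference is written \(ab\)\1 in basic syntax.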
|
.SH DIAGNOSTICS
|
||||||
|
.PP
|
||||||
|
Normally, exit status is 0 if matches were found,
|
||||||
|
and 1 if no matches were found. (The
|
||||||
|
.B \-v
|
||||||
|
option inverts the sense of the exit status.)
|
||||||
|
Exit status is 2 if there were syntax errors
|
||||||
|
in the pattern, inaccessible input files, or
|
||||||
|
other system errors.
|
||||||
.SH BUGS
|
.SH BUGS
|
||||||
GNU \fIe?grep\fP has been thoroughly debugged and tested over a period
|
|
||||||
of several years; we think it's a reliable beast or we wouldn't
|
|
||||||
distribute it. If by some fluke of the universe you discover a bug,
|
|
||||||
send a detailed description (including options, regular expressions,
|
|
||||||
and a copy of an input file that can reproduce it) to mike@ai.mit.edu.
|
|
||||||
.PP
|
.PP
|
||||||
.SH AUTHORS
|
Email bug reports to
|
||||||
Mike Haertel wrote the deterministic regexp code and the bulk
|
.BR bug-gnu-utils@prep.ai.mit.edu .
|
||||||
of the program.
|
Be sure to include the word ``grep'' somewhere in the ``Subject:'' field.
|
||||||
.PP
|
.PP
|
||||||
James A. Woods is responsible for the hybridized search strategy
|
Large repetition counts in the
|
||||||
of using Boyer-Moore-Gosper fixed-string search as a filter
|
.BI { m , n }
|
||||||
before calling the general regexp matcher.
|
construct may cause grep to use lots of memory.
|
||||||
|
In addition,
|
||||||
|
certain other obscure regular expressions require exponential time
|
||||||
|
and space, and may cause
|
||||||
|
.B grep
|
||||||
|
to run out of memory.
|
||||||
.PP
|
.PP
|
||||||
Arthur David Olson contributed code that finds fixed strings for
|
Backreferences are very slow, and may require exponential time.
|
||||||
the aforementioned BMG search for a large class of regexps.
|
|
||||||
.PP
|
|
||||||
Richard Stallman wrote the backtracking regexp matcher that is used
|
|
||||||
for \\\fIdigit\fP backreferences, as well as the GNU getopt. The
|
|
||||||
backtracking matcher was originally written for GNU Emacs.
|
|
||||||
.PP
|
|
||||||
D. A. Gwyn wrote the C alloca emulation that is provided so
|
|
||||||
System V machines can run this program. (Alloca is used only
|
|
||||||
by RMS' backtracking matcher, and then only rarely, so there
|
|
||||||
is no loss if your machine doesn't have a "real" alloca.)
|
|
||||||
.PP
|
|
||||||
Scott Anderson and Henry Spencer designed the regression tests
|
|
||||||
used in the "regress" script.
|
|
||||||
.PP
|
|
||||||
Paul Placeway wrote the original version of this manual page.
|
|
||||||
|
File diff suppressed because it is too large
53
gnu/usr.bin/grep/grep.h
Normal file
@ -0,0 +1,53 @@
|
|||||||
|
/* grep.h - interface to grep driver for searching subroutines.
|
||||||
|
Copyright (C) 1992 Free Software Foundation, Inc.
|
||||||
|
|
||||||
|
This program is free software; you can redistribute it and/or modify
|
||||||
|
it under the terms of the GNU General Public License as published by
|
||||||
|
the Free Software Foundation; either version 2, or (at your option)
|
||||||
|
any later version.
|
||||||
|
|
||||||
|
This program is distributed in the hope that it will be useful,
|
||||||
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
GNU General Public License for more details.
|
||||||
|
|
||||||
|
You should have received a copy of the GNU General Public License
|
||||||
|
along with this program; if not, write to the Free Software
|
||||||
|
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||||
|
|
||||||
|
#if __STDC__
|
||||||
|
|
||||||
|
extern void fatal(const char *, int);
|
||||||
|
|
||||||
|
/* Grep.c expects the matchers vector to be terminated
|
||||||
|
by an entry with a NULL name, and to contain at least
|
||||||
|
an entry named "default". */
|
||||||
|
|
||||||
|
extern struct matcher
|
||||||
|
{
|
||||||
|
char *name;
|
||||||
|
void (*compile)(char *, size_t);
|
||||||
|
char *(*execute)(char *, size_t, char **);
|
||||||
|
} matchers[];
|
||||||
|
|
||||||
|
#else
|
||||||
|
|
||||||
|
extern void fatal();
|
||||||
|
|
||||||
|
extern struct matcher
|
||||||
|
{
|
||||||
|
char *name;
|
||||||
|
void (*compile)();
|
||||||
|
char *(*execute)();
|
||||||
|
} matchers[];
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* Exported from grep.c. */
|
||||||
|
extern char *matcher;
|
||||||
|
|
||||||
|
/* The following flags are exported from grep for the matchers
|
||||||
|
to look at. */
|
||||||
|
extern int match_icase; /* -i */
|
||||||
|
extern int match_words; /* -w */
|
||||||
|
extern int match_lines; /* -x */
|
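The comment in grep.h requires the matchers vector to end with an entry whose name is NULL and to include an entry named "default". A minimal sketch of a table meeting that contract, with a lookup over it, follows. The entry name "fixed", the functions find_matcher, naive_compile and naive_execute, and the meaning assumed for execute (return the start of the first match and store its end through the third argument, or return NULL) are hypothetical illustrations, not grep's actual matchers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the matcher structure declared in grep.h. */
struct matcher
{
  char *name;
  void (*compile)(char *, size_t);
  char *(*execute)(char *, size_t, char **);
};

/* Hypothetical fixed-string matcher state. */
static char pat[256];
static size_t patlen;

static void naive_compile(char *p, size_t n)
{
  assert(p != NULL || n == 0);
  if (n > sizeof pat)
    n = sizeof pat;              /* truncate overly long patterns */
  memcpy(pat, p, n);
  patlen = n;
}

/* Assumed contract: return the start of the first match in BUF[0, SIZE)
   and store its end in *ENDP, or return NULL if there is none. */
static char *naive_execute(char *buf, size_t size, char **endp)
{
  size_t i;

  for (i = 0; i + patlen <= size; ++i)
    if (memcmp(buf + i, pat, patlen) == 0)
      {
        *endp = buf + i + patlen;
        return buf + i;
      }
  return NULL;
}

/* Terminated by a NULL name and containing "default", as grep.h requires. */
static struct matcher matchers[] =
{
  { "default", naive_compile, naive_execute },
  { "fixed", naive_compile, naive_execute },
  { NULL, NULL, NULL }
};

/* Return the matcher called NAME, or NULL if the table has none. */
static struct matcher *find_matcher(const char *name)
{
  struct matcher *m;

  for (m = matchers; m->name != NULL; ++m)
    if (strcmp(m->name, name) == 0)
      return m;
  return NULL;
}
```

The NULL-name sentinel lets a lookup loop like find_matcher walk the table without a separately stored length.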
805
gnu/usr.bin/grep/kwset.c
Normal file
@ -0,0 +1,805 @@
|
|||||||
|
/* kwset.c - search for any of a set of keywords.
|
||||||
|
Copyright 1989 Free Software Foundation
|
||||||
|
Written August 1989 by Mike Haertel.
|
||||||
|
|
||||||
|
This program is free software; you can redistribute it and/or modify
|
||||||
|
it under the terms of the GNU General Public License as published by
|
||||||
|
the Free Software Foundation; either version 1, or (at your option)
|
||||||
|
any later version.
|
||||||
|
|
||||||
|
This program is distributed in the hope that it will be useful,
|
||||||
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
GNU General Public License for more details.
|
||||||
|
|
||||||
|
You should have received a copy of the GNU General Public License
|
||||||
|
along with this program; if not, write to the Free Software
|
||||||
|
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||||
|
|
||||||
|
The author may be reached (Email) at the address mike@ai.mit.edu,
|
||||||
|
or (US mail) as Mike Haertel c/o Free Software Foundation. */
|
||||||
|
|
||||||
|
/* The algorithm implemented by these routines bears a startling resemblance
|
||||||
|
to one discovered by Beate Commentz-Walter, although it is not identical.
|
||||||
|
See "A String Matching Algorithm Fast on the Average," Technical Report,
|
||||||
|
IBM-Germany, Scientific Center Heidelberg, Tiergartenstrasse 15, D-6900
|
||||||
|
Heidelberg, Germany. See also Aho, A.V., and M. Corasick, "Efficient
|
||||||
|
String Matching: An Aid to Bibliographic Search," CACM June 1975,
|
||||||
|
Vol. 18, No. 6, which describes the failure function used below. */
|
||||||
|
|
||||||
|
|
||||||
|
#ifdef STDC_HEADERS
|
||||||
|
#include <limits.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#else
|
||||||
|
#define INT_MAX 2147483647
|
||||||
|
#define UCHAR_MAX 255
|
||||||
|
#ifdef __STDC__
|
||||||
|
#include <stddef.h>
|
||||||
|
#else
|
||||||
|
#include <sys/types.h>
|
||||||
|
#endif
|
||||||
|
extern char *malloc();
|
||||||
|
extern void free();
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef HAVE_MEMCHR
|
||||||
|
#include <string.h>
|
||||||
|
#ifdef NEED_MEMORY_H
|
||||||
|
#include <memory.h>
|
||||||
|
#endif
|
||||||
|
#else
|
||||||
|
#ifdef __STDC__
|
||||||
|
extern void *memchr();
|
||||||
|
#else
|
||||||
|
extern char *memchr();
|
||||||
|
#endif
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef GREP
|
||||||
|
extern char *xmalloc();
|
||||||
|
#define malloc xmalloc
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#include "kwset.h"
|
||||||
|
#include "obstack.h"
|
||||||
|
|
||||||
|
#define NCHAR (UCHAR_MAX + 1)
|
||||||
|
#define obstack_chunk_alloc malloc
|
||||||
|
#define obstack_chunk_free free
|
||||||
|
|
||||||
|
/* Balanced tree of edges and labels leaving a given trie node. */
|
||||||
|
struct tree
|
||||||
|
{
|
||||||
|
struct tree *llink; /* Left link; MUST be first field. */
|
||||||
|
struct tree *rlink; /* Right link (to larger labels). */
|
||||||
|
struct trie *trie; /* Trie node pointed to by this edge. */
|
||||||
|
unsigned char label; /* Label on this edge. */
|
||||||
|
char balance; /* Difference in depths of subtrees. */
|
||||||
|
};
|
||||||
|
|
||||||
|
/* Node of a trie representing a set of reversed keywords. */
|
||||||
|
struct trie
|
||||||
|
{
|
||||||
|
unsigned int accepting; /* Word index of accepted word, or zero. */
|
||||||
|
struct tree *links; /* Tree of edges leaving this node. */
|
||||||
|
struct trie *parent; /* Parent of this node. */
|
||||||
|
struct trie *next; /* List of all trie nodes in level order. */
|
||||||
|
struct trie *fail; /* Aho-Corasick failure function. */
|
||||||
|
int depth; /* Depth of this node from the root. */
|
||||||
|
int shift; /* Shift function for search failures. */
|
||||||
|
int maxshift; /* Max shift of self and descendants. */
|
||||||
|
};
|
||||||
|
|
||||||
|
/* Structure returned opaquely to the caller, containing everything. */
|
||||||
|
struct kwset
|
||||||
|
{
|
||||||
|
struct obstack obstack; /* Obstack for node allocation. */
|
||||||
|
int words; /* Number of words in the trie. */
|
||||||
|
struct trie *trie; /* The trie itself. */
|
||||||
|
int mind; /* Minimum depth of an accepting node. */
|
||||||
|
int maxd; /* Maximum depth of any node. */
|
||||||
|
unsigned char delta[NCHAR]; /* Delta table for rapid search. */
|
||||||
|
struct trie *next[NCHAR]; /* Table of children of the root. */
|
||||||
|
char *target; /* Target string if there's only one. */
|
||||||
|
int mind2; /* Used in Boyer-Moore search for one string. */
|
||||||
|
char *trans; /* Character translation table. */
|
||||||
|
};
|
||||||
|
|
||||||
|
/* Allocate and initialize a keyword set object, returning an opaque
|
||||||
|
pointer to it. Return NULL if memory is not available. */
|
||||||
|
kwset_t
|
||||||
|
kwsalloc(trans)
|
||||||
|
char *trans;
|
||||||
|
{
|
||||||
|
struct kwset *kwset;
|
||||||
|
|
||||||
|
kwset = (struct kwset *) malloc(sizeof (struct kwset));
|
||||||
|
if (!kwset)
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
obstack_init(&kwset->obstack);
|
||||||
|
kwset->words = 0;
|
||||||
|
kwset->trie
|
||||||
|
= (struct trie *) obstack_alloc(&kwset->obstack, sizeof (struct trie));
|
||||||
|
if (!kwset->trie)
|
||||||
|
{
|
||||||
|
kwsfree((kwset_t) kwset);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
kwset->trie->accepting = 0;
|
||||||
|
kwset->trie->links = 0;
|
||||||
|
kwset->trie->parent = 0;
|
||||||
|
kwset->trie->next = 0;
|
||||||
|
kwset->trie->fail = 0;
|
||||||
|
kwset->trie->depth = 0;
|
||||||
|
kwset->trie->shift = 0;
|
||||||
|
kwset->mind = INT_MAX;
|
||||||
|
kwset->maxd = -1;
|
||||||
|
kwset->target = 0;
|
||||||
|
kwset->trans = trans;
|
||||||
|
|
||||||
|
return (kwset_t) kwset;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Add the given string to the contents of the keyword set. Return NULL
|
||||||
|
for success, an error message otherwise. */
|
||||||
|
char *
|
||||||
|
kwsincr(kws, text, len)
|
||||||
|
kwset_t kws;
|
||||||
|
char *text;
|
||||||
|
size_t len;
|
||||||
|
{
|
||||||
|
struct kwset *kwset;
|
||||||
|
register struct trie *trie;
|
||||||
|
register unsigned char label;
|
||||||
|
register struct tree *link;
|
||||||
|
register int depth;
|
||||||
|
struct tree *links[12];
|
||||||
|
enum { L, R } dirs[12];
|
||||||
|
struct tree *t, *r, *l, *rl, *lr;
|
||||||
|
|
||||||
|
kwset = (struct kwset *) kws;
|
||||||
|
trie = kwset->trie;
|
||||||
|
text += len;
|
||||||
|
|
||||||
|
/* Descend the trie (built of reversed keywords) character-by-character,
|
||||||
|
installing new nodes when necessary. */
|
||||||
|
while (len--)
|
||||||
|
{
|
||||||
|
label = kwset->trans ? kwset->trans[(unsigned char) *--text] : *--text;
|
||||||
|
|
||||||
|
/* Descend the tree of outgoing links for this trie node,
|
||||||
|
looking for the current character and keeping track
|
||||||
|
of the path followed. */
|
||||||
|
link = trie->links;
|
||||||
|
links[0] = (struct tree *) &trie->links;
|
||||||
|
dirs[0] = L;
|
||||||
|
depth = 1;
|
||||||
|
|
||||||
|
while (link && label != link->label)
|
||||||
|
{
|
||||||
|
links[depth] = link;
|
||||||
|
if (label < link->label)
|
||||||
|
dirs[depth++] = L, link = link->llink;
|
||||||
|
else
|
||||||
|
dirs[depth++] = R, link = link->rlink;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* The current character doesn't have an outgoing link at
|
||||||
|
this trie node, so build a new trie node and install
|
||||||
|
a link in the current trie node's tree. */
|
||||||
|
if (!link)
|
||||||
|
{
|
||||||
|
link = (struct tree *) obstack_alloc(&kwset->obstack,
|
||||||
|
sizeof (struct tree));
|
||||||
|
if (!link)
|
||||||
|
return "memory exhausted";
|
||||||
|
link->llink = 0;
|
||||||
|
link->rlink = 0;
|
||||||
|
link->trie = (struct trie *) obstack_alloc(&kwset->obstack,
|
||||||
|
sizeof (struct trie));
|
||||||
|
if (!link->trie)
|
||||||
|
return "memory exhausted";
|
||||||
|
link->trie->accepting = 0;
|
||||||
|
link->trie->links = 0;
|
||||||
|
link->trie->parent = trie;
|
||||||
|
link->trie->next = 0;
|
||||||
|
link->trie->fail = 0;
|
||||||
|
link->trie->depth = trie->depth + 1;
|
||||||
|
link->trie->shift = 0;
|
||||||
|
link->label = label;
|
||||||
|
link->balance = 0;
|
||||||
|
|
||||||
|
/* Install the new tree node in its parent. */
|
||||||
|
if (dirs[--depth] == L)
|
||||||
|
links[depth]->llink = link;
|
||||||
|
else
|
||||||
|
links[depth]->rlink = link;
|
||||||
|
|
||||||
|
/* Back up the tree fixing the balance flags. */
|
||||||
|
while (depth && !links[depth]->balance)
|
||||||
|
{
|
||||||
|
if (dirs[depth] == L)
|
||||||
|
--links[depth]->balance;
|
||||||
|
else
|
||||||
|
++links[depth]->balance;
|
||||||
|
--depth;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Rebalance the tree by pointer rotations if necessary. */
|
||||||
|
if (depth && ((dirs[depth] == L && --links[depth]->balance)
|
||||||
|
|| (dirs[depth] == R && ++links[depth]->balance)))
|
||||||
|
{
|
||||||
|
switch (links[depth]->balance)
|
||||||
|
{
|
||||||
|
case (char) -2:
|
||||||
|
switch (dirs[depth + 1])
|
||||||
|
{
|
||||||
|
case L:
|
||||||
|
r = links[depth], t = r->llink, rl = t->rlink;
|
||||||
|
t->rlink = r, r->llink = rl;
|
||||||
|
t->balance = r->balance = 0;
|
||||||
|
break;
|
||||||
|
case R:
|
||||||
|
r = links[depth], l = r->llink, t = l->rlink;
|
||||||
|
rl = t->rlink, lr = t->llink;
|
||||||
|
t->llink = l, l->rlink = lr, t->rlink = r, r->llink = rl;
|
||||||
|
l->balance = t->balance != 1 ? 0 : -1;
|
||||||
|
r->balance = t->balance != (char) -1 ? 0 : 1;
|
||||||
|
t->balance = 0;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
switch (dirs[depth + 1])
|
||||||
|
{
|
||||||
|
case R:
|
||||||
|
l = links[depth], t = l->rlink, lr = t->llink;
|
||||||
|
t->llink = l, l->rlink = lr;
|
||||||
|
t->balance = l->balance = 0;
|
||||||
|
break;
|
||||||
|
case L:
|
||||||
|
l = links[depth], r = l->rlink, t = r->llink;
|
||||||
|
lr = t->llink, rl = t->rlink;
|
||||||
|
t->llink = l, l->rlink = lr, t->rlink = r, r->llink = rl;
|
||||||
|
l->balance = t->balance != 1 ? 0 : -1;
|
||||||
|
r->balance = t->balance != (char) -1 ? 0 : 1;
|
||||||
|
t->balance = 0;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (dirs[depth - 1] == L)
|
||||||
|
links[depth - 1]->llink = t;
|
||||||
|
else
|
||||||
|
links[depth - 1]->rlink = t;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
trie = link->trie;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Mark the node we finally reached as accepting, encoding the
|
||||||
|
index number of this word in the keyword set so far. */
|
||||||
|
if (!trie->accepting)
|
||||||
|
trie->accepting = 1 + 2 * kwset->words;
|
||||||
|
++kwset->words;
|
||||||
|
|
||||||
|
/* Keep track of the longest and shortest string of the keyword set. */
|
||||||
|
if (trie->depth < kwset->mind)
|
||||||
|
kwset->mind = trie->depth;
|
||||||
|
if (trie->depth > kwset->maxd)
|
||||||
|
kwset->maxd = trie->depth;
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Enqueue the trie nodes referenced from the given tree in the
|
||||||
|
given queue. */
|
||||||
|
static void
|
||||||
|
enqueue(tree, last)
|
||||||
|
struct tree *tree;
|
||||||
|
struct trie **last;
|
||||||
|
{
|
||||||
|
if (!tree)
|
||||||
|
return;
|
||||||
|
enqueue(tree->llink, last);
|
||||||
|
enqueue(tree->rlink, last);
|
||||||
|
(*last) = (*last)->next = tree->trie;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Compute the Aho-Corasick failure function for the trie nodes referenced
|
||||||
|
from the given tree, given the failure function for their parent as
|
||||||
|
well as a last resort failure node. */
|
||||||
|
static void
|
||||||
|
treefails(tree, fail, recourse)
|
||||||
|
register struct tree *tree;
|
||||||
|
struct trie *fail;
|
||||||
|
struct trie *recourse;
|
||||||
|
{
|
||||||
|
register struct tree *link;
|
||||||
|
|
||||||
|
if (!tree)
|
||||||
|
return;
|
||||||
|
|
||||||
|
treefails(tree->llink, fail, recourse);
|
||||||
|
treefails(tree->rlink, fail, recourse);
|
||||||
|
|
||||||
|
/* Find, in the chain of fails going back to the root, the first
|
||||||
|
node that has a descendant on the current label. */
|
||||||
|
while (fail)
|
||||||
|
{
|
||||||
|
link = fail->links;
|
||||||
|
while (link && tree->label != link->label)
|
||||||
|
if (tree->label < link->label)
|
||||||
|
link = link->llink;
|
||||||
|
else
|
||||||
|
link = link->rlink;
|
||||||
|
if (link)
|
||||||
|
{
|
||||||
|
tree->trie->fail = link->trie;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
fail = fail->fail;
|
||||||
|
}
|
||||||
|
|
||||||
|
tree->trie->fail = recourse;
|
||||||
|
}
|
||||||
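treefails computes the Aho-Corasick failure function cited at the top of this file: for each node, the node for the longest proper suffix of its string that is also present in the trie. As a simplified, self-contained sketch of the same idea, the code below builds a forward trie over dense arrays and computes failure links breadth-first. Unlike kwset.c it neither reverses the keywords nor stores edges in balanced trees; the capacity MAXNODES and the names trie_add and compute_fails are hypothetical.

```c
#include <assert.h>

#define MAXNODES 64          /* hypothetical capacity for this sketch */
#define NSYM 256             /* one edge slot per byte value */

/* Toy trie in dense arrays.  Node 0 is the root; child[n][c] is the node
   reached from n on byte c, or 0 if there is no such edge. */
static int child[MAXNODES][NSYM];
static int fail_link[MAXNODES];
static int nnodes = 1;

/* Insert WORD (or walk an existing path) and return its final node. */
static int trie_add(const char *word)
{
  int n = 0;

  for (; *word; ++word)
    {
      unsigned char c = (unsigned char) *word;
      if (!child[n][c])
        {
          assert(nnodes < MAXNODES);
          child[n][c] = nnodes++;
        }
      n = child[n][c];
    }
  return n;
}

/* Compute failure links in level order.  For a node V reached from U on
   byte C, follow U's failure chain until some node has an edge on C; the
   target of that edge is V's failure node, otherwise the root. */
static void compute_fails(void)
{
  int queue[MAXNODES];
  int head = 0, tail = 0, c;

  for (c = 0; c < NSYM; ++c)
    if (child[0][c])
      {
        fail_link[child[0][c]] = 0;   /* depth-1 nodes fail to the root */
        queue[tail++] = child[0][c];
      }

  while (head < tail)
    {
      int u = queue[head++];
      for (c = 0; c < NSYM; ++c)
        {
          int v = child[u][c];
          int f;
          if (!v)
            continue;
          f = fail_link[u];
          while (f != 0 && !child[f][c])
            f = fail_link[f];
          fail_link[v] = child[f][c];   /* 0 (the root) if no edge on C */
          queue[tail++] = v;
        }
    }
}
```

For the classic keyword set he, she, his, hers, the node for "she" fails to the node for "he", and the nodes for "his" and "hers" both fail to the node for "s". Processing nodes in level order guarantees that a parent's failure link is final before its children are visited, which is also why kwset.c threads its trie nodes into a level-order list.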
|
|
||||||
|
/* Set delta entries for the links of the given tree such that
|
||||||
|
the preexisting delta value is larger than the current depth. */
|
||||||
|
static void
|
||||||
|
treedelta(tree, depth, delta)
|
||||||
|
register struct tree *tree;
|
||||||
|
register unsigned int depth;
|
||||||
|
unsigned char delta[];
|
||||||
|
{
|
||||||
|
if (!tree)
|
||||||
|
return;
|
||||||
|
treedelta(tree->llink, depth, delta);
|
||||||
|
treedelta(tree->rlink, depth, delta);
|
||||||
|
if (depth < delta[tree->label])
|
||||||
|
delta[tree->label] = depth;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Return true if A has every label in B. */
|
||||||
|
static int
|
||||||
|
hasevery(a, b)
|
||||||
|
register struct tree *a;
|
||||||
|
register struct tree *b;
|
||||||
|
{
|
||||||
|
if (!b)
|
||||||
|
return 1;
|
||||||
|
if (!hasevery(a, b->llink))
|
||||||
|
return 0;
|
||||||
|
if (!hasevery(a, b->rlink))
|
||||||
|
return 0;
|
||||||
|
while (a && b->label != a->label)
|
||||||
|
if (b->label < a->label)
|
||||||
|
a = a->llink;
|
||||||
|
else
|
||||||
|
a = a->rlink;
|
||||||
|
return !!a;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Compute a vector, indexed by character code, of the trie nodes
|
||||||
|
referenced from the given tree. */
|
||||||
|
static void
|
||||||
|
treenext(tree, next)
|
||||||
|
struct tree *tree;
|
||||||
|
struct trie *next[];
|
||||||
|
{
|
||||||
|
if (!tree)
|
||||||
|
return;
|
||||||
|
treenext(tree->llink, next);
|
||||||
|
treenext(tree->rlink, next);
|
||||||
|
next[tree->label] = tree->trie;
|
||||||
|
}
|
||||||
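When the set holds a single keyword and no translation table is in use, kwsprep (which follows) extracts the keyword from the trie and builds a plain Boyer-Moore delta table, delta[c] = mind - (i + 1) for each position i, plus a second shift mind2 used after a failed backward comparison. The sketch below reproduces those two formulas inside a standalone Horspool-style search loop. The names bm_prep and bm_search, the memcmp-based verification, and the restriction to patterns shorter than 256 bytes (so shifts fit in an unsigned char, as in kwset.c's delta array) are assumptions of this sketch; kwset.c's own search loop may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NCHAR 256

/* Build the two shift quantities used by kwsprep's single-keyword branch.
   delta[c] is LEN - (i + 1) for the last position i holding c (so 0 for
   the final byte), or LEN if c does not occur.  *MD2 is the shift applied
   after the last byte matched but the full comparison failed.
   Assumes 0 < LEN < 256 so every shift fits in an unsigned char. */
static void bm_prep(const unsigned char *pat, size_t len,
                    unsigned char delta[NCHAR], size_t *md2)
{
  size_t i;

  for (i = 0; i < NCHAR; ++i)
    delta[i] = (unsigned char) len;
  for (i = 0; i < len; ++i)
    delta[pat[i]] = (unsigned char) (len - (i + 1));
  *md2 = len;
  for (i = 0; i + 1 < len; ++i)
    if (pat[i] == pat[len - 1])
      *md2 = len - (i + 1);
}

/* Return the offset of the first occurrence of PAT in TEXT[0, N), or -1. */
static long bm_search(const unsigned char *text, size_t n,
                      const unsigned char *pat, size_t len)
{
  unsigned char delta[NCHAR];
  size_t md2, end;

  assert(len > 0 && len < NCHAR);
  bm_prep(pat, len, delta, &md2);
  end = len - 1;                /* text index under the pattern's last byte */
  while (end < n)
    {
      size_t d = delta[text[end]];
      if (d != 0)
        {
          end += d;             /* skip alignments that cannot match */
          continue;
        }
      if (memcmp(text + end + 1 - len, pat, len) == 0)
        return (long) (end + 1 - len);
      end += md2;               /* next alignment with an equal byte under TEXT[END] */
    }
  return -1;
}
```

A zero delta entry marks exactly the pattern's final byte, so the loop only pays for a full comparison when the last byte already lines up; mind2 (md2 here) is safe because any smaller shift would place a different byte under that same text position.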
|
|
||||||
|
/* Compute the shift for each trie node, as well as the delta
|
||||||
|
table and next cache for the given keyword set. */
|
||||||
|
char *
|
||||||
|
kwsprep(kws)
|
||||||
|
kwset_t kws;
|
||||||
|
{
|
||||||
|
register struct kwset *kwset;
|
||||||
|
register int i;
|
||||||
|
register struct trie *curr, *fail;
|
||||||
|
register char *trans;
|
||||||
|
unsigned char delta[NCHAR];
|
||||||
|
struct trie *last, *next[NCHAR];
|
||||||
|
|
||||||
|
kwset = (struct kwset *) kws;
|
||||||
|
|
||||||
|
/* Initial values for the delta table; will be changed later. The
|
||||||
|
delta entry for a given character is the smallest depth of any
|
||||||
|
node at which an outgoing edge is labeled by that character. */
|
||||||
|
if (kwset->mind < 256)
|
||||||
|
for (i = 0; i < NCHAR; ++i)
|
||||||
|
delta[i] = kwset->mind;
|
||||||
|
else
|
||||||
|
for (i = 0; i < NCHAR; ++i)
|
||||||
|
delta[i] = 255;
|
||||||
|
|
||||||
|
/* Check if we can use the simple boyer-moore algorithm, instead
|
||||||
|
of the hairy commentz-walter algorithm. */
|
||||||
|
if (kwset->words == 1 && kwset->trans == 0)
|
||||||
|
{
|
||||||
|
/* Looking for just one string. Extract it from the trie. */
|
||||||
|
kwset->target = obstack_alloc(&kwset->obstack, kwset->mind);
|
||||||
|
for (i = kwset->mind - 1, curr = kwset->trie; i >= 0; --i)
|
||||||
|
{
|
||||||
|
kwset->target[i] = curr->links->label;
|
||||||
|
curr = curr->links->trie;
|
||||||
|
}
|
||||||
|
/* Build the Boyer Moore delta. Boy that's easy compared to CW. */
|
||||||
|
for (i = 0; i < kwset->mind; ++i)
|
||||||
|
delta[(unsigned char) kwset->target[i]] = kwset->mind - (i + 1);
|
||||||
|
kwset->mind2 = kwset->mind;
|
||||||
|
/* Find the minimal delta2 shift that we might make after
|
||||||
|
a backwards match has failed. */
|
||||||
|
for (i = 0; i < kwset->mind - 1; ++i)
|
||||||
|
if (kwset->target[i] == kwset->target[kwset->mind - 1])
|
||||||
|
kwset->mind2 = kwset->mind - (i + 1);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
/* Traverse the nodes of the trie in level order, simultaneously
|
||||||
|
computing the delta table, failure function, and shift function. */
|
||||||
|
for (curr = last = kwset->trie; curr; curr = curr->next)
|
||||||
|
{
|
||||||
|
/* Enqueue the immediate descendents in the level order queue. */
|
||||||
|
enqueue(curr->links, &last);
|
||||||
|
|
||||||
|
curr->shift = kwset->mind;
|
||||||
|
curr->maxshift = kwset->mind;
|
||||||
|
|
||||||
|
/* Update the delta table for the descendents of this node. */
|
||||||
|
treedelta(curr->links, curr->depth, delta);
|
||||||
|
|
||||||
|
/* Compute the failure function for the decendents of this node. */
|
||||||
|
treefails(curr->links, curr->fail, kwset->trie);
|
||||||
|
|
||||||
|
/* Update the shifts at each node in the current node's chain
|
||||||
|
of fails back to the root. */
|
||||||
|
for (fail = curr->fail; fail; fail = fail->fail)
|
||||||
|
{
|
||||||
|
/* If the current node has some outgoing edge that the fail
|
||||||
|
doesn't, then the shift at the fail should be no larger
|
||||||
|
than the difference of their depths. */
|
||||||
|
if (!hasevery(fail->links, curr->links))
|
||||||
|
if (curr->depth - fail->depth < fail->shift)
|
||||||
|
fail->shift = curr->depth - fail->depth;
|
||||||
|
|
||||||
|
/* If the current node is accepting then the shift at the
|
||||||
|
fail and its descendents should be no larger than the
|
||||||
|
difference of their depths. */
|
||||||
|
if (curr->accepting && fail->maxshift > curr->depth - fail->depth)
|
||||||
|
fail->maxshift = curr->depth - fail->depth;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Traverse the trie in level order again, fixing up all nodes whose
|
||||||
|
shift exceeds their inherited maxshift. */
|
||||||
|
for (curr = kwset->trie->next; curr; curr = curr->next)
|
||||||
|
{
|
||||||
|
if (curr->maxshift > curr->parent->maxshift)
|
||||||
|
curr->maxshift = curr->parent->maxshift;
|
||||||
|
if (curr->shift > curr->maxshift)
|
||||||
|
curr->shift = curr->maxshift;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Create a vector, indexed by character code, of the outgoing links
|
||||||
|
from the root node. */
|
||||||
|
for (i = 0; i < NCHAR; ++i)
|
||||||
|
next[i] = 0;
|
||||||
|
treenext(kwset->trie->links, next);
|
||||||
|
|
||||||
|
if ((trans = kwset->trans) != 0)
|
||||||
|
for (i = 0; i < NCHAR; ++i)
|
||||||
|
kwset->next[i] = next[(unsigned char) trans[i]];
|
||||||
|
else
|
||||||
|
for (i = 0; i < NCHAR; ++i)
|
||||||
|
kwset->next[i] = next[i];
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Fix things up for any translation table. */
|
||||||
|
if ((trans = kwset->trans) != 0)
|
||||||
|
for (i = 0; i < NCHAR; ++i)
|
||||||
|
kwset->delta[i] = delta[(unsigned char) trans[i]];
|
||||||
|
else
|
||||||
|
for (i = 0; i < NCHAR; ++i)
|
||||||
|
kwset->delta[i] = delta[i];
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}

#define U(C) ((unsigned char) (C))

/* Fast Boyer-Moore search. */
static char *
bmexec(kws, text, size)
     kwset_t kws;
     char *text;
     size_t size;
{
  struct kwset *kwset;
  register unsigned char *d1;
  register char *ep, *sp, *tp;
  register int d, gc, i, len, md2;

  kwset = (struct kwset *) kws;
  len = kwset->mind;

  if (len == 0)
    return text;
  if (len > size)
    return 0;
  if (len == 1)
    return memchr(text, kwset->target[0], size);

  d1 = kwset->delta;
  sp = kwset->target + len;
  gc = U(sp[-2]);
  md2 = kwset->mind2;
  tp = text + len;

  /* Significance of 12: 1 (initial offset) + 10 (skip loop) + 1 (md2). */
  if (size > 12 * len)
    /* 11 is not a bug, the initial offset happens only once. */
    for (ep = text + size - 11 * len;;)
      {
        while (tp <= ep)
          {
            d = d1[U(tp[-1])], tp += d;
            d = d1[U(tp[-1])], tp += d;
            if (d == 0)
              goto found;
            d = d1[U(tp[-1])], tp += d;
            d = d1[U(tp[-1])], tp += d;
            d = d1[U(tp[-1])], tp += d;
            if (d == 0)
              goto found;
            d = d1[U(tp[-1])], tp += d;
            d = d1[U(tp[-1])], tp += d;
            d = d1[U(tp[-1])], tp += d;
            if (d == 0)
              goto found;
            d = d1[U(tp[-1])], tp += d;
            d = d1[U(tp[-1])], tp += d;
          }
        break;
      found:
        if (U(tp[-2]) == gc)
          {
            for (i = 3; i <= len && U(tp[-i]) == U(sp[-i]); ++i)
              ;
            if (i > len)
              return tp - len;
          }
        tp += md2;
      }

  /* Now we have only a few characters left to search. We
     carefully avoid ever producing an out-of-bounds pointer. */
  ep = text + size;
  d = d1[U(tp[-1])];
  while (d <= ep - tp)
    {
      d = d1[U((tp += d)[-1])];
      if (d != 0)
        continue;
      if (U(tp[-2]) == gc)
        {
          for (i = 3; i <= len && U(tp[-i]) == U(sp[-i]); ++i)
            ;
          if (i > len)
            return tp - len;
        }
      d = md2;
    }

  return 0;
}
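For a single keyword with no translation table, kwsprep sets delta[c] to the distance from the rightmost occurrence of c in the keyword to its end (0 for the final character) and mind2 to the shift that realigns the final character with its previous occurrence. bmexec then skips by delta until it lands on a 0 and verifies the window. The sketch below restates that scheme without the loop unrolling or the second-to-last-character (gc) pre-check. bm_search and its parameters are hypothetical names, and it uses size_t tables rather than kwset's unsigned char ones.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the single-keyword scheme: delta[c] is the distance from the
   rightmost occurrence of c in PAT to its end (LEN if absent, 0 for the
   last character), and md2 is the shift used after a failed verify. */
static const char *
bm_search (const char *pat, size_t len, const char *text, size_t size)
{
  size_t delta[UCHAR_MAX + 1];
  size_t md2 = len, tp, i;

  if (len == 0)
    return text;
  if (len > size)
    return NULL;

  for (i = 0; i <= UCHAR_MAX; i++)
    delta[i] = len;
  for (i = 0; i < len; i++)
    delta[(unsigned char) pat[i]] = len - (i + 1);
  /* Distance from the previous occurrence of the last character. */
  for (i = 0; i + 1 < len; i++)
    if (pat[i] == pat[len - 1])
      md2 = len - (i + 1);

  /* TP indexes one past the last character of the current window. */
  for (tp = len; tp <= size; )
    {
      size_t d = delta[(unsigned char) text[tp - 1]];
      if (d != 0)
        {
          tp += d;
          continue;
        }
      /* Last character matches: compare the rest of the window. */
      if (memcmp (text + tp - len, pat, len - 1) == 0)
        return text + tp - len;
      tp += md2;
    }
  return NULL;
}
```

For the keyword "abcab", delta['a'] is 1, delta['b'] is 0, delta['c'] is 2, and mind2 is 3, because the previous 'b' sits three positions before the end.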

/* Hairy multiple string search. */
static char *
cwexec(kws, text, len, kwsmatch)
     kwset_t kws;
     char *text;
     size_t len;
     struct kwsmatch *kwsmatch;
{
  struct kwset *kwset;
  struct trie **next, *trie, *accept;
  char *beg, *lim, *mch, *lmch;
  register unsigned char c, *delta;
  register int d;
  register char *end, *qlim;
  register struct tree *tree;
  register char *trans;

  /* Initialize register copies and look for easy ways out. */
  kwset = (struct kwset *) kws;
  if (len < kwset->mind)
    return 0;
  next = kwset->next;
  delta = kwset->delta;
  trans = kwset->trans;
  lim = text + len;
  end = text;
  if ((d = kwset->mind) != 0)
    mch = 0;
  else
    {
      mch = text, accept = kwset->trie;
      goto match;
    }

  if (len >= 4 * kwset->mind)
    qlim = lim - 4 * kwset->mind;
  else
    qlim = 0;

  while (lim - end >= d)
    {
      if (qlim && end <= qlim)
        {
          end += d - 1;
          while ((d = delta[c = *end]) && end < qlim)
            {
              end += d;
              end += delta[(unsigned char) *end];
              end += delta[(unsigned char) *end];
            }
          ++end;
        }
      else
        d = delta[c = (end += d)[-1]];
      if (d)
        continue;
      beg = end - 1;
      trie = next[c];
      if (trie->accepting)
        {
          mch = beg;
          accept = trie;
        }
      d = trie->shift;
      while (beg > text)
        {
          c = trans ? trans[(unsigned char) *--beg] : *--beg;
          tree = trie->links;
          while (tree && c != tree->label)
            if (c < tree->label)
              tree = tree->llink;
            else
              tree = tree->rlink;
          if (tree)
            {
              trie = tree->trie;
              if (trie->accepting)
                {
                  mch = beg;
                  accept = trie;
                }
            }
          else
            break;
          d = trie->shift;
        }
      if (mch)
        goto match;
    }
  return 0;

 match:
  /* Given a known match, find the longest possible match anchored
     at or before its starting point. This is nearly a verbatim
     copy of the preceding main search loops. */
  if (lim - mch > kwset->maxd)
    lim = mch + kwset->maxd;
  lmch = 0;
  d = 1;
  while (lim - end >= d)
    {
      if ((d = delta[c = (end += d)[-1]]) != 0)
        continue;
      beg = end - 1;
      if (!(trie = next[c]))
        {
          d = 1;
          continue;
        }
      if (trie->accepting && beg <= mch)
        {
          lmch = beg;
          accept = trie;
        }
      d = trie->shift;
      while (beg > text)
        {
          c = trans ? trans[(unsigned char) *--beg] : *--beg;
          tree = trie->links;
          while (tree && c != tree->label)
            if (c < tree->label)
              tree = tree->llink;
            else
              tree = tree->rlink;
          if (tree)
            {
              trie = tree->trie;
              if (trie->accepting && beg <= mch)
                {
                  lmch = beg;
                  accept = trie;
                }
            }
          else
            break;
          d = trie->shift;
        }
      if (lmch)
        {
          mch = lmch;
          goto match;
        }
      if (!d)
        d = 1;
    }

  if (kwsmatch)
    {
      kwsmatch->index = accept->accepting / 2;
      kwsmatch->beg[0] = mch;
      kwsmatch->size[0] = accept->depth;
    }
  return mch;
}
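kwset.h documents kwsexec as returning the leftmost longest match; cwexec reaches it with trie shifts and a second pass starting at the label match. To check that intended result on small inputs, a naive reference that tries each start position in order and keeps the longest keyword matching there may help. This is a brute-force stand-in, not the Commentz-Walter algorithm. naive_leftmost_longest is a hypothetical name, and its index is simply the keyword's position in the array, which may differ from how kwset numbers keywords (cwexec derives it as accepting / 2).

```c
#include <stddef.h>
#include <string.h>

/* Brute-force reference: scan start positions left to right and, at the
   first position where any keyword matches, return the longest one. */
static const char *
naive_leftmost_longest (const char *const *kws, size_t n,
                        const char *text, size_t size,
                        size_t *index, size_t *length)
{
  size_t s, i;

  for (s = 0; s <= size; s++)
    {
      size_t best = n, best_len = 0;   /* best == n means "none yet" */

      for (i = 0; i < n; i++)
        {
          size_t len = strlen (kws[i]);
          if (len <= size - s && memcmp (text + s, kws[i], len) == 0
              && (best == n || len > best_len))
            {
              best = i;
              best_len = len;
            }
        }
      if (best != n)
        {
          if (index)
            *index = best;
          if (length)
            *length = best_len;
          return text + s;
        }
    }
  return NULL;
}
```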

/* Search through the given text for a match of any member of the
   given keyword set. Return a pointer to the first character of
   the matching substring, or NULL if no match is found. If KWSMATCH
   is non-NULL, store in it the index number of the particular keyword
   matched, a pointer to the start of the matching substring, and the
   length of that substring. */
char *
kwsexec(kws, text, size, kwsmatch)
     kwset_t kws;
     char *text;
     size_t size;
     struct kwsmatch *kwsmatch;
{
  struct kwset *kwset;
  char *ret;

  kwset = (struct kwset *) kws;
  if (kwset->words == 1 && kwset->trans == 0)
    {
      ret = bmexec(kws, text, size);
      if (kwsmatch != 0 && ret != 0)
        {
          kwsmatch->index = 0;
          kwsmatch->beg[0] = ret;
          kwsmatch->size[0] = kwset->mind;
        }
      return ret;
    }
  else
    return cwexec(kws, text, size, kwsmatch);
}

/* Free the components of the given keyword set. */
void
kwsfree(kws)
     kwset_t kws;
{
  struct kwset *kwset;

  kwset = (struct kwset *) kws;
  obstack_free(&kwset->obstack, 0);
  free(kws);
}
69
gnu/usr.bin/grep/kwset.h
Normal file
@ -0,0 +1,69 @@
/* kwset.h - header declaring the keyword set library.
   Copyright 1989 Free Software Foundation
   Written August 1989 by Mike Haertel.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 1, or (at your option)
   any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

   The author may be reached (Email) at the address mike@ai.mit.edu,
   or (US mail) as Mike Haertel c/o Free Software Foundation. */

struct kwsmatch
{
  int index;                    /* Index number of matching keyword. */
  char *beg[1];                 /* Begin pointer for each submatch. */
  size_t size[1];               /* Length of each submatch. */
};

#if __STDC__

typedef void *kwset_t;

/* Return an opaque pointer to a newly allocated keyword set, or NULL
   if enough memory cannot be obtained. The argument, if non-NULL,
   specifies a table of character translations to be applied to all
   pattern and search text. */
extern kwset_t kwsalloc(char *);

/* Incrementally extend the keyword set to include the given string.
   Return NULL for success, or an error message. Remember an index
   number for each keyword included in the set. */
extern char *kwsincr(kwset_t, char *, size_t);

/* When the keyword set has been completely built, prepare it for
   use. Return NULL for success, or an error message. */
extern char *kwsprep(kwset_t);

/* Search through the given buffer for a member of the keyword set.
   Return a pointer to the leftmost longest match found, or NULL if
   no match is found. If the struct kwsmatch pointer is non-NULL,
   store in it the index of the particular keyword found, the start
   of the matching substring, and its length. */
extern char *kwsexec(kwset_t, char *, size_t, struct kwsmatch *);

/* Deallocate the given keyword set and all its associated storage. */
extern void kwsfree(kwset_t);

#else

typedef char *kwset_t;

extern kwset_t kwsalloc();
extern char *kwsincr();
extern char *kwsprep();
extern char *kwsexec();
extern void kwsfree();

#endif
454
gnu/usr.bin/grep/obstack.c
Normal file
@ -0,0 +1,454 @@
/* obstack.c - subroutines used implicitly by object stack macros
   Copyright (C) 1988, 1993 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify it
   under the terms of the GNU General Public License as published by the
   Free Software Foundation; either version 2, or (at your option) any
   later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */

#include "obstack.h"

/* This is just to get __GNU_LIBRARY__ defined. */
#include <stdio.h>

/* Comment out all this code if we are using the GNU C Library, and are not
   actually compiling the library itself. This code is part of the GNU C
   Library, but also included in many other GNU distributions. Compiling
   and linking in this code is a waste when using the GNU C library
   (especially if it is a shared library). Rather than having every GNU
   program understand `configure --with-gnu-libc' and omit the object files,
   it is simpler to just do this in the source for each such file. */

#if defined (_LIBC) || !defined (__GNU_LIBRARY__)


#ifdef __STDC__
#define POINTER void *
#else
#define POINTER char *
#endif

/* Determine default alignment. */
struct fooalign {char x; double d;};
#define DEFAULT_ALIGNMENT \
  ((PTR_INT_TYPE) ((char *)&((struct fooalign *) 0)->d - (char *)0))
/* If malloc were really smart, it would round addresses to DEFAULT_ALIGNMENT.
   But in fact it might be less smart and round addresses to as much as
   DEFAULT_ROUNDING. So we prepare for it to do that. */
union fooround {long x; double d;};
#define DEFAULT_ROUNDING (sizeof (union fooround))
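DEFAULT_ALIGNMENT measures how far the compiler places a double after a lone char, which is the alignment a double needs as a struct member; DEFAULT_ROUNDING is the size of a long/double union. The sketch below computes the same probes with offsetof from <stddef.h>, which yields the member offset without forming a member address from a null pointer. The SKETCH_ macro names are hypothetical, and the PTR_INT_TYPE cast is dropped because offsetof already produces a size_t.

```c
#include <stddef.h>

/* Same probe types as obstack.c. */
struct fooalign { char x; double d; };
union fooround { long x; double d; };

/* The offset of D equals the alignment double needs inside a struct,
   computed here without the null-pointer arithmetic of DEFAULT_ALIGNMENT. */
#define SKETCH_DEFAULT_ALIGNMENT (offsetof (struct fooalign, d))
#define SKETCH_DEFAULT_ROUNDING (sizeof (union fooround))
```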

/* When we copy a long block of data, this is the unit to do it with.
   On some machines, copying successive ints does not work;
   in such a case, redefine COPYING_UNIT to `long' (if that works)
   or `char' as a last resort. */
#ifndef COPYING_UNIT
#define COPYING_UNIT int
#endif

/* The non-GNU-C macros copy the obstack into this global variable
   to avoid multiple evaluation. */

struct obstack *_obstack;

/* Define a macro that either calls functions with the traditional malloc/free
   calling interface, or calls functions with the mmalloc/mfree interface
   (that adds an extra first argument), based on the state of use_extra_arg.
   For free, do not use ?:, since some compilers, like the MIPS compilers,
   do not allow (expr) ? void : void. */

#define CALL_CHUNKFUN(h, size) \
  (((h) -> use_extra_arg) \
   ? (*(h)->chunkfun) ((h)->extra_arg, (size)) \
   : (*(h)->chunkfun) ((size)))

#define CALL_FREEFUN(h, old_chunk) \
  do { \
    if ((h) -> use_extra_arg) \
      (*(h)->freefun) ((h)->extra_arg, (old_chunk)); \
    else \
      (*(h)->freefun) ((old_chunk)); \
  } while (0)


/* Initialize an obstack H for use. Specify chunk size SIZE (0 means default).
   Objects start on multiples of ALIGNMENT (0 means use default).
   CHUNKFUN is the function to use to allocate chunks,
   and FREEFUN the function to free them. */

void
_obstack_begin (h, size, alignment, chunkfun, freefun)
     struct obstack *h;
     int size;
     int alignment;
     POINTER (*chunkfun) ();
     void (*freefun) ();
{
  register struct _obstack_chunk* chunk; /* points to new chunk */

  if (alignment == 0)
    alignment = DEFAULT_ALIGNMENT;
  if (size == 0)
    /* Default size is what GNU malloc can fit in a 4096-byte block. */
    {
      /* 12 is sizeof (mhead) and 4 is EXTRA from GNU malloc.
         Use the values for range checking, because if range checking is off,
         the extra bytes won't be missed terribly, but if range checking is on
         and we used a larger request, a whole extra 4096 bytes would be
         allocated.

         These numbers are irrelevant to the new GNU malloc. I suspect it is
         less sensitive to the size of the request. */
      int extra = ((((12 + DEFAULT_ROUNDING - 1) & ~(DEFAULT_ROUNDING - 1))
                    + 4 + DEFAULT_ROUNDING - 1)
                   & ~(DEFAULT_ROUNDING - 1));
      size = 4096 - extra;
    }

  h->chunkfun = (struct _obstack_chunk * (*)()) chunkfun;
  h->freefun = freefun;
  h->chunk_size = size;
  h->alignment_mask = alignment - 1;
  h->use_extra_arg = 0;

  chunk = h->chunk = CALL_CHUNKFUN (h, h -> chunk_size);
  h->next_free = h->object_base = chunk->contents;
  h->chunk_limit = chunk->limit
    = (char *) chunk + h->chunk_size;
  chunk->prev = 0;
  /* The initial chunk now contains no empty object. */
  h->maybe_empty_object = 0;
}
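The default chunk size in _obstack_begin (and _obstack_begin_1) rounds GNU malloc's 12-byte header up to DEFAULT_ROUNDING, adds its 4 bytes of EXTRA, rounds again, and subtracts the result from 4096. The sketch below performs the same arithmetic as a standalone function with the rounding passed in; it assumes the rounding is a power of two, as the mask requires. default_chunk_size is a hypothetical name.

```c
/* Same expression as the `extra' computation in _obstack_begin,
   with DEFAULT_ROUNDING replaced by the ROUNDING parameter. */
static int
default_chunk_size (int rounding)
{
  int header = (12 + rounding - 1) & ~(rounding - 1);        /* mhead */
  int extra = (header + 4 + rounding - 1) & ~(rounding - 1); /* + EXTRA */
  return 4096 - extra;
}
```

On a common 64-bit host where sizeof (union fooround) is 8, this gives a default chunk size of 4072 bytes.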

void
_obstack_begin_1 (h, size, alignment, chunkfun, freefun, arg)
     struct obstack *h;
     int size;
     int alignment;
     POINTER (*chunkfun) ();
     void (*freefun) ();
     POINTER arg;
{
  register struct _obstack_chunk* chunk; /* points to new chunk */

  if (alignment == 0)
    alignment = DEFAULT_ALIGNMENT;
  if (size == 0)
    /* Default size is what GNU malloc can fit in a 4096-byte block. */
    {
      /* 12 is sizeof (mhead) and 4 is EXTRA from GNU malloc.
         Use the values for range checking, because if range checking is off,
         the extra bytes won't be missed terribly, but if range checking is on
         and we used a larger request, a whole extra 4096 bytes would be
         allocated.

         These numbers are irrelevant to the new GNU malloc. I suspect it is
         less sensitive to the size of the request. */
      int extra = ((((12 + DEFAULT_ROUNDING - 1) & ~(DEFAULT_ROUNDING - 1))
                    + 4 + DEFAULT_ROUNDING - 1)
                   & ~(DEFAULT_ROUNDING - 1));
      size = 4096 - extra;
    }

  h->chunkfun = (struct _obstack_chunk * (*)()) chunkfun;
  h->freefun = freefun;
  h->chunk_size = size;
  h->alignment_mask = alignment - 1;
  h->extra_arg = arg;
  h->use_extra_arg = 1;

  chunk = h->chunk = CALL_CHUNKFUN (h, h -> chunk_size);
  h->next_free = h->object_base = chunk->contents;
  h->chunk_limit = chunk->limit
    = (char *) chunk + h->chunk_size;
  chunk->prev = 0;
  /* The initial chunk now contains no empty object. */
  h->maybe_empty_object = 0;
}

/* Allocate a new current chunk for the obstack *H
   on the assumption that LENGTH bytes need to be added
   to the current object, or a new object of length LENGTH allocated.
   Copies any partial object from the end of the old chunk
   to the beginning of the new one. */

void
_obstack_newchunk (h, length)
     struct obstack *h;
     int length;
{
  register struct _obstack_chunk* old_chunk = h->chunk;
  register struct _obstack_chunk* new_chunk;
  register long new_size;
  register int obj_size = h->next_free - h->object_base;
  register int i;
  int already;

  /* Compute size for new chunk. */
  new_size = (obj_size + length) + (obj_size >> 3) + 100;
  if (new_size < h->chunk_size)
    new_size = h->chunk_size;

  /* Allocate and initialize the new chunk. */
  new_chunk = h->chunk = CALL_CHUNKFUN (h, new_size);
  new_chunk->prev = old_chunk;
  new_chunk->limit = h->chunk_limit = (char *) new_chunk + new_size;

  /* Move the existing object to the new chunk.
     Word at a time is fast and is safe if the object
     is sufficiently aligned. */
  if (h->alignment_mask + 1 >= DEFAULT_ALIGNMENT)
    {
      for (i = obj_size / sizeof (COPYING_UNIT) - 1;
           i >= 0; i--)
        ((COPYING_UNIT *)new_chunk->contents)[i]
          = ((COPYING_UNIT *)h->object_base)[i];
      /* We used to copy the odd few remaining bytes as one extra COPYING_UNIT,
         but that can cross a page boundary on a machine
         which does not do strict alignment for COPYING_UNITS. */
      already = obj_size / sizeof (COPYING_UNIT) * sizeof (COPYING_UNIT);
    }
  else
    already = 0;
  /* Copy remaining bytes one by one. */
  for (i = already; i < obj_size; i++)
    new_chunk->contents[i] = h->object_base[i];

  /* If the object just copied was the only data in OLD_CHUNK,
     free that chunk and remove it from the chain.
     But not if that chunk might contain an empty object. */
  if (h->object_base == old_chunk->contents && ! h->maybe_empty_object)
    {
      new_chunk->prev = old_chunk->prev;
      CALL_FREEFUN (h, old_chunk);
    }

  h->object_base = new_chunk->contents;
  h->next_free = h->object_base + obj_size;
  /* The new chunk certainly contains no empty object yet. */
  h->maybe_empty_object = 0;
}
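_obstack_newchunk sizes the replacement chunk as the object plus the requested LENGTH, plus one eighth of the object as slack, plus 100 bytes, but never less than chunk_size. It then moves the partial object one COPYING_UNIT at a time and copies the leftover bytes singly. The sketch below shows those two steps, assuming int as the copying unit. new_chunk_size, word_copied_bytes and copy_object are hypothetical names, and memcpy per unit stands in for the original's pointer casts so the sketch does not depend on the buffers' alignment.

```c
#include <string.h>

/* Size of the replacement chunk, as computed in _obstack_newchunk. */
static long
new_chunk_size (int obj_size, int length, long chunk_size)
{
  long new_size = (obj_size + length) + (obj_size >> 3) + 100;
  return new_size < chunk_size ? chunk_size : new_size;
}

/* Bytes covered by the word-at-a-time loop (the `already' value). */
static int
word_copied_bytes (int obj_size, int unit)
{
  return obj_size / unit * unit;
}

/* Copy whole int-sized units first, then the odd bytes one by one,
   so no read extends past the end of the object. */
static void
copy_object (char *dst, const char *src, int obj_size)
{
  int i, already = word_copied_bytes (obj_size, (int) sizeof (int));

  for (i = 0; i < already; i += (int) sizeof (int))
    memcpy (dst + i, src + i, sizeof (int));   /* one COPYING_UNIT */
  for (i = already; i < obj_size; i++)
    dst[i] = src[i];                           /* leftover bytes */
}
```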

/* Return nonzero if object OBJ has been allocated from obstack H.
   This is here for debugging.
   If you use it in a program, you are probably losing. */

int
_obstack_allocated_p (h, obj)
     struct obstack *h;
     POINTER obj;
{
  register struct _obstack_chunk* lp;  /* below addr of any objects in this chunk */
  register struct _obstack_chunk* plp; /* point to previous chunk if any */

  lp = (h)->chunk;
  /* We use >= rather than > since the object cannot be exactly at
     the beginning of the chunk but might be an empty object exactly
     at the end of an adjacent chunk. */
  while (lp != 0 && ((POINTER)lp >= obj || (POINTER)(lp)->limit < obj))
    {
      plp = lp->prev;
      lp = plp;
    }
  return lp != 0;
}

/* Free objects in obstack H, including OBJ and everything allocated
   more recently than OBJ. If OBJ is zero, free everything in H. */

#undef obstack_free

/* This function has two names with identical definitions.
   This is the first one, called from non-ANSI code. */

void
_obstack_free (h, obj)
     struct obstack *h;
     POINTER obj;
{
  register struct _obstack_chunk* lp;  /* below addr of any objects in this chunk */
  register struct _obstack_chunk* plp; /* point to previous chunk if any */

  lp = h->chunk;
  /* We use >= because there cannot be an object at the beginning of a chunk.
     But there can be an empty object at that address
     at the end of another chunk. */
  while (lp != 0 && ((POINTER)lp >= obj || (POINTER)(lp)->limit < obj))
    {
      plp = lp->prev;
      CALL_FREEFUN (h, lp);
      lp = plp;
      /* If we switch chunks, we can't tell whether the new current
         chunk contains an empty object, so assume that it may. */
      h->maybe_empty_object = 1;
    }
  if (lp)
    {
      h->object_base = h->next_free = (char *)(obj);
      h->chunk_limit = lp->limit;
      h->chunk = lp;
    }
  else if (obj != 0)
    /* obj is not in any of the chunks! */
    abort ();
}
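_obstack_free walks back from the current chunk and releases every chunk that cannot contain OBJ. It stops at the first chunk whose header address is below OBJ and whose limit is at or above it. Because the limit test uses <, an empty object sitting exactly at a chunk's limit stays with that chunk. The sketch below models only the walk, representing chunks as integer address ranges instead of real allocations; struct model_chunk and chunks_to_free are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical model of a chunk: ADDR stands for the chunk header's
   address and LIMIT for chunk->limit. */
struct model_chunk
{
  unsigned long addr;
  unsigned long limit;
  struct model_chunk *prev;
};

/* Count the chunks _obstack_free would release before reaching the one
   that holds OBJ (all of them if none does). */
static int
chunks_to_free (struct model_chunk *lp, unsigned long obj)
{
  int freed = 0;

  while (lp != NULL && (lp->addr >= obj || lp->limit < obj))
    {
      lp = lp->prev;
      freed++;
    }
  return freed;
}
```

With obj == 0 every chunk satisfies addr >= obj, so the whole chain is released, matching the comment that a zero OBJ frees everything in H.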

/* This function is used from ANSI code. */

void
obstack_free (h, obj)
     struct obstack *h;
     POINTER obj;
{
  register struct _obstack_chunk* lp;  /* below addr of any objects in this chunk */
  register struct _obstack_chunk* plp; /* point to previous chunk if any */

  lp = h->chunk;
  /* We use >= because there cannot be an object at the beginning of a chunk.
     But there can be an empty object at that address
     at the end of another chunk. */
  while (lp != 0 && ((POINTER)lp >= obj || (POINTER)(lp)->limit < obj))
    {
      plp = lp->prev;
      CALL_FREEFUN (h, lp);
      lp = plp;
      /* If we switch chunks, we can't tell whether the new current
         chunk contains an empty object, so assume that it may. */
      h->maybe_empty_object = 1;
    }
  if (lp)
    {
      h->object_base = h->next_free = (char *)(obj);
      h->chunk_limit = lp->limit;
      h->chunk = lp;
    }
  else if (obj != 0)
    /* obj is not in any of the chunks! */
    abort ();
}

#if 0
/* These are now turned off because the applications do not use it
   and it uses bcopy via obstack_grow, which causes trouble on sysV. */

/* Now define the functional versions of the obstack macros.
   Define them to simply use the corresponding macros to do the job. */

#ifdef __STDC__
/* These function definitions do not work with non-ANSI preprocessors;
   they won't pass through the macro names in parentheses. */

/* The function names appear in parentheses in order to prevent
   the macro-definitions of the names from being expanded there. */

POINTER (obstack_base) (obstack)
     struct obstack *obstack;
{
  return obstack_base (obstack);
}

POINTER (obstack_next_free) (obstack)
     struct obstack *obstack;
{
  return obstack_next_free (obstack);
}

int (obstack_object_size) (obstack)
     struct obstack *obstack;
{
  return obstack_object_size (obstack);
}

int (obstack_room) (obstack)
     struct obstack *obstack;
{
  return obstack_room (obstack);
}

void (obstack_grow) (obstack, pointer, length)
     struct obstack *obstack;
     POINTER pointer;
     int length;
{
  obstack_grow (obstack, pointer, length);
}

void (obstack_grow0) (obstack, pointer, length)
     struct obstack *obstack;
     POINTER pointer;
     int length;
{
  obstack_grow0 (obstack, pointer, length);
}

void (obstack_1grow) (obstack, character)
     struct obstack *obstack;
     int character;
{
  obstack_1grow (obstack, character);
}

void (obstack_blank) (obstack, length)
     struct obstack *obstack;
     int length;
{
  obstack_blank (obstack, length);
}

void (obstack_1grow_fast) (obstack, character)
     struct obstack *obstack;
     int character;
{
  obstack_1grow_fast (obstack, character);
}

void (obstack_blank_fast) (obstack, length)
     struct obstack *obstack;
     int length;
{
  obstack_blank_fast (obstack, length);
}

POINTER (obstack_finish) (obstack)
     struct obstack *obstack;
{
  return obstack_finish (obstack);
}

POINTER (obstack_alloc) (obstack, length)
     struct obstack *obstack;
     int length;
{
  return obstack_alloc (obstack, length);
}

POINTER (obstack_copy) (obstack, pointer, length)
     struct obstack *obstack;
     POINTER pointer;
     int length;
{
  return obstack_copy (obstack, pointer, length);
}

POINTER (obstack_copy0) (obstack, pointer, length)
     struct obstack *obstack;
     POINTER pointer;
     int length;
{
  return obstack_copy0 (obstack, pointer, length);
|
}
|
||||||
|
|
||||||
|
#endif /* __STDC__ */
|
||||||
|
|
||||||
|
#endif /* 0 */
|
||||||
|
|
||||||
|
#endif /* _LIBC or not __GNU_LIBRARY__. */
|
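The disabled block at the end of obstack.c defines real functions whose names are wrapped in parentheses, such as `(obstack_room)`, so that the same-named macros from obstack.h are not expanded. This works because an ANSI C preprocessor expands a function-like macro only when its name is immediately followed by `(`. Here is a minimal sketch of the rule. It uses a hypothetical `square` macro and function, and a hypothetical `macro_calls` counter, in place of the obstack names:

```c
/* Hypothetical counter, used only to show which path ran. */
static int macro_calls = 0;

/* Hypothetical macro standing in for an obstack macro.  Like the
   obstack macros and their functional versions, it computes the same
   result as the function below. */
#define square(x) (macro_calls++, (x) * (x))

/* Real function with the same name.  Here the name is followed by ')'
   rather than '(', so the preprocessor does not treat it as a macro
   invocation. */
static int (square) (int x)
{
  return x * x;
}

/* A name directly followed by '(' expands the macro... */
static int via_macro (int x)
{
  return square (x);
}

/* ...while a parenthesized name reaches the real function. */
static int via_function (int x)
{
  return (square) (x);
}
```

The same trick is what lets the GNU C version of `obstack_free` in obstack.h fall back to `(obstack_free) (__o, __obj)` without recursing into itself.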
484
gnu/usr.bin/grep/obstack.h
Normal file
@ -0,0 +1,484 @@
/* obstack.h - object stack macros
   Copyright (C) 1988, 1992 Free Software Foundation, Inc.

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */

/* Summary:

All the apparent functions defined here are macros. The idea
is that you would use these pre-tested macros to solve a
very specific set of problems, and they would run fast.
Caution: no side-effects in arguments please!! They may be
evaluated MANY times!!

These macros operate a stack of objects. Each object starts life
small, and may grow to maturity. (Consider building a word syllable
by syllable.) An object can move while it is growing. Once it has
been "finished" it never changes address again. So the "top of the
stack" is typically an immature growing object, while the rest of the
stack is of mature, fixed size and fixed address objects.

These routines grab large chunks of memory, using a function you
supply, called `obstack_chunk_alloc'. On occasion, they free chunks,
by calling `obstack_chunk_free'. You must define them and declare
them before using any obstack macros.

Each independent stack is represented by a `struct obstack'.
Each of the obstack macros expects a pointer to such a structure
as the first argument.

One motivation for this package is the problem of growing char strings
in symbol tables. Unless you are "fascist pig with a read-only mind"
--Gosper's immortal quote from HAKMEM item 154, out of context--you
would not like to put any arbitrary upper limit on the length of your
symbols.

In practice this often means you will build many short symbols and a
few long symbols. At the time you are reading a symbol you don't know
how long it is. One traditional method is to read a symbol into a
buffer, realloc()ating the buffer every time you try to read a symbol
that is longer than the buffer. This is beaut, but you still will
want to copy the symbol from the buffer to a more permanent
symbol-table entry say about half the time.

With obstacks, you can work differently. Use one obstack for all symbol
names. As you read a symbol, grow the name in the obstack gradually.
When the name is complete, finalize it. Then, if the symbol exists already,
free the newly read name.

The way we do this is to take a large chunk, allocating memory from
low addresses. When you want to build a symbol in the chunk you just
add chars above the current "high water mark" in the chunk. When you
have finished adding chars, because you got to the end of the symbol,
you know how long the chars are, and you can create a new object.
Mostly the chars will not burst over the highest address of the chunk,
because you would typically expect a chunk to be (say) 100 times as
long as an average object.

In case that isn't clear, when we have enough chars to make up
the object, THEY ARE ALREADY CONTIGUOUS IN THE CHUNK (guaranteed)
so we just point to it where it lies. No moving of chars is
needed and this is the second win: potentially long strings need
never be explicitly shuffled. Once an object is formed, it does not
change its address during its lifetime.

When the chars burst over a chunk boundary, we allocate a larger
chunk, and then copy the partly formed object from the end of the old
chunk to the beginning of the new larger chunk. We then carry on
accreting characters to the end of the object as we normally would.

A special macro is provided to add a single char at a time to a
growing object. This allows the use of register variables, which
break the ordinary 'growth' macro.

Summary:
    We allocate large chunks.
    We carve out one object at a time from the current chunk.
    Once carved, an object never moves.
    We are free to append data of any size to the currently
      growing object.
    Exactly one object is growing in an obstack at any one time.
    You can run one obstack per control block.
    You may have as many control blocks as you dare.
    Because of the way we do it, you can `unwind' an obstack
      back to a previous state. (You may remove objects much
      as you would with a stack.)
*/


/* Don't do the contents of this file more than once. */

#ifndef __OBSTACKS__
#define __OBSTACKS__

/* We use subtraction of (char *)0 instead of casting to int
   because on word-addressable machines a simple cast to int
   may ignore the byte-within-word field of the pointer. */

#ifndef __PTR_TO_INT
#define __PTR_TO_INT(P) ((P) - (char *)0)
#endif

#ifndef __INT_TO_PTR
#define __INT_TO_PTR(P) ((P) + (char *)0)
#endif

/* We need the type of the resulting object. In ANSI C it is ptrdiff_t
   but in traditional C it is usually long. If we are in ANSI C and
   don't already have ptrdiff_t get it. */

#if defined (__STDC__) && ! defined (offsetof)
#if defined (__GNUC__) && defined (IN_GCC)
/* On Next machine, the system's stddef.h screws up if included
   after we have defined just ptrdiff_t, so include all of gstddef.h.
   Otherwise, define just ptrdiff_t, which is all we need. */
#ifndef __NeXT__
#define __need_ptrdiff_t
#endif

/* While building GCC, the stddef.h that goes with GCC has this name. */
#include "gstddef.h"
#else
#include <stddef.h>
#endif
#endif

#ifdef __STDC__
#define PTR_INT_TYPE ptrdiff_t
#else
#define PTR_INT_TYPE long
#endif

struct _obstack_chunk /* Lives at front of each chunk. */
{
  char *limit; /* 1 past end of this chunk */
  struct _obstack_chunk *prev; /* address of prior chunk or NULL */
  char contents[4]; /* objects begin here */
};

struct obstack /* control current object in current chunk */
{
  long chunk_size; /* preferred size to allocate chunks in */
  struct _obstack_chunk* chunk; /* address of current struct obstack_chunk */
  char *object_base; /* address of object we are building */
  char *next_free; /* where to add next char to current object */
  char *chunk_limit; /* address of char after current chunk */
  PTR_INT_TYPE temp; /* Temporary for some macros. */
  int alignment_mask; /* Mask of alignment for each object. */
  struct _obstack_chunk *(*chunkfun) (); /* User's fcn to allocate a chunk. */
  void (*freefun) (); /* User's function to free a chunk. */
  char *extra_arg; /* first arg for chunk alloc/dealloc funcs */
  unsigned use_extra_arg:1; /* chunk alloc/dealloc funcs take extra arg */
  unsigned maybe_empty_object:1;/* There is a possibility that the current
                                   chunk contains a zero-length object. This
                                   prevents freeing the chunk if we allocate
                                   a bigger chunk to replace it. */
};

/* Declare the external functions we use; they are in obstack.c. */

#ifdef __STDC__
extern void _obstack_newchunk (struct obstack *, int);
extern void _obstack_free (struct obstack *, void *);
extern void _obstack_begin (struct obstack *, int, int,
                            void *(*) (), void (*) ());
extern void _obstack_begin_1 (struct obstack *, int, int,
                              void *(*) (), void (*) (), void *);
#else
extern void _obstack_newchunk ();
extern void _obstack_free ();
extern void _obstack_begin ();
extern void _obstack_begin_1 ();
#endif

#ifdef __STDC__

/* Do the function-declarations after the structs
   but before defining the macros. */

void obstack_init (struct obstack *obstack);

void * obstack_alloc (struct obstack *obstack, int size);

void * obstack_copy (struct obstack *obstack, void *address, int size);
void * obstack_copy0 (struct obstack *obstack, void *address, int size);

void obstack_free (struct obstack *obstack, void *block);

void obstack_blank (struct obstack *obstack, int size);

void obstack_grow (struct obstack *obstack, void *data, int size);
void obstack_grow0 (struct obstack *obstack, void *data, int size);

void obstack_1grow (struct obstack *obstack, int data_char);
void obstack_ptr_grow (struct obstack *obstack, void *data);
void obstack_int_grow (struct obstack *obstack, int data);

void * obstack_finish (struct obstack *obstack);

int obstack_object_size (struct obstack *obstack);

int obstack_room (struct obstack *obstack);
void obstack_1grow_fast (struct obstack *obstack, int data_char);
void obstack_ptr_grow_fast (struct obstack *obstack, void *data);
void obstack_int_grow_fast (struct obstack *obstack, int data);
void obstack_blank_fast (struct obstack *obstack, int size);

void * obstack_base (struct obstack *obstack);
void * obstack_next_free (struct obstack *obstack);
int obstack_alignment_mask (struct obstack *obstack);
int obstack_chunk_size (struct obstack *obstack);

#endif /* __STDC__ */

/* Non-ANSI C cannot really support alternative functions for these macros,
   so we do not declare them. */

/* Pointer to beginning of object being allocated or to be allocated next.
   Note that this might not be the final address of the object
   because a new chunk might be needed to hold the final size. */

#define obstack_base(h) ((h)->object_base)

/* Size for allocating ordinary chunks. */

#define obstack_chunk_size(h) ((h)->chunk_size)

/* Pointer to next byte not yet allocated in current chunk. */

#define obstack_next_free(h) ((h)->next_free)

/* Mask specifying low bits that should be clear in address of an object. */

#define obstack_alignment_mask(h) ((h)->alignment_mask)

#define obstack_init(h) \
  _obstack_begin ((h), 0, 0, \
                  (void *(*) ()) obstack_chunk_alloc, (void (*) ()) obstack_chunk_free)

#define obstack_begin(h, size) \
  _obstack_begin ((h), (size), 0, \
                  (void *(*) ()) obstack_chunk_alloc, (void (*) ()) obstack_chunk_free)

#define obstack_specify_allocation(h, size, alignment, chunkfun, freefun) \
  _obstack_begin ((h), (size), (alignment), \
                  (void *(*) ()) (chunkfun), (void (*) ()) (freefun))

#define obstack_specify_allocation_with_arg(h, size, alignment, chunkfun, freefun, arg) \
  _obstack_begin_1 ((h), (size), (alignment), \
                    (void *(*) ()) (chunkfun), (void (*) ()) (freefun), (arg))

#define obstack_1grow_fast(h,achar) (*((h)->next_free)++ = achar)

#define obstack_blank_fast(h,n) ((h)->next_free += (n))

#if defined (__GNUC__) && defined (__STDC__)
#if __GNUC__ < 2 || defined(NeXT)
#define __extension__
#endif

/* For GNU C, if not -traditional,
   we can define these macros to compute all args only once
   without using a global variable.
   Also, we can avoid using the `temp' slot, to make faster code. */

#define obstack_object_size(OBSTACK) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
   (unsigned) (__o->next_free - __o->object_base); })

#define obstack_room(OBSTACK) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
   (unsigned) (__o->chunk_limit - __o->next_free); })

/* Note that the call to _obstack_newchunk is enclosed in (..., 0)
   so that we can avoid having void expressions
   in the arms of the conditional expression.
   Casting the third operand to void was tried before,
   but some compilers won't accept it. */
#define obstack_grow(OBSTACK,where,length) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
   int __len = (length); \
   ((__o->next_free + __len > __o->chunk_limit) \
    ? (_obstack_newchunk (__o, __len), 0) : 0); \
   bcopy (where, __o->next_free, __len); \
   __o->next_free += __len; \
   (void) 0; })

#define obstack_grow0(OBSTACK,where,length) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
   int __len = (length); \
   ((__o->next_free + __len + 1 > __o->chunk_limit) \
    ? (_obstack_newchunk (__o, __len + 1), 0) : 0), \
   bcopy (where, __o->next_free, __len), \
   __o->next_free += __len, \
   *(__o->next_free)++ = 0; \
   (void) 0; })

#define obstack_1grow(OBSTACK,datum) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
   ((__o->next_free + 1 > __o->chunk_limit) \
    ? (_obstack_newchunk (__o, 1), 0) : 0), \
   *(__o->next_free)++ = (datum); \
   (void) 0; })

/* These assume that the obstack alignment is good enough for pointers or ints,
   and that the data added so far to the current object
   shares that much alignment. */

#define obstack_ptr_grow(OBSTACK,datum) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
   ((__o->next_free + sizeof (void *) > __o->chunk_limit) \
    ? (_obstack_newchunk (__o, sizeof (void *)), 0) : 0), \
   *((void **)__o->next_free)++ = ((void *)datum); \
   (void) 0; })

#define obstack_int_grow(OBSTACK,datum) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
   ((__o->next_free + sizeof (int) > __o->chunk_limit) \
    ? (_obstack_newchunk (__o, sizeof (int)), 0) : 0), \
   *((int *)__o->next_free)++ = ((int)datum); \
   (void) 0; })

#define obstack_ptr_grow_fast(h,aptr) (*((void **)(h)->next_free)++ = (void *)aptr)
#define obstack_int_grow_fast(h,aint) (*((int *)(h)->next_free)++ = (int)aint)

#define obstack_blank(OBSTACK,length) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
   int __len = (length); \
   ((__o->chunk_limit - __o->next_free < __len) \
    ? (_obstack_newchunk (__o, __len), 0) : 0); \
   __o->next_free += __len; \
   (void) 0; })

#define obstack_alloc(OBSTACK,length) \
__extension__ \
({ struct obstack *__h = (OBSTACK); \
   obstack_blank (__h, (length)); \
   obstack_finish (__h); })

#define obstack_copy(OBSTACK,where,length) \
__extension__ \
({ struct obstack *__h = (OBSTACK); \
   obstack_grow (__h, (where), (length)); \
   obstack_finish (__h); })

#define obstack_copy0(OBSTACK,where,length) \
__extension__ \
({ struct obstack *__h = (OBSTACK); \
   obstack_grow0 (__h, (where), (length)); \
   obstack_finish (__h); })

/* The local variable is named __o1 to avoid a name conflict
   when obstack_blank is called. */
#define obstack_finish(OBSTACK) \
__extension__ \
({ struct obstack *__o1 = (OBSTACK); \
   void *value = (void *) __o1->object_base; \
   if (__o1->next_free == value) \
     __o1->maybe_empty_object = 1; \
   __o1->next_free \
     = __INT_TO_PTR ((__PTR_TO_INT (__o1->next_free)+__o1->alignment_mask)\
                     & ~ (__o1->alignment_mask)); \
   ((__o1->next_free - (char *)__o1->chunk \
     > __o1->chunk_limit - (char *)__o1->chunk) \
    ? (__o1->next_free = __o1->chunk_limit) : 0); \
   __o1->object_base = __o1->next_free; \
   value; })

#define obstack_free(OBSTACK, OBJ) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
   void *__obj = (OBJ); \
   if (__obj > (void *)__o->chunk && __obj < (void *)__o->chunk_limit) \
     __o->next_free = __o->object_base = __obj; \
   else (obstack_free) (__o, __obj); })

#else /* not __GNUC__ or not __STDC__ */

#define obstack_object_size(h) \
 (unsigned) ((h)->next_free - (h)->object_base)

#define obstack_room(h) \
 (unsigned) ((h)->chunk_limit - (h)->next_free)

#define obstack_grow(h,where,length) \
( (h)->temp = (length), \
  (((h)->next_free + (h)->temp > (h)->chunk_limit) \
   ? (_obstack_newchunk ((h), (h)->temp), 0) : 0), \
  bcopy (where, (h)->next_free, (h)->temp), \
  (h)->next_free += (h)->temp)

#define obstack_grow0(h,where,length) \
( (h)->temp = (length), \
  (((h)->next_free + (h)->temp + 1 > (h)->chunk_limit) \
   ? (_obstack_newchunk ((h), (h)->temp + 1), 0) : 0), \
  bcopy (where, (h)->next_free, (h)->temp), \
  (h)->next_free += (h)->temp, \
  *((h)->next_free)++ = 0)

#define obstack_1grow(h,datum) \
( (((h)->next_free + 1 > (h)->chunk_limit) \
   ? (_obstack_newchunk ((h), 1), 0) : 0), \
  *((h)->next_free)++ = (datum))

#define obstack_ptr_grow(h,datum) \
( (((h)->next_free + sizeof (char *) > (h)->chunk_limit) \
   ? (_obstack_newchunk ((h), sizeof (char *)), 0) : 0), \
  *((char **)(((h)->next_free+=sizeof(char *))-sizeof(char *))) = ((char *)datum))

#define obstack_int_grow(h,datum) \
( (((h)->next_free + sizeof (int) > (h)->chunk_limit) \
   ? (_obstack_newchunk ((h), sizeof (int)), 0) : 0), \
  *((int *)(((h)->next_free+=sizeof(int))-sizeof(int))) = ((int)datum))

#define obstack_ptr_grow_fast(h,aptr) (*((char **)(h)->next_free)++ = (char *)aptr)
#define obstack_int_grow_fast(h,aint) (*((int *)(h)->next_free)++ = (int)aint)

#define obstack_blank(h,length) \
( (h)->temp = (length), \
  (((h)->chunk_limit - (h)->next_free < (h)->temp) \
   ? (_obstack_newchunk ((h), (h)->temp), 0) : 0), \
  (h)->next_free += (h)->temp)

#define obstack_alloc(h,length) \
 (obstack_blank ((h), (length)), obstack_finish ((h)))

#define obstack_copy(h,where,length) \
 (obstack_grow ((h), (where), (length)), obstack_finish ((h)))

#define obstack_copy0(h,where,length) \
 (obstack_grow0 ((h), (where), (length)), obstack_finish ((h)))

#define obstack_finish(h) \
( ((h)->next_free == (h)->object_base \
   ? (((h)->maybe_empty_object = 1), 0) \
   : 0), \
  (h)->temp = __PTR_TO_INT ((h)->object_base), \
  (h)->next_free \
    = __INT_TO_PTR ((__PTR_TO_INT ((h)->next_free)+(h)->alignment_mask) \
                    & ~ ((h)->alignment_mask)), \
  (((h)->next_free - (char *)(h)->chunk \
    > (h)->chunk_limit - (char *)(h)->chunk) \
   ? ((h)->next_free = (h)->chunk_limit) : 0), \
  (h)->object_base = (h)->next_free, \
  __INT_TO_PTR ((h)->temp))

#ifdef __STDC__
#define obstack_free(h,obj) \
( (h)->temp = (char *)(obj) - (char *) (h)->chunk, \
  (((h)->temp > 0 && (h)->temp < (h)->chunk_limit - (char *) (h)->chunk)\
   ? (int) ((h)->next_free = (h)->object_base \
            = (h)->temp + (char *) (h)->chunk) \
   : (((obstack_free) ((h), (h)->temp + (char *) (h)->chunk), 0), 0)))
#else
#define obstack_free(h,obj) \
( (h)->temp = (char *)(obj) - (char *) (h)->chunk, \
  (((h)->temp > 0 && (h)->temp < (h)->chunk_limit - (char *) (h)->chunk)\
   ? (int) ((h)->next_free = (h)->object_base \
            = (h)->temp + (char *) (h)->chunk) \
   : (_obstack_free ((h), (h)->temp + (char *) (h)->chunk), 0)))
#endif

#endif /* not __GNUC__ or not __STDC__ */

#endif /* not __OBSTACKS__ */
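The obstack.h summary describes the technique only in prose. Below is a hypothetical, simplified re-implementation in modern C to make the mechanics concrete. The `mini_` names are invented here and are not part of GNU obstack. The sketch grows one object at a time. When an object overflows its chunk, the partly built object is copied into a larger chunk, so finished objects never move. `mini_finish` rounds `next_free` up to the alignment mask and clamps it to the chunk end, as `obstack_finish` does. `mini_free` unwinds whole chunks back to a given object.

The sketch differs from the header in several assumed ways:
- It uses `uintptr_t` instead of subtracting `(char *)0`.
- Its chunk-size formula is only loosely modeled on obstack.c.
- It never frees an abandoned chunk until the obstack is unwound.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified obstack; allocation failure simply aborts. */
struct mini_chunk {
  char *limit;              /* one past the end of this chunk */
  struct mini_chunk *prev;  /* previous chunk, or NULL */
  char contents[];          /* objects begin here (C99 flexible array) */
};

struct mini_obstack {
  struct mini_chunk *chunk; /* current chunk */
  char *object_base;        /* start of the object being grown */
  char *next_free;          /* where the next byte goes */
  char *chunk_limit;        /* end of the current chunk */
  size_t chunk_size;        /* preferred chunk size */
  uintptr_t align_mask;     /* e.g. 7 for 8-byte alignment */
};

static struct mini_chunk *mini_chunk_new (size_t size, struct mini_chunk *prev)
{
  struct mini_chunk *c = malloc (sizeof *c + size);
  if (c == NULL)
    abort ();
  c->limit = c->contents + size;
  c->prev = prev;
  return c;
}

void mini_init (struct mini_obstack *o, size_t chunk_size, uintptr_t align_mask)
{
  o->chunk_size = chunk_size;
  o->align_mask = align_mask;
  o->chunk = mini_chunk_new (chunk_size, NULL);
  o->object_base = o->next_free = o->chunk->contents;
  o->chunk_limit = o->chunk->limit;
}

/* Copy the partly built object into a larger chunk.  Finished objects
   stay in the old chunk, so their addresses remain valid. */
static void mini_newchunk (struct mini_obstack *o, size_t length)
{
  size_t obj_size = (size_t) (o->next_free - o->object_base);
  size_t size = obj_size + length + (obj_size >> 3) + 100;
  if (size < o->chunk_size)
    size = o->chunk_size;
  struct mini_chunk *c = mini_chunk_new (size, o->chunk);
  memcpy (c->contents, o->object_base, obj_size);
  o->chunk = c;
  o->object_base = c->contents;
  o->next_free = c->contents + obj_size;
  o->chunk_limit = c->limit;
}

void mini_grow (struct mini_obstack *o, const void *data, size_t length)
{
  if ((size_t) (o->chunk_limit - o->next_free) < length)
    mini_newchunk (o, length);
  memcpy (o->next_free, data, length);
  o->next_free += length;
}

void mini_1grow (struct mini_obstack *o, char c)
{
  mini_grow (o, &c, 1);
}

size_t mini_object_size (const struct mini_obstack *o)
{
  return (size_t) (o->next_free - o->object_base);
}

/* Freeze the current object; the next one starts at the next aligned
   address, clamped to the chunk end as in obstack_finish. */
void *mini_finish (struct mini_obstack *o)
{
  void *value = o->object_base;
  uintptr_t p = ((uintptr_t) o->next_free + o->align_mask) & ~o->align_mask;
  if (p > (uintptr_t) o->chunk_limit)
    p = (uintptr_t) o->chunk_limit;
  o->object_base = o->next_free = (char *) p;
  return value;
}

/* Free OBJ and everything allocated after it; NULL frees all chunks. */
void mini_free (struct mini_obstack *o, void *obj)
{
  uintptr_t p = (uintptr_t) obj;
  struct mini_chunk *c = o->chunk;
  while (c != NULL && !((uintptr_t) c < p && p <= (uintptr_t) c->limit))
    {
      struct mini_chunk *prev = c->prev;
      free (c);
      c = prev;
    }
  o->chunk = c;
  if (c != NULL)
    {
      o->object_base = o->next_free = obj;
      o->chunk_limit = c->limit;
    }
  else if (obj != NULL)
    abort ();  /* OBJ did not come from this obstack */
}
```

This supports the symbol-table pattern from the summary:
1. Grow a name with `mini_1grow`.
2. Call `mini_finish` to get a pointer that will not move.
3. If the name turns out to be a duplicate, call `mini_free` on that pointer to discard it and anything allocated after it.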
File diff suppressed because it is too large
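The regex.h hunks that follow replace the old hand-numbered syntax bits (`RE_NO_BK_PARENS 1`, `RE_NO_BK_VBAR 2`, `RE_BK_PLUS_QM 4`, ...) with an alphabetical chain. In that chain each bit is defined as the previous one shifted left by one, so adding or removing a bit changes only one neighboring definition.

The sketch below reproduces the chain as it appears in the new header. The names are renamed with a hypothetical `MY_` prefix so they cannot clash with a system `<regex.h>`. `my_old_egrep_bits` is a hypothetical translation of the old `RE_SYNTAX_EGREP` bits into the new names, not the new header's own definition. `syntax_has` is likewise an invented helper.

```c
/* Illustrative copy of the new regex.h bit chain, renamed with a
   hypothetical MY_ prefix. */
typedef unsigned my_syntax_t;

#define MY_BACKSLASH_ESCAPE_IN_LISTS (1)
#define MY_BK_PLUS_QM             (MY_BACKSLASH_ESCAPE_IN_LISTS << 1)
#define MY_CHAR_CLASSES           (MY_BK_PLUS_QM << 1)
#define MY_CONTEXT_INDEP_ANCHORS  (MY_CHAR_CLASSES << 1)
#define MY_CONTEXT_INDEP_OPS      (MY_CONTEXT_INDEP_ANCHORS << 1)
#define MY_CONTEXT_INVALID_OPS    (MY_CONTEXT_INDEP_OPS << 1)
#define MY_DOT_NEWLINE            (MY_CONTEXT_INVALID_OPS << 1)
#define MY_DOT_NOT_NULL           (MY_DOT_NEWLINE << 1)
#define MY_HAT_LISTS_NOT_NEWLINE  (MY_DOT_NOT_NULL << 1)
#define MY_INTERVALS              (MY_HAT_LISTS_NOT_NEWLINE << 1)
#define MY_LIMITED_OPS            (MY_INTERVALS << 1)
#define MY_NEWLINE_ALT            (MY_LIMITED_OPS << 1)
#define MY_NO_BK_BRACES           (MY_NEWLINE_ALT << 1)
#define MY_NO_BK_PARENS           (MY_NO_BK_BRACES << 1)
#define MY_NO_BK_REFS             (MY_NO_BK_PARENS << 1)
#define MY_NO_BK_VBAR             (MY_NO_BK_REFS << 1)

/* Hypothetical translation of the old RE_SYNTAX_EGREP bits
   (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS | RE_NEWLINE_OR)
   into the new names; RE_NEWLINE_OR and RE_NEWLINE_ALT both make newline
   an alternation operator. */
static const my_syntax_t my_old_egrep_bits =
  MY_NO_BK_PARENS | MY_NO_BK_VBAR | MY_CONTEXT_INDEP_OPS | MY_NEWLINE_ALT;

/* Nonzero if every bit in FLAGS is set in SYNTAX. */
static int syntax_has (my_syntax_t syntax, my_syntax_t flags)
{
  return (syntax & flags) == flags;
}
```

One consequence of the renumbering is visible in the values: the no-backslash-parentheses bit moves from 1 to 1 << 13. Code that hard-coded the old numeric values instead of using the macro names would therefore no longer select the intended syntax.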
@ -1,5 +1,7 @@
|
|||||||
/* Definitions for data structures callers pass the regex library.
|
/* Definitions for data structures and routines for the regular
|
||||||
Copyright (C) 1985, 1989 Free Software Foundation, Inc.
|
expression library, version 0.12.
|
||||||
|
|
||||||
|
Copyright (C) 1985, 1989, 1990, 1991, 1992, 1993 Free Software Foundation, Inc.
|
||||||
|
|
||||||
This program is free software; you can redistribute it and/or modify
|
This program is free software; you can redistribute it and/or modify
|
||||||
it under the terms of the GNU General Public License as published by
|
it under the terms of the GNU General Public License as published by
|
||||||
@ -13,173 +15,476 @@
|
|||||||
|
|
||||||
You should have received a copy of the GNU General Public License
|
You should have received a copy of the GNU General Public License
|
||||||
along with this program; if not, write to the Free Software
|
along with this program; if not, write to the Free Software
|
||||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
|
||||||
|
|
||||||
|
#ifndef __REGEXP_LIBRARY_H__
|
||||||
|
#define __REGEXP_LIBRARY_H__
|
||||||
|
|
||||||
In other words, you are welcome to use, share and improve this program.
|
/* POSIX says that <sys/types.h> must be included (by the caller) before
|
||||||
You are forbidden to forbid anyone else to use, share and improve
|
<regex.h>. */
|
||||||
what you give them. Help stamp out software-hoarding! */
|
|
||||||
|
|
||||||
|
#ifdef VMS
|
||||||
/* Define number of parens for which we record the beginnings and ends.
|
/* VMS doesn't have `size_t' in <sys/types.h>, even though POSIX says it
|
||||||
This affects how much space the `struct re_registers' type takes up. */
|
should be there. */
|
||||||
#ifndef RE_NREGS
|
#include <stddef.h>
|
||||||
#define RE_NREGS 10
|
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* These bits are used in the obscure_syntax variable to choose among
|
|
||||||
alternative regexp syntaxes. */
|
|
||||||
|
|
||||||
/* 1 means plain parentheses serve as grouping, and backslash
|
/* The following bits are used to determine the regexp syntax we
|
||||||
parentheses are needed for literal searching.
|
recognize. The set/not-set meanings are chosen so that Emacs syntax
|
||||||
0 means backslash-parentheses are grouping, and plain parentheses
|
remains the value 0. The bits are given in alphabetical order, and
|
||||||
are for literal searching. */
|
the definitions shifted by one from the previous bit; thus, when we
|
||||||
#define RE_NO_BK_PARENS 1
|
add or remove a bit, only one other definition need change. */
|
||||||
|
typedef unsigned reg_syntax_t;
|
||||||
|
|
||||||
/* 1 means plain | serves as the "or"-operator, and \| is a literal.
|
/* If this bit is not set, then \ inside a bracket expression is literal.
|
||||||
0 means \| serves as the "or"-operator, and | is a literal. */
|
If set, then such a \ quotes the following character. */
|
||||||
#define RE_NO_BK_VBAR 2
|
#define RE_BACKSLASH_ESCAPE_IN_LISTS (1)
|
||||||
|
|
||||||
/* 0 means plain + or ? serves as an operator, and \+, \? are literals.
|
/* If this bit is not set, then + and ? are operators, and \+ and \? are
|
||||||
1 means \+, \? are operators and plain +, ? are literals. */
|
literals.
|
||||||
#define RE_BK_PLUS_QM 4
|
If set, then \+ and \? are operators and + and ? are literals. */
|
||||||
|
#define RE_BK_PLUS_QM (RE_BACKSLASH_ESCAPE_IN_LISTS << 1)
|
||||||
|
|
||||||
/* 1 means | binds tighter than ^ or $.
|
/* If this bit is set, then character classes are supported. They are:
|
||||||
0 means the contrary. */
|
[:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
|
||||||
#define RE_TIGHT_VBAR 8
|
[:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
|
||||||
|
If not set, then character classes are not supported. */
|
||||||
|
#define RE_CHAR_CLASSES (RE_BK_PLUS_QM << 1)
|
||||||
|
|
||||||
/* 1 means treat \n as an _OR operator
|
/* If this bit is set, then ^ and $ are always anchors (outside bracket
|
||||||
0 means treat it as a normal character */
|
expressions, of course).
|
||||||
#define RE_NEWLINE_OR 16
|
If this bit is not set, then it depends:
|
||||||
|
^ is an anchor if it is at the beginning of a regular
|
||||||
|
expression or after an open-group or an alternation operator;
|
||||||
|
$ is an anchor if it is at the end of a regular expression, or
|
||||||
|
before a close-group or an alternation operator.
|
||||||
|
|
||||||
/* 0 means that a special characters (such as *, ^, and $) always have
|
This bit could be (re)combined with RE_CONTEXT_INDEP_OPS, because
|
||||||
their special meaning regardless of the surrounding context.
|
POSIX draft 11.2 says that * etc. in leading positions is undefined.
|
||||||
   1 means that special characters may act as normal characters in some
   contexts. Specifically, this applies to:
        ^ - only special at the beginning, or after ( or |
        $ - only special at the end, or before ) or |
        *, +, ? - only special when not after the beginning, (, or | */
#define RE_CONTEXT_INDEP_OPS 32

/* Now define combinations of bits for the standard possibilities. */
#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
#define RE_SYNTAX_EMACS 0

/* This data structure is used to represent a compiled pattern. */

struct re_pattern_buffer
  {
    char *buffer;       /* Space holding the compiled pattern commands. */
    int allocated;      /* Size of space that buffer points to */
    int used;           /* Length of portion of buffer actually occupied */
    char *fastmap;      /* Pointer to fastmap, if any, or zero if none. */
                        /* re_search uses the fastmap, if there is one,
                           to skip quickly over totally implausible characters */
    char *translate;    /* Translate table to apply to all characters before comparing.
                           Or zero for no translation.
                           The translation is applied to a pattern when it is compiled
                           and to data when it is matched. */
    char fastmap_accurate;
                        /* Set to zero when a new pattern is stored,
                           set to one when the fastmap is updated from it. */
    char can_be_null;   /* Set to one by compiling fastmap
                           if this pattern might match the null string.
                           It does not necessarily match the null string
                           in that case, but if this is zero, it cannot.
                           2 as value means can match null string
                           but at end of range or before a character
                           listed in the fastmap. */
  };

/* Structure to store "register" contents data in.

   Pass the address of such a structure as an argument to re_match, etc.,
   if you want this information back.

   start[i] and end[i] record the string matched by \( ... \) grouping i,
   for i from 1 to RE_NREGS - 1.
   start[0] and end[0] record the entire string matched. */

struct re_registers
  {
    int start[RE_NREGS];
    int end[RE_NREGS];
  };

/* These are the command codes that appear in compiled regular expressions, one per byte.
   Some command codes are followed by argument bytes.
   A command code can specify any interpretation whatever for its arguments.
   Zero-bytes may appear in the compiled regular expression. */

enum regexpcode
  {
    unused,
    exactn,               /* followed by one byte giving n, and then by n literal bytes */
    begline,              /* fails unless at beginning of line */
    endline,              /* fails unless at end of line */
    jump,                 /* followed by two bytes giving relative address to jump to */
    on_failure_jump,      /* followed by two bytes giving relative address of place
                             to resume at in case of failure. */
    finalize_jump,        /* Throw away latest failure point and then jump to address. */
    maybe_finalize_jump,  /* Like jump but finalize if safe to do so.
                             This is used to jump back to the beginning
                             of a repeat. If the command that follows
                             this jump is clearly incompatible with the
                             one at the beginning of the repeat, such that
                             we can be sure that there is no use backtracking
                             out of repetitions already completed,
                             then we finalize. */
    dummy_failure_jump,   /* jump, and push a dummy failure point.
                             This failure point will be thrown away
                             if an attempt is made to use it for a failure.
                             A + construct makes this before the first repeat. */
    anychar,              /* matches any one character */
    charset,              /* matches any one char belonging to specified set.
                             First following byte is # bitmap bytes.
                             Then come bytes for a bit-map saying which chars are in.
                             Bits in each byte are ordered low-bit-first.
                             A character is in the set if its bit is 1.
                             A character too large to have a bit in the map
                             is automatically not in the set */
    charset_not,          /* similar but match any character that is NOT one of those specified */
    start_memory,         /* starts remembering the text that is matched
                             and stores it in a memory register.
                             followed by one byte containing the register number.
                             Register numbers must be in the range 0 through NREGS. */
    stop_memory,          /* stops remembering the text that is matched
                             and stores it in a memory register.
                             followed by one byte containing the register number.
                             Register numbers must be in the range 0 through NREGS. */
    duplicate,            /* match a duplicate of something remembered.
                             Followed by one byte containing the index of the memory register. */
    before_dot,           /* Succeeds if before dot */
    at_dot,               /* Succeeds if at dot */
    after_dot,            /* Succeeds if after dot */
    begbuf,               /* Succeeds if at beginning of buffer */
    endbuf,               /* Succeeds if at end of buffer */
    wordchar,             /* Matches any word-constituent character */
    notwordchar,          /* Matches any char that is not a word-constituent */
    wordbeg,              /* Succeeds if at word beginning */
    wordend,              /* Succeeds if at word end */
    wordbound,            /* Succeeds if at a word boundary */
    notwordbound,         /* Succeeds if not at a word boundary */
    syntaxspec,           /* Matches any character whose syntax is specified.
                             followed by a byte which contains a syntax code, Sword or such like */
    notsyntaxspec         /* Matches any character whose syntax differs from the specified. */
  };

extern char *re_compile_pattern ();
/* Is this really advertised? */
extern void re_compile_fastmap ();
extern int re_search (), re_search_2 ();
extern int re_match (), re_match_2 ();

/* 4.2 bsd compatibility (yuck) */
extern char *re_comp ();
extern int re_exec ();

#ifdef SYNTAX_TABLE
extern char *re_syntax_table;
#endif

   We already implemented a previous draft which made those constructs
   invalid, though, so we haven't changed the code back. */
#define RE_CONTEXT_INDEP_ANCHORS (RE_CHAR_CLASSES << 1)

/* If this bit is set, then special characters are always special
   regardless of where they are in the pattern.
   If not set, then special characters are special only in
   some contexts; otherwise they are ordinary. Specifically,
   * + ? and intervals are only special when not after the beginning,
   open-group, or alternation operator. */
#define RE_CONTEXT_INDEP_OPS (RE_CONTEXT_INDEP_ANCHORS << 1)

/* If this bit is set, then *, +, ?, and { cannot be first in an re or
   immediately after an alternation or begin-group operator. */
#define RE_CONTEXT_INVALID_OPS (RE_CONTEXT_INDEP_OPS << 1)

/* If this bit is set, then . matches newline.
   If not set, then it doesn't. */
#define RE_DOT_NEWLINE (RE_CONTEXT_INVALID_OPS << 1)

/* If this bit is set, then . doesn't match NUL.
   If not set, then it does. */
#define RE_DOT_NOT_NULL (RE_DOT_NEWLINE << 1)

/* If this bit is set, nonmatching lists [^...] do not match newline.
   If not set, they do. */
#define RE_HAT_LISTS_NOT_NEWLINE (RE_DOT_NOT_NULL << 1)

/* If this bit is set, either \{...\} or {...} defines an
     interval, depending on RE_NO_BK_BRACES.
   If not set, \{, \}, {, and } are literals. */
#define RE_INTERVALS (RE_HAT_LISTS_NOT_NEWLINE << 1)

/* If this bit is set, +, ? and | aren't recognized as operators.
   If not set, they are. */
#define RE_LIMITED_OPS (RE_INTERVALS << 1)

/* If this bit is set, newline is an alternation operator.
   If not set, newline is literal. */
#define RE_NEWLINE_ALT (RE_LIMITED_OPS << 1)

/* If this bit is set, then `{...}' defines an interval, and \{ and \}
     are literals.
   If not set, then `\{...\}' defines an interval. */
#define RE_NO_BK_BRACES (RE_NEWLINE_ALT << 1)

/* If this bit is set, (...) defines a group, and \( and \) are literals.
   If not set, \(...\) defines a group, and ( and ) are literals. */
#define RE_NO_BK_PARENS (RE_NO_BK_BRACES << 1)

/* If this bit is set, then \<digit> matches <digit>.
   If not set, then \<digit> is a back-reference. */
#define RE_NO_BK_REFS (RE_NO_BK_PARENS << 1)

/* If this bit is set, then | is an alternation operator, and \| is literal.
   If not set, then \| is an alternation operator, and | is literal. */
#define RE_NO_BK_VBAR (RE_NO_BK_REFS << 1)

/* If this bit is set, then an ending range point collating higher
     than the starting range point, as in [z-a], is invalid.
   If not set, then when ending range point collates higher than the
     starting range point, the range is ignored. */
#define RE_NO_EMPTY_RANGES (RE_NO_BK_VBAR << 1)

/* If this bit is set, then an unmatched ) is ordinary.
   If not set, then an unmatched ) is invalid. */
#define RE_UNMATCHED_RIGHT_PAREN_ORD (RE_NO_EMPTY_RANGES << 1)

/* This global variable defines the particular regexp syntax to use (for
   some interfaces). When a regexp is compiled, the syntax used is
   stored in the pattern buffer, so changing this does not affect
   already-compiled regexps. */
extern reg_syntax_t re_syntax_options;

/* Define combinations of the above bits for the standard possibilities.
   (The [[[ comments delimit what gets put into the Texinfo file, so
   don't delete them!) */
/* [[[begin syntaxes]]] */
#define RE_SYNTAX_EMACS 0

#define RE_SYNTAX_AWK \
  (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \
   | RE_NO_BK_PARENS | RE_NO_BK_REFS \
   | RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \
   | RE_UNMATCHED_RIGHT_PAREN_ORD)

#define RE_SYNTAX_POSIX_AWK \
  (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS)

#define RE_SYNTAX_GREP \
  (RE_BK_PLUS_QM | RE_CHAR_CLASSES \
   | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \
   | RE_NEWLINE_ALT)

#define RE_SYNTAX_EGREP \
  (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \
   | RE_NEWLINE_ALT | RE_NO_BK_PARENS \
   | RE_NO_BK_VBAR)

#define RE_SYNTAX_POSIX_EGREP \
  (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES)

/* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */
#define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC

#define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC

/* Syntax bits common to both basic and extended POSIX regex syntax. */
#define _RE_SYNTAX_POSIX_COMMON \
  (RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \
   | RE_INTERVALS | RE_NO_EMPTY_RANGES)

#define RE_SYNTAX_POSIX_BASIC \
  (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)

/* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
   RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this
   isn't minimal, since other operators, such as \`, aren't disabled. */
#define RE_SYNTAX_POSIX_MINIMAL_BASIC \
  (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)

#define RE_SYNTAX_POSIX_EXTENDED \
  (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \
   | RE_NO_BK_PARENS | RE_NO_BK_VBAR \
   | RE_UNMATCHED_RIGHT_PAREN_ORD)

/* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS
   replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added. */
#define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \
  (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
   | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \
   | RE_NO_BK_PARENS | RE_NO_BK_REFS \
   | RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD)
/* [[[end syntaxes]]] */

/* Maximum number of duplicates an interval can allow. Some systems
   (erroneously) define this in other header files, but we want our
   value, so remove any previous define. */
#ifdef RE_DUP_MAX
#undef RE_DUP_MAX
#endif
#define RE_DUP_MAX ((1 << 15) - 1)

/* POSIX `cflags' bits (i.e., information for `regcomp'). */

/* If this bit is set, then use extended regular expression syntax.
   If not set, then use basic regular expression syntax. */
#define REG_EXTENDED 1

/* If this bit is set, then ignore case when matching.
   If not set, then case is significant. */
#define REG_ICASE (REG_EXTENDED << 1)

/* If this bit is set, then anchors do not match at newline
     characters in the string.
   If not set, then anchors do match at newlines. */
#define REG_NEWLINE (REG_ICASE << 1)

/* If this bit is set, then report only success or fail in regexec.
   If not set, then returns differ between not matching and errors. */
#define REG_NOSUB (REG_NEWLINE << 1)

/* POSIX `eflags' bits (i.e., information for regexec). */

/* If this bit is set, then the beginning-of-line operator doesn't match
     the beginning of the string (presumably because it's not the
     beginning of a line).
   If not set, then the beginning-of-line operator does match the
     beginning of the string. */
#define REG_NOTBOL 1

/* Like REG_NOTBOL, except for the end-of-line. */
#define REG_NOTEOL (1 << 1)

/* If any error codes are removed, changed, or added, update the
   `re_error_msg' table in regex.c. */
typedef enum
{
  REG_NOERROR = 0, /* Success. */
  REG_NOMATCH, /* Didn't find a match (for regexec). */

  /* POSIX regcomp return error codes. (In the order listed in the
     standard.) */
  REG_BADPAT, /* Invalid pattern. */
  REG_ECOLLATE, /* Not implemented. */
  REG_ECTYPE, /* Invalid character class name. */
  REG_EESCAPE, /* Trailing backslash. */
  REG_ESUBREG, /* Invalid back reference. */
  REG_EBRACK, /* Unmatched left bracket. */
  REG_EPAREN, /* Parenthesis imbalance. */
  REG_EBRACE, /* Unmatched \{. */
  REG_BADBR, /* Invalid contents of \{\}. */
  REG_ERANGE, /* Invalid range end. */
  REG_ESPACE, /* Ran out of memory. */
  REG_BADRPT, /* No preceding re for repetition op. */

  /* Error codes we've added. */
  REG_EEND, /* Premature end. */
  REG_ESIZE, /* Compiled pattern bigger than 2^16 bytes. */
  REG_ERPAREN /* Unmatched ) or \); not returned from regcomp. */
} reg_errcode_t;

/* This data structure represents a compiled pattern. Before calling
   the pattern compiler, the fields `buffer', `allocated', `fastmap',
   `translate', and `no_sub' can be set. After the pattern has been
   compiled, the `re_nsub' field is available. All other fields are
   private to the regex routines. */

struct re_pattern_buffer
{
/* [[[begin pattern_buffer]]] */
        /* Space that holds the compiled pattern. It is declared as
           `unsigned char *' because its elements are
           sometimes used as array indexes. */
  unsigned char *buffer;

        /* Number of bytes to which `buffer' points. */
  unsigned long allocated;

        /* Number of bytes actually used in `buffer'. */
  unsigned long used;

        /* Syntax setting with which the pattern was compiled. */
  reg_syntax_t syntax;

        /* Pointer to a fastmap, if any, otherwise zero. re_search uses
           the fastmap, if there is one, to skip over impossible
           starting points for matches. */
  char *fastmap;

        /* Either a translate table to apply to all characters before
           comparing them, or zero for no translation. The translation
           is applied to a pattern when it is compiled and to a string
           when it is matched. */
  char *translate;

        /* Number of subexpressions found by the compiler. */
  size_t re_nsub;

        /* Zero if this pattern cannot match the empty string, one else.
           Well, in truth it's used only in `re_search_2', to see
           whether or not we should use the fastmap, so we don't set
           this absolutely perfectly; see `re_compile_fastmap' (the
           `duplicate' case). */
  unsigned can_be_null : 1;

        /* If REGS_UNALLOCATED, allocate space in the `regs' structure
             for `max (RE_NREGS, re_nsub + 1)' groups.
           If REGS_REALLOCATE, reallocate space if necessary.
           If REGS_FIXED, use what's there. */
#define REGS_UNALLOCATED 0
#define REGS_REALLOCATE 1
#define REGS_FIXED 2
  unsigned regs_allocated : 2;

        /* Set to zero when `regex_compile' compiles a pattern; set to one
           by `re_compile_fastmap' if it updates the fastmap. */
  unsigned fastmap_accurate : 1;

        /* If set, `re_match_2' does not return information about
           subexpressions. */
  unsigned no_sub : 1;

        /* If set, a beginning-of-line anchor doesn't match at the
           beginning of the string. */
  unsigned not_bol : 1;

        /* Similarly for an end-of-line anchor. */
  unsigned not_eol : 1;

        /* If true, an anchor at a newline matches. */
  unsigned newline_anchor : 1;

/* [[[end pattern_buffer]]] */
};

typedef struct re_pattern_buffer regex_t;

/* search.c (search_buffer) in Emacs needs this one opcode value. It is
   defined both in `regex.c' and here. */
#define RE_EXACTN_VALUE 1

/* Type for byte offsets within the string. POSIX mandates this. */
typedef int regoff_t;

/* This is the structure we store register match data in. See
   regex.texinfo for a full description of what registers match. */
struct re_registers
{
  unsigned num_regs;
  regoff_t *start;
  regoff_t *end;
};

/* If `regs_allocated' is REGS_UNALLOCATED in the pattern buffer,
   `re_match_2' returns information about at least this many registers
   the first time a `regs' structure is passed. */
#ifndef RE_NREGS
#define RE_NREGS 30
#endif

/* POSIX specification for registers. Aside from the different names than
   `re_registers', POSIX uses an array of structures, instead of a
   structure of arrays. */
typedef struct
{
  regoff_t rm_so; /* Byte offset from string's start to substring's start. */
  regoff_t rm_eo; /* Byte offset from string's start to substring's end. */
} regmatch_t;

/* Declarations for routines. */

/* To avoid duplicating every routine declaration -- once with a
   prototype (if we are ANSI), and once without (if we aren't) -- we
   use the following macro to declare argument types. This
   unfortunately clutters up the declarations a bit, but I think it's
   worth it. */

#if __STDC__

#define _RE_ARGS(args) args

#else /* not __STDC__ */

#define _RE_ARGS(args) ()

#endif /* not __STDC__ */

/* Sets the current default syntax to SYNTAX, and return the old syntax.
   You can also simply assign to the `re_syntax_options' variable. */
extern reg_syntax_t re_set_syntax _RE_ARGS ((reg_syntax_t syntax));

/* Compile the regular expression PATTERN, with length LENGTH
   and syntax given by the global `re_syntax_options', into the buffer
   BUFFER. Return NULL if successful, and an error string if not. */
extern const char *re_compile_pattern
  _RE_ARGS ((const char *pattern, int length,
             struct re_pattern_buffer *buffer));

/* Compile a fastmap for the compiled pattern in BUFFER; used to
   accelerate searches. Return 0 if successful and -2 if was an
   internal error. */
extern int re_compile_fastmap _RE_ARGS ((struct re_pattern_buffer *buffer));

/* Search in the string STRING (with length LENGTH) for the pattern
   compiled into BUFFER. Start searching at position START, for RANGE
   characters. Return the starting position of the match, -1 for no
   match, or -2 for an internal error. Also return register
   information in REGS (if REGS and BUFFER->no_sub are nonzero). */
extern int re_search
  _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string,
             int length, int start, int range, struct re_registers *regs));

/* Like `re_search', but search in the concatenation of STRING1 and
   STRING2. Also, stop searching at index START + STOP. */
extern int re_search_2
  _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1,
             int length1, const char *string2, int length2,
             int start, int range, struct re_registers *regs, int stop));

/* Like `re_search', but return how many characters in STRING the regexp
   in BUFFER matched, starting at position START. */
extern int re_match
  _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string,
             int length, int start, struct re_registers *regs));

/* Relates to `re_match' as `re_search_2' relates to `re_search'. */
extern int re_match_2
  _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1,
             int length1, const char *string2, int length2,
             int start, struct re_registers *regs, int stop));

/* Set REGS to hold NUM_REGS registers, storing them in STARTS and
   ENDS. Subsequent matches using BUFFER and REGS will use this memory
   for recording register information. STARTS and ENDS must be
   allocated with malloc, and must each be at least `NUM_REGS * sizeof
   (regoff_t)' bytes long.

   If NUM_REGS == 0, then subsequent matches should allocate their own
   register data.

   Unless this function is called, the first search or match using
   PATTERN_BUFFER will allocate its own register data, without
   freeing the old data. */
extern void re_set_registers
  _RE_ARGS ((struct re_pattern_buffer *buffer, struct re_registers *regs,
             unsigned num_regs, regoff_t *starts, regoff_t *ends));

/* 4.2 bsd compatibility. */
extern char *re_comp _RE_ARGS ((const char *));
extern int re_exec _RE_ARGS ((const char *));

/* POSIX compatibility. */
extern int regcomp _RE_ARGS ((regex_t *preg, const char *pattern, int cflags));
extern int regexec
  _RE_ARGS ((const regex_t *preg, const char *string, size_t nmatch,
             regmatch_t pmatch[], int eflags));
extern size_t regerror
  _RE_ARGS ((int errcode, const regex_t *preg, char *errbuf,
             size_t errbuf_size));
extern void regfree _RE_ARGS ((regex_t *preg));

#endif /* not __REGEXP_LIBRARY_H__ */

/*
Local variables:
make-backup-files: t
version-control: t
trim-versions-without-asking: nil
End:
*/
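The POSIX compatibility entry points declared at the end of regex.h (regcomp, regexec, regerror and regfree, together with regmatch_t and the REG_* flag bits) fit together roughly as follows. This is a minimal sketch that assumes a C library providing the standard POSIX <regex.h>; first_match is a hypothetical helper introduced for illustration, not something the header declares.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not declared in regex.h): compile PATTERN as an
   extended regular expression, search TEXT, and store the byte offsets
   of the first match in *SO and *EO.  Returns 1 on a match, 0 when
   there is no match, and -1 on a compile or match error.  */
static int
first_match (const char *pattern, const char *text, int cflags,
             regoff_t *so, regoff_t *eo)
{
  regex_t re;
  regmatch_t m[1];
  int rc;

  assert (pattern != NULL && text != NULL);
  rc = regcomp (&re, pattern, cflags | REG_EXTENDED);
  if (rc != 0)
    {
      char msg[128];
      regerror (rc, &re, msg, sizeof msg);   /* human-readable reason */
      fprintf (stderr, "regcomp: %s\n", msg);
      return -1;
    }
  rc = regexec (&re, text, 1, m, 0);
  regfree (&re);                 /* release the compiled pattern */
  if (rc == REG_NOMATCH)
    return 0;
  if (rc != 0)
    return -1;
  *so = m[0].rm_so;              /* offset of the first matched byte */
  *eo = m[0].rm_eo;              /* offset one past the last matched byte */
  return 1;
}
```

For instance, first_match ("b+c", "aabbbcd", 0, &so, &eo) returns 1 with so == 2 and eo == 6, and passing REG_ICASE lets "HELLO" match "say hello" at offsets 4 to 9. Freeing the compiled pattern right after regexec keeps the helper leak-free, at the cost of recompiling on every call.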
481
gnu/usr.bin/grep/search.c
Normal file
@ -0,0 +1,481 @@
/* search.c - searching subroutines using dfa, kwset and regex for grep.
   Copyright (C) 1992 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

   Written August 1992 by Mike Haertel. */

#include <ctype.h>

#ifdef STDC_HEADERS
#include <limits.h>
#include <stdlib.h>
#else
#define UCHAR_MAX 255
#include <sys/types.h>
extern char *malloc();
#endif

#ifdef HAVE_MEMCHR
#include <string.h>
#ifdef NEED_MEMORY_H
#include <memory.h>
#endif
#else
#ifdef __STDC__
extern void *memchr();
#else
extern char *memchr();
#endif
#endif

#if defined(HAVE_STRING_H) || defined(STDC_HEADERS)
#undef bcopy
#define bcopy(s, d, n) memcpy((d), (s), (n))
#endif

#ifdef isascii
#define ISALNUM(C) (isascii(C) && isalnum(C))
#define ISUPPER(C) (isascii(C) && isupper(C))
#else
#define ISALNUM(C) isalnum(C)
#define ISUPPER(C) isupper(C)
#endif

#define TOLOWER(C) (ISUPPER(C) ? tolower(C) : (C))

#include "grep.h"
#include "dfa.h"
#include "kwset.h"
#include "regex.h"

#define NCHAR (UCHAR_MAX + 1)

#if __STDC__
static void Gcompile(char *, size_t);
static void Ecompile(char *, size_t);
static char *EGexecute(char *, size_t, char **);
static void Fcompile(char *, size_t);
static char *Fexecute(char *, size_t, char **);
#else
static void Gcompile();
static void Ecompile();
static char *EGexecute();
static void Fcompile();
static char *Fexecute();
#endif

/* Here is the matchers vector for the main program. */
struct matcher matchers[] = {
  { "default", Gcompile, EGexecute },
  { "grep", Gcompile, EGexecute },
  { "ggrep", Gcompile, EGexecute },
  { "egrep", Ecompile, EGexecute },
  { "posix-egrep", Ecompile, EGexecute },
  { "gegrep", Ecompile, EGexecute },
  { "fgrep", Fcompile, Fexecute },
  { "gfgrep", Fcompile, Fexecute },
  { 0, 0, 0 },
};
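Because the matchers vector ends with an all-zero entry, a caller can pick a matcher by name by scanning until it reaches that sentinel. The sketch below illustrates such a lookup under stated assumptions: struct demo_matcher, find_matcher, and the dummy compile/execute functions are hypothetical stand-ins, since the real struct matcher is declared outside this file (presumably in grep.h) and the main program's own lookup code is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of a matcher entry: a name plus the compile and
   execute functions that implement it.  */
struct demo_matcher
{
  const char *name;
  void (*compile) (char *, size_t);
  char *(*execute) (char *, size_t, char **);
};

/* Dummy implementations so the table has something to point at.  */
static void
demo_compile (char *pattern, size_t size)
{
  (void) pattern;
  (void) size;
}

static char *
demo_execute (char *buf, size_t size, char **endp)
{
  (void) size;
  *endp = buf;
  return NULL;                  /* always reports "no match" */
}

static struct demo_matcher demo_matchers[] = {
  { "grep", demo_compile, demo_execute },
  { "egrep", demo_compile, demo_execute },
  { "fgrep", demo_compile, demo_execute },
  { NULL, NULL, NULL },         /* sentinel, like { 0, 0, 0 } */
};

/* Return the entry named NAME, or NULL if the table has none.  */
static struct demo_matcher *
find_matcher (const char *name)
{
  struct demo_matcher *m;

  assert (name != NULL);
  for (m = demo_matchers; m->name != NULL; ++m)
    if (strcmp (m->name, name) == 0)
      return m;
  return NULL;
}
```

The sentinel keeps the table self-describing: adding a matcher means adding one row, with no separate count to keep in sync.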

/* For -w, we also consider _ to be word constituent. */
#define WCHAR(C) (ISALNUM(C) || (C) == '_')

/* DFA compiled regexp. */
static struct dfa dfa;

/* Regex compiled regexp. */
static struct re_pattern_buffer regex;

/* KWset compiled pattern. For Ecompile and Gcompile, we compile
   a list of strings, at least one of which is known to occur in
   any string matching the regexp. */
static kwset_t kwset;

/* Last compiled fixed string known to exactly match the regexp.
   If kwsexec() returns < lastexact, then we don't need to
   call the regexp matcher at all. */
static int lastexact;

void
dfaerror(mesg)
     char *mesg;
{
  fatal(mesg, 0);
}

static void
kwsinit()
{
  static char trans[NCHAR];
  int i;

  if (match_icase)
    for (i = 0; i < NCHAR; ++i)
      trans[i] = TOLOWER(i);

  if (!(kwset = kwsalloc(match_icase ? trans : (char *) 0)))
    fatal("memory exhausted", 0);
}
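kwsinit gets case-insensitive matching by handing the kwset matcher a 256-entry translation table that maps each byte to its lowercase form, so keywords and searched text are compared after translation. A minimal sketch of the same idea follows, with a naive substring scan standing in for kwset (whose internals live in kwset.c and are not shown here); build_fold_table and fold_search are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

#define NCHAR (UCHAR_MAX + 1)

/* Hypothetical helper: fill TRANS so that TRANS[c] is the lowercase
   form of byte C (in the current locale) and every other byte maps
   to itself.  */
static void
build_fold_table (unsigned char trans[NCHAR])
{
  int i;

  for (i = 0; i < NCHAR; ++i)
    trans[i] = (unsigned char) (isupper (i) ? tolower (i) : i);
}

/* Hypothetical helper: naive search for KEY (KLEN bytes) in TEXT
   (TLEN bytes), comparing both sides through TRANS.  Returns the
   offset of the first occurrence, or -1 if there is none.  */
static long
fold_search (const unsigned char trans[NCHAR], const char *text,
             size_t tlen, const char *key, size_t klen)
{
  size_t i, j;

  assert (klen > 0);
  for (i = 0; i + klen <= tlen; ++i)
    {
      for (j = 0; j < klen; ++j)
        if (trans[(unsigned char) text[i + j]]
            != trans[(unsigned char) key[j]])
          break;
      if (j == klen)
        return (long) i;
    }
  return -1;
}
```

The sketch stores the table as unsigned char and casts every byte before indexing, so bytes above 127 cannot produce a negative subscript; kwsinit itself uses a plain char array filled through the TOLOWER macro.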

/* If the DFA turns out to have some set of fixed strings one of
   which must occur in the match, then we build a kwset matcher
   to find those strings, and thus quickly filter out impossible
   matches. */
static void
kwsmusts()
{
  struct dfamust *dm;
  char *err;

  if (dfa.musts)
    {
      kwsinit();
      /* First, we compile in the substrings known to be exact
         matches. The kwset matcher will return the index
         of the matching string that it chooses. */
      for (dm = dfa.musts; dm; dm = dm->next)
        {
          if (!dm->exact)
            continue;
          ++lastexact;
          if ((err = kwsincr(kwset, dm->must, strlen(dm->must))) != 0)
            fatal(err, 0);
        }
      /* Now, we compile the substrings that will require
         the use of the regexp matcher. */
      for (dm = dfa.musts; dm; dm = dm->next)
        {
          if (dm->exact)
            continue;
          if ((err = kwsincr(kwset, dm->must, strlen(dm->must))) != 0)
            fatal(err, 0);
        }
      if ((err = kwsprep(kwset)) != 0)
        fatal(err, 0);
    }
}
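The two loops in kwsmusts add the exact must-strings to the keyword set before the inexact ones and count them in lastexact. Since the kwset matcher reports the index of the string it found, an index below lastexact means the candidate is already known to match and the regexp matcher can be skipped. A sketch of that ordering trick follows, assuming a naive earliest-occurrence scan in place of the real kwset matcher; struct must, order_musts, and find_first_keyword are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dfamust entry.  */
struct must
{
  const char *str;
  int exact;              /* nonzero if finding STR alone proves a match */
};

/* Copy the strings of MUSTS[0..N-1] into KEYS in the order kwsmusts
   uses: exact strings first, then inexact ones.  Returns the number of
   exact strings, the analogue of lastexact.  */
static int
order_musts (const struct must *musts, int n, const char *keys[])
{
  int i, k = 0, nexact = 0;

  assert (n >= 0);
  /* Pass 1: exact strings, counted like lastexact.  */
  for (i = 0; i < n; ++i)
    if (musts[i].exact)
      {
        keys[k++] = musts[i].str;
        ++nexact;
      }
  /* Pass 2: strings whose hits still need the regexp matcher.  */
  for (i = 0; i < n; ++i)
    if (!musts[i].exact)
      keys[k++] = musts[i].str;
  return nexact;
}

/* Naive stand-in for kwsexec: find the earliest occurrence in TEXT of
   any of KEYS[0..N-1].  Returns that keyword's index (ties go to the
   lower index) and stores its position in *POS, or returns -1.  */
static int
find_first_keyword (const char *keys[], int n, const char *text,
                    const char **pos)
{
  int i, best = -1;
  const char *best_pos = NULL;

  for (i = 0; i < n; ++i)
    {
      const char *p = strstr (text, keys[i]);
      if (p != NULL && (best_pos == NULL || p < best_pos))
        {
          best = i;
          best_pos = p;
        }
    }
  *pos = best_pos;
  return best;
}
```

With the keywords ordered this way, a caller can test the returned index against the exact count exactly as EGexecute tests kwsm.index < lastexact; only hits on inexact strings need a full regexp check.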

static void
Gcompile(pattern, size)
     char *pattern;
     size_t size;
{
#ifdef __STDC__
  const
#endif
    char *err;

  re_set_syntax(RE_SYNTAX_GREP | RE_HAT_LISTS_NOT_NEWLINE);
  dfasyntax(RE_SYNTAX_GREP | RE_HAT_LISTS_NOT_NEWLINE, match_icase);

  if ((err = re_compile_pattern(pattern, size, &regex)) != 0)
    fatal(err, 0);

  dfainit(&dfa);

  /* In the match_words and match_lines cases, we use a different pattern
     for the DFA matcher that will quickly throw out cases that won't work.
     Then if DFA succeeds we do some hairy stuff using the regex matcher
     to decide whether the match should really count. */
  if (match_words || match_lines)
    {
      /* In the whole-word case, we use the pattern:
         (^|[^A-Za-z_])(userpattern)([^A-Za-z_]|$).
         In the whole-line case, we use the pattern:
         ^(userpattern)$.
         BUG: Using [A-Za-z_] is locale-dependent! */

      char *n = malloc(size + 50);
      int i = 0;

      strcpy(n, "");

      if (match_lines)
        strcpy(n, "^\\(");
      if (match_words)
        strcpy(n, "\\(^\\|[^0-9A-Za-z_]\\)\\(");

      i = strlen(n);
      bcopy(pattern, n + i, size);
      i += size;

      if (match_words)
        strcpy(n + i, "\\)\\([^0-9A-Za-z_]\\|$\\)");
      if (match_lines)
        strcpy(n + i, "\\)$");

      i += strlen(n + i);
      dfacomp(n, i, &dfa, 1);
    }
  else
    dfacomp(pattern, size, &dfa, 1);

  kwsmusts();
}

static void
Ecompile(pattern, size)
     char *pattern;
     size_t size;
{
#ifdef __STDC__
  const
#endif
    char *err;

  if (strcmp(matcher, "posix-egrep") == 0)
    {
      re_set_syntax(RE_SYNTAX_POSIX_EGREP);
      dfasyntax(RE_SYNTAX_POSIX_EGREP, match_icase);
    }
  else
    {
      re_set_syntax(RE_SYNTAX_EGREP);
      dfasyntax(RE_SYNTAX_EGREP, match_icase);
    }

  if ((err = re_compile_pattern(pattern, size, &regex)) != 0)
    fatal(err, 0);

  dfainit(&dfa);

  /* In the match_words and match_lines cases, we use a different pattern
     for the DFA matcher that will quickly throw out cases that won't work.
     Then if DFA succeeds we do some hairy stuff using the regex matcher
     to decide whether the match should really count. */
  if (match_words || match_lines)
    {
      /* In the whole-word case, we use the pattern:
         (^|[^A-Za-z_])(userpattern)([^A-Za-z_]|$).
         In the whole-line case, we use the pattern:
         ^(userpattern)$.
         BUG: Using [A-Za-z_] is locale-dependent! */

      char *n = malloc(size + 50);
      int i = 0;

      strcpy(n, "");

      if (match_lines)
        strcpy(n, "^(");
      if (match_words)
        strcpy(n, "(^|[^0-9A-Za-z_])(");

      i = strlen(n);
      bcopy(pattern, n + i, size);
      i += size;

      if (match_words)
        strcpy(n + i, ")([^0-9A-Za-z_]|$)");
      if (match_lines)
        strcpy(n + i, ")$");

      i += strlen(n + i);
      dfacomp(n, i, &dfa, 1);
    }
  else
    dfacomp(pattern, size, &dfa, 1);

  kwsmusts();
}
|
||||||
|
|
||||||
|
static char *
EGexecute(buf, size, endp)
     char *buf;
     size_t size;
     char **endp;
{
  register char *buflim, *beg, *end, save;
  int backref, start, len;
  struct kwsmatch kwsm;
  static struct re_registers regs; /* This is static on account of a BRAIN-DEAD
                                      Q@#%!# library interface in regex.c. */

  buflim = buf + size;

  for (beg = end = buf; end < buflim; beg = end + 1)
    {
      if (kwset)
        {
          /* Find a possible match using the KWset matcher. */
          beg = kwsexec(kwset, beg, buflim - beg, &kwsm);
          if (!beg)
            goto failure;
          /* Narrow down to the line containing the candidate, and
             run it through DFA. */
          end = memchr(beg, '\n', buflim - beg);
          if (!end)
            end = buflim;
          while (beg > buf && beg[-1] != '\n')
            --beg;
          save = *end;
          if (kwsm.index < lastexact)
            goto success;
          if (!dfaexec(&dfa, beg, end, 0, (int *) 0, &backref))
            {
              *end = save;
              continue;
            }
          *end = save;
          /* Successful, no backreferences encountered. */
          if (!backref)
            goto success;
        }
      else
        {
          /* No good fixed strings; start with DFA. */
          save = *buflim;
          beg = dfaexec(&dfa, beg, buflim, 0, (int *) 0, &backref);
          *buflim = save;
          if (!beg)
            goto failure;
          /* Narrow down to the line we've found. */
          end = memchr(beg, '\n', buflim - beg);
          if (!end)
            end = buflim;
          while (beg > buf && beg[-1] != '\n')
            --beg;
          /* Successful, no backreferences encountered! */
          if (!backref)
            goto success;
        }
      /* If we've made it to this point, this means DFA has seen
         a probable match, and we need to run it through Regex. */
      regex.not_eol = 0;
      if ((start = re_search(&regex, beg, end - beg, 0, end - beg, &regs)) >= 0)
        {
          len = regs.end[0] - start;
          if ((!match_lines && !match_words)
              || (match_lines && len == end - beg))
            goto success;
          /* If -w, check if the match aligns with word boundaries.
             We do this iteratively because:
             (a) the line may contain more than one occurrence of the
             pattern, and
             (b) several alternatives in the pattern might be valid at a
             given point, and we may need to consider a shorter one to
             find a word boundary. */
          if (match_words)
            while (start >= 0)
              {
                /* The match is beg[start .. start + len); it ends on a
                   boundary if it reaches the end of the line or is
                   followed by a non-word character. */
                if ((start == 0 || !WCHAR((unsigned char) beg[start - 1]))
                    && (start + len == end - beg
                        || !WCHAR((unsigned char) beg[start + len])))
                  goto success;
                if (len > 0)
                  {
                    /* Try a shorter length anchored at the same place. */
                    --len;
                    regex.not_eol = 1;
                    len = re_match(&regex, beg, start + len, start, &regs);
                  }
                if (len <= 0)
                  {
                    /* Try looking further on. */
                    if (start == end - beg)
                      break;
                    ++start;
                    regex.not_eol = 0;
                    start = re_search(&regex, beg, end - beg,
                                      start, end - beg - start, &regs);
                    len = regs.end[0] - start;
                  }
              }
        }
    }

 failure:
  return 0;

 success:
  *endp = end < buflim ? end + 1 : end;
  return beg;
}

static void
Fcompile(pattern, size)
     char *pattern;
     size_t size;
{
  char *beg, *lim, *err;

  kwsinit();
  beg = pattern;
  do
    {
      for (lim = beg; lim < pattern + size && *lim != '\n'; ++lim)
        ;
      if ((err = kwsincr(kwset, beg, lim - beg)) != 0)
        fatal(err, 0);
      if (lim < pattern + size)
        ++lim;
      beg = lim;
    }
  while (beg < pattern + size);

  if ((err = kwsprep(kwset)) != 0)
    fatal(err, 0);
}

static char *
Fexecute(buf, size, endp)
     char *buf;
     size_t size;
     char **endp;
{
  register char *beg, *try, *end;
  register size_t len;
  struct kwsmatch kwsmatch;

  for (beg = buf; beg <= buf + size; ++beg)
    {
      if (!(beg = kwsexec(kwset, beg, buf + size - beg, &kwsmatch)))
        return 0;
      len = kwsmatch.size[0];
      if (match_lines)
        {
          if (beg > buf && beg[-1] != '\n')
            continue;
          if (beg + len < buf + size && beg[len] != '\n')
            continue;
          goto success;
        }
      else if (match_words)
        for (try = beg; len && try;)
          {
            if (try > buf && WCHAR((unsigned char) try[-1]))
              break;
            if (try + len < buf + size && WCHAR((unsigned char) try[len]))
              {
                try = kwsexec(kwset, beg, --len, &kwsmatch);
                len = kwsmatch.size[0];
              }
            else
              goto success;
          }
      else
        goto success;
    }

  return 0;

 success:
  if ((end = memchr(beg + len, '\n', (buf + size) - (beg + len))) != 0)
    ++end;
  else
    end = buf + size;
  *endp = end;
  while (beg > buf && beg[-1] != '\n')
    --beg;
  return beg;
}
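For -w and -x, Gcompile and Ecompile wrap the user's pattern in extra context before handing it to the DFA. A minimal sketch of the ERE variant follows. `wrap_ere` and `enum wrap_mode` are hypothetical names; the sketch assumes only one of -w or -x is in effect (when both are set, the original takes the prefix from -w and the suffix from -x), and it sizes the buffer exactly instead of reserving a fixed 50 bytes of slack.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Which context to add around the user's pattern (hypothetical). */
enum wrap_mode { WRAP_WORDS, WRAP_LINES };

/* Build the ERE handed to the DFA for -w or -x, using the same prefix
   and suffix strings as Ecompile.  PATTERN need not be NUL-terminated.
   Returns a malloc'd string, or NULL if allocation fails. */
static char *
wrap_ere (const char *pattern, size_t size, enum wrap_mode mode)
{
  const char *prefix = mode == WRAP_WORDS ? "(^|[^0-9A-Za-z_])(" : "^(";
  const char *suffix = mode == WRAP_WORDS ? ")([^0-9A-Za-z_]|$)" : ")$";
  size_t plen = strlen (prefix), slen = strlen (suffix);
  char *n;

  assert (mode == WRAP_WORDS || mode == WRAP_LINES);
  n = malloc (plen + size + slen + 1);        /* exact size plus the NUL */
  if (!n)
    return NULL;
  memcpy (n, prefix, plen);
  memcpy (n + plen, pattern, size);
  memcpy (n + plen + size, suffix, slen + 1); /* copies the NUL as well */
  return n;
}
```

Because the wrapped pattern only filters lines for the DFA, the regex matcher in EGexecute still decides whether a -w match really sits on word boundaries.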
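EGexecute and Fexecute both accept a -w match only when the bytes on either side of it are not word constituents. That test can be isolated as follows. `is_word_char` and `on_word_boundaries` are hypothetical helpers, and the sketch assumes the WCHAR notion of a word character is "alphanumeric or underscore" (the macro's definition is not part of this excerpt). The `(unsigned char)` cast keeps `isalnum` defined for bytes above 0x7f where `char` is signed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Word constituent: alphanumeric or underscore (assumed WCHAR meaning). */
static int
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Does the match LINE[START .. START + LEN) begin and end on word
   boundaries?  Hypothetical helper mirroring the -w check. */
static int
on_word_boundaries (const char *line, size_t linelen,
                    size_t start, size_t len)
{
  assert (start + len <= linelen);
  return (start == 0 || !is_word_char (line[start - 1]))
         && (start + len == linelen || !is_word_char (line[start + len]));
}
```

For example, `foo` inside `a foo b` passes, while the same three bytes inside `foo_bar` fail because `_` follows them.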
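Once a candidate is found anywhere in the buffer, both matchers widen it to the whole line: `memchr` finds the next newline and a backward scan finds the previous one. A sketch of that step follows, with `enclosing_line` as a hypothetical name. As in EGexecute, `*end` is left on the newline (or at the end of the buffer), and the caller steps past it when reporting where the next search should resume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given a candidate at BUF[POS], set [*BEG, *END) to the enclosing line.
   *END points at the terminating newline, or at BUF + SIZE if the last
   line has none. */
static void
enclosing_line (const char *buf, size_t size, size_t pos,
                const char **beg, const char **end)
{
  const char *lim = buf + size;
  const char *b = buf + pos;
  const char *e;

  assert (pos < size);
  e = memchr (b, '\n', (size_t) (lim - b));  /* forward to the newline */
  if (!e)
    e = lim;
  while (b > buf && b[-1] != '\n')          /* back to the line start */
    --b;
  *beg = b;
  *end = e;
}
```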
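Fcompile treats the fixed-string pattern as a newline-separated list and adds each piece to the kwset. The loop below keeps Fcompile's shape, but calls a caller-supplied function where Fcompile calls `kwsincr`. `split_keywords`, `keyword_fn`, and the counting callback are hypothetical. Because the loop is a do/while, an empty pattern still yields one empty keyword, and a single trailing newline does not add an extra one.

```c
#include <assert.h>
#include <stddef.h>

/* Called once per keyword (stands in for kwsincr in this sketch). */
typedef void (*keyword_fn) (const char *kw, size_t len, void *arg);

static void
split_keywords (const char *pattern, size_t size, keyword_fn fn, void *arg)
{
  const char *beg = pattern, *lim;

  assert (pattern != NULL);
  do
    {
      for (lim = beg; lim < pattern + size && *lim != '\n'; ++lim)
        ;
      fn (beg, (size_t) (lim - beg), arg);
      if (lim < pattern + size)
        ++lim;                  /* skip the separating newline */
      beg = lim;
    }
  while (beg < pattern + size);
}

/* Example callback: count keywords and track the longest length. */
struct kw_count { size_t n, longest; };

static void
count_keyword (const char *kw, size_t len, void *arg)
{
  struct kw_count *c = arg;

  (void) kw;
  c->n++;
  if (len > c->longest)
    c->longest = len;
}
```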
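EGexecute runs its matchers from cheapest to most expensive: the kwset search for fixed strings first, then the DFA, and the regex backtracking matcher only when the DFA reports through `backref` that it cannot decide alone. The sketch below shows the same filter-then-confirm shape with swapped-in stand-ins: `strstr` plays the role of `kwsexec`, and POSIX `regexec` plays the role of the DFA/regex pair. `filtered_match` and its `must` argument are hypothetical; in search.c the required strings are gathered by `kwsmusts()`, which appears to take them from the compiled DFA.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if LINE matches RE.  MUST, if non-empty, is a fixed
   string every match is assumed to contain, so its absence lets us
   reject the line without running the regex matcher at all. */
static int
filtered_match (const char *line, const char *must, const regex_t *re)
{
  assert (line != NULL && re != NULL);
  if (must && *must && !strstr (line, must))
    return 0;                               /* cheap reject */
  return regexec (re, line, 0, NULL, 0) == 0; /* expensive confirm */
}
```

The filter is only sound when `must` really occurs in every possible match of the pattern, which is why grep derives it from the pattern rather than taking it from the user.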
24
gnu/usr.bin/grep/tests/check.sh
Normal file
@ -0,0 +1,24 @@
#! /bin/sh
# Regression test for GNU grep.
# Usage: check.sh [testdir]

testdir=${1-tests}

failures=0

# The Khadafy test is brought to you by Scott Anderson . . .
./grep -E -f $testdir/khadafy.regexp $testdir/khadafy.lines > khadafy.out
if cmp $testdir/khadafy.lines khadafy.out
then
	:
else
	echo Khadafy test failed -- output left on khadafy.out
	failures=1
fi

# . . . and the following by Henry Spencer.

${AWK-awk} -F: -f $testdir/scriptgen.awk $testdir/spencer.tests > tmp.script

sh tmp.script && exit $failures
exit 1
@ -1,6 +1,6 @@
 BEGIN { print "failures=0"; }
-!/^#/ && NF == 3 {
-	print "echo '" $3 "' | $1/egrep -e '" $2 "' > /dev/null 2>&1";
+$0 !~ /^#/ && NF == 3 {
+	print "echo '" $3 "' | ./grep -E -e '" $2 "' > /dev/null 2>&1";
 	print "if [ $? != " $1 " ]"
 	print "then"
 	printf "\techo Spencer test \\#%d failed\n", ++n
@ -33,7 +33,7 @@
 0:a[b-d]e:ace
 0:a[b-d]:aac
 0:a[-b]:a-
-2:a[b-]:a-
+0:a[b-]:a-
 1:a[b-a]:-
 2:a[]b:-
 2:a[:-
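Each spencer.tests line has three colon-separated fields: the exit status expected from `grep -E` (0 for a match, 1 for no match, 2 for an error), the pattern, and the input; scriptgen.awk turns each line into a shell check. The sketch below replays one line in-process. `egrep_status` and `spencer_line_passes` are hypothetical, and POSIX `regcomp`/`regexec` stand in for `./grep -E`, so results may differ from GNU grep on edge cases such as `a[b-a]`.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Exit status grep -E would be expected to give (stand-in matcher). */
static int
egrep_status (const char *pattern, const char *input)
{
  regex_t re;
  int status;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 2;                             /* bad pattern */
  status = regexec (&re, input, 0, NULL, 0) == 0 ? 0 : 1;
  regfree (&re);
  return status;
}

/* Split LINE in place like awk -F: with NF == 3 and report 1 if the
   check passes, 0 if it fails, -1 for comments and malformed lines. */
static int
spencer_line_passes (char *line)
{
  char *p1, *p2;

  assert (line != NULL);
  if (line[0] == '#')
    return -1;
  p1 = strchr (line, ':');
  p2 = p1 ? strchr (p1 + 1, ':') : NULL;
  if (!p2 || strchr (p2 + 1, ':'))
    return -1;                            /* not exactly three fields */
  *p1 = '\0';
  *p2 = '\0';
  return egrep_status (p1 + 1, p2 + 1) == atoi (line);
}
```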